OpenSearch Lucene Study Group Meeting - Monday, February 5th, 2024

Sign up to join the meeting at Meetup:

Link to previous meeting’s post: OpenSearch Lucene Study Group Meeting - Monday, January 29th, 2024

Welcome to the OpenSearch Lucene Study Group!

Apache Lucene is the open-source search library that powers OpenSearch and many other search applications, large and small.

We start the meeting with a Lucene learning topic or Q&A session. In the second half of the meeting, we review recent developments in Apache Lucene and discuss their potential impact on OpenSearch, with a particular focus on new and exciting Lucene features that we can (and should) expose through OpenSearch. Since some changes require a deep dive to fully understand, we sometimes ask participants to volunteer for “homework” to dig deeper into changes and report back at the next meeting.

Standing Agenda:

  • Welcome / introduction (5 minutes)
  • Lucene learning series - someone will either present a Lucene-related talk or we will do Lucene Q&A (20 minutes, recorded)
  • Review assigned issues from last time (10 minutes)
  • Review new Lucene changes and assign homework (20 minutes)

By joining the OpenSearch Lucene Study Group Meeting, you grant OpenSearch and our affiliates the right to record, film, photograph, and capture your voice and image during the OpenSearch Community Meeting (the “Recordings”). You grant to us an irrevocable, nonexclusive, perpetual, worldwide, royalty-free right and license to use, reproduce, modify, distribute, and translate, for any purpose, all or any part of the Recordings and Your Materials. For example, we may distribute Recordings or snippets of Recordings via our social media outlets.

This week’s Lucene change log entries:

  • Lucene 10.0.0 (API Changes): GITHUB#12831: Allow FSTCompiler to stream to any DataOutput while building, and make compile() only return the FSTMetadata. For on-heap (default) use case, please use FST.fromFSTReader(fstMetadata, fstCompiler.getFSTReader()) to create the FST.
  • Lucene 10.0.0 (Other): Put Thread#sleep() on the list of forbidden APIs.
  • Lucene 9.10.0 (Improvements): Make DEFAULT_STOP_TAGS in KoreanPartOfSpeechStopFilter immutable
  • Lucene 9.10.0 (Optimizations): Avoid set.removeAll(list) O(n^2) performance trap in the UpgradeIndexMergePolicy
  • Lucene 9.10.0 (Optimizations): Optimize counts on two clause term disjunctions.
  • Lucene 9.10.0 (Bug Fixes): Fixed missing IndicNormalization and DecimalDigit filters in TeluguAnalyzer normalization
  • Lucene 9.10.0 (Other): GITHUB#13038, GITHUB#13040, GITHUB#13042, GITHUB#13047, GITHUB#13048, GITHUB#13049, GITHUB#13050, GITHUB#13051, GITHUB#13039: Code cleanups and optimizations.
  • Lucene 9.10.0 (Other): Minor AnyQueryNode code cleanup
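
For anyone unfamiliar with the set.removeAll(list) trap mentioned in the UpgradeIndexMergePolicy entry: in the JDK, AbstractSet.removeAll iterates over the set and calls contains() on the argument whenever the set is not larger than the argument. If the argument is a List, each contains() is a linear scan, so the whole call becomes quadratic. Below is a minimal, self-contained Java sketch of the trap and one common workaround (removing the elements one by one). The class and method names are hypothetical, and this is not necessarily how the Lucene change itself was implemented.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class; not taken from Lucene.
public class RemoveAllTrap {

    // Workaround: one hash lookup per list element, regardless of relative sizes.
    static <T> void removeEach(Set<T> set, List<T> toRemove) {
        for (T item : toRemove) {
            set.remove(item);
        }
    }

    // Builds a HashSet containing 0..n-1.
    static Set<Integer> range(int n) {
        Set<Integer> set = new HashSet<>();
        for (int i = 0; i < n; i++) {
            set.add(i);
        }
        return set;
    }

    public static void main(String[] args) {
        int n = 20_000;
        List<Integer> list = new ArrayList<>(range(n));

        // Slow path: set.size() <= list.size(), so AbstractSet.removeAll
        // walks the set and calls list.contains(...) for every element.
        Set<Integer> slow = range(n);
        long t0 = System.nanoTime();
        slow.removeAll(list);
        long slowMs = (System.nanoTime() - t0) / 1_000_000;

        // Fast path: same result, but each removal is a single hash lookup.
        Set<Integer> fast = range(n);
        long t1 = System.nanoTime();
        removeEach(fast, list);
        long fastMs = (System.nanoTime() - t1) / 1_000_000;

        System.out.println("removeAll(list): " + slowMs + " ms, remaining " + slow.size());
        System.out.println("removeEach:      " + fastMs + " ms, remaining " + fast.size());
    }
}
```

Another option with the same effect is to wrap the list in a HashSet before calling removeAll, at the cost of an extra allocation.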

Regarding the pull request “Forbidden Thread.sleep API” by shubhamvishu (Pull Request #13001, apache/lucene), we should create an issue in OpenSearch to similarly forbid Thread.sleep. @andrross.