OpenSearch Lucene Study Group Meeting - Monday, June 17th, 2024

Sign up to join the meeting at Meetup:

Link to previous meeting’s post: OpenSearch Lucene Study Group Meeting - Friday, May 31st, 2024

I failed to set up a meeting on Friday, June 7th, as I came down with a cold that week and wanted to do my best to recover before traveling to the Berlin Buzzwords conference. I was mostly successful, so I think it was worth it.

Since it’s been two weeks since our last meetup, let’s move back to our regular Monday time slot.

Welcome to the OpenSearch Lucene Study Group!

Apache Lucene is the open-source search library that powers OpenSearch and many other search applications, large and small.

We start the meeting with a Lucene learning topic or Q&A session. In the second half of the meeting, we review recent developments in Apache Lucene and discuss their potential impact on OpenSearch, with a particular focus on new and exciting Lucene features that we can (and should) expose through OpenSearch. Since some changes require a deep dive to fully understand, we sometimes ask participants to volunteer for “homework” to dig deeper into changes and report back at the next meeting.

Standing Agenda:

  • Welcome / introduction (5 minutes)
  • Lucene learning series - someone will either present a Lucene-related talk or we will do Lucene Q&A (20 minutes, recorded)
  • Review assigned issues from last time (10 minutes)
  • Review new Lucene changes and assign homework (20 minutes)

By joining the OpenSearch Lucene Study Group Meeting, you grant OpenSearch and our affiliates the right to record, film, photograph, and capture your voice and image during the OpenSearch Community Meeting (the “Recordings”). You grant to us an irrevocable, nonexclusive, perpetual, worldwide, royalty-free right and license to use, reproduce, modify, distribute, and translate, for any purpose, all or any part of the Recordings and Your Materials. For example, we may distribute Recordings or snippets of Recordings via our social media outlets.

Here are this week’s Lucene changes for review:

  • Lucene 10.0.0, API Changes: Removed Scorer#getWeight.
  • Lucene 10.0.0, New Features: Sparse index: optional skip list on top of doc values which is exposed via the DocValuesSkipper abstraction. A new flag is added that configures whether to create a "skip index" for doc values.
  • Lucene 10.0.0, Other: Merges all immutable attributes in FieldInfos.FieldNumbers into one Hashmap saving memory when writing big indices. Fixes an exotic bug when calling clear where not all attributes were cleared.
  • Lucene 9.12.0, API Changes: Mark COSINE VectorSimilarityFunction as deprecated.
  • Lucene 9.12.0, Optimizations: Avoid unnecessary memory allocation in PackedLongValues#Iterator.
  • Lucene 9.12.0, Optimizations: Rewrite SortedNumericDocValuesRangeQuery to MatchNoDocsQuery when the upper bound is smaller than the lower bound.
  • Lucene 9.12.0, Optimizations: Implement Weight#count for vector values in the FieldExistsQuery.
  • Lucene 9.12.0, Optimizations: MultiTermQuery returns null ScoreSupplier in cases where no query terms are present in the index segment.
  • Lucene 9.12.0, Optimizations: Replace TreeMap and use compiled Patterns in Japanese UserDictionary.
  • Lucene 9.12.0, Optimizations: Don't preserve auxiliary buffer contents in LSBRadixSorter if it grows.
  • Lucene 9.11.0, New Features: Add new option when calculating scalar quantiles. The new option of setting `confidenceInterval` to `0` will now dynamically determine the quantiles through a grid search over multiple quantiles calculated by multiple intervals.
  • Lucene 9.11.0, Optimizations: Replace Map<Character> by CharObjectHashMap and Set<Character> by CharHashSet.

Note: the entry for the SortedNumericDocValuesRangeQuery rewrite has an issue with its link. The correct link should be: Rewrite newSlowRangeQuery to MatchNoDocsQuery when upper > lower by ioanatia · Pull Request #13425 · apache/lucene
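For anyone who has not looked at query rewriting before, here is a rough sketch of the idea behind the range-query entry, following the changelog wording (no documents can match when the upper bound is smaller than the lower bound). The `Query`, `RangeQuery`, and `MatchNoDocs` types below are hypothetical stand-ins written for this post, not Lucene's real classes, and the actual change in the pull request may look quite different.

```java
public class RangeRewriteSketch {
    // Hypothetical stand-ins for query types; NOT the Lucene API.
    interface Query {}

    // An inclusive numeric range over one field.
    record RangeQuery(String field, long lower, long upper) implements Query {}

    // A query that matches nothing; the reason is kept for debugging only.
    record MatchNoDocs(String reason) implements Query {}

    // If the range is empty (upper < lower), swap in the cheap "match nothing"
    // query so no per-segment work is needed. Otherwise return the query unchanged.
    static Query rewrite(Query query) {
        if (query instanceof RangeQuery range && range.upper() < range.lower()) {
            return new MatchNoDocs("upper < lower for field " + range.field());
        }
        return query;
    }

    public static void main(String[] args) {
        System.out.println(rewrite(new RangeQuery("price", 10, 5)));
        System.out.println(rewrite(new RangeQuery("price", 5, 10)));
    }
}
```

The point of doing this at rewrite time is that the decision is made once per query, before any segment is visited.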

Here is what we discussed this week:

We also talked a little bit about how Lucene 10 changes are increasingly making use of newer Java features, like switch expressions, records, and type inference for local variables. The change seems to have happened once discussion about releasing Lucene 10 started, so people no longer see the main branch as a waypoint to 9.x. On OpenSearch, our main branch still primarily exists as a stop on the way to 2.x, so we’re not ready to embrace new Java language features yet.
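For readers who have not used these language features yet, here is a small sketch that shows all three together. The `Occur` enum and `Clause` record are made up for this post (they only loosely echo boolean query clauses) and are not taken from Lucene or OpenSearch code.

```java
import java.util.List;

public class ModernJavaSketch {
    // Hypothetical enum, defined here only for illustration.
    enum Occur { MUST, SHOULD, MUST_NOT }

    // A record (Java 16+): an immutable data carrier with generated
    // accessors, equals, hashCode, and toString.
    record Clause(String field, String term, Occur occur) {}

    // A switch expression (Java 14+): it yields a value, and the compiler
    // checks that every enum constant is covered.
    static String prefix(Occur occur) {
        return switch (occur) {
            case MUST -> "+";
            case MUST_NOT -> "-";
            case SHOULD -> "";
        };
    }

    static String render(List<Clause> clauses) {
        // Local variable type inference (Java 10+) via `var`.
        var sb = new StringBuilder();
        for (var clause : clauses) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(prefix(clause.occur()))
              .append(clause.field())
              .append(':')
              .append(clause.term());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        var clauses = List.of(
            new Clause("title", "lucene", Occur.MUST),
            new Clause("body", "legacy", Occur.MUST_NOT));
        System.out.println(render(clauses)); // prints "+title:lucene -body:legacy"
    }
}
```

Code like this cannot be backported to a branch that targets an older Java release without rewriting it, which is the practical cost behind the branching discussion above.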
