OpenSearch Lucene Study Group Meeting - Monday, April 1st, 2024

Sign up to join the meeting at Meetup:

Link to previous meeting’s post (including video link in the comments): OpenSearch Lucene Study Group Meeting - Monday, March 25th, 2024

Welcome to the OpenSearch Lucene Study Group!

Apache Lucene is the open-source search library that powers OpenSearch and many search applications large and small.

We start the meeting with a Lucene learning topic or Q&A session. In the second half of the meeting, we review recent developments in Apache Lucene and discuss their potential impact on OpenSearch, with a particular focus on new and exciting Lucene features that we can (and should) expose through OpenSearch. Since some changes require a deep dive to fully understand, we sometimes ask participants to volunteer for “homework” to dig deeper into changes and report back for the next meeting.

Standing Agenda:

  • Welcome / introduction (5 minutes)
  • Lucene learning series - someone will either present a Lucene-related talk or we will do Lucene Q&A (20 minutes, recorded)
  • Review assigned issues from last time (10 minutes)
  • Review new Lucene changes and assign homework (20 minutes)

By joining the OpenSearch Lucene Study Group Meeting, you grant OpenSearch and our affiliates the right to record, film, photograph, and capture your voice and image during the OpenSearch Community Meeting (the “Recordings”). You grant to us an irrevocable, nonexclusive, perpetual, worldwide, royalty-free right and license to use, reproduce, modify, distribute, and translate, for any purpose, all or any part of the Recordings and Your Materials. For example, we may distribute Recordings or snippets of Recordings via our social media outlets.

There’s a pretty long list of Lucene changes this week.

Combined with the changes from last week that we didn’t review, I think this week will mostly focus on changelog review.

  • Lucene 10.0.0, API Changes: Convert IOContext, MergeInfo, and FlushInfo to record classes.
  • Lucene 10.0.0, API Changes: The `readOnce`, `load` and `random` flags on `IOContext` have been replaced with a new `ReadAdvice` enum.
  • Lucene 10.0.0, API Changes: Replace `IOContext.READ` with `IOContext.DEFAULT`.
  • Lucene 10.0.0, New Features: Add RomanianNormalizationFilter
  • Lucene 10.0.0, Improvements: Upgrade snowball to 26db1ab9.
  • Lucene 10.0.0, Improvements: Update Romanian stopwords list to include the modern unicode forms.
  • Lucene 10.0.0, Changes in Backwards Compatibility Policy: Remove the Kp and Lovins snowball algorithms which are not supported or intended for general use.
  • Lucene 9.11.0, New Features: Add support for posix_madvise to MMapDirectory: If running on Linux/macOS and Java 21 or later, MMapDirectory uses IOContext to pass suitable MADV flags to kernel of operating system. This may improve paging logic especially when large segments are merged under memory pressure.
  • Lucene 9.11.0, Optimizations: Speed up dynamic pruning by breaking point estimation when threshold get exceeded.
  • Lucene 9.11.0, Optimizations: Speed up writeGroupVInts
  • Lucene 9.11.0, Optimizations: Use singleton for all-zeros DirectMonotonicReader.Meta
  • Lucene 9.11.0, Optimizations: Introduce singleton for PackedInts.NullReader of size 256
  • Lucene 9.11.0, Optimizations: Binary search the BlockTree terms dictionary entries when all suffixes have the same length in a leaf block, speeding up cases like primary key lookup on an id field when all ids are the same length.
  • Lucene 9.11.0, Bug Fixes: Subtract deleted file size from the cache size of NRTCachingDirectory.

@rishabhmaurya volunteered to review “Support getMaxScore of DisjunctionSumScorer for non top level scoring clause” (apache/lucene Pull Request #13066, by mrkm4ntr).

@harshavamsi: “Add new parallel merge task executor for parallel actions within a single merge action” (apache/lucene Pull Request #13124, by benwtrent).

@reta will look into enabling the appropriate JVM arg for “Add support for posix_madvise to Java 21 MMapDirectory” (apache/lucene Pull Request #13196, by uschindler).

@msfroh will look more at “Break point estimate when threshold exceeded” (apache/lucene Pull Request #13199, by gf2121) and “New structure for numeric dynamic pruning” (apache/lucene Pull Request #13217, by gf2121).

Sandesh will look into Grouped Varints to explain them in a future meeting. This is related to “Speed up writeGroupVInts” (apache/lucene Pull Request #13203, by easyice).
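For anyone who wants a head start before that session, here is a minimal Python sketch of the general group-varint idea: four unsigned 32-bit integers share one flag byte that records how many bytes (1 to 4) each value occupies, and the value bytes follow. The function names `encode_group` and `decode_group`, the little-endian byte order, and the bit layout of the flag byte are assumptions made for illustration; Lucene's actual on-disk layout may differ, so treat this as a sketch of the technique and not as Lucene's implementation.

```python
def encode_group(values):
    """Encode exactly four unsigned 32-bit ints as 1 flag byte + 4..16 data bytes.

    Assumed layout (illustrative, not necessarily Lucene's): the flag byte holds
    four 2-bit fields, first value in the highest bits, each storing (byte count - 1).
    """
    assert len(values) == 4
    flag = 0
    body = bytearray()
    for v in values:
        assert 0 <= v < 2**32
        # Zero still needs one byte, so clamp the byte count to at least 1.
        num_bytes = max(1, (v.bit_length() + 7) // 8)
        flag = (flag << 2) | (num_bytes - 1)
        body += v.to_bytes(num_bytes, "little")
    return bytes([flag]) + bytes(body)


def decode_group(buf, pos=0):
    """Decode one group starting at pos; return (values, position after the group)."""
    flag = buf[pos]
    pos += 1
    values = []
    for shift in (6, 4, 2, 0):
        num_bytes = ((flag >> shift) & 0x3) + 1
        values.append(int.from_bytes(buf[pos:pos + num_bytes], "little"))
        pos += num_bytes
    return values, pos


if __name__ == "__main__":
    encoded = encode_group([1, 300, 70000, 2**31])
    print(len(encoded), decode_group(encoded))
```

The appeal over classic one-value-at-a-time varints is that the decoder reads all four lengths from a single byte instead of testing a continuation bit on every byte, which tends to mean fewer branches per value.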

Kiran Reddy will learn how term enums are encoded, related to “[Fix] Binary search the entries when all suffixes have the same length in a leaf block.” (apache/lucene Pull Request #11888, by vsop-479).
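As a rough intuition for why equal-length suffixes matter, here is a small Python sketch. When every entry in a sorted block has the same length, entry `i` starts at offset `i * suffix_len`, so a lookup can jump straight to the middle entry and binary search instead of scanning entries one by one. The function `find_suffix` and the flat `bytes` block are hypothetical stand-ins for illustration; the real BlockTree leaf block format carries more metadata than this.

```python
def find_suffix(block: bytes, suffix_len: int, target: bytes) -> int:
    """Return the index of target among sorted fixed-length suffixes, or -1.

    block is assumed to be the suffixes concatenated in sorted (unsigned byte)
    order, each exactly suffix_len bytes long.
    """
    assert suffix_len > 0 and len(block) % suffix_len == 0
    lo, hi = 0, len(block) // suffix_len - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        start = mid * suffix_len
        candidate = block[start:start + suffix_len]
        if candidate == target:
            return mid
        # Python compares bytes lexicographically by unsigned byte value.
        if candidate < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1


if __name__ == "__main__":
    ids = [b"0001", b"0004", b"0009", b"0042"]
    block = b"".join(ids)
    print(find_suffix(block, 4, b"0009"), find_suffix(block, 4, b"0005"))
```

With variable-length suffixes the start of entry `i` is only known after walking the lengths of the entries before it, which is why the optimization is limited to blocks where all suffixes share one length, such as fixed-width ids.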
