OpenSearch Lucene Study Group Meeting - Monday, April 15th, 2024

Sign up to join the meeting at Meetup:

Link to previous meeting’s post (including video link in the comments): OpenSearch Lucene Study Group Meeting - Monday, April 1st, 2024

Welcome to the OpenSearch Lucene Study Group!

Apache Lucene is the open-source search library that powers OpenSearch and many other search applications, large and small.

We start the meeting with a Lucene learning topic or Q&A session. In the second half of the meeting, we review recent developments in Apache Lucene and discuss their potential impact on OpenSearch, with a particular focus on new and exciting Lucene features that we can (and should) expose through OpenSearch. Since some changes require a deep dive to fully understand, we sometimes ask participants to volunteer for “homework” to dig deeper into changes and report back at the next meeting.

Standing Agenda:

  • Welcome / introduction (5 minutes)
  • Lucene learning series - someone will either present a Lucene-related talk or we will do Lucene Q&A (20 minutes, recorded)
  • Review assigned issues from last time (10 minutes)
  • Review new Lucene changes and assign homework (20 minutes)

By joining the OpenSearch Lucene Study Group Meeting, you grant OpenSearch, and our affiliates the right to record, film, photograph, and capture your voice and image during the OpenSearch Community Meeting (the “Recordings”). You grant to us an irrevocable, nonexclusive, perpetual, worldwide, royalty-free right and license to use, reproduce, modify, distribute, and translate, for any purpose, all or any part of the Recordings and Your Materials. For example, we may distribute Recordings or snippets of Recordings via our social media outlets.

We didn’t have a meeting last week, as I was off for spring break, so we have two weeks’ worth of Lucene changes to discuss:

Lucene 10.0.0:

  • API Changes: Convert `BooleanClause` class to record class.
  • API Changes: Remove Accountable interface on KnnVectorsReader.
  • API Changes: Removed deprecated constructors from DoubleField, FloatField, IntField, LongField, and LongPoint. Additionally, deprecated methods have been removed from ByteBuffersIndexInput, BooleanQuery and others. Please refer to for further details.
  • Improvements: Simplify bytes comparison as long comparison in NumericComparator.
  • Changes in Runtime Behavior: GITHUB#13264: IOContext now uses ReadAdvice#RANDOM by default for read operations. An implication is that `MMapDirectory` will use POSIX_MADV_RANDOM on POSIX systems. To fallback to OS default behaviour, pass system property via ``. This may be useful on systems with lots of RAM as this increases read-ahead.
  • Changes in Runtime Behavior: Auto I/O throttling is now disabled by default on ConcurrentMergeScheduler.
  • Changes in Runtime Behavior: ConcurrentMergeScheduler now allows up to 50% of the threads of the host to be used for merging.

Lucene 9.11.0:

  • New Features: Expand support for new scalar bit levels for HNSW vectors. This includes 4-bit vectors and an option to compress them to gain a 50% reduction in memory usage.
  • New Features: Add ability for UnifiedHighlighter to highlight a field based on combined matches from multiple fields.
  • Improvements: Upgrade icu4j to version 74.2.
  • Improvements: Early terminate graph and exact searches of AbstractKnnVectorQuery to follow timeout set from IndexSearcher#setTimeout(QueryTimeout).
  • Improvements: Move most of the responsibility from TaxonomyFacets implementations to TaxonomyFacets itself. This reduces code duplication and enables future development.
  • Optimizations: Made PointRangeQuery faster, for some segment sizes, by reducing the amount of virtual calls to IntersectVisitor::visit(int).
  • Optimizations: FloatTaxonomyFacets can now collect values into a sparse structure, like IntTaxonomyFacets already could.
  • Optimizations: Per-field doc values and knn vectors readers now use a HashMap internally instead of a TreeMap.
  • Bug Fixes: Aggregation facets no longer assume that aggregation values are positive.
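
As a rough illustration of the NumericComparator entry above ("Simplify bytes comparison as long comparison"), here is a minimal sketch of why comparing encoded bytes and comparing decoded longs can give the same ordering. It assumes values are stored as 8 big-endian bytes with the sign bit flipped, in the spirit of Lucene's NumericUtils.longToSortableBytes. The class and method names are hypothetical, and this is not the code from the Lucene change itself.

```java
import java.util.Arrays;

// Hypothetical demo class; not part of Lucene.
final class SortableBytesDemo {

    // Assumed encoding: flip the sign bit, then write 8 big-endian bytes,
    // so that unsigned byte order matches signed numeric order.
    static byte[] toSortableBytes(long value) {
        long flipped = value ^ Long.MIN_VALUE;
        byte[] out = new byte[Long.BYTES];
        for (int i = 0; i < Long.BYTES; i++) {
            out[i] = (byte) (flipped >>> (56 - 8 * i));
        }
        return out;
    }

    // Inverse of toSortableBytes: rebuild the long and flip the sign bit back.
    static long fromSortableBytes(byte[] bytes) {
        long flipped = 0L;
        for (int i = 0; i < Long.BYTES; i++) {
            flipped = (flipped << 8) | (bytes[i] & 0xFFL);
        }
        return flipped ^ Long.MIN_VALUE;
    }

    // Byte-by-byte unsigned comparison of the encoded form.
    static int compareAsBytes(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    // Decode once, then do a single primitive comparison.
    static int compareAsLongs(byte[] a, byte[] b) {
        return Long.compare(fromSortableBytes(a), fromSortableBytes(b));
    }

    public static void main(String[] args) {
        byte[] a = toSortableBytes(-5L);
        byte[] b = toSortableBytes(42L);
        System.out.println(Integer.signum(compareAsBytes(a, b))
                == Integer.signum(compareAsLongs(a, b)));
    }
}
```

Under this assumed encoding both comparisons agree in sign for every pair of values, so the long comparison can stand in for the byte loop.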

Meeting notes:

  1. Talked about query caching, including the possibility of count caching.
  2. Talked a fair bit about OpenSearch aggregations versus Lucene faceting, with reference to [DISCUSS] Identifying Gaps in Lucene’s Faceting · Issue #12553 · apache/lucene · GitHub. As a follow-up, @sandesh and others will comment on that issue to discuss ideas about how to share OpenSearch’s aggregations logic with Lucene.
  3. Talked about the MADVISE changes.
  4. Briefly mentioned early termination on BKD traversal when not scoring, similar in implementation to Break point estimate when threshold exceeded by gf2121 · Pull Request #13199 · apache/lucene · GitHub.
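
Returning to the 4-bit HNSW vectors entry in the Lucene 9.11.0 changes: the 50% memory reduction comes from storing two 4-bit values in each byte instead of one. The sketch below shows that idea only. The nibble layout (even index in the low nibble, odd index in the high nibble) and the class name are assumptions made for illustration, and Lucene's actual on-disk layout may differ.

```java
// Hypothetical demo class; not part of Lucene.
final class NibblePackingDemo {

    // Pack quantized values in [0, 15] two per byte.
    // Assumed layout: even index -> low nibble, odd index -> high nibble.
    static byte[] pack(byte[] values) {
        byte[] packed = new byte[(values.length + 1) / 2];
        for (int i = 0; i < values.length; i++) {
            int v = values[i];
            if (v < 0 || v > 15) {
                throw new IllegalArgumentException("value out of 4-bit range: " + v);
            }
            if ((i & 1) == 0) {
                packed[i / 2] |= (byte) v;
            } else {
                packed[i / 2] |= (byte) (v << 4);
            }
        }
        return packed;
    }

    // Recover the original 4-bit values; length is the number of dimensions.
    static byte[] unpack(byte[] packed, int length) {
        byte[] values = new byte[length];
        for (int i = 0; i < length; i++) {
            int b = packed[i / 2] & 0xFF;
            values[i] = (byte) ((i & 1) == 0 ? (b & 0x0F) : (b >>> 4));
        }
        return values;
    }

    public static void main(String[] args) {
        byte[] quantized = {0, 15, 7, 8, 3, 12, 1, 9};
        byte[] packed = pack(quantized);
        System.out.println(quantized.length + " bytes unpacked, " + packed.length + " bytes packed");
    }
}
```

The trade-off is extra bit manipulation on every read in exchange for half the bytes per vector.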