Apache Lucene is the open-source search library that powers OpenSearch and many other search applications, large and small.
We start the meeting with a Lucene learning topic or Q&A session. In the second half of the meeting, we review recent developments in Apache Lucene and discuss their potential impact on OpenSearch, with a particular focus on new and exciting Lucene features that we can (and should) expose through OpenSearch. Since some changes require a deep dive to fully understand, we sometimes ask participants to volunteer for “homework”: digging deeper into a change and reporting back at the next meeting.
Standing Agenda:
Welcome / introduction (5 minutes)
Lucene learning series - someone will either present a Lucene-related talk or we will do Lucene Q&A (20 minutes, recorded)
Review assigned issues from last time (10 minutes)
Review new Lucene changes and assign homework (20 minutes)
By joining the OpenSearch Lucene Study Group Meeting, you grant OpenSearch and our affiliates the right to record, film, photograph, and capture your voice and image during the OpenSearch Community Meeting (the “Recordings”). You grant to us an irrevocable, nonexclusive, perpetual, worldwide, royalty-free right and license to use, reproduce, modify, distribute, and translate, for any purpose, all or any part of the Recordings and Your Materials. For example, we may distribute Recordings or snippets of Recordings via our social media outlets.
Removed deprecated constructors from DoubleField, FloatField, IntField, LongField, and LongPoint. Additionally, deprecated methods have been removed from ByteBuffersIndexInput, BooleanQuery and others. Please refer to MIGRATE.md for further details.
GITHUB#13264: IOContext now uses ReadAdvice#RANDOM by default for read operations. One implication is that `MMapDirectory` will use POSIX_MADV_RANDOM on POSIX systems. To fall back to the OS default behaviour, pass the system property `-Dorg.apache.lucene.store.defaultReadAdvice=normal`. This may be useful on systems with lots of RAM, because the OS default increases read-ahead.
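As a rough illustration of how such a default-with-override setting behaves, the sketch below resolves a read advice from the system property named above. The `Advice` enum and the `resolve` method are hypothetical stand-ins written for this example, not Lucene classes, and the case-insensitive parsing is an assumption. Only the property name and the RANDOM default come from the changelog entry.

```java
import java.util.Locale;

/** Illustrative sketch only; Lucene's own parsing of this property is not shown here. */
final class ReadAdviceSketch {

  /** Hypothetical stand-in for Lucene's ReadAdvice, limited to the two values named in the text. */
  enum Advice { NORMAL, RANDOM }

  /** Property name taken from the changelog entry. */
  static final String PROPERTY = "org.apache.lucene.store.defaultReadAdvice";

  /** Returns RANDOM when the property is unset, otherwise the named advice (case-insensitive, an assumption). */
  static Advice resolve(String propertyValue) {
    if (propertyValue == null) {
      return Advice.RANDOM;
    }
    return Advice.valueOf(propertyValue.trim().toUpperCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    // Run with -Dorg.apache.lucene.store.defaultReadAdvice=normal to see the override take effect.
    System.out.println(resolve(System.getProperty(PROPERTY)));
  }
}
```

Because the value is passed as a JVM system property, it has to be set on the command line that starts the process, before any index is opened.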
Expand support for new scalar bit levels for HNSW vectors. This includes 4-bit vectors and an option to compress them to gain a 50% reduction in memory usage.
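To see where a 50% reduction can come from, consider a minimal sketch of 4-bit scalar quantization. Each float is mapped to one of 16 levels, and two levels are then packed into a single byte. This is plain Java written for illustration: the class, the method names, the fixed [min, max] range, and the adjacent-nibble layout are all assumptions, and none of it reflects Lucene's actual codec or on-disk format.

```java
/** Illustrative sketch only; not Lucene's implementation or storage layout. */
final class FourBitPackingSketch {

  /** Maps each float, clamped to [min, max], to an integer level in 0..15 (one level per byte). */
  static byte[] quantize(float[] vector, float min, float max) {
    byte[] levels = new byte[vector.length];
    float scale = 15f / (max - min);
    for (int i = 0; i < vector.length; i++) {
      float clamped = Math.max(min, Math.min(max, vector[i]));
      levels[i] = (byte) Math.round((clamped - min) * scale);
    }
    return levels;
  }

  /** Packs two 4-bit levels into each byte: even indices go to the high nibble, odd to the low. */
  static byte[] pack(byte[] levels) {
    byte[] packed = new byte[(levels.length + 1) / 2];
    for (int i = 0; i < levels.length; i++) {
      int nibble = levels[i] & 0x0F;
      packed[i >> 1] |= (byte) ((i & 1) == 0 ? nibble << 4 : nibble);
    }
    return packed;
  }

  /** Reverses pack(), recovering one level per byte. */
  static byte[] unpack(byte[] packed, int length) {
    byte[] levels = new byte[length];
    for (int i = 0; i < length; i++) {
      int b = packed[i >> 1] & 0xFF;
      levels[i] = (byte) ((i & 1) == 0 ? b >>> 4 : b & 0x0F);
    }
    return levels;
  }

  public static void main(String[] args) {
    float[] vector = {0f, 1f, 0.5f, 0.25f};
    byte[] levels = quantize(vector, 0f, 1f);
    byte[] packed = pack(levels);
    System.out.println("unpacked bytes: " + levels.length + ", packed bytes: " + packed.length);
  }
}
```

Without packing, each 4-bit level still occupies a full byte, so packing halves the storage for the quantized values. The price is an extra shift and mask whenever a value is read.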
Move most of the responsibility from TaxonomyFacets implementations to TaxonomyFacets itself. This reduces code duplication and enables future development.