OpenSearch Lucene Study Group Meeting - Monday, April 15th, 2024

msfroh · April 15, 2024, 3:06pm

Sign up to join the meeting at Meetup:

Link to previous meeting’s post (including video link in the comments): OpenSearch Lucene Study Group Meeting - Monday, April 1st, 2024

Welcome to the OpenSearch Lucene Study Group!

Apache Lucene is the open-source search library that powers OpenSearch and many other search applications, large and small.

We start the meeting with a Lucene learning topic or Q&A session. In the second half of the meeting, we review recent developments in Apache Lucene and discuss their potential impact on OpenSearch, with a particular focus on new and exciting Lucene features that we can (and should) expose through OpenSearch. Since some changes require a deep dive to fully understand, we sometimes ask participants to volunteer for “homework” to dig deeper into changes and report back at the next meeting.

Standing Agenda:

Welcome / introduction (5 minutes)
Lucene learning series - someone will either present a Lucene-related talk or we will do Lucene Q&A (20 minutes, recorded)
Review assigned issues from last time (10 minutes)
Review new Lucene changes and assign homework (20 minutes)

By joining the OpenSearch Lucene Study Group Meeting, you grant OpenSearch and our affiliates the right to record, film, photograph, and capture your voice and image during the OpenSearch Community Meeting (the “Recordings”). You grant to us an irrevocable, nonexclusive, perpetual, worldwide, royalty-free right and license to use, reproduce, modify, distribute, and translate, for any purpose, all or any part of the Recordings and Your Materials. For example, we may distribute Recordings or snippets of Recordings via our social media outlets.

msfroh · April 15, 2024, 3:10pm

We didn’t have a meeting last week, as I was off for spring break, so we have two weeks’ worth of Lucene changes to discuss:

Version	Category	Description	Link
Lucene 10.0.0	API Changes	Convert `BooleanClause` class to record class.	https://github.com/apache/lucene/issues/13261
Lucene 10.0.0	API Changes	Remove Accountable interface on KnnVectorsReader.	https://github.com/apache/lucene/issues/13241
Lucene 10.0.0	API Changes	Removed deprecated constructors from DoubleField, FloatField, IntField, LongField, and LongPoint. Additionally, deprecated methods have been removed from ByteBuffersIndexInput, BooleanQuery and others. Please refer to MIGRATE.md for further details.	https://github.com/apache/lucene/issues/13262
Lucene 10.0.0	Improvements	Simplify bytes comparison as long comparison in NumericComparator.	https://github.com/apache/lucene/issues/13246
Lucene 10.0.0	Changes in Runtime Behavior	IOContext now uses ReadAdvice#RANDOM by default for read operations. An implication is that `MMapDirectory` will use POSIX_MADV_RANDOM on POSIX systems. To fall back to the OS default behaviour, pass the system property `-Dorg.apache.lucene.store.defaultReadAdvice=normal`. This may be useful on systems with lots of RAM as this increases read-ahead.	https://github.com/apache/lucene/issues/13244, https://github.com/apache/lucene/issues/13264
Lucene 10.0.0	Changes in Runtime Behavior	Auto I/O throttling is now disabled by default on ConcurrentMergeScheduler.	https://github.com/apache/lucene/issues/13293
Lucene 10.0.0	Changes in Runtime Behavior	ConcurrentMergeScheduler now allows up to 50% of the threads of the host to be used for merging.	https://github.com/apache/lucene/issues/13293
Lucene 9.11.0	New Features	Expand support for new scalar bit levels for HNSW vectors. This includes 4-bit vectors and an option to compress them to gain a 50% reduction in memory usage.	https://github.com/apache/lucene/issues/13197
Lucene 9.11.0	New Features	Add ability for UnifiedHighlighter to highlight a field based on combined matches from multiple fields.	https://github.com/apache/lucene/issues/13268
Lucene 9.11.0	Improvements	Upgrade icu4j to version 74.2.	https://github.com/apache/lucene/issues/13239
Lucene 9.11.0	Improvements	Early terminate graph and exact searches of AbstractKnnVectorQuery to follow timeout set from IndexSearcher#setTimeout(QueryTimeout).	https://github.com/apache/lucene/issues/13202
Lucene 9.11.0	Improvements	Move most of the responsibility from TaxonomyFacets implementations to TaxonomyFacets itself. This reduces code duplication and enables future development.	https://github.com/apache/lucene/issues/12966
Lucene 9.11.0	Optimizations	Made PointRangeQuery faster, for some segment sizes, by reducing the amount of virtual calls to IntersectVisitor::visit(int).	https://github.com/apache/lucene/issues/13149
Lucene 9.11.0	Optimizations	FloatTaxonomyFacets can now collect values into a sparse structure, like IntTaxonomyFacets already could.	https://github.com/apache/lucene/issues/12966
Lucene 9.11.0	Optimizations	Per-field doc values and knn vectors readers now use a HashMap internally instead of a TreeMap.	https://github.com/apache/lucene/issues/13284
Lucene 9.11.0	Bug Fixes	Aggregation facets no longer assume that aggregation values are positive.	https://github.com/apache/lucene/issues/12966
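
To make the 4-bit vector entry above more concrete, here is a minimal sketch of why packing gives a 50% reduction: a 4-bit level needs only half a byte, so two levels can share one byte. This is not Lucene's implementation. The class name, the method names, the linear min/max quantization, and the nibble layout are all assumptions made for illustration.

```java
import java.util.Arrays;

public class FourBitPackingSketch {

    /** Linearly maps each value in [min, max] to a level in 0..15, stored one level per byte. */
    static byte[] quantize(float[] vector, float min, float max) {
        byte[] levels = new byte[vector.length];
        float scale = 15f / (max - min);
        for (int i = 0; i < vector.length; i++) {
            // Clamp so out-of-range inputs still land on a valid level.
            float clamped = Math.min(max, Math.max(min, vector[i]));
            levels[i] = (byte) Math.round((clamped - min) * scale);
        }
        return levels;
    }

    /** Packs two 4-bit levels per byte: even indexes in the high nibble, odd indexes in the low nibble. */
    static byte[] pack(byte[] levels) {
        byte[] packed = new byte[(levels.length + 1) / 2];
        for (int i = 0; i < levels.length; i++) {
            int shift = (i % 2 == 0) ? 4 : 0;
            packed[i / 2] |= (byte) ((levels[i] & 0x0F) << shift);
        }
        return packed;
    }

    /** Reverses pack(), returning one level per byte again. */
    static byte[] unpack(byte[] packed, int length) {
        byte[] levels = new byte[length];
        for (int i = 0; i < length; i++) {
            int shift = (i % 2 == 0) ? 4 : 0;
            levels[i] = (byte) ((packed[i / 2] >> shift) & 0x0F);
        }
        return levels;
    }

    public static void main(String[] args) {
        float[] vector = {-1.0f, -0.5f, 0.0f, 0.25f, 0.5f, 1.0f, 0.75f, -0.25f};
        byte[] levels = quantize(vector, -1.0f, 1.0f);
        byte[] packed = pack(levels);
        System.out.println("unpacked bytes: " + levels.length + ", packed bytes: " + packed.length);
        System.out.println("round trip ok: " + Arrays.equals(levels, unpack(packed, levels.length)));
    }
}
```

The packed form trades a little bit-shifting at read time for half the storage of the one-level-per-byte form, which is the trade-off the changelog entry describes as optional.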

msfroh · April 15, 2024, 4:55pm

Talked about query caching, including possibility of count caching.
Talked a fair bit about OpenSearch aggregations versus Lucene faceting, with reference to “[DISCUSS] Identifying Gaps in Lucene’s Faceting” (apache/lucene issue #12553). As a follow-up, @sandesh and others will comment on that issue to discuss ideas about how to share OpenSearch’s aggregations logic with Lucene.
Talked about the madvise changes (the new ReadAdvice#RANDOM default for `MMapDirectory` listed above).
Brief mention of early termination on BKD traversal when not scoring, similar in implementation to “Break point estimate when threshold exceeded” (apache/lucene pull request #13199 by gf2121).
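
As a rough illustration of the early-termination idea in the last note, the sketch below counts values in a range over a toy one-dimensional tree and stops descending once the running count exceeds a threshold. It does not use Lucene's BKD or `PointValues` APIs and is not the code from the pull request. The class, the implicit tree over a sorted array, and the `threshold` parameter are all hypothetical.

```java
public class BoundedCountSketch {
    static final int LEAF_SIZE = 4;

    /** Number of tree nodes touched, to show how much work early exit saves. */
    int nodesVisited = 0;

    /**
     * Counts values in [lo, hi] within sorted[from, to). The result is exact when it is
     * at most threshold; otherwise it is only guaranteed to be greater than threshold.
     */
    long count(long[] sorted, int from, int to, long lo, long hi, long threshold) {
        nodesVisited++;
        if (from >= to) {
            return 0;
        }
        long cellMin = sorted[from];
        long cellMax = sorted[to - 1];
        if (cellMax < lo || cellMin > hi) {
            return 0; // cell is entirely outside the range
        }
        if (cellMin >= lo && cellMax <= hi) {
            return to - from; // cell is entirely inside: no per-value checks needed
        }
        if (to - from <= LEAF_SIZE) {
            long matches = 0;
            for (int i = from; i < to; i++) {
                if (sorted[i] >= lo && sorted[i] <= hi) {
                    matches++;
                }
            }
            return matches;
        }
        int mid = (from + to) >>> 1;
        long left = count(sorted, from, mid, lo, hi, threshold);
        if (left > threshold) {
            return left; // early exit: the caller only needs to know the threshold was exceeded
        }
        // Pass the remaining budget down so deeper levels can also exit early.
        return left + count(sorted, mid, to, lo, hi, threshold - left);
    }

    public static void main(String[] args) {
        long[] sorted = new long[1000];
        for (int i = 0; i < sorted.length; i++) {
            sorted[i] = i;
        }
        BoundedCountSketch full = new BoundedCountSketch();
        long exact = full.count(sorted, 0, sorted.length, 100, 899, Long.MAX_VALUE);
        BoundedCountSketch bounded = new BoundedCountSketch();
        long partial = bounded.count(sorted, 0, sorted.length, 100, 899, 10);
        System.out.println("exact=" + exact + " nodes=" + full.nodesVisited);
        System.out.println("bounded=" + partial + " nodes=" + bounded.nodesVisited);
    }
}
```

The bounded call returns a lower bound rather than an exact count, which is enough whenever the caller only needs to know whether a cost or hit-count threshold has been crossed.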
