OpenSearch Lucene Study Group Meeting - Monday, April 29th, 2024

Sign up to join the meeting at Meetup:

Link to previous meeting’s post: OpenSearch Lucene Study Group Meeting - Monday, April 15th, 2024

Welcome to the OpenSearch Lucene Study Group!

Apache Lucene is the open-source search library that powers OpenSearch and many search applications large and small.

We start the meeting with a Lucene learning topic or Q&A session. In the second half of the meeting, we review recent developments in Apache Lucene and discuss their potential impact on OpenSearch, with a particular focus on new and exciting Lucene features that we can (and should) expose through OpenSearch. Since some changes require a deep dive to fully understand, we sometimes ask participants to volunteer for “homework” to dig deeper into changes and report back at the next meeting.

Standing Agenda:

  • Welcome / introduction (5 minutes)
  • Lucene learning series - someone will either present a Lucene-related talk or we will do Lucene Q&A (20 minutes, recorded)
  • Review assigned issues from last time (10 minutes)
  • Review new Lucene changes and assign homework (20 minutes)

By joining the OpenSearch Lucene Study Group Meeting, you grant OpenSearch and our affiliates the right to record, film, photograph, and capture your voice and image during the OpenSearch Community Meeting (the “Recordings”). You grant to us an irrevocable, nonexclusive, perpetual, worldwide, royalty-free right and license to use, reproduce, modify, distribute, and translate, for any purpose, all or any part of the Recordings and Your Materials. For example, we may distribute Recordings or snippets of Recordings via our social media outlets.

Not a lot of Lucene changes in the past two weeks:

| Version | Category | Description | Link |
|---|---|---|---|
| Lucene 10.0.0 | Changes in Runtime Behavior | IndexWriter treats any java.lang.Error as tragic. | https://github.com/apache/lucene/issues/13277 |
| Lucene 10.0.0 | Other | Convert the FieldEntry, a static nested class, into a record. | https://github.com/apache/lucene/issues/13296 |
| Lucene 9.11.0 | New Features | Make HNSW and Flat storage vector formats easier to extend with new FlatVectorScorer interface. Add new Hnsw format for binary quantized vectors. | https://github.com/apache/lucene/issues/13288 |
| Lucene 9.11.0 | Optimizations | Replace handwritten loops compare with Arrays.compareUnsigned in SegmentTermsEnum. | https://github.com/apache/lucene/issues/13252 |
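
For anyone who wants a feel for the SegmentTermsEnum optimization before the meeting, here is a rough sketch of the general idea: a handwritten unsigned byte-by-byte comparison next to the JDK's `Arrays.compareUnsigned` (available since Java 9). This is not the actual Lucene code; the class name `UnsignedCompareSketch` and the `handwritten` and `builtin` helpers are made up purely for illustration.

```java
import java.util.Arrays;

// Illustrative only: NOT the code from SegmentTermsEnum.
public class UnsignedCompareSketch {

    // Hypothetical handwritten loop: compares two byte ranges,
    // treating each byte as an unsigned value (0..255).
    static int handwritten(byte[] a, int aFrom, int aTo,
                           byte[] b, int bFrom, int bTo) {
        int aLen = aTo - aFrom;
        int bLen = bTo - bFrom;
        int limit = Math.min(aLen, bLen);
        for (int i = 0; i < limit; i++) {
            int cmp = (a[aFrom + i] & 0xFF) - (b[bFrom + i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        // One range is a prefix of the other: the shorter one sorts first.
        return aLen - bLen;
    }

    // The JDK equivalent, which the JVM may be able to optimize better
    // than a manual loop.
    static int builtin(byte[] a, int aFrom, int aTo,
                       byte[] b, int bFrom, int bTo) {
        return Arrays.compareUnsigned(a, aFrom, aTo, b, bFrom, bTo);
    }

    public static void main(String[] args) {
        byte[] x = {0x01, (byte) 0x80}; // second byte is 128 when unsigned
        byte[] y = {0x01, 0x7F};        // second byte is 127
        // Both lines print 1: x sorts after y under unsigned comparison.
        System.out.println(Integer.signum(handwritten(x, 0, 2, y, 0, 2)));
        System.out.println(Integer.signum(builtin(x, 0, 2, y, 0, 2)));
    }
}
```

The sample bytes are chosen so that a signed comparison would get the order backwards (0x80 is negative as a signed byte), which is why term bytes need to be compared as unsigned values.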

Possible discussion topics for this week:

  1. Continue the discussion from last time around porting OpenSearch’s aggregations to Lucene (integrating with the facets framework). Related issue: [DISCUSS] Identifying Gaps in Lucene’s Faceting · Issue #12553 · apache/lucene · GitHub.
  2. Somewhat related to that, there’s the idea of adding support for OLAP-style dimensional rollups at indexing / merge time: Support for building materialized views using Lucene formats · Issue #13188 · apache/lucene · GitHub.
  3. We assigned homework items 4 weeks ago: OpenSearch Lucene Study Group Meeting - Monday, April 1st, 2024 - #2 by msfroh. We can follow up on those if folks have learnings to share.
  4. I’ve been mulling thoughts about how Lucene data formats would look if they were designed specifically for cloud storage, instead of files on disk. What would change (if anything)?

Also, I’ll probably give a quick recap of things I saw/learned at the Haystack conference last week.