OpenSearch Lucene Study Group Meeting - Monday, February 12th, 2024

Sign up to join the meeting at Meetup:

Link to previous meeting’s post: OpenSearch Lucene Study Group Meeting - Monday, February 5th, 2024

Welcome to the OpenSearch Lucene Study Group!

Apache Lucene is the open-source search library that powers OpenSearch and many search applications large and small.

We start the meeting with a Lucene learning topic or Q&A session. In the second half of the meeting, we review recent developments in Apache Lucene and discuss their potential impact on OpenSearch, with a particular focus on new and exciting Lucene features that we can (and should) expose through OpenSearch. Since some changes require a deep dive to fully understand, we sometimes ask participants to volunteer for “homework” to dig deeper into changes and report back at the next meeting.

Standing Agenda:

  • Welcome / introduction (5 minutes)
  • Lucene learning series - someone will either present a Lucene-related talk or we will do Lucene Q&A (20 minutes, recorded)
  • Review assigned issues from last time (10 minutes)
  • Review new Lucene changes and assign homework (20 minutes)

By joining the OpenSearch Lucene Study Group Meeting, you grant OpenSearch, and our affiliates the right to record, film, photograph, and capture your voice and image during the OpenSearch Community Meeting (the “Recordings”). You grant to us an irrevocable, nonexclusive, perpetual, worldwide, royalty-free right and license to use, reproduce, modify, distribute, and translate, for any purpose, all or any part of the Recordings and Your Materials. For example, we may distribute Recordings or snippets of Recordings via our social media outlets.

This week’s Lucene change log entries:

  • Lucene 10.0.0, Improvements: LUCENE-10621: Upgrade to OpenNLP 2.3.2.
  • Lucene 9.10.0, New Features: Index additional data per facet label in the taxonomy.
  • Lucene 9.10.0, New Features: Add support for the final release of Java foreign memory API in Java 22 (and later). Lucene's MMapDirectory will now mmap Lucene indexes in chunks of 16 GiB (instead of 1 GiB) starting from Java 19. Indexes closed while queries are running can no longer crash the JVM. Support for vectorized implementations of VectorUtil based on jdk.incubator.vector APIs was added for exactly Java 22. Therefore, applications started with command line parameter "java --add-modules jdk.incubator.vector" will automatically use the new vectorized implementations if running on a supported platform (Java 20/21/22 on x86 CPUs with AVX2 or later or ARM NEON CPUs). This is an opt-in feature and requires explicit Java command line flag! When enabled, Lucene logs a notice using java.util.logging. Please test thoroughly and report bugs/slowness to Lucene's mailing list.
  • Lucene 9.10.0, Improvements: Use native byte order varhandles to spare CPU's byte swapping. Tests are running with random byte order to ensure that the order does not affect correctness of code. Native order was enabled for LZ4 compression.
  • Lucene 9.10.0, Optimizations: Speedup concurrent multi-segment HNSW graph search.
  • Lucene 9.10.0, Optimizations: Prevent humongous allocations in ScalarQuantizer when building quantiles.
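
For anyone unfamiliar with the "native byte order varhandles" entry above, here is a minimal sketch of the general JDK technique. It is not Lucene's code: the class and method names (NativeOrderDemo, readNative, writeNative, readLittleEndian) are made up for illustration, and only the standard java.lang.invoke and java.nio APIs are used. A VarHandle created with the platform's native order reads an int straight from a byte[], while a handle with a fixed order may need a byte swap on platforms whose native order differs.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical demo class, not part of Lucene.
public class NativeOrderDemo {
    // Views a byte[] as ints in whatever order the CPU uses natively.
    static final VarHandle NATIVE_INT =
        MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.nativeOrder());

    // Views a byte[] as ints in a fixed little-endian order.
    static final VarHandle LE_INT =
        MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    static int readNative(byte[] buf, int offset) {
        return (int) NATIVE_INT.get(buf, offset);
    }

    static void writeNative(byte[] buf, int offset, int value) {
        NATIVE_INT.set(buf, offset, value);
    }

    static int readLittleEndian(byte[] buf, int offset) {
        return (int) LE_INT.get(buf, offset);
    }

    public static void main(String[] args) {
        byte[] buf = new byte[8];
        writeNative(buf, 0, 0x01020304);
        // A native-order round trip always returns the value that was written.
        System.out.println(Integer.toHexString(readNative(buf, 0)));
        // The fixed-order read matches only when the platform is little-endian.
        System.out.println("native order: " + ByteOrder.nativeOrder());
        System.out.println(Integer.toHexString(readLittleEndian(buf, 0)));
    }
}
```

Data written in native order is only meaningful to the same process or platform, which is presumably why the entry mentions it for in-memory work such as LZ4 compression rather than for the on-disk format.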

Meeting notes:

We talked about Stefan Vodita’s taxonomy changes.

Uwe’s changes: Project Panama changes to support NMA APIs. Remote Store in OpenSearch is not necessarily impacted by this change. Support for Neural Search in OpenSearch using the Vector APIs is to be incubated in 2.12.
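
Since the vectorized code path in the 9.10.0 change log entry is opt-in via "--add-modules jdk.incubator.vector", a quick way to confirm that a JVM was actually started with the flag is to ask the boot module layer. This is a minimal sketch using only standard JDK calls. The class name VectorModuleCheck is made up for illustration, and the check only tells you the module was resolved, not whether Lucene selected its vectorized implementation (for that, look for the java.util.logging notice the entry mentions).

```java
// Hypothetical helper class, not part of Lucene or OpenSearch.
public class VectorModuleCheck {
    // Module name taken from the change log entry.
    static final String VECTOR_MODULE = "jdk.incubator.vector";

    // True when the JVM resolved the incubator module at startup,
    // for example because it was launched with --add-modules jdk.incubator.vector.
    static boolean vectorModuleEnabled() {
        return ModuleLayer.boot().findModule(VECTOR_MODULE).isPresent();
    }

    public static void main(String[] args) {
        System.out.println("Java feature version: " + Runtime.version().feature());
        System.out.println(VECTOR_MODULE + " enabled: " + vectorModuleEnabled());
    }
}
```

Running it once with and once without the flag (for example "java --add-modules jdk.incubator.vector VectorModuleCheck.java") should show the difference on a JDK that ships the incubator module.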

Does the KNN plugin in OpenSearch support concurrent segment search? @Navneet @sohami rel: Speedup concurrent multi-segment HNWS graph search 2 by mayya-sharipova · Pull Request #12962 · apache/lucene · GitHub

@msfroh talked about the quantiles change, which could be related to histogram bucketing. We need to dive in further.

Adrien Grand proposed a release of Lucene 9.10 later this week.

@msfroh cheap weight.count() proposal continuation. ref:
Should we use a SparseFixedBitSet when deletes are sparse? · Issue #13084 · apache/lucene · GitHub

We talked about changes related to optimizing Lucene’s postings format.

rel: @msfroh’s changes to move synonym map off-heap – Move synonym map off-heap for SynonymGraphFilter by msfroh · Pull Request #13054 · apache/lucene · GitHub. Off-heap is slower than on-heap, as one would expect, but it lowers heap usage by 36MB.

Bump release to Java 21 by ChrisHegarty · Pull Request #12753 · apache/lucene · GitHub – Lucene 10 plans to move to Java 21 as the default, which will require OpenSearch and all plugins to update as well.
