Apache Lucene is the open-source search library that powers OpenSearch and many search applications large and small.
We start the meeting with a Lucene learning topic or Q&A session. In the second half of the meeting, we review recent developments in Apache Lucene and discuss their potential impact on OpenSearch, with a particular focus on new and exciting Lucene features that we can (and should) expose through OpenSearch. Since some changes require a deep dive to fully understand, we sometimes ask participants to volunteer for “homework” to dig deeper into changes and report back at the next meeting.
Standing Agenda:
Welcome / introduction (5 minutes)
Lucene learning series - someone will either present a Lucene-related talk or we will do Lucene Q&A (20 minutes, recorded)
Review assigned issues from last time (10 minutes)
Review new Lucene changes and assign homework (20 minutes)
By joining the OpenSearch Lucene Study Group Meeting, you grant OpenSearch, and our affiliates the right to record, film, photograph, and capture your voice and image during the OpenSearch Community Meeting (the “Recordings”). You grant to us an irrevocable, nonexclusive, perpetual, worldwide, royalty-free right and license to use, reproduce, modify, distribute, and translate, for any purpose, all or any part of the Recordings and Your Materials. For example, we may distribute Recordings or snippets of Recordings via our social media outlets.
Add support for the final release of the Java foreign memory API in Java 22 (and later). Lucene's MMapDirectory will now mmap Lucene indexes in chunks of 16 GiB (instead of 1 GiB) starting from Java 19. Indexes closed while queries are running can no longer crash the JVM. Support for vectorized implementations of VectorUtil based on jdk.incubator.vector APIs was added for exactly Java 22. Therefore, applications started with the command line parameter "java --add-modules jdk.incubator.vector" will automatically use the new vectorized implementations if running on a supported platform (Java 20/21/22 on x86 CPUs with AVX2 or later, or ARM NEON CPUs). This is an opt-in feature and requires an explicit Java command line flag! When enabled, Lucene logs a notice using java.util.logging. Please test thoroughly and report bugs/slowness to Lucene's mailing list.
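Because the vectorized code path is opt-in, it can be useful to confirm that the JVM was actually started with the incubator module. The following is a minimal sketch using only JDK APIs; the class VectorModuleCheck and its helper method are hypothetical names for illustration, and this is not how Lucene itself performs the check.

```java
public class VectorModuleCheck {
    // Module name taken from the command line flag quoted in the entry above.
    static final String VECTOR_MODULE = "jdk.incubator.vector";

    // Hypothetical helper: true if the named module is part of the boot layer,
    // which is the case for modules added with --add-modules at JVM startup.
    static boolean isModuleResolved(String name) {
        return ModuleLayer.boot().findModule(name).isPresent();
    }

    public static void main(String[] args) {
        System.out.println("Java feature version: " + Runtime.version().feature());
        System.out.println(VECTOR_MODULE + " resolved: " + isModuleResolved(VECTOR_MODULE));
    }
}
```

Running this once with "java --add-modules jdk.incubator.vector VectorModuleCheck" and once without the flag shows whether the module is visible in each case. The notice that Lucene logs through java.util.logging remains the authoritative signal for which implementation is in use.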
Use native byte order varhandles to spare the CPU from byte swapping. Tests run with a random byte order to ensure that the order does not affect the correctness of the code. Native order was enabled for LZ4 compression.
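To illustrate why byte order can be irrelevant for this kind of code, here is a small sketch, assuming a match check similar in spirit to what a compressor does when it compares four bytes at two positions. The class and method names are hypothetical and this is not Lucene's LZ4 code; it only shows that an equality comparison gives the same answer under any byte order, so the native order can be used to avoid a swap.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

public class NativeOrderMatch {
    // Views of a byte[] as ints; the native view reads without byte swapping.
    static final VarHandle NATIVE_INT =
        MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.nativeOrder());
    static final VarHandle BIG_ENDIAN_INT =
        MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.BIG_ENDIAN);
    static final VarHandle LITTLE_ENDIAN_INT =
        MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    // True if the four bytes starting at offsets a and b are identical.
    // Only equality is tested, so the byte order of the view does not matter.
    static boolean sameFourBytes(VarHandle view, byte[] buf, int a, int b) {
        return (int) view.get(buf, a) == (int) view.get(buf, b);
    }

    public static void main(String[] args) {
        byte[] buf = {1, 2, 3, 4, 9, 1, 2, 3, 4, 7};
        for (VarHandle view : new VarHandle[] {NATIVE_INT, BIG_ENDIAN_INT, LITTLE_ENDIAN_INT}) {
            // Offsets 0 and 5 hold the same four bytes; offsets 0 and 1 do not.
            System.out.println(sameFourBytes(view, buf, 0, 5) + " " + sameFourBytes(view, buf, 0, 1));
        }
    }
}
```

Code that interprets the value numerically, rather than only comparing it, still needs a fixed byte order, which is presumably why the tests randomize the order to catch such dependencies.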
Uwe’s changes: Project Panama changes to support NMA APIs. Remote Store in OpenSearch is not necessarily impacted by this change. Support for Neural Search in OpenSearch using Vector APIs is to be incubated in 2.12.