OpenSearch Lucene Study Group Meeting - Monday, January 15th, 2024

msfroh · January 11, 2024, 5:42pm

Sign up to join the meeting at Meetup:

Link to previous meeting’s post: OpenSearch Lucene Study Group Meeting - Monday, January 8th, 2024

Welcome to the OpenSearch Lucene Study Group!

Apache Lucene is the open-source search library that powers OpenSearch and many other search applications, large and small.

We start the meeting with a Lucene learning topic or Q&A session. In the second half of the meeting, we review recent developments in Apache Lucene and discuss their potential impact on OpenSearch, with a particular focus on new and exciting Lucene features that we can (and should) expose through OpenSearch. Since some changes require a deep dive to fully understand, we sometimes ask participants to volunteer for “homework”: digging deeper into a change and reporting back at the next meeting.

Standing Agenda:

Welcome / introduction (5 minutes)
Lucene learning series - someone will either present a Lucene-related talk or we will do Lucene Q&A (20 minutes, recorded)
Review assigned issues from last time (10 minutes)
Review new Lucene changes and assign homework (20 minutes)

By joining the OpenSearch Lucene Study Group Meeting, you grant OpenSearch, and our affiliates the right to record, film, photograph, and capture your voice and image during the OpenSearch Community Meeting (the “Recordings”). You grant to us an irrevocable, nonexclusive, perpetual, worldwide, royalty-free right and license to use, reproduce, modify, distribute, and translate, for any purpose, all or any part of the Recordings and Your Materials. For example, we may distribute Recordings or snippets of Recordings via our social media outlets.

msfroh · January 15, 2024, 4:31pm

Here are the Lucene changes since last Monday:

Version	Category	Description	Link
Lucene 10.0.0	New Features	For indices created with 10.0.0 or later, IndexWriter preserves document blocks indexed via IndexWriter#addDocuments or IndexWriter#updateDocuments even when index sorting is configured. Document blocks are maintained alongside their parent documents during sort and merge. IndexWriterConfig now requires a parent field to be specified if index sorting is used together with document blocks.	https://github.com/apache/lucene/issues/12829
Lucene 10.0.0	Changes in Backwards Compatibility Policy	IndexWriter#addDocuments and IndexWriter#updateDocuments now require a parent field name to be specified in IndexWriterConfig if document blocks are indexed and index-time sorting is configured.	https://github.com/apache/lucene/issues/12829
Lucene 9.10.0	Improvements	Use Automaton for SurroundQuery prefix/pattern matching	https://github.com/apache/lucene/issues/12999
Lucene 9.10.0	Optimizations	Avoid resetting BlockDocsEnum#freqBuffer when indexHasFreq is false.	https://github.com/apache/lucene/issues/12997
Lucene 9.10.0	Bug Fixes	Fixed a bug where JapaneseReadingFormFilter could not convert some hiragana to romaji.	https://github.com/apache/lucene/issues/12885

msfroh · January 18, 2024, 11:20pm

While we didn’t have a formal “learning” topic this week, we ended up having a great impromptu chat and code dive, trying to figure out exactly how phrase queries do their position-matching.

Essentially, a phrase query like “quick brown fox” starts out like a BooleanQuery for “quick AND brown AND fox”, skipping through doc IDs for each term until it finds a document that contains all three terms. Then it tries to find the terms in consecutive positions within that document. I had previously guessed that positions were stored with a skip list, like the doc IDs, but it looks like positions don’t support skipping, just one-by-one iteration. @radu.gheorghe cleared up the confusion by pointing us to the implementation in ExactPhraseMatcher::nextMatch. It does use “skipping” logic in the sense that it calls the advancePosition method with a target position, but that method is implemented as a while loop that steps through positions one at a time.
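
For anyone who missed the chat, here is a rough sketch of the position-matching step in plain Java. It is a simplified model and not Lucene's actual code: the class PhraseMatchSketch, the PositionCursor type, and countExactMatches are names made up for illustration, and positions are plain sorted int arrays rather than postings. Only the name advancePosition and its while-loop shape mirror what we saw in ExactPhraseMatcher. It assumes we are already on a document that contains every term.

```java
// Simplified model of exact phrase position matching within ONE document.
// Hypothetical names throughout; this is not Lucene's ExactPhraseMatcher.
final class PhraseMatchSketch {

    /** Forward-only cursor over one term's positions (sorted ascending). */
    static final class PositionCursor {
        private final int[] positions;
        private int index = -1; // index of the current entry in positions
        int pos = -1;           // current position, -1 before the first step

        PositionCursor(int[] positions) {
            this.positions = positions;
        }

        /**
         * Moves forward until pos >= target. There is no skip structure:
         * the loop visits positions one by one. Returns false when the
         * term has no more positions in this document.
         */
        boolean advancePosition(int target) {
            while (pos < target) {
                index++;
                if (index >= positions.length) {
                    return false;
                }
                pos = positions[index];
            }
            return true;
        }
    }

    /**
     * Counts exact phrase occurrences. termPositions[i] holds the sorted
     * positions of the i-th phrase term in the current document.
     */
    static int countExactMatches(int[][] termPositions) {
        PositionCursor[] cursors = new PositionCursor[termPositions.length];
        for (int i = 0; i < termPositions.length; i++) {
            cursors[i] = new PositionCursor(termPositions[i]);
        }
        PositionCursor lead = cursors[0];
        int matches = 0;
        // Try every position of the first term as a candidate phrase start.
        while (lead.advancePosition(lead.pos + 1)) {
            boolean matched = true;
            for (int i = 1; i < cursors.length; i++) {
                int expected = lead.pos + i; // the i-th term must sit i slots later
                if (!cursors[i].advancePosition(expected)) {
                    return matches; // a term ran out of positions: no more matches
                }
                if (cursors[i].pos != expected) {
                    matched = false; // overshot: this candidate start fails
                    break;
                }
            }
            if (matched) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // "the quick brown fox jumps over the quick dog"
        int[][] positions = { {1, 7}, {2}, {3} }; // quick, brown, fox
        System.out.println(countExactMatches(positions));
    }
}
```

The cursors only ever move forward, which is why a per-term while loop is enough: each later candidate start needs positions at or beyond the ones already visited.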

It was a fun investigation and I think we all learned a bit about how phrase queries work.
