OpenSearch Lucene Study Group Meeting - Monday, January 29th, 2024

Sign up to join the meeting at Meetup:

Link to previous meeting’s post: OpenSearch Lucene Study Group Meeting - Monday, January 22nd, 2024

Welcome to the OpenSearch Lucene Study Group!

Apache Lucene is the open-source search library that powers OpenSearch and many other search applications, large and small.

We start the meeting with a Lucene learning topic or Q&A session. In the second half of the meeting, we review recent developments in Apache Lucene and discuss their potential impact on OpenSearch, with a particular focus on new and exciting Lucene features that we can (and should) expose through OpenSearch. Since some changes require a deep dive to fully understand, we sometimes ask participants to volunteer for “homework”: digging deeper into a change and reporting back at the next meeting.

Standing Agenda:

  • Welcome / introduction (5 minutes)
  • Lucene learning series - someone will either present a Lucene-related talk or we will do Lucene Q&A (20 minutes, recorded)
  • Review assigned issues from last time (10 minutes)
  • Review new Lucene changes and assign homework (20 minutes)

By joining the OpenSearch Lucene Study Group Meeting, you grant OpenSearch, and our affiliates the right to record, film, photograph, and capture your voice and image during the OpenSearch Community Meeting (the “Recordings”). You grant to us an irrevocable, nonexclusive, perpetual, worldwide, royalty-free right and license to use, reproduce, modify, distribute, and translate, for any purpose, all or any part of the Recordings and Your Materials. For example, we may distribute Recordings or snippets of Recordings via our social media outlets.

Proposed topic for discussion: Could we build a Solr compatibility layer on top of OpenSearch?

Abstract: Solr is a search server that interfaces really well with Lucene (which makes sense, since they were a single Apache project for years). The Solr component abstractions are nice and powerful. There are lots of reasons why folks have adopted Solr over the years. On the other hand, OpenSearch is probably a better distributed solution than SolrCloud (though I’m obviously biased). What if people could have both? I think it would be possible (but not easy) to create an OpenSearch plugin that “runs” Solr on top of OpenSearch. Suppose the plugin exposes a special “create index” API that takes a Solr schema and config, translates the schema into an OpenSearch mapping, and creates an OpenSearch index. (The config is harder, but UpdateRequestProcessors could be integrated with ingest pipelines and SearchComponents could be integrated with some combination of search plugins and search pipelines. :wave: wavy hands :wave:) At query time, the plugin could use Solr itself (as a library) to translate Solr requests into Lucene logic that we would execute against OpenSearch index shards.
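
To make the schema-translation step a bit more concrete, here is a minimal sketch of what mapping Solr field definitions onto an OpenSearch mapping might look like. Everything in it is an assumption for illustration: the class name `SolrSchemaSketch`, its methods, and the type correspondence table are hypothetical, not part of any existing plugin, and a real translation would also need to handle analyzers, docValues, multiValued fields, dynamic fields, and so on.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: translate Solr field definitions into an OpenSearch-style mapping. */
class SolrSchemaSketch {

    // Assumed correspondence between Solr field classes and OpenSearch field types.
    // This table is illustrative only; a real plugin would need a much richer translation.
    private static final Map<String, String> TYPE_MAP = Map.of(
        "solr.StrField", "keyword",
        "solr.TextField", "text",
        "solr.IntPointField", "integer",
        "solr.LongPointField", "long",
        "solr.FloatPointField", "float",
        "solr.DoublePointField", "double",
        "solr.BoolField", "boolean",
        "solr.DatePointField", "date");

    /** Returns the assumed OpenSearch type for a Solr field class, or fails loudly. */
    static String toOpenSearchType(String solrFieldClass) {
        String type = TYPE_MAP.get(solrFieldClass);
        if (type == null) {
            throw new IllegalArgumentException("No translation for " + solrFieldClass);
        }
        return type;
    }

    /** Builds {"properties": {field: {"type": ...}}} from a map of field name to Solr class. */
    static Map<String, Object> toMapping(Map<String, String> solrFields) {
        Map<String, Object> properties = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : solrFields.entrySet()) {
            Map<String, Object> definition = new LinkedHashMap<>();
            definition.put("type", toOpenSearchType(field.getValue()));
            properties.put(field.getKey(), definition);
        }
        Map<String, Object> mapping = new LinkedHashMap<>();
        mapping.put("properties", properties);
        return mapping;
    }

    public static void main(String[] args) {
        // Field names here are made up for the example.
        Map<String, String> solrFields = new LinkedHashMap<>();
        solrFields.put("id", "solr.StrField");
        solrFields.put("title", "solr.TextField");
        solrFields.put("price", "solr.FloatPointField");
        System.out.println(toMapping(solrFields));
    }
}
```

Failing loudly on an unknown field class seems preferable to silently guessing a type, since a wrong mapping is hard to fix once the index has been created.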

Here is this week’s list of Lucene changes. The most urgent items are the bugs found in 9.9.1, which led to the release of 9.9.2 today. @reta has already created the pull request to update OpenSearch: “Update to Apache Lucene 9.9.2” (opensearch-project/OpenSearch, Pull Request #12063).

  • Lucene 9.10.0, Improvements: Support getMaxScore of ConjunctionScorer for non top level scoring clause.
  • Lucene 9.10.0, Optimizations: PointRangeQuery now exits earlier on segments whose values don't intersect with the query range. When a PointRangeQuery is a required clause of a boolean query, this helps save work on other required clauses of the same boolean query.
  • Lucene 9.10.0, Optimizations: ReqOptSumScorer will now propagate minimum competitive scores to the optional clause if the required clause doesn't score. In practice, this will help boolean queries that consist of a mix of FILTER clauses and SHOULD clauses.
  • Lucene 9.10.0, Bug Fixes: Fix a bug in ShapeTestUtil.
  • Lucene 9.10.0, Bug Fixes: ScorerSupplier created by QueryProfilerWeight will propagate topLevelScoringClause to the sub ScorerSupplier.
  • Lucene 9.9.2, Bug Fixes: Fix NPE when sampling for quantization in Lucene99HnswScalarQuantizedVectorsFormat.
  • Lucene 9.9.2, Bug Fixes: Rollback the tmp storage of BytesRefHash to -1 after sort.
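
For anyone who wants some intuition for the PointRangeQuery change before the meeting, the idea of exiting early on non-intersecting segments can be modeled roughly as below. This is a simplified sketch, not Lucene's code: `RangeSkipSketch`, `SegmentStats`, and the use of plain `int` values are all assumptions made for the example (Lucene itself works with packed byte values per dimension).

```java
/** Hypothetical model of skipping segments whose values cannot match a range query. */
class RangeSkipSketch {

    /** Assumed per-segment summary: the smallest and largest indexed value. */
    static final class SegmentStats {
        final int min;
        final int max;

        SegmentStats(int min, int max) {
            this.min = min;
            this.max = max;
        }
    }

    /** True when the inclusive range [lower, upper] is disjoint from the segment's values. */
    static boolean canSkip(SegmentStats segment, int lower, int upper) {
        return upper < segment.min || lower > segment.max;
    }

    /** Counts segments that still need to be searched for the inclusive range [lower, upper]. */
    static int segmentsToVisit(SegmentStats[] segments, int lower, int upper) {
        int visited = 0;
        for (SegmentStats segment : segments) {
            if (!canSkip(segment, lower, upper)) {
                visited++;
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        SegmentStats[] segments = {
            new SegmentStats(0, 99), new SegmentStats(100, 199), new SegmentStats(200, 299)
        };
        // Only the middle segment overlaps [120, 150], so the other two can be skipped.
        System.out.println(segmentsToVisit(segments, 120, 150));
    }
}
```

When the range query is a required clause of a boolean query, skipping a segment this way also means the other required clauses never have to do any work on that segment, which is where the savings described in the change note come from.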

Suggestion from Erik Hatcher: Look into the Solr tagger, since nobody else has that functionality.