I’m building a system where users enter a natural language query, and an LLM converts it into an OpenSearch DSL query. My workflow looks like this:
- Provide the model with an OpenSearch index schema and 1–2 sample documents.
- Pass the user’s natural language query.
- Model generates the corresponding DSL.
- Validate the DSL in OpenSearch.
- Store valid query–DSL pairs for future fine-tuning.
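For concreteness, here is a minimal Python sketch of that loop. The `generate` and `validate` callables are placeholders of my own (assumptions, not a real library API): `generate` stands in for whatever LLM call is used, and `validate` for a check against OpenSearch, for example via its `_validate/query` endpoint. The prompt layout and the `pairs.jsonl` file name are also only illustrative.

```python
import json
from typing import Callable, Optional


def build_prompt(schema: dict, samples: list, question: str) -> str:
    """Assemble schema, sample documents, and the user question into one prompt."""
    return (
        "Index mapping:\n" + json.dumps(schema, indent=2) + "\n\n"
        "Sample documents:\n" + "\n".join(json.dumps(s) for s in samples) + "\n\n"
        "Question: " + question + "\n"
        "Answer with a single OpenSearch DSL JSON object and nothing else."
    )


def nl_to_dsl(
    question: str,
    schema: dict,
    samples: list,
    generate: Callable[[str], str],    # hypothetical: wraps the LLM call
    validate: Callable[[dict], bool],  # hypothetical: wraps an OpenSearch validation call
    store_path: str = "pairs.jsonl",   # hypothetical file collecting fine-tuning pairs
) -> Optional[dict]:
    """Run one pass of the workflow; return the DSL if it is usable, else None."""
    raw = generate(build_prompt(schema, samples, question))

    # Reject anything that is not a single JSON object.
    try:
        dsl = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(dsl, dict) or not validate(dsl):
        return None

    # Append the validated pair as one JSON line for later fine-tuning.
    with open(store_path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"query": question, "dsl": dsl}) + "\n")
    return dsl
```

Passing the model call and the validator in as functions keeps the loop testable with stubs and makes it easy to swap models while comparing them.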
I’m specifically looking for open-source models that:
- Handle structured JSON output well (e.g., match, bool, filter, sort, aggregations).
- Are easy to fine-tune with domain-specific DSL patterns.
- Can run locally for testing but also scale in production.
- Work well with schema + sample doc prompts.
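To make "structured JSON output" concrete, the sketch below shows the kind of request body I would expect for a question like "top categories among in-stock laptops, cheapest first", plus a cheap structural check that can run before the query is sent to OpenSearch. The field names (`title`, `in_stock`, `price`, `category`) and the `ALLOWED_TOP_LEVEL` set are made up for illustration, and `category` is assumed to be a keyword field.

```python
import json

# Hypothetical target output; field names are illustrative, not from a real index.
EXAMPLE_DSL = {
    "query": {
        "bool": {
            "must": [{"match": {"title": "laptop"}}],
            "filter": [{"term": {"in_stock": True}}],
        }
    },
    "sort": [{"price": {"order": "asc"}}],
    "aggs": {"by_category": {"terms": {"field": "category"}}},
    "size": 10,
}

# Assumed whitelist of top-level keys; extend it to match the features in use.
ALLOWED_TOP_LEVEL = {"query", "sort", "aggs", "size", "from", "_source"}


def structural_problems(raw: str) -> list:
    """Return a list of problems found in the model's raw output (empty if none)."""
    try:
        body = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(body, dict):
        return ["top level is not a JSON object"]
    # Flag keys the pipeline does not expect, e.g. hallucinated parameters.
    unknown = sorted(set(body) - ALLOWED_TOP_LEVEL)
    return [f"unexpected top-level key: {k}" for k in unknown]
```

This only catches malformed or obviously off-shape output; whether the query is semantically valid for the index still has to be decided by OpenSearch itself.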
If you’ve successfully fine-tuned any open-source model for OpenSearch DSL generation