Versions:
OpenSearch Version: 3.2.0
OpenSearch Dashboards Version: 3.2.0
OpenSearch Security Version: 3.2.0.0
Docker container on Docker Desktop 4.36.0 (175267)
Windows 10 (10.0.19045 N/A Build 19045)
Describe the issue:
I’m trying to combine two processors inside a search pipeline:
- ML inference request processor - rewrites the user’s natural question into a more search-friendly query.
- RAG response processor - uses the search results as context and generates a final answer with an LLM.
The reasoning is:
- When users ask a question, the query text isn’t always ideal for document retrieval (for example, “What were the latest film nominations?” vs. a keyword query like film nominations 2024).
- In a typical RAG setup, you want the agent/LLM to see the original question for generation, but the search engine to see a refined query for better recall.
- So the ML inference processor takes the original question, rewrites it into a compact search query, and replaces the query_text field.
- The RAG processor should then pick up the retrieved “sources” and generate the final answer based on them.
In principle this would let OpenSearch handle the whole flow: user question → query rewriting → document retrieval → answer generation.
However, when I try to run both processors in the same pipeline, the RAG step fails with runtime errors.
With both processors active, I see this:
{
"error": {
"root_cause": [
{
"type": "class_cast_exception",
"reason": "class org.opensearch.searchpipelines.questionanswering.generative.ext.GenerativeQAParameters cannot be cast to class java.util.Map (org.opensearch.searchpipelines.questionanswering.generative.ext.GenerativeQAParameters is in unnamed module of loader java.net.URLClassLoader @3609b8f2; java.util.Map is in module java.base of loader 'bootstrap')"
}
],
"type": "class_cast_exception",
"reason": "class org.opensearch.searchpipelines.questionanswering.generative.ext.GenerativeQAParameters cannot be cast to class java.util.Map (org.opensearch.searchpipelines.questionanswering.generative.ext.GenerativeQAParameters is in unnamed module of loader java.net.URLClassLoader @3609b8f2; java.util.Map is in module java.base of loader 'bootstrap')"
},
"status": 500
}
With ?verbose_pipeline=true I also get JSON generation errors:
{
"error": {
"root_cause": [
{
"type": "exception",
"reason": "com.fasterxml.jackson.core.JsonGenerationException: Can not start an object, expecting field name (context: Object)"
}
],
"type": "exception",
"reason": "com.fasterxml.jackson.core.JsonGenerationException: Can not start an object, expecting field name (context: Object)",
"caused_by": {
"type": "json_generation_exception",
"reason": "Can not start an object, expecting field name (context: Object)",
"suppressed": [
{
"type": "illegal_state_exception",
"reason": "Failed to close the XContentBuilder",
"caused_by": {
"type": "i_o_exception",
"reason": "Unclosed object or array found"
}
}
]
}
},
"status": 500
}
The combined (RAG + ML inference) search pipeline looks like this:
{
"vector_enhanced_rag_pipeline": {
"request_processors": [
{
"ml_inference": {
"model_id": "crVrY5kBiXJL4rgg9_KU",
"function_name": "remote",
"model_input": """{ "parameters": { "messages": [ { "role": "system", "content": "This is the users question, return me ONLY the keyword-like query string for Google search. YOU RETURN ONLY THE QUERY STRING." }, { "role": "user", "content": "${input_map.user_query}" } ] } }""",
"input_map": [
{
"user_query": "query.neural.text_vector.query_text"
}
],
"output_map": [
{
"query.neural.text_vector.query_text": "inference_results[0].output[0].dataAsMap.choices[0].message.content"
}
],
"full_response_path": true,
"tag": "enhanced_rag_pipeline",
"description": "RAG pipeline with query rewriting",
"ignore_missing": false
}
}
],
"response_processors": [
{
"retrieval_augmented_generation": {
"tag": "enhanced_rag_pipeline",
"description": "RAG pipeline with query rewriting",
"model_id": "crVrY5kBiXJL4rgg9_KU",
"context_field_list": [
"text_string"
],
"system_prompt": "You are a helpful assistant.",
"user_instructions": "Answer concisely in less than 100 words."
}
}
]
}
}
And the query is this:
GET /vector_test_index/_search?search_pipeline=vector_enhanced_rag_pipeline&verbose_pipeline=true
{
"query": {
"neural": {
"text_vector": {
"query_text": "What were the latest film nominations?",
"model_id": "bbVrY5kBiXJL4rgg5fLn",
"min_score": 0.6
}
}
},
"ext": {
"generative_qa_parameters": {
"llm_model": "gpt-5-nano",
"llm_question": "What were the latest film nominations?"
}
},
"_source": "text_string",
"size": 5
}
What I expected
The ML inference processor would intercept the query and rewrite query.neural.text_vector.query_text into a keyword-like search query; the “sources” search would then run, after which the RAG processor would generate the answer as usual, now with better retrieval context.
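To make that concrete: after the rewrite step, the effective search request should conceptually look like this (the rewritten query string below is only an illustration of what the LLM might return; everything else is unchanged from the original request):

{
  "query": {
    "neural": {
      "text_vector": {
        "query_text": "film nominations 2024",
        "model_id": "bbVrY5kBiXJL4rgg5fLn",
        "min_score": 0.6
      }
    }
  },
  "ext": {
    "generative_qa_parameters": {
      "llm_model": "gpt-5-nano",
      "llm_question": "What were the latest film nominations?"
    }
  },
  "_source": "text_string",
  "size": 5
}

So the neural search sees the rewritten keywords, while generative_qa_parameters still carries the original question for answer generation.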
What happens instead
The ML inference processor works on its own, and the RAG processor works on its own. But as soon as the RAG response processor and the ML inference request processor are combined in the same pipeline, execution fails with the casting or JSON serialization errors shown above.
Questions
- Is this combination of request + response processors supported in OpenSearch 3.2.0?
- Is this a limitation of the current RAG implementation?
- Or does it look like a bug where the RAG processor cannot consume the pipeline output after the ML inference processor rewrites the query (or is there some other cause)?
Any guidance would help. I’d like to know whether there is a problem with my implementation (or the idea itself), whether this is a bug, or whether I should treat it as an unsupported workflow for now.
Configuration:
Here is the “vanilla” RAG pipeline that works fine:
{
"vector_rag_pipeline": {
"response_processors": [
{
"retrieval_augmented_generation": {
"tag": "rag_pipeline",
"description": "Retrieval Augmented Generation pipeline",
"model_id": "P7VTY5kBiXJL4rggY-w1",
"context_field_list": [
"text_string"
],
"system_prompt": "You are a helpful assistant",
"user_instructions": "Generate a concise and informative answer in less than 100 words for the given question"
}
}
]
}
}
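A request against this pipeline has the same shape as the combined one, just without the rewrite step, e.g.:

GET /vector_test_index/_search?search_pipeline=vector_rag_pipeline
{
  "query": {
    "neural": {
      "text_vector": {
        "query_text": "What were the latest film nominations?",
        "model_id": "bbVrY5kBiXJL4rgg5fLn",
        "min_score": 0.6
      }
    }
  },
  "ext": {
    "generative_qa_parameters": {
      "llm_model": "gpt-5-nano",
      "llm_question": "What were the latest film nominations?"
    }
  },
  "_source": "text_string",
  "size": 5
}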
Here is the “query_rewrite” pipeline, which uses only the ML inference processor to update the query; it also works as expected:
{
"query_rewrite_pipeline": {
"request_processors": [
{
"ml_inference": {
"model_id": "crVrY5kBiXJL4rgg9_KU",
"function_name": "remote",
"model_input": """{ "parameters": { "messages": [ { "role": "system", "content": "This is the users question, return me ONLY the keyword-like query string for Google search. YOU RETURN ONLY THE QUERY STRING." }, { "role": "user", "content": "${input_map.user_query}" } ] } }""",
"input_map": [
{
"user_query": "query.neural.text_vector.query_text"
}
],
"output_map": [
{
"query.neural.text_vector.query_text": "inference_results[0].output[0].dataAsMap.choices[0].message.content"
}
],
"full_response_path": true,
"tag": "query_rewriter",
"description": "Pipeline with ML inference query rewriting",
"ignore_missing": false
}
}
]
}
}
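This one can be tested on its own with a plain neural query (no generative_qa_parameters ext needed), e.g.:

GET /vector_test_index/_search?search_pipeline=query_rewrite_pipeline
{
  "query": {
    "neural": {
      "text_vector": {
        "query_text": "What were the latest film nominations?",
        "model_id": "bbVrY5kBiXJL4rgg5fLn",
        "min_score": 0.6
      }
    }
  },
  "_source": "text_string",
  "size": 5
}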
And these are the working index mappings:
{
"vector_test_index": {
"mappings": {
"properties": {
"guid": {
"type": "keyword"
},
"text_string": {
"type": "text"
},
"text_vector": {
"type": "knn_vector",
"dimension": 384,
"method": {
"engine": "faiss",
"space_type": "cosinesimil",
"name": "hnsw",
"parameters": {}
}
}
}
}
}
}