Adding search query term to the input for ml_inference processor

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
2.17 AWS OS

Describe the issue:
I’d like to pass the search query term as an input to the ml_inference processor. I followed the example on the “ML inference (response)” page of the OpenSearch documentation.

My processor is as follows:

PUT /_search/pipeline/my_pipeline_request_review_llm
{
  "response_processors": [
    {
      "ml_inference": {
        "tag": "ml_inference",
        "description": "This processor is going to run llm",
        "model_id": "GzeEVZMB-kzw4OgFUNaB",
        "function_name": "REMOTE",
        "input_map": [
          {
            "context": "title",
            "question": "$._request.query.term.title.value"
          }
        ],
        "output_map": [
          {
            "ext.ml_inference.llm_response": "generated_text"
          }
        ],
        "model_config": {
          "prompt": "Based on following context: ${parameters.context.toString()}. Answer the question: ${parameters.question.toString()} ?"
        },
        "ignore_missing": false,
        "ignore_failure": false
      }
    }
  ]
}

Running search:

GET test_tech_news2/_search?search_pipeline=my_pipeline_request_review_llm
{
  "_source": ["title"],
  "query": {
    "term": {
      "title": {
        "value": "Sample Document"
      }
    }
  }
}

I get the following response:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "cannot find all required input fields: [title, $._request.query.term.title.value] in hit:{\n  \"_index\" : \"test_tech_news2\",\n  \"_id\" : \"K8PXopMBNsmyby02l4zY\",\n  \"_score\" : 0.2876821,\n  \"_source\" : {\n    \"title\" : \"Sample Document\"\n  }\n}"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "cannot find all required input fields: [title, $._request.query.term.title.value] in hit:{\n  \"_index\" : \"test_tech_news2\",\n  \"_id\" : \"K8PXopMBNsmyby02l4zY\",\n  \"_score\" : 0.2876821,\n  \"_source\" : {\n    \"title\" : \"Sample Document\"\n  }\n}"
  },
  "status": 400
}

For some reason it’s not letting me use $._request.query.term.title.value here. I tried a few workarounds, but none have worked so far.

Please help

Configuration:

Relevant Logs or Screenshots:

Hi @spork,

I tried to replicate your use case and found that your query didn’t return any documents, which is why the processor can’t find the required field “title”.

Please note that you’re using a term query, which is case-sensitive and looks for exact matches: the query text itself is not analyzed. By default, OpenSearch applies the standard analyzer to text fields, which splits the text into words and lowercases them.
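
To check which terms are actually stored for the title field, the _analyze API can be run against the index. This is a minimal sketch, assuming title uses the default dynamic mapping (a text field with the standard analyzer); with a custom mapping or analyzer the output will differ.

```json
# Assumption: "title" is a text field with the default standard analyzer
GET test_tech_news2/_analyze
{
  "field": "title",
  "text": "Sample Document"
}
```

Under that assumption, the response should list two lowercase tokens, sample and document, and those are the values a term query is compared against.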

You can test this by indexing a document whose title contains capital letters:

PUT http://localhost:9200/test_tech_news2/_doc/1
{
    "title": "Sample Document"
}

If you then run a term query with the same text, you get 0 hits (the request is followed by its response):

GET http://localhost:9200/test_tech_news2/_search
{
  "_source": ["title"],
  "query": {
    "term": {
      "title": {
        "value": "Sample Document"
      }
    }
  }
}
{
    "took": 9,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 0,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    }
}

But if you search with a single lowercase word, you get a hit:

GET http://localhost:9200/test_tech_news2/_search
{
  "_source": ["title"],
  "query": {
    "term": {
      "title": {
        "value": "sample"
      }
    }
  }
}
{
    "took": 8,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 0.2876821,
        "hits": [
            {
                "_index": "test_tech_news2",
                "_id": "1",
                "_score": 0.2876821,
                "_source": {
                    "title": "Sample Document"
                }
            }
        ]
    }
}
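
If the goal is to search with the original text “Sample Document”, two common alternatives are sketched below. The first is a match query, which analyzes the query text the same way as the indexed field. The second is a term query against a keyword subfield; the name title.keyword is an assumption here and only exists if the index was created with the default dynamic mapping.

```json
# Option 1: match query, the query text is analyzed before matching
GET test_tech_news2/_search
{
  "_source": ["title"],
  "query": {
    "match": {
      "title": "Sample Document"
    }
  }
}

# Option 2: exact match on the keyword subfield
# Assumption: "title.keyword" exists (default dynamic mapping)
GET test_tech_news2/_search
{
  "_source": ["title"],
  "query": {
    "term": {
      "title.keyword": {
        "value": "Sample Document"
      }
    }
  }
}
```

Note that changing the query type also changes where the query text sits in the request body, so any path in the processor’s input_map that points into the query would need to be adjusted to match.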

Can you first confirm that your query returns search hits without the pipeline, and then try the search again with the search pipeline? Hope this works for you.

As @mingshi said, a term query is usually used when you want to filter by keyword.

It’s not appropriate to apply search_pipeline for vector search. You can use match or neural query instead.

@yeonghyeonKo: “it’s not appropriate to apply search_pipeline for vector search. You can use match or neural query instead.”

I’m confused by this statement -
(1) why isn’t it appropriate to use search_pipeline for vector search?
(2) doesn’t using “neural query” imply that we’d be searching against vector fields?