Set up default model id for neural_query_enricher with nested knn field

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch 2.13.0

Describe the issue:
I’ve run into problems configuring default_model_id together with the text chunking processor. My text field is split into chunks by the Text chunking processor, and those chunks are converted into vectors. Now I want to run a neural search on that nested vector field without having to specify model_id in every search request.

Configuration:

My ml-pipeline
PUT /_ingest/pipeline/ml-pipeline

{
    "processors": [
        {
            "text_chunking": {
                "algorithm": {
                    "fixed_token_length": {
                        "token_limit": 384,
                        "max_chunk_limit": -1
                    }
                },
                "field_map": {
                    "text": "passage_chunk"
                }
            }
        },
        {
            "text_embedding": {
                "field_map": {
                    "passage_chunk": "chunk_passage_embedding"
                },
                "model_id": "{model_id}"
            }
        }
    ]
}
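
To check that the pipeline produces both chunks and embeddings before indexing anything, the ingest simulate API can be used. The sample document below is illustrative; in the response, passage_chunk should appear as an array of strings and chunk_passage_embedding as an array of objects, each holding a knn vector:

POST /_ingest/pipeline/ml-pipeline/_simulate

{
    "docs": [
        {
            "_source": {
                "text": "A long passage of text to be chunked and embedded."
            }
        }
    ]
}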

My index settings:
PUT /test-ml-index

{
  "settings": {
    "index.knn": true,
    "default_pipeline": "ml-pipeline"
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "text"
      },
      "chunk_passage_embedding": {
        "type": "nested",
        "properties": {
          "knn": {
            "type": "knn_vector",
            "dimension": 384,
            "method": {
              "engine": "lucene",
              "space_type": "l2",
              "name": "hnsw",
              "parameters": {}
            }
          }
        }
      },
      "text": {
        "type": "text"
      }
    }
  }
}

How I try to define the default search model:

PUT /_search/pipeline/default_model_search_pipeline

{
    "request_processors": [
        {
            "neural_query_enricher": {
                "default_model_id": "{{model_id}}"
            }
        }
    ]
}

and then:

PUT /test-ml-index/_settings

{
  "index.search.default_pipeline" : "default_model_search_pipeline"
}
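
As an alternative to the index setting, a search pipeline can also be attached per request via the search_pipeline query parameter (shown here for illustration; the body is the same nested neural query as below):

GET /test-ml-index/_search?search_pipeline=default_model_search_pipeline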

Relevant Logs or Screenshots:
When I perform a search request:

{
    "query": {
        "nested": {
            "score_mode": "max",
            "path": "chunk_passage_embedding",
            "query": {
                "neural": {
                    "chunk_passage_embedding.knn": {
                        "query_text": "cat",
                        "k": 100
                    }
                }
            }
        }
    }
}

I get the error:

{
    "error": {
        "root_cause": [
            {
                "type": "null_pointer_exception",
                "reason": "modelId is marked non-null but is null"
            }
        ],
        "type": "null_pointer_exception",
        "reason": "modelId is marked non-null but is null"
    },
    "status": 500
}

Thanks for bringing up this issue! I have tested the following:

  1. Search a nested field with the default model id: Error
  2. Search a nested field with a specified model id: OK
  3. Search a non-nested field with the default model id: OK

It may take some time for us to resolve this issue. A quick fix for you is to always specify the model id in your search request.
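
Applying that workaround to the query above, the nested neural query with the model id specified explicitly (reusing the same {model_id} placeholder as in the ingest pipeline) would look like:

GET /test-ml-index/_search

{
    "query": {
        "nested": {
            "score_mode": "max",
            "path": "chunk_passage_embedding",
            "query": {
                "neural": {
                    "chunk_passage_embedding.knn": {
                        "query_text": "cat",
                        "model_id": "{model_id}",
                        "k": 100
                    }
                }
            }
        }
    }
}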

Please note that this is also broken for Hybrid Search: the same error occurs when performing that sort of search.
