Set up default model id for neural_query_enricher with nested knn field

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch 2.13.0

Describe the issue:
I’ve run into problems configuring default_model_id together with the text chunking processor. My text field is split into chunks by the Text chunking processor, and those chunks are converted into vectors. Now I want to run a neural search on that nested vector field without having to specify model_id in every search request.

Configuration:

My ml-pipeline
PUT /_ingest/pipeline/ml-pipeline

{
    "processors": [
        {
            "text_chunking": {
                "algorithm": {
                    "fixed_token_length": {
                        "token_limit": 384,
                        "max_chunk_limit": -1
                    }
                },
                "field_map": {
                    "text": "passage_chunk"
                }
            }
        },
        {
            "text_embedding": {
                "field_map": {
                    "passage_chunk": "chunk_passage_embedding"
                },
                "model_id": "{model_id}"
            }
        }
    ]
}
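
To check that the pipeline produces both chunks and embeddings before indexing anything, the ingest simulate API can be used. The sample document below is illustrative; in the response, passage_chunk should appear as an array of strings and chunk_passage_embedding as an array of objects, each holding a knn vector:

POST /_ingest/pipeline/ml-pipeline/_simulate

{
    "docs": [
        {
            "_source": {
                "text": "A long passage of text to be chunked and embedded."
            }
        }
    ]
}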

My index settings:
PUT /test-ml-index

{
  "settings": {
    "index.knn": true,
    "default_pipeline": "ml-pipeline"
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "text"
      },
      "chunk_passage_embedding": {
        "type": "nested",
        "properties": {
          "knn": {
            "type": "knn_vector",
            "dimension": 384,
            "method": {
              "engine": "lucene",
              "space_type": "l2",
              "name": "hnsw",
              "parameters": {}
            }
          }
        }
      },
      "text": {
        "type": "text"
      }
    }
  }
}

How I try to define the default search model:

PUT /_search/pipeline/default_model_search_pipeline

{
    "request_processors": [
        {
            "neural_query_enricher": {
                "default_model_id": "{{model_id}}"
            }
        }
    ]
}

and then:

PUT /test-ml-index/_settings

{
  "index.search.default_pipeline" : "default_model_search_pipeline"
}
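
As an alternative to the index setting, a search pipeline can also be attached per request via the search_pipeline query parameter (shown here for illustration; the body is the same nested neural query as below):

GET /test-ml-index/_search?search_pipeline=default_model_search_pipeline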

Relevant Logs or Screenshots:
When I perform a search request:

{
    "query": {
        "nested": {
            "score_mode": "max",
            "path": "chunk_passage_embedding",
            "query": {
                "neural": {
                    "chunk_passage_embedding.knn": {
                        "query_text": "cat",
                        "k": 100
                    }
                }
            }
        }
    }
}

I get the error:

{
    "error": {
        "root_cause": [
            {
                "type": "null_pointer_exception",
                "reason": "modelId is marked non-null but is null"
            }
        ],
        "type": "null_pointer_exception",
        "reason": "modelId is marked non-null but is null"
    },
    "status": 500
}

Thanks for bringing up this issue! I have tested the following:

  1. Search a nested field with the default model id: Error
  2. Search a nested field with a specified model id: OK
  3. Search a non-nested field with the default model id: OK

It may take some time for us to resolve this issue. A quick fix for you is to always specify the model id in your search request.
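
Applying that workaround to the query above, the nested neural query with the model id specified explicitly (reusing the same {model_id} placeholder as in the ingest pipeline) would look like:

GET /test-ml-index/_search

{
    "query": {
        "nested": {
            "score_mode": "max",
            "path": "chunk_passage_embedding",
            "query": {
                "neural": {
                    "chunk_passage_embedding.knn": {
                        "query_text": "cat",
                        "model_id": "{model_id}",
                        "k": 100
                    }
                }
            }
        }
    }
}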

Please note that this is also broken for Hybrid Search: the same error occurs when performing that sort of search.
