Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch 3.0.0
Describe the issue :
I need to use hybrid search with inner hits to isolate the relevant chunks in a document. I already have a working index structure: each record has a nested field “chunks” whose elements each contain a text field and a knn vector field, so that each chunk’s text and embedding stay matched. With OpenSearch 2.19.1 I was able to use hybrid search to get the record, but without inner hits, so I had no information on which chunk actually matched the search. I know that version 3.0.0 includes changes that make retrieving inner hits in hybrid queries possible, so I first updated the server without changing the query structure.
Now I have another problem. For some searches, I get the error “Sub-iterators of ConjunctionDISI are not on the same document!” while doing exactly the same thing I was doing before. I found an issue that discusses this error, but most of the conversation concerns an intervals query; only the last comment mentions hybrid search, without further details. Here is the linked issue:
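For reference, here is a minimal sketch of the index structure described above, written as a Python dict that could be sent as the body of an index-creation request. The field names `chunks.text` and `chunks.embedding` come from my queries; the dimension value is only a placeholder assumption and has to match whatever embedding model is actually used.

```python
import json

# Assumption: 768 is a placeholder; the real value must match the embedding model.
EMBEDDING_DIMENSION = 768


def build_chunked_index_body(dimension: int = EMBEDDING_DIMENSION) -> dict:
    """Build an index body whose nested "chunks" field pairs each chunk's
    text with its embedding vector."""
    return {
        # k-NN has to be enabled on the index for knn_vector search.
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "chunks": {
                    "type": "nested",
                    "properties": {
                        "text": {"type": "text"},
                        "embedding": {
                            "type": "knn_vector",
                            "dimension": dimension,
                        },
                    },
                }
            }
        },
    }


if __name__ == "__main__":
    # Print the body so it can be pasted into a PUT <index> request.
    print(json.dumps(build_chunked_index_body(), indent=2))
```

Keeping text and vector inside the same nested element is what lets a nested query (and later inner hits) point back to one specific chunk.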
Opened 02:47PM - 09 May 24 UTC (labels: bug, Search)
### Describe the bug
When performing an intervals query on a field with a custom mapping and analyzer, I get an illegal argument exception with the following reason: `Sub-iterators of ConjunctionDISI are not on the same document!`. I am not sure if the error is due to an issue with our custom mapping or analyzers, or if it is caused by a bug somewhere. Any insight is greatly appreciated.
### Related component
Search
### To Reproduce
1. Run OpenSearch in a new Docker container
```sh
docker run -d -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" -e "OPENSEARCH_INITIAL_ADMIN_PASSWORD=password_here" -e 'DISABLE_SECURITY_PLUGIN=true' opensearchproject/opensearch:latest
```
2. Create a new index with the custom mapping and analyzer
```
PUT http://localhost:9200/my_index
{
  "mappings": {
    "dynamic_templates": [
      {
        "SearchableFilter": {
          "match": "*_SearchableFilter",
          "match_mapping_type": "string",
          "mapping": {
            "fields": {
              "keyword": {
                "ignore_above": 256,
                "type": "keyword"
              }
            },
            "search_analyzer": "text_general_search",
            "type": "text"
          }
        }
      }
    ]
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "text_general_search": {
          "filter": [
            "stop",
            "lowercase"
          ],
          "type": "custom",
          "tokenizer": "standard"
        }
      }
    }
  }
}
```
3. Create a new document
```
POST http://localhost:9200/my_index/_doc
{
  "siblings_SearchableFilter": [
    "a Sister"
  ]
}
```
4. Perform the following intervals query
```
POST http://localhost:9200/my_index/_search
{
  "query": {
    "intervals": {
      "siblings_SearchableFilter": {
        "all_of": {
          "intervals": [
            {
              "any_of": {
                "intervals": [
                  {
                    "match": {
                      "query": "a"
                    }
                  },
                  {
                    "match": {
                      "query": "b"
                    }
                  }
                ]
              }
            },
            {
              "match": {
                "query": "sister"
              }
            }
          ],
          "max_gaps": 30,
          "ordered": true
        }
      }
    }
  }
}
```
5. Notice you get a 400 response code with the error mentioned above.
### Expected behavior
I would expect the intervals query to succeed and return the document created in the reproduction steps.
### Additional Details
- The OpenSearch version is the latest since we are running the Docker image `opensearchproject/opensearch:latest`.
- In my local testing, I ran the image in Docker on WSL running on Windows 11. However, this error also occurs on machines running Ubuntu 22 whose GET response is the following:
```json
{
  "name": "pss-cluster-coordinating-01",
  "cluster_name": "prod_cluster",
  "cluster_uuid": "FFQa8G3RRwqNdct-KMih4g",
  "version": {
    "distribution": "opensearch",
    "number": "2.11.0",
    "build_type": "tar",
    "build_hash": "4dcad6dd1fd45b6bd91f041a041829c8687278fa",
    "build_date": "2023-10-13T02:55:55.511945994Z",
    "build_snapshot": false,
    "lucene_version": "9.7.0",
    "minimum_wire_compatibility_version": "7.10.0",
    "minimum_index_compatibility_version": "7.0.0"
  },
  "tagline": "The OpenSearch Project: https://opensearch.org/"
}
```
- In my testing, there are a few things that remove the error, each of which also prevents the query from functioning properly and returning the desired document:
1. Removing `max_gaps` from the query or setting it to -1.
2. Removing the custom `text_general_search` analyzer from the mapping.
3. Removing either `stop` or `lowercase` from the analyzer filter.
Example of a query that returns the error:
{
  "query": {
    "hybrid": {
      "queries": [
        {
          "nested": {
            "path": "chunks",
            "query": {
              "query_string": {
                "fields": ["chunks.text"],
                "query": "tipi~1 AND pagamento~1"
              }
            }
          }
        },
        {
          "nested": {
            "path": "chunks",
            "query": {
              "neural": {
                "chunks.embedding": {
                  "model_id": "sWzKyJYBCnSNyPkYXI9N",
                  "query_text": "tipi di pagamento"
                }
              }
            }
          }
        }
      ]
    }
  }
}
Example of a query that works just fine:
{
  "query": {
    "hybrid": {
      "queries": [
        {
          "nested": {
            "path": "chunks",
            "query": {
              "query_string": {
                "fields": ["chunks.text"],
                "query": "scheda~1 AND tecnica~1 AND prodotto~1"
              }
            }
          }
        },
        {
          "nested": {
            "path": "chunks",
            "query": {
              "neural": {
                "chunks.embedding": {
                  "model_id": "sWzKyJYBCnSNyPkYXI9N",
                  "query_text": "scheda tecnica prodotto"
                }
              }
            }
          }
        }
      ]
    }
  }
}
Since the query structure is the same in both cases, the problem seems to depend on the records in the result…
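One way to narrow this down might be to run each sub-query of the hybrid query on its own and see whether the lexical or the neural clause (or only their combination) triggers the error. The helper below is just a sketch of that idea: `split_hybrid_query` is a hypothetical name of mine, and it only rearranges the request body without contacting the server.

```python
import copy
import json


def split_hybrid_query(body: dict) -> list[dict]:
    """Return one standalone search body per sub-query of a hybrid query."""
    sub_queries = body["query"]["hybrid"]["queries"]
    # Deep-copy so the original request body is left untouched.
    return [{"query": copy.deepcopy(sub)} for sub in sub_queries]


# The failing query from this post.
failing_body = {
    "query": {
        "hybrid": {
            "queries": [
                {
                    "nested": {
                        "path": "chunks",
                        "query": {
                            "query_string": {
                                "fields": ["chunks.text"],
                                "query": "tipi~1 AND pagamento~1",
                            }
                        },
                    }
                },
                {
                    "nested": {
                        "path": "chunks",
                        "query": {
                            "neural": {
                                "chunks.embedding": {
                                    "model_id": "sWzKyJYBCnSNyPkYXI9N",
                                    "query_text": "tipi di pagamento",
                                }
                            }
                        },
                    }
                },
            ]
        }
    }
}

if __name__ == "__main__":
    # Each printed body can be sent to _search separately.
    for standalone in split_hybrid_query(failing_body):
        print(json.dumps(standalone, indent=2))
```

If both standalone bodies succeed and only the hybrid form fails, that would point at the hybrid scorer rather than at either clause.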
(I did not add the inner_hits part as I wanted to resolve the hybrid query problem first)
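For completeness, this is roughly how I intend to add the inner_hits part once the hybrid query itself works. It is only a sketch: `with_inner_hits` is a hypothetical helper, the `size` value is arbitrary, and I am assuming that each nested clause needs its own inner hits name because both clauses use the same path.

```python
import copy


def with_inner_hits(body: dict, size: int = 3) -> dict:
    """Return a copy of a hybrid search body with inner_hits added to
    every nested sub-query."""
    result = copy.deepcopy(body)
    for index, sub in enumerate(result["query"]["hybrid"]["queries"]):
        nested = sub.get("nested")
        if nested is None:
            continue  # leave non-nested sub-queries as they are
        # Assumption: distinct names avoid a clash between the two clauses,
        # since both target the "chunks" path.
        nested["inner_hits"] = {
            "name": f"{nested['path']}_{index}",
            "size": size,
        }
    return result


# Shortened example body; only the structure matters here.
example_body = {
    "query": {
        "hybrid": {
            "queries": [
                {"nested": {"path": "chunks", "query": {"match_all": {}}}},
                {"nested": {"path": "chunks", "query": {"match_all": {}}}},
            ]
        }
    }
}

if __name__ == "__main__":
    enriched = with_inner_hits(example_body)
    for sub in enriched["query"]["hybrid"]["queries"]:
        print(sub["nested"]["inner_hits"])
```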
Can anyone help?
I have the same issue.
Did anyone help you?
No, and I did not see anything about it in the upcoming fixes for future versions.
I have the exact same problem: nested chunk structure, works with 2.19, but fails after upgrading to 3.0.0.
I just tried downloading the 3.1.0 version that just came out and it seems to work. I am going to do more tests just to be sure but I advise you to try upgrading.
I ran into the same issue.
It was fine with 2.17.1 but not with 3.0.0.
After upgrading to 3.2.0, the exact same query ran without this issue.
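To summarize the reports so far, here is a small sketch that reads the version from the cluster's root `GET /` response and compares it with what people in this thread have observed. The version sets only reflect the posts above, not any official statement, and `reported_status` is a hypothetical helper name.

```python
# Versions as reported by posters in this thread (not an official list).
REPORTED_FAILING = {(3, 0, 0)}
REPORTED_WORKING = {(2, 17, 1), (2, 19, 1), (3, 1, 0), (3, 2, 0)}


def parse_version(number: str) -> tuple:
    """Turn a version string such as "3.0.0" into a tuple of ints."""
    core = number.split("-")[0]  # drop suffixes like "-SNAPSHOT"
    major, minor, patch = core.split(".")[:3]
    return int(major), int(minor), int(patch)


def reported_status(root_response: dict) -> str:
    """Classify a cluster by the version found in its GET / response."""
    version = parse_version(root_response["version"]["number"])
    if version in REPORTED_FAILING:
        return "reported failing"
    if version in REPORTED_WORKING:
        return "reported working"
    return "no report in this thread"


if __name__ == "__main__":
    sample = {"version": {"distribution": "opensearch", "number": "3.0.0"}}
    print(reported_status(sample))
```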