Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch version: 2.13
Describe the issue:
I am setting up text chunking by following the documentation linked below.
(Text chunking - OpenSearch Documentation)
An error occurs during Step 3.
Step 1: Create a pipeline
PUT _ingest/pipeline/text-chunking-embedding-ingest-pipeline
{
  "description": "A text chunking and embedding ingest pipeline",
  "processors": [
    {
      "text_chunking": {
        "algorithm": {
          "fixed_token_length": {
            "token_limit": 10,
            "overlap_rate": 0.2,
            "tokenizer": "standard"
          }
        },
        "field_map": {
          "passage_text": "passage_chunk"
        }
      }
    },
    {
      "text_embedding": {
        "model_id": "GkyElI8BKGaMwVo6PeZn",
        "field_map": {
          "passage_chunk": "passage_chunk_embedding"
        }
      }
    }
  ]
}
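As a sanity check (my own addition, not part of the tutorial), the pipeline definition can be fetched back to confirm it was registered:

```
GET _ingest/pipeline/text-chunking-embedding-ingest-pipeline
```

This should return the same processor configuration as above.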
Step 2: Create an index for ingestion
PUT testindex
{
  "settings": {
    "index": {
      "knn": true,
      "default_pipeline": "text-chunking-embedding-ingest-pipeline"
    }
  },
  "mappings": {
    "properties": {
      "passage_text": {
        "type": "text"
      },
      "passage_chunk_embedding": {
        "type": "nested",
        "properties": {
          "knn": {
            "type": "knn_vector",
            "dimension": 768
          }
        }
      }
    }
  }
}
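After this request, I would expect the settings and mappings to be retrievable (a check I am assuming is reasonable here):

```
GET testindex
```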
Step 3: Ingest documents into the index
POST testindex/_doc?pipeline=text-chunking-embedding-ingest-pipeline
{
  "passage_text": "This is an example document to be chunked. The document contains a single paragraph, two sentences and 24 tokens by standard tokenizer in OpenSearch."
}
The following error occurs:
{
  "error": {
    "root_cause": [
      {
        "type": "index_not_found_exception",
        "reason": "no such index [testindex]",
        "index": "testindex",
        "index_uuid": "lE1DNT22ShW-eD_0gSkxUg"
      }
    ],
    "type": "index_not_found_exception",
    "reason": "no such index [testindex]",
    "index": "testindex",
    "index_uuid": "lE1DNT22ShW-eD_0gSkxUg"
  },
  "status": 404
}
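Since the error reports index_not_found_exception, one thing worth checking (my assumption) is whether testindex is actually present after Step 2:

```
GET _cat/indices/testindex?v
```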
The model used ("GkyElI8BKGaMwVo6PeZn") is a TEXT_EMBEDDING model.
Is there something I did wrong?