Neural Search Plugin Chunking For Large Text

asfoorial · July 31, 2023, 7:37am

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch 2.9

Describe the issue:
Does the Neural Search plugin's ingestion pipeline support chunking for large text? For example, if I have a complete page, or even the text of a full document, would it generate embeddings for the chunks of the document and pool them into a single embedding?

Thanks

ylwu · August 1, 2023, 7:15am

Can you elaborate? I guess you mean these steps:

Split a big document into smaller chunks
Calculate embedding for each chunk
Calculate pooled embedding for all chunks
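The three steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not how the Neural Search plugin works (per the reply below, it has no such feature): the overlapping word-window splitter, mean pooling, and the hash-based `embed` function are all hypothetical stand-ins. A real pipeline would call an actual embedding model instead of `embed`.

```python
import hashlib
import math


def split_into_chunks(text: str, max_words: int = 50, overlap: int = 10) -> list[str]:
    """Step 1: split text into windows of at most max_words words, overlapping by `overlap` words."""
    if max_words <= overlap:
        raise ValueError("max_words must be greater than overlap")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def embed(chunk: str, dim: int = 8) -> list[float]:
    """Step 2 (hypothetical stand-in for a real model): deterministic hash-based unit vector."""
    digest = hashlib.sha256(chunk.encode("utf-8")).digest()  # 32 bytes, so dim <= 32
    vec = [b / 255.0 - 0.5 for b in digest[:dim]]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def pooled_embedding(text: str, dim: int = 8) -> list[float]:
    """Step 3: mean-pool the per-chunk embeddings into one L2-normalized document vector."""
    chunks = split_into_chunks(text)
    if not chunks:
        raise ValueError("text is empty")
    chunk_vecs = [embed(c, dim) for c in chunks]
    mean = [sum(col) / len(chunk_vecs) for col in zip(*chunk_vecs)]
    norm = math.sqrt(sum(x * x for x in mean)) or 1.0
    return [x / norm for x in mean]
```

Mean pooling is only one choice; max pooling or a weighted average (for example, weighting chunks by length) would slot into step 3 the same way.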

Is that correct? If so, neural search doesn't support this feature yet. Why generate a pooled embedding rather than saving an embedding for each chunk?

asfoorial · August 1, 2023, 8:22am

Yes, that's correct. Sometimes you don't want to store embeddings for all chunks, to save storage space; you only want the relevant text blocks captured in one embedding. Also, neural search currently does not support nested documents. In addition, storing each chunk separately means repeating the document's metadata (for a user manual, for instance) for every chunk, which is an unnecessary waste of storage.
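The storage argument can be made concrete with a rough back-of-the-envelope estimate. This sketch assumes float32 vectors (4 bytes per dimension) and a fixed metadata size per stored document, and it ignores the chunk text itself (stored either way), index structures, and compression. The 768-dimension, 2,000-byte, 40-chunk figures in the usage note are illustrative assumptions, not numbers from this thread.

```python
def per_chunk_bytes(num_chunks: int, dim: int, metadata_bytes: int, bytes_per_float: int = 4) -> int:
    """One stored document per chunk: each carries its own vector plus a copy of the parent metadata."""
    return num_chunks * (dim * bytes_per_float + metadata_bytes)


def pooled_bytes(dim: int, metadata_bytes: int, bytes_per_float: int = 4) -> int:
    """One stored document per source document: a single pooled vector and one copy of the metadata."""
    return dim * bytes_per_float + metadata_bytes
```

For example, `per_chunk_bytes(40, 768, 2000)` gives 202,880 bytes versus 5,072 bytes for `pooled_bytes(768, 2000)`, a 40x difference that grows linearly with the number of chunks.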

ylwu · August 1, 2023, 4:57pm

Thanks for the explanation. Can you open a GitHub issue for this feature request in the ml-commons repo (Issues · opensearch-project/ml-commons · GitHub)?

system · September 30, 2023, 4:58pm

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.

Related topics

Topic	Category	Replies	Views	Activity
How to get score per chunk so that i can retrieve the most relevant chunk from the document?	OpenSearch	2	225	January 16, 2025
How to do chunking of dataset before sending into index	OpenSearch configure	1	428	June 11, 2024
ML Tools support for nested fields	Machine Learning	1	200	July 6, 2024
Provided Text Chunking Example fails with Neural Sparse!	OpenSearch	0	40	May 9, 2025
[Feedback] Neural Search plugin - experimental release	General Feedback releases	42	3652	July 18, 2023