Describe the issue:
Does the Neural Search Plugin's ingestion pipeline support chunking for large text? If I have, say, a complete page or even the text of a full document, would it generate a pooled embedding for each chunk in the document? Something like the pipeline sketched below is what I have in mind.
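A rough sketch, assuming the `text_chunking` ingest processor available in recent OpenSearch releases, chained with `text_embedding`; the field names and `<model_id>` are placeholders, not part of the original question:

```json
PUT _ingest/pipeline/chunking-embedding-pipeline
{
  "description": "Chunk large text, then embed each chunk",
  "processors": [
    {
      "text_chunking": {
        "algorithm": {
          "fixed_token_length": {
            "token_limit": 384,
            "overlap_rate": 0.2,
            "tokenizer": "standard"
          }
        },
        "field_map": {
          "body": "body_chunks"
        }
      }
    },
    {
      "text_embedding": {
        "model_id": "<model_id>",
        "field_map": {
          "body_chunks": "body_chunks_embedding"
        }
      }
    }
  ]
}
```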
Yes, what you said is correct. Sometimes you don't want to store embeddings for all chunks, in order to save storage space; you only want the relevant text blocks in one embedding. Also, neural search currently does not support nested documents. In addition, storing each chunk as its own document means I have to repeat the document metadata (for a user manual, for instance) with every chunk, which is an unnecessary waste of storage.
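To illustrate the nested-document point, here is a sketch of the kind of mapping that would keep the manual's metadata in a single parent document with one embedding per chunk, avoiding the duplication described above. This assumes nested `knn_vector` fields are usable for your engine and that neural search could query them, which is exactly the gap being raised here; the index name, field names, and dimension are placeholders:

```json
PUT /user-manuals
{
  "settings": {
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "title":   { "type": "keyword" },
      "product": { "type": "keyword" },
      "chunks": {
        "type": "nested",
        "properties": {
          "text": { "type": "text" },
          "embedding": {
            "type": "knn_vector",
            "dimension": 768
          }
        }
      }
    }
  }
}
```

With a mapping like this, the title and product fields would be stored once per manual rather than once per chunk.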