Multiple ingest pipelines for an index

I am working on a scenario that uses multiple search types on a single index. In the same index, I would like to perform neural sparse, neural dense (kNN), and keyword (match) search using pipelines.

Indexing: This requires configuring a separate ingest pipeline for neural sparse and for neural dense search, because the respective features (a kNN vector for neural dense and rank_features for neural sparse) are produced by different remote models in my case.

Searching: On the search side, I am trying to use the normalization-processor (score normalization and combination) with a hybrid query to combine the scores coming from any two or all three search types.

Questions:

  1. How can I use multiple ingest pipelines for the same index, one per search type? Or should I have a separate index and pipeline each for neural sparse and neural dense search? (Having separate indices causes a 100% increase in storage.)
  2. Can I combine more than two queries in a hybrid query? (Please point me to any samples, if available.)

Thanks.

Hi @Praveen
For ingestion, you can use a single pipeline and define multiple ingest processors in it, one for each model. Please check the example below:

PUT /_ingest/pipeline/nlp-ingest-pipeline-more-than-1-processor
{
  "description": "A pipeline with more than 1 processor",
  "processors": [
    {
      "sparse_encoding": {
        "model_id": "SPARSE_MODEL_ID",
        "field_map": {
          "passage_text": "passage_sparse"
        }
      }
    },
    {
      "text_embedding": {
        "model_id": "DENSE_VECTOR_EMBEDDING_MODEL",
        "field_map": {
          "passage_text": "passage_dense_vector"
        }
      }
    }
  ]
}

PUT /my-nlp-index
{
  "settings": {
    "index.knn": true,
    "default_pipeline": "nlp-ingest-pipeline-more-than-1-processor"
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "text"
      },
      "passage_sparse": {
        "type": "rank_features"
      },
      "passage_text": {
        "type": "text"
      },
      "passage_dense_vector": {
        "type": "knn_vector",
        "dimension": 768,
        "method": {
          "engine": "lucene",
          "space_type": "l2",
          "name": "hnsw",
          "parameters": {}
        }
      }
    }
  }
}
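
As a quick check of the setup above, the following sketch indexes one document and reads it back. The document ID and field values are made up for illustration. Because the index sets default_pipeline, both processors should run on the same passage_text value, so the stored document is expected to contain passage_sparse and passage_dense_vector alongside the original text. This assumes both models are deployed and that the dense model outputs vectors matching the dimension in the mapping.

```json
PUT /my-nlp-index/_doc/1
{
  "id": "doc-1",
  "passage_text": "Hello world"
}

GET /my-nlp-index/_doc/1
```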

For hybrid search, you can use more than two queries. Just make sure that the queries defined in the hybrid query are not identical to each other.
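
A minimal sketch of a hybrid query with three sub-queries, assuming the index and model placeholders from the example above. The search pipeline name (nlp-search-pipeline), the query text, the k value, and the weights are illustrative assumptions, not values from this thread. Note that normalization and combination are both configured inside the single normalization-processor, and the weights are applied in the same order as the sub-queries (keyword, dense, sparse here).

```json
PUT /_search/pipeline/nlp-search-pipeline
{
  "description": "Normalizes and combines scores for hybrid search",
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": {
          "technique": "min_max"
        },
        "combination": {
          "technique": "arithmetic_mean",
          "parameters": {
            "weights": [0.5, 0.25, 0.25]
          }
        }
      }
    }
  ]
}

GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline
{
  "_source": {
    "excludes": ["passage_dense_vector", "passage_sparse"]
  },
  "query": {
    "hybrid": {
      "queries": [
        {
          "match": {
            "passage_text": {
              "query": "wild west"
            }
          }
        },
        {
          "neural": {
            "passage_dense_vector": {
              "query_text": "wild west",
              "model_id": "DENSE_VECTOR_EMBEDDING_MODEL",
              "k": 5
            }
          }
        },
        {
          "neural_sparse": {
            "passage_sparse": {
              "query_text": "wild west",
              "model_id": "SPARSE_MODEL_ID"
            }
          }
        }
      ]
    }
  }
}
```

To combine only two of the three search types, remove the unwanted sub-query and the corresponding entry from weights so that both lists stay the same length.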

I hope this answers your question.
