Implementing RAG workflow without semantic search

I’m trying to figure out the best way to implement a RAG-based workflow in our current environment.

For some of our data we’ve already implemented semantic search, and the RAG-based approach works pretty well. However, that dataset is fairly small and changes infrequently, so the cost of creating the vector embeddings required for neural search was acceptable.

However, we have a much larger set of data that we’d like to incorporate into our RAG workflow. This data grows much more rapidly and changes frequently. The dataset is also orders of magnitude larger in scale, and based on my testing, the inference cost of creating the vector embeddings is going to be problematic (especially if we want to re-embed the corpus as the models improve).

Are there any “tricks” that could be used to get semantic-like search results without actually running something like a neural_sparse query over vector embeddings?

For example, if I run the search query through the /_plugins/_ml/_predict/sparse_encoding/ endpoint, it returns the expanded tokens with their weights, which essentially gives me alternative keywords:

POST /_plugins/_ml/_predict/sparse_encoding/my-neural-sparse-encoding-doc-model?pretty

{"text_docs": ["printer not printing"]}

Returns something like:

{
  "inference_results" : [
    {
      "output" : [
        {
          "name" : "output",
          "dataAsMap" : {
            "response" : [
              {
                "plane" : 0.13548197,
                "refusing" : 0.2285405,
                "ps" : 0.08148531,
                "staple" : 0.017857138,
                "unavailable" : 0.39448622,
                "hp" : 0.458073,
                "none" : 0.026089909,
                "without" : 0.4188605,
                "canon" : 0.1624532,
                "compatible" : 0.06958102,
                "problem" : 0.054268926,
                "stuck" : 0.008346202,
                "##writer" : 0.010040492,
                "press" : 0.05793321,
                "nobody" : 0.26224786,
                "inability" : 0.053707708,
                "still" : 0.5137019,
                "issue" : 0.097482756,
                "cartridges" : 0.26044956,
                "printer" : 1.2386335,
                "eps" : 0.36194655,
                "printed" : 0.70225865,
                "fail" : 0.13925655,
                "cartridge" : 0.35146973,
                "stop" : 0.4058991,
                "stopping" : 0.24806167,
                "presses" : 0.0027390127,
                "invalid" : 0.11070499,
                "stops" : 0.14465772,
                "brother" : 0.042845514,
                "##fu" : 0.024613153,
                "carriage" : 0.10397118,
                "nope" : 0.12446236,
                "##pi" : 0.06875015,
                "stamp" : 0.018560074,
                "cycle" : 0.11187978,
                "out" : 0.11581144,
                "computer" : 0.06991433,
                "pressing" : 0.031078983,
                "refuse" : 0.17876413,
                "mis" : 0.14851154,
                "seldom" : 0.013753176,
                "stopped" : 0.38881075,
                "refused" : 0.40206233,
                "reprint" : 0.08413739,
                "missed" : 0.017528972,
                "refuses" : 0.23228438,
                "produce" : 0.02841626,
                "off" : 0.5107239,
                "printing" : 0.94587445,
                "wrong" : 0.17873272,
                "p" : 0.09269763,
                "##ps" : 0.07366625,
                "t" : 0.5519738,
                "machine" : 0.17027827,
                "barely" : 0.064711004,
                "cannot" : 0.6810034,
                "quit" : 0.13328628,
                "mail" : 0.0033408564,
                "cop" : 0.22482851,
                "unable" : 0.5299461,
                "copies" : 0.096331626,
                "publication" : 0.024400862,
                "un" : 0.17762037,
                "always" : 0.059352845,
                "lack" : 0.04366269,
                "##ider" : 0.122690976,
                "never" : 0.6442409,
                "failure" : 0.13500433,
                "publisher" : 0.057925675,
                "page" : 0.049830217,
                "rarely" : 0.13114671,
                "fa" : 0.0628478,
                "hardly" : 0.063973024,
                "crop" : 0.17125338,
                "problems" : 0.009528161,
                "no" : 0.77709275,
                "production" : 0.14042057,
                "unclear" : 0.05732778,
                "##jet" : 0.49433687,
                "non" : 0.56473887,
                "error" : 0.1907664,
                "papers" : 0.18240266,
                "accidentally" : 0.11125144,
                "##lim" : 0.029441293,
                "miss" : 0.037266716,
                "not" : 1.2390096,
                "nowhere" : 0.04574121,
                "paper" : 0.4667224,
                "load" : 0.04023573,
                "laser" : 0.029455416,
                "engine" : 0.10904598,
                "missing" : 0.049891345,
                "trouble" : 0.035828467,
                "photographer" : 0.016123502,
                "##rmin" : 0.015775522,
                "fails" : 0.040917967,
                "##print" : 0.23750506,
                "on" : 0.10091863,
                "printers" : 0.98327684,
                "ink" : 0.14815213,
                "reading" : 0.08239229,
                "prints" : 0.67032325,
                "postage" : 0.19471379,
                "print" : 0.899323,
                "pc" : 0.24175878,
                "reprinted" : 0.033356067,
                "martin" : 0.10163642,
                "neither" : 0.20043628,
                "writer" : 0.18240723,
                "avoid" : 0.14979815,
                "pl" : 0.17285673
              }
            ]
          }
        }
      ]
    }
  ]
}

In theory, the token keys could be used to generate a BM25 search string, so instead of searching:

printer not printing

It generates something like:

(printer OR hp OR canon OR brother) AND (issue OR trouble OR...)
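As a rough sketch of that idea (the threshold, sub-token filter, and term cap below are arbitrary choices of mine, not anything the sparse_encoding endpoint prescribes), you could keep only the highest-weight tokens from the response and join them into a BM25 query string:

```python
def expand_query(token_weights, min_weight=0.4, max_terms=10):
    """Turn a sparse-encoding token->weight map into a BM25 OR-query string.

    Keeps only tokens at or above min_weight, drops WordPiece sub-tokens
    (those starting with '##'), and caps the number of terms, ordered by
    descending weight.
    """
    terms = sorted(
        ((t, w) for t, w in token_weights.items()
         if w >= min_weight and not t.startswith("##")),
        key=lambda tw: tw[1],
        reverse=True,
    )[:max_terms]
    return " OR ".join(t for t, _ in terms)


# A few of the token weights from the response above:
weights = {
    "printer": 1.2386335, "not": 1.2390096, "printers": 0.98327684,
    "printing": 0.94587445, "print": 0.899323, "no": 0.77709275,
    "cannot": 0.6810034, "##jet": 0.49433687, "hp": 0.458073,
}
print(expand_query(weights))
# -> not OR printer OR printers OR printing OR print OR no OR cannot OR hp
```

The AND-of-OR-groups form from the example above would need some way of clustering tokens back to the original query terms, which the flat token/weight map doesn’t give you directly; a flat OR with weight-based pruning is the simpler starting point.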

It’s obviously not the same, but I’m wondering if there would be a way to get a search that’s “semantic” enough without having to use vector embeddings.

Is this just something not even worth pursuing?

Is anyone else using RAG-based workflows whose search results don’t come from semantic search over the corpus?

Thanks!
-Dan

You can certainly do RAG with BM25 (no vector embeddings), but you would have to construct the search query in the format that OpenSearch expects, not a natural language query. If you want your users to type natural language questions, then you would need a translation layer that converts that to OpenSearch queries. A search request processor (query rewriting) might be one way to accomplish that.
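One way to sketch that translation layer (assuming you call the sparse_encoding predict endpoint separately and feed its token/weight map in; the `body` field name and the 0.3 cutoff are made-up placeholders): turn the expanded tokens into a bool query with per-clause boosts, so the model’s weights still influence the BM25 ranking instead of every expanded term counting equally.

```python
def build_bool_query(token_weights, field="body", min_weight=0.3):
    """Build an OpenSearch bool/should query from sparse-encoding tokens.

    Each surviving token becomes a match clause boosted by its model
    weight, approximating the weighted term expansion a neural_sparse
    query performs, but executed as a plain BM25 query.
    """
    should = [
        {"match": {field: {"query": token, "boost": round(weight, 3)}}}
        for token, weight in token_weights.items()
        if weight >= min_weight and not token.startswith("##")
    ]
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}
```

The resulting dict can be sent as the body of a normal `_search` request; a search pipeline with a request processor could do the same rewrite server-side so clients keep sending natural-language text.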