Implementing RAG workflow without semantic search

I’m trying to figure out the best way to implement a RAG-based workflow in our current environment.

For some of our data we’ve already implemented semantic search, and the RAG-based approach works pretty well. However, that dataset is fairly small and changes infrequently, so the cost of creating the vector embeddings required for neural search was acceptable.

However, we have a much larger set of data that we’d like to incorporate into our RAG workflow. This data grows much more rapidly and changes frequently. The dataset is also orders of magnitude larger in scale, and based on my testing, the inference cost of creating the vector embeddings is going to be problematic (especially if we want to re-embed the corpus as the models improve).

Are there any “tricks” that could be used to get semantic-like search results without actually running something like a neural_sparse query over vector embeddings?

For example, if I run the search query through the /_plugins/_ml/_predict/sparse_encoding/ endpoint, it returns the expanded tokens with their weights, which essentially gives me alternative keywords:

POST /_plugins/_ml/_predict/sparse_encoding/my-neural-sparse-encoding-doc-model?pretty

{"text_docs": ["printer not printing"]}

Returns something like:

{
  "inference_results" : [
    {
      "output" : [
        {
          "name" : "output",
          "dataAsMap" : {
            "response" : [
              {
                "plane" : 0.13548197,
                "refusing" : 0.2285405,
                "ps" : 0.08148531,
                "staple" : 0.017857138,
                "unavailable" : 0.39448622,
                "hp" : 0.458073,
                "none" : 0.026089909,
                "without" : 0.4188605,
                "canon" : 0.1624532,
                "compatible" : 0.06958102,
                "problem" : 0.054268926,
                "stuck" : 0.008346202,
                "##writer" : 0.010040492,
                "press" : 0.05793321,
                "nobody" : 0.26224786,
                "inability" : 0.053707708,
                "still" : 0.5137019,
                "issue" : 0.097482756,
                "cartridges" : 0.26044956,
                "printer" : 1.2386335,
                "eps" : 0.36194655,
                "printed" : 0.70225865,
                "fail" : 0.13925655,
                "cartridge" : 0.35146973,
                "stop" : 0.4058991,
                "stopping" : 0.24806167,
                "presses" : 0.0027390127,
                "invalid" : 0.11070499,
                "stops" : 0.14465772,
                "brother" : 0.042845514,
                "##fu" : 0.024613153,
                "carriage" : 0.10397118,
                "nope" : 0.12446236,
                "##pi" : 0.06875015,
                "stamp" : 0.018560074,
                "cycle" : 0.11187978,
                "out" : 0.11581144,
                "computer" : 0.06991433,
                "pressing" : 0.031078983,
                "refuse" : 0.17876413,
                "mis" : 0.14851154,
                "seldom" : 0.013753176,
                "stopped" : 0.38881075,
                "refused" : 0.40206233,
                "reprint" : 0.08413739,
                "missed" : 0.017528972,
                "refuses" : 0.23228438,
                "produce" : 0.02841626,
                "off" : 0.5107239,
                "printing" : 0.94587445,
                "wrong" : 0.17873272,
                "p" : 0.09269763,
                "##ps" : 0.07366625,
                "t" : 0.5519738,
                "machine" : 0.17027827,
                "barely" : 0.064711004,
                "cannot" : 0.6810034,
                "quit" : 0.13328628,
                "mail" : 0.0033408564,
                "cop" : 0.22482851,
                "unable" : 0.5299461,
                "copies" : 0.096331626,
                "publication" : 0.024400862,
                "un" : 0.17762037,
                "always" : 0.059352845,
                "lack" : 0.04366269,
                "##ider" : 0.122690976,
                "never" : 0.6442409,
                "failure" : 0.13500433,
                "publisher" : 0.057925675,
                "page" : 0.049830217,
                "rarely" : 0.13114671,
                "fa" : 0.0628478,
                "hardly" : 0.063973024,
                "crop" : 0.17125338,
                "problems" : 0.009528161,
                "no" : 0.77709275,
                "production" : 0.14042057,
                "unclear" : 0.05732778,
                "##jet" : 0.49433687,
                "non" : 0.56473887,
                "error" : 0.1907664,
                "papers" : 0.18240266,
                "accidentally" : 0.11125144,
                "##lim" : 0.029441293,
                "miss" : 0.037266716,
                "not" : 1.2390096,
                "nowhere" : 0.04574121,
                "paper" : 0.4667224,
                "load" : 0.04023573,
                "laser" : 0.029455416,
                "engine" : 0.10904598,
                "missing" : 0.049891345,
                "trouble" : 0.035828467,
                "photographer" : 0.016123502,
                "##rmin" : 0.015775522,
                "fails" : 0.040917967,
                "##print" : 0.23750506,
                "on" : 0.10091863,
                "printers" : 0.98327684,
                "ink" : 0.14815213,
                "reading" : 0.08239229,
                "prints" : 0.67032325,
                "postage" : 0.19471379,
                "print" : 0.899323,
                "pc" : 0.24175878,
                "reprinted" : 0.033356067,
                "martin" : 0.10163642,
                "neither" : 0.20043628,
                "writer" : 0.18240723,
                "avoid" : 0.14979815,
                "pl" : 0.17285673
              }
            ]
          }
        }
      ]
    }
  ]
}

In theory, the token keys could be used to generate a BM25 search string, so instead of searching:

printer not printing

It generates something like:

(printer OR hp OR canon OR brother) AND (issue OR trouble OR...)
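As a rough sketch of that idea (the threshold, sub-token filter, and term cap below are arbitrary choices of mine, not anything the sparse_encoding endpoint prescribes), you could keep only the highest-weight tokens from the response and join them into a BM25 query string:

```python
def expand_query(token_weights, min_weight=0.4, max_terms=10):
    """Turn a sparse-encoding token->weight map into a BM25 OR-query string.

    Keeps only tokens at or above min_weight, drops WordPiece sub-tokens
    (those starting with '##'), and caps the number of terms, ordered by
    descending weight.
    """
    terms = sorted(
        ((t, w) for t, w in token_weights.items()
         if w >= min_weight and not t.startswith("##")),
        key=lambda tw: tw[1],
        reverse=True,
    )[:max_terms]
    return " OR ".join(t for t, _ in terms)


# A few of the token weights from the response above:
weights = {
    "printer": 1.2386335, "not": 1.2390096, "printers": 0.98327684,
    "printing": 0.94587445, "print": 0.899323, "no": 0.77709275,
    "cannot": 0.6810034, "##jet": 0.49433687, "hp": 0.458073,
}
print(expand_query(weights))
# -> not OR printer OR printers OR printing OR print OR no OR cannot OR hp
```

The AND-of-OR-groups form from the example above would need some way of clustering tokens back to the original query terms, which the flat token/weight map doesn’t give you directly; a flat OR with weight-based pruning is the simpler starting point.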

It’s obviously not the same, but I’m wondering if there would be a way to get a search that’s “semantic” enough without having to use vector embeddings.

Is this just something not even worth pursuing?

Is anyone else using RAG-based workflows whose search results don’t come from semantic search over the corpus?

Thanks!
-Dan

You can certainly do RAG with BM25 (no vector embeddings), but you would have to construct the search query in the format that OpenSearch expects, not a natural language query. If you want your users to type natural language questions, then you would need a translation layer that converts that to OpenSearch queries. A search request processor (query rewriting) might be one way to accomplish that.
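One way to sketch that translation layer (assuming you call the sparse_encoding predict endpoint separately and feed its token/weight map in; the `body` field name and the 0.3 cutoff are made-up placeholders): turn the expanded tokens into a bool query with per-clause boosts, so the model’s weights still influence the BM25 ranking instead of every expanded term counting equally.

```python
def build_bool_query(token_weights, field="body", min_weight=0.3):
    """Build an OpenSearch bool/should query from sparse-encoding tokens.

    Each surviving token becomes a match clause boosted by its model
    weight, approximating the weighted term expansion a neural_sparse
    query performs, but executed as a plain BM25 query.
    """
    should = [
        {"match": {field: {"query": token, "boost": round(weight, 3)}}}
        for token, weight in token_weights.items()
        if weight >= min_weight and not token.startswith("##")
    ]
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}
```

The resulting dict can be sent as the body of a normal `_search` request; a search pipeline with a request processor could do the same rewrite server-side so clients keep sending natural-language text.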