[Feedback] Conversational Search and Retrieval Augmented Generation Using Search Pipeline - Experimental Release

In OpenSearch 2.10, we launched two new features that bring GenAI capabilities to OpenSearch. The first is Memory, a building block for search applications and agents to store and retrieve conversational history. The second is a new search processor for RAG (Retrieval-Augmented Generation), which combines search results, a large language model, and conversational memory to answer users’ questions. RAG in OpenSearch relies on the remote inference framework and the connector feature. When you put all of these pieces together to have conversations over your data, we also recommend trying Hybrid Search, combining both BM25 and k-NN, to get the most out of OpenSearch.
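
Both features ship as experimental in 2.10 and are gated behind cluster settings. A minimal sketch of turning them on, assuming the 2.10 setting names from the ml-commons documentation:

```
PUT /_cluster/settings
{
  "persistent": {
    "plugins.ml_commons.memory_feature_enabled": true,
    "plugins.ml_commons.rag_pipeline_feature_enabled": true
  }
}
```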

We are looking forward to the community’s feedback on these features. We are excited to make them available in the 2.10 release and have people try out conversational search. We think this new mode of interacting with data enables users to get better search results. Please try it out and help us make it even better.

For a more detailed discussion on this, you can check out our RFC - https://github.com/opensearch-project/ml-commons/issues/1150.

How can I build a RAG pipeline with a model other than OpenAI, Cohere, or SageMaker?
Can I use a Hugging Face transformer or BERT model for predicting sentences, without having a Hugging Face key?

I am using version 2.10.

If yes, how do I build an HTTP connector for it? Or how can I load the model?

I tried the following:

POST /_plugins/_ml/models/_upload
{
  "name": "huggingface/TheBloke/vicuna-13B-1.1-GPTQ",
  "version": "1.0.1",
  "model_format": "TORCH_SCRIPT"
}

But after uploading the model, I get the error below. Can I load any other LLM for RAG?

    {
        "error": {
            "root_cause": [
                {
                    "type": "m_l_exception",
                    "reason": "plugins.ml_commons.rag_pipeline_feature_enabled is not enabled."
                }
            ],
            "type": "m_l_exception",
            "reason": "plugins.ml_commons.rag_pipeline_feature_enabled is not enabled."
        },
        "status": 500
    }

I have already enabled RAG, but I still get the above error:

PUT /_cluster/settings
{
  "persistent": {
    "plugins.ml_commons.rag_pipeline_feature_enabled": "true"
  }
}

This is fixed by adding a trusted endpoint.
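
For reference, a sketch of adding an endpoint to the trusted list via cluster settings; the regex value here is illustrative, so adjust it to your endpoint:

```
PUT /_cluster/settings
{
  "persistent": {
    "plugins.ml_commons.trusted_connector_endpoints_regex": [
      "^https://api\\.openai\\.com/.*$"
    ]
  }
}
```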

Some examples in the conversational search documentation are wrong; please correct them.

  1. There is an extra space in the OpenAI connector’s request_body, in the temperature parameter.

  2. The model group creation endpoint is wrong; it should be:

POST /_plugins/_ml/model_groups/_register
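
For completeness, a minimal request body for that endpoint; the group name and description here are just placeholders:

```
POST /_plugins/_ml/model_groups/_register
{
  "name": "my_rag_model_group",
  "description": "Model group for conversational search models"
}
```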


It would be nice to have a RAG processor that can use the highlight results as well as the _source.


@sribalajivelan , these two are fixed now.

I’m attempting to use the Anthropic Claude via Bedrock connector with Conversational Search.
When defining the blueprint, the connector specifies that the prompt is populated with ${parameters.inputs}. (ml-commons/docs/remote_inference_blueprints/bedrock_connector_anthropic_claude_blueprint.md at 2.x · opensearch-project/ml-commons · GitHub)

I have defined my pipeline:

PUT /_search/pipeline/rag_pipeline2
{
  "response_processors": [
    {
      "retrieval_augmented_generation": {
        "tag": "claude_rag",
        "description": "Demo pipeline Using Claude Connector",
        "model_id": "eFbAQo4BZvVG6ToieVK8",
        "context_field_list": ["text"],
        "llm_model": "anthropic.claude-v2",
        "system_prompt": "You are a helpful assistant",
        "user_instructions": "Generate a concise and informative answer in less than 100 words for the given question"
      }
    }
  ]
}

When using the RAG pipeline with a simple search:

GET /tweet-index/_search?search_pipeline=rag_pipeline2
{
  "query": {
    "match": {
      "text": "simple"
    }
  },
  "ext": {
    "generative_qa_parameters": {
      "llm_question": "Was Abraham Lincoln a good politician"
    }
  }
}

I get the error:

    "type": "illegal_argument_exception",
    "reason": "Some parameter placeholder not filled in payload: inputs"

I cannot find how to instruct the search pipeline to populate the ‘inputs’ field.
I have also tried changing the Bedrock connector to use ${parameters.messages}, as the OpenAI connector does, but this then returns:

    "type": "illegal_argument_exception",
    "reason": "Invalid JSON in payload"

How do you customise a search / search pipeline to format the prompt and match the input params of the connector?

I tried it with my own internal server that offers an OpenAI-compatible HTTP interface to a model. I added it to the trusted connector endpoints, so the server URL is accepted now. Nevertheless, I now get an error saying that it is a private IP address:
[ERROR][o.o.m.e.h.MLHttpClientFactory] [port-4106] Remote inference host name has private ip address: serv-3329
[ERROR][o.o.m.e.a.r.HttpJsonConnectorExecutor] [port-4106] Fail to execute http connector
java.lang.IllegalArgumentException: serv-3329

How can I connect to my model server now?


I had (almost) no problems using this from the OpenSearch Dashboards web UI / Dev Tools. But now I am trying it via the JavaScript SDK and running into a problem once I send a query:

GenerativeQAResponseProcessor failed in precessing response

In my server logs I see:

java.lang.IllegalArgumentException: Invalid payload: "{ "model": "gpt-3.5-turbo", "messages": [{"role":"system","content":"You are a helpful assistant"},{"role":"user","content":"Generate a concise and informative answer in less than 100 words for the given question"},{"role":"user","content":"QUESTION: Tell me about 10Fold"},{"role":"user","content":"ANSWER:"}], "temperature": 0 }"

I see I have something wrong in my setup; I just can’t figure out what.

Are you saying this happens with a connector that works when you use it from OpenSearch Dashboards?

Can you comment on this issue - [BUG] externally hosted model can not have a private ip address · Issue #2142 · opensearch-project/ml-commons · GitHub

For now, you can specify “bedrock” as a model prefix when creating a search pipeline.

You won’t have to do that starting with OpenSearch 2.13.
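
A sketch of that suggestion applied to the pipeline definition shown earlier; the slash-separated prefix is my reading of the ml-commons convention, and the model_id is carried over from that example:

```
PUT /_search/pipeline/rag_pipeline2
{
  "response_processors": [
    {
      "retrieval_augmented_generation": {
        "tag": "claude_rag",
        "description": "Demo pipeline Using Claude Connector",
        "model_id": "eFbAQo4BZvVG6ToieVK8",
        "context_field_list": ["text"],
        "llm_model": "bedrock/anthropic.claude-v2",
        "system_prompt": "You are a helpful assistant",
        "user_instructions": "Generate a concise and informative answer in less than 100 words for the given question"
      }
    }
  ]
}
```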

Sorry for the confusion; no, I am recreating the same setup as the tutorial. Here is my code:

export async function recreateIndex() {
  console.log("--- Recreating index ---");
  const client = getClient();

  try {
    const response1 = await client.indices.delete({ index });
    console.log("response1 :>>", JSON.stringify(response1.body, null, 2));
  } catch (error) {
    // Ignore if index does not exist
  }

  const response2 = await client.http.post({
    path: "/_plugins/_ml/connectors/_create",
    body: {
      name: "OpenAI Chat Connector",
      description: "The connector to public OpenAI model service for GPT 3.5",
      version: 2,
      protocol: "http",
      parameters: {
        endpoint: "api.openai.com",
        model: "gpt-3.5-turbo",
        temperature: 0,
      },
      credential: {
        openAI_key: "<your OpenAI API key>",
      },
      actions: [
        {
          action_type: "predict",
          method: "POST",
          url: "https://${parameters.endpoint}/v1/chat/completions",
          headers: {
            Authorization: "Bearer ${credential.openAI_key}",
          },
          request_body:
            '"{ "model": "${parameters.model}", "messages": ${parameters.messages}, "temperature": ${parameters.temperature} }"',
        },
      ],
    },
  });
  const { connector_id } = response2.body;
  console.log("connector_id :>> ", connector_id);

  const response3 = await client.http.post({
    path: "/_plugins/_ml/models/_register",
    body: {
      name: "openAI-gpt-3.5-turbo",
      function_name: "remote",
      description: "test model",
      connector_id,
    },
  });
  const { model_id } = response3.body;
  console.log("model_id :>> ", model_id);

  const response4 = await client.http.post({
    path: `/_plugins/_ml/models/${model_id}/_deploy`,
  });
  console.log("response4 :>> ", response4.body);

  const response5 = await client.http.put({
    path: "/_search/pipeline/rag_pipeline",
    body: {
      response_processors: [
        {
          retrieval_augmented_generation: {
            tag: "openai_pipeline_demo",
            description: "Demo pipeline Using OpenAI Connector",
            model_id,
            context_field_list: ["textContent"],
            system_prompt: "You are a helpful assistant",
            user_instructions:
              "Generate a concise and informative answer in less than 100 words for the given question",
          },
        },
      ],
    },
  });
  console.log("response5 :>>", JSON.stringify(response5.body, null, 2));

  const response6 = await client.http.put({
    path: `/${index}`,
    body: {
      settings: {
        "index.search.default_pipeline": "rag_pipeline",
      },
      mappings: {
        properties: {
          textContent: {
            type: "text",
          },
        },
      },
    },
  });
  console.log("response6 :>>", JSON.stringify(response6.body, null, 2));
}
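
Looking at the “Invalid payload” log message quoted earlier, one likely culprit is the extra pair of quotes wrapped around the request_body template string in the connector definition: after parameter substitution, the payload must itself parse as JSON, and stray outer quotes break that. A minimal, self-contained illustration (plain Node.js, no OpenSearch SDK; the strings are simplified stand-ins for the substituted payload):

```javascript
// The substituted payload with stray outer quotes, as in the error log:
const broken = '"{ "model": "gpt-3.5-turbo", "temperature": 0 }"';
// The same payload without them:
const fixed = '{ "model": "gpt-3.5-turbo", "temperature": 0 }';

// The remote inference executor rejects payloads that are not valid JSON.
function isValidJson(payload) {
  try {
    JSON.parse(payload);
    return true;
  } catch {
    return false;
  }
}

console.log(isValidJson(broken)); // false: the leading quote makes it malformed JSON
console.log(isValidJson(fixed));  // true
```

So dropping the outer `"` pair from the request_body template in the connector definition should make the payload valid again.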