Unable to connect to the remote service via RAG connector

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser): 2.19

Describe the issue: I am trying to configure a RAG agent. As a prerequisite, I am creating a connector to an externally hosted LLM endpoint for chat completion, based on Llama.

However, the connector is unable to establish a connection within 10 seconds and returns a 500 status error when I do a test run with the Predict API.

However, the same endpoint works fine when tested independently in Postman, so the LLM endpoint itself is not the issue.
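For reference, an equivalent standalone test of the endpoint outside Postman would be a plain `curl` call mirroring the connector's action (host, path, and token below are placeholders matching the connector config, not verified values):

```shell
# Direct test of the chat-completions endpoint, bypassing OpenSearch.
# Replace the host, path, and token with your actual values.
curl -sS -X POST "https://api-int-hawkeye-dev.carbon.com/llama70b/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <api token>" \
  -d '{
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.7,
    "max_tokens": 500,
    "stream": false
  }'
```

If this succeeds from the OpenSearch host itself (not just your workstation), the problem is likely on the cluster side rather than the network path.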

I have tried different `client_config` parameters (increasing timeouts, connections, etc.), but still no luck.

Configuration:

PUT /_plugins/_ml/connectors/mUXmoJkBoHKSTJSf8rbd
{
  "name": "Llama-3.3-70B-Instruct Connector",
  "description": "Connector for Llama-3.3-70B-Instruct",
  "protocol": "http",
  "version": 1,
  "parameters": {
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "temperature": 0.7,
    "max_tokens": 500,
    "endpoint": "api-int-hawkeye-dev.carbon.com"
  },
  "credential": {
    "api_key": "<api token>"
  },
  "client_config" : {
    "read_timeout": 60000,
    "connection_timeout": 30000,
    "max_connection": 256,
    "max_retry_times": 3,
    "retry_backoff_policy": "exponential_full_jitter"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://${parameters.endpoint}/llama70b/v1/chat/completions",
      "headers": {
        "Content-Type": "application/json",
        "Authorization": "Bearer ${credential.api_key}"
      },
      "request_body": "{ \"model\": \"${parameters.model}\",  \"messages\": ${parameters.messages}, \"temperature\": ${parameters.temperature}, \"max_tokens\": ${parameters.max_tokens}, \"stream\": false }"
    }
  ]
}
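For clarity on what the connector sends: the `${parameters.*}` and `${credential.*}` placeholders in `request_body` are substituted at predict time. A minimal sketch of that substitution (a hypothetical helper for illustration, not the actual ml-commons code):

```python
import json

# The request_body template from the connector above.
request_body = (
    '{ "model": "${parameters.model}", "messages": ${parameters.messages}, '
    '"temperature": ${parameters.temperature}, "max_tokens": ${parameters.max_tokens}, '
    '"stream": false }'
)

def render(template: str, parameters: dict, credential: dict) -> str:
    """Replace ${parameters.k} and ${credential.k} placeholders with values."""
    for key, value in parameters.items():
        # JSON-encode non-string values so the result stays valid JSON;
        # strings are inserted raw because the template already quotes them.
        rendered = value if isinstance(value, str) else json.dumps(value)
        template = template.replace("${parameters.%s}" % key, str(rendered))
    for key, value in credential.items():
        template = template.replace("${credential.%s}" % key, str(value))
    return template

params = {
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "temperature": 0.7,
    "max_tokens": 500,
    "messages": [{"role": "user", "content": "Hello!"}],
}
body = render(request_body, params, {})
payload = json.loads(body)  # must parse as valid JSON after substitution
```

Note that `messages` is substituted unquoted, so it must be supplied as a JSON array in the predict call, which is why the template reads `${parameters.messages}` without surrounding quotes.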

Relevant Logs or Screenshots:

Any help here would be appreciated. Thanks!


@Nagpraveen Are you referring to the self-hosted OpenSearch RAG?

Could you elaborate a bit more on your design?

@pablo: I am referring to building connectors to externally hosted models.

I am trying a POC to build a connector on my OpenSearch cluster, which is hosted on GCP,

while the remote LLM is deployed on a separate on-prem server.


@Nagpraveen I was able to create the Ollama connector and run the predict API successfully. I used this GitHub documentation.

These were my steps:

  1. Configure the cluster to run Ollama with private IP addressing
PUT /_cluster/settings
{
  "persistent": {
    "plugins": {
      "ml_commons": {
        "only_run_on_ml_node": "false",
        "trusted_connector_endpoints_regex": [
          ".*$",
          "^http://ollama\\.pablo\\.local(:[0-9]+)?/.*$"          
        ],
        "model_access_control_enabled": "true",
        "native_memory_threshold": "99",
        "allow_registering_model_via_local_file": "true",
        "connector": {
          "private_ip_enabled": "true"
        }
      }
    }
  }
}
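The `trusted_connector_endpoints_regex` patterns are matched against the full connector URL, so it's worth sanity-checking them. A quick sketch with Python's `re` (close enough to Java regex for these patterns):

```python
import re

# The two patterns from the cluster settings above. The first (".*$")
# already allows every URL, so the second is effectively redundant here;
# in a locked-down cluster you would keep only the specific pattern.
patterns = [
    r".*$",
    r"^http://ollama\.pablo\.local(:[0-9]+)?/.*$",
]

def is_trusted(url: str) -> bool:
    """Return True if any trusted-endpoint regex matches the URL."""
    return any(re.match(p, url) for p in patterns)

print(is_trusted("http://ollama.pablo.local:11434/v1/chat/completions"))  # True
```

A connector whose action URL does not match any trusted pattern is rejected, which can also surface as a failed predict call, so this setting is one of the first things to check for the original timeout issue.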
  2. Create a connector to the local Ollama instance and the local model llama3.1:8b. The API key didn't matter since I was using a local model.
POST /_plugins/_ml/connectors/_create
{
  "name": "Llama-3.3-70B-Instruct Connector",
  "description": "Connector for Llama-3.3-70B-Instruct",
  "protocol": "http",
  "version": 1,
  "parameters": {
    "model": "llama3.1:8b",
    "temperature": 0.7,
    "max_tokens": 500,
    "endpoint": "192.168.1.7:11434"
  },
  "credential": {
    "api_key": "123456789123456789123456789"
  },
  "client_config" : {
    "read_timeout": 60000,
    "connection_timeout": 30000,
    "max_connection": 256,
    "max_retry_times": 3,
    "retry_backoff_policy": "exponential_full_jitter"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "http://${parameters.endpoint}/v1/chat/completions",
      "headers": {
        "Content-Type": "application/json",
        "Authorization": "Bearer ${credential.api_key}"
      },
      "request_body": "{ \"model\": \"${parameters.model}\",  \"messages\": ${parameters.messages}, \"temperature\": ${parameters.temperature}, \"max_tokens\": ${parameters.max_tokens}, \"stream\": false }"
    }
  ]
}
  3. Create a connector to the local Ollama instance and the cloud model gpt-oss:120b-cloud.
    This requires running `ollama signin`, registering the cloud model with `ollama run gpt-oss:120b-cloud`, and creating API and Ollama keys on ollama.com.

POST /_plugins/_ml/connectors/_create
{
  "name": "ollama-gpt-oss-cloud",
  "description": "Connector for Llama-3.3-70B-Instruct",
  "protocol": "http",
  "version": 1,
  "parameters": {
    "model": "gpt-oss:120b-cloud",
    "temperature": 0.7,
    "max_tokens": 500,
    "endpoint": "192.168.1.7:11434"
  },
  "credential": {
    "api_key": "75ea119c4cd9427bb240d..."
  },
  "client_config" : {
    "read_timeout": 60000,
    "connection_timeout": 30000,
    "max_connection": 256,
    "max_retry_times": 3,
    "retry_backoff_policy": "exponential_full_jitter"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "http://${parameters.endpoint}/v1/chat/completions",
      "headers": {
        "Content-Type": "application/json",
        "Authorization": "Bearer ${credential.api_key}"
      },
      "request_body": "{ \"model\": \"${parameters.model}\",  \"messages\": ${parameters.messages}, \"temperature\": ${parameters.temperature}, \"max_tokens\": ${parameters.max_tokens}, \"stream\": false }"
    }
  ]
}
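The cloud-model prerequisites from step 3, sketched as shell commands (assuming Ollama is installed locally; exact subcommand names may vary by Ollama version):

```shell
# Sign in so the local Ollama daemon can use ollama.com cloud models.
ollama signin

# Register the cloud model so it is served through the local Ollama API.
ollama run gpt-oss:120b-cloud
```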
  4. Register a model with the Ollama connector and the local model llama3.1:8b
POST /_plugins/_ml/models/_register
{
  "name": "ollama",
  "function_name": "remote",
  "connector_id": "0RDozpkBC2q961VMpSvy"
}
  5. Register a model with the Ollama connector and the cloud model gpt-oss:120b-cloud
POST /_plugins/_ml/models/_register
{
  "name": "ollama-cloud",
  "function_name": "remote",
  "connector_id": "1BByz5kBC2q961VMkC2H"
}

  6. Test the model with _predict
POST /_plugins/_ml/models/2hBzz5kBC2q961VMVi0C/_predict
{
  "parameters": {
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!What is your name???"
      }
    ]
  }
}
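Once `_predict` returns, the assistant's text sits a few levels deep in the response. A small helper sketch (assuming the OpenAI-style `chat.completion` payload under `dataAsMap`, as in the responses that follow) to pull it out:

```python
# Extract the assistant's reply from an ml-commons remote-model predict
# response: inference_results -> output -> dataAsMap -> choices -> message.
def extract_reply(response: dict) -> str:
    output = response["inference_results"][0]["output"][0]
    return output["dataAsMap"]["choices"][0]["message"]["content"]

# Trimmed-down sample matching the response shape.
sample = {
    "inference_results": [
        {
            "output": [
                {
                    "name": "response",
                    "dataAsMap": {
                        "choices": [
                            {"message": {"role": "assistant", "content": "Hello!"}}
                        ]
                    },
                }
            ],
            "status_code": 200,
        }
    ]
}

print(extract_reply(sample))  # Hello!
```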

Response from cloud model:

{
  "inference_results": [
    {
      "output": [
        {
          "name": "response",
          "dataAsMap": {
            "id": "chatcmpl-14",
            "object": "chat.completion",
            "created": 1760124242,
            "model": "gpt-oss:120b-cloud",
            "system_fingerprint": "fp_ollama",
            "choices": [
              {
                "index": 0,
                "message": {
                  "role": "assistant",
                  "content": "Hello! I’m ChatGPT, your friendly AI assistant. How can I help you today?",
                  "reasoning": """We need to respond. The user says "Hello!What is your name??". We should greet and give name. According to system, we are ChatGPT. The user just asks name. So respond friendly."""
                },
                "finish_reason": "stop"
              }
            ],
            "usage": {
              "prompt_tokens": 73,
              "completion_tokens": 72,
              "total_tokens": 145
            }
          }
        }
      ],
      "status_code": 200
    }
  ]
}

Response from a local model:

{
  "inference_results": [
    {
      "output": [
        {
          "name": "response",
          "dataAsMap": {
            "id": "chatcmpl-679",
            "object": "chat.completion",
            "created": 1760124278,
            "model": "llama3.1:8b",
            "system_fingerprint": "fp_ollama",
            "choices": [
              {
                "index": 0,
                "message": {
                  "role": "assistant",
                  "content": """I don't have a personal name, but I'm often referred to as "Assistant" or "AI Assistant". Some people also call me "Nova" (just a nickname!). My main goal is to assist you with any questions, tasks, or topics you'd like to discuss. How can I help you today?"""
                },
                "finish_reason": "stop"
              }
            ],
            "usage": {
              "prompt_tokens": 28,
              "completion_tokens": 65,
              "total_tokens": 93
            }
          }
        }
      ],
      "status_code": 200
    }
  ]
}

Thank you for the POC on local and cloud.

This could be more of a firewall or whitelisting issue on my side when contacting the LLM endpoint.

I will try deploying the model on a local machine.