What does a model have to return from SageMaker for OpenSearch to be able to use it?

Versions (relevant - OpenSearch/Dashboard/Server OS/Browser):
OpenSearch 2.9

Describe the issue:
I am trying to use a custom connector to have a model running in SageMaker do vector encoding. I can't find documented anywhere what the inference endpoint needs to return to OpenSearch, either for creating a vector from a search term or for creating vectors via a pipeline when ingesting data.

I have had the encoder return a list, which fails, and I have also tried a JSON array that holds the vector, which also fails. I am not sure exactly what needs to be returned from the model, or what needs to be done at the connector level.

Configuration:

Relevant Logs or Screenshots:
My inference code is:

import json
from sentence_transformers import SentenceTransformer

def model_fn(model_dir):
    # Load model from HuggingFace Hub
    model = SentenceTransformer(model_dir)
    return model

def predict_fn(data, model):
    # Tokenize sentences
    print(data)
    
    input_texts = data.pop("inputs", data)
    embeddings_sentence_transformer = model.encode(input_texts, normalize_embeddings=True)

    return {"vectors": json.dumps(embeddings_sentence_transformer.tolist())}

I am trying to call the model via

POST /_plugins/_ml/models/gwq5Go4BcDeh12u6M7Fa/_predict
{
  "parameters": {"inputs": "hello world"},
  "return_number": true,
  "target_response": ["sentence_embedding"]
}

and get:

Failed to parse object: expecting token of type [START_OBJECT] but found [VALUE_NUMBER]

my connector code is:

{
  "name": "multilingual-e5-large",
  "description": "multilingual-e5-large",
  "version": 1,
  "protocol": "aws_sigv4",
  "credential": {
    "roleArn": "arn:aws:iam::xxxxx:role/opensearch-sagemaker-role"
  },
  "parameters": {
    "region": "us-east-1",
    "service_name": "sagemaker"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://runtime.sagemaker.us-east-1.amazonaws.com/endpoints/xxxxxxx/invocations",
      "headers": {
        "content-type": "application/json"
      },
      "post_process_function": "return params['vectors']",
      "request_body": "{\"inputs\":\"${parameters.inputs}\"}"
    }
  ]
}

Hi @jtrollin ,

Could you please take a look at this doc and see if that solves your issue?

Thanks
Dhrubo

I think I am close, but now I get this error from opensearch

"reason": "arraycopy: element type mismatch: can not cast one of the elements of java.lang.Object[] to the type of the destination array, java.lang.Number"

I am forcing the array to be float32 in the inference code, but no matter what I do, the connector does not see the array as the right type:

def predict_fn(data, model):
    # data is the list of input texts passed from the connector
    print(data)

    #input_texts = data.pop("inputs", data)
    #data = json.loads(input_texts)

    # Prepend the e5 "query: " prefix to each input text
    data = ["query: " + item for item in data]
    print(data)

    results = model.encode(data, normalize_embeddings=True)
    print(results)
    print(results.__class__)
    print(results.dtype)
    returnVal = results.astype('float32')
    print(returnVal)
    print(returnVal.__class__)
    print(returnVal.dtype)
    print("before return")
    return {"inference_results": returnVal}

The returnVal.dtype is float32 right before the return; any ideas what I am doing wrong?

What do we have in data and model? Sorry, I didn't quite follow your predict_fn. Are you facing any issue with the connector when getting embeddings?

data is what is passed from the connector, e.g. ["hello world", "Goodbye world"]

the model is intfloat/multilingual-e5-large from Hugging Face

The connector passes the values to the inference fine, and I get the embeddings from the model: when I print them out, they match what the doc you sent says they should look like. But when the connector (I am assuming it is the connector) gets the response from the inference, it throws the arraycopy error. I just don't know what is going on there.

This is the actual stack trace. Looking at the code, I think the issue is that the examples you showed me are for 2.12; the 2.9 code doesn't seem to expect more than one encoding coming back, and that code has changed a lot since. Let me upgrade and see what happens.

[2024-03-09T00:21:30,653][WARN ][r.suppressed             ] [0cf0d01da504393454e1bf71adc34484] path: __PATH__ params: {pretty=true, model_id=jQqVII4BcDeh12u6s7He}
java.lang.ArrayStoreException: arraycopy: element type mismatch: can not cast one of the elements of java.lang.Object[] to the type of the destination array, java.lang.Number
	at __PATH__(ArrayList.java:400)
	at org.opensearch.ml.common.connector.MLPostProcessFunction.lambda$buildModelTensorList$0(MLPostProcessFunction.java:50)
	at __PATH__(ArrayList.java:1511)
	at org.opensearch.ml.common.connector.MLPostProcessFunction.lambda$buildModelTensorList$1(MLPostProcessFunction.java:44)
	at org.opensearch.ml.engine.utils.ScriptUtils.executeBuildInPostProcessFunction(ScriptUtils.java:29)
	at org.opensearch.ml.engine.algorithms.remote.ConnectorUtils.processOutput(ConnectorUtils.java:160)
	at org.opensearch.ml.engine.algorithms.remote.AwsConnectorExecutor.invokeRemoteModelInManagedService(AwsConnectorExecutor.java:141)
	at org.opensearch.ml.engine.algorithms.remote.RemoteConnectorExecutor.preparePayloadAndInvokeRemoteModel(RemoteConnectorExecutor.java:79)
	at org.opensearch.ml.engine.algorithms.remote.RemoteConnectorExecutor.executePredict(RemoteConnectorExecutor.java:49)
	at org.opensearch.ml.engine.algorithms.remote.RemoteModel.predict(RemoteModel.java:56)
	at org.opensearch.ml.task.MLPredictTaskRunner.lambda$predict$5(MLPredictTaskRunner.java:219)
	at org.opensearch.ml.model.MLModelManager.trackPredictDuration(MLModelManager.java:1170)
	at org.opensearch.ml.task.MLPredictTaskRunner.predict(MLPredictTaskRunner.java:219)
	at org.opensearch.ml.task.MLPredictTaskRunner.lambda$executeTask$4(MLPredictTaskRunner.java:194)
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:858)
	at __PATH__(ThreadPoolExecutor.java:1136)
	at __PATH__(ThreadPoolExecutor.java:635)
	at __PATH__(Thread.java:833)

Upgraded to 2.11 (the newest version on AWS) and got the connector and model to work; however, querying does not work, nor does using a pipeline to index a document. Both produce:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Invalid JSON in payload"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "Invalid JSON in payload"
  },
  "status": 400
}

With no errors in the logs. It doesn't seem to get to the inference at all, as if the connector fails, but there is nothing in the logs as to why. Looking at the code, I don't think it ever worked until 2.12 (not sure if it even works there, as I can't deploy 2.12 to test).

Could you please write in detail how I can reproduce this issue on my end?

I am struggling with the SageMaker connector too, with some errors in common with @jtrollin's.
I have this model (dangvantuan/sentence-camembert-large on Hugging Face) hosted on an endpoint.

The model accepts the following JSON input:
{ "inputs": ["Hello world"] }

The model returns the following JSON output (a 3-dimensional array of 1024 items):
[[[[...], [...], [...]]]]

Performing a predict request works well:

POST /_plugins/_ml/models/xaXuCo8B_2CaR-HdFLr4/_predict
{
  "parameters": {
    "input": "Hello world"
  }
}

and returns this:

{
  "inference_results": [
    {
      "output": [
        {
          "name": "response",
          "dataAsMap": {
            "response": [
              [
                [
                  [...],
                  [...],
                  [...]
                ]
              ]
            ]
          }
        }
      ],
      "status_code": 200
    }
  ]
}

I have tried all the following connector configurations, without success:

  1. Configuration: No pre_process_function | No post_process_function
    Response: illegal_argument_exception → Invalid JSON in payload

  2. Configuration: No pre_process_function | post_process_function (default)
    Response: array_store_exception → arraycopy: element type mismatch: can not cast one of the elements of java.lang.Object to the type of the destination array, java.lang.Number

  3. Configuration: No pre_process_function | post_process_function (custom)
    Response: illegal_argument_exception → Invalid JSON in payload

  4. Configuration: pre_process_function (default) | No post_process_function
    Response: illegal_argument_exception → Invalid JSON in payload

  5. Configuration: pre_process_function (custom) | No post_process_function
    Response: illegal_state_exception → failed while calling model, check error log for details

  6. Configuration: pre_process_function (custom) | post_process_function (custom)
    Response: illegal_state_exception → failed while calling model, check error log for details

  7. Configuration: pre_process_function (default) | post_process_function (default)
    Response: array_store_exception → arraycopy: element type mismatch: can not cast one of the elements of java.lang.Object to the type of the destination array, java.lang.Number

  8. Configuration: pre_process_function (default) | post_process_function (custom)
    Response: illegal_argument_exception → Invalid JSON in payload

  9. Configuration: pre_process_function (custom) | post_process_function (default)
    Response: array_store_exception → arraycopy: element type mismatch: can not cast one of the elements of java.lang.Object to the type of the destination array, java.lang.Number

Legend

  • pre_process_function (default)
    "pre_process_function": "connector.pre_process.default.embedding"
  • post_process_function (default)
    "post_process_function": "connector.post_process.default.embedding"
  • pre_process_function (custom)
"pre_process_function": """
    StringBuilder builder = new StringBuilder();
    builder.append("\"");
    String first = params.text_docs[0];
    builder.append(first);
    builder.append("\"");
    def parameters = "{" +"\"input\":" + builder + "}";
    return  "{" +"\"parameters\":" + parameters + "}";"""
  • post_process_function (custom)
"post_process_function": """
      def name = "sentence_embedding";
      def dataType = "FLOAT32";
      if (params.inference_results == null || params.inference_results.length == 0) {
        return params.message;
      }
      def shape = [params.inference_results[0].output[0].dataAsMap.response[0][0][0][0].length];
      def json = "{" +
                 "\"name\":\"" + name + "\"," +
                 "\"data_type\":\"" + dataType + "\"," +
                 "\"shape\":" + shape + "," +
                 "\"data\":" + params.inference_results[0].output[0].dataAsMap.response[0][0][0][0] +
                 "}";
      return json;
    """

What version of OpenSearch are you using? I am not sure it supports the return being an array of arrays before 2.13, when document chunking was added.

So, for example, your return data should look like
[[xx.x, xx.x, xx.x, xx.x]], i.e. a single array of float32 values that make up a single vector.
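To make that concrete, here is a small sketch (with made-up values) of how a float32 numpy array becomes that JSON shape; `.tolist()` is what turns it into plain number arrays rather than objects the post-processor can't cast:

```python
import json
import numpy as np

# Stand-in for model.encode(...) output: a (1, 4) float32 array
embeddings = np.array([[0.25, -0.5, 0.75, 1.0]], dtype=np.float32)

# tolist() converts to nested Python lists of plain floats, which
# JSON-encode as a single array of number arrays
payload = embeddings.tolist()
body = json.dumps(payload)  # '[[0.25, -0.5, 0.75, 1.0]]'
```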

I'm on 2.11, the latest version available on AWS.
Is it possible to modify the SageMaker model output?

I would change your model to return a single vector then; that should work. My model returns:

[[0.22301740944385529, -0.247028186917305, -0.03395825996994972, 0.0799521654844284, ...]]

and my inference code returns that as a float32 and it all works.
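For reference, a minimal predict_fn along those lines might look like this (a sketch assuming the SageMaker HuggingFace inference toolkit's handler signature; the "query: " prefix is specific to e5-style models):

```python
def predict_fn(data, model):
    # e5-style models expect a task prefix on each input text
    texts = ["query: " + t for t in data]
    # encode() returns a numpy array; astype + tolist() yields plain
    # nested lists like [[0.223, -0.247, ...]] that the default
    # post-process function can consume
    embeddings = model.encode(texts, normalize_embeddings=True)
    return embeddings.astype("float32").tolist()
```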


Yes, returning [[x, x, x]] will work. You can use the sample from the blog "Deploy BGE Embedding Models via AWS Sagemaker" by Dominik Müller on Medium and change the code to return just the embeddings:

POST /_plugins/_ml/connectors/_create
{
  "name": "use-case-default-parameters",
  "description": "description",
  "version": 1,
  "protocol": "aws_sigv4",
  "credential": {
    "roleArn": "arn:aws:iam::XXXXXXX:role/XXXX"
  },
  "parameters": {
    "region": "us-east-1",
    "service_name": "sagemaker"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "headers": {
        "content-type": "application/json"
      },
      "url": "https://runtime.sagemaker.us-east-1.amazonaws.com/endpoints/bge-base-en/invocations",
      "request_body": """{ "inputs": "${parameters.inputText}" }""",
      "pre_process_function": """
        StringBuilder builder = new StringBuilder();
        builder.append("\"");
        String first = params.text_docs[0];
        builder.append(first);
        builder.append("\"");
        def parameters = "{" + "\"inputText\":" + builder + "}";
        return "{" + "\"parameters\":" + parameters + "}";""",
      "post_process_function": "connector.post_process.default.embedding"
    }
  ]
}