Semantic field type not working

Hi all,

I am trying to follow the guide in Semantic - OpenSearch Documentation, but it did not work. There are no errors, but no embeddings are generated either!

Note that I deployed a model locally and tested it, and it works fine! But nothing happens when using the semantic field type. It only creates the _semantic_info field, but no embedding is generated.
Appreciate your help.

Thanks

@asfoorial Could you share your index settings?
Which model did you use?

Below are the full index details. Note that it worked for me on Windows and Rocky Linux 8, but did not work on RHEL 8!

Note also that inserting the vector values explicitly works fine, and neural search against the index works fine too. It is just that when I index a document, the embeddings are not generated automatically!

GET semantic_index3

{
  "semantic_index3": {
    "aliases": {},
    "mappings": {
      "properties": {
        "passage": {
          "type": "semantic",
          "model_id": "aoLdE5gBsdSQ3XOxh-Uv",
          "raw_field_type": "text"
        },
        "passage_semantic_info": {
          "properties": {
            "embedding": {
              "type": "knn_vector",
              "dimension": 384,
              "method": {
                "engine": "faiss",
                "space_type": "l2",
                "name": "hnsw",
                "parameters": {}
              }
            },
            "model": {
              "properties": {
                "id": {
                  "type": "text",
                  "index": false
                },
                "name": {
                  "type": "text",
                  "index": false
                },
                "type": {
                  "type": "text",
                  "index": false
                }
              }
            }
          }
        }
      }
    },
    "settings": {
      "index": {
        "replication": {
          "type": "DOCUMENT"
        },
        "number_of_shards": "1",
        "provided_name": "semantic_index3",
        "knn": "true",
        "creation_date": "1752680057614",
        "number_of_replicas": "1",
        "uuid": "e24cY2zWQXmofKbIhqRCvw",
        "version": {
          "created": "137227827"
        },
        "knn.derived_source": {
          "enabled": "true"
        }
      }
    }
  }
}

Full model details:

{
  "_index": ".plugins-ml-model",
  "_id": "aoLdE5gBsdSQ3XOxh-Uv",
  "_version": 6,
  "_seq_no": 163,
  "_primary_term": 2,
  "_score": 0,
  "_source": {
    "last_deployed_time": 1752680013579,
    "model_version": "9",
    "created_time": 1752679941935,
    "deploy_to_all_nodes": true,
    "model_format": "TORCH_SCRIPT",
    "is_hidden": false,
    "description": "",
    "model_state": "DEPLOYED",
    "planning_worker_node_count": 1,
    "total_chunks": 8,
    "model_content_hash_value": "999d400b542ac0f866fc7462eb44da1f96a723cb3e94fbe5d1aab037b6fb72fa",
    "model_config": {
      "all_config": "",
      "model_type": "bert",
      "embedding_dimension": 384,
      "framework_type": "SENTENCE_TRANSFORMERS",
      "additional_config": {
        "space_type": "l2"
      }
    },
    "auto_redeploy_retry_times": 0,
    "last_updated_time": 1752680013579,
    "last_registered_time": 1752679953746,
    "name": "sentence-transformers/paraphrase-MiniLM-L3-v2",
    "current_worker_node_count": 1,
    "model_group_id": "MdSrE5gB1AiCYvc75tPe",
    "model_content_size_in_bytes": 70401535,
    "planning_worker_nodes": [
      "8Nm29zDURO-Cp5CLikecag"
    ],
    "algorithm": "TEXT_EMBEDDING"
  }
}
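
For anyone who wants to confirm the same symptom on a single document, here is a minimal Python sketch. It assumes you have already fetched the document's _source (for example from GET semantic_index3/_doc/<id>) as a dict, and that the embedding, when present, shows up under passage_semantic_info.embedding as in the mapping above. The function name embedding_status and its return labels are made up for illustration.

```python
from typing import Any, Mapping


def embedding_status(
    source: Mapping[str, Any],
    field: str = "passage",
    expected_dim: int = 384,
) -> str:
    """Classify a document _source as 'ok', 'missing', or 'wrong_dimension'.

    Assumption: the semantic field stores its vector under
    '<field>_semantic_info.embedding', as in the mapping shown in this thread.
    """
    info = source.get(f"{field}_semantic_info")
    if not isinstance(info, Mapping):
        # The companion object was never written at all.
        return "missing"

    embedding = info.get("embedding")
    if not isinstance(embedding, list) or not embedding:
        # The object exists (e.g. only model metadata) but has no vector.
        return "missing"

    if len(embedding) != expected_dim:
        # A vector exists but does not match the model's embedding_dimension.
        return "wrong_dimension"

    return "ok"
```

In the failing case described here, a document that only carries the model metadata under passage_semantic_info would come back as "missing".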

Hello again,

I just discovered that if I give the cluster manager the ingest role as well (node.roles: [cluster_manager, ingest]), then system ingest pipelines work and the embedding gets generated automatically!

Is this expected behaviour? And is there a way to make the ml node act as both ml and ingest while the cluster managers stay as cluster managers only?

@asfoorial According to the documentation, it is.

Do you still see different behaviour on RHEL 8 and Windows?

The documentation says that I must have one node as an ingest node. In fact, I have two cluster_manager nodes, two data nodes, and two nodes as [ml, ingest], but it does not work with this setup. It only works when I set the last two nodes as [cluster_manager, ml, ingest]!

In other words, system ingest pipelines (specifically) only work on nodes that are both cluster manager and ingest nodes [cluster_manager, ingest]! Non-system ingest pipelines work fine on nodes that are [ml, ingest].

Can you try it on your side?
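
For anyone trying to reproduce this, a small Python sketch for checking which nodes actually carry which roles may help. It assumes you pass in the parsed body of a GET _nodes response, where each entry under "nodes" has a "name" and a "roles" list. The helper names and the node names in the sample (cm-1, ml-1, and so on) are hypothetical and only mirror the topology described in this post.

```python
from typing import Any, Dict, List, Mapping


def nodes_by_role(nodes_response: Mapping[str, Any]) -> Dict[str, List[str]]:
    """Group node names by role from a GET _nodes style response body."""
    grouped: Dict[str, List[str]] = {}
    for node in nodes_response.get("nodes", {}).values():
        for role in node.get("roles", []):
            grouped.setdefault(role, []).append(node.get("name", "?"))
    return {role: sorted(names) for role, names in grouped.items()}


def ingest_summary(nodes_response: Mapping[str, Any]) -> Dict[str, List[str]]:
    """List ingest nodes, and the subset that is also cluster_manager."""
    grouped = nodes_by_role(nodes_response)
    ingest = set(grouped.get("ingest", []))
    managers = set(grouped.get("cluster_manager", []))
    return {
        "ingest": sorted(ingest),
        "ingest_and_cluster_manager": sorted(ingest & managers),
    }


# Hypothetical sample mirroring the setup that did NOT work:
# two cluster managers, two data nodes, two [ml, ingest] nodes.
sample = {
    "nodes": {
        "id1": {"name": "cm-1", "roles": ["cluster_manager"]},
        "id2": {"name": "cm-2", "roles": ["cluster_manager"]},
        "id3": {"name": "data-1", "roles": ["data"]},
        "id4": {"name": "data-2", "roles": ["data"]},
        "id5": {"name": "ml-1", "roles": ["ml", "ingest"]},
        "id6": {"name": "ml-2", "roles": ["ml", "ingest"]},
    }
}

summary = ingest_summary(sample)
```

With the sample above, the "ingest" list contains ml-1 and ml-2 while "ingest_and_cluster_manager" is empty, which is exactly the configuration where the system ingest pipeline did not run for me.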

Do you mean it can automatically generate the embedding on Windows and Rocky Linux 8 when the nodes have just [ml, ingest], but it does not work on RHEL 8?

This doesn’t make sense, since the code doesn’t check the operating system to decide the ingestion logic. It only requires at least one node with the ingest role and simply tries to use an ingest node to do the ingestion work, based on this code.

Is it possible that your cluster was somehow not able to find your two ingest nodes, so it could not generate the embedding? I think we can verify it by removing the ingest role from your cluster_manager nodes and trying to index. Then add the cluster_manager role to your two ingest nodes and try indexing again. If adding the cluster_manager role to the ingest nodes doesn’t help, then it may be a node connection issue.

Adding the cluster_manager role to the ingest node made it work! I won’t be using the auto-ingest feature anyway, since it recomputes embeddings upon updating any field! It is too costly.
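
To illustrate the kind of client-side handling I have in mind instead, here is a rough Python sketch that only recomputes an embedding when the passage text itself changed. This is not how OpenSearch does it; it is a plain content-hash comparison done before indexing. The embed callable stands in for whatever calls the deployed model, and the "fingerprint"/"embedding" record layout is an assumption made up for this example.

```python
import hashlib
from typing import Callable, Dict, List, Optional


def text_fingerprint(text: str) -> str:
    """Stable SHA-256 fingerprint of the text that gets embedded."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def embedding_for_update(
    new_text: str,
    previous: Optional[Dict[str, object]],
    embed: Callable[[str], List[float]],
) -> Dict[str, object]:
    """Return an embedding record, reusing the previous one when possible.

    'previous' is the record stored for the last version of the document
    (hypothetical layout: {"fingerprint": str, "embedding": list}).
    The model is only called when the passage text has changed.
    """
    fingerprint = text_fingerprint(new_text)
    if previous is not None and previous.get("fingerprint") == fingerprint:
        # Passage unchanged (some other field was updated): reuse as-is.
        return previous
    return {"fingerprint": fingerprint, "embedding": embed(new_text)}
```

The record returned here would then be written explicitly alongside the document, which is the path that already works in my setup.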

Then it’s really weird that it behaves differently on different operating systems. We plan to support reusing the existing embedding for the semantic field in OpenSearch 3.2, so you can keep an eye on that.

I think I found the root cause and we can fix it in OpenSearch 3.2. Check this.
