Special Characters in generated _id field

We have LogStash sending GELF to OpenSearch. It was sending to Elastic previously, and we are trialling OpenSearch.

We are not specifying the _id field in the logstash-opensearch-output plugin, so I believe OpenSearch is generating the ID. I'm not sure, though. Maybe it's already in the GELF data?

The ID comes through with a URL-encoded 1:1: prefix:

E.g.: id: 1%3A1%3Am7X0UJIBRBPGnq12qh3C

This is causing problems in AWS OpenSearch, such as "Failed to load the anchor document" when I try to View Surrounding Documents. Searching suggests it is due to special characters, such as the percent signs.

%3A is the URL-encoded form of a colon ":"
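To confirm the decoding, a quick check with Python's standard library (the document ID string is the one from my screenshot above):

```python
from urllib.parse import unquote

doc_id = "1%3A1%3Am7X0UJIBRBPGnq12qh3C"
# unquote reverses percent-encoding: %3A -> ":"
print(unquote(doc_id))  # 1:1:m7X0UJIBRBPGnq12qh3C
```

So the stored _id appears to actually be `1:1:m7X0UJIBRBPGnq12qh3C`, with the colons percent-encoded somewhere along the way.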

Hi @BrockHenry ,

Could you please share your pipelines.yml for Logstash?

Where did you find id: 1%3A1%3Am7X0UJIBRBPGnq12qh3C ? There are a few id parameters in the ingestion pipeline from GELF to OpenSearch.

It is recommended to define the GELF id parameter:

In the OpenSearch cluster, the ID parameter is the unique identifier for a new document. For the logstash-opensearch-output plugin, I think it's an auto-generated value:

input {
  gelf {
    host => "172.20.20.81"
    use_tcp => true
    port_tcp => 12201
  }
}

output {
  opensearch {
    hosts => "https://xxxxxxxxxxxxxx.ap-southeast-2.aoss.amazonaws.com:443"
    ecs_compatibility => 'disabled'
    index => 'graylog'
    auth_type => {
      type => 'aws_iam'
      aws_access_key_id => 'xxxxxxxxxxxxxxxxxxx'
      aws_secret_access_key => 'xxxxxxxxxxxxxxxxxx'
      region => 'ap-southeast-2'
      service_name => 'aoss'
    }
    default_server_major_version => 2
    legacy_template => false
  }
}
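If you want to rule out auto-generation, you could try setting the ID explicitly. The opensearch output plugin supports a `document_id` option (like the elasticsearch output it is based on) — a sketch only, and I haven't verified this against a serverless collection; the `[@metadata][uuid]` field here is a hypothetical value you would populate in a filter (e.g. with the uuid filter plugin):

```
output {
  opensearch {
    # ... hosts / auth as in your config above ...
    index => 'graylog'
    document_id => "%{[@metadata][uuid]}"   # hypothetical field set earlier in the pipeline
  }
}
```

If the 1:1: prefix still appears with an explicit `document_id`, that would point at the cluster rather than the plugin.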

_id: 1%3A1%3Am7X0UJIBRBPGnq12qh3C is in the OpenSearch document itself.


When I click "View Single Document" or "View Surrounding Documents", the ID is in the URL, but the page fails to open correctly:

Cannot find document
No documents match that ID.

I thought it would be auto-generated by OpenSearch on ingestion, but why would it be generated with this invalid 1:1: prefix?

Thanks for your reply.

Soo… long story short.

We changed the collection type from time series to search, and it's all working correctly now.

Thanks for your help.