It is possible to explode a string in opensearch?

Hello there!

New to Opensearch here. I was simply wondering if it is possible to explode a string in opensearch the same way I could do in SQL for instance…

The classic example being a document with a variable containing topics codes separated by a delimiter such as

document    topic
doc1        ALERT;INFO;MARKETS

Then, I would explode this variable (depending on the SQL language this can be done with a LATERAL JOIN or EXPLODE) to create three rows from that document, one for each topic.

document    topic_exploded
doc1        ALERT
doc1        INFO
doc1        MARKETS

Is this something that can be done in Opensearch? I saw the split processor page but I am unsure how would that work and whether it would return the data as expected.

Could someone please let me know?
Thanks!

For instance, one classic use-case would be to count the number of documents in the data for each topic. This would be some kind of aggregation but based on a split of a text variable. Can we do that in Opensearch?

Thanks!

You can already do this actually. Here is a sample for VPC Flow Logs

select

metadata.product.version AS product_version,

[metadata.product.name](https://metadata.product.name) AS product_name,

[metadata.product.feature.name](https://metadata.product.feature.name) AS feature_name,

metadata.product.vendor_name AS vendor_name,

metadata.version AS metadata_version,

cloud.account_uid AS account_uid,

cloud.region AS region,

cloud.zone AS zone,

cloud.provider AS provider,

src_endpoint.port AS src_port,

src_endpoint.svc_name AS src_svc_name,

src_endpoint.ip AS src_ip,

src_endpoint.interface_uid AS src_interface_uid,

src_endpoint.vpc_uid AS src_vpc_uid,

src_endpoint.instance_uid AS src_instance_uid,

src_endpoint.subnet_uid AS src_subnet_uid,

dst_endpoint.port AS dst_port,

dst_endpoint.svc_name AS dst_svc_name,

dst_endpoint.ip AS dst_ip,

dst_endpoint.interface_uid AS dst_interface_uid,

dst_endpoint.vpc_uid AS dst_vpc_uid,

dst_endpoint.instance_uid AS dst_instance_uid,

dst_endpoint.subnet_uid AS dst_subnet_uid,

connection_info.protocol_num AS protocol_num,

connection_info.tcp_flags AS tcp_flags,

connection_info.protocol_ver AS protocol_ver,

connection_info.boundary_id AS boundary_id,

connection_info.boundary AS boundary,

connection_info.direction_id AS direction_id,

connection_info.direction AS direction,

traffic.packets AS packets,

traffic.bytes AS bytes,

FROM_UNIXTIME(time) as `@timestamp`,

FROM_UNIXTIME(start_time) as `start_time`,

FROM_UNIXTIME(end_time) as `end_time`,

status_code as status_code

from zero_etl_amazons3.`vpcflow-db`.vpc_flow