New to Opensearch here. I was simply wondering if it is possible to explode a string in opensearch the same way I could do in SQL for instance…
The classic example being a document with a variable containing topics codes separated by a delimiter such as
document topic
doc1 ALERT;INFO;MARKETS
Then, I would explode this variable (depending on the SQL language this can be done with a LATERAL JOIN or EXPLODE) to create three rows from that document, one for each topic.
document topic_exploded
doc1 ALERT
doc1 INFO
doc1 MARKETS
Is this something that can be done in Opensearch? I saw the split processor page but I am unsure how would that work and whether it would return the data as expected.
For instance, one classic use-case would be to count the number of documents in the data for each topic. This would be some kind of aggregation but based on a split of a text variable. Can we do that in Opensearch?
You can already do this actually. Here is a sample for VPC Flow Logs
select
metadata.product.version AS product_version,
[metadata.product.name](https://metadata.product.name) AS product_name,
[metadata.product.feature.name](https://metadata.product.feature.name) AS feature_name,
metadata.product.vendor_name AS vendor_name,
metadata.version AS metadata_version,
cloud.account_uid AS account_uid,
cloud.region AS region,
cloud.zone AS zone,
cloud.provider AS provider,
src_endpoint.port AS src_port,
src_endpoint.svc_name AS src_svc_name,
src_endpoint.ip AS src_ip,
src_endpoint.interface_uid AS src_interface_uid,
src_endpoint.vpc_uid AS src_vpc_uid,
src_endpoint.instance_uid AS src_instance_uid,
src_endpoint.subnet_uid AS src_subnet_uid,
dst_endpoint.port AS dst_port,
dst_endpoint.svc_name AS dst_svc_name,
dst_endpoint.ip AS dst_ip,
dst_endpoint.interface_uid AS dst_interface_uid,
dst_endpoint.vpc_uid AS dst_vpc_uid,
dst_endpoint.instance_uid AS dst_instance_uid,
dst_endpoint.subnet_uid AS dst_subnet_uid,
connection_info.protocol_num AS protocol_num,
connection_info.tcp_flags AS tcp_flags,
connection_info.protocol_ver AS protocol_ver,
connection_info.boundary_id AS boundary_id,
connection_info.boundary AS boundary,
connection_info.direction_id AS direction_id,
connection_info.direction AS direction,
traffic.packets AS packets,
traffic.bytes AS bytes,
FROM_UNIXTIME(time) as `@timestamp`,
FROM_UNIXTIME(start_time) as `start_time`,
FROM_UNIXTIME(end_time) as `end_time`,
status_code as status_code
from zero_etl_amazons3.`vpcflow-db`.vpc_flow