Loading from Spark to OpenSearch with IAM auth (message signing)

Hi,

I was wondering whether anyone has tried loading data into an OpenSearch cluster from Spark (or any of the other frameworks supported by the elasticsearch-hadoop libraries) while using AWS message signing (SigV4) to authenticate with IAM. All of the clients we work with that use OpenSearch in AWS load their data from Apache Spark, and this is currently the only thing stopping them from using IAM auth instead of basic/key-based auth.

I found a blog post stating that OpenSearch will at some point release its own version of elasticsearch-hadoop, at which point adding dynamic headers to each request will hopefully be possible. However, Elastic doesn't seem interested in adding this functionality, since there has been an open GitHub issue for over 6 years: Append dynamic custom headers for http requests (elastic/elasticsearch-hadoop, Issue #626).

Has anyone found a clean way to achieve this with the existing Elasticsearch libraries? It seems a bit tricky, since you'd need to go down several levels of classes to replace the existing static header functionality provided in src/main/java/org/elasticsearch/hadoop/rest/HeaderProcessor.java.

I was hoping someone had found a workaround for this, or has any advice.

Is it common to load data with a dedicated service account that uses basic auth, and then make all other client requests from applications using IAM, where it's much easier to add an HTTP callback to the regular ES Java client? Is it even possible to mix IAM and basic auth on a single cluster?

Thanks in advance,

Will

Hey @willbo
Have you found any solution or workaround to this? I’m currently facing the same issue and cannot find a way to make this work :frowning:

I'd be very interested in any updates.

Thank you
Ann-Kathrin

Hey @Ann-Kathrin,

No, sorry. At my company we pushed supporting this back in the hope that an OpenSearch version of elasticsearch-hadoop would be released in the not-too-distant future. However, I have no information on where that stands or whether it would include a nice interface for IAM auth. I know there are some nice Java classes in the AWS core SDK for computing the request signatures, but AFAIK you'd need to extend a bunch of classes and override constructors to be able to add dynamic headers programmatically. I may be missing some really neat way of doing it, but this is what I would do as an interim solution if you're really stuck without this functionality:

// Existing chain (each class creates or holds the next one)
RestRepository > RestClient > NetworkClient > CommonsHttpTransportFactory > CommonsHttpTransport > HeaderProcessor

// Proposed replacement chain (rough sketch, may not need to be exactly like this)
DynamicHeaderRestRepository 
	public void registerDynamicHeaderProcessor()
	> DynamicHeaderRestClient > ... > ... > DynamicHeaderProcessor
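To make the end of that chain a bit more concrete, here's a rough Java sketch of what a `DynamicHeaderProcessor` might boil down to. Everything in it is hypothetical: `DynamicHeaderProvider`, `DynamicHeaderProcessor` and `applyTo` are made-up names, not elasticsearch-hadoop APIs, and you'd still need the custom classes further up the chain so that the transport calls it once per request with the method, path and body.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch only: none of these types exist in elasticsearch-hadoop.
public class DynamicHeaderSketch {

    /** Hypothetical callback that computes headers for one specific request. */
    @FunctionalInterface
    interface DynamicHeaderProvider {
        Map<String, String> headersFor(String method, String path, byte[] body);
    }

    /**
     * Hypothetical DynamicHeaderProcessor: starts from the fixed, configured headers
     * (roughly what HeaderProcessor handles today) and then overlays headers computed
     * for this particular request, e.g. a signed Authorization header.
     */
    static final class DynamicHeaderProcessor {
        private final Map<String, String> staticHeaders;
        private final DynamicHeaderProvider provider;

        DynamicHeaderProcessor(Map<String, String> staticHeaders, DynamicHeaderProvider provider) {
            this.staticHeaders = new LinkedHashMap<>(staticHeaders);
            this.provider = provider;
        }

        /** Meant to be called once per outgoing request by a (hypothetical) custom transport. */
        Map<String, String> applyTo(String method, String path, byte[] body) {
            Map<String, String> headers = new LinkedHashMap<>(staticHeaders);
            // Per-request headers override static ones with the same name.
            headers.putAll(provider.headersFor(method, path, body));
            return headers;
        }
    }

    public static void main(String[] args) {
        // Toy provider: a real one would compute a request signature from the same inputs.
        DynamicHeaderProcessor processor = new DynamicHeaderProcessor(
                Map.of("Content-Type", "application/json"),
                (method, path, body) -> Map.of("X-Body-Length", Integer.toString(body.length)));
        System.out.println(processor.applyTo("POST", "/_bulk", new byte[] {1, 2, 3}));
        System.out.println(processor.applyTo("POST", "/_bulk", new byte[] {1}));
    }
}
```

The key design point is that the provider receives the request body, which a static header map never sees; that is what a signing implementation needs.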

The other thing that put me off this signing approach is that the signature covers the request body (via a SHA-256 hash of the payload) and the current timestamp, so it changes for literally every bulk request sent to the cluster, when IMO an authentication token would be plenty.
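For anyone wondering why the signature can't just be set once as a static header, here's a minimal SigV4 sketch using only JDK classes (`javax.crypto.Mac`, `MessageDigest`). It's simplified under a few assumptions: the path is already canonical, there's no query string, only `host` and `x-amz-date` are signed, and session tokens (`x-amz-security-token`) and assembling the final `Authorization` header are left out. The service name `es`, the host and the secret are placeholders. In practice you'd use the AWS SDK's signer rather than hand-rolling this.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Simplified SigV4 sketch (JDK only). Assumes a canonical path, no query string,
// and only the host and x-amz-date headers signed; no session token handling.
public class SigV4Sketch {

    static byte[] hmacSha256(byte[] key, String data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e); // HmacSHA256 is mandatory on every JVM
        }
    }

    static String sha256Hex(byte[] data) {
        try {
            return hex(MessageDigest.getInstance("SHA-256").digest(data));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e); // SHA-256 is mandatory on every JVM
        }
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** Returns the hex SigV4 signature for one request; amzDate looks like "20240101T000000Z". */
    static String sign(String secretKey, String region, String service, String host,
                       String method, String path, byte[] body, String amzDate) {
        String date = amzDate.substring(0, 8); // yyyyMMdd part of the timestamp
        String scope = date + "/" + region + "/" + service + "/aws4_request";

        // 1. Canonical request: the body only enters via its SHA-256 hash.
        String canonicalRequest = method + "\n"
                + path + "\n"
                + "\n"                          // empty canonical query string
                + "host:" + host + "\n"
                + "x-amz-date:" + amzDate + "\n"
                + "\n"                          // separator after the canonical headers
                + "host;x-amz-date" + "\n"      // signed headers
                + sha256Hex(body);

        // 2. String to sign ties the request hash to the timestamp and credential scope.
        String stringToSign = "AWS4-HMAC-SHA256\n" + amzDate + "\n" + scope + "\n"
                + sha256Hex(canonicalRequest.getBytes(StandardCharsets.UTF_8));

        // 3. Derive the signing key by chaining HMACs over date, region and service.
        byte[] kDate = hmacSha256(("AWS4" + secretKey).getBytes(StandardCharsets.UTF_8), date);
        byte[] kRegion = hmacSha256(kDate, region);
        byte[] kService = hmacSha256(kRegion, service);
        byte[] kSigning = hmacSha256(kService, "aws4_request");

        // 4. Final signature, which would go into the Authorization header.
        return hex(hmacSha256(kSigning, stringToSign));
    }

    public static void main(String[] args) {
        byte[] bulkBody = "{\"index\":{}}\n{\"field\":1}\n".getBytes(StandardCharsets.UTF_8);
        String host = "search-example.us-east-1.es.amazonaws.com"; // placeholder endpoint
        for (String amzDate : new String[] {"20240101T000000Z", "20240101T000001Z"}) {
            System.out.println(sign("placeholder-secret", "us-east-1", "es", host,
                    "POST", "/_bulk", bulkBody, amzDate));
        }
    }
}
```

Running it prints two different signatures for the same bulk body sent one second apart, which is why the header has to be computed per request rather than configured once.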

Cheers,

Will