How to sanitize highlighted search results with opensearch-java

Hi there,

We are using OpenSearch to power the search pages on our websites. On these pages, we want to display the results with the query matches highlighted.

In the backend, we are using opensearch-java (2.10.3), so we have configured our search request to include highlight definitions for the relevant fields:

new SearchRequest.Builder()
        .highlight(new Highlight.Builder()
                .fields(Map.of(
                        "field1", new HighlightField.Builder()....build(),
                        "field2", new HighlightField.Builder()....build()
                ))
                .build())
        .build();

However, we have noticed that this can lead to XSS vulnerabilities. Say we index content that looks like this:

This is some content with a malicious <script>...</script> tag.

Then, when we search for e.g. “content” using the above SearchRequest, OpenSearch will return a result like:

This is some <em>content</em> with a malicious <script>...</script> tag.

While we can try to sanitize this ourselves, it would of course be much preferable if the result returned by OpenSearch looked like this in the first place:

This is some <em>content</em> with a malicious &lt;script&gt;...&lt;/script&gt; tag.
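For context, the "sanitize it ourselves" route could look roughly like the sketch below. It assumes the highlighter is configured with custom placeholder pre/post tags instead of the default <em> tags, so that the highlight markers can be told apart from markup in the indexed content. The marker strings and the HighlightSanitizer class are made up for illustration and are not part of opensearch-java.

```java
final class HighlightSanitizer {

    // Hypothetical placeholder markers; assumed to be set as the highlighter's
    // pre/post tags in the search request. They contain no HTML special
    // characters, so escaping leaves them untouched.
    static final String PRE = "@@HL_START@@";
    static final String POST = "@@HL_END@@";

    // Escapes the characters that are significant in HTML text and attributes.
    static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&#39;");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Escapes the whole fragment first, then turns the placeholder markers
    // into real <em> tags. Only the markers ever become markup.
    static String sanitize(String fragment) {
        return escapeHtml(fragment)
                .replace(PRE, "<em>")
                .replace(POST, "</em>");
    }
}
```

One caveat with this approach: if indexed content happens to contain the marker strings literally, they would also be rendered as <em> tags. That is harmless from an XSS point of view, but it is one more reason we would prefer OpenSearch to handle the escaping.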

I was able to find this official article which describes this exact problem and suggests using “snippets” to get sanitized search results, which sounds exactly like what we need.

Unfortunately, though, I can’t figure out how to configure my SearchRequest to achieve this with the opensearch-java client.

Question:

  • How can I configure result_fields and get my fields in snippet format using the opensearch-java client?

Cheers and thanks a lot,
Sven
