Hello,
I am getting duplicate records from Elasticsearch, although the database I am indexing from contains only unique records. Please suggest how to resolve this issue.
What are your queries? Are you using a library or direct HTTP requests?
Are you getting duplicate records when using one of the scroll / search_after / pagination APIs?
And are you using OpenSearch or Elasticsearch? Which version?
In my experience, duplicate records come from one of two things: either the index really does contain duplicates (in Kibana, check whether the unique count equals the count on the field you think is unique, or use the relevant aggs), or you are using a scroll API with sorting on a non-unique field.
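For the "unique count = count" check, here is a rough sketch of a request body you could send to the `_search` endpoint. The field name `record_id` is only a placeholder for whatever field you expect to be unique, and the helper name is made up for illustration. Keep in mind that the `cardinality` aggregation is approximate, so on large indices a small mismatch does not prove there are duplicates.

```python
from typing import Any, Dict


def build_uniqueness_check(field: str) -> Dict[str, Any]:
    """Build a search body comparing distinct values to total values of `field`.

    `field` is a placeholder: pass the field you expect to be unique.
    """
    return {
        "size": 0,  # only the aggregation results are needed, not the hits
        "aggs": {
            # approximate number of distinct values in the field
            "distinct_values": {"cardinality": {"field": field}},
            # number of values indexed for the field
            "total_values": {"value_count": {"field": field}},
        },
    }


body = build_uniqueness_check("record_id")  # "record_id" is hypothetical
```

If `total_values` comes back clearly larger than `distinct_values`, the duplicates are already in the index and pagination is not the cause.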
Hello @hagayg,
I am using GET queries with direct HTTP requests.
Yes, I get duplicate records when I use pagination.
I am using Elasticsearch version 7.9.3.
I have a few aggregate functions on some fields, but the issue still persists. I will check according to your suggestions.
Many thanks for replying.
Hello,
Does anyone have an idea how to resolve this duplicate issue caused by pagination? When I fetch everything in a single request I get unique results, but with pagination the results contain duplicates.
Please suggest a solution.
Thanks,
Tejashri Rokade
Hey, try paginating over a sort that includes a unique field (you can sort by _id, for example); this should stop the duplicate results.
Neither OpenSearch nor Elasticsearch keeps any context from the previous request when you paginate. If the sort field is not unique, documents with equal sort values can come back in a different order on each request, so the engine has a hard time knowing where the next page should start.
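As a rough sketch of what I mean, here is a request body for from/size pagination with `_id` added as a tiebreaker after your own sort field. The field name `created_at` and the helper name are placeholders I made up; swap in your own sort field.

```python
from typing import Any, Dict


def build_page_request(query: Dict[str, Any], sort_field: str,
                       page: int, page_size: int) -> Dict[str, Any]:
    """Build a from/size search body whose sort order is fully deterministic."""
    return {
        "query": query,
        "from": page * page_size,  # zero-based page number
        "size": page_size,
        "sort": [
            {sort_field: "asc"},  # your own, possibly non-unique, sort field
            {"_id": "asc"},       # unique tiebreaker for equal sort values
        ],
    }


# Third page (page index 2) with 10 results per page
request_body = build_page_request({"match_all": {}}, "created_at", 2, 10)
```

With the tiebreaker in place, two documents with the same `created_at` always come back in the same relative order, so they cannot swap places between page requests.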
Alternatively, you could use the scroll API, which does hold context, but I strongly recommend against it because of the additional overhead it incurs.
Your best bet if you need deep pagination (over 10k results) is to use the search_after API.
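Here is a minimal sketch of a search_after loop. To keep it independent of any particular client, it takes a `search_fn` callable (my own placeholder, not a library function) that sends a request body to `_search` and returns the parsed JSON response. The sort must end in a unique field so that every document has a distinct sort key.

```python
from typing import Any, Callable, Dict, Iterator, List


def iterate_with_search_after(
    search_fn: Callable[[Dict[str, Any]], Dict[str, Any]],
    query: Dict[str, Any],
    sort: List[Dict[str, str]],
    page_size: int = 100,
) -> Iterator[Dict[str, Any]]:
    """Yield every hit once, page by page, using search_after.

    `search_fn` is a hypothetical helper: it should POST the body to the
    index's _search endpoint and return the decoded JSON response.
    """
    search_after = None
    while True:
        body: Dict[str, Any] = {"query": query, "size": page_size, "sort": sort}
        if search_after is not None:
            body["search_after"] = search_after
        hits = search_fn(body)["hits"]["hits"]
        if not hits:
            return  # no more results
        for hit in hits:
            yield hit
        # each hit carries its sort values; the last one marks where to resume
        search_after = hits[-1]["sort"]
```

You would call it with something like `sort=[{"created_at": "asc"}, {"_id": "asc"}]`, where `created_at` is again just an example field.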
If you don’t want to incur any extra development overhead on the OpenSearch side, try filtering out already-seen documents by their _id on the client side instead.
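A client-side filter could look roughly like this. It assumes each page is a list of hit dictionaries with an `_id` key, as returned under `hits.hits`; the function name is my own.

```python
from typing import Any, Dict, Iterable, Iterator, List


def dedupe_hits(pages: Iterable[List[Dict[str, Any]]]) -> Iterator[Dict[str, Any]]:
    """Yield each hit once, skipping any _id that was already seen."""
    seen = set()
    for page in pages:
        for hit in page:
            if hit["_id"] in seen:
                continue  # already returned on an earlier page
            seen.add(hit["_id"])
            yield hit


pages = [
    [{"_id": "a"}, {"_id": "b"}],
    [{"_id": "b"}, {"_id": "c"}],  # "b" repeats across the page boundary
]
unique_ids = [hit["_id"] for hit in dedupe_hits(pages)]
```

Note that this only hides the repeats. When a document shows up twice because the order shifted between requests, another document may have been skipped, so a deterministic sort is still the safer fix.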