I query the database page by page and use the primary key as `_idx`. I also use `helpers.bulk` to upload the data to OpenSearch in batches. The printed output says 971,658 documents were uploaded successfully, but querying the index only returns 800,000 of them. What might be causing this?
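For reference, here is a minimal sketch of the pipeline as described. It is not the poster's actual code. It assumes that `_idx` refers to the document's `_id`. `pages` stands in for the paginated DB query, and `pk_field` is a hypothetical column name.

```python
from typing import Any, Dict, Iterable, Iterator, List


def actions_from_pages(pages: Iterable[List[Dict[str, Any]]], index: str,
                       pk_field: str = "id") -> Iterator[Dict[str, Any]]:
    """Turn paged DB rows into bulk actions, using the primary key as the document _id.

    `pages` is a stand-in for the paginated query; `pk_field` is a hypothetical column name.
    """
    for page in pages:
        for row in page:
            yield {
                "_index": index,
                "_id": row[pk_field],  # the PK becomes the document ID
                "_source": row,        # the row itself becomes the document body
            }
```

A generator like this can be passed straight to `helpers.bulk(client, actions_from_pages(...))`. The first value `helpers.bulk` returns is the number of successful actions, which you can compare against the number of rows read from the database.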
That 800K number is suspiciously round. I don't know Python that well (I would probably write what you did in 10x the size), but are you sure the last batch for each thread is actually submitted? I've seen a few times that when a batch never reaches the target size (usually a multiple of 1000), it never gets submitted.
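If the actions are collected into batches by hand (for example, per thread) before calling `helpers.bulk`, the final partial batch is usually the one that goes missing. Below is a minimal sketch of the correct pattern next to the buggy one. Both helper names are hypothetical. Note that `helpers.bulk` also chunks its input internally (see its `chunk_size` argument), so this only matters for batching done in your own code.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of up to `size` items, including the final partial batch."""
    if size <= 0:
        raise ValueError("size must be positive")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    # The easily forgotten step: flush whatever is left over.
    if batch:
        yield batch


def chunked_drops_tail(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Buggy variant: only full batches are ever yielded."""
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    # BUG: the loop ends without yielding the remaining partial batch.
```

With 971,658 items and a batch size of 1000, `chunked` yields all 971,658 items, while `chunked_drops_tail` yields only 971,000. When each thread batches its own slice, each thread can lose up to `size - 1` items this way, so the shortfall grows with the number of threads.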
Another possibility is that OpenSearch hadn't refreshed the index yet. Can you force a refresh at the end to be sure?
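Here is a minimal sketch of forcing a refresh and then checking the document count. It assumes `client` is an opensearch-py `OpenSearch` instance, or any object with the same `indices.refresh` and `count` methods. `verify_indexed_count` is a hypothetical helper name.

```python
from typing import Any


def verify_indexed_count(client: Any, index: str, expected: int) -> int:
    """Force a refresh, then compare the index's document count with `expected`.

    `client` is assumed to be an opensearch-py OpenSearch client (or a compatible object).
    """
    # Make every operation performed since the last refresh visible to search and count.
    client.indices.refresh(index=index)
    actual = client.count(index=index)["count"]
    if actual != expected:
        print(f"Mismatch: expected {expected}, index reports {actual} "
              f"({expected - actual} missing)")
    return actual
```

If the count matches after the refresh, the data was there all along and simply wasn't searchable yet. If the count is still short, the missing documents were never indexed, which points back to the batching.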