Elasticsearch performance degrades after upgrading from 6.7 to 7.10

We are migrating Elasticsearch from 6.7 to 7.10 using amazon/opendistro-for-elasticsearch (versions 0.9.0 and 1.13.2 respectively), and we plan to move to OpenSearch afterwards.

The indices were reindexed into the new Elasticsearch cluster.

We then ran some comparisons between the two clusters.

simple load test is slow

A simple load test shows that the same set of queries takes 91ms per request in 6 but 112ms per request in 7.
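To put the load test numbers in perspective, here is a small sketch of the relative slowdown. The function name is just for illustration; the only inputs are the two per-request latencies quoted above.

```python
def relative_slowdown(baseline_ms: float, candidate_ms: float) -> float:
    """Return how much slower the candidate is than the baseline, in percent."""
    return (candidate_ms - baseline_ms) / baseline_ms * 100.0


# Per-request latencies from the load test: 91ms in ES6, 112ms in ES7.
slowdown = relative_slowdown(91.0, 112.0)
print(f"ES7 is {slowdown:.1f}% slower per request")
```

So the end-to-end regression is roughly a quarter, which is much smaller than the per-query differences seen in the profile output.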

An example query looks like this: request.json · GitHub

profile shows slower parts

The profile details show that many queries take 100% longer in 7 than in 6.

{
  "type": "PointInSetQuery",
  "description": "brandIds:{59 (...omitted 100 brands) 31389}",
  "time_6_baseline": "1.2531 ms",
  "time_7_baseline": "4.4807 ms"
}
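For anyone who wants to reproduce this comparison, below is a rough sketch of how the two profile outputs could be compared per query type. It assumes both responses were captured with "profile": true and follow the usual profile layout (profile.shards[].searches[].query[] with type, time_in_nanos and children). The helper names and the tiny sample responses are made up for illustration; only the two PointInSetQuery timings come from the numbers above.

```python
from typing import Dict, Iterator


def iter_query_nodes(response: dict) -> Iterator[dict]:
    """Yield every profiled query node, including nested children."""
    def walk(node: dict) -> Iterator[dict]:
        yield node
        for child in node.get("children", []):
            yield from walk(child)

    for shard in response.get("profile", {}).get("shards", []):
        for search in shard.get("searches", []):
            for query in search.get("query", []):
                yield from walk(query)


def time_by_type(response: dict) -> Dict[str, int]:
    """Sum time_in_nanos per query type across all shards.

    A parent's time already includes its children, so compare like with
    like (same type on both sides) rather than adding types together.
    """
    totals: Dict[str, int] = {}
    for node in iter_query_nodes(response):
        totals[node["type"]] = totals.get(node["type"], 0) + node["time_in_nanos"]
    return totals


def slowdown_ratios(es6: dict, es7: dict) -> Dict[str, float]:
    """Return ES7 time divided by ES6 time for each type present in both."""
    t6, t7 = time_by_type(es6), time_by_type(es7)
    return {t: t7[t] / t6[t] for t in t6 if t in t7 and t6[t] > 0}


def _sample(nanos: int) -> dict:
    # Hypothetical minimal response holding a single profiled query node.
    node = {"type": "PointInSetQuery", "time_in_nanos": nanos}
    return {"profile": {"shards": [{"searches": [{"query": [node]}]}]}}


# 1.2531 ms and 4.4807 ms from the profile above, expressed in nanoseconds.
ratios = slowdown_ratios(_sample(1_253_100), _sample(4_480_700))
print(ratios)
```

With the timings above, the PointInSetQuery ratio comes out at roughly 3.6x.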

The breakdowns for the same query also differ hugely.


The left one comes from ES6 and the right one is ES7.

page cache is used

We know that Elasticsearch 7 includes off-heap changes that move the terms index out of the heap, so we checked the page cache with sudo lsof +D /mnt/elasticsearch/data/nodes/0/indices/wu47nPk0TEuVBzLo3WEOsQ. The output looks like this:

COMMAND   PID          USER   FD   TYPE DEVICE  SIZE/OFF     NODE NAME
java    30779 elasticsearch  mem    REG  259,0 108409303 15728663 /mnt/elasticsearch/data/nodes/0/indices/Dznr2xCpSS6VDJV0qLDkqQ/7/index/_5.cfs
java.........................................................................cfs
java.........................................................................doc
java.........................................................................dvd
java.........................................................................kdd
java.........................................................................kdi
java.........................................................................nvd
java.........................................................................tim
java.........................................................................tip (these files are loaded in mem)
java    30779 elasticsearch  mem    REG  259,0     71445 15729187 /mnt/elasticsearch/data/nodes/0/indices/Dznr2xCpSS6VDJV0qLDkqQ/4/index/_9_Lucene84_0.tip
java    30779 elasticsearch  373r   REG  259,0  33969522 15728698 /mnt/elasticsearch/data/nodes/0/indices/Dznr2xCpSS6VDJV0qLDkqQ/1/index/_d_Lucene84_0.pos
java.........................................................................ckp
java.........................................................................fdt
java.........................................................................fdx
java.........................................................................lock
java.........................................................................pos
java.........................................................................tlog (these files are not loaded in mem)
java    30779 elasticsearch  731w   REG  259,0        88 15728658 /mnt/elasticsearch/data/nodes/0/indices/Dznr2xCpSS6VDJV0qLDkqQ/7/translog/translog.ckp

We tried different index store settings (changing the store type to mmapfs and preloading custom files). These load more files into memory, as in Elasticsearch 6, but the overall performance is still not as good as in 6.
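For reference, the kind of settings we mean can be sketched as below. index.store.type and index.store.preload are the real setting names, but the list of extensions here is only an assumed example, not the exact list we tested. Both are static settings, so the body is meant for an index-creation request rather than a live settings update.

```python
import json
from typing import Iterable


def store_settings_body(store_type: str = "mmapfs",
                        preload: Iterable[str] = ("nvd", "dvd", "tim", "tip")) -> dict:
    """Build an index-creation body that sets the store type and preload list.

    The default extensions are an assumption for illustration only.
    """
    return {
        "settings": {
            "index.store.type": store_type,
            "index.store.preload": list(preload),
        }
    }


body = store_settings_body()
# This JSON would be sent as the body of the index-creation request.
print(json.dumps(body, indent=2))
```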

query optimization

Query optimization should improve performance, but it doesn't close the gap between 6 and 7.

We tried replacing must with filter and saw some improvement, but not enough to close the gap between 6 and 7.
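The must-to-filter change can be sketched roughly like this. The helper name and the shortened example query are hypothetical (the real request is in the linked request.json); the brandIds values are taken from the profile description above. Note that moving a clause to filter removes its contribution to the score, so this is only appropriate for clauses where scoring doesn't matter.

```python
import copy


def must_to_filter(query: dict) -> dict:
    """Return a copy of a bool query with its must clauses moved to filter."""
    rewritten = copy.deepcopy(query)
    bool_query = rewritten.get("bool")
    if not isinstance(bool_query, dict):
        return rewritten  # not a bool query, nothing to rewrite

    must = bool_query.pop("must", [])
    existing = bool_query.get("filter", [])
    # Both must and filter may be given as a single clause or as a list.
    if isinstance(must, dict):
        must = [must]
    if isinstance(existing, dict):
        existing = [existing]
    if must or existing:
        bool_query["filter"] = existing + must
    return rewritten


# Hypothetical, shortened version of the kind of query we run.
original = {"bool": {"must": [{"terms": {"brandIds": [59, 31389]}}]}}
print(must_to_filter(original))
```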

index and cluster brief

The testing index and cluster are:

  • 1 index with 2.6 million products, 9 shards, and 1 replica; 5.8 GB (11.6 GB including the replica)
  • We tried two Elasticsearch heap sizes: 8G and 15G
  • The cluster has 2 nodes. Each node is an i3.xlarge (4 CPUs, 30.5 GB RAM, SSD storage)
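As a back-of-the-envelope check on the numbers above, here is a small sketch of the per-node budget for each heap size. It assumes the 30.5 GB figure is the node's RAM and that shards are spread evenly across the two nodes, and it ignores OS and JVM overhead outside the heap.

```python
NODE_RAM_GB = 30.5      # per node, assumed to be RAM
NODES = 2
PRIMARY_SHARDS = 9
REPLICAS = 1
TOTAL_INDEX_GB = 11.6   # including the replica


def per_node_budget(heap_gb: float) -> dict:
    """Estimate shards, index data and non-heap RAM per node for a given heap size."""
    return {
        "shards_per_node": PRIMARY_SHARDS * (1 + REPLICAS) / NODES,
        "index_gb_per_node": TOTAL_INDEX_GB / NODES,
        "ram_outside_heap_gb": NODE_RAM_GB - heap_gb,
    }


for heap in (8, 15):
    print(heap, per_node_budget(heap))
```

Under these assumptions each node holds about 5.8 GB of index data, and with either heap size there is more RAM left outside the heap than that, so the whole index should be able to fit in the page cache.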

question

So the question is: what causes the performance regression between 6 and 7?
