We are migrating Elasticsearch from 6.7 to 7.10 using amazon/opendistro-for-elasticsearch (versions 0.9.0 and 1.13.2 respectively), and we plan to move to OpenSearch afterwards.
The indices have been reindexed into the new Elasticsearch cluster, and we ran some comparisons.
simple load test is slow
A simple load test shows that the same set of queries takes 91 ms per request on ES 6 and 112 ms on ES 7 (about 23% slower).
An example query looks like this: request.json · GitHub
profile shows slower parts
The profile details show that many queries take 100% longer in 7 than in 6. For example:
{
"type": "PointInSetQuery",
"description": "brandIds:{59 (...omitted 100 brands) 31389}",
"time_6_baseline": "1.2531 ms",
"time_7_baseline": "4.4807 ms"
}
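To make this kind of comparison repeatable, a small script can turn each profile entry into a slowdown ratio. This is only a sketch: the `time_6_baseline` and `time_7_baseline` keys come from the snippet above, while `parse_ms` and `slowdown` are helper names made up for illustration.

```python
import re


def parse_ms(text):
    """Parse a duration such as '1.2531 ms' into milliseconds as a float."""
    match = re.fullmatch(r"\s*([0-9]+(?:\.[0-9]+)?)\s*ms\s*", text)
    if match is None:
        raise ValueError(f"unexpected duration format: {text!r}")
    return float(match.group(1))


def slowdown(entry):
    """Return how many times slower the ES 7 timing is than the ES 6 timing."""
    return parse_ms(entry["time_7_baseline"]) / parse_ms(entry["time_6_baseline"])


# The entry from the profile output above (description omitted for brevity).
entry = {
    "type": "PointInSetQuery",
    "time_6_baseline": "1.2531 ms",
    "time_7_baseline": "4.4807 ms",
}

ratio = slowdown(entry)
print(f"{entry['type']}: {ratio:.2f}x slower in ES 7")
```

For this particular entry the ratio is roughly 3.6, so the PointInSetQuery is well beyond 100% slower.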
The breakdowns for the same query also differ hugely.
In the comparison, the left side comes from ES 6 and the right side from ES 7.
page cache is used
We know that Elasticsearch 7 moved the terms index off-heap, so we checked the page cache with:
sudo lsof +D /mnt/elasticsearch/data/nodes/0/indices/wu47nPk0TEuVBzLo3WEOsQ
The output looks like this:
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 30779 elasticsearch mem REG 259,0 108409303 15728663 /mnt/elasticsearch/data/nodes/0/indices/Dznr2xCpSS6VDJV0qLDkqQ/7/index/_5.cfs
java.........................................................................cfs
java.........................................................................doc
java.........................................................................dvd
java.........................................................................kdd
java.........................................................................kdi
java.........................................................................nvd
java.........................................................................tim
java.........................................................................tip(these files are loaded in mem)
java 30779 elasticsearch mem REG 259,0 71445 15729187 /mnt/elasticsearch/data/nodes/0/indices/Dznr2xCpSS6VDJV0qLDkqQ/4/index/_9_Lucene84_0.tip
java 30779 elasticsearch 373r REG 259,0 33969522 15728698 /mnt/elasticsearch/data/nodes/0/indices/Dznr2xCpSS6VDJV0qLDkqQ/1/index/_d_Lucene84_0.pos
java.........................................................................ckp
java.........................................................................fdt
java.........................................................................fdx
java.........................................................................lock
java.........................................................................pos
java.........................................................................tlog(these files are not loaded in mem)
java 30779 elasticsearch 731w REG 259,0 88 15728658 /mnt/elasticsearch/data/nodes/0/indices/Dznr2xCpSS6VDJV0qLDkqQ/7/translog/translog.ckp
We tried different index store settings (changing the store type to mmapfs, preloading specific files). They load more files into memory, as in Elasticsearch 6, but the overall performance is still not as good as in 6.
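For reference, the store settings we mean are `index.store.type` and `index.store.preload`. A minimal sketch of building the settings body for index creation is below; the list of preloaded extensions is only an illustrative assumption taken from the file types in the lsof listing, not a recommendation, and `store_settings` is a made-up helper name. Both settings are static, so they have to be supplied when the index is created or while it is closed.

```python
import json


def store_settings(store_type="mmapfs", preload=("nvd", "dvd", "tim", "tip", "doc")):
    """Build a create-index body that sets the store type and preloaded extensions."""
    return {
        "settings": {
            "index.store.type": store_type,
            # File extensions to load into the page cache when the index opens.
            "index.store.preload": list(preload),
        }
    }


body = store_settings()
# This JSON would be sent as the body of the create-index request.
print(json.dumps(body, indent=2))
```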
query optimization
Query optimization should improve performance, but it doesn't close the gap between 6 and 7.
We tried replacing must with filter and saw some improvement, but 7 is still slower than 6.
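The must-to-filter change can be sketched as a small rewrite of the bool query. The example query here is hypothetical (the real one is in request.json); only the `brandIds` field name is taken from the profile output, and `must_to_filter` is an illustrative helper. Note that filter clauses do not contribute to the score, so this only fits clauses that don't need to affect relevance.

```python
import copy


def must_to_filter(query):
    """Return a copy of a bool query with its must clauses moved into filter."""
    rewritten = copy.deepcopy(query)
    bool_body = rewritten.get("bool")
    if not isinstance(bool_body, dict) or "must" not in bool_body:
        return rewritten  # nothing to rewrite

    must = bool_body.pop("must")
    if isinstance(must, dict):  # a single clause may be given without a list
        must = [must]
    existing = bool_body.get("filter", [])
    if isinstance(existing, dict):
        existing = [existing]
    bool_body["filter"] = list(existing) + list(must)
    return rewritten


# Hypothetical query shape, for illustration only.
original = {"bool": {"must": [{"terms": {"brandIds": [59, 31389]}}]}}
rewritten = must_to_filter(original)
print(rewritten)
```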
index and cluster brief
The testing index and cluster are:
- 1 index with 2.6 million products, 9 shards and 1 replica; 5.8 GB (11.6 GB including the replica)
- We tried different Elasticsearch heap sizes: 8 GB and 15 GB
- The cluster has 2 nodes. Each node is an i3.xlarge (4 vCPUs, 30.5 GB memory, SSD storage)
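As a rough sanity check on whether the index can fit in the page cache, the numbers above work out as follows. This sketch assumes shards are spread evenly across the two nodes and ignores memory used by the OS and by the JVM outside the heap, so the real headroom is somewhat smaller.

```python
TOTAL_WITH_REPLICA_GB = 11.6  # index size including the replica
PRIMARY_GB = 5.8              # index size without the replica
SHARDS = 9
NODES = 2
NODE_RAM_GB = 30.5            # i3.xlarge memory


def page_cache_headroom(heap_gb):
    """Memory left for the OS page cache after the JVM heap, ignoring other overhead."""
    return NODE_RAM_GB - heap_gb


data_per_node_gb = TOTAL_WITH_REPLICA_GB / NODES  # assumes an even spread
avg_shard_gb = PRIMARY_GB / SHARDS

print(f"data per node: {data_per_node_gb:.1f} GB, average shard: {avg_shard_gb:.2f} GB")
for heap_gb in (8, 15):
    headroom = page_cache_headroom(heap_gb)
    fits = headroom > data_per_node_gb
    print(f"heap {heap_gb} GB -> about {headroom:.1f} GB for page cache, index fits: {fits}")
```

With either heap size the data held by each node is smaller than the memory left over, so under these assumptions the whole index could be cached.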
question
So the question is: what causes this performance regression?