Why do the actual log size and the index size differ so much?

Hi community,
We created a new index and sent 500,000 logs with a message size of 1 KB each.
So we sent a total log size of around 500 MB, but the pri.store.size of that index shows 980.8 MB. Can anyone please explain why pri.store.size is so much larger than the total log size?

Also, is there any way to get pri.store.size at the document level, for each document ID?

Using GET <host>/_cat/indices/<index-name>?v=true&s=index we get the following response.

health  status    index      uuid  pri rep docs.count docs.deleted store.size pri.store.size
green   open   <index-name>   <id>   4   1   499999           1      1.9gb          980.8mb


In short: overhead, because log lines != documents.

When you ingest a log line, it gets converted into a document that Lucene indexes (inverted index, doc values, stored fields) in addition to storing the original source in a binary format. 1 KB log lines are pretty small, so the proportion of metadata and overhead to actual data is pretty lopsided.
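As a rough sanity check on the numbers above (a back-of-envelope sketch; the 1 KB-per-doc figure comes from the question, and treating 1 KB as 1024 bytes is an assumption):

```python
# Back-of-envelope check of the sizes reported by _cat/indices.
docs = 500_000            # 499,999 docs.count + 1 docs.deleted
raw_mb = docs * 1 / 1024  # ~1 KB per log line -> raw message data in MB

pri_store_mb = 980.8      # pri.store.size reported by _cat/indices
overhead = pri_store_mb / raw_mb
print(f"raw ≈ {raw_mb:.1f} MB, on-disk/raw ratio ≈ {overhead:.1f}x")
# → raw ≈ 488.3 MB, on-disk/raw ratio ≈ 2.0x
```

So the on-disk size is roughly twice the raw message data, which is plausible for small documents. Note also that store.size (1.9 GB) is about double pri.store.size because the index has one replica (rep = 1), and the replica holds a full copy of the data.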

1 Like

Additionally, I am pretty sure that shard merging and Lucene segment merging make the index's size appear to fluctuate (it grows during a merge and then ends up smaller than when it started).

1 Like