
Compression Algorithm Optimization

Last updated: 2024-10-25 09:23:31

Background

Lucene currently supports two compression algorithms for storing document field data:
LZ4
Deflate
LZ4 compresses and decompresses faster, while Deflate achieves a higher compression ratio. The two differ markedly on both counts, so neither algorithm alone offers a good balance between compression ratio and performance. Lucene uses LZ4 by default.

Optimized Scheme

ES integrates the Zstandard (ZSTD) compression algorithm to improve the compression ratio while limiting the performance loss.

Strengths of Zstandard compression algorithm

Zstandard combines the advantages of LZ4 and Deflate: its performance is on par with LZ4 (in tests on log data, Zstandard was slightly faster than LZ4), and its compression ratio is only slightly lower than that of Deflate.
The following are the comparison results of the three compression algorithms:
| Compression Algorithm | Load Time, 1 Shard (ms) | Load Time, 5 Shards (ms) | Fields (*.fdt) File Size | Total Index Size |
|---|---|---|---|---|
| LZ4 | 1143769 | 420447 | 4.15 GB | 6.3 GB |
| Deflate | 1270408 | 448738 | 2.56 GB | 4.7 GB |
| Zstandard (16K Chunk) | 1109414 | 415256 | 2.93 GB | 5.1 GB |
| Zstandard (32K Chunk) | 1088959 | 406661 | 2.67 GB | 4.8 GB |
Note
1. Test data: based on a typical log application.
2. Test method: based on Elasticsearch REST High Level Client API.
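
The relative savings can be derived directly from the fields file sizes in the comparison above. The following is a minimal sketch of that arithmetic; the class and method names are illustrative and are not part of any ES or Lucene API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FdtSizeReduction {
    // Percentage by which sizeGb is smaller than baselineGb.
    static double reductionPercent(double baselineGb, double sizeGb) {
        return (baselineGb - sizeGb) / baselineGb * 100.0;
    }

    public static void main(String[] args) {
        // Fields (*.fdt) file sizes in GB, taken from the comparison results.
        double lz4 = 4.15;
        Map<String, Double> fdtSizes = new LinkedHashMap<>();
        fdtSizes.put("Deflate", 2.56);
        fdtSizes.put("Zstandard (16K Chunk)", 2.93);
        fdtSizes.put("Zstandard (32K Chunk)", 2.67);

        for (Map.Entry<String, Double> entry : fdtSizes.entrySet()) {
            System.out.printf("%-22s %.1f%% smaller than LZ4%n",
                    entry.getKey(), reductionPercent(lz4, entry.getValue()));
        }
    }
}
```

For these figures the reduction relative to LZ4 is roughly 38% for Deflate, 29% for Zstandard with 16K chunks, and 36% for Zstandard with 32K chunks.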

Product Use

Based on REST High Level Client API

When creating an Index, add the "index.codec" configuration item to the CreateIndexRequest and set the value to "zstandard":
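// indexName, shards, and replicas are supplied by the caller.
CreateIndexRequest createRequest = new CreateIndexRequest(indexName);
createRequest.settings(Settings.builder()
    .put("index.number_of_shards", shards)
    .put("index.number_of_replicas", replicas)
    // Store document fields with Zstandard instead of the default LZ4.
    .put("index.codec", "zstandard")
);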

Based on HTTP request

Similarly, add the "index.codec" configuration item in the settings and set the value to "zstandard":
PUT /new_index
{
  "settings": {
    "index.codec": "zstandard",
    "index.number_of_shards": 1
  }
}
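
The same request can also be sent from plain Java without the High Level Client. The sketch below uses the JDK's built-in `java.net.http` client (Java 11 or later). The class name, the `new_index` index name (lowercase, since Elasticsearch rejects index names containing uppercase letters), and the endpoint passed on the command line are assumptions for illustration; a cluster with authentication enabled would additionally need credentials, which are not shown here.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class ZstdIndexCreator {
    // Builds the settings body with index.codec set to "zstandard".
    static String buildBody(int shards) {
        return "{\"settings\": {\"index.codec\": \"zstandard\", "
                + "\"index.number_of_shards\": " + shards + "}}";
    }

    // Builds the PUT request; baseUrl and indexName are chosen by the caller.
    static HttpRequest buildRequest(String baseUrl, String indexName, String body) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + indexName))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) throws Exception {
        String body = buildBody(1);
        if (args.length == 0) {
            // Dry run: no endpoint given, so only print what would be sent.
            System.out.println("PUT /new_index");
            System.out.println(body);
            return;
        }
        // args[0] is the cluster endpoint, for example http://localhost:9200 (assumed).
        HttpRequest request = buildRequest(args[0], "new_index", body);
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Run without arguments, the program only prints the request it would send; pass the cluster endpoint as the first argument to actually create the index.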

Optimization Effect

With 32K chunks, ZSTD produces stored-field (row storage) files about 35% smaller than LZ4 (2.67 GB versus 4.15 GB in the test above), with load performance comparable to that of LZ4.

Supported Editions

6.8.2, 7.5.1, 7.10.1, 7.14.2