Home Managing data in Clickhouse
Post
Cancel

Managing data in Clickhouse

Compression & TTL

ClickHouse offers several mechanisms to optimize storage and lifecycle management of data. Two key features are column-level compression and TTL policies, which help reduce disk usage and automate data retention or transformation.


Compression

ClickHouse supports multiple compression algorithms that can be applied per column, allowing fine-grained control over storage efficiency and query performance.

Common Compression Modes

Refer to the official documentation for a full list:
Compression Modes | ClickHouse Docs

  • lz4: Fast and efficient with low CPU overhead, but offers a lower compression ratio.
  • lz4hc: High-compression variant of lz4, offering better compression at the cost of slower performance.
  • zstd: Provides higher compression ratios than lz4, but is slower and may impact query performance.

Optimization Tip

If a column has fewer than 10,000 unique values, consider using LowCardinality()

1
LowCardinality(column_name)

This stores the column as a dictionary-encoded structure in memory, reducing storage and improving performance for filtering and grouping operations. It actually stores the string values as integer representations which is alot more compressable.

Measuring Compression Efficiency

You can inspect compression statistics using the following query:

1
2
3
4
5
6
SELECT 
    name,
    formatReadableSize(data_compressed_bytes) AS compressed,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed
FROM system.columns
WHERE table = 'your_table_name'

This helps identify which columns benefit most from compression and where further tuning may be needed.


TTL (Time-To-Live)

TTL rules in ClickHouse allow you to automatically expire, move, or aggregate data based on time conditions. TTL can be applied at the tablepartition, or column level.

Table-Level TTL Example

Automatically delete rows 1 month after their timestamp:

1
2
3
4
CREATE TABLE ...
ENGINE = MergeTree
ORDER BY ...
TTL timestamp_column + INTERVAL 1 MONTH

Moving Data Between Volumes (Non-Cloud Only)

You can use TTL to move older data to a different disk volume (e.g., from hot to cold storage):

1
2
3
4
5
6
CREATE TABLE ...
ENGINE = MergeTree
PARTITION BY toYYYYMM(timestamp)
ORDER BY ...
TTL timestamp + INTERVAL 1 MONTH TO VOLUME 'cold_volume',
    timestamp + INTERVAL 12 MONTH

This feature is not supported in ClickHouse Cloud.


Rolling Up Old Data

TTL can also be used to aggregate older data into summaries:

1
2
3
4
5
CREATE TABLE ...
ENGINE = MergeTree
PARTITION BY toYYYYMM(timestamp)
ORDER BY ...
TTL timestamp + INTERVAL 12 MONTH GROUP BY sum_x = SUM(x)

This is useful for long-term storage where detailed granularity is no longer needed.


Summary

FeaturePurposeBest Use Case
CompressionReduce disk usageColumns with repetitive or large data
LowCardinalityOptimize low-cardinality columnsCategorical fields with <10k values
TTL DeleteAuto-expire old dataTime-based retention policies
TTL VolumeMove data to cold storageTiered storage (non-cloud)
TTL RollupAggregate old dataLong-term summaries
This post is licensed under CC BY 4.0 by the author.
Contents