How TimescaleDB compresses time-series data

(roszigit.com)

45 points | by lkanwoqwp 1 hour ago

2 comments

  • gopalv 36 minutes ago
    > What does compression do to query performance?

    That section is the most relevant whenever compression in a DB is discussed.

    The purpose of a database is to find, aggregate or update data - storage is where the trade-off gets expressed. There are no silver bullets here.

    Any method of compression which speeds up either filter rejection or scan rate is better than something that only trades off IO for CPU usage.

    For example, dictionary encoding can be slower to read (because you decompress the whole dictionary, not just the rows that survive the filter), but not if you can speed up an IN clause by turning per-row string comparisons into an O(1) dictionary lookup followed by a simple integer filter. Remember, the filter can be arbitrarily complex (Druid is a great example of this), and bitmaps can then be used because the dictionary index is a dense 0-N range.
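
    A rough sketch of that IN-clause trick, in Python for readability. The function names and the sample column are made up for illustration; a real engine would do this over packed integer arrays, not Python lists.

```python
def dictionary_encode(values):
    """Return (dictionary, codes): distinct strings in first-seen order,
    plus one dense integer code (0..N-1) per row."""
    index = {}
    dictionary = []
    codes = []
    for v in values:
        code = index.get(v)
        if code is None:
            code = len(dictionary)
            index[v] = code
            dictionary.append(v)
        codes.append(code)
    return dictionary, codes


def filter_in(dictionary, codes, wanted):
    """Return row positions whose value is in `wanted`."""
    # String comparisons happen once per dictionary entry, not once per row.
    # Because codes are dense, the result is a plain bitmap indexed by code.
    hit = [s in wanted for s in dictionary]
    # The per-row work is now a single integer-indexed lookup.
    return [row for row, code in enumerate(codes) if hit[code]]


column = ["us-east", "eu-west", "us-east", "ap-south", "eu-west", "us-east"]
dictionary, codes = dictionary_encode(column)
matches = filter_in(dictionary, codes, {"us-east", "ap-south"})
```

    The string work scales with the number of distinct values, while the scan itself only touches integers.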

    Even better if that can feed a deterministic operation like UPPER(), so that you apply it once per dictionary hit instead of once per row. You can even reuse the same hash slot for the result, avoiding another dictionary collision check or hash computation.
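
    To make the UPPER() point concrete, here is a minimal Python sketch. The dictionary, the codes, and the call counter are all hypothetical; the counter exists only to show how many times the function actually runs.

```python
def apply_via_dictionary(fn, dictionary, codes):
    """Apply a deterministic function once per distinct value,
    then expand the results back to rows through the integer codes."""
    transformed = [fn(s) for s in dictionary]
    return [transformed[c] for c in codes]


calls = []  # records each invocation so the saving is visible


def counted_upper(s):
    calls.append(s)
    return s.upper()


dictionary = ["get", "post", "put"]
codes = [0, 0, 1, 0, 2, 1, 0]  # 7 rows, 3 distinct values

rows = apply_via_dictionary(counted_upper, dictionary, codes)
```

    Seven rows come out, but the function ran only three times. This only works because UPPER() is deterministic; something like random() cannot be hoisted this way.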

    If anyone is looking at JSONB compression, go take a long look at the Variant encoding proposals from Databricks/Snowflake for Iceberg[1].

    Turning a single "payload" JSONB column into chunks that are columnarized and strictly typed lets you apply all the tricks mentioned here to loosely typed data, chunk by chunk.
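
    A toy illustration of the per-chunk idea in Python. This is not the Variant encoding itself, just a sketch under a simplifying assumption: a key becomes a typed column only if every row in the chunk has it with the same scalar type, and everything else stays in a per-row residual. `shred_chunk` and the sample payloads are made up.

```python
import json

SCALARS = (bool, int, float, str)


def shred_chunk(payloads):
    """Split one chunk of JSON object strings into typed columns plus a residual."""
    rows = [json.loads(p) for p in payloads]
    keys = set().union(*(r.keys() for r in rows))
    columns = {}
    for key in sorted(keys):
        # None marks "key missing in this row", which blocks columnarizing.
        types = {type(r[key]) if key in r else None for r in rows}
        if len(types) == 1:
            (t,) = types
            if t in SCALARS:
                columns[key] = [r[key] for r in rows]
    # Whatever could not be strictly typed stays as loosely typed leftovers.
    residual = [{k: v for k, v in r.items() if k not in columns} for r in rows]
    return columns, residual


chunk = [
    '{"host": "a", "cpu": 0.5, "tags": ["x"]}',
    '{"host": "b", "cpu": 0.7, "port": 80}',
]
columns, residual = shred_chunk(chunk)
```

    Since the decision is made per chunk, a later chunk where "cpu" shows up as a string simply keeps it in the residual without affecting this one.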

    [1] - https://github.com/apache/parquet-format/blob/master/Variant...

  • blackoil 59 minutes ago
    Gorilla by Facebook had this. Values are XORed with the previous value, and timestamps are stored as delta-of-delta.
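
    For anyone who hasn't seen delta-of-delta before, a minimal Python sketch. It only shows the arithmetic; the variable-length bit packing that real implementations layer on top is left out, and the sample timestamps are invented.

```python
def delta_of_delta_encode(timestamps):
    """Encode each timestamp as the change in delta from the previous one."""
    out = []
    prev = 0
    prev_delta = 0
    for t in timestamps:
        delta = t - prev
        out.append(delta - prev_delta)
        prev, prev_delta = t, delta
    return out


def delta_of_delta_decode(encoded):
    """Invert delta_of_delta_encode."""
    out = []
    prev = 0
    prev_delta = 0
    for dod in encoded:
        prev_delta += dod
        prev += prev_delta
        out.append(prev)
    return out


# Samples arriving roughly every 10 seconds, with one a second late.
timestamps = [1000, 1010, 1020, 1030, 1041, 1051]
encoded = delta_of_delta_encode(timestamps)
decoded = delta_of_delta_decode(encoded)
```

    After the first two entries, regularly spaced samples encode to zeros and small jitter to values near zero, which is what makes the later bit packing cheap.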
    • f311a 43 minutes ago
      It's used in ClickHouse as well. CH supports all known compression algos and they are documented pretty well.
    • lokar 48 minutes ago
      They say they are using “Gorilla compression”.

      I’m still amazed every time I go back and read how the compression for floating point values works.
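
      The core of the floating point trick fits in a few lines of Python. This is a simplified sketch: it XORs each value's IEEE 754 bits with the previous value's and reports the leading and trailing zero counts, but it skips the control bits and bit-level packing of the real format. All names are made up.

```python
import struct


def float_to_bits(x):
    """Reinterpret a float as its 64-bit IEEE 754 pattern."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def bits_to_float(b):
    return struct.unpack(">d", struct.pack(">Q", b))[0]


def xor_encode(values):
    """XOR each value's bit pattern with the previous value's."""
    out = []
    prev = 0
    for v in values:
        bits = float_to_bits(v)
        out.append(bits ^ prev)
        prev = bits
    return out


def xor_decode(xored):
    out = []
    prev = 0
    for x in xored:
        prev ^= x
        out.append(bits_to_float(prev))
    return out


def zero_runs(x):
    """(leading, trailing) zero bit counts of a 64-bit XOR result."""
    if x == 0:
        return 64, 0  # identical value: nothing meaningful to store
    leading = 64 - x.bit_length()
    trailing = (x & -x).bit_length() - 1
    return leading, trailing


values = [12.0, 12.0, 24.0, 15.0]
xored = xor_encode(values)
runs = [zero_runs(x) for x in xored[1:]]
```

      A repeated value XORs to zero, and 12.0 to 24.0 differs in a single exponent bit, so only a tiny window of "meaningful" bits between the zero runs needs to be written.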