valyala | 2 months ago
> The mistake many teams make is to worry about storage but not querying

The amount of logs, wide events, and traces that must be stored and queried is frequently measured in hundreds of terabytes or petabytes. A petabyte of data on S3 costs around $20,000/month, so storage costs usually exceed compute costs at this scale. It is therefore important to compress the ingested observability data efficiently, so it occupies less disk space and costs less to store.

Efficient storage schemes also improve the performance of heavy queries that need to process hundreds of terabytes of logs. For example, if you need to calculate the 95th percentile of request duration across hundreds of billions of nginx logs, the database has to read all of these logs. Query performance in this case is limited by storage read bandwidth, since this amount of data does not fit in RAM, so the processed logs cannot be cached there (either by the database itself or by the operating system page cache). Let's estimate the time needed to process 100TB of logs on storage with 10GB/s read bandwidth (high-performance S3 and/or SSD): 100TB / 10GB/s = 10,000 seconds, or about 2 hours 47 minutes. Such query performance is unacceptable in most cases.

How can the query performance be optimized here?

1. Compress the data stored on disk. Typical logs compress by 10x or more, so they occupy 10x less storage space than their original size. This speeds up the heavy query above by 10x, to 1,000 seconds, or around 17 minutes.

2. Use column-oriented storage, i.e. store the data for every column separately. This usually improves the compression ratio, since the data within a single column has lower randomness than per-log-entry data. It also greatly improves heavy query performance, since the database can read only the data for the requested columns while skipping the rest. Suppose the request duration is stored in a uint64 field with nanosecond precision. Then 100 billion request durations occupy 100 billion * 8 bytes = 800GB, which can be read in 800GB / 10GB/s = 80 seconds on storage with 10GB/s read bandwidth. The query duration can be reduced even further if the column data is compressed. (The first code sketch after this comment illustrates the idea.)

There are other options that help optimize query performance over petabytes of logs (wide events):

- Partitioning the data by time (for example, storing it in per-day partitions). This improves performance for queries with time range filters, which are the majority of practical queries over logs / wide events, by efficiently skipping the partitions outside the requested time range.

- Using bloom filters to skip logs without the given rarely seen words / phrases (such as trace_id, user_id, ip, etc). See https://itnext.io/how-do-open-source-solutions-for-logs-work... for details. (The second sketch below shows the mechanism.)

- Using min/max indexes to skip logs without the given numeric values.

All these techniques are implemented in VictoriaLogs, so it achieves high performance even in a single-node setup running on your laptop or a Raspberry Pi. See https://docs.victoriametrics.com/victorialogs/
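To make point 2 and the min/max bullet concrete, here is a minimal, self-contained Go sketch. It is not VictoriaLogs internals; the names (columnBlock, collectAbove), block sizes, and synthetic data are made up for illustration. It stores request durations as a standalone column split into blocks carrying min/max metadata, so a query that filters on duration reads only that one column and skips whole blocks whose range cannot match.

    // Sketch: column-oriented storage of a single numeric column with
    // per-block min/max metadata used to skip blocks during a scan.
    package main

    import (
        "fmt"
        "math/rand"
        "sort"
    )

    // columnBlock holds one chunk of a single column (request durations in
    // nanoseconds) together with min/max metadata used to skip the block.
    type columnBlock struct {
        min, max  uint64
        durations []uint64
    }

    func newBlock(durations []uint64) columnBlock {
        b := columnBlock{min: durations[0], max: durations[0], durations: durations}
        for _, d := range durations {
            if d < b.min {
                b.min = d
            }
            if d > b.max {
                b.max = d
            }
        }
        return b
    }

    // collectAbove scans only the duration column and skips whole blocks
    // whose max is below the threshold, mimicking a min/max index.
    func collectAbove(blocks []columnBlock, thresholdNs uint64) []uint64 {
        var matched []uint64
        for _, b := range blocks {
            if b.max < thresholdNs {
                continue // the block is skipped without touching its rows
            }
            for _, d := range b.durations {
                if d >= thresholdNs {
                    matched = append(matched, d)
                }
            }
        }
        return matched
    }

    func percentile(values []uint64, p float64) uint64 {
        sort.Slice(values, func(i, j int) bool { return values[i] < values[j] })
        return values[int(p*float64(len(values)-1))]
    }

    func main() {
        // Build synthetic blocks of nanosecond durations (up to 50ms each).
        var blocks []columnBlock
        for i := 0; i < 10; i++ {
            durations := make([]uint64, 1000)
            for j := range durations {
                durations[j] = uint64(rand.Intn(50_000_000))
            }
            blocks = append(blocks, newBlock(durations))
        }
        // p95 of requests slower than 1ms, computed from the single column.
        slow := collectAbove(blocks, 1_000_000)
        fmt.Println("p95 of slow requests, ns:", percentile(slow, 0.95))
    }

In a real system each block would also be compressed on disk, which is where the extra 10x saving from point 1 comes on top of reading just the one column.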
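And a similarly hedged sketch of the bloom-filter bullet: one small filter per block of logs is filled with the block's tokens at ingestion time, and a query for a rare token such as a trace_id skips every block whose filter reports a definite miss. The sizing (1<<20 bits, 4 hash functions) and the FNV-based double hashing are illustrative assumptions, not what VictoriaLogs actually uses.

    // Sketch: per-block bloom filters for skipping blocks that certainly
    // do not contain a rare token such as a trace_id.
    package main

    import (
        "fmt"
        "hash/fnv"
    )

    type bloomFilter struct {
        bits []uint64
        m    uint64 // number of bits
        k    int    // number of hash functions
    }

    func newBloomFilter(mBits uint64, k int) *bloomFilter {
        return &bloomFilter{bits: make([]uint64, (mBits+63)/64), m: mBits, k: k}
    }

    // positions derives k bit positions from two FNV hashes (double hashing).
    func (bf *bloomFilter) positions(s string) []uint64 {
        h1 := fnv.New64a()
        h1.Write([]byte(s))
        a := h1.Sum64()
        h2 := fnv.New64()
        h2.Write([]byte(s))
        b := h2.Sum64() | 1 // odd step => distinct positions for power-of-two m
        pos := make([]uint64, bf.k)
        for i := 0; i < bf.k; i++ {
            pos[i] = (a + uint64(i)*b) % bf.m
        }
        return pos
    }

    func (bf *bloomFilter) Add(s string) {
        for _, p := range bf.positions(s) {
            bf.bits[p/64] |= 1 << (p % 64)
        }
    }

    // MayContain returns false only if s was definitely never added,
    // which is exactly what lets the query skip the whole block.
    func (bf *bloomFilter) MayContain(s string) bool {
        for _, p := range bf.positions(s) {
            if bf.bits[p/64]&(1<<(p%64)) == 0 {
                return false
            }
        }
        return true
    }

    func main() {
        // One filter per block of logs, filled with the block's tokens at ingestion.
        blockFilter := newBloomFilter(1<<20, 4)
        blockFilter.Add("trace_id=5f2a9c1e")
        blockFilter.Add("user_id=42")

        // At query time a block is read only if its filter may contain the token.
        fmt.Println(blockFilter.MayContain("trace_id=5f2a9c1e")) // true
        fmt.Println(blockFilter.MayContain("trace_id=deadbeef")) // almost surely false -> skip block
    }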