cache dictionary layout
The cache dictionary layout type stores the dictionary in a cache that has a fixed number of cells.
These cells contain frequently used elements.
The dictionary key has the UInt64 type.
When a key is looked up in the dictionary, the cache is searched first. For each block of data, all keys that are not found in the cache or are outdated are requested from the source using SELECT attrs... FROM db.table WHERE id IN (k1, k2, ...). The received data is then written to the cache.
If keys are not found in the dictionary, a cache update task is created and added to the update queue. The update queue can be tuned with the settings max_update_queue_size, update_queue_push_timeout_milliseconds, query_wait_timeout_milliseconds, and max_threads_for_updates.
For cache dictionaries, the expiration lifetime of data in the cache can be set. If more time than the lifetime has passed since the data was loaded into a cell, the cell's value is not used and the key becomes expired. The key is re-requested the next time it needs to be used. This behaviour can be configured with the allow_read_expired_keys setting.
This is the least effective of all the ways to store dictionaries. The speed of the cache depends strongly on correct settings and the usage scenario. A cache type dictionary performs well only when the hit rates are high enough (recommended 99% and higher). You can view the average hit rate in the system.dictionaries table.
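As a minimal sketch, the hit rate can be read with a query like the one below. The dictionary name my_cache_dict is a hypothetical placeholder; the columns are those of the system.dictionaries table.

```sql
-- Check how often lookups were served from the cache.
-- 'my_cache_dict' is a hypothetical dictionary name; substitute your own.
SELECT
    name,
    type,
    query_count,  -- number of lookups made since the dictionary was loaded
    hit_rate      -- share of lookups for which the value was in the cache
FROM system.dictionaries
WHERE name = 'my_cache_dict';
```

If the reported hit rate stays well below the recommended level, the cache is probably too small for the working set of keys, or the lifetime is too short.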
If the allow_read_expired_keys setting is set to 1 (the default is 0), the dictionary supports asynchronous updates. If a client requests keys that are all in the cache but some of them have expired, the dictionary returns the expired values to the client and requests fresh ones asynchronously from the source.
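To improve cache performance, use a subquery with LIMIT, and call the dictionary function (for example, dictGet) outside the subquery, so that it is evaluated only for the rows that remain after the limit is applied.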
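The pattern might look like the following sketch. The table visits, its columns user_id and event_time, the dictionary my_cache_dict, and the attribute name are all hypothetical and only illustrate the shape of the query.

```sql
-- Hypothetical names: table `visits`, dictionary `my_cache_dict`, attribute `name`.
-- LIMIT is applied inside the subquery, so dictGet in the outer query is
-- evaluated only for the 10 rows the subquery returns, not for the whole table.
SELECT
    user_id,
    dictGet('my_cache_dict', 'name', toUInt64(user_id)) AS user_name
FROM
(
    SELECT user_id
    FROM visits
    ORDER BY event_time DESC
    LIMIT 10
);
```

Fewer dictionary lookups mean fewer cache misses and fewer requests to the source.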
All types of sources are supported.
Example of settings:
- DDL
- Configuration file
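For illustration only, a DDL definition using this layout could look like the sketch below. The dictionary name, attributes, MySQL connection parameters, lifetime values, and cell count are all assumptions chosen for the example, not recommended values.

```sql
-- Hypothetical dictionary backed by a MySQL table.
-- All names, hosts, credentials, and sizes are placeholders.
CREATE DICTIONARY my_cache_dict
(
    id UInt64,                -- the key must be UInt64 for the cache layout
    name String DEFAULT ''
)
PRIMARY KEY id
SOURCE(MYSQL(
    host 'mysql-host'
    port 3306
    user 'reader'
    password ''
    db 'db'
    table 'table'
))
LAYOUT(CACHE(SIZE_IN_CELLS 1000000))   -- fixed number of cache cells
LIFETIME(MIN 300 MAX 360);             -- expiration lifetime of cached data, in seconds
```

The same layout can also be declared in a configuration file, where the update queue settings and allow_read_expired_keys are set inside the cache layout section.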
Set a large enough cache size. You need to experiment to select the number of cells:
- Set some value.
- Run queries until the cache is completely full.
- Assess memory consumption using the system.dictionaries table.
- Increase or decrease the number of cells until the required memory consumption is reached.
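The memory check in the steps above can be sketched as follows. The dictionary name my_cache_dict is again a hypothetical placeholder.

```sql
-- Inspect the memory footprint of the cache once it has filled up.
-- 'my_cache_dict' is a hypothetical dictionary name; substitute your own.
SELECT
    name,
    element_count,                                    -- items currently stored
    load_factor,                                      -- how full the dictionary is
    formatReadableSize(bytes_allocated) AS memory_used
FROM system.dictionaries
WHERE name = 'my_cache_dict';
```

Rerunning the query after each change to the number of cells makes it easier to compare memory use between attempts.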
ClickHouse is not recommended as a source for this layout. Dictionary lookups require random point reads, which are not the access pattern ClickHouse is optimized for.