YAMLRegExpTree dictionary source
Not supported in ClickHouse Cloud
The YAMLRegExpTree source loads a regular expression tree from a YAML file on the local filesystem.
It is designed exclusively for use with the regexp_tree dictionary layout
and provides hierarchical regex-to-attribute mappings for pattern-based lookups such as user agent parsing.
Note
The YAMLRegExpTree source is only available in ClickHouse Open Source.
For ClickHouse Cloud, export the dictionary to CSV and load it via a ClickHouse table source instead.
See Using regexp_tree dictionaries in ClickHouse Cloud for details.
Configuration
Setting fields:
| Setting | Description |
|---|---|
| PATH | The absolute path to the YAML file containing the regular expression tree. When created via DDL, the file must be in the user_files directory. |
YAML file structure
The YAML file contains a list of regular expression tree nodes. Each node can have attributes and child nodes, forming a hierarchy:
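As a minimal sketch of such a file: the patterns and attribute values below are illustrative assumptions for user agent parsing, and the attribute names name and version, as well as the child list name versions, are arbitrary choices rather than required keys.

```yaml
# Illustrative tree; patterns and values are assumptions, not a shipped ruleset.
- regexp: 'Linux/(\d+[\.\d]*).+tlinux'
  name: 'TencentOS'
  version: '\1'          # back reference to the first capture group

- regexp: '\d+/tclwebkit(?:\d+[\.\d]*)'
  name: 'Android'
  versions:              # child list; the key name is arbitrary
    - regexp: '33/tclwebkit'
      version: '13'
    - regexp: '3[12]/tclwebkit'
      version: '12'
```

With this tree, a string such as 31/tclwebkit1024 should match the second top-level node and then its child 3[12]/tclwebkit, so the lookup takes name Android from the parent and version 12 from the child.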
Each node has the following structure:
- regexp: The regular expression for this node.
- attributes: User-defined dictionary attributes (e.g. name, version). Attribute values may contain back references to capture groups in the regular expression, written as \1 or $1 (numbers 1-9). These are replaced with the matched capture group at query time.
- child nodes: A list of children, each with its own attributes and optionally more children. The name of the child list is arbitrary (e.g. versions).

String matching proceeds depth-first: if a string matches a node, its children are also checked. Attributes of the deepest matching node take precedence, overriding equally named parent attributes.
Related pages
- regexp_tree dictionary layout — layout configuration, query examples, and matching modes
- dictGet, dictGetAll — functions for querying regexp tree dictionaries