clickhouse cluster setup,Database schema design

Table of Contents

In the previous post we discussed about basic background of clickhouse sharding and replication process, in this blog post I will discuss in detail about designing and running queries against the cluster.

Cluster Setup

Let us build a 3(Shard) x 2(Replicas) = 6 Node Clickhouse cluster .The logical topology diagram is as follows.

We will use ReplicatedMergeTree & Distributed table to setup our table.

The above configuration creates 6 (clickHouse)+1 (Zookeeper) cluster

Installation

In order to setup clickhouse cluster as a first step we need to install clickhouse on all nodes in the cluster, I am going to install the following in all nodes

ClickHouse client version 20.3.8.53 (official build).

ClickHouse server version 20.3.8 revision 54433.

The server OS version as

NAME="Ubuntu"
VERSION="18.04.4 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.4 LTS"
VERSION_ID="18.04"

Modify configuration

Clickhouse cluster is mainly dependent on configuration files. It involves three parts 1.remote_servers, 2.zookeeper, 3.macros, remote_servers and zookeeper of all nodes are the same, but different is macros. Each node modifies the values of shard and replica according to its role. Lets first examine config.xml located @ ‘/etc/clickhouse-server’

config.xml

<?xml version="1.0"?>
<yandex>
    <logger>
        <!-- Possible levels: https://github.com/pocoproject/poco/blob/develop/Foundation/include/Poco/Logger.h#L105 -->
        <level>trace</level>
        <log>/var/log/clickhouse-server/clickhouse-server.log</log>
        <errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog>
        <size>1000M</size>
        <count>10</count>
        <!-- <console>1</console> --> <!-- Default behavior is autodetection (log to console if not daemon mode and is tty) -->
    </logger>
    <!--display_name>production</display_name--> <!-- It is the name that will be shown in the client -->
    <http_port>8123</http_port>
    <tcp_port>9000</tcp_port>

    <!-- For HTTPS and SSL over native protocol. -->
    <!--
    <https_port>8443</https_port>
    <tcp_port_secure>9440</tcp_port_secure>
    -->

    <!-- Used with https_port and tcp_port_secure. Full ssl options list: https://github.com/ClickHouse-Extras/poco/blob/master/NetSSL_OpenSSL/include/Poco/Net/SSLManager.h#L71 -->
    <openSSL>
        <server> <!-- Used for https server AND secure tcp port -->
            <!-- openssl req -subj "/CN=localhost" -new -newkey rsa:2048 -days 365 -nodes -x509 -keyout /etc/clickhouse-server/server.key -out /etc/clickhouse-server/server.crt -->
            <certificateFile>/etc/clickhouse-server/server.crt</certificateFile>
            <privateKeyFile>/etc/clickhouse-server/server.key</privateKeyFile>
            <!-- openssl dhparam -out /etc/clickhouse-server/dhparam.pem 4096 -->
            <dhParamsFile>/etc/clickhouse-server/dhparam.pem</dhParamsFile>
            <verificationMode>none</verificationMode>
            <loadDefaultCAFile>true</loadDefaultCAFile>
            <cacheSessions>true</cacheSessions>
            <disableProtocols>sslv2,sslv3</disableProtocols>
            <preferServerCiphers>true</preferServerCiphers>
        </server>

        <client> <!-- Used for connecting to https dictionary source -->
            <loadDefaultCAFile>true</loadDefaultCAFile>
            <cacheSessions>true</cacheSessions>
            <disableProtocols>sslv2,sslv3</disableProtocols>
            <preferServerCiphers>true</preferServerCiphers>
            <!-- Use for self-signed: <verificationMode>none</verificationMode> -->
            <invalidCertificateHandler>
                <!-- Use for self-signed: <name>AcceptCertificateHandler</name> -->
                <name>RejectCertificateHandler</name>
            </invalidCertificateHandler>
        </client>
    </openSSL>

    <!-- Default root page on http[s] server. For example load UI from https://tabix.io/ when opening http://localhost:8123 -->
    <!--
    <http_server_default_response><![CDATA[<html ng-app="SMI2"><head><base href="http://ui.tabix.io/"></head><body><div ui-view="" class="content-ui"></div><script src="http://loader.tabix.io/master.js"></script></body></html>]]></http_server_default_response>
    -->

    <!-- Port for communication between replicas. Used for data exchange. -->
    <interserver_http_port>9009</interserver_http_port>

    <!-- Hostname that is used by other replicas to request this server.
         If not specified, than it is determined analoguous to 'hostname -f' command.
         This setting could be used to switch replication to another network interface.
      -->
    <!--
    <interserver_http_host>example.yandex.ru</interserver_http_host>
    -->

    <!-- Listen specified host. use :: (wildcard IPv6 address), if you want to accept connections both with IPv4 and IPv6 from everywhere. -->
    <!-- <listen_host>::</listen_host> -->
    <!-- Same for hosts with disabled ipv6: -->
    <!-- <listen_host>0.0.0.0</listen_host> -->

    <!-- Default values - try listen localhost on ipv4 and ipv6: -->
    <!--
    <listen_host>::1</listen_host>
    <listen_host>127.0.0.1</listen_host>
    -->
    <!-- Don't exit if ipv6 or ipv4 unavailable, but listen_host with this protocol specified -->
    <!-- <listen_try>0</listen_try> -->

    <!-- Allow listen on same address:port -->
    <!-- <listen_reuse_port>0</listen_reuse_port> -->

    <!-- <listen_backlog>64</listen_backlog> -->

    <max_connections>4096</max_connections>
    <keep_alive_timeout>3</keep_alive_timeout>

    <!-- Maximum number of concurrent queries. -->
    <max_concurrent_queries>100</max_concurrent_queries>

    <!-- Set limit on number of open files (default: maximum). This setting makes sense on Mac OS X because getrlimit() fails to retrieve
         correct maximum value. -->
    <!-- <max_open_files>262144</max_open_files> -->

    <!-- Size of cache of uncompressed blocks of data, used in tables of MergeTree family.
         In bytes. Cache is single for server. Memory is allocated only on demand.
         Cache is used when 'use_uncompressed_cache' user setting turned on (off by default).
         Uncompressed cache is advantageous only for very short queries and in rare cases.
      -->
    <uncompressed_cache_size>8589934592</uncompressed_cache_size>

    <!-- Approximate size of mark cache, used in tables of MergeTree family.
         In bytes. Cache is single for server. Memory is allocated only on demand.
         You should not lower this value.
      -->
    <mark_cache_size>5368709120</mark_cache_size>


    <!-- Path to data directory, with trailing slash. -->
    <path>/var/lib/clickhouse/</path>

    <!-- Path to temporary data for processing hard queries. -->
    <tmp_path>/var/lib/clickhouse/tmp/</tmp_path>

    <!-- Directory with user provided files that are accessible by 'file' table function. -->
    <user_files_path>/var/lib/clickhouse/user_files/</user_files_path>

    <!-- Path to configuration file with users, access rights, profiles of settings, quotas. -->
    <users_config>users.xml</users_config>

    <!-- Default profile of settings. -->
    <default_profile>default</default_profile>

    <!-- System profile of settings. This settings are used by internal processes (Buffer storage, Distibuted DDL worker and so on). -->
    <!-- <system_profile>default</system_profile> -->

    <!-- Default database. -->
    <default_database>default</default_database>

    <!-- Server time zone could be set here.

         Time zone is used when converting between String and DateTime types,
          when printing DateTime in text formats and parsing DateTime from text,
          it is used in date and time related functions, if specific time zone was not passed as an argument.

         Time zone is specified as identifier from IANA time zone database, like UTC or Africa/Abidjan.
         If not specified, system time zone at server startup is used.

         Please note, that server could display time zone alias instead of specified name.
         Example: W-SU is an alias for Europe/Moscow and Zulu is an alias for UTC.
    -->
    <!-- <timezone>Europe/Moscow</timezone> -->

    <!-- You can specify umask here (see "man umask"). Server will apply it on startup.
         Number is always parsed as octal. Default umask is 027 (other users cannot read logs, data files, etc; group can only read).
    -->
    <!-- <umask>022</umask> -->

    <!-- Configuration of clusters that could be used in Distributed tables.
         https://clickhouse.yandex/docs/en/table_engines/distributed/
      -->
    <remote_servers incl="clickhouse_remote_servers" >
        <!-- Test only shard config for testing distributed storage -->
        <test_shard_localhost>
            <shard>
                <replica>
                    <host>localhost</host>
                    <port>9000</port>
                </replica>
            </shard>
        </test_shard_localhost>
        <test_shard_localhost_secure>
            <shard>
                <replica>
                    <host>localhost</host>
                    <port>9440</port>
                    <secure>1</secure>
                </replica>
            </shard>
        </test_shard_localhost_secure>
    </remote_servers>


    <!-- If element has 'incl' attribute, then for it's value will be used corresponding substitution from another file.
         By default, path to file with substitutions is /etc/metrika.xml. It could be changed in config in 'include_from' element.
         Values for substitutions are specified in /yandex/name_of_substitution elements in that file.
      -->
    <include_from>/etc/clickhouse-server/metrika.xml</include_from>
    <!-- ZooKeeper is used to store metadata about replicas, when using Replicated tables.
         Optional. If you don't use replicated tables, you could omit that.

         See https://clickhouse.yandex/docs/en/table_engines/replication/
      -->
    <zookeeper incl="zookeeper-servers" optional="true" />

    <!-- Substitutions for parameters of replicated tables.
          Optional. If you don't use replicated tables, you could omit that.

         See https://clickhouse.yandex/docs/en/table_engines/replication/#creating-replicated-tables
      -->
    <macros incl="macros" optional="true" />


    <!-- Reloading interval for embedded dictionaries, in seconds. Default: 3600. -->
    <builtin_dictionaries_reload_interval>3600</builtin_dictionaries_reload_interval>


    <!-- Maximum session timeout, in seconds. Default: 3600. -->
    <max_session_timeout>3600</max_session_timeout>

    <!-- Default session timeout, in seconds. Default: 60. -->
    <default_session_timeout>60</default_session_timeout>

    <!-- Sending data to Graphite for monitoring. Several sections can be defined. -->
    <!--
        interval - send every X second
        root_path - prefix for keys
        hostname_in_path - append hostname to root_path (default = true)
        metrics - send data from table system.metrics
        events - send data from table system.events
        asynchronous_metrics - send data from table system.asynchronous_metrics
    -->
    <!--
    <graphite>
        <host>localhost</host>
        <port>42000</port>
        <timeout>0.1</timeout>
        <interval>60</interval>
        <root_path>one_min</root_path>
        <hostname_in_path>true</hostname_in_path>

        <metrics>true</metrics>
        <events>true</events>
        <asynchronous_metrics>true</asynchronous_metrics>
    </graphite>
    <graphite>
        <host>localhost</host>
        <port>42000</port>
        <timeout>0.1</timeout>
        <interval>1</interval>
        <root_path>one_sec</root_path>

        <metrics>true</metrics>
        <events>true</events>
        <asynchronous_metrics>false</asynchronous_metrics>
    </graphite>
    -->


    <!-- Query log. Used only for queries with setting log_queries = 1. -->
    <query_log>
        <!-- What table to insert data. If table is not exist, it will be created.
             When query log structure is changed after system update,
              then old table will be renamed and new table will be created automatically.
        -->
        <database>system</database>
        <table>query_log</table>
        <!--
            PARTITION BY expr https://clickhouse.yandex/docs/en/table_engines/custom_partitioning_key/
            Example:
                event_date
                toMonday(event_date)
                toYYYYMM(event_date)
                toStartOfHour(event_time)
        -->
        <partition_by>toYYYYMM(event_date)</partition_by>
        <!-- Interval of flushing data. -->
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </query_log>


    <!-- Uncomment if use part_log
    <part_log>
        <database>system</database>
        <table>part_log</table>

        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </part_log>
    -->


    <!-- Parameters for embedded dictionaries, used in Yandex.Metrica.
         See https://clickhouse.yandex/docs/en/dicts/internal_dicts/
    -->

    <!-- Path to file with region hierarchy. -->
    <!-- <path_to_regions_hierarchy_file>/opt/geo/regions_hierarchy.txt</path_to_regions_hierarchy_file> -->

    <!-- Path to directory with files containing names of regions -->
    <!-- <path_to_regions_names_files>/opt/geo/</path_to_regions_names_files> -->


    <!-- Configuration of external dictionaries. See:
         https://clickhouse.yandex/docs/en/dicts/external_dicts/
    -->
    <dictionaries_config>*_dictionary.xml</dictionaries_config>

    <!-- Uncomment if you want data to be compressed 30-100% better.
         Don't do that if you just started using ClickHouse.
      -->
    <compression incl="clickhouse_compression">
    <!--
        <!- - Set of variants. Checked in order. Last matching case wins. If nothing matches, lz4 will be used. - ->
        <case>

            <!- - Conditions. All must be satisfied. Some conditions may be omitted. - ->
            <min_part_size>10000000000</min_part_size>        <!- - Min part size in bytes. - ->
            <min_part_size_ratio>0.01</min_part_size_ratio>   <!- - Min size of part relative to whole table size. - ->

            <!- - What compression method to use. - ->
            <method>zstd</method>
        </case>
    -->
    </compression>

    <!-- Allow to execute distributed DDL queries (CREATE, DROP, ALTER, RENAME) on cluster.
         Works only if ZooKeeper is enabled. Comment it if such functionality isn't required. -->
    <distributed_ddl>
        <!-- Path in ZooKeeper to queue with DDL queries -->
        <path>/clickhouse/task_queue/ddl</path>

        <!-- Settings from this profile will be used to execute DDL queries -->
        <!-- <profile>default</profile> -->
    </distributed_ddl>

    <!-- Settings to fine tune MergeTree tables. See documentation in source code, in MergeTreeSettings.h -->
    <!--
    <merge_tree>
        <max_suspicious_broken_parts>5</max_suspicious_broken_parts>
    </merge_tree>
    -->

    <!-- Protection from accidental DROP.
         If size of a MergeTree table is greater than max_table_size_to_drop (in bytes) than table could not be dropped with any DROP query.
         If you want do delete one table and don't want to restart clickhouse-server, you could create special file <clickhouse-path>/flags/force_drop_table and make DROP once.
         By default max_table_size_to_drop is 50GB; max_table_size_to_drop=0 allows to DROP any tables.
         The same for max_partition_size_to_drop.
         Uncomment to disable protection.
    -->
    <!-- <max_table_size_to_drop>0</max_table_size_to_drop> -->
    <!-- <max_partition_size_to_drop>0</max_partition_size_to_drop> -->

    <!-- Example of parameters for GraphiteMergeTree table engine -->
    <graphite_rollup_example>
        <pattern>
            <regexp>click_cost</regexp>
            <function>any</function>
            <retention>
                <age>0</age>
                <precision>3600</precision>
            </retention>
            <retention>
                <age>86400</age>
                <precision>60</precision>
            </retention>
        </pattern>
        <default>
            <function>max</function>
            <retention>
                <age>0</age>
                <precision>60</precision>
            </retention>
            <retention>
                <age>3600</age>
                <precision>300</precision>
            </retention>
            <retention>
                <age>86400</age>
                <precision>3600</precision>
            </retention>
        </default>
    </graphite_rollup_example>

    <!-- Directory in <clickhouse-path> containing schema files for various input formats.
         The directory will be created if it doesn't exist.
      -->
    <format_schema_path>/var/lib/clickhouse/format_schemas/</format_schema_path>

    <!-- Uncomment to disable ClickHouse internal DNS caching. -->
    <!-- <disable_internal_dns_cache>1</disable_internal_dns_cache> -->
</yandex>

in the above xml file concenrate on the following

<!-- If element has 'incl' attribute, then for it's value will be used corresponding substitution from another file.
         By default, path to file with substitutions is /etc/metrika.xml. It could be changed in config in 'include_from' element.
         Values for substitutions are specified in /yandex/name_of_substitution elements in that file.
      -->
    <include_from>/etc/clickhouse-server/metrika.xml</include_from>

The tag <include_from> here we point to a file in which we have the details of remote servers.

metrika.xml

will give the location of remote servers and Zookeeper server

<yandex>
	<clickhouse_remote_servers>
		<cluster_1>  <!-- cluster name -->
			<shard>  <!-- shard 1 -->
                                <weight>1</weight> <!-- weight assigned to the shard here we assign equal weights to all shards -->
                                <internal_replication>true</internal_replication> <!-- true flag will denote that replication will be taken care by Zookeeper(More Robust) or else distributed table sends INSERTS to all replicas by this there will be no error check if any one insert fails -->
				<replica> <!-- replica 1 -->
					<host>clickhouse-01</host> <!-- host name where the replica 1 resides -->
					<port>9000</port>
				</replica>
				<replica> <!-- replica 2 -->
					<host>clickhouse-06</host> <--- host name where the replica 2 resides -->
					<port>9000</port> <!-- port no -->
				</replica>
			</shard>
			<shard> <!-- shard 2 -->
                                <weight>1</weight>
                                <internal_replication>true</internal_replication>
				<replica>
					<host>clickhouse-02</host>
					<port>9000</port>
				</replica>
				<replica>
					<host>clickhouse-03</host>
					<port>9000</port>
				</replica>
			</shard>
			<shard> <!-- shard 3 -->
                                <weight>1</weight>
                                <internal_replication>true</internal_replication>

				<replica>
					<host>clickhouse-04</host>
					<port>9000</port>
				</replica>
				<replica>
					<host>clickhouse-05</host>
					<port>9000</port>
				</replica>
			</shard>
		</cluster_1>
	</clickhouse_remote_servers>
        <zookeeper-servers> <!-- co-ordinates of zookeeper in cluster -->
            <node index="1">
                <host>clickhouse-zookeeper</host>
                <port>2181</port> <!-- ports from which it can access from -->
            </node>
        </zookeeper-servers>
        <networks>
            <ip>::/0</ip>
        </networks>
        <clickhouse_compression>
            <case>
                <min_part_size>10000000000</min_part_size>
                <min_part_size_ratio>0.01</min_part_size_ratio>
                <method>lz4</method>
            </case>
        </clickhouse_compression>
</yandex>

the above config can be any layout as per business requirement, so the distributed table knows where to issue queries.

Note :- if you make changes to Zookeeper then restart is required.

macros.xml

macros enable consistent DDL over cluster, there are just variables where it changes from server(node) to server(node) in the cluster, and for each server it says ”who I am with in cluster and what i have”

<yandex>
    <macros>
        <replica>clickhouse-01</replica>
        <shard>01</shard>
        <layer>01</layer>
    </macros>
</yandex>

Table Schema

sample data set

warehouse_id|product_id|avl_qty
3|30007|45
3|41392|21
6|96324|85
5|33309|59
1|28871|37

The above data set was created in order to show how sharding and distribution works in clickhouse. I designed data set and schema in order to have each warehouse_id belongs to specific shard.

Data distribution & Replication

In order to create a distributed table we need to do two things:

Configure the Clickhouse nodes to make them aware of all the available nodes in the cluster.(In the config.xml file there is a configuration called remote_servers)
Create a new table using the Distributed engine.

There you can specify a list of clusters containing your shards. Each shard is then defined as a list of replicas, containing the server addresses. The replica definition has the following parameters:

default_database: The database to be used by the Distributedtable engine, if no database is specified.
host: The host of the server where the replica resides.
port: The port of the server where the replica resides.

Inside a shard, one can define as many replicas as they want. The data will be accessed on the first available replica.

Now that we know how to read data from multiple nodes, we want to make sure that our data is also replicated in order to tolerate node failures.

To achieve this we need to:

Point our Clickhouse servers to a Zookeeper instance, which is used by the replication engine.
Use the ReplicatedMergeTree engine when we create the tables.

Create the table holding the data :

CREATE TABLE warehouse_local_replicated ON CLUSTER 'cluster_1'
(
 warehouse_id Int64,
 product_id  Int64,
 avl_qty  Int64
)
Engine=ReplicatedMergeTree('/clickhouse/tables/{layer}-{shard}/warehouse_local_replicated', '{replica}') 
PARTITION BY warehouse_id
ORDER BY product_id;

in the above syntax i used “ON CLUSTER ‘<cluster-name>'” will create table on the cluster specified (i.e) on all the nodes of the cluster.

CREATE TABLE warehouse_local_replicated ON CLUSTER cluster_1
(
    `warehouse_id` Int64, 
    `product_id` Int64, 
    `avl_qty` Int64
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{layer}-{shard}/warehouse_local_replicated', '{replica}')
PARTITION BY warehouse_id
ORDER BY product_id

┌─host──────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ clickhouse-06 │ 9000 │      0 │       │                   5 │                0 │
│ clickhouse-02 │ 9000 │      0 │       │                   4 │                0 │
│ clickhouse-03 │ 9000 │      0 │       │                   3 │                0 │
│ clickhouse-05 │ 9000 │      0 │       │                   2 │                0 │
│ clickhouse-04 │ 9000 │      0 │       │                   1 │                0 │
│ clickhouse-01 │ 9000 │      0 │       │                   0 │                0 │
└───────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘

6 rows in set. Elapsed: 0.971 sec. 

clickhouse-01 :) show tables;

SHOW TABLES

┌─name───────────────────────┐
│ warehouse_local_replicated │
└────────────────────────────┘

1 rows in set. Elapsed: 0.004 sec.


clickhouse-02 :) show tables;

SHOW TABLES

┌─name───────────────────────┐
│ warehouse_local_replicated │
└────────────────────────────┘

1 rows in set. Elapsed: 0.004 sec.

Now Create the distributed table. The distributed table is actually a view, so it needs to have the same schema definition as the shards. Once the view is created, the data is queried on each shard and the results are aggregated on the node where the query was initially called.

clickhouse-02 :) CREATE TABLE warehouse_dist  ON CLUSTER 'cluster_1' (  warehouse_id Int64,  product_id  Int64,  avl_qty  Int64 ) ENGINE = Distributed(cluster_1, default,warehouse_local_replicated, (warehouse_id) );

CREATE TABLE warehouse_dist ON CLUSTER cluster_1
(
    `warehouse_id` Int64, 
    `product_id` Int64, 
    `avl_qty` Int64
)
ENGINE = Distributed(cluster_1, default, warehouse_local_replicated, warehouse_id)

┌─host──────────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
│ clickhouse-06 │ 9000 │      0 │       │                   5 │                0 │
│ clickhouse-03 │ 9000 │      0 │       │                   4 │                0 │
│ clickhouse-02 │ 9000 │      0 │       │                   3 │                0 │
│ clickhouse-05 │ 9000 │      0 │       │                   2 │                0 │
│ clickhouse-04 │ 9000 │      0 │       │                   1 │                0 │
│ clickhouse-01 │ 9000 │      0 │       │                   0 │                0 │
└───────────────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘

6 rows in set. Elapsed: 0.266 sec.

Now we are all set to insert data in to , we have two options 1–> we can insert via distributed table, 2–> Insert directly into replicated table.

root@clickhouse-01:~# ls
warehouse_dataset.csv

root@clickhouse-01:~# clickhouse-client  --format_csv_delimiter="|" --query "INSERT INTO warehouse_dist FORMAT CSVWithNames" < warehouse_dataset.csv

Inserted data from clickhouse-01

clickhouse-02 :) select * from warehouse_dist;

SELECT *
FROM warehouse_dist

┌─warehouse_id─┬─product_id─┬─avl_qty─┐
│            4 │      10115 │      92 │
│            4 │      16544 │      96 │
│            4 │      20987 │      24 │
│            4 │      40205 │      57 │
│            4 │      40473 │      12 │
│            4 │      44523 │      97 │
│            4 │      47798 │       7 │
│            4 │      52977 │      86 │
│            4 │      68324 │      31 │
│            4 │      79176 │      63 │
│            4 │      84318 │      31 │
│            4 │      86403 │      70 │
│            4 │      99231 │      38 │
└──────────────┴────────────┴─────────┘
┌─warehouse_id─┬─product_id─┬─avl_qty─┐
│            1 │       5862 │      61 │
│            1 │       7491 │      47 │
│            1 │       8667 │      18 │
│            1 │      19014 │      36 │
│            1 │      24434 │      23 │
│            1 │      28871 │      37 │
│            1 │      30007 │      63 │
│            1 │      31606 │      71 │
│            1 │      31718 │      90 │
│            1 │      33197 │      52 │
│            1 │      41761 │      97 │
│            1 │      42652 │      72 │
│            1 │      45720 │      67 │
│            1 │      50276 │      64 │
│            1 │      52264 │      19 │
│            1 │      53810 │      78 │
│            1 │      56265 │      93 │
│            1 │      59045 │      93 │
│            1 │      62200 │      73 │
│            1 │      68246 │      49 │
│            1 │      71215 │      11 │
│            1 │      92358 │       5 │
│            1 │      99644 │      70 │
└──────────────┴────────────┴─────────┘
┌─warehouse_id─┬─product_id─┬─avl_qty─┐
│            6 │      10009 │      92 │
│            6 │      11231 │      24 │
│            6 │      28967 │      39 │
│            6 │      38508 │      48 │
│            6 │      57201 │      31 │
│            6 │      60922 │      64 │
│            6 │      80803 │      61 │
│            6 │      82000 │      26 │
│            6 │      91258 │      86 │
│            6 │      91898 │      50 │
│            6 │      96324 │      85 │
└──────────────┴────────────┴─────────┘
┌─warehouse_id─┬─product_id─┬─avl_qty─┐
│            3 │       1584 │      14 │
│            3 │       6195 │      13 │
│            3 │      13217 │       3 │
│            3 │      18221 │      91 │
│            3 │      19125 │      72 │
│            3 │      20762 │      56 │
│            3 │      30007 │      45 │
│            3 │      30439 │      10 │
│            3 │      39349 │      45 │
│            3 │      41392 │      21 │
│            3 │      42663 │      59 │
│            3 │      42979 │      30 │
│            3 │      50767 │      35 │
│            3 │      56461 │     100 │
│            3 │      58630 │      71 │
│            3 │      74647 │      94 │
│            3 │      87169 │      74 │
└──────────────┴────────────┴─────────┘
┌─warehouse_id─┬─product_id─┬─avl_qty─┐
│            5 │      12183 │      50 │
│            5 │      19408 │      79 │
│            5 │      19459 │       2 │
│            5 │      22234 │      86 │
│            5 │      32849 │      12 │
│            5 │      33309 │      59 │
│            5 │      34581 │      96 │
│            5 │      43149 │      60 │
│            5 │      45009 │      86 │
│            5 │      46159 │      68 │
│            5 │      47629 │      34 │
│            5 │      53782 │      95 │
│            5 │      61013 │      68 │
│            5 │      69121 │      86 │
│            5 │      69341 │      67 │
│            5 │      73624 │      83 │
│            5 │      77165 │      93 │
│            5 │      79078 │      63 │
│            5 │      84342 │      50 │
│            5 │      89709 │      36 │
│            5 │      94128 │      32 │
└──────────────┴────────────┴─────────┘
┌─warehouse_id─┬─product_id─┬─avl_qty─┐
│            2 │       2170 │      74 │
│            2 │       3887 │      31 │
│            2 │      20643 │      28 │
│            2 │      23857 │      49 │
│            2 │      27212 │      96 │
│            2 │      27501 │     100 │
│            2 │      28934 │      60 │
│            2 │      38303 │      45 │
│            2 │      40254 │       2 │
│            2 │      42369 │      72 │
│            2 │      60404 │      21 │
│            2 │      61365 │      88 │
│            2 │      66235 │      17 │
│            2 │      69324 │      26 │
│            2 │      92583 │       2 │
└──────────────┴────────────┴─────────┘

100 rows in set. Elapsed: 0.085 sec.

Verified the inserted data from clickhouse-02, we can confirm that data was sharded based on our sharding key specified while creating the table.If we select from local table we have warehouse_id 4 and warehouse_id 1 one will be shard and another one replica, if we check from server 3 clickhouse-02 we should get the same let us check…

clickhouse-02 :) select * from warehouse_local_replicated;

SELECT *
FROM warehouse_local_replicated

┌─warehouse_id─┬─product_id─┬─avl_qty─┐
│            4 │      10115 │      92 │
│            4 │      16544 │      96 │
│            4 │      20987 │      24 │
│            4 │      40205 │      57 │
│            4 │      40473 │      12 │
│            4 │      44523 │      97 │
│            4 │      47798 │       7 │
│            4 │      52977 │      86 │
│            4 │      68324 │      31 │
│            4 │      79176 │      63 │
│            4 │      84318 │      31 │
│            4 │      86403 │      70 │
│            4 │      99231 │      38 │
└──────────────┴────────────┴─────────┘
┌─warehouse_id─┬─product_id─┬─avl_qty─┐
│            1 │       5862 │      61 │
│            1 │       7491 │      47 │
│            1 │       8667 │      18 │
│            1 │      19014 │      36 │
│            1 │      24434 │      23 │
│            1 │      28871 │      37 │
│            1 │      30007 │      63 │
│            1 │      31606 │      71 │
│            1 │      31718 │      90 │
│            1 │      33197 │      52 │
│            1 │      41761 │      97 │
│            1 │      42652 │      72 │
│            1 │      45720 │      67 │
│            1 │      50276 │      64 │
│            1 │      52264 │      19 │
│            1 │      53810 │      78 │
│            1 │      56265 │      93 │
│            1 │      59045 │      93 │
│            1 │      62200 │      73 │
│            1 │      68246 │      49 │
│            1 │      71215 │      11 │
│            1 │      92358 │       5 │
│            1 │      99644 │      70 │
└──────────────┴────────────┴─────────┘

36 rows in set. Elapsed: 0.004 sec.

clickhouse-03 :) select * from warehouse_local_replicated;

SELECT *
FROM warehouse_local_replicated

┌─warehouse_id─┬─product_id─┬─avl_qty─┐
│            4 │      10115 │      92 │
│            4 │      16544 │      96 │
│            4 │      20987 │      24 │
│            4 │      40205 │      57 │
│            4 │      40473 │      12 │
│            4 │      44523 │      97 │
│            4 │      47798 │       7 │
│            4 │      52977 │      86 │
│            4 │      68324 │      31 │
│            4 │      79176 │      63 │
│            4 │      84318 │      31 │
│            4 │      86403 │      70 │
│            4 │      99231 │      38 │
└──────────────┴────────────┴─────────┘
┌─warehouse_id─┬─product_id─┬─avl_qty─┐
│            1 │       5862 │      61 │
│            1 │       7491 │      47 │
│            1 │       8667 │      18 │
│            1 │      19014 │      36 │
│            1 │      24434 │      23 │
│            1 │      28871 │      37 │
│            1 │      30007 │      63 │
│            1 │      31606 │      71 │
│            1 │      31718 │      90 │
│            1 │      33197 │      52 │
│            1 │      41761 │      97 │
│            1 │      42652 │      72 │
│            1 │      45720 │      67 │
│            1 │      50276 │      64 │
│            1 │      52264 │      19 │
│            1 │      53810 │      78 │
│            1 │      56265 │      93 │
│            1 │      59045 │      93 │
│            1 │      62200 │      73 │
│            1 │      68246 │      49 │
│            1 │      71215 │      11 │
│            1 │      92358 │       5 │
│            1 │      99644 │      70 │
└──────────────┴────────────┴─────────┘

36 rows in set. Elapsed: 0.005 sec.

same way we can get for clickhouse-01,clickhouse-06 as we configured in metrika.xml

clickhouse-01 :)  select * from warehouse_local_replicated;

SELECT *
FROM warehouse_local_replicated

┌─warehouse_id─┬─product_id─┬─avl_qty─┐
│            3 │       1584 │      14 │
│            3 │       6195 │      13 │
│            3 │      13217 │       3 │
│            3 │      18221 │      91 │
│            3 │      19125 │      72 │
│            3 │      20762 │      56 │
│            3 │      30007 │      45 │
│            3 │      30439 │      10 │
│            3 │      39349 │      45 │
│            3 │      41392 │      21 │
│            3 │      42663 │      59 │
│            3 │      42979 │      30 │
│            3 │      50767 │      35 │
│            3 │      56461 │     100 │
│            3 │      58630 │      71 │
│            3 │      74647 │      94 │
│            3 │      87169 │      74 │
└──────────────┴────────────┴─────────┘
┌─warehouse_id─┬─product_id─┬─avl_qty─┐
│            6 │      10009 │      92 │
│            6 │      11231 │      24 │
│            6 │      28967 │      39 │
│            6 │      38508 │      48 │
│            6 │      57201 │      31 │
│            6 │      60922 │      64 │
│            6 │      80803 │      61 │
│            6 │      82000 │      26 │
│            6 │      91258 │      86 │
│            6 │      91898 │      50 │
│            6 │      96324 │      85 │
└──────────────┴────────────┴─────────┘

28 rows in set. Elapsed: 0.004 sec. 

clickhouse-06 :) select * from warehouse_local_replicated;

SELECT *
FROM warehouse_local_replicated

┌─warehouse_id─┬─product_id─┬─avl_qty─┐
│            6 │      10009 │      92 │
│            6 │      11231 │      24 │
│            6 │      28967 │      39 │
│            6 │      38508 │      48 │
│            6 │      57201 │      31 │
│            6 │      60922 │      64 │
│            6 │      80803 │      61 │
│            6 │      82000 │      26 │
│            6 │      91258 │      86 │
│            6 │      91898 │      50 │
│            6 │      96324 │      85 │
└──────────────┴────────────┴─────────┘
┌─warehouse_id─┬─product_id─┬─avl_qty─┐
│            3 │       1584 │      14 │
│            3 │       6195 │      13 │
│            3 │      13217 │       3 │
│            3 │      18221 │      91 │
│            3 │      19125 │      72 │
│            3 │      20762 │      56 │
│            3 │      30007 │      45 │
│            3 │      30439 │      10 │
│            3 │      39349 │      45 │
│            3 │      41392 │      21 │
│            3 │      42663 │      59 │
│            3 │      42979 │      30 │
│            3 │      50767 │      35 │
│            3 │      56461 │     100 │
│            3 │      58630 │      71 │
│            3 │      74647 │      94 │
│            3 │      87169 │      74 │
└──────────────┴────────────┴─────────┘

28 rows in set. Elapsed: 0.005 sec.

Conclusion

Through this article, I am hoping that it help better understand the sharding and distribution mechanism on ClickHouse. Giving the solution to solve node failure and guarantee all the replicas will have 100% the same data.

Clickhouse Cluster setup and Replication Configuration Part-2