前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >ClickHouse 读"写"分离

ClickHouse 读"写"分离

原创
作者头像
jasong
发布2022-03-16 20:38:42
1.8K0
发布2022-03-16 20:38:42
举报
文章被收录于专栏:ClickHouseClickHouse

写隔离 ,因为ClickHouse 都是本地表写入,所有用户通过system.clusters 来设计不同的写入规则即可

所以我们这里介绍可以匹配的写得查询隔离

代码语言:javascript
复制
Row 1:
──────
cluster:                 test_cluster_two_shards
shard_num:               1
shard_weight:            1
replica_num:             1
host_name:               127.0.0.1
host_address:            127.0.0.1
port:                    9000
is_local:                1
user:                    default
default_database:
errors_count:            0
slowdowns_count:         0
estimated_recovery_time: 0
​
Row 2:
──────
cluster:                 test_cluster_two_shards
shard_num:               2
shard_weight:            1
replica_num:             1
host_name:               127.0.0.2
host_address:            127.0.0.2
port:                    9000
is_local:                0
user:                    default
default_database:
errors_count:            0
slowdowns_count:         0
estimated_recovery_time: 0

一 基础知识

1 load_balancing

Specifies the algorithm of replicas selection that is used for distributed query processing.

ClickHouse supports the following algorithms of choosing replicas:

See also:

Random (by Default)
代码语言:javascript
复制
load_balancing = random

The number of errors is counted for each replica. The query is sent to the replica with the fewest errors, and if there are several of these, to anyone of them. Disadvantages: Server proximity is not accounted for; if the replicas have different data, you will also get different data.

Nearest Hostname
代码语言:javascript
复制
load_balancing = nearest_hostname

The number of errors is counted for each replica. Every 5 minutes, the number of errors is integrally divided by 2. Thus, the number of errors is calculated for a recent time with exponential smoothing. If there is one replica with a minimal number of errors (i.e. errors occurred recently on the other replicas), the query is sent to it. If there are multiple replicas with the same minimal number of errors, the query is sent to the replica with a hostname that is most similar to the server’s hostname in the config file (for the number of different characters in identical positions, up to the minimum length of both hostnames).

For instance, example01-01-1 and example01-01-2 are different in one position, while example01-01-1 and example01-02-2 differ in two places. This method might seem primitive, but it does not require external data about network topology, and it does not compare IP addresses, which would be complicated for our IPv6 addresses.

Thus, if there are equivalent replicas, the closest one by name is preferred. We can also assume that when sending a query to the same server, in the absence of failures, a distributed query will also go to the same servers. So even if different data is placed on the replicas, the query will return mostly the same results.

In Order
代码语言:javascript
复制
load_balancing = in_order

Replicas with the same number of errors are accessed in the same order as they are specified in the configuration. This method is appropriate when you know exactly which replica is preferable.

First or Random
代码语言:javascript
复制
load_balancing = first_or_random

This algorithm chooses the first replica in the set or a random replica if the first is unavailable. It’s effective in cross-replication topology setups, but useless in other configurations.

The first_or_random algorithm solves the problem of the in_order algorithm. With in_order, if one replica goes down, the next one gets a double load while the remaining replicas handle the usual amount of traffic. When using the first_or_random algorithm, the load is evenly distributed among replicas that are still available.

It's possible to explicitly define what the first replica is by using the setting load_balancing_first_offset. This gives more control to rebalance query workloads among replicas.

Round Robin
代码语言:javascript
复制
load_balancing = round_robin

This algorithm uses a round-robin policy across replicas with the same number of errors (only the queries with round_robin policy is accounted).

2 distributed_replica_max_ignored_errors

  • Type: unsigned int
  • Default value: 0

The number of errors that will be ignored while choosing replicas (according to load_balancing algorithm).

See also:

3 priority

代码语言:javascript
复制
        <shard>
            <!-- Optional. Shard weight when writing data. Default: 1. -->
            <weight>1</weight>
            <!-- Optional. Whether to write data to just one of the replicas. Default: false (write data to all replicas). -->
            <internal_replication>false</internal_replication>
            <replica>
                <!-- Optional. Priority of the replica for load balancing (see also load_balancing setting). Default: 1 (less value has more priority). -->
                <priority>1</priority>
                <host>example01-01-1</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>example01-01-2</host>
                <port>9000</port>
            </replica>
        </shard>

4 load_balancing in users.xml

代码语言:javascript
复制
<?xml version="1.0"?>
<clickhouse>
    <profiles replace="replace">
        <default>
            <load_balancing>first_or_random</load_balancing>
        </default>
    </profiles>
</clickhouse>

5 load_balancing in metrika.xml

代码语言:javascript
复制
        <shard>
            <weight>1</weight>
            <internal_replication>false</internal_replication>
            <replica_list load_balancing="first_or_random">
            <replica>
                <host>example01-01-1</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>example01-01-2</host>
                <port>9000</port>
            </replica>
        </shard>

6 vcluster

代码语言:javascript
复制
<remote_servers>
    <cluster1>
    </cluster1>
    <cluster2>
    </cluster2>
</remote_servers>

7 Replicated + mutli cdw cluster + auxiliary_zookeepers

代码语言:javascript
复制
CREATE TABLE table_name
(
    EventDate DateTime,
    CounterID UInt32,
    UserID UInt32
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{layer}-{shard}/table_name', '{replica}')
PARTITION BY toYYYYMM(EventDate)
ORDER BY (CounterID, EventDate, intHash32(UserID))
SAMPLE BY intHash32(UserID)
代码语言:javascript
复制
<auxiliary_zookeepers>
    <zookeeper2>
        <node>
            <host>example_2_1</host>
            <port>2181</port>
        </node>
        <node>
            <host>example_2_2</host>
            <port>2181</port>
        </node>
        <node>
            <host>example_2_3</host>
            <port>2181</port>
        </node>
    </zookeeper2>
    <zookeeper3>
        <node>
            <host>example_3_1</host>
            <port>2181</port>
        </node>
    </zookeeper3>
</auxiliary_zookeepers>

二 方案

1 priority

查询弱隔离 调整 不同 replica priority 即可

代码语言:javascript
复制
        <shard>
            <!-- Optional. Shard weight when writing data. Default: 1. -->
            <weight>1</weight>
            <!-- Optional. Whether to write data to just one of the replicas. Default: false (write data to all replicas). -->
            <internal_replication>false</internal_replication>
            <replica>
                <!-- Optional. Priority of the replica for load balancing (see also load_balancing setting). Default: 1 (less value has more priority). -->
                <priority>1</priority>
                <host>example01-01-1</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>example01-01-2</host>
                <port>9000</port>
            </replica>
        </shard>

2 user.xml load_balancing

查询强隔离 可设置到不同 profile 粒度, 设置 first_or_random

代码语言:javascript
复制
<?xml version="1.0"?>
<clickhouse>
    <profiles replace="replace">
        <default>
            <load_balancing>first_or_random</load_balancing>
        </default>
    </profiles>
</clickhouse>

3 metika.xml load_balancing

代码语言:javascript
复制
    <shard>
        <weight>1</weight>
        <internal_replication>false</internal_replication>
        <replica_list load_balancing="first_or_random">
        <replica>
            <host>example01-01-1</host>
            <port>9000</port>
        </replica>
        <replica>
            <host>example01-01-2</host>
            <port>9000</port>
        </replica>
    </shard>

4 vcluster

读集群

写集群

创建Schema集群

代码语言:javascript
复制
<remote_servers>
    <read_cluster1>
         <shard>
            <replica>
                <host>example01-01-1</host>
                <port>9000</port>
            </replica>
        </shard>
    </read_cluster1>
    <write_cluster1>
        <shard>
            <replica>
                <host>example01-01-2</host>
                <port>9000</port>
            </replica>
        </shard>
    </write_cluster1>
    <schema_cluster1>
         <shard>
            <replica>
                <host>example01-01-1</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>example01-01-2</host>
                <port>9000</port>
            </replica>
        </shard>
    </schema_cluster1>
</remote_servers>

5 multi cluster

Cluster 1 + Cluster2 + auxiliary_zookeepers

On shard use on zookeeper

代码语言:javascript
复制
cluster1
<remote_servers>
    <write_cluster>
         <shard>
            <replica>
                <host>cluster1-shard1-replica1</host>
                <port>9000</port>
            </replica>
        </shard>
    </write_cluster>
    <cluster>
         <shard>
            <replica>
                <host>cluster1-shard1-replica1</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>cluster2-shard1-replica1</host>
                <port>9000</port>
            </replica>
        </shard>
    </cluster>
</remote_servers>
​
​
cluster2 
<remote_servers>
    <write_cluster>
         <shard>
            <replica>
                <host>cluster2-shard1-replica1</host>
                <port>9000</port>
            </replica>
        </shard>
    </write_cluster>
    <cluster>
         <shard>
            <replica>
                <host>cluster1-shard1-replica1</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>cluster2-shard1-replica1</host>
                <port>9000</port>
            </replica>
        </shard>
    </cluster>
</remote_servers>

6 其他

以上方案组合拳

原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。

如有侵权,请联系 cloudcommunity@tencent.com 删除。

原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。

如有侵权,请联系 cloudcommunity@tencent.com 删除。

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • 一 基础知识
    • 1 load_balancing
      • 2 distributed_replica_max_ignored_errors¶
        • 3 priority
          • 4 load_balancing in users.xml
            • 5 load_balancing in metrika.xml
              • 6 vcluster
                • 7 Replicated + mutli cdw cluster + auxiliary_zookeepers
                • 二 方案
                  • 1 priority
                    • 2 user.xml load_balancing
                      • 3 metika.xml load_balancing
                        • 4 vcluster
                          • 5 multi cluster
                            • 6 其他
                            相关产品与服务
                            数据库
                            云数据库为企业提供了完善的关系型数据库、非关系型数据库、分析型数据库和数据库生态工具。您可以通过产品选择和组合搭建,轻松实现高可靠、高可用性、高性能等数据库需求。云数据库服务也可大幅减少您的运维工作量,更专注于业务发展,让企业一站式享受数据上云及分布式架构的技术红利!
                            领券
                            问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档