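TiDB is an open-source NewSQL database. Here is the official description:
TiDB is an open-source distributed relational database independently designed and developed by PingCAP. It is a converged distributed database product that supports both online transactional and online analytical processing (Hybrid Transactional and Analytical Processing, HTAP), with key features including horizontal scale-out and scale-in, financial-grade high availability, real-time HTAP, a cloud-native distributed architecture, and compatibility with the MySQL 5.7 protocol and the MySQL ecosystem. Its goal is to give users a one-stop solution for OLTP (Online Transactional Processing), OLAP (Online Analytical Processing), and HTAP. TiDB suits scenarios that demand high availability, strong consistency, and large data volumes.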
A few key points stand out here:
Basic concepts
Here are several important concepts in TiDB:
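Test environment
Because my local resources are limited, we use a quick deployment method.
There are two ways to deploy TiDB quickly:
Option 1: use TiUP Playground to quickly deploy a local test environment
Use case: quickly deploy a TiDB cluster on a local Mac or a single Linux machine, to get a feel for the basic architecture of a TiDB cluster and see basic components such as TiDB, TiKV, PD, and monitoring in operation.
Option 2: use TiUP cluster to simulate the production deployment steps on a single machine
Use case: experience the smallest complete TiDB topology on a single Linux server while simulating the production deployment steps.
I use the second option here.
According to the official documentation, TiDB has been extensively tested on CentOS 7.3, and deploying on CentOS 7.3 or later is recommended.
Local environment: VMware virtual machine running CentOS 7.6
Deployment
We follow the official steps for installation.
1. Disable the firewall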
systemctl stop firewalld
service iptables stop
2. Download and install TiUP. The command and its output are as follows:
[root@master ~]# curl --proto '=https' --tlsv1.2 -sSf https://tiup-mirrors.pingcap.com/install.sh | sh
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 8697k 100 8697k 0 0 4316k 0 0:00:02 0:00:02 --:--:-- 4318k
WARN: adding root certificate via internet: https://tiup-mirrors.pingcap.com/root.json
You can revoke this by remove /root/.tiup/bin/7b8e153f2e2d0928.root.json
Set mirror to https://tiup-mirrors.pingcap.com success
Detected shell: bash
Shell profile: /root/.bash_profile
/root/.bash_profile has been modified to add tiup to PATH
open a new terminal or source /root/.bash_profile to use it
Installed path: /root/.tiup/bin/tiup
===============================================
Have a try: tiup playground
===============================================
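3. Install the TiUP cluster component
First reload the shell profile so the updated PATH takes effect; otherwise the tiup command cannot be found:
source .bash_profile
Then run the command that installs the cluster component:
tiup cluster
The output is as follows: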
[root@master ~]# tiup cluster
The component `cluster` is not installed; downloading from repository.
download https://tiup-mirrors.pingcap.com/cluster-v1.3.1-linux-amd64.tar.gz 10.05 MiB / 10.05 MiB 100.00% 13.05 MiB p/s
Starting component `cluster`: /root/.tiup/components/cluster/v1.3.1/tiup-cluster
Deploy a TiDB cluster for production
Usage:
tiup cluster [command]
Available Commands:
check Perform preflight checks for the cluster.
deploy Deploy a cluster for production
start Start a TiDB cluster
stop Stop a TiDB cluster
restart Restart a TiDB cluster
scale-in Scale in a TiDB cluster
scale-out Scale out a TiDB cluster
destroy Destroy a specified cluster
clean (EXPERIMENTAL) Cleanup a specified cluster
upgrade Upgrade a specified TiDB cluster
exec Run shell command on host in the tidb cluster
display Display information of a TiDB cluster
prune Destroy and remove instances that is in tombstone state
list List all clusters
audit Show audit log of cluster operation
import Import an exist TiDB cluster from TiDB-Ansible
edit-config Edit TiDB cluster config.
Will use editor from environment variable `EDITOR`, default use vi
reload Reload a TiDB cluster's config and restart if needed
patch Replace the remote package with a specified package and restart the service
rename Rename the cluster
enable Enable a TiDB cluster automatically at boot
disable Disable starting a TiDB cluster automatically at boot
help Help about any command
Flags:
-h, --help help for tiup
--ssh string (EXPERIMENTAL) The executor type: 'builtin', 'system', 'none'.
--ssh-timeout uint Timeout in seconds to connect host via SSH, ignored for operations that don't need an SSH connection. (default 5)
-v, --version version for tiup
--wait-timeout uint Timeout in seconds to wait for an operation to complete, ignored for operations that don't fit. (default 120)
-y, --yes Skip all confirmations and assumes 'yes'
Use "tiup cluster help [command]" for more information about a command.
4. Raise the connection limit of the sshd service
This requires root privileges. Edit the following parameter in /etc/ssh/sshd_config:
MaxSessions 20
Then restart sshd:
[root@master ~]# service sshd restart
Redirecting to /bin/systemctl restart sshd.service
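If you prefer to script this change rather than edit the file by hand, the following is a minimal sketch. The helper name set_max_sessions is hypothetical (it is not part of OpenSSH or TiUP), and the demonstration runs against a scratch file rather than the live configuration. It rewrites an existing MaxSessions line, commented out or not, and appends one if none exists.

```shell
#!/bin/sh
# set_max_sessions FILE VALUE
# Hypothetical helper: set the MaxSessions directive in an sshd_config-style file.
set_max_sessions() {
    file=$1
    value=$2
    if grep -Eq '^[[:space:]]*#?[[:space:]]*MaxSessions[[:space:]]' "$file"; then
        # Rewrite the existing (possibly commented-out) directive in place,
        # keeping a backup copy with a .bak suffix.
        sed -i.bak -E "s/^[[:space:]]*#?[[:space:]]*MaxSessions[[:space:]].*/MaxSessions $value/" "$file"
    else
        # No directive present: append one at the end of the file.
        printf 'MaxSessions %s\n' "$value" >> "$file"
    fi
}

# Demonstration on a scratch file, not on /etc/ssh/sshd_config.
demo=$(mktemp)
printf '#MaxSessions 10\nPort 22\n' > "$demo"
set_max_sessions "$demo" 20
cat "$demo"
```

On the real host you would call it as root with /etc/ssh/sshd_config as the first argument and then restart sshd as shown above.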
5. Edit the cluster configuration template
We name this file topo.yaml. Its content is as follows:
# # Global variables are applied to all deployments and used as the default value of
# # the deployments if a specific deployment value is missing.
global:
 user: "tidb"
 ssh_port: 22
 deploy_dir: "/tidb-deploy"
 data_dir: "/tidb-data"

# # Monitored variables are applied to all the machines.
monitored:
 node_exporter_port: 9100
 blackbox_exporter_port: 9115

server_configs:
 tidb:
   log.slow-threshold: 300
 tikv:
   readpool.storage.use-unified-pool: false
   readpool.coprocessor.use-unified-pool: true
 pd:
   replication.enable-placement-rules: true
   replication.location-labels: ["host"]
 tiflash:
   logger.level: "info"

pd_servers:
 - host: 192.168.59.146

tidb_servers:
 - host: 192.168.59.146

tikv_servers:
 - host: 192.168.59.146
   port: 20160
   status_port: 20180
   config:
     server.labels: { host: "logic-host-1" }
 # - host: 192.168.59.146
 #   port: 20161
 #   status_port: 20181
 #   config:
 #     server.labels: { host: "logic-host-2" }
 # - host: 192.168.59.146
 #   port: 20162
 #   status_port: 20182
 #   config:
 #     server.labels: { host: "logic-host-3" }

tiflash_servers:
 - host: 192.168.59.146
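Note the following about this file:
The official template defines three tikv_servers nodes. I configured only one here because, when several nodes were configured locally, only one of them started successfully.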
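The commented-out entries in the template follow a regular pattern: each additional TiKV instance on the same host takes the next port, the next status port, and its own logic-host label. As a rough sketch of that pattern, the hypothetical gen_tikv_servers helper below (a convenience script, not a TiUP feature) prints such a section for a given host and instance count, starting from ports 20160 and 20180 as in the template.

```shell
#!/bin/sh
# gen_tikv_servers HOST COUNT
# Hypothetical helper: print a tikv_servers section with COUNT instances on
# one HOST, using consecutive ports starting at 20160 (service) and 20180 (status).
gen_tikv_servers() {
    host=$1
    count=$2
    echo "tikv_servers:"
    i=0
    while [ "$i" -lt "$count" ]; do
        printf ' - host: %s\n' "$host"
        printf '   port: %d\n' $((20160 + i))
        printf '   status_port: %d\n' $((20180 + i))
        printf '   config:\n'
        printf '     server.labels: { host: "logic-host-%d" }\n' $((i + 1))
        i=$((i + 1))
    done
}

# Print the three-instance layout used by the official template.
gen_tikv_servers 192.168.59.146 3
```

The output can be pasted over the tikv_servers section of topo.yaml; whether all instances then start still depends on the host, as the problems described later in this article show.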
6. Deploy the cluster
The deployment command is:
tiup cluster deploy <cluster-name> <tidb-version> ./topo.yaml --user root -p
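Here cluster-name is the name of the cluster and tidb-version is the TiDB version number; the available versions can be listed with the tiup list tidb command. This article uses v3.1.2 and names the cluster mytidb-cluster, so the command is: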
tiup cluster deploy mytidb-cluster v3.1.2 ./topo.yaml --user root -p
The log printed during deployment is as follows:
[root@master ~]# tiup cluster deploy mytidb-cluster v3.1.2 ./topo.yaml --user root -p
Starting component `cluster`: /root/.tiup/components/cluster/v1.3.1/tiup-cluster deploy mytidb-cluster v3.1.2 ./topo.yaml --user root -p
Please confirm your topology:
Cluster type: tidb
Cluster name: mytidb-cluster
Cluster version: v3.1.2
Type Host Ports OS/Arch Directories
---- ---- ----- ------- -----------
pd 192.168.59.146 2379/2380 linux/x86_64 /tidb-deploy/pd-2379,/tidb-data/pd-2379
tikv 192.168.59.146 20160/20180 linux/x86_64 /tidb-deploy/tikv-20160,/tidb-data/tikv-20160
tidb 192.168.59.146 4000/10080 linux/x86_64 /tidb-deploy/tidb-4000
tiflash 192.168.59.146 9000/8123/3930/20170/20292/8234 linux/x86_64 /tidb-deploy/tiflash-9000,/tidb-data/tiflash-9000
prometheus 192.168.59.146 9090 linux/x86_64 /tidb-deploy/prometheus-9090,/tidb-data/prometheus-9090
grafana 192.168.59.146 3000 linux/x86_64 /tidb-deploy/grafana-3000
Attention:
1. If the topology is not what you expected, check your yaml file.
2. Please confirm there is no port/directory conflicts in same host.
Do you want to continue? [y/N]: y
Input SSH password:
+ Generate SSH keys ... Done
+ Download TiDB components
- Download pd:v3.1.2 (linux/amd64) ... Done
- Download tikv:v3.1.2 (linux/amd64) ... Done
- Download tidb:v3.1.2 (linux/amd64) ... Done
- Download tiflash:v3.1.2 (linux/amd64) ... Done
- Download prometheus:v3.1.2 (linux/amd64) ... Done
- Download grafana:v3.1.2 (linux/amd64) ... Done
- Download node_exporter:v0.17.0 (linux/amd64) ... Done
- Download blackbox_exporter:v0.12.0 (linux/amd64) ... Done
+ Initialize target host environments
- Prepare 192.168.59.146:22 ... Done
+ Copy files
- Copy pd -> 192.168.59.146 ... Done
- Copy tikv -> 192.168.59.146 ... Done
- Copy tidb -> 192.168.59.146 ... Done
- Copy tiflash -> 192.168.59.146 ... Done
- Copy prometheus -> 192.168.59.146 ... Done
- Copy grafana -> 192.168.59.146 ... Done
- Copy node_exporter -> 192.168.59.146 ... Done
- Copy blackbox_exporter -> 192.168.59.146 ... Done
+ Check status
Enabling component pd
Enabling instance pd 192.168.59.146:2379
Enable pd 192.168.59.146:2379 success
Enabling component node_exporter
Enabling component blackbox_exporter
Enabling component tikv
Enabling instance tikv 192.168.59.146:20160
Enable tikv 192.168.59.146:20160 success
Enabling component tidb
Enabling instance tidb 192.168.59.146:4000
Enable tidb 192.168.59.146:4000 success
Enabling component tiflash
Enabling instance tiflash 192.168.59.146:9000
Enable tiflash 192.168.59.146:9000 success
Enabling component prometheus
Enabling instance prometheus 192.168.59.146:9090
Enable prometheus 192.168.59.146:9090 success
Enabling component grafana
Enabling instance grafana 192.168.59.146:3000
Enable grafana 192.168.59.146:3000 success
Cluster `mytidb-cluster` deployed successfully, you can start it with command: `tiup cluster start mytidb-cluster`
7. Start the cluster
The command is:
tiup cluster start mytidb-cluster
The log of a successful start is as follows:
[root@master ~]# tiup cluster start mytidb-cluster
Starting component `cluster`: /root/.tiup/components/cluster/v1.3.1/tiup-cluster start mytidb-cluster
Starting cluster mytidb-cluster...
+ [ Serial ] - SSHKeySet: privateKey=/root/.tiup/storage/cluster/clusters/mytidb-cluster/ssh/id_rsa, publicKey=/root/.tiup/storage/cluster/clusters/mytidb-cluster/ssh/id_rsa.pub
+ [Parallel] - UserSSH: user=tidb, host=192.168.59.146
+ [Parallel] - UserSSH: user=tidb, host=192.168.59.146
+ [Parallel] - UserSSH: user=tidb, host=192.168.59.146
+ [Parallel] - UserSSH: user=tidb, host=192.168.59.146
+ [Parallel] - UserSSH: user=tidb, host=192.168.59.146
+ [Parallel] - UserSSH: user=tidb, host=192.168.59.146
+ [ Serial ] - StartCluster
Starting component pd
Starting instance pd 192.168.59.146:2379
Start pd 192.168.59.146:2379 success
Starting component node_exporter
Starting instance 192.168.59.146
Start 192.168.59.146 success
Starting component blackbox_exporter
Starting instance 192.168.59.146
Start 192.168.59.146 success
Starting component tikv
Starting instance tikv 192.168.59.146:20160
Start tikv 192.168.59.146:20160 success
Starting component tidb
Starting instance tidb 192.168.59.146:4000
Start tidb 192.168.59.146:4000 success
Starting component tiflash
Starting instance tiflash 192.168.59.146:9000
Start tiflash 192.168.59.146:9000 success
Starting component prometheus
Starting instance prometheus 192.168.59.146:9090
Start prometheus 192.168.59.146:9090 success
Starting component grafana
Starting instance grafana 192.168.59.146:3000
Start grafana 192.168.59.146:3000 success
+ [ Serial ] - UpdateTopology: cluster=mytidb-cluster
Started cluster `mytidb-cluster` successfully
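8. Access the database
Because TiDB can be accessed with MySQL clients, we log in to TiDB with SQLyog: user name root, empty password, address 192.168.59.146, port 4000 (see the screenshot).
After a successful login, the left pane shows some of the tables that ship with TiDB (see the screenshot).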
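For readers without a GUI client, the same login can be tried from a terminal. The sketch below assumes the stock mysql command-line client is installed; tidb_sql is a hypothetical wrapper name, and MYSQL_BIN is an assumed override added here only so the call can be dry-run without a live cluster. The query uses TiDB's tidb_version() function.

```shell
#!/bin/sh
# tidb_sql HOST QUERY
# Hypothetical wrapper around the mysql command-line client. It connects as
# root with an empty password on port 4000, matching the settings above.
# MYSQL_BIN is an assumed override, e.g. MYSQL_BIN=echo for a dry run.
tidb_sql() {
    host=$1
    query=$2
    "${MYSQL_BIN:-mysql}" -h "$host" -P 4000 -u root -e "$query"
}

# Dry run in a subshell: print the client arguments instead of connecting.
( MYSQL_BIN=echo; tidb_sql 192.168.59.146 'SELECT tidb_version();' )
```

Without the MYSQL_BIN override, tidb_sql 192.168.59.146 'SELECT tidb_version();' runs the query against the cluster deployed in this article.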
9. Access the Grafana monitoring for TiDB
The access address is:
http://192.168.59.146:3000/login
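The initial user name and password are admin/admin. Change the password after logging in; once that succeeds, the page looks like this:
10. Dashboard
TiDB v3.x does not include the Dashboard; it was added in v4.0. The access address is: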
http://192.168.59.146:2379/dashboard
11. List the clusters
Command: tiup cluster list. The result is as follows:
[root@master /]# tiup cluster list
Starting component `cluster`: /root/.tiup/components/cluster/v1.3.1/tiup-cluster list
Name User Version Path PrivateKey
---- ---- ------- ---- ----------
mytidb-cluster tidb v3.1.2 /root/.tiup/storage/cluster/clusters/mytidb-cluster /root/.tiup/storage/cluster/clusters/mytidb-cluster/ssh/id_rsa
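12. View the cluster topology
The command is:
tiup cluster display mytidb-cluster
After entering the command, the output for my local cluster is as follows: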
[root@master /]# tiup cluster display mytidb-cluster
Starting component `cluster`: /root/.tiup/components/cluster/v1.3.1/tiup-cluster display mytidb-cluster
Cluster type: tidb
Cluster name: mytidb-cluster
Cluster version: v3.1.2
SSH type: builtin
ID Role Host Ports OS/Arch Status Data Dir Deploy Dir
-- ---- ---- ----- ------- ------ -------- ----------
192.168.59.146:3000 grafana 192.168.59.146 3000 linux/x86_64 Up - /tidb-deploy/grafana-3000
192.168.59.146:2379 pd 192.168.59.146 2379/2380 linux/x86_64 Up|L /tidb-data/pd-2379 /tidb-deploy/pd-2379
192.168.59.146:9090 prometheus 192.168.59.146 9090 linux/x86_64 Up /tidb-data/prometheus-9090 /tidb-deploy/prometheus-9090
192.168.59.146:4000 tidb 192.168.59.146 4000/10080 linux/x86_64 Up - /tidb-deploy/tidb-4000
192.168.59.146:9000 tiflash 192.168.59.146 9000/8123/3930/20170/20292/8234 linux/x86_64 Up /tidb-data/tiflash-9000 /tidb-deploy/tiflash-9000
192.168.59.146:20160 tikv 192.168.59.146 20160/20180 linux/x86_64 Up /tidb-data/tikv-20160 /tidb-deploy/tikv-20160
Total nodes: 6
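When a cluster has more instances, scanning the Status column by eye gets tedious. As a small sketch, the hypothetical list_not_up helper below reads the display output on standard input and prints the ID and status of every instance whose status does not start with Up. It assumes the column layout shown above, where Status is the sixth whitespace-separated field; the sample rows in the demonstration are made up for illustration.

```shell
#!/bin/sh
# list_not_up
# Hypothetical helper: read `tiup cluster display` output on stdin and print
# "ID STATUS" for each instance whose Status column does not start with "Up".
list_not_up() {
    # Instance rows are recognised by an ID ending in :<port>;
    # Status is the 6th whitespace-separated column.
    awk '$1 ~ /:[0-9]+$/ && $6 !~ /^Up/ { print $1, $6 }'
}

# Demonstration with illustrative rows (the Down row is invented).
printf '%s\n' \
    'ID Role Host Ports OS/Arch Status Data Dir Deploy Dir' \
    '192.168.59.146:2379 pd 192.168.59.146 2379/2380 linux/x86_64 Up|L /tidb-data/pd-2379 /tidb-deploy/pd-2379' \
    '192.168.59.146:20161 tikv 192.168.59.146 20161/20181 linux/x86_64 Down /tidb-data/tikv-20161 /tidb-deploy/tikv-20161' \
    | list_not_up
```

Against a real cluster the call would be tiup cluster display mytidb-cluster | list_not_up; empty output means every instance reports Up.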
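Problems encountered
Installing TiDB v4.0.9 deploys successfully but fails at startup. With three TiKV nodes configured in topo.yaml, only one TiKV instance starts successfully. The log is as follows: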
[root@master ~]# tiup cluster start mytidb-cluster
Starting component `cluster`: /root/.tiup/components/cluster/v1.3.1/tiup-cluster start mytidb-cluster
Starting cluster mytidb-cluster...
+ [ Serial ] - SSHKeySet: privateKey=/root/.tiup/storage/cluster/clusters/mytidb-cluster/ssh/id_rsa, publicKey=/root/.tiup/storage/cluster/clusters/mytidb-cluster/ssh/id_rsa.pub
+ [Parallel] - UserSSH: user=tidb, host=192.168.59.146
+ [Parallel] - UserSSH: user=tidb, host=192.168.59.146
+ [Parallel] - UserSSH: user=tidb, host=192.168.59.146
+ [Parallel] - UserSSH: user=tidb, host=192.168.59.146
+ [Parallel] - UserSSH: user=tidb, host=192.168.59.146
+ [Parallel] - UserSSH: user=tidb, host=192.168.59.146
+ [Parallel] - UserSSH: user=tidb, host=192.168.59.146
+ [Parallel] - UserSSH: user=tidb, host=192.168.59.146
+ [ Serial ] - StartCluster
Starting component pd
Starting instance pd 192.168.59.146:2379
Start pd 192.168.59.146:2379 success
Starting component node_exporter
Starting instance 192.168.59.146
Start 192.168.59.146 success
Starting component blackbox_exporter
Starting instance 192.168.59.146
Start 192.168.59.146 success
Starting component tikv
Starting instance tikv 192.168.59.146:20162
Starting instance tikv 192.168.59.146:20160
Starting instance tikv 192.168.59.146:20161
Start tikv 192.168.59.146:20162 success
Error: failed to start tikv: failed to start: tikv 192.168.59.146:20161, please check the instance's log(/tidb-deploy/tikv-20161/log) for more detail.: timed out waiting for port 20161 to be started after 2m0s
Verbose debug logs has been written to /root/.tiup/logs/tiup-cluster-debug-2021-01-05-19-58-46.log.
Error: run `/root/.tiup/components/cluster/v1.3.1/tiup-cluster` (wd:/root/.tiup/data/SLGrLJI) failed: exit status 1
The log file /tidb-deploy/tikv-20161/log/tikv.log reports that a file or directory cannot be found:
[2021/01/06 05:48:44.231 -05:00] [FATAL] [lib.rs:482] ["called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: \"No such file or directory\" }"] [backtrace="stack backtrace:\n 0: tikv_util::set_panic_hook::{{closure}}\n at components/tikv_util/src/lib.rs:481\n 1: std::panicking::rust_panic_with_hook\n at src/libstd/panicking.rs:475\n 2: rust_begin_unwind\n at src/libstd/panicking.rs:375\n 3: core::panicking::panic_fmt\n at src/libcore/panicking.rs:84\n 4: core::result::unwrap_failed\n at src/libcore/result.rs:1188\n 5: core::result::Result<T,E>::unwrap\n at /rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/libcore/result.rs:956\n cmd::server::TiKVServer::init_fs\n at cmd/src/server.rs:310\n cmd::server::run_tikv\n at cmd/src/server.rs:95\n 6: tikv_server::main\n at cmd/src/bin/tikv-server.rs:166\n 7: std::rt::lang_start::{{closure}}\n at /rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/libstd/rt.rs:67\n 8: main\n 9: __libc_start_main\n 10: <unknown>\n"] [location=src/libcore/result.rs:1188] [thread_name=main]
With a single TiKV node configured, startup still fails, this time at TiFlash. Here is the second half of the startup log:
Starting component pd
Starting instance pd 192.168.59.146:2379
Start pd 192.168.59.146:2379 success
Starting component node_exporter
Starting instance 192.168.59.146
Start 192.168.59.146 success
Starting component blackbox_exporter
Starting instance 192.168.59.146
Start 192.168.59.146 success
Starting component tikv
Starting instance tikv 192.168.59.146:20160
Start tikv 192.168.59.146:20160 success
Starting component tidb
Starting instance tidb 192.168.59.146:4000
Start tidb 192.168.59.146:4000 success
Starting component tiflash
Starting instance tiflash 192.168.59.146:9000
Error: failed to start tiflash: failed to start: tiflash 192.168.59.146:9000, please check the instance's log(/tidb-deploy/tiflash-9000/log) for more detail.: timed out waiting for port 9000 to be started after 2m0s
Verbose debug logs has been written to /root/.tiup/logs/tiup-cluster-debug-2021-01-06-20-02-13.log.
The log under /tidb-deploy/tiflash-9000/log contains the following:
[2021/01/06 20:06:26.207 -05:00] [INFO] [mod.rs:335] ["starting working thread"] [worker=region-collector-worker]
[2021/01/06 20:06:27.130 -05:00] [FATAL] [lib.rs:482] ["called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: \"No such file or directory\" }"] [backtrace="stack backtrace:\n 0: tikv_util::set_panic_hook::{{closure}}\n 1: std::panicking::rust_panic_with_hook\n at src/libstd/panicking.rs:475\n 2: rust_begin_unwind\n at src/libstd/panicking.rs:375\n 3: core::panicking::panic_fmt\n at src/libcore/panicking.rs:84\n 4: core::result::unwrap_failed\n at src/libcore/result.rs:1188\n 5: cmd::server::run_tikv\n 6: run_proxy\n 7: operator()\n at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Server/Server.cpp:415\n 8: execute_native_thread_routine\n at ../../../../../libstdc++-v3/src/c++11/thread.cc:83\n 9: start_thread\n 10: __clone\n"] [location=src/libcore/result.rs:1188] [thread_name=<unnamed>]
I also tried v4.0.1 and hit the same problem: both versions report a file-not-found error.
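The FATAL entries are easy to miss because each one carries a long backtrace. As a minimal sketch, the hypothetical fatal_messages helper below prints only the FATAL lines of a log file and cuts each line off before its backtrace field. It assumes the bracketed log format shown above; the sample log in the demonstration is abbreviated from the entries in this article.

```shell
#!/bin/sh
# fatal_messages LOGFILE
# Hypothetical helper: print the FATAL entries of a TiKV/TiFlash-style log,
# dropping everything from the [backtrace=...] field onward.
fatal_messages() {
    grep -F '[FATAL]' "$1" | sed 's/ \[backtrace=.*//'
}

# Demonstration on an abbreviated sample log.
sample=$(mktemp)
printf '%s\n' \
    '[2021/01/06 20:06:26.207 -05:00] [INFO] [mod.rs:335] ["starting working thread"] [worker=region-collector-worker]' \
    '[2021/01/06 20:06:27.130 -05:00] [FATAL] [lib.rs:482] ["called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: \"No such file or directory\" }"] [backtrace="stack backtrace: (abbreviated)"] [location=src/libcore/result.rs:1188] [thread_name=main]' \
    > "$sample"
fatal_messages "$sample"
```

On the host, a loop such as for f in /tidb-deploy/tikv-*/log/tikv.log; do fatal_messages "$f"; done would cover every TiKV instance deployed under the deploy_dir used in this article.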
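Summary
TiDB is relatively easy to deploy, but when a deployment fails, as with the v4.0.x versions in this article, the failure is hard to resolve: there is little relevant experience online and nothing on the official site, so the only option left is to dig through the source code.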