前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >mxgate是gpcopy同步速度的2倍

mxgate是gpcopy同步速度的2倍

原创
作者头像
小徐
修改2021-10-29 14:36:32
7960
修改2021-10-29 14:36:32
举报
文章被收录于专栏:GreenplumGreenplumGreenplum
1、构造测试的数据
2、查看数据的大小和行数
3、创建表语句
4、查看创建后的表
5、编写mxgate同步脚本
6、编写gpcopy同步脚本
7、查看硬件资源
     7、1 CPU和内存信息
     7、2 网络信息 
8、常见问题总结
    8、1 slice bounds out of range
     8、2 "gpcopy_temp_3" does not exist
8、3 Control connection pair should call `InitConnections` first
8、4 解决方式

集群的架构

序列

数据库

服务器

segment

master

访问端口

1

Greenplum6.0.0

sdw2

16 primary + 16 mirror

sdw3

5432

2

sdw3

3

sdw4

4

matrixdb4.2

sdw5

16 primary + 16 mirror

sdw7

5432

1、构造测试的数据

# 造10列100w设备的数据
time ./tsbs_generate_data --format="matrixdb" --use-case="massiveiot" --scale="1000000" --log-interval=60s --column-data-type=simple --point-type=int --columns-per-row=10 --timestamp-start="2016-01-01T00:00:00Z" --timestamp-end="2016-01-01T02:00:00Z" > /data/c1/cpu_10_100w.csv
 
# 造50列100w设备的数据
time ./tsbs_generate_data --format="matrixdb" --use-case="massiveiot" --scale="1000000" --log-interval=60s --column-data-type=simple --point-type=int --columns-per-row=50 --timestamp-start="2016-01-01T00:00:00Z" --timestamp-end="2016-01-01T02:00:00Z" > /data/c1/cpu_50_100w.csv
 
# 造100列100w设备的数据
time ./tsbs_generate_data --format="matrixdb" --use-case="massiveiot" --scale="1000000" --log-interval=60s --column-data-type=simple --point-type=int --columns-per-row=100 --timestamp-start="2016-01-01T00:00:00Z" --timestamp-end="2016-01-01T02:00:00Z" > /data/c1/cpu_100_100w.csv

2、查看数据的大小和行数

$ du -sh cpu_10_100w.csv
15G     cpu_10_100w.csv
 
$ du -sh cpu_50_100w.csv
63G     cpu_50_100w.csv
 
$ du -sh cpu_100_100w.csv
122G    cpu_100_100w.csv
 
 
$ wc -l cpu_10_100w.csv
120000000 cpu_10_100w.csv
 
$ wc -l cpu_50_100w.csv
120000000 cpu_50_100w.csv
 
$ wc -l cpu_100_100w.csv
120000000 cpu_100_100w.csv

3、创建表语句

# 创建创建表的function
create or replace function f(name, int, text) returns void as $$   
declare   
  res text := 'ts varchar(50),tag_id integer,';   
begin   
  for i in 1..$2 loop 
  res := res||'c'||i||' integer,';   
  end loop;   
  res := rtrim(res, ',');   
  if $3 = 'ao_col' then   
    res := 'create table '||$1||'('||res||') with(appendonly=true, compresstype=zstd, orientation=column) DISTRIBUTED BY (tag_id)';   
  elsif $3 = 'ao_row' then   
    res := 'create table '||$1||'('||res||') with(appendonly=true, blocksize=262144, orientation=row) DISTRIBUTED BY (tag_id)';   
  elsif $3 = 'heap_row' then   
    res := 'create table '||$1||'('||res||') with(appendonly=false) DISTRIBUTED BY (tag_id)';   
  else   
    raise notice 'use ao_col, ao_row, heap_row as $3';   
    return;   
  end if;   
  execute res;   
end;   
$$ language plpgsql;
 
 
# 创建表语句和加载语句
 
$ cat matrixdball--ao.sh
#bin/bash
 
col_arr=( 10 )
device_arr=( 100 )
 
for col in "${col_arr[@]}"
do
   for device in "${device_arr[@]}"
   do
      echo "${col},${device} heap"
          file_name="cpu_${col}_${device}w"
          table_name="cpu_${col}_${device}w_ao"
 
          echo "${col},${device} heap_index"
          psql -d gpdb -c  "drop table if exists ${table_name}"
          psql -d gpdb -c  "select f('${table_name}','${col}','ao_row');"
          psql -d gpdb -c  "create index on ${table_name} (tag_id, ts);create index on ${table_name} (ts);"
          time psql -d gpdb -c  "COPY cpu_10_100w_ao FROM '/home/gpadmin/1c/cpu_10_100w.csv' WITH csv DELIMITER '|' NULL as 'null string'"
   done
done
 
 
col_arr=( 50 )
device_arr=( 100 )
 
for col in "${col_arr[@]}"
do
   for device in "${device_arr[@]}"
   do
      echo "${col},${device} heap"
          file_name="cpu_${col}_${device}w"
          table_name="cpu_${col}_${device}w_ao"
 
          echo "${col},${device} heap_index"
          psql -d gpdb -c  "drop table if exists ${table_name}"
          psql -d gpdb -c  "select f('${table_name}','${col}','ao_row');"
          psql -d gpdb -c  "create index on ${table_name} (tag_id, ts);create index on ${table_name} (ts);"
          time psql -d gpdb -c  "COPY cpu_50_100w_ao FROM '/home/gpadmin/1c/cpu_50_100w.csv' WITH csv DELIMITER '|' NULL as 'null string'"
   done
done
 
 
col_arr=( 100 )
device_arr=( 100 )
 
for col in "${col_arr[@]}"
do
   for device in "${device_arr[@]}"
   do
      echo "${col},${device} heap"
          file_name="cpu_${col}_${device}w"
          table_name="cpu_${col}_${device}w_ao"
 
          echo "${col},${device} heap_index"
          psql -d gpdb -c  "drop table if exists ${table_name}"
          psql -d gpdb -c  "select f('${table_name}','${col}','ao_row');"
          psql -d gpdb -c  "create index on ${table_name} (tag_id, ts);create index on ${table_name} (ts);"
          time psql -d gpdb -c  "COPY cpu_100_100w_ao FROM '/home/gpadmin/1c/cpu_100_100w.csv' WITH csv DELIMITER '|' NULL as 'null string'"
   done
done
 
 
 
$ cat matrixdball--aoco.sh
#bin/bash
 
col_arr=( 10 )
device_arr=( 100 )
 
for col in "${col_arr[@]}"
do
   for device in "${device_arr[@]}"
   do
      echo "${col},${device} heap"
          file_name="cpu_${col}_${device}w"
          table_name="cpu_${col}_${device}w_aoco"
 
          echo "${col},${device} heap_index"
          psql -d gpdb -c  "drop table if exists ${table_name}"
          psql -d gpdb -c  "select f('${table_name}','${col}','ao_col');"
          psql -d gpdb -c  "create index on ${table_name} (tag_id, ts);create index on ${table_name} (ts);"
          time psql -d gpdb -c  "COPY cpu_10_100w_aoco FROM '/home/gpadmin/1c/cpu_10_100w.csv' WITH csv DELIMITER '|' NULL as 'null string'"
   done
done
 
 
col_arr=( 50 )
device_arr=( 100 )
 
for col in "${col_arr[@]}"
do
   for device in "${device_arr[@]}"
   do
      echo "${col},${device} heap"
          file_name="cpu_${col}_${device}w"
          table_name="cpu_${col}_${device}w_aoco"
 
          echo "${col},${device} heap_index"
          psql -d gpdb -c  "drop table if exists ${table_name}"
          psql -d gpdb -c  "select f('${table_name}','${col}','ao_col');"
          psql -d gpdb -c  "create index on ${table_name} (tag_id, ts);create index on ${table_name} (ts);"
          time psql -d gpdb -c  "COPY cpu_50_100w_aoco FROM '/home/gpadmin/1c/cpu_50_100w.csv' WITH csv DELIMITER '|' NULL as 'null string'"
   done
done
 
 
col_arr=( 100 )
device_arr=( 100 )
 
for col in "${col_arr[@]}"
do
   for device in "${device_arr[@]}"
   do
      echo "${col},${device} heap"
          file_name="cpu_${col}_${device}w"
          table_name="cpu_${col}_${device}w_aoco"
 
          echo "${col},${device} heap_index"
          psql -d gpdb -c  "drop table if exists ${table_name}"
          psql -d gpdb -c  "select f('${table_name}','${col}','ao_col');"
          psql -d gpdb -c  "create index on ${table_name} (tag_id, ts);create index on ${table_name} (ts);"
          time psql -d gpdb -c  "COPY cpu_100_100w_aoco FROM '/home/gpadmin/1c/cpu_100_100w.csv' WITH csv DELIMITER '|' NULL as 'null string'"
   done
done
 
 
 
$ cat matrixdball--heap.sh
#bin/bash
 
col_arr=( 10 )
device_arr=( 100 )
 
for col in "${col_arr[@]}"
do
   for device in "${device_arr[@]}"
   do
      echo "${col},${device} heap"
          file_name="cpu_${col}_${device}w"
          table_name="cpu_${col}_${device}w_heap"
 
          echo "${col},${device} heap_index"
          psql -d gpdb -c  "drop table if exists ${table_name}"
          psql -d gpdb -c  "select f('${table_name}','${col}','heap_row');"
          psql -d gpdb -c  "create index on ${table_name} (tag_id, ts);create index on ${table_name} (ts);"
          time psql -d gpdb -c  "COPY cpu_10_100w_heap FROM '/home/gpadmin/1c/cpu_10_100w.csv' WITH csv DELIMITER '|' NULL as 'null string'"
   done
done
 
 
col_arr=( 50 )
device_arr=( 100 )
 
for col in "${col_arr[@]}"
do
   for device in "${device_arr[@]}"
   do
      echo "${col},${device} heap"
          file_name="cpu_${col}_${device}w"
          table_name="cpu_${col}_${device}w_heap"
 
          echo "${col},${device} heap_index"
          psql -d gpdb -c  "drop table if exists ${table_name}"
          psql -d gpdb -c  "select f('${table_name}','${col}','heap_row');"
          psql -d gpdb -c  "create index on ${table_name} (tag_id, ts);create index on ${table_name} (ts);"
          time psql -d gpdb -c  "COPY cpu_50_100w_heap FROM '/home/gpadmin/1c/cpu_50_100w.csv' WITH csv DELIMITER '|' NULL as 'null string'"
   done
done
 
 
col_arr=( 100 )
device_arr=( 100 )
 
for col in "${col_arr[@]}"
do
   for device in "${device_arr[@]}"
   do
      echo "${col},${device} heap"
          file_name="cpu_${col}_${device}w"
          table_name="cpu_${col}_${device}w_heap"
 
          echo "${col},${device} heap_index"
          psql -d gpdb -c  "drop table if exists ${table_name}"
          psql -d gpdb -c  "select f('${table_name}','${col}','heap_row');"
          psql -d gpdb -c  "create index on ${table_name} (tag_id, ts);create index on ${table_name} (ts);"
          time psql -d gpdb -c  "COPY cpu_100_100w_heap FROM '/home/gpadmin/1c/cpu_100_100w.csv' WITH csv DELIMITER '|' NULL as 'null string'"
   done
done

4、查看创建后的表

# \dt*
                          List of relations
 Schema |       Name        | Type  |  Owner  |       Storage
--------+-------------------+-------+---------+----------------------
 public | cpu_100_100w_ao   | table | mxadmin | append only
 public | cpu_100_100w_aoco | table | mxadmin | append only columnar
 public | cpu_100_100w_heap | table | mxadmin | heap
 public | cpu_10_100w_ao    | table | mxadmin | append only
 public | cpu_10_100w_aoco  | table | mxadmin | append only columnar
 public | cpu_10_100w_heap  | table | mxadmin | heap
 public | cpu_50_100w_ao    | table | mxadmin | append only
 public | cpu_50_100w_aoco  | table | mxadmin | append only columnar
 public | cpu_50_100w_heap  | table | mxadmin | heap

5、编写mxgate同步脚本

以下脚本可以同步不同的表类型,heap、ao、aoco、分区表等

$  vim sync_mxgate.sh
 
#!bin/bash
 
mxgate --source transfer \
       --src-host 192.168.***.** \
       --src-port 5432 \
       --src-db postgres \
       --src-user gpadmin \
       --src-password  gpadmin \
       --src-schema public \
       --src-table cpu_100_100w_heap \
       --compress "lz4" \
       --local-ip 192.168.***.*** \
       --db-database postgres \
       --db-master-port 5432 \
       --db-user mxadmin \
       --db-password mxadmin \
       --target public.cpu_100_100w_heap \
       --format csv \
       --parallel 300 \
       --interval 100  \
       --stream-prepared 6 \
       --time-format raw

6、编写gpcopy同步脚本

gpcopy --source-host 192.168.***.*** \
        --source-port 5432 \
        --source-user gpadmin \
        --include-table postgres.public.cpu_100_100w_heap \
        --dest-host 192.168.***.*** \
        --dest-port 5432 \
        --dest-user mxadmin \
        --drop --exclude-table postgres.public.cpu_100_100w_heap \
        --jobs 200 /
        --on-segment-threshold 90000

7、查看硬件资源

7、1 CPU和内存信息

7、2 网络信息

8、常见问题总结

 8、1 slice bounds out of range

20211021:10:42:20 gpcopy:mxadmin:sdw7:026691-[INFO]:-Copy Timestamp = 20211021104220
20211021:10:42:20 gpcopy:mxadmin:sdw7:026691-[INFO]:-pg_dump (PostgreSQL) 12
20211021:10:42:20 gpcopy:mxadmin:sdw7:026691-[INFO]:-pg_dumpall (PostgreSQL) 12
20211021:10:42:21 gpcopy:mxadmin:sdw7:026691-[WARNING]:-pg_dump version is higher than source cluster, which might have compatibility issues
20211021:10:42:21 gpcopy:mxadmin:sdw7:026691-[ERROR]:-Caught an error: or: slice bounds out of range
20211021:10:42:21 gpcopy:mxadmin:sdw7:026691-[INFO]:-------------------------------------------------
20211021:10:42:21 gpcopy:mxadmin:sdw7:026691-[INFO]:-Total elapsed time: 953.799132ms
20211021:10:42:21 gpcopy:mxadmin:sdw7:026691-[INFO]:-Copied 0 databases

8、2 "gpcopy_temp_3" does not exist


failed to stop daemon on segments. pq: relation "gpcopy_temp_6" does not exist
Error: pq: relation "gpcopy_temp_3" does not exist

8、3 Control connection pair should call `InitConnections` first

20211025:13:32:01 gpcopy:mxadmin:sdw7:162638-[ERROR]:-Caught an error: or: slice bounds out of range [62:40]
20211025:13:32:01 gpcopy:mxadmin:sdw7:162638-[INFO]:-------------------------------------------------
20211025:13:32:01 gpcopy:mxadmin:sdw7:162638-[INFO]:-Total elapsed time: 74.028874ms
20211025:13:32:01 gpcopy:mxadmin:sdw7:162638-[INFO]:-Total transferred data 0B, transfer rate 0B/h
20211025:13:32:01 gpcopy:mxadmin:sdw7:162638-[INFO]:-Copied 0 databases
20211025:13:32:01 gpcopy:mxadmin:sdw7:162638-[WARNING]:-Encountered an error during cleanup: Control connection pair should call `InitConnections` first.

8、4 解决方式

1、检查greenplum和matrix的版本信息时候一致,因为gpcopy会校验版本

2、修改version信息需要在postgresql数据库下进行

3、替换一下红色的部门即可

CREATE OR REPLACE FUNCTION pg_catalog.version() RETURNS TEXT
AS $$
SELECT 'PostgreSQL 9.4.20 (Greenplum Database 6.0.0-beta.1 build commit:067fb8e43115bb087c872ee3d2269d869430263d) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Mar 19 2019 01:25:00';
$$ LANGUAGE SQL;

参考资料

https://ymatrix.cn/blog/load-test-steps

https://www.ymatrix.cn/blog/20210524-MatrixDB-insertperformance

原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。

如有侵权,请联系 cloudcommunity@tencent.com 删除。

原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。

如有侵权,请联系 cloudcommunity@tencent.com 删除。

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • 集群的架构
  • 1、构造测试的数据
  • 2、查看数据的大小和行数
  • 3、创建表语句
  • 4、查看创建后的表
  • 5、编写mxgate同步脚本
  • 6、编写gpcopy同步脚本
  • 7、查看硬件资源
    • 7、1 CPU和内存信息
      • 7、2 网络信息
      • 8、常见问题总结
        •  8、1 slice bounds out of range
          • 8、2 "gpcopy_temp_3" does not exist
            • 8、3 Control connection pair should call `InitConnections` first
              • 8、4 解决方式
              • 参考资料
              相关产品与服务
              数据库
              云数据库为企业提供了完善的关系型数据库、非关系型数据库、分析型数据库和数据库生态工具。您可以通过产品选择和组合搭建,轻松实现高可靠、高可用性、高性能等数据库需求。云数据库服务也可大幅减少您的运维工作量,更专注于业务发展,让企业一站式享受数据上云及分布式架构的技术红利!
              领券
              问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档