Practical HBase Data Migration

Last updated: 2024-01-02 14:21:29

HBase tables are stored on Hadoop HDFS, so they can be migrated in two ways: at the HDFS level with Hadoop's distcp, or at the table level with the migration tools that HBase provides.


There are several strategies for HBase migration; among them, the snapshot-based method is the recommended approach.

Snapshot-Based HBase Migration

Note:
Perform all of the following steps as the hadoop user.
1. Create a table with the same structure on both the source cluster and the target cluster.
$ hbase shell
hbase> create 'myTable', 'cf1', 'cf2'
2. Initialize the table data on the source cluster.
$ hbase shell
hbase> put 'myTable', 'row1', 'cf1:a', 'value1'
hbase> put 'myTable', 'row2', 'cf2:b', 'value2'

hbase> scan 'myTable'
ROW                COLUMN+CELL
 row1              column=cf1:a, timestamp=2023-08-09T16:43:10.024, value=value1
 row2              column=cf2:b, timestamp=2023-08-09T16:43:20.036, value=value2
3. Create a snapshot on the source cluster using the HBase shell.
$ hbase shell
hbase> snapshot 'myTable', 'myTableSnapshot'
Here, 'myTable' is the table name in HBase, and 'myTableSnapshot' is the name of the snapshot. After creation, run list_snapshots to confirm that the snapshot exists. To remove a snapshot that is no longer needed, use delete_snapshot as shown below. Do not delete the snapshot before it has been exported in step 4.
hbase> delete_snapshot 'myTableSnapshot'
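Confirming that the snapshot exists can also be scripted. The following is a minimal sketch, assuming the non-interactive shell mode (hbase shell -n) is available in your HBase version. The helper name snapshot_listed is hypothetical and not part of HBase; it reads the list_snapshots output from standard input and succeeds only if the given snapshot name appears in it as a whole word.

```shell
#!/bin/sh
# Hypothetical helper (not an HBase command): succeeds if the snapshot name
# given as $1 appears as a whole word in the text read from standard input.
snapshot_listed() {
    grep -qwF -- "$1"
}

# Example use on the source cluster (assumes `hbase shell -n` is supported):
#   echo "list_snapshots" | hbase shell -n | snapshot_listed myTableSnapshot \
#       && echo "snapshot found" || echo "snapshot missing"
```

The check is kept separate from the hbase call so that it can be tried against saved output before it is pointed at a live cluster.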
4. Export the snapshot from the source cluster to the target cluster.
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot myTableSnapshot -copy-to hdfs://10.0.0.38:4007/hbase/snapshot/myTableSnapshot
Here, 10.0.0.38:4007 is the $activeip:$rpcport of the target cluster, that is, the IP address and RPC port of its active NameNode. Exporting a snapshot launches a MapReduce job. You can append -mappers 16 -bandwidth 200 to the command to set the number of mappers and the bandwidth limit; the 200 here means 200 MB/s.
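To make the optional parameters concrete, here is a small sketch that assembles the full export command with -mappers and -bandwidth appended and prints it for review. The function name build_export_cmd is hypothetical; the snapshot name, destination address, and the values 16 and 200 are the ones used in this article and should be replaced with your own.

```shell
#!/bin/sh
# Hypothetical helper: print the ExportSnapshot command line.
# Arguments: snapshot name, destination URI, mapper count, bandwidth in MB/s.
build_export_cmd() {
    printf 'hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot %s -copy-to %s -mappers %s -bandwidth %s\n' \
        "$1" "$2" "$3" "$4"
}

# Print the command only; nothing is executed against the cluster here.
build_export_cmd myTableSnapshot hdfs://10.0.0.38:4007/hbase/snapshot/myTableSnapshot 16 200
```

After checking the printed line, run it on the source cluster as the hadoop user. Printing first makes it easier to catch a wrong address or port before a MapReduce job is started.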
5. On the target cluster, copy the exported snapshot into the HBase directory on HDFS by running the following command.
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot myTableSnapshot -copy-from /hbase/snapshot/myTableSnapshot -copy-to /hbase/
6. On the target cluster, restore the HBase table and its data from the snapshot.
hbase> disable 'myTable'
hbase> restore_snapshot 'myTableSnapshot'
hbase> enable 'myTable'
7. Finally, verify the migration with simple HBase table operations on the target cluster, such as scanning the table.
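The final check could look like the sketch below. It assumes that hbase shell -n accepts commands on standard input in your HBase version. The function name verify_commands is hypothetical; it only prints three standard HBase shell commands (exists, count, and a limited scan) for the given table.

```shell
#!/bin/sh
# Hypothetical helper: print HBase shell commands that check a table.
# Argument: table name.
verify_commands() {
    printf "exists '%s'\ncount '%s'\nscan '%s', {LIMIT => 5}\n" "$1" "$1" "$1"
}

# Show the commands that would be sent to the HBase shell.
verify_commands myTable

# To run them on the target cluster (assumes `hbase shell -n` is supported):
#   verify_commands myTable | hbase shell -n
```

Running the same commands on the source cluster and comparing the row counts and the first few rows gives a quick, though not exhaustive, consistency check.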