Forklift ETL Basics (Part 1) (1)

franket
Published 2021-10-18 11:03:57
Included in the column: 技术杂记 (Technical Notes)

Preface

Forklift ETL is a Ruby-based toolset for running ETL jobs against MySQL and Elasticsearch.

Forklift is a ruby gem that makes it easy for you to move your data around. Forklift can be an integral part of your datawarehouse pipeline or a backup tool. Forklift can collect and collapse data from multiple sources or across a single source

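For an introduction to ETL, see the earlier blog post ETL (Extract-Transform-Load) with Kiba.

ETL consists of three main steps:

  • Data extraction: read data from various data sources
  • Data transformation: process the data as needed to fit the requirements
  • Data loading: save the result to a suitable destination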
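
The three steps above can be sketched in plain Ruby. This is only an illustration of the extract/transform/load split, not how forklift_etl implements it: the helper names (extract, transform, load_rows) and the sample rows are assumptions made for the example, and an in-memory array stands in for a real database.

```ruby
# Extract: read rows from a source. Each row is copied so that
# later steps never modify the source data.
def extract(source)
  source.map(&:dup)
end

# Transform: drop rows without an email and normalize the name field.
def transform(rows)
  rows.reject { |row| row[:email].nil? }
      .map { |row| row.merge(name: row[:name].strip.capitalize) }
end

# Load: append the processed rows to the destination.
def load_rows(rows, destination)
  destination.concat(rows)
end

# Sample data (made up for this sketch).
source = [
  { id: 1, name: ' alice ', email: 'alice@example.com' },
  { id: 2, name: 'bob',     email: nil }
]
destination = []

load_rows(transform(extract(source)), destination)

destination.each do |row|
  puts "#{row[:id]} #{row[:name]} #{row[:email]}"
end
# prints: 1 Alice alice@example.com
```

Keeping the three stages as separate methods makes each one easy to test or replace on its own, for example by swapping the array source for a database query.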

This post covers the basics of forklift_etl; for details, see the forklift project.

Tip: at the time of writing, the latest version was forklift_etl (1.2.2)


Overview


Environment

h102

[root@h102 ~]# ruby -v
ruby 2.3.0p0 (2015-12-25 revision 53290) [x86_64-linux]
[root@h102 ~]# gem --version
2.5.1
[root@h102 ~]# cat /etc/issue
CentOS release 6.6 (Final)
Kernel \r on an \m

[root@h102 ~]# uname  -a 
Linux h102.temp 2.6.32-504.el6.x86_64 #1 SMP Wed Oct 15 04:27:16 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
[root@h102 ~]#  

h105

[root@h105 ~]# cat /etc/issue
CentOS release 6.6 (Final)
Kernel \r on an \m

[root@h105 ~]# uname -a 
Linux h105 2.6.32-504.el6.x86_64 #1 SMP Wed Oct 15 04:27:16 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
[root@h105 ~]# mysql -u root -p 
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 4
Server version: 5.6.27-76.0-log Percona Server (GPL), Release 76.0, Revision 5498987

Copyright (c) 2009-2015 Percona LLC and/or its affiliates
Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| Syslog             |
| db_d               |
| db_s               |
| mysql              |
| performance_schema |
| test               |
| testxxx            |
+--------------------+
8 rows in set (0.00 sec)

mysql>

Creating an ETL project

Configuration and dependencies

[root@h102 ~]# mkdir forklift
[root@h102 ~]# cd forklift/
[root@h102 forklift]# vim Gemfile 
[root@h102 forklift]# cat Gemfile 
source 'https://gems.ruby-china.org'


gem 'forklift_etl'
[root@h102 forklift]# bundle install 
Don't run Bundler as root. Bundler can ask for sudo if it is needed, and installing your bundle as root will break this application
for all non-root users on this machine.
Fetching gem metadata from https://gems.ruby-china.org/
Fetching version metadata from https://gems.ruby-china.org/
Fetching dependency metadata from https://gems.ruby-china.org/
Resolving dependencies...
Using i18n 0.7.0
Using json 1.8.3
Using minitest 5.9.0
Using thread_safe 0.3.5
Using multi_json 1.12.1
Using multipart-post 2.0.0
Using lumberjack 1.0.10
Using mysql2 0.4.4
Using mime-types-data 3.2016.0521
Using bundler 1.12.5
Using tzinfo 1.2.2
Using elasticsearch-api 1.1.0
Using faraday 0.9.2
Using mime-types 3.1
Using activesupport 4.2.7
Using elasticsearch-transport 1.1.0
Using mail 2.6.4
Using elasticsearch 1.1.0
Using pony 1.11
Installing forklift_etl 1.2.2
Bundle complete! 1 Gemfile dependency, 20 gems now installed.
Use `bundle show [gemname]` to see where a bundled gem is installed.
[root@h102 forklift]#

Generating the project

Run bundle exec forklift --generate to create the project structure in the current directory.

[root@h102 forklift]# bundle exec forklift --generate
Example plan generated
Example plan generated
Example plan generated
Example plan generated
Example plan generated
[root@h102 forklift]# ls
config  Gemfile  Gemfile.lock  log  patterns  pid  plan.rb  template  transformations  transports
[root@h102 forklift]# tree 
.
├── config
│   ├── connections
│   │   ├── csv
│   │   ├── elasticsearch
│   │   └── mysql
│   │       ├── destination.yml
│   │       └── source.yml
│   └── email.yml
├── Gemfile
├── Gemfile.lock
├── log
├── patterns
├── pid
├── plan.rb
├── template
│   └── email.erb
├── transformations
└── transports

11 directories, 7 files
[root@h102 forklift]#

Source of the forklift command

The forklift bootstrap script

[root@h102 ~]# which forklift
/usr/local/rvm/gems/ruby-2.3.0/bin/forklift
[root@h102 ~]# cat /usr/local/rvm/gems/ruby-2.3.0/bin/forklift
#!/usr/bin/env ruby_executable_hooks
#
# This file was generated by RubyGems.
#
# The application 'forklift_etl' is installed as part of a gem, and
# this file is here to facilitate running it.
#

require 'rubygems'

version = ">= 0.a"

if ARGV.first
  str = ARGV.first
  str = str.dup.force_encoding("BINARY") if str.respond_to? :force_encoding
  if str =~ /\A_(.*)_\z/ and Gem::Version.correct?($1) then
    version = $1
    ARGV.shift
  end
end

gem 'forklift_etl', version
load Gem.bin_path('forklift_etl', 'forklift', version)
[root@h102 ~]# 

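This wrapper script, generated by RubyGems, handles version selection. If the first argument has the form _VERSION_ (for example _1.2.2_) and is a valid gem version, that version is used and the argument is removed from ARGV; otherwise the requirement stays at the default ">= 0.a", which matches any installed version.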
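
The argument handling in the wrapper script shown above can be reproduced in isolation. The sketch below uses a hypothetical helper (pick_version is not part of RubyGems or forklift_etl) that applies the same regular expression and the same Gem::Version.correct? check as the generated script, and returns the chosen version requirement together with the remaining arguments.

```ruby
require 'rubygems'

# Hypothetical helper mirroring the wrapper's argument handling.
# Returns [version_requirement, remaining_args] without touching ARGV.
def pick_version(argv, default = '>= 0.a')
  args = argv.dup
  first = args.first
  match = first && first.match(/\A_(.*)_\z/)

  if match && Gem::Version.correct?(match[1])
    args.shift               # the _VERSION_ selector is consumed
    [match[1], args]
  else
    [default, args]          # no valid selector: keep the default requirement
  end
end

p pick_version(['_1.2.2_', '--generate'])
# prints ["1.2.2", ["--generate"]]

p pick_version(['--generate'])
# prints [">= 0.a", ["--generate"]]

p pick_version(['_latest_', 'plan.rb'])
# prints [">= 0.a", ["_latest_", "plan.rb"]]
```

The last call shows that an argument wrapped in underscores is only treated as a selector when its content is a well-formed version string; anything else is passed through to the program unchanged.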

The line that actually runs the program is:

load Gem.bin_path('forklift_etl', 'forklift', version)

In effect, it loads the file /usr/local/rvm/gems/ruby-2.3.0/gems/forklift_etl-1.2.2/bin/forklift, as the following command confirms:

[root@h102 ~]# ruby -e "puts Gem.bin_path('forklift_etl', 'forklift', '>= 0.a')"
/usr/local/rvm/gems/ruby-2.3.0/gems/forklift_etl-1.2.2/bin/forklift
[root@h102 ~]# 
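
The path printed above follows the usual layout of an installed gem: the gem home, then gems/, then name-version, then the executable directory. The helper below (expected_bin_path, a hypothetical name) only rebuilds that string from its parts to make the layout explicit. It assumes the gem keeps its executables in bin/ and does not consult RubyGems at all, so a real lookup should still go through Gem.bin_path.

```ruby
# Hypothetical helper: rebuilds the on-disk location of a gem executable
# from its parts. It illustrates the directory layout only and does not
# check that the gem is installed.
def expected_bin_path(gem_home, name, version, exec_name, bindir = 'bin')
  File.join(gem_home, 'gems', "#{name}-#{version}", bindir, exec_name)
end

puts expected_bin_path('/usr/local/rvm/gems/ruby-2.3.0', 'forklift_etl', '1.2.2', 'forklift')
# prints /usr/local/rvm/gems/ruby-2.3.0/gems/forklift_etl-1.2.2/bin/forklift
```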

This article is a repost.
