
Loading Data into Hive Tables in Multiple Ways

闵开慧
Published 2018-04-02 12:16:52
Hive official manual: http://slaytanic.blog.51cto.com/2057708/939950


1. Importing via an external table

Create an EXTERNAL table in Hive and specify an HDFS path at creation time. Copying data into that HDFS path then makes it immediately visible through the external table.

For example, prepare a file test.txt with fields separated by '\t':

$ cat test.txt
1       hello
2       world
3       test
4       case

Start Hive and create the external table:

$ hive
hive> CREATE EXTERNAL TABLE MYTEST(num INT, name STRING)
    > COMMENT 'this is a test'
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    > STORED AS TEXTFILE
    > LOCATION '/data/test';
OK
Time taken: 0.714 seconds
hive> show tables;
OK
mytest
partition_test
partition_test_input
test
Time taken: 0.07 seconds
hive> desc mytest;
OK
num     int
name    string
Time taken: 0.121 seconds

Copy the data to HDFS:

$ hadoop fs -put test.txt /data/test

Query the Hive table:

hive> select * from mytest;
OK
1       hello
2       world
3       test
4       case
Time taken: 0.375 seconds
hive> select num from mytest;
Total MapReduce jobs = 1
Launching Job 1 out of 1
......
Total MapReduce CPU Time Spent: 510 msec
OK
1
2
3
4
Time taken: 27.157 seconds

This approach is commonly used when historical data already sits on HDFS and we want to run Hive operations over it. Because the table simply points at the existing location, it avoids the cost of copying the data.
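The '\t'-delimited TEXTFILE layout that MYTEST reads is plain text, so it is easy to produce and check outside Hive. A minimal Python sketch (the file name test.txt and the sample rows come from the example above; the parsing mimics what ROW FORMAT DELIMITED does, it is not Hive itself):

```python
# Write the sample data in the '\t'-delimited TEXTFILE layout
# that the MYTEST table expects (num INT, name STRING).
rows = [(1, "hello"), (2, "world"), (3, "test"), (4, "case")]

with open("test.txt", "w") as f:
    for num, name in rows:
        f.write(f"{num}\t{name}\n")

# Read it back the way Hive's ROW FORMAT DELIMITED would:
# split each line on '\t' and cast the first field to INT.
parsed = []
with open("test.txt") as f:
    for line in f:
        num, name = line.rstrip("\n").split("\t")
        parsed.append((int(num), name))

print(parsed)  # [(1, 'hello'), (2, 'world'), (3, 'test'), (4, 'case')]
```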

2. Importing from the local filesystem

When the data is not yet on HDFS, it can be loaded into a Hive table directly from the local filesystem. The file /home/work/test.txt has the same content as above.

Create the table:

hive> CREATE TABLE MYTEST2(num INT, name STRING)
    > COMMENT 'this is a test2'
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    > STORED AS TEXTFILE;
OK
Time taken: 0.077 seconds

Load the data into the table:

hive> LOAD DATA LOCAL INPATH '/home/work/test.txt' INTO TABLE MYTEST2;
Copying data from file:/home/work/test.txt
Copying file: file:/home/work/test.txt
Loading data to table default.mytest2
OK
Time taken: 0.24 seconds

Check the data:

hive> select * from MYTEST2;
OK
1       hello
2       world
3       test
4       case
Time taken: 0.11 seconds

The local path loaded this way can be a single file, a directory, or a wildcard pattern. Note that if it is a directory, the directory must not contain subdirectories; likewise, a wildcard can only match files, not directories.
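The "no subdirectories" constraint can be checked before running LOAD DATA. A hypothetical pre-flight check in Python (the helper name hive_loadable and the demo directories are illustrative, not part of Hive):

```python
import os
import tempfile

def hive_loadable(path):
    """Return True if `path` is safe for LOAD DATA LOCAL INPATH:
    a plain file is always fine, and a directory is only loadable
    when it contains no subdirectories."""
    if os.path.isfile(path):
        return True
    entries = [os.path.join(path, e) for e in os.listdir(path)]
    return all(os.path.isfile(e) for e in entries)

# Demo with throwaway directories: a flat directory is loadable,
# a directory containing a subdirectory is not.
flat = tempfile.mkdtemp()
open(os.path.join(flat, "part-0.txt"), "w").close()
print(hive_loadable(flat))    # True

nested = tempfile.mkdtemp()
os.mkdir(os.path.join(nested, "sub"))
print(hive_loadable(nested))  # False
```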

3. Importing from HDFS

The test.txt file above has already been placed in /data/test, so the data can be loaded into a Hive table directly from HDFS:

hive> CREATE TABLE MYTEST3(num INT, name STRING)
    > COMMENT "this is a test3"
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    > STORED AS TEXTFILE;
OK
Time taken: 4.735 seconds
hive> LOAD DATA INPATH '/data/test/test.txt' INTO TABLE MYTEST3;
Loading data to table default.mytest3
OK
Time taken: 0.337 seconds
hive> select * from MYTEST3;
OK
1       hello
2       world
3       test
4       case
Time taken: 0.227 seconds

Note that LOAD DATA INPATH (without LOCAL) moves the file within HDFS into the table's directory rather than copying it.
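Because this form of LOAD DATA moves the source file, /data/test/test.txt no longer exists at its original HDFS path after the load. A small Python analogy of that move semantics, using local temp directories as stand-ins for the HDFS paths (the paths and contents are illustrative only):

```python
import os
import shutil
import tempfile

# Stand-ins: src_dir plays the role of /data/test,
# warehouse plays the role of the table's warehouse directory.
src_dir = tempfile.mkdtemp()
warehouse = tempfile.mkdtemp()

src = os.path.join(src_dir, "test.txt")
with open(src, "w") as f:
    f.write("1\thello\n")

# Like LOAD DATA INPATH, a move leaves nothing at the source path.
dst = os.path.join(warehouse, "test.txt")
shutil.move(src, dst)

print(os.path.exists(src))  # False
print(os.path.exists(dst))  # True
```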

4. Importing data from another table

An existing table can also be queried to populate a new one with INSERT ... SELECT:

hive> CREATE EXTERNAL TABLE MYTEST4(num INT);
OK
Time taken: 0.091 seconds
hive> FROM MYTEST3 test3
    > INSERT OVERWRITE TABLE MYTEST4
    > select test3.num where name='world';
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201207230024_0002, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201207230024_0002
Kill Command = /home/work/hadoop/hadoop-1.0.3/libexec/../bin/hadoop job  -Dmapred.job.tracker=localhost:9001 -kill job_201207230024_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2012-07-23 18:59:02,365 Stage-1 map = 0%,  reduce = 0%
2012-07-23 18:59:08,417 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.62 sec
......
2012-07-23 18:59:14,508 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 0.62 sec
MapReduce Total cumulative CPU time: 620 msec
Ended Job = job_201207230024_0002
Ended Job = -174856900, job is filtered out (removed at runtime).
Moving data to: hdfs://localhost:9000/tmp/hive-work/hive_2012-07-23_18-58-44_166_189728317691010041/-ext-10000
Loading data to table default.mytest4
Deleted hdfs://localhost:9000/user/hive/warehouse/mytest4
Table default.mytest4 stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 2, raw_data_size: 0]
1 Rows loaded to mytest4
MapReduce Jobs Launched:
Job 0: Map: 1   Accumulative CPU: 0.62 sec   HDFS Read: 242 HDFS Write: 2 SUCESS
Total MapReduce CPU Time Spent: 620 msec
OK
Time taken: 30.663 seconds
hive> select * from mytest4;
OK
2
Time taken: 0.103 seconds
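The query above keeps only the num values of rows whose name is 'world'. The same row-level logic, written over in-memory stand-ins for the tables in plain Python (the variable names mytest3/mytest4 mirror the table names but are just lists):

```python
# In-memory stand-in for MYTEST3's rows (num INT, name STRING).
mytest3 = [(1, "hello"), (2, "world"), (3, "test"), (4, "case")]

# Equivalent of:
#   FROM MYTEST3 test3
#   INSERT OVERWRITE TABLE MYTEST4
#   select test3.num where name='world';
mytest4 = [num for num, name in mytest3 if name == "world"]

print(mytest4)  # [2]
```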
