Can a JSON SerDe be used with the RC or ORC file formats? I am trying to insert into a Hive table whose file format is ORC, with the data stored as serialized JSON in an Azure blob.
Posted on 2017-04-05 06:35:38
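For context, the kind of table the question describes would look roughly like this (a sketch only; the table, container, and account names are hypothetical, and wasb:// is Hadoop's URI scheme for Azure Blob Storage):

CREATE EXTERNAL TABLE events (myint INT, mystring STRING)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS ORC
LOCATION 'wasb://mycontainer@myaccount.blob.core.windows.net/events/';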
Apparently not.
-- write a JSON string into ORC files under a local directory
insert overwrite local directory '/home/cloudera/local/mytable'
stored as orc
select '{"mycol":123,"mystring":"Hello"}'
;
create external table verify_data (rec string)
stored as orc
location 'file:///home/cloudera/local/mytable'
;
select * from verify_data
;
rec
{"mycol":123,"mystring","Hello"}
create external table mytable (myint int,mystring string)
row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'
stored as orc
location 'file:///home/cloudera/local/mytable'
;
select * from mytable
;
myint mystring
Failed with exception java.io.IOException:java.lang.ClassCastException:
org.apache.hadoop.hive.ql.io.orc.OrcStruct cannot be cast to org.apache.hadoop.io.Text
The cause is visible in the JsonSerDe source: deserialize() casts the incoming Writable straight to Text, while the ORC reader supplies an OrcStruct.
...
import org.apache.hadoop.io.Text;
...
@Override
public Object deserialize(Writable blob) throws SerDeException {
    // ORC hands an OrcStruct here, not Text, hence the ClassCastException
    Text t = (Text) blob;
...
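For completeness: the JSON SerDe does work once the storage format actually hands it text, i.e. with stored as textfile. A minimal sketch (assuming the directory holds plain-text JSON files, not ORC):

create external table mytable_text (myint int, mystring string)
row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'
stored as textfile
location 'file:///home/cloudera/local/mytable_text'
;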
Posted on 2018-01-15 23:46:51
You can do this with a conversion step of some sort, e.g. a bucketing step that produces ORC files in the target directory, and then mount a Hive table with the same schema on top of that directory once the bucketing step has run. Like below.
CREATE EXTERNAL TABLE my_fact_orc
(
  mycol INT,
  mystring STRING
)
PARTITIONED BY (dt string)
CLUSTERED BY (mycol) INTO 64 BUCKETS
STORED AS ORC
LOCATION 's3://dev/my_fact_orc'
TBLPROPERTIES ('orc.compress'='SNAPPY');
ALTER TABLE my_fact_orc ADD IF NOT EXISTS PARTITION (dt='2017-09-07') LOCATION 's3://dev/my_fact_orc/dt=2017-09-07';
ALTER TABLE my_fact_orc PARTITION (dt='2017-09-07') SET FILEFORMAT ORC;
SELECT * FROM my_fact_orc WHERE dt='2017-09-07' LIMIT 5;
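The conversion step itself is only alluded to above; one plausible sketch, assuming a hypothetical text-format staging table my_fact_json that holds the raw JSON:

CREATE EXTERNAL TABLE my_fact_json (mycol INT, mystring STRING)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE
LOCATION 's3://dev/my_fact_json';

-- on older Hive versions, inserts into bucketed tables need this set:
SET hive.enforce.bucketing=true;

-- rewrite the JSON rows as ORC into the target partition
INSERT OVERWRITE TABLE my_fact_orc PARTITION (dt='2017-09-07')
SELECT mycol, mystring FROM my_fact_json;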
https://stackoverflow.com/questions/43220529