
Broken pipe error when uploading data to Apache HBase

Stack Overflow user
Asked 2019-06-27 06:17:59
Answers: 1 · Views: 778 · Followers: 0 · Votes: 0

I am currently trying to load a large CSV into Apache HBase. The CSV is 50,000 columns wide and 15,000 rows long, and its values are all integers.

The HBase cluster runs on AWS EMR with ample memory (244 GB) and compute (4 nodes, 32 cores each).

I am trying to load the data into the database with the following Python script:

import happybase
import pandas as pd

connection = happybase.Connection('localhost')

familes = {
    's': dict(in_memory=True)
}

#connection.delete_table('exon', disable=True)
connection.create_table('exon', familes)

table = connection.table('exon')
df = pd.read_csv('exon.csv', nrows=1000)

col = list(df)
col = col[1:]


for index, row in df.iterrows():
    to_put = {}
    for col_name in col:
        to_put[('s:'+ col_name).encode('utf-8')] = str(row[col_name]).encode('utf-8')
    print('putting: ' + str(row[0]))
    table.put(row[0].encode('utf-8'), to_put)

When the script reads only the first few rows, it runs without any problem:

df = pd.read_csv('exon.csv', nrows=20)

However, reading more rows causes an error:

df = pd.read_csv('exon.csv', nrows=1000)

putting: F1S4_160106_001_B01
Traceback (most recent call last):
  File "load.py", line 25, in <module>
    table.put(row[0].encode('utf-8'), to_put)
  File "/usr/local/lib/python3.6/site-packages/happybase/table.py", line 464, in put
    batch.put(row, data)
  File "/usr/local/lib/python3.6/site-packages/happybase/batch.py", line 137, in __exit__
    self.send()
  File "/usr/local/lib/python3.6/site-packages/happybase/batch.py", line 60, in send
    self._table.connection.client.mutateRows(self._table.name, bms, {})
  File "/usr/local/lib64/python3.6/site-packages/thriftpy2/thrift.py", line 200, in _req
    self._send(_api, **kwargs)
  File "/usr/local/lib64/python3.6/site-packages/thriftpy2/thrift.py", line 210, in _send
    args.write(self._oprot)
  File "/usr/local/lib64/python3.6/site-packages/thriftpy2/thrift.py", line 153, in write
    oprot.write_struct(self)
  File "thriftpy2/protocol/cybin/cybin.pyx", line 477, in cybin.TCyBinaryProtocol.write_struct
  File "thriftpy2/protocol/cybin/cybin.pyx", line 474, in cybin.TCyBinaryProtocol.write_struct
  File "thriftpy2/protocol/cybin/cybin.pyx", line 212, in cybin.write_struct
  File "thriftpy2/protocol/cybin/cybin.pyx", line 356, in cybin.c_write_val
  File "thriftpy2/protocol/cybin/cybin.pyx", line 115, in cybin.write_list
  File "thriftpy2/protocol/cybin/cybin.pyx", line 362, in cybin.c_write_val
  File "thriftpy2/protocol/cybin/cybin.pyx", line 212, in cybin.write_struct
  File "thriftpy2/protocol/cybin/cybin.pyx", line 356, in cybin.c_write_val
  File "thriftpy2/protocol/cybin/cybin.pyx", line 115, in cybin.write_list
  File "thriftpy2/protocol/cybin/cybin.pyx", line 362, in cybin.c_write_val
  File "thriftpy2/protocol/cybin/cybin.pyx", line 209, in cybin.write_struct
  File "thriftpy2/protocol/cybin/cybin.pyx", line 71, in cybin.write_i08
  File "thriftpy2/transport/buffered/cybuffered.pyx", line 55, in thriftpy2.transport.buffered.cybuffered.TCyBufferedTransport.c_write
  File "thriftpy2/transport/buffered/cybuffered.pyx", line 80, in thriftpy2.transport.buffered.cybuffered.TCyBufferedTransport.c_flush
  File "/usr/local/lib64/python3.6/site-packages/thriftpy2/transport/socket.py", line 136, in write
    self.sock.sendall(buff)
BrokenPipeError: [Errno 32] Broken pipe

Am I inserting too much data at once? I also tried pushing the rows in batches, and the same problem occurred.

1 Answer

Stack Overflow user

Accepted answer

Answered 2019-06-27 07:54:25

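Found the error: I was calling pandas.read_csv after opening the HappyBase connection, so the connection timed out while the CSV was being read. Calling read_csv before opening the connection solved the problem.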
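A minimal sketch of that reordering, assuming the same `exon.csv` layout as in the question (first column is the row key, everything else goes into the `s` family). The helper names `row_to_put`, `load_rows`, and `main` are made up for illustration and are not part of HappyBase; the only library calls used are `pandas.read_csv`, `happybase.Connection`, `Connection.table`, `Table.put`, and `Connection.close`.

```python
def row_to_put(row_key, values, family="s"):
    """Build the (key, data) pair in the byte form that Table.put() takes."""
    data = {
        f"{family}:{name}".encode("utf-8"): str(value).encode("utf-8")
        for name, value in values.items()
    }
    return str(row_key).encode("utf-8"), data


def load_rows(table, rows):
    """Write (row_key, values) pairs one put() at a time.

    `table` only needs a put(key, data) method, so a happybase Table works,
    and so does a stand-in object when trying this out offline.
    """
    count = 0
    for row_key, values in rows:
        key, data = row_to_put(row_key, values)
        table.put(key, data)
        count += 1
    return count


def main(csv_path="exon.csv", host="localhost", table_name="exon"):
    # Imported here so the helpers above work without either library installed.
    import happybase
    import pandas as pd

    # 1. Do the slow part first: parse the CSV while no connection is open.
    df = pd.read_csv(csv_path, index_col=0)

    # 2. Open the connection only once the data is ready to be written,
    #    so it does not sit idle during parsing.
    connection = happybase.Connection(host)
    try:
        table = connection.table(table_name)
        rows = ((key, row.to_dict()) for key, row in df.iterrows())
        return load_rows(table, rows)
    finally:
        connection.close()
```

Calling `main()` on a machine that can reach the HBase Thrift server runs the load and assumes the `exon` table already exists. The important part is only the order of steps 1 and 2; whether a longer idle timeout on the Thrift side would also have avoided the error is not something this thread confirms.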

Votes: 0
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/56781616