Importing a zip file into a Python Notebook in IBM Data Science Experience (IBM DSX)

Stack Overflow user
Asked 2017-09-19 10:36:43
Answers: 2 · Views: 669 · Followers: 0 · Votes: 1

I have a compressed file, train.zip (1.1 it), that I want to import into a Python Notebook, unzip, and then work with. I am importing it as a StringIO object using the Insert StringIO Object option.

from io import StringIO
import requests
import json
import pandas as pd

# @hidden_cell
# This function accesses a file in your Object Storage. The definition contains your credentials.
# You might want to remove those credentials before you share your notebook.
def get_object_storage_file_with_credentials_xxxxxx(container, filename):
    """This functions returns a StringIO object containing
    the file content from Bluemix Object Storage."""

    url1 = ''.join(['https://identity.open.softlayer.com', '/v3/auth/tokens'])
    data = {'auth': {'identity': {'methods': ['password'],
            'password': {'user': {'name': 'member_xxxxxx','domain': {'id': 'xxxxxxx'},
            'password': 'xxxxx),(xxxxx'}}}}}
    headers1 = {'Content-Type': 'application/json'}
    resp1 = requests.post(url=url1, data=json.dumps(data), headers=headers1)
    resp1_body = resp1.json()
    for e1 in resp1_body['token']['catalog']:
        if(e1['type']=='object-store'):
            for e2 in e1['endpoints']:
                        if(e2['interface']=='public'and e2['region']=='dallas'):
                            url2 = ''.join([e2['url'],'/', container, '/', filename])
    s_subject_token = resp1.headers['x-subject-token']
    headers2 = {'X-Auth-Token': s_subject_token, 'accept': 'application/json'}
    resp2 = requests.get(url=url2, headers=headers2)
    return StringIO(resp2.text)

# Your data file was loaded into a StringIO object and you can process the data.
# Please read the documentation of pandas to learn more about your possibilities to load your data.
# pandas documentation: http://pandas.pydata.org/pandas-docs/stable/io.html
data_1 = get_object_storage_file_with_credentials_20e75635ab104e58bd1a6e91635fed51('DefaultProjectxxxxxxxx', 'train.zip')

This gives the output:

data_1
<_io.StringIO at 0x7f8a288cd3a8>

However, when I try to unzip it with ZipFile, I get the following error:

from zipfile import ZipFile
file = ZipFile(data_1)

BadZipFile: File is not a zip file

How can I access the file in IBM DSX?

2 Answers

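Stack Overflow user

Answered 2017-10-24 03:54:18

You can save the zip file from Object Storage using a function like the one below. The credentials argument is the dictionary that DSX inserts into the notebook code. This function is also on gist.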

import zipfile
from io import BytesIO
import requests
import json
import pandas as pd

def get_zip_file(credentials):

    # Request an auth token from the identity service.
    url1 = ''.join(['https://identity.open.softlayer.com', '/v3/auth/tokens'])
    data = {'auth': {'identity': {'methods': ['password'], 'password': {'user': {'name': credentials['username'],'domain': {'id': credentials['domain_id']}, 'password': credentials['password']}}}}}
    headers1 = {'Content-Type': 'application/json'}
    resp1 = requests.post(url=url1, data=json.dumps(data), headers=headers1)
    resp1_body = resp1.json()
    # Find the public object-store endpoint for the requested region and build the file URL.
    for e1 in resp1_body['token']['catalog']:
        if e1['type'] == 'object-store':
            for e2 in e1['endpoints']:
                if e2['interface'] == 'public' and e2['region'] == credentials['region']:
                    url2 = ''.join([e2['url'], '/', credentials['container'], '/', credentials['filename']])

    s_subject_token = resp1.headers['x-subject-token']
    headers2 = {'X-Auth-Token': s_subject_token, 'accept': 'application/json'}
    r = requests.get(url=url2, headers=headers2, stream=True)

    z = zipfile.ZipFile(BytesIO(r.content))
    z.extractall()# save zip contents to disk

    return(z)

z = get_zip_file(credentials)
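
One thing to be aware of with the function above: reading r.content pulls the whole response body into memory, even with stream=True. For a large archive it may be gentler to write the download to disk in chunks and open the zip from the file. Below is a minimal sketch of that idea, not the answerer's code. The helper names save_chunks and extract_zip are made up for illustration, and the demo feeds in fake chunks so it runs without network access. With requests, the chunks would presumably come from r.iter_content(chunk_size=...).

```python
import os
import tempfile
import zipfile


def save_chunks(chunks, path):
    """Write an iterable of byte chunks to `path`; return the number of bytes written."""
    written = 0
    with open(path, "wb") as fh:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                fh.write(chunk)
                written += len(chunk)
    return written


def extract_zip(path, dest):
    """Extract the archive at `path` into `dest`; return the member names."""
    with zipfile.ZipFile(path) as zf:
        zf.extractall(dest)
        return zf.namelist()


# Demo without network access: build a tiny archive and replay it in 16-byte chunks.
# In the notebook, the chunks would instead come from the streamed response, e.g.
#   r = requests.get(url=url2, headers=headers2, stream=True)
#   save_chunks(r.iter_content(chunk_size=1024 * 1024), "train.zip")
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "source.zip")
with zipfile.ZipFile(source, "w") as zf:
    zf.writestr("train.csv", "id,label\n1,0\n")
with open(source, "rb") as fh:
    payload = fh.read()

fake_chunks = (payload[i:i + 16] for i in range(0, len(payload), 16))
target = os.path.join(workdir, "train.zip")
n_bytes = save_chunks(fake_chunks, target)
names = extract_zip(target, os.path.join(workdir, "extracted"))
```

Opening the archive from a file on disk also lets ZipFile seek around in it without holding a second copy of the bytes in memory.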
Votes: 1

Stack Overflow user

Answered 2017-09-19 13:36:22

The ZipFile constructor expects a file name (or a binary file-like object), not the file contents as text. See the solution here: Unzip buffer with Python?
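
To illustrate the point, here is a small sketch that runs without Object Storage. It assumes the root cause in the question is that resp2.text decodes the binary archive into a string, so the StringIO object no longer holds valid zip data. Wrapping the raw bytes (resp2.content in the question's code) in BytesIO should avoid that. The helper names make_zip_bytes and read_zip_from_bytes are made up for illustration, and the in-memory archive stands in for the downloaded train.zip.

```python
import io
import zipfile


def make_zip_bytes(files):
    """Build a small zip archive in memory (a stand-in for the downloaded bytes)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, text in files.items():
            zf.writestr(name, text)
    return buf.getvalue()


def read_zip_from_bytes(raw):
    """Open a zip archive held as bytes and return {member name: member bytes}."""
    with zipfile.ZipFile(io.BytesIO(raw)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}


raw = make_zip_bytes({"train.csv": "id,label\n1,0\n"})
contents = read_zip_from_bytes(raw)
```

In the question's function, the corresponding change would be to return BytesIO(resp2.content) instead of StringIO(resp2.text).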

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/46290941
