The <a href="https://docs.python.org/2.7/library/tarfile.html#tarfile.TarFile.extractfile" rel="noreferrer">docs</a> tell us that <code>extractfile()</code> returns <code>None</code> if the member is not a regular file or link.

One possible solution is to skip over the None results:

<pre><code>import tarfile

tar = tarfile.open("filename.tar.gz", "r:gz")
for member in tar.getmembers():
    f = tar.extractfile(member)
    if f is not None:  # skip directories and other non-file members
        content = f.read()
</code></pre>
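
To try the skip-<code>None</code> approach without a real archive, the sketch below builds a small tar.gz in memory and reads it back. The helper names <code>read_regular_members</code> and <code>build_sample_archive</code> and the member names are made up for illustration; only the <code>tarfile</code> calls come from the standard library.

```python
import io
import tarfile


def read_regular_members(fileobj):
    """Return {member name: bytes} for every member extractfile() can read."""
    contents = {}
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        for member in tar.getmembers():
            f = tar.extractfile(member)
            if f is None:  # directories, devices, FIFOs, ...
                continue
            contents[member.name] = f.read()
    return contents


def build_sample_archive():
    """Hypothetical sample: one directory entry and one regular file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        directory = tarfile.TarInfo("data")
        directory.type = tarfile.DIRTYPE
        tar.addfile(directory)

        payload = b"1 2 3\n"
        info = tarfile.TarInfo("data/values.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


result = read_regular_members(build_sample_archive())
print(result)
```

The directory entry is silently skipped, so only the regular file ends up in the result. For a file on disk, the same loop works with <code>tarfile.open("filename.tar.gz", "r:gz")</code>.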

<a href="https://docs.python.org/3/library/tarfile.html#tarfile.TarFile.extractfile" rel="noreferrer"><code>tarfile.extractfile()</code></a> can return <code>None</code> if the member is neither a file nor a link. For example, your tar archive might contain directories or device files. To fix:

<pre><code>import tarfile
import numpy as np

tar = tarfile.open("filename.tar.gz", "r:gz")
for member in tar.getmembers():
    f = tar.extractfile(member)
    if f:  # None for directories, device files, etc.
        # np.loadtxt expects a file name or file-like object, not raw bytes
        Data = np.loadtxt(f)
</code></pre>

You may try this one:

<pre><code>import tarfile

t = tarfile.open("filename.gz", "r")
for filename in t.getnames():
    try:
        f = t.extractfile(filename)
        Data = f.read()
        print(filename, ':', Data)
    except (KeyError, AttributeError):
        # KeyError: name not in the archive; AttributeError: f is None
        print('ERROR: Could not read %s from tar archive' % filename)
</code></pre>

My needs:

<ol>
<li>Python3.</li>
<li>My tar.gz file consists of multiple <code>utf-8</code> text files and directories.</li>
<li>Need to read text lines from all files.</li>
</ol>

Problems: 

<ol>
<li>The object returned by <code>tar.extractfile()</code> may be <code>None</code> for members that are not regular files.</li>
<li>The content that <code>extractfile(fname)</code> returns is a <code>bytes</code> object (e.g. <code>b'Hello\t\xe4\xbd\xa0\xe5\xa5\xbd'</code>), so non-ASCII characters are not displayed correctly.</li>
</ol>

Solutions:

<ol>
<li>Check the type of each tar member first. I followed the example in the <a href="https://docs.python.org/3/library/tarfile.html" rel="nofollow noreferrer">documentation</a> of the tarfile library (search for "How to read a gzip compressed tar archive and display some member information").</li>
<li>Decode the <code>bytes</code> object to a normal <code>str</code> (<a href="https://stackoverflow.com/questions/606191/convert-bytes-to-a-string">ref</a>, the most-voted answer).</li>
</ol>

Code:

<pre><code>import logging
import tarfile

logger = logging.getLogger(__name__)

with tarfile.open("sample.tar.gz", "r:gz") as tar:
    for tarinfo in tar:
        logger.info(f"{tarinfo.name} is {tarinfo.size} bytes in size and is: ")
        if tarinfo.isreg():
            logger.info(f"Is regular file: {tarinfo.name}")
            f = tar.extractfile(tarinfo.name)
            # To get the str instead of bytes str
            # Decode with proper coding, e.g. utf-8
            content = f.read().decode('utf-8', errors='ignore')
            # Split the long str into lines
            # Specify your line-sep: e.g. \n
            lines = content.split('\n')
            for i, line in enumerate(lines):
                print(f"[{i}]: {line}\n")
        elif tarinfo.isdir():
            logger.info(f"Is dir: {tarinfo.name}")
        else:
            logger.info(f"Is something else: {tarinfo.name}.")
</code></pre>
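
For a very large member, calling <code>f.read()</code> loads the whole file into memory before decoding. A possible variation, sketched here under the assumption that the members are UTF-8 text, wraps the extracted file in <code>io.TextIOWrapper</code> so lines are decoded one at a time. The names <code>iter_text_lines</code> and <code>build_sample_archive</code> and the sample content are hypothetical.

```python
import io
import tarfile


def iter_text_lines(fileobj, encoding="utf-8"):
    """Yield (member name, line number, line) for each regular file."""
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        for tarinfo in tar:
            if not tarinfo.isreg():
                continue  # directories and special files have no content
            raw = tar.extractfile(tarinfo)
            # TextIOWrapper decodes incrementally instead of reading everything
            with io.TextIOWrapper(raw, encoding=encoding) as text:
                for i, line in enumerate(text):
                    yield tarinfo.name, i, line.rstrip("\n")


def build_sample_archive():
    """Hypothetical sample archive holding one small UTF-8 text file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        payload = "Hello\t你好\nsecond line\n".encode("utf-8")
        info = tarfile.TarInfo("hello.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


lines = list(iter_text_lines(build_sample_archive()))
for name, i, line in lines:
    print(f"{name}[{i}]: {line}")
```

To read an archive from disk, pass an open binary file, for example <code>iter_text_lines(open("sample.tar.gz", "rb"))</code>.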

You cannot "read" the content of some special files such as links, yet tar supports them and tarfile will extract them all right. When <code>tarfile</code> extracts them, it does not return a file-like object but <code>None</code>. You get an error because your tarball contains such a special file.

One approach is to determine the type of each entry in the tarball before extracting it: with this information at hand you can decide whether or not you can "read" the file. You can achieve this by calling <code>TarFile.getmembers()</code>, which returns <code>tarfile.TarInfo</code> objects that contain detailed information about the type of each file in the tarball.

The <code>tarfile.TarInfo</code> class has all the attributes and methods you need to determine the type of a tar member, such as <code>isfile()</code>, <code>isdir()</code>, <code>islnk()</code> or <code>issym()</code>, so you can decide what to do with each member (extract or not, etc.).

For instance, I use these to test the type of file in <a href="https://github.com/nexB/scancode-toolkit/blob/68474f46e6bd125a6b4ee441ce760c6929e80482/src/extractcode/tar.py#L130" rel="nofollow noreferrer">this patched tarfile</a> to skip extracting special files and process links in a special way:

<pre><code>for tinfo in tar.getmembers():
    is_special = not (tinfo.isfile() or tinfo.isdir()
                      or tinfo.islnk() or tinfo.issym())
...
</code></pre>
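
The type checks described above can be tried on a small in-memory archive. This is only a sketch: <code>member_kind</code>, <code>build_sample_archive</code> and the member names are invented for the example, while the <code>TarInfo</code> methods and type constants are the standard library ones.

```python
import io
import tarfile


def member_kind(tinfo):
    """Map a TarInfo to a coarse label using its type-check methods."""
    if tinfo.isfile():
        return "file"
    if tinfo.isdir():
        return "dir"
    if tinfo.issym():
        return "symlink"
    if tinfo.islnk():
        return "hardlink"
    return "special"  # FIFOs, character/block devices, ...


def build_sample_archive():
    """Hypothetical archive with a directory, a file, a symlink and a FIFO."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        directory = tarfile.TarInfo("data")
        directory.type = tarfile.DIRTYPE
        tar.addfile(directory)

        payload = b"1 2 3\n"
        regular = tarfile.TarInfo("data/values.txt")
        regular.size = len(payload)
        tar.addfile(regular, io.BytesIO(payload))

        link = tarfile.TarInfo("data/link")
        link.type = tarfile.SYMTYPE
        link.linkname = "values.txt"
        tar.addfile(link)

        fifo = tarfile.TarInfo("data/pipe")
        fifo.type = tarfile.FIFOTYPE
        tar.addfile(fifo)
    buf.seek(0)
    return buf


with tarfile.open(fileobj=build_sample_archive(), mode="r:gz") as tar:
    kinds = {tinfo.name: member_kind(tinfo) for tinfo in tar.getmembers()}

for name, kind in kinds.items():
    print(f"{name}: {kind}")
```

With a mapping like this you can decide per member whether to call <code>extractfile()</code>, follow a link, or skip the entry.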

In a Jupyter notebook, you can download and extract the archive with a shell command like the one below:

<pre><code>!wget -c http://qwone.com/~jason/20Newsgroups/20news-bydate.tar.gz -O - | tar -xz
</code></pre>

I have a 25 GB text file, so I compressed it to tar.gz and it became 450 MB. Now I want to read that file from Python and process the text data. For this I referred to <a href="https://stackoverflow.com/questions/2018512/reading-tar-file-contents-without-untarring-it-in-python-script">this question</a>, but in my case the code doesn't work. The code is as follows:

<pre><code>import tarfile
import numpy as np

tar = tarfile.open("filename.tar.gz", "r:gz")
for member in tar.getmembers():
    f = tar.extractfile(member)
    content = f.read()
    Data = np.loadtxt(content)
</code></pre>

The error is as follows:

<pre><code>Traceback (most recent call last):
  File "dataExtPlot.py", line 21, in &lt;module&gt;
    content = f.read()
AttributeError: 'NoneType' object has no attribute 'read'
</code></pre>

Also, is there any other method to do this task?

Read .tar.gz file in Python
