I have the following program to find large files:
import os, time, shelve

start = time.time()
root = '/'
# errors = set()
# dirs = set()

while True:
    try:
        root = os.path.abspath(root)  # ensure it's an absolute path
        # set the baseline as 100M
        # consider the shift
        baseline = 100 * 2**20  # 2**20 is 1M
        # set up to collect the large files
        large_files = []

        # root is a better choice as a concept
        for foldername, subfolders, files in os.walk(root):
            for f in files:
                # print(f"{foldername}, {f}")
                abspath = os.path.join(foldername, f)
                size = os.path.getsize(abspath)
                if size >= baseline:
                    large_files.append((os.path.basename(abspath), size))
                    print(abspath, size/(2**20))

        # write the large files to shelf
        shelf = shelve.open('/root/large_files.db')
        shelf["large_files"] = large_files
        shelf.close()
        if subfolders == []:
            end = time.time()
            break

    except (PermissionError, FileNotFoundError) as e:
        # errors.add(e)
        pass

It consistently outputs the same result:
[root@iz2ze9wve43n2nyuvmsfx5z ~]# python3 search_large_files.py
/dev/core 134217726.0078125
/dev/core 134217726.0078125
/dev/core 134217726.0078125
....

However, I cannot find any reason why
print(abspath, size/(2**20)) would keep doing this.
What might be the problem in my code?
Posted on 2018-08-28 11:20:39
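You have an infinite outer loop with while True:, and /dev/core is apparently the only file in your file system that exceeds the size specified by baseline, so the program prints the same file over and over again.
Remove while True: and un-indent the block inside it, and the code will work.
Note that your if subfolders == []: condition sits outside the for foldername, subfolders, files in os.walk(root): loop, so it is of little use. You should record the end time unconditionally, so simply remove the if condition and the break statement.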
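For illustration, here is a minimal sketch of how the scan might look once the outer loop is gone. It is not the asker's final code: the function name find_large_files and its baseline parameter are made up for this example, the shelve step is left out to keep it short, and the full path is stored instead of the basename. One assumption goes beyond the advice above: the try/except is moved around the single os.path.getsize call. In the original, any PermissionError or FileNotFoundError aborts the whole try block, which (with while True: in place) restarts the walk from the top, and without the loop would silently end the scan at the first unreadable file.

```python
import os


def find_large_files(root, baseline=100 * 2**20):
    """Return (path, size_in_bytes) pairs for files of at least `baseline` bytes.

    Hypothetical helper for illustration; the default baseline is 100 MiB.
    """
    large_files = []
    # A single pass over the tree: no outer `while True:` loop.
    for foldername, subfolders, files in os.walk(os.path.abspath(root)):
        for f in files:
            path = os.path.join(foldername, f)
            try:
                size = os.path.getsize(path)
            except OSError:
                # OSError covers PermissionError and FileNotFoundError
                # (for example a broken symlink). Skip only this file
                # and keep walking instead of abandoning the scan.
                continue
            if size >= baseline:
                large_files.append((path, size))
    return large_files
```

To time the scan, call time.time() before and after find_large_files('/') and subtract; the end time is then recorded unconditionally, as suggested above.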
https://stackoverflow.com/questions/52049336