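I am trying to summarize the number of files in each subdirectory on a Linux system into an Excel sheet.
The directories are usually laid out as maindir/person/task/somedata/files. However, the subdirectory layout varies (for example, some files may not have a 'task' directory), so I need Python to walk the file paths.
My problem is that I need all of the subdirectory names starting from 'person', and at the moment my code (shown below) only records the nearest directory together with a file count. If anyone could help me with this, I would greatly appreciate it!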
import os, sys, csv

outwriter = csv.writer(open("Subject_Task_Count.csv", 'w'))

dir_count=[]
os.chdir('./../../')
rootDir = "./" # set the directory you want to start from
for root, dirs, files in os.walk( rootDir ):
    for d in dirs:
        a = str(d)
        count = 0
        for f in files:
            count+=1
        y= (a,count)
        dir_count.append(y)

for i in dir_count:
    outwriter.writerow(i)
Posted on 2012-11-22 07:44:04
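Your question isn't entirely clear to me, and you may want to reread the os.walk documentation. root is the directory currently being traversed, dirs lists the subdirectories immediately inside root, and files lists the files immediately inside root. As your code stands, it counts the same files (the ones in root) once for every subdirectory and records that count as the number of files in each subdirectory.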
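To make the meaning of root, dirs and files concrete, here is a minimal sketch that builds a throwaway tree and counts the files directly inside each directory. The helper name walk_counts and the person/task file names are made up for illustration, and it assumes Python 3.

```python
import os
import tempfile

def walk_counts(top):
    """Return {directory relative to top: number of files directly inside it}."""
    counts = {}
    for root, dirs, files in os.walk(top):
        # `files` belongs to `root` itself, not to any of the names in `dirs`.
        counts[os.path.relpath(root, top)] = len(files)
    return counts

# Hypothetical layout: person/ holds 1 file, person/task/ holds 2 files.
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "person", "task"))
    for parts in (("person", "a.txt"),
                  ("person", "task", "b.txt"),
                  ("person", "task", "c.txt")):
        open(os.path.join(top, *parts), "w").close()
    demo_counts = walk_counts(top)
    for directory in sorted(demo_counts):
        print(directory, demo_counts[directory])
```

Each directory shows up exactly once as root, so len(files) at that point is the count for that directory and nothing else.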
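Here is what I came up with. Hopefully it is close to what you want; if not, adapt it :) It outputs each directory, the number of files in that directory, and the number of files in that directory plus all of its subdirectories.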
import os
import csv

# Open the csv and write headers.
with open("Subject_Task_Count.csv", 'w', newline='') as out:
    outwriter = csv.writer(out)
    outwriter.writerow(['Directory','FilesInDir','FilesIncludingSubdirs'])

    # Track total number of files in each subdirectory by absolute path
    totals = {}

    # topdown=False iterates lowest level (leaf) subdirectories first.
    # This way I can collect grand totals of files per subdirectory.
    for path,dirs,files in os.walk('.',topdown=False):
        files_in_current_directory = len(files)

        # Start with the files in the current directory and compute a
        # total for all subdirectories, which will be in the `totals`
        # dictionary already due to topdown=False.
        files_including_subdirs = files_in_current_directory
        for d in dirs:
            fullpath = os.path.abspath(os.path.join(path,d))

            # On my Windows system, Junctions weren't included in os.walk,
            # but would show up in the subdirectory list. This try skips
            # them because they won't be in the totals dictionary.
            try:
                files_including_subdirs += totals[fullpath]
            except KeyError as e:
                print('KeyError: {} may be symlink/junction'.format(e))

        totals[os.path.abspath(path)] = files_including_subdirs
        outwriter.writerow([path,files_in_current_directory,files_including_subdirs])
Posted on 2012-11-22 07:01:03
You should try something along these lines:
for root,dirs,files in os.walk( rootDir ) :
    print(root, len(files))
It prints each directory and the number of files it contains.
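If the goal is to get every directory name from 'person' onward into the spreadsheet, one possible approach is to split each directory's path relative to the starting directory into its components and write them as separate CSV columns. The sketch below is only an illustration under that assumption: count_rows, write_rows and the alice/bob tree are hypothetical names, and it assumes Python 3.

```python
import csv
import os
import tempfile

def count_rows(root_dir):
    """Return one row per directory: its path components, then its file count."""
    rows = []
    for root, dirs, files in os.walk(root_dir):
        dirs.sort()  # visit subdirectories in a stable, alphabetical order
        rel = os.path.relpath(root, root_dir)
        if rel == os.curdir:
            continue  # skip the starting directory itself
        # e.g. "person/task/somedata" -> ["person", "task", "somedata"]
        rows.append(rel.split(os.sep) + [len(files)])
    return rows

def write_rows(rows, csv_path):
    """Write the rows to a CSV file that Excel can open."""
    with open(csv_path, "w", newline="") as out:
        csv.writer(out).writerows(rows)

# Hypothetical tree: alice has a task level, bob does not.
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "alice", "task1", "somedata"))
    os.makedirs(os.path.join(top, "bob", "somedata"))
    for parts in (("alice", "task1", "somedata", "x.dat"),
                  ("alice", "task1", "somedata", "y.dat"),
                  ("bob", "somedata", "z.dat")):
        open(os.path.join(top, *parts), "w").close()
    demo_rows = count_rows(top)
    write_rows(demo_rows, os.path.join(top, "Subject_Task_Count.csv"))
    for row in demo_rows:
        print(row)
```

Because rows can have different lengths when a level such as 'task' is missing, the count lands in a different column for those rows; padding each row to a fixed width before writing would be one way to keep the count column aligned.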
https://stackoverflow.com/questions/13503409