File structure
I have a parent directory named test_folder that contains several (around 600) subfolders. Each subfolder contains:
a metadump.xml file
a .pdf, .pptx, .xls or .docx file
Goal
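In each subfolder of the parent directory (test_folder), I want to rename the .pdf, .pptx, .xls or .docx file. The new name should come from the title stored in the .xml file in the same subfolder. In the example below, that subfolder is named banana.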
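The title is stored in the `<dc:title>` element of metadump.xml. The regex used in the code below only matches if that tag's attributes appear in exactly that order. As a hedged alternative, the title could be read with the standard library's `xml.etree.ElementTree`.

This is a minimal sketch under two assumptions: metadump.xml is well-formed XML, and it declares the `dc` namespace prefix. `read_title` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def read_title(xml_path):
    """Return the text of the first element whose local name is 'title'.

    Hypothetical helper. Assumes the file is well-formed XML with the 'dc'
    prefix declared. Matching on the local name avoids hard-coding both the
    namespace URI and the attribute order that the regex relies on.
    """
    tree = ET.parse(xml_path)
    for elem in tree.iter():
        # Namespaced tags look like '{uri}title'; keep only the part after '}'.
        local_name = elem.tag.rsplit('}', 1)[-1]
        if local_name == 'title' and elem.text:
            return elem.text.strip()
    return None  # no non-empty title element found
```

Usage: `title = read_title(os.path.join(root, 'metadump.xml'))`. The result is `None` when no title is present, so check for that before renaming anything.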
import os
import re

for root, dirs, files in os.walk("C:\\**\\Downloads\\test_folder"):
    for file in files:
        if file == 'metadump.xml':
            filename = os.path.join(root, file)
            # READ XML FILE TO OBTAIN 'TITLE' INFORMATION
            with open(filename, 'r', encoding='utf-8') as xml_file:
                contents = xml_file.read()
                title = re.search('<dc:title rsfieldtitle="Title" rsembeddedequiv="Name" rsfieldref="8" rsfieldtype="0">(.+?)</dc:title>', contents).group(1)
                print(title) #AS A CHECK FOR SUCCESSFUL TITLE EXTRACTION
                # GOING THROUGH FILES AGAIN TO FIND NON-XML FILE
                for file in files:
                    if file != 'metadump.xml':
                        print(file) #CHECKING THE CORRECT FILE TO BE RENAMED IS SELECTED
                        src = os.path.join(root, file) #ORIGINAL SOURCE PATH
                        dst = os.path.join(root, title) #NEW DESTINATION PATH
                        os.rename(src, dst) #TO RENAME FILES IN THE SUBFOLDER TO THE TITLE
After running this, I get the following:
Project Alpha <-- the correct title was extracted from the XML
foobar.pdf <-- the correct "other" file in the subfolder was selected for renaming
[WinError 123] The filename, directory name, or volume label syntax is incorrect: 'C:\\**\\Downloads\\test_folder\\banana\\foobar.pdf' -> 'C:\\**\\Downloads\\test_folder\\banana\\Project Alpha'
I don't understand why I can't rename the other file (foobar.pdf) using the title extracted from the .xml file in the same subfolder.
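WinError 123 is Windows' ERROR_INVALID_NAME. It is raised when a path contains characters that are not allowed in file names. Windows forbids `< > : " / \ | ? *` and ASCII control characters in file names.

The printed title looks clean here, so whether this is the actual cause is an assumption. Still, 600 scraped titles are likely to contain a few such characters. `safe_filename` is a hypothetical helper that cleans a title before it is used as a name.

```python
import re

# Characters Windows does not allow in file names, plus ASCII control characters.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(title, replacement='_'):
    """Return a version of title that can be used as a Windows file name.

    Hypothetical helper: every invalid character is replaced with
    `replacement`. Leading/trailing whitespace and trailing dots are also
    removed, because Windows silently drops trailing dots and spaces.
    """
    cleaned = _INVALID_CHARS.sub(replacement, title)
    return cleaned.strip().rstrip(' .')
```

For example, `safe_filename('Project Alpha: Phase 1?')` gives `'Project Alpha_ Phase 1_'`, while a clean title such as `'Project Alpha'` is returned unchanged.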
Example of the desired output
In the test_folder parent directory, the banana subfolder contains:
Given:
foobar.pdf (generic file name)
metadump.xml (the title can be extracted from this file: Project Alpha)
Result:
Project_Alpha.pdf (the pdf has been renamed)
metadump.xml
Thanks in advance for your ideas!
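The desired result (`Project Alpha` becomes `Project_Alpha.pdf`) implies two things. The original extension is kept, and spaces become underscores. Neither the question's code nor the answer's code does either.

Here is a minimal sketch of that naming rule. `new_name` is a hypothetical helper, and the underscore substitution is inferred from the example above.

```python
import os


def new_name(original_filename, title):
    """Build the target file name from the XML title (hypothetical helper).

    Keeps the original extension (.pdf, .pptx, .xls, .docx) and replaces
    spaces with underscores, as in 'Project Alpha' -> 'Project_Alpha.pdf'.
    """
    _, ext = os.path.splitext(original_filename)  # e.g. '.pdf'
    return title.strip().replace(' ', '_') + ext
```

Usage: `os.rename(os.path.join(root, f), os.path.join(root, new_name(f, title)))`.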
Posted on 2018-06-07 13:56:58
Your indentation is all messed up. You should read the xml first and then rename the other files. See below.
import os
import re

for root, dirs, files in os.walk(r"C:/**/Downloads/test_folder"):
    for file in files:
        if file == 'metadump.xml':
            filename = os.path.join(root, file)
            with open(filename, 'r', encoding='utf-8') as f_xml:
                contents = f_xml.read()
            title = re.search('<dc:title rsfieldtitle="Title" rsembeddedequiv="Name" rsfieldref="8" rsfieldtype="0">(.+?)</dc:title>', contents).group(1)
            print(title) #AS A CHECK FOR SUCCESSFUL TITLE EXTRACTION
            # NOW GO THROUGH YOUR FILES IN CURRENT DIRECTORY AGAIN
            for file in files:
                if file != 'metadump.xml':
                    src = os.path.join(root, file)
                    dst = os.path.join(root, title)
                    os.rename(src, dst) #TO RENAME FILES IN THE SUBFOLDER TO THE TITLE
Or, even better:
import os
import re

for root, dirs, files in os.walk(r"C:/**/Downloads/test_folder"):
    # find xml file
    xml_files = [r for r in files if r.endswith('.xml')]
    if not xml_files:
        continue  # e.g. the parent folder itself has no xml file
    filename = os.path.join(root, xml_files[0])  # os.path.join needs a string, not a list
    with open(filename, 'r', encoding='utf-8') as f_xml:
        contents = f_xml.read()
    title = re.search('<dc:title rsfieldtitle="Title" rsembeddedequiv="Name" rsfieldref="8" rsfieldtype="0">(.+?)</dc:title>', contents).group(1)
    print(title) #AS A CHECK FOR SUCCESSFUL TITLE EXTRACTION
    # NOW RENAME FILES
    for f in files:
        if not f.endswith('.xml'):
            os.rename(os.path.join(root, f), os.path.join(root, title))
I don't know where you set the file extension; maybe you need something like os.rename(file, title + '.jpg').
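To handle the missing extension, the extension could be taken from each original file with `os.path.splitext` instead of hard-coding one such as `.jpg`.

Below is a hedged, self-contained sketch that combines this with the answer's approach. It makes four assumptions:

- every relevant subfolder has a file named exactly metadump.xml;
- the `<dc:title>` tag is written exactly as in the question's regex;
- spaces should become underscores, as in the desired output;
- an existing file should never be overwritten.

`rename_by_title` is a hypothetical name.

```python
import os
import re

# Same pattern as in the question/answer; assumes the tag is written exactly like this.
TITLE_RE = re.compile(
    r'<dc:title rsfieldtitle="Title" rsembeddedequiv="Name" '
    r'rsfieldref="8" rsfieldtype="0">(.+?)</dc:title>'
)


def rename_by_title(parent_dir):
    """Rename the non-XML file(s) in each subfolder after its metadump.xml title.

    Returns a list of (old_path, new_path) pairs that were actually renamed.
    """
    renamed = []
    for root, dirs, files in os.walk(parent_dir):
        if 'metadump.xml' not in files:
            continue  # e.g. the parent folder itself
        with open(os.path.join(root, 'metadump.xml'), encoding='utf-8') as f_xml:
            match = TITLE_RE.search(f_xml.read())
        if match is None:
            print('No title found in', root)
            continue
        title = match.group(1).strip()
        for f in files:
            if f == 'metadump.xml':
                continue
            ext = os.path.splitext(f)[1]  # keep .pdf/.pptx/.xls/.docx
            src = os.path.join(root, f)
            dst = os.path.join(root, title.replace(' ', '_') + ext)
            if os.path.exists(dst):
                # Don't clobber an existing file (or a file that is already renamed).
                print('Skipping, target exists:', dst)
                continue
            os.rename(src, dst)
            renamed.append((src, dst))
    return renamed
```

Usage: call it once with the parent folder, e.g. `rename_by_title(r"C:\**\Downloads\test_folder")` with the real path filled in. Printing the returned pairs gives a log of what changed. Checking `os.path.exists` before each rename makes a second run harmless.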
https://stackoverflow.com/questions/50742939