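I use this code to pull in two CSVs that share a naming convention, put each file's name in a "File" column, and concatenate the data into a single DataFrame named NatHrs.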
import glob
from pathlib import Path

import pandas as pd

path = r'C:\Users\ThisUser\Desktop\AC Mbr Analysis'
all_files = glob.glob(path + '\\Natl_hours_YTD_OC_*.csv')

Nat_dfs = []
for file in all_files:
    # header=1 takes the column names from the second row of each file
    df = pd.read_csv(file, index_col=None, encoding='windows-1252', header=1)
    df['File'] = file  # record which file each row came from
    Nat_dfs.append(df)
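NatHrs = pd.concat(Nat_dfs)

Now I want to take the "File" column, which is an object column with entries like "C:\Users\ThisUser\Desktop\AC Mbr Analysis\Natl_hours_YTD_OC_2018-2019", extract only the end of the file name (in this example, "2018-2019"), and put those characters into a new column, "Program Year", so that it reflects the "2018-2019" entry. I have had no success manipulating the string or the Series. Should I use path.replace? I'm lost. When I describe the column I want to parse...

NatHrs['File'].describe

...I get the following:

Name: File, dtype: object>

Posted on 2020-02-03 16:47:35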
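A side note on the describe output above: the trailing `>` suggests that `describe` was referenced without parentheses, which displays the bound method instead of calling it. Here is a small sketch with a made-up Series (the values are placeholders, not the real file paths):

```python
import pandas as pd

# Placeholder Series standing in for NatHrs['File'].
files = pd.Series(
    ['a_2018-2019.csv', 'a_2018-2019.csv', 'a_2019-2020.csv'],
    name='File',
)

# With parentheses the method is actually called and returns summary statistics.
summary = files.describe()
print(summary)

# Without parentheses the expression is only a reference to the method itself.
print(callable(files.describe))
```

For a text column like this one, the summary should list count, unique, top and freq.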
I tried this:
import re
string = NatHrs['File']
short = string.split('\\')[-1]
substring = re.search('\d+[-]*\d+',short).group()
print(substring)
NatHrs['Program Year'] = substring
NatHrs

I got this:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-257-f8c37bb604e2> in <module>
3
4 string = NatHrs['File']
----> 5 short = string.split('\\')[-1]
6
7 substring = re.search('\d+[-]*\d+',short).group()
~\anaconda3\envs\PythonData\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
5177 if self._info_axis._can_hold_identifiers_and_holds_name(name):
5178 return self[name]
-> 5179 return object.__getattribute__(self, name)
5180
5181 def __setattr__(self, name, value):
AttributeError: 'Series' object has no attribute 'split'

This returns a result with this kind of mismatch between the year in the file name and the Program Year:
File                                                                       Program Year
C:\Users\HHeatley\Desktop\AC Mbr Analysis\Natl_hours_YTD_OC_2018-2019.csv  2018-2019
C:\Users\HHeatley\Desktop\AC Mbr Analysis\Natl_hours_YTD_OC_2018-2019.csv  2018-2019
C:\Users\HHeatley\Desktop\AC Mbr Analysis\Natl_hours_YTD_OC_2018-2019.csv  2018-2019
C:\Users\HHeatley\Desktop\AC Mbr Analysis\Natl_hours_YTD_OC_2019-2020.csv  2018-2019
C:\Users\HHeatley\Desktop\AC Mbr Analysis\Natl_hours_YTD_OC_2019-2020.csv  2018-2019
C:\Users\HHeatley\Desktop\AC Mbr Analysis\Natl_hours_YTD_OC_2019-2020.csv  2018-2019
C:\Users\HHeatley\Desktop\AC Mbr Analysis\Natl_hours_YTD_OC_2019-2020.csv  2018-2019
C:\Users\HHeatley\Desktop\AC Mbr Analysis\Natl_hours_YTD_OC_2019-2020.csv  2018-2019
C:\Users\HHeatley\Desktop\AC Mbr Analysis\Natl_hours_YTD_OC_2019-2020.csv  2018-2019

I also tried this:
import re
string = file
short = string.split('\\')[-1]
substring = re.search('\d+[-]*\d+',short).group()
print(substring)
NatHrs['Program Year'] = substring
NatHrs

That gives me a "Program Year" column that reflects only 2019-2020, even though I want both 2018-2019 and 2019-2020 to appear.
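For what it is worth, both attempts seem to hit the same underlying issue. NatHrs['File'] is a whole Series rather than a single string, so it has no `split` method. Assigning one `substring` value to the column then broadcasts that single value to every row, which would explain why only one year shows up. Below is a minimal sketch of a row-wise alternative using the `.str` accessor. It assumes the year range always appears as four digits, a hyphen and four digits right before `.csv`. The two-row DataFrame and the regex are illustrative, not taken from the real data:

```python
import pandas as pd

# Illustrative stand-in for NatHrs; the real frame comes from pd.concat(Nat_dfs).
NatHrs = pd.DataFrame({
    'File': [
        r'C:\Users\ThisUser\Desktop\AC Mbr Analysis\Natl_hours_YTD_OC_2018-2019.csv',
        r'C:\Users\ThisUser\Desktop\AC Mbr Analysis\Natl_hours_YTD_OC_2019-2020.csv',
    ]
})

# .str applies the regex to each row separately; expand=False returns a Series.
# Assumed pattern: four digits, a hyphen, four digits, right before ".csv".
NatHrs['Program Year'] = NatHrs['File'].str.extract(
    r'(\d{4}-\d{4})\.csv$', expand=False
)

print(NatHrs['Program Year'].tolist())
```

Another option, under the same naming assumption, would be to set the column inside the loading loop, for example from `Path(file).stem`, so that each file's rows get their own year before the concat.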
https://stackoverflow.com/questions/60031790