I'm importing 4000+ CSV files with dask. They all have the same columns, columns=['Date', 'Datapoint']. Importing the CSVs is very simple and works well for me.
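import dask.dataframe as dd

file_paths = '/root/data/daily/'
df = dd.read_csv(file_paths+'*.csv',
                 delim_whitespace=True,
                 names=['Date','Datapoint'])
What I'm trying to achieve is to name the 'Datapoint' column after each .csv file's name. I know you can add the path as a column with include_path_column=True, but I'm wondering whether there is a simple way to use that path name as the column name without having to run a separate step down the line.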
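For illustration, the label being asked for can be derived from each path with the standard library alone. This is only a minimal sketch, assuming the desired column name is the file name without its directory or .csv extension; column_name_from_path is a hypothetical helper and the sample paths are made up.

```python
from pathlib import Path


def column_name_from_path(file_path: str) -> str:
    """Return the file name without directory or extension (hypothetical helper)."""
    return Path(file_path).stem


# Made-up example paths under the directory used in the question.
sample_paths = ["/root/data/daily/AAPL.csv", "/root/data/daily/MSFT.csv"]

# Map each path to the column label it would produce.
labels = {p: column_name_from_path(p) for p in sample_paths}
```

Whichever reader ends up loading the files, a helper like this would supply the new name for the 'Datapoint' column.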
Posted on 2019-10-26 10:01:27
I was able to do this (fairly simply) using dask's delayed function:
import glob
import os

import pandas as pd
import dask.dataframe as dd
from dask import delayed

path = r'/root/data/daily'  # use your path
file_list = glob.glob(path + "/*.csv")

def read_and_label_csv(filename):
    # read each csv file into a pandas.DataFrame
    df_csv = pd.read_csv(filename,
                         delim_whitespace=True,
                         names=['Date', 'Datapoint'])
    # name the data column after the file (assumed: file name without extension)
    column_name = os.path.splitext(os.path.basename(filename))[0]
    df_csv.rename(columns={'Datapoint': column_name}, inplace=True)
    return df_csv

# create a list of delayed calls, each ready to return a pandas.DataFrame
dfs = [delayed(read_and_label_csv)(fname) for fname in file_list]
# using delayed, assemble the pandas.DataFrames into a dask.DataFrame
ddf = dd.from_delayed(dfs)
https://stackoverflow.com/questions/58567192