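I have a CSV file from which I need to delete the second and third rows, as well as columns 3 to 18. I can get this working in two steps, but that produces an interim file. I suspect there is a better, more compact way to do it. Any suggestions would be appreciated.
Also, if I want to delete several column ranges, how do I specify that in this code? For example, if I want to remove columns 25 to 29 in addition to columns 3 to 18 already specified, how would I add that to the code? Thanks.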
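import csv

remove_from = 2
remove_to = 17  # slice end is exclusive: row[2:17] covers 0-based indices 2..16

# Step 1: delete the column slice from every row, writing an interim file.
with open('file_a.csv', newline='') as infile, open('interim.csv', 'w', newline='') as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    for row in reader:
        del row[remove_from:remove_to]
        writer.writerow(row)

# Step 2: copy the first row, skip the second and third, then copy the rest.
with open('interim.csv', newline='') as infile, open('file_b.csv', 'w', newline='') as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    writer.writerow(next(reader))
    next(reader)
    next(reader)
    for row in reader:
        writer.writerow(row)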
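The two passes above can be folded into a single pass with no interim file. The following is only a minimal sketch, assuming Python 3: the function name `trim_csv`, its parameters, and the convention that rows and columns are given as 1-based, inclusive numbers are all hypothetical choices made for illustration.

```python
import csv


def trim_csv(src, dst, drop_rows, drop_col_ranges):
    """Copy src to dst in one pass, skipping some rows and column ranges.

    drop_rows: 1-based row numbers to skip (assumed convention).
    drop_col_ranges: (first, last) pairs, 1-based and inclusive (assumed).
    """
    # Expand the ranges into a set of 0-based column indices to drop.
    drop_cols = set()
    for first, last in drop_col_ranges:
        drop_cols.update(range(first - 1, last))
    drop_rows = set(drop_rows)

    # The csv module expects files opened with newline='' in Python 3.
    with open(src, newline='') as infile, open(dst, 'w', newline='') as outfile:
        writer = csv.writer(outfile)
        for row_number, row in enumerate(csv.reader(infile), start=1):
            if row_number in drop_rows:
                continue
            # Keep only the cells whose index is not marked for removal.
            writer.writerow(
                [cell for i, cell in enumerate(row) if i not in drop_cols]
            )
```

Under these assumptions, a call such as `trim_csv('file_a.csv', 'file_b.csv', drop_rows=[2, 3], drop_col_ranges=[(3, 18), (25, 29)])` would cover both column ranges at once; adding another range means adding another pair to the list.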
Posted on 2018-06-09 06:54:29
Here is a pandas approach:
Step 1: create a sample DataFrame
import numpy as np
import pandas as pd

# Create a sample CSV file (100 x 100)
df = pd.DataFrame(np.arange(10000).reshape(100, 100))
df.to_csv('test.csv', index=False)
Step 2: do the magic
import pandas as pd
import numpy as np

# Read only the header row to find the number of columns
size = pd.read_csv('test.csv', nrows=0).shape[1]

# To remove columns 25 to 29 in addition to columns 3 to 18,
# build an array of all column positions and delete both ranges from it
ranges = np.r_[3:19, 25:30]
ar = np.delete(np.arange(size), ranges)

# Read the file, keeping only the remaining columns and skipping rows 2 and 3
# (positions in skiprows and usecols are 0-based)
df = pd.read_csv('test.csv', skiprows=[2, 3], usecols=ar)

# Write the result
df.to_csv('output.csv', index=False)
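One thing to watch: `skiprows` and `usecols` in `pd.read_csv` count from zero, with the header line as line 0, whereas the question numbers rows and columns from one. Whether the values above are the ones you want depends on which convention you have in mind. The helper below is a small sketch of the conversion, assuming 1-based, inclusive inputs; the name `read_csv_args` is hypothetical and not part of pandas.

```python
def read_csv_args(n_columns, drop_rows, drop_col_ranges):
    """Translate 1-based row numbers and inclusive column ranges into
    0-based `skiprows` and `usecols` values for pd.read_csv."""
    # Collect the 0-based indices of every column to drop.
    drop_cols = set()
    for first, last in drop_col_ranges:
        drop_cols.update(range(first - 1, last))
    # Keep every column position that is not marked for removal.
    usecols = [i for i in range(n_columns) if i not in drop_cols]
    # File line 1 (the header) is line index 0 for skiprows.
    skiprows = sorted(r - 1 for r in drop_rows)
    return {"skiprows": skiprows, "usecols": usecols}


kwargs = read_csv_args(100, drop_rows=[2, 3], drop_col_ranges=[(3, 18), (25, 29)])
print(kwargs["skiprows"])     # [1, 2]
print(kwargs["usecols"][:4])  # [0, 1, 18, 19]
```

The resulting dictionary could then be passed along as `pd.read_csv('test.csv', **kwargs)`.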
Source:
https://stackoverflow.com/questions/50769101