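I am trying to read a CSV file and create a new file that contains only the data I am interested in. In some rows, specific values (in the age and gender columns) are marked as -1, so those rows are not needed in the new CSV. Should I rewrite this with the Pandas library? Also, I am trying to discard the old ids (since some rows will be dropped) and number the remaining rows with new ids.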
import csv

data = []

def transform_row(row):
    # id = new count
    age = row[2]
    gender = row[3]
    url = row[4]
    return [
        # new count
        age,
        gender,
        url
    ]

# read csv file line by line
with open('data_sample.csv', 'r') as f:
    reader = csv.reader(f)
    """ bad try at ignoring the line with value -1
    for value in reader:
        if value == '-1':
            pass
        else:
            continue
    """
    # loop through each line in csv and transform
    for line in reader:
        data.append(transform_row(line))

# write a new csv file
with open('data_test.csv', 'w', newline='') as f:
    # define new csv writer
    writer = csv.writer(f, delimiter=',')
    # write a header row to our output.csv file
    writer.writerow([
        # 'id', - new line count as id
        'age',
        'gender',
        'url'
    ])
    # write our data to the file
    writer.writerows(data)
Any other suggestions are welcome as well.
Posted on 2019-02-20 03:19:55
I rewrote the script with Pandas. Here are two solutions to the problem.
import pandas as pd
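# columns 2, 3, 4 hold age, gender, url (same order as in the question's script)
cols = [2, 3, 4]
data = pd.read_csv('data_sample.csv', usecols=cols, header=None)
data.columns = ["age", "gender", "url"]
# remove the rows where gender or age is -1
data = data[data['gender'] != -1]
data = data[data['age'] != -1]
# reset the index so the remaining rows get new consecutive ids
data.reset_index(drop=True, inplace=True)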
""" Additional working solution
indexGender = data[data['gender'] == -1].index
indexAge = data[data['age'] == -1].index
# Delete the rows indexes from dataFrame
data.drop(indexGender,inplace=True)
data.drop(indexAge, inplace=True)
"""
data.to_csv('data_test.csv')
Hope it helps someone.
Posted on 2019-02-20 02:28:54
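The two filters above can also be combined into one boolean mask, and the reset index can be written out as a named id column. This is only a minimal sketch: the in-memory sample, the 1-based ids, and the column order (age, gender, url in columns 2 to 4, as in the question's script) are assumptions, not something taken from the real data_sample.csv.

```python
import io

import pandas as pd

# Hypothetical stand-in for data_sample.csv (no header row).
# Columns 2, 3, 4 are assumed to be age, gender, url.
sample = io.StringIO(
    "1,x,25,1,http://a.example\n"
    "2,x,-1,0,http://b.example\n"
    "3,x,31,-1,http://c.example\n"
    "4,x,40,0,http://d.example\n"
)

data = pd.read_csv(sample, usecols=[2, 3, 4], header=None)
data.columns = ["age", "gender", "url"]

# Keep only the rows where neither age nor gender is -1.
mask = (data["age"] != -1) & (data["gender"] != -1)
clean = data[mask].reset_index(drop=True)

# Start the new ids at 1 instead of 0 (an arbitrary choice for this sketch).
clean.index = clean.index + 1

# index_label names the index column in the output file.
out = io.StringIO()
clean.to_csv(out, index_label="id")
print(out.getvalue())
```

With a real file, the two StringIO objects would be replaced by the input and output paths.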
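Using pandas will make your job much easier, because the csv module is not well suited to fine-grained data manipulation. If you want to drop rows based on the value in a specific column, you can load the original csv into a DataFrame and then create a new csv that contains only the values you want: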
import pandas as pd
start_data = pd.read_csv('./data_sample.csv')
# replace 'age' with 'gender' if that's what you prefer
clean_data = start_data[start_data['age'] != -1]
Comparing the lengths of start_data and clean_data should show that all the unwanted rows have been removed. You can then create the new csv with:
clean_data.to_csv('./data_test.csv')
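For comparison, the same filtering and renumbering can also be done with the csv module alone. csv.reader yields each row as a list of strings, so the check has to compare individual fields against the string '-1' rather than the whole row. The sketch below assumes the column positions from the question (age, gender, url at indexes 2, 3, 4), an input without a header row, and made-up sample rows; filter_rows is a hypothetical helper name.

```python
import csv
import io


def filter_rows(rows):
    """Yield [new_id, age, gender, url] for rows whose age and gender are not -1."""
    new_id = 0
    for row in rows:
        age, gender, url = row[2], row[3], row[4]
        # Fields are strings here, so compare against '-1', not the number -1.
        if age == "-1" or gender == "-1":
            continue
        new_id += 1
        yield [new_id, age, gender, url]


# Hypothetical stand-in for data_sample.csv (no header row).
sample = io.StringIO(
    "1,x,25,1,http://a.example\n"
    "2,x,-1,0,http://b.example\n"
    "3,x,31,-1,http://c.example\n"
    "4,x,40,0,http://d.example\n"
)

cleaned = list(filter_rows(csv.reader(sample)))

# Write the header and the renumbered rows to an in-memory buffer.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["id", "age", "gender", "url"])
writer.writerows(cleaned)
print(out.getvalue())
```

Because the counter only advances for rows that are kept, the new ids stay consecutive even though some input rows are skipped.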
https://stackoverflow.com/questions/54770325