I am trying to parse a CSV file and extract data from specific columns only.
Sample CSV:
ID | Name | Address | City | State | Zip | Phone | OPEID | IPEDS |
10 | C... | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
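I am trying to capture only specific columns, say ID, Name, Zip and Phone.
The code I have looked at led me to believe that I can refer to a specific column by its number, so that Name would correspond to 2, and iterating over each row with row[2] would produce all the items in column 2. But it does not.
Here is what I have done so far: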
import sys, argparse, csv
from settings import *

# command arguments
parser = argparse.ArgumentParser(description='csv to postgres',\
    fromfile_prefix_chars="@" )
parser.add_argument('file', help='csv file to import', action='store')
args = parser.parse_args()
csv_file = args.file

# open csv file
with open(csv_file, 'rb') as csvfile:

    # get number of columns
    for line in csvfile.readlines():
        array = line.split(',')
        first_item = array[0]

    num_columns = len(array)
    csvfile.seek(0)

    reader = csv.reader(csvfile, delimiter=' ')
    included_cols = [1, 2, 6, 7]

    for row in reader:
        content = list(row[i] for i in included_cols)

    print content
I expected this to print only the specific columns I want for each row, but it does not: I only get the last column.
Answered 2013-05-12 10:34:03
import csv
from collections import defaultdict

columns = defaultdict(list) # each value in each column is appended to a list

with open('file.txt') as f:
    reader = csv.DictReader(f) # read rows into a dictionary format
    for row in reader: # read a row as {column1: value1, column2: value2,...}
        for (k,v) in row.items(): # go over each column name and value
            columns[k].append(v) # append the value into the appropriate list
                                 # based on column name k

print(columns['name'])
print(columns['phone'])
print(columns['street'])
With a file like this:
name,phone,street
Bob,0893,32 Silly
James,000,400 McHilly
Smithers,4442,23 Looped St.
this will output:
>>>
['Bob', 'James', 'Smithers']
['0893', '000', '4442']
['32 Silly', '400 McHilly', '23 Looped St.']
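Or, if you want to index the columns numerically:
with open('file.txt') as f:
    reader = csv.reader(f)
    next(reader) # skip the header row (works in Python 2.6+ and 3)
    for row in reader:
        for (i,v) in enumerate(row):
            columns[i].append(v) # key each list by column position

print(columns[0])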
>>>
['Bob', 'James', 'Smithers']
To change the delimiter, add delimiter=" " to the appropriate instantiation, i.e. reader = csv.reader(f, delimiter=" ")
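As a rough illustration of the delimiter advice applied to the pipe-separated sample from the question, the sketch below picks columns by position with csv.reader. The helper name select_columns and the in-memory sample are hypothetical, introduced only for this example; it assumes Python 3 and that the fields are padded with spaces, as in the sample. Note that list indices are zero-based, so ID is 0 and Name is 1, and that the print call sits inside the loop so every row is shown.

```python
import csv
import io

def select_columns(lines, included_cols, delimiter="|"):
    """Yield only the chosen columns of each row, with padding spaces removed.

    select_columns is a hypothetical helper written for this example.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    for row in reader:
        if not row:  # skip blank lines
            continue
        yield [row[i].strip() for i in included_cols]

# In-memory stand-in for the file; an opened file object works the same way.
sample = io.StringIO(
    "ID | Name | Address | City | State | Zip | Phone | OPEID | IPEDS |\n"
    "10 | Adam | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |\n"
    "10 | Carl | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |\n"
)

# Zero-based positions of ID, Name, Zip and Phone.
rows = list(select_columns(sample, included_cols=[0, 1, 5, 6]))
for content in rows:
    print(content)  # printed once per row, header included
```

With a real file, passing the object returned by open(path, newline='') in place of sample should behave the same way.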
Answered 2014-01-10 21:46:05
You can use numpy.loadtxt(filename). For example, if this is your database .csv:
ID | Name | Address | City | State | Zip | Phone | OPEID | IPEDS |
10 | Adam | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
10 | Carl | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
10 | Adolf | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
10 | Den | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
and you want the Name column:
import numpy as np
b=np.loadtxt(r'filepath\name.csv',dtype=str,delimiter='|',skiprows=1,usecols=(1,))
>>> b
array([' Adam ', ' Carl ', ' Adolf ', ' Den '],
dtype='|S7')
You can do this more easily with genfromtxt:
b = np.genfromtxt(r'filepath\name.csv', delimiter='|', names=True,dtype=None)
>>> b['Name']
array([' Adam ', ' Carl ', ' Adolf ', ' Den '],
dtype='|S7')
https://stackoverflow.com/questions/16503560