I have some data that I am trying to group together:
Serial_Num Latitude Longitude
1950004S11059 -11.1 59.1
1950004S11059 -11.6 57.8
1950004S11059 -12.4 56
1950004S11059 -13.2 54.6
1950004S11059 -13.8 53.8
1950004S11059 -14.8 52.7
1950004S11059 -15.9 52
1950004S11059 -18.3 52.4
1950004S11059 -20 54
1950004S11059 -22.1 55.9
1950004S11059 -26.2 59.8
1950012S14150 -14 146.9
1950012S14150 -14.4 145.8
1950012S14150 -14.9 145.4
1950012S14150 -15.8 145.6
1950012S14150 -18.9 149.1
1950012S14150 -22.3 152.5
1950013S14139 -16 139
1950013S14139 -16.3 139
Put simply, for each unique Serial_Num I need its coordinates. The result I'm after looks something like:
1950004S11059: {"GPS": (-11.1 , 59.1) , (-11.6, 57.8) , (-12.4, 56), ..., (-26.2, 59.8)}
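I can then iterate over the GPS coordinates of each Serial_Num and plot them.
I have some scripts I've used elsewhere, but they mostly rely on building a dictionary from .csv data with Serial_Num as the key.
However, the data in the CSV is sequential, and the order matters.
Is there a way to output, for each Serial_Num, a list of its coordinates in the order they appear in the CSV?
Edit: I'm now looking at pandas, since it has a groupby method that might help.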
Answered 2018-02-17 08:15:07
Here is one approach; each step is broken down below.
import pandas as pd
df = pd.read_csv('file.csv', sep=r'\s+')  # whitespace-delimited; delim_whitespace is deprecated in newer pandas
df['GPS'] = list(zip(df.Latitude, df.Longitude))
df.groupby('Serial_Num')['GPS'].apply(list).to_dict()
Read the data
df = pd.read_csv('file.csv', sep=r'\s+')  # whitespace-delimited; delim_whitespace is deprecated in newer pandas
# Serial_Num Latitude Longitude
# 0 1950004S11059 -11.1 59.1
# 1 1950004S11059 -11.6 57.8
# 2 1950004S11059 -12.4 56.0
# 3 1950004S11059 -13.2 54.6
# 4 1950004S11059 -13.8 53.8
# 5 1950004S11059 -14.8 52.7
Create a column of tuples
df['GPS'] = list(zip(df.Latitude, df.Longitude))
# Serial_Num Latitude Longitude GPS
# 0 1950004S11059 -11.1 59.1 (-11.1, 59.1)
# 1 1950004S11059 -11.6 57.8 (-11.6, 57.8)
# 2 1950004S11059 -12.4 56.0 (-12.4, 56.0)
# 3 1950004S11059 -13.2 54.6 (-13.2, 54.6)
# 4 1950004S11059 -13.8 53.8 (-13.8, 53.8)
# 5 1950004S11059 -14.8 52.7 (-14.8, 52.7)
Create the dictionary
df.groupby('Serial_Num')['GPS'].apply(list).to_dict()
# {'1950004S11059': [(-11.1, 59.100000000000001),
# (-11.6, 57.799999999999997),
# (-12.4, 56.0),
# (-13.199999999999999, 54.600000000000001),
# (-13.800000000000001, 53.799999999999997),
# (-14.800000000000001, 52.700000000000003),
# (-15.9, 52.0),
# (-18.300000000000001, 52.399999999999999),
# (-20.0, 54.0),
# (-22.100000000000001, 55.899999999999999),
# (-26.199999999999999, 59.799999999999997)],
# '1950012S14150': [(-14.0, 146.90000000000001),
# (-14.4, 145.80000000000001),
# (-14.9, 145.40000000000001),
# (-15.800000000000001, 145.59999999999999),
# (-18.899999999999999, 149.09999999999999),
# (-22.300000000000001, 152.5)],
# '1950013S14139': [(-16.0, 139.0), (-16.300000000000001, 139.0)]}
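One detail worth noting: groupby keeps the rows inside each group in their original order, but by default it sorts the group keys. If the order in which each Serial_Num first appears in the file also matters, passing sort=False should keep it. The following is a minimal sketch under that assumption; the in-memory `sample` text and the name `tracks` are illustrative and not part of the original answer.

```python
import io

import pandas as pd

# Illustrative sample; the later storm is deliberately listed first
# so the effect of sort=False is visible.
sample = """Serial_Num Latitude Longitude
1950012S14150 -14 146.9
1950012S14150 -14.4 145.8
1950004S11059 -11.1 59.1
1950004S11059 -11.6 57.8
"""

df = pd.read_csv(io.StringIO(sample), sep=r"\s+")
df["GPS"] = list(zip(df.Latitude, df.Longitude))

# sort=False keeps groups in order of first appearance;
# rows inside each group keep their original order either way.
tracks = df.groupby("Serial_Num", sort=False)["GPS"].apply(list).to_dict()

for serial, coords in tracks.items():
    lats, lons = zip(*coords)  # split the pairs into two sequences, e.g. for plotting
    print(serial, lats, lons)
```

The `lats` and `lons` sequences can then be handed to whatever plotting call is in use, one track per Serial_Num.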
Answered 2018-02-17 07:40:39
Given a file named foo.csv
Serial_Num Latitude Longitude
1950004S11059 -11.1 59.1
1950004S11059 -11.6 57.8
1950004S11059 -12.4 56
1950004S11059 -13.2 54.6
1950004S11059 -13.8 53.8
1950004S11059 -14.8 52.7
1950004S11059 -15.9 52
1950004S11059 -18.3 52.4
1950004S11059 -20 54
1950004S11059 -22.1 55.9
1950004S11059 -26.2 59.8
1950012S14150 -14 146.9
1950012S14150 -14.4 145.8
1950012S14150 -14.9 145.4
1950012S14150 -15.8 145.6
1950012S14150 -18.9 149.1
1950012S14150 -22.3 152.5
1950013S14139 -16 139
1950013S14139 -16.3 139
and some code that parses the data into (serial, coordinates) tuples:
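import csv
import collections as ct

def read_file(fname):
    """Yield (serial, (latitude, longitude)) for each data row, in file order."""
    with open(fname, newline="") as f:
        # The sample file is space-delimited, so split on spaces instead of commas.
        reader = csv.reader(f, delimiter=" ", skipinitialspace=True)
        next(reader)  # skip the header row
        for line in reader:
            fields = [x for x in line if x]  # drop empty fields left by stray spaces
            if fields:  # ignore blank lines
                yield fields[0], tuple(map(float, fields[1:]))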
Code
We build a nested structure of defaultdicts:
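# Outer keys are serial numbers; each inner defaultdict maps "GPS" to a list.
data = ct.defaultdict(lambda: ct.defaultdict(list))
for serial, coords in read_file("foo.csv"):
    # Appending in file order keeps each track's coordinates in sequence.
    data[serial]["GPS"].append(coords)
dict(data)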
Output
{'1950004S11059': defaultdict(list,
{'GPS': [
(-11.1, 59.1),
(-11.6, 57.8),
(-12.4, 56.0),
(-13.2, 54.6),
(-13.8, 53.8),
(-14.8, 52.7),
(-15.9, 52.0),
(-18.3, 52.4),
(-20.0, 54.0),
(-22.1, 55.9),
(-26.2, 59.8)]}),
'1950012S14150': defaultdict(list,
{'GPS': [
(-14.0, 146.9),
(-14.4, 145.8),
(-14.9, 145.4),
(-15.8, 145.6),
(-18.9, 149.1),
(-22.3, 152.5)]}),
'1950013S14139': defaultdict(list,
{'GPS': [
(-16.0, 139.0),
(-16.3, 139.0)]})}
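The result above still contains defaultdict objects. If plain dictionaries are preferred, the same grouping can be sketched with dict.setdefault. Since Python 3.7, regular dicts preserve insertion order, so the serial numbers stay in file order and each coordinate list follows row order. The helper name `group_tracks` and the in-memory `rows` are assumptions for illustration, standing in for what a parser like read_file yields.

```python
def group_tracks(pairs):
    """Group (serial, coords) pairs into {serial: {"GPS": [coords, ...]}}."""
    data = {}
    for serial, coords in pairs:
        # Create the inner dict the first time a serial is seen, then append.
        data.setdefault(serial, {"GPS": []})["GPS"].append(coords)
    return data


# Hypothetical rows in the shape produced by a parser such as read_file.
rows = [
    ("1950004S11059", (-11.1, 59.1)),
    ("1950004S11059", (-11.6, 57.8)),
    ("1950013S14139", (-16.0, 139.0)),
    ("1950013S14139", (-16.3, 139.0)),
]

tracks = group_tracks(rows)
for serial, entry in tracks.items():
    print(serial, entry["GPS"])
```

Because each serial is looked up by key on every row, this variant also works if rows for different serial numbers happen to be interleaved in the file.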
https://stackoverflow.com/questions/48835899