I have a list of 100 dictionaries (one per data point), as follows:
datapoint1 a:1 b:2 c:6
datapoint2 a:2 d:8 p:10
.....
datapoint100: c:9 d:1 z:12
I want to print the list to a file in the following form:
a b c d ...... z
datapoint1 1 2 6 0 ...... 0
datapoint2 2 0 0 8 ...... 0
.........
.........
datapoint100 0 0 9 1 ...... 12
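The keys a, b, c, ..., z are only examples. The actual number of words is not known in advance, so the total is not 26; it could be 1,000 or 10,000, and a, b, ... will be replaced by real words such as "my", "hi", "tote", and so on.
I have been thinking of doing it like this:
But that approach seems complicated in Python. Is there a better way to do this in Python?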
Posted on 2013-03-19 20:26:43
If you don't care too much about the niceties of column alignment, this isn't too bad:
datapoints = [{'a': 1, 'b': 2, 'c': 6},
              {'a': 2, 'd': 8, 'p': 10},
              {'c': 9, 'd': 1, 'z': 12}]

# get all the keys ever seen
keys = sorted(set.union(*(set(dp) for dp in datapoints)))

# open in text mode ("w") so that str can be written on Python 3
with open("outfile.txt", "w") as fp:
    # write the header
    fp.write("{}\n".format(' '.join([" "] + keys)))
    # loop over each point, getting the values in order (or 0 if they're absent)
    for i, datapoint in enumerate(datapoints):
        out = '{} {}\n'.format(i, ' '.join(str(datapoint.get(k, 0)) for k in keys))
        fp.write(out)
which produces
  a b c d p z
0 1 2 6 0 0 0
1 2 0 0 8 10 0
2 0 0 9 1 0 12
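The same idea can also be sketched with the standard library's csv.DictWriter, whose restval argument supplies the value for keys a row lacks. This is only an illustration, not part of the original answer: the function name write_table, the "id" label column, the "datapointN" row labels and the file name outfile_csv.txt are all assumptions, and the sketch assumes that no data point uses "id" as a key.

```python
import csv

# Sample data copied from the answer above.
datapoints = [{'a': 1, 'b': 2, 'c': 6},
              {'a': 2, 'd': 8, 'p': 10},
              {'c': 9, 'd': 1, 'z': 12}]


def write_table(datapoints, path, label_column="id"):
    """Write one space-separated row per dict, using 0 for missing keys.

    write_table and label_column are hypothetical names for this sketch.
    """
    # Every key seen in any data point, in sorted order.
    keys = sorted(set().union(*datapoints))
    with open(path, "w", newline="") as fp:
        # restval is what DictWriter writes for keys missing from a row.
        writer = csv.DictWriter(fp, fieldnames=[label_column] + keys,
                                restval=0, delimiter=" ")
        writer.writeheader()
        for i, datapoint in enumerate(datapoints, start=1):
            # Assumed row label: "datapoint1", "datapoint2", ...
            row = {label_column: "datapoint{}".format(i)}
            row.update(datapoint)
            writer.writerow(row)
    return keys


write_table(datapoints, "outfile_csv.txt")
```

Letting the writer fill in the zeros keeps the loop free of per-key lookups, at the cost of the csv module's quoting rules applying to any key that contains a space.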
As mentioned in the comments, the pandas solution is also pretty nice:
>>> import pandas as pd
>>> df = pd.DataFrame(datapoints).fillna(0).astype(int)
>>> df
a b c d p z
0 1 2 6 0 0 0
1 2 0 0 8 10 0
2 0 0 9 1 0 12
>>> df.to_csv("outfile_pd.csv", sep=" ")
>>> !cat outfile_pd.csv
a b c d p z
0 1 2 6 0 0 0
1 2 0 0 8 10 0
2 0 0 9 1 0 12
If you really need the columns aligned, there are ways to do that too, but I've never needed them, so I don't know much about them.
Posted on 2013-03-19 20:31:18
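For what it's worth, one possible way to align the columns with plain string methods is sketched below. It is not from the original answer: the helper name format_aligned and the "datapointN" row labels are assumptions, and the keys are assumed to be strings. Each column is padded to the width of its widest cell or header.

```python
# Sample data copied from the answer above.
datapoints = [{'a': 1, 'b': 2, 'c': 6},
              {'a': 2, 'd': 8, 'p': 10},
              {'c': 9, 'd': 1, 'z': 12}]


def format_aligned(datapoints):
    """Return the table as text with right-aligned, padded columns.

    format_aligned is a hypothetical helper name for this sketch.
    """
    keys = sorted(set().union(*datapoints))
    # Assumed row labels: "datapoint1", "datapoint2", ...
    labels = ["datapoint{}".format(i) for i in range(1, len(datapoints) + 1)]
    # Cell text for every row, with 0 where a key is absent.
    rows = [[str(dp.get(k, 0)) for k in keys] for dp in datapoints]
    label_width = max((len(label) for label in labels), default=0)
    # Each column is as wide as its header or its widest cell.
    widths = [max([len(k)] + [len(row[j]) for row in rows])
              for j, k in enumerate(keys)]
    header = [" " * label_width] + [k.rjust(w) for k, w in zip(keys, widths)]
    lines = [" ".join(header)]
    for label, row in zip(labels, rows):
        cells = [label.ljust(label_width)]
        cells += [value.rjust(w) for value, w in zip(row, widths)]
        lines.append(" ".join(cells))
    return "\n".join(lines)


print(format_aligned(datapoints))
```

Computing the widths requires a full pass over the data before anything is written, which is fine for a few thousand columns but is the main difference from the streaming loop above.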
Program:
data_points = [
    {'a': 1, 'b': 2, 'c': 6},
    {'a': 2, 'd': 8, 'p': 10},
    {'c': 9, 'd': 1, 'z': 12},
    {'e': 3, 'f': 6, 'g': 3}
]

# collect, for every key, the list of values seen across all data points
merged_data_points = {}
for data_point in data_points:
    for k, v in data_point.items():
        if k not in merged_data_points:
            merged_data_points[k] = []
        merged_data_points[k].append(v)

# print the merged datapoints (print is a function on Python 3)
print('{')
for k in merged_data_points:
    print('    {0}: {1},'.format(k, merged_data_points[k]))
print('}')
Output (on Python 3.7+, where dicts keep insertion order):
{
    a: [1, 2],
    b: [2],
    c: [6, 9],
    d: [8, 1],
    p: [10],
    z: [12],
    e: [3],
    f: [6],
    g: [3],
}
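Note that the merged lists no longer record which data point each value came from, so they cannot be turned back into the table the question asks for. A possible column-wise variant that keeps one slot per data point, with 0 where a key is absent, might look like the sketch below; the function name to_columns is an assumption made for illustration.

```python
# Sample data copied from the answer above.
data_points = [
    {'a': 1, 'b': 2, 'c': 6},
    {'a': 2, 'd': 8, 'p': 10},
    {'c': 9, 'd': 1, 'z': 12},
    {'e': 3, 'f': 6, 'g': 3},
]


def to_columns(data_points):
    """Map each key to a list with one value per data point (0 if absent).

    to_columns is a hypothetical helper name for this sketch.
    """
    # Every key seen in any data point, in sorted order.
    keys = sorted(set().union(*data_points))
    return {k: [dp.get(k, 0) for dp in data_points] for k in keys}


columns = to_columns(data_points)
for key, values in columns.items():
    print("{}: {}".format(key, values))
```

Because every list has the same length, position i in each list always refers to data point i, so the rows of the requested table can be rebuilt by reading across the columns.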
https://stackoverflow.com/questions/15509299