This discussion covers the difference between dtypes and converters in the pandas.read_csv function.
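I can't find an equivalent of converters for the pandas.DataFrame constructor in the documentation.
If I build the data directly from a list, what is the best way to mimic the same behavior?
A made-up example: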
# data.csv
sport,population
football,15M
darts,50k
sailing,3000
# convert_csv_to_df.py
import pandas as pd

def f_population_to_int(population):
    dict_multiplier = {"k": 1000, "M": 1000000}
    try:
        multiplier = dict_multiplier[population[-1]]
        return int(population[0:-1]) * multiplier
    except KeyError:
        return population

dict_converters = {"population": f_population_to_int}
df = pd.read_csv("data.csv", converters=dict_converters)
Output:
sport population
0 football 15000000
1 darts 50000
2 sailing 3000
What is the best way to get the same result when the data comes from a list?
data = [["sports", "population"], ["football", "15M"], ["darts", "50k"], ["sailing", 3000]]
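Edit for clarification:
The example dict_converters contains only one function, but the idea is to be able to apply different conversions to several columns.
Posted on 2021-08-25 01:47:13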
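Change the f_population_to_int function so that it returns the value unchanged on any error, not only KeyError (the int 3000 in the list raises TypeError on population[-1], because ints cannot be indexed), and apply it with Series.apply after creating the DataFrame: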
import pandas as pd

data = [["sports", "population"], ["football", "15M"], ["darts", "50k"], ["sailing", 3000]]

def f_population_to_int(population):
    dict_multiplier = {"k": 1000, "M": 1000000}
    try:
        multiplier = dict_multiplier[population[-1]]
        return int(population[0:-1]) * multiplier
    except Exception:
        # KeyError: no k/M suffix; TypeError: ints such as 3000 are not indexable
        return population

# The first row is the header, the remaining rows are data
df = pd.DataFrame(data[1:], columns=data[0])
df['population'] = df['population'].apply(f_population_to_int)
print(df)
sports population
0 football 15000000
1 darts 50000
2 sailing 3000
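If you would rather reuse the read_csv converters unchanged, one workaround (a different technique from the one above, sketched here as an assumption rather than an official pandas feature) is to write the nested list to an in-memory CSV with the standard csv module and read it back with read_csv. Because read_csv always passes converters the raw field text, the converter below treats every value as a str; unlike the question's version, it also turns an unsuffixed field such as "3000" into an int instead of returning the string unchanged.

```python
import csv
import io

import pandas as pd

data = [["sports", "population"], ["football", "15M"], ["darts", "50k"], ["sailing", 3000]]


def f_population_to_int(population):
    # read_csv passes converters the raw field text, so population is always a str here
    dict_multiplier = {"k": 1000, "M": 1000000}
    suffix = population[-1]
    if suffix in dict_multiplier:
        return int(population[:-1]) * dict_multiplier[suffix]
    # No recognised suffix: treat the whole field as a plain integer
    return int(population)


dict_converters = {"population": f_population_to_int}

# Serialise the nested list (header row first) as CSV text in memory;
# csv.writer writes the int 3000 as the text "3000"
buffer = io.StringIO()
csv.writer(buffer).writerows(data)
buffer.seek(0)

# read_csv accepts any file-like object, so the converters run just as they would for a file
df = pd.read_csv(buffer, converters=dict_converters)
print(df)
```

The trade-off is that every column round-trips through text, so columns without a converter are re-inferred by read_csv rather than kept as the Python objects that were in the list.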
To apply several converters, as in the question's edit, loop over the dict_converters dict:
dict_converters = {"population": f_population_to_int}
for k, v in dict_converters.items():
    df[k] = df[k].apply(v)
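Another option is to apply the converters to the raw rows before the DataFrame is built, which is closer to what converters= does at read time. This is a minimal sketch, assuming a plain nested list whose first row is the header; the apply_converters_to_rows helper and the str.title converter for the sports column are hypothetical, added only to show more than one column being converted in a single pass.

```python
import pandas as pd

data = [["sports", "population"], ["football", "15M"], ["darts", "50k"], ["sailing", 3000]]


def f_population_to_int(population):
    # Values from the list may already be ints (e.g. 3000), so normalise to str first
    population = str(population)
    dict_multiplier = {"k": 1000, "M": 1000000}
    suffix = population[-1]
    if suffix in dict_multiplier:
        return int(population[:-1]) * dict_multiplier[suffix]
    return int(population)


def apply_converters_to_rows(rows, header, converters):
    """Hypothetical helper: convert each cell whose column has a converter.

    Columns without an entry in `converters` are passed through unchanged.
    """
    # Resolve one callable per column position; identity for unconverted columns
    funcs = [converters.get(name, lambda value: value) for name in header]
    return [[func(value) for func, value in zip(funcs, row)] for row in rows]


# Hypothetical second converter, just to show several columns handled at once
dict_converters = {"sports": str.title, "population": f_population_to_int}

header, *rows = data
df = pd.DataFrame(apply_converters_to_rows(rows, header, dict_converters), columns=header)
print(df)
```

Because the values are already converted when the constructor sees them, pandas infers each column's dtype from the final values (here an integer column for population), at the cost of a Python-level loop over every cell.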
https://stackoverflow.com/questions/68920479