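My goal is to add a new column to a Pandas DataFrame, but I am running into a strange error.
The new column should be a transformation of an existing column, computed by a lookup in a dictionary/hashmap.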
# Loading data
df = sqlContext.read.format(...).load(train_df_path)
# Instantiating the map
some_map = {
    'a': 0,
    'b': 1,
    'c': 1,
}
# Creating a new column using the map
df['new_column'] = df.apply(lambda row: some_map(row.some_column_name), axis=1)
This results in the following error:
AttributeError                            Traceback (most recent call last)
<ipython-input-12-aeee412b10bf> in <module>()
25 df= train_df
26
---> 27 df['new_column'] = df.apply(lambda row: some_map(row.some_column_name), axis=1)
/usr/lib/spark/python/pyspark/sql/dataframe.py in __getattr__(self, name)
962 if name not in self.columns:
963 raise AttributeError(
--> 964 "'%s' object has no attribute '%s'" % (self.__class__.__name__, name))
965 jc = self._jdf.apply(name)
966 return Column(jc)
AttributeError: 'DataFrame' object has no attribute 'apply'
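Other information that may be useful: I am using Spark and Python 2.
Answered 2018-06-05 02:42:45
You have a Spark DataFrame, not a pandas DataFrame. To add a new column to a Spark DataFrame, do the following: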
import pyspark.sql.functions as F
from pyspark.sql.types import IntegerType
# Wrap the dict lookup in a UDF and apply it to the existing column, referenced by name
df = df.withColumn('new_column', F.udf(some_map.get, IntegerType())(F.col('some_column_name')))
df.show()
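Note that the UDF is built from some_map.get rather than from some_map itself. The plain-Python sketch below (no Spark required) illustrates the per-value lookup the UDF performs. The sample values, including the unmapped key 'z', are hypothetical and not taken from the question's data.

```python
# The same map as in the question.
some_map = {'a': 0, 'b': 1, 'c': 1}

# Hypothetical column values; 'z' is deliberately missing from the map.
values = ['a', 'b', 'c', 'z']

# dict.get returns None for a missing key instead of raising KeyError.
looked_up = [some_map.get(v) for v in values]
print(looked_up)  # [0, 1, 1, None]

# Calling the dict itself, as in some_map(row.some_column_name), fails:
# a dict is not callable.
try:
    some_map('a')
    call_error = None
except TypeError as exc:
    call_error = exc
print(type(call_error).__name__)  # TypeError
```

A None returned by the UDF shows up as null in the new Spark column, so values that are not in the map do not raise an error. The original lambda would have failed even on a real pandas DataFrame, because some_map(...) calls the dictionary instead of indexing it.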
https://stackoverflow.com/questions/50686616