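I hope this finds you all well. I have been working with a steel dataset. I tried to one-hot encode a categorical column by using the map method with the corresponding values. However, when I do this, the column ends up with null values in it, and I don't understand why. The column had no null values before the encoding, but after mapping it to the corresponding values, nulls appear. Here is the code: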
df["material_spec"].unique()
array(['Material_0', 'Material_1', 'Material_2', 'Material_3',
'Material_4', 'Material_5', 'Material_6', 'Material_7',
'Material_8', 'Material_9', 'Material_10', 'Material_11',
'Material_12', 'Material_13', 'Material_14', 'Material_15',
'Material_16', 'Material_17', 'Material_18', 'Material_19',
'Material_20', 'Material_21', 'Material_22', 'Material_23',
'Material_24', 'Material_25', 'Material_26', 'Material_27',
'Material_28', 'Material_29', 'Material_30', 'Material_31',
'Material_32', 'Material_33', 'Material_34', 'Material_35',
'Material_36', 'Material_37', 'Material_38', 'Material_39',
'Material_40', 'Material_41', 'Material_42', 'Material_43',
'Material_44', 'Material_45', 'Material_46', 'Material_47',
'Material_48'], dtype=object)

This is how I am one-hot encoding the data:
df["material_spec"] = df["material_spec"].map({
    "Material_0": 0, "Material_1": 1, "Material_2": 2, "Material_3": 3, "Material_4": 4,
    "Material_5": 5, "Material_6": 6, "Material_7": 7, "Material_8": 8, "Material_9": 9,
    "Material_10": 10, "Material_11": 11, "Material_12": 12, "Material_13": 13, "Material_14": 14,
    "Material_15": 15, "Material_16": 16, "Material_17": 17, "Material_18": 18, "Material_19": 19,
    "Material:20": 20, "Material_21": 21, "Material_22": 22, "Material_23": 23, "Material_24": 24,
    "Material_25": 25, "Material_26": 26, "Material_27": 27, "Material_28": 28, "Material_29": 29,
    "Material_30": 30, "Material_31": 31, "Material_32": 32, "Material_33": 33, "Material_34": 34,
    "Material_35": 35, "Material_36": 36, "Material_37": 37, "Material_38": 38, "Material_39": 39,
    "Material_40": 40, "Material_41": 41, "Material_42": 42, "Material_43": 43, "Material_44": 44,
    "Material_45": 45, "Material_46": 46, "Material_47": 47, "Material_48": 48})

This is the result after the mapping:
df["material_spec"].isnull().sum()
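122

Can anyone tell me what I am doing wrong here? Is my way of one-hot encoding wrong, or is some other error causing this? Any suggestions would be helpful. Thanks.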
Posted on 2022-05-23 06:31:50
@ansev has already answered your direct question in the comments.
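For anyone reading without the comments: Series.map returns NaN for every value that has no matching key in the dictionary, and the dictionary in the question contains the key "Material:20" (a colon where the underscore should be). That looks like the likely source of the nulls, though I have not seen the full data. Below is a minimal sketch of how to find such unmatched values; the small Series and the shortened mapping are made up for illustration.

```python
import pandas as pd

# Hypothetical miniature of the column from the question.
s = pd.Series(["Material_19", "Material_20", "Material_21", "Material_20"])

# Same kind of typo as in the question: "Material:20" instead of "Material_20".
mapping = {"Material_19": 19, "Material:20": 20, "Material_21": 21}

mapped = s.map(mapping)

# Values with no matching dictionary key become NaN after map,
# so listing them points straight at the misspelled key.
unmapped = sorted(set(s.unique()) - set(mapping))
null_count = int(mapped.isnull().sum())

print(unmapped)
print(null_count)
```

If the unmatched list contains only Material_20, correcting that one key in the dictionary should remove the nulls.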
Here is another way to do what you want, which may be easier for you:
df["material_spec"].str.extract(r'Material_(\d+)').astype(int)

But what you are doing is not really one-hot encoding, is it? I think one-hot encoding looks more like this:
df["material_spec"].str.get_dummies()

https://stackoverflow.com/questions/72340897
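To make the difference between the two snippets in this answer concrete, here is a small sketch on a made-up four-row Series (the sample values are an assumption, not the real steel data). The first call produces one integer code per row, which is label encoding; the second produces one 0/1 indicator column per category, which is one-hot encoding. Passing expand=False is my addition so that str.extract returns a Series instead of a one-column DataFrame.

```python
import pandas as pd

# Hypothetical sample; the real column has 49 distinct materials.
s = pd.Series(
    ["Material_0", "Material_10", "Material_2", "Material_10"],
    name="material_spec",
)

# Label encoding: pull the number out of each string.
# expand=False yields a Series rather than a one-column DataFrame.
codes = s.str.extract(r"Material_(\d+)", expand=False).astype(int)

# One-hot encoding: one indicator column per distinct category.
dummies = s.str.get_dummies()

print(codes.tolist())
print(dummies.shape)
```

With one-hot encoding, each row has exactly one column set to 1, so the result has as many columns as there are distinct materials.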