I want to deploy a real-time prediction machine learning model for fraud detection using SageMaker.
I use a SageMaker Jupyter notebook instance to:
- load my training data, which contains transactions, from S3
- preprocess the data and do feature engineering (I use category_encoders to encode the categorical values)
- train the model and configure the endpoint
For the inference step, I use a Lambda function that invokes my endpoint to get a prediction for each real-time transaction.
Should I calculate all the features again for these real-time transactions in the Lambda function?
For the features, when I use category_encoders with the fit_transform() function to convert my categorical features to numerical ones, what should I do, given that the result will not be the same as on the training set?
Is there another method that avoids redoing the feature calculation in the inference step?
Posted on 2021-09-07 15:45:04
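Should I calculate all the features again for these real-time transactions in the Lambda function?
Yes. When you run inference with a trained model (that is, predict on real-time data), you should pass exactly the same list of features that was used to train the model. If you compute some features at training time (for example, the part of the day derived from a timestamp), you should compute them at inference time as well.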
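One way to keep training and inference consistent is to put the feature logic in a single function that both the notebook and the Lambda function import. The following is only a minimal sketch: the field names (`timestamp`, `amount`, `merchant_category`), the hour boundaries, and the function names are assumptions made for illustration, not taken from the question.

```python
from datetime import datetime


def part_of_day(ts: datetime) -> str:
    """Bucket a timestamp into a coarse part of the day (boundaries are an assumption)."""
    hour = ts.hour
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 17:
        return "afternoon"
    if 17 <= hour < 22:
        return "evening"
    return "night"


def build_features(transaction: dict) -> dict:
    """Compute the model's input features from one raw transaction.

    Call this same function when building the training set and inside the
    Lambda handler, so both code paths derive features identically.
    """
    ts = datetime.fromisoformat(transaction["timestamp"])
    return {
        "amount": float(transaction["amount"]),
        "part_of_day": part_of_day(ts),
        "merchant_category": transaction["merchant_category"],
    }


# Hypothetical raw transaction, as it might arrive in a Lambda event
example = build_features(
    {"timestamp": "2021-09-07T15:45:04", "amount": "129.90", "merchant_category": "travel"}
)
```

Packaging this function in a small module shared by the notebook and the Lambda deployment avoids maintaining two copies of the same logic.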
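For the features, when I use category_encoders with the fit_transform() function to convert my categorical features to numerical ones, what should I do, given that the result will not be the same as on the training set?
You should store all the transformations used to train the model: numerical scalers, categorical encoders, and so on.
In Python, it looks like this: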
import joblib  # used to save and load fitted transformers
import category_encoders as ce

# 1. At training time
# Fit the encoder on historical data
encoder = ce.OneHotEncoder(cols=[...])
encoder.fit(X, y)
# and save it to disk
joblib.dump(encoder, 'filename.joblib')

# 2. At inference time
# Load the fitted encoder
encoder = joblib.load('filename.joblib')
# and apply the same transformation to new data (transform, not fit_transform)
X_new_encoded = encoder.transform(X_new)
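To see why the encoder has to be fitted once and then reused, here is a toy illustration. `ToyOrdinalEncoder` is a hypothetical stand-in written for this example, not part of category_encoders; it only mimics the fit / transform / fit_transform pattern so that the effect of refitting on new data is visible.

```python
class ToyOrdinalEncoder:
    """Hypothetical minimal encoder: maps each category to an integer code."""

    def fit(self, values):
        # Codes are assigned in order of first appearance,
        # so the mapping depends entirely on the data seen during fit.
        self.mapping_ = {}
        for value in values:
            self.mapping_.setdefault(value, len(self.mapping_) + 1)
        return self

    def transform(self, values):
        # Categories not seen during fit map to -1 rather than getting a new code.
        return [self.mapping_.get(value, -1) for value in values]

    def fit_transform(self, values):
        return self.fit(values).transform(values)


# Training time: fit on historical data (made-up categories)
train_categories = ["grocery", "travel", "electronics", "travel"]
trained_encoder = ToyOrdinalEncoder().fit(train_categories)

# Inference time: a single new transaction
new_categories = ["travel"]

# Reusing the fitted encoder keeps the code the model was trained on.
consistent = trained_encoder.transform(new_categories)

# Refitting on the new data alone silently assigns a different code.
refit = ToyOrdinalEncoder().fit_transform(new_categories)

print(consistent, refit)
```

In this sketch "travel" keeps its training-time code only when the already fitted encoder is reused; calling fit_transform() on the incoming transaction produces a code the model never saw with that meaning.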
https://stackoverflow.com/questions/67422447