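MorphL is an overseas company that provides AI solutions (PS: its web UI looks quite nice). Website: https://morphl.io/products/morphl-cloud.html
MorphL Community Edition uses big data and machine learning to predict user behavior in digital products and services. Its goal is to improve KPIs (click-through rate, conversion rate, etc.) through personalization. The models it mainly covers include: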
In MorphL's framework, budget allocation consists of two steps:
Step 1: the budget/revenue prediction function
f(Cost) = f(Cost(t) | Cost(t-1), Revenue(t-1), ... Cost(t0), Revenue(t0)) = Revenue function
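The prediction is made from historical budget/revenue data.
Step 2: the budget optimization problem. Once there is a budget/revenue prediction function for each campaign, the budget optimization can be solved. There are three cases: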
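The yellow line is the cumulative budget/spend line;
The blue line is the budget/spend efficiency curve (the original text says: The blue line is the relation between the budget and the returning sum.)
The peak of the curve marks the optimal budget range, which can guide budget allocation.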
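To make "the peak of the curve" concrete, here is a minimal sketch. It assumes the blue curve is available as sampled (budget, return) pairs with a positive peak; the function name `best_budget_range`, the 5% tolerance and the sample curve are all made up for illustration and are not taken from MorphL.

```python
import numpy as np

def best_budget_range(budgets, returns, tolerance=0.05):
    """Return (peak_budget, low, high): the budget at the curve's peak and the
    span of sampled budgets whose return is within `tolerance` of the peak."""
    budgets = np.asarray(budgets, dtype=float)
    returns = np.asarray(returns, dtype=float)
    peak = returns.argmax()
    # Assumes the peak return is positive, otherwise the threshold is meaningless.
    near_peak = budgets[returns >= (1 - tolerance) * returns[peak]]
    return budgets[peak], near_peak.min(), near_peak.max()

# Hypothetical curve, invented for illustration: the return rises, then falls.
budgets = np.arange(0, 101, 10)
returns = 3600.0 - (budgets - 60.0) ** 2

peak_budget, low, high = best_budget_range(budgets, returns)
print(peak_budget, low, high)
```

Reporting a range rather than a single point reflects that the curve is usually flat near its top, so any budget in that span performs about the same.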
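GitHub: Morphl-AI/Ecommerce-Marketing-Spend-Optimization
Here are the formats of the two data sources published on GitHub:
Several of the cases introduce a few methods they commonly use:
These are actually very simple methods.
bucket index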
I have not fully understood this concept yet. My guess is that it is a reasonable interval between campaign activity, similar to a session.
Let a bucket be: Cost_B = [0, 0, 50, 20, 0, 15], Revenue_B = [30, 100]. This means that the first revenue (30) was generated by the first two costs alone, so we merged the next bucket as well.
We'll sum them, getting C_{\Sigma B} = 85 and R_{\Sigma B} = 130. Then, the bucket constant is: \alpha_B = 130/85 = 1.529.
Then, our pseudo-revenues will be: Pseudo-Revenue_B = [0*\alpha_B, 0*\alpha_B, 50*\alpha_B, 20*\alpha_B, 0*\alpha_B, 15*\alpha_B] = [0, 0, 76.45, 30.58, 0, 22.935].
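The arithmetic in the quoted example can be reproduced in a few lines. This is only a sketch of my reading of it; the function name `pseudo_revenues` is made up. Note that the quoted numbers come from rounding the bucket constant to 1.529 first, so the unrounded values below differ slightly in the second decimal.

```python
import numpy as np

def pseudo_revenues(costs, revenues):
    """Spread a bucket's total revenue over its costs, proportionally to cost."""
    costs = np.asarray(costs, dtype=float)
    alpha = float(np.sum(revenues)) / costs.sum()  # bucket constant: total revenue / total cost
    return alpha, costs * alpha

alpha, pseudo = pseudo_revenues([0, 0, 50, 20, 0, 15], [30, 100])
print(round(alpha, 3))   # bucket constant, rounded as in the quoted text
print(pseudo.round(2))   # per-cost pseudo-revenues
print(pseudo.sum())      # sums back to the bucket's total revenue
```

By construction the pseudo-revenues always sum back to the bucket's total revenue, which is the point of scaling by the bucket constant.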
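Based on the example above, my guess is:
bucket constant
The fourth case is probably meant for intermittent campaigns, while the fifth case is probably a long-running campaign, so there the bucket interval is fixed at one week and the calculation is done on that basis.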
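If the bucket interval really is a fixed week, the same bucket-constant idea could be applied window by window. The sketch below is my own guess at what that looks like, not MorphL's code: the function name, the `days_per_bucket` parameter and the 14 days of data are all invented for illustration.

```python
import numpy as np

def weekly_pseudo_revenues(daily_cost, daily_revenue, days_per_bucket=7):
    """Apply one bucket constant per fixed-length window of days."""
    cost = np.asarray(daily_cost, dtype=float)
    revenue = np.asarray(daily_revenue, dtype=float)
    pseudo = np.zeros_like(cost)
    for start in range(0, len(cost), days_per_bucket):
        window = slice(start, start + days_per_bucket)
        bucket_cost = cost[window].sum()
        if bucket_cost > 0:  # a week with no spend keeps pseudo-revenue 0
            pseudo[window] = cost[window] * revenue[window].sum() / bucket_cost
    return pseudo

# Two made-up weeks: revenue arrives on different days than the spend.
daily_cost = [10, 0, 20, 0, 0, 10, 10, 5, 5, 0, 0, 0, 0, 0]
daily_revenue = [0, 0, 0, 100, 0, 0, 0, 0, 0, 0, 0, 0, 30, 0]

pseudo = weekly_pseudo_revenues(daily_cost, daily_revenue)
print(pseudo)
```

One consequence of this simplification is that revenue landing in a week with zero spend is dropped, which a merged-bucket approach like the earlier example would avoid.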
This corresponds to the Jupyter notebook: 2. Budget optimization - basic statistical model
It is simply => Rev / Cost
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import display

'''
Model 1: compute a single overall ROI
Directly modeling f(Cost) = Revenue
'''
class StatisticalModel:
    def __init__(self):
        # This model has a single parameter: the ratio of summed targets to summed inputs
        self.param = np.nan

    def fit(self, x, t):
        assert self.param != self.param  # NaN != NaN, so this checks the model is not fitted yet
        self.param = t.sum() / x.sum()  # the core: one ROI value, used as the coefficient

    def predict(self, x):
        assert self.param == self.param  # fails while param is still NaN (not fitted)
        return x * self.param

def errorL1(y, t):
    # Mean absolute error between predictions y and targets t
    return np.abs(y - t).mean()

def plot(model, valData, xKey, tKey):
    # One scatter panel per campaign: predicted vs. actual on the validation rows
    validCampaigns = list(valData.keys())
    ax = plt.subplots(len(validCampaigns), figsize=(5, 30))[1]
    for i, k in enumerate(validCampaigns):
        x = valData[k][xKey]
        t = valData[k][tKey]
        y = model[k].predict(x)
        ax[i].scatter(x, y, label="%s Predicted" % (tKey))
        ax[i].scatter(x, t, label="%s Actual" % (tKey))
        ax[i].set_title(k)
        ax[i].legend()
# Load the data
conversion_data = pd.read_csv('Datasets/conversion_data.csv')
# marketing_spend_data = pd.read_csv('Datasets/marketing_spend_data.csv')
model_cost_revenue = {}
predictions_cost_revenue = {}
errors_cost_revenue = {}
displayDf = pd.DataFrame()
res_cost_revenue = []
campaigns = set(conversion_data['xyz_campaign_id'])
# from sklearn.model_selection import train_test_split
# X_train,X_test,y_train,y_test = train_test_split(iris.data,iris.target,test_size=0.3,random_state=0)
trainData = {}
valData = {}
for k in campaigns:
    data = conversion_data[conversion_data['xyz_campaign_id'] == k]
    num = int(len(data)*0.8)  # first 80% of each campaign's rows for training, the rest for validation
    trainData[k] = data[:num]
    valData[k] = data[num:]
# Cost_col = 'Cost'
# Revenue_col = 'Revenue'
Cost_col = 'Spent'  # input (spend)
Revenue_col = 'Total_Conversion'  # output
for k in campaigns:
    model_cost_revenue[k] = StatisticalModel()
    model_cost_revenue[k].fit(trainData[k][Cost_col], trainData[k][Revenue_col])
    predictions_cost_revenue[k] = model_cost_revenue[k].predict(valData[k][Cost_col])
    errors_cost_revenue[k] = errorL1(predictions_cost_revenue[k], valData[k][Revenue_col])
    res_cost_revenue.append([k, trainData[k][Cost_col].sum(), trainData[k][Revenue_col].sum(),
                             model_cost_revenue[k].param, errors_cost_revenue[k]])
displayDf = pd.DataFrame(res_cost_revenue, columns=["Campaign", Cost_col, Revenue_col, "Fit", "Error (L1)"])
display(displayDf)
print("Mean error:", displayDf["Error (L1)"].mean())
plot(model_cost_revenue, valData, Cost_col, Revenue_col)
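The per-campaign loop above can also be written with a groupby. Here is a minimal sketch on made-up rows: only the column names are borrowed from the code above, the numbers are invented, and for brevity it scores in-sample without the 80/20 split.

```python
import pandas as pd

# Made-up rows that only mimic the column names used above.
toy = pd.DataFrame({
    "xyz_campaign_id": [916, 916, 936, 936, 936],
    "Spent": [10.0, 30.0, 5.0, 15.0, 20.0],
    "Total_Conversion": [2, 6, 1, 4, 3],
})

# One coefficient per campaign: total conversions / total spend.
totals = toy.groupby("xyz_campaign_id")[["Spent", "Total_Conversion"]].sum()
ratio = totals["Total_Conversion"] / totals["Spent"]

# Predict each row from its campaign's coefficient, then score with the L1 error.
predicted = toy["Spent"] * toy["xyz_campaign_id"].map(ratio)
l1_error = (predicted - toy["Total_Conversion"]).abs().mean()

print(ratio)
print("L1 error:", l1_error)
```

This gives the same per-campaign coefficient as `StatisticalModel.fit`, just without the dictionary bookkeeping.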
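This is just an example,
along the lines of: cost -> impressions -> revenue
Cost x Revenue ~= Cost x Sessions + Sessions x Revenue
impressions = a1 * cost
revenue = a2 * impressions
It is done in two steps, and the code is again mainly excerpted from 2. Budget optimization - basic statistical model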
# Arbitrarily pick a column to play the role of sessions
session_col = 'Impressions'  # impressions
Cost_col = 'Spent'  # input (spend)
Revenue_col = 'Total_Conversion'  # output
# Step 1: impressions = a1 * cost
model_cost_sessions = {}
predictions_cost_sessions = {}
errors_cost_sessions = {}
displayDf = pd.DataFrame()
res_cost_sessions = []
for k in campaigns:
    model_cost_sessions[k] = StatisticalModel()
    model_cost_sessions[k].fit(trainData[k][Cost_col], trainData[k][session_col])
    predictions_cost_sessions[k] = model_cost_sessions[k].predict(valData[k][Cost_col])
    errors_cost_sessions[k] = errorL1(predictions_cost_sessions[k], valData[k][session_col])
    res_cost_sessions.append([k, trainData[k][Cost_col].sum(), trainData[k][session_col].sum(),
                              model_cost_sessions[k].param, errors_cost_sessions[k]])
displayDf = pd.DataFrame(res_cost_sessions, columns=["Campaign", Cost_col, session_col, "Fit", "Error (L1)"])
display(displayDf)
print("Mean error:", displayDf["Error (L1)"].mean())
plot(model_cost_sessions, valData, Cost_col, session_col)
# Step 2: revenue = a2 * impressions
model_sessions_revenue = {}
predictions_sessions_revenue = {}
errors_sessions_revenue = {}
displayDf = pd.DataFrame()
res_sessions_revenue = []
for k in campaigns:
    model_sessions_revenue[k] = StatisticalModel()
    model_sessions_revenue[k].fit(trainData[k][session_col], trainData[k][Revenue_col])
    predictions_sessions_revenue[k] = model_sessions_revenue[k].predict(valData[k][session_col])
    errors_sessions_revenue[k] = errorL1(predictions_sessions_revenue[k], valData[k][Revenue_col])
    res_sessions_revenue.append([k, trainData[k][session_col].sum(), trainData[k][Revenue_col].sum(),
                                 model_sessions_revenue[k].param, errors_sessions_revenue[k]])
displayDf = pd.DataFrame(res_sessions_revenue, columns=["Campaign", session_col, Revenue_col, "Fit", "Error (L1)"])
display(displayDf)
print("Mean error:", displayDf["Error (L1)"].mean())
plot(model_sessions_revenue, valData, session_col, Revenue_col)
# Step 3: combine the two models
displayDf = pd.DataFrame()
errors_cost_revenue = {}
res_cost_revenue_combined = []

class TwoModel(object):
    def __init__(self, modelA, modelB):
        self.modelA = modelA  # cost -> impressions
        self.modelB = modelB  # impressions -> revenue

    def predict(self, x):
        # Apply the models in pipeline order: cost -> impressions -> revenue.
        # (For two purely multiplicative models the order does not change the result.)
        return self.modelB.predict(self.modelA.predict(x))

models_cost_revenue = {k : TwoModel(model_cost_sessions[k], model_sessions_revenue[k]) for k in valData}
for k in campaigns:
    predictions_cost_revenue[k] = models_cost_revenue[k].predict(valData[k][Cost_col])
    errors_cost_revenue[k] = errorL1(predictions_cost_revenue[k], valData[k][Revenue_col])
    res_cost_revenue_combined.append([k, errors_cost_revenue[k]])
displayDf = pd.DataFrame(res_cost_revenue_combined, columns=["Campaign", "Error (L1)"])
display(displayDf)
print("Mean error:", displayDf["Error (L1)"].mean())
plot(models_cost_revenue, valData, Cost_col, Revenue_col)
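One thing worth noticing about the chained model: when both ratio models are fitted on the same rows, the product a1 * a2 = (sum of impressions / sum of cost) * (sum of revenue / sum of impressions) reduces to sum of revenue / sum of cost, which is exactly the direct model's coefficient. A quick check on synthetic data (the random numbers below are invented purely for this check):

```python
import numpy as np

# Synthetic data, invented for this check only.
rng = np.random.default_rng(0)
cost = rng.uniform(1, 100, size=50)
impressions = cost * rng.uniform(50, 150, size=50)
revenue = impressions * rng.uniform(0.001, 0.01, size=50)

a1 = impressions.sum() / cost.sum()      # step 1: impressions = a1 * cost
a2 = revenue.sum() / impressions.sum()   # step 2: revenue = a2 * impressions
direct = revenue.sum() / cost.sum()      # direct model: revenue = direct * cost

print(a1 * a2, direct)
print(np.isclose(a1 * a2, direct))
```

So with this particular ratio model, the two-step version should reproduce the direct model's predictions and L1 error up to floating-point noise. The intermediate step would only start to matter with a model that is not a plain ratio, or when the two steps are fitted on different data.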