User Growth - Predicting Customer Lifetime Value (LTV) with the BG/NBD Probabilistic Model (Part 2)

悟乙己

Published 2021-12-07 15:04:39


Part of the column: 素质云笔记


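Posts in this series so far:

User Growth: CLV (Customer Lifetime Value, CLTV) Notes (Part 1)

User Growth - Predicting Customer Lifetime Value (LTV) with the BG/NBD Probabilistic Model (Part 2)

User Growth: Cohort Analysis for Retention (Part 3)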

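1 Theory

1.1 Introduction to the BG/NBD probabilistic model

Reference: Data Operations 36 Stratagems (6): Predicting Customer Lifetime LTV with the BG/NBD Probabilistic Model, Python Implementation

Reference: Customer Lifetime Value (CLV) Evaluation, Part 1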

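Imagine you have a group of customers who may buy again, some frequently and some less so. To predict CLV you want to know which customers are still active and will keep buying from you, and how much each of them will buy. Modeling this looks simple but is actually quite hard. For example, a customer who used to shop with you every day but has not bought anything for a week has very likely churned, whereas a customer who buys once a month would not be considered churned after the same quiet week. In other words, inferring churn depends heavily on purchase frequency. The difficulty is that we can almost never observe the moment a customer churns, so we can only model it probabilistically.

To forecast the revenue a customer will bring, customer relationships are usually classified along two axes: contractual versus non-contractual, and continuous versus discrete.

This post covers the most widely used probabilistic models for the non-contractual, continuous setting:

Pareto/NBD (negative binomial distribution) model
Its enhanced variant, the BG (Beta Geometric)/NBD model, which describes repeat purchasing in a non-contractual customer relationship (it requires a purchase count > 1)

Note:

Contractual versus non-contractual: in China the best-known contractual product is the contract mobile phone plan. Contracts are rare for typical internet products, and the CLV models discussed here require a non-contractual product.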

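The BG/NBD model, also called the beta-geometric/negative binomial model, is a refinement of the classic Pareto/NBD model. For the detailed mathematical derivation, see: A Note on Deriving the Pareto/NBD Model and Related Expressions

The BG/NBD model describes repeat purchasing in a non-contractual customer relationship.

That is, customers can buy at any time, with no time constraint. From each customer's historical transaction data (RFM), the model predicts the number of future transactions and the dropout probability per customer.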

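The model rests on the following assumptions:

(1) [Transaction assumption] While active, the number of transactions a customer makes in a period of length t follows a Poisson distribution with mean λt.

Picture each customer flipping a coin in every month of the year to decide whether to buy. The number of purchases we observe depends on the Poisson parameter λ.

(Figure: distribution of purchase counts modeled with λ = 4.3.)

(2) [Transaction assumption] The transaction rate λ varies across customers following a gamma distribution with shape parameter r and inverse scale (rate) parameter α.

Each customer has their own "buy coin", and the probability of heads differs from coin to coin.

(Figure: Poisson distributions for 100 simulated customers whose λ is drawn from a gamma distribution with shape 9 and scale 0.5.)

(3) [Dropout assumption] After any transaction j, a customer becomes inactive with probability p, so the point at which the customer drops out follows a geometric distribution with parameter p (the dropout probability).

Dropout points are geometrically distributed across transactions.

After every transaction, the customer tosses a "die coin" to decide whether to stay.

(4) [Dropout assumption] The dropout probability p varies across customers following a beta distribution with shape parameters a and b.

(Figure: geometric distributions for 100 customers whose dropout probability p is drawn from a beta distribution with a = 1.0, b = 2.5.)

(5) [Joint assumption] The transaction rate λ and the dropout probability p are independent of each other across customers.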
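The distribution functions that assumptions (1) to (4) refer to can be written out as follows. This is a sketch in the standard BG/NBD notation (λ, r, α, p, a, b as defined above); B(a, b) denotes the beta function and Γ the gamma function, symbols the text above does not spell out.

```latex
\begin{aligned}
\text{(1)}\quad & P\bigl(X(t)=x \mid \lambda\bigr) = \frac{(\lambda t)^{x} e^{-\lambda t}}{x!}, \qquad x = 0, 1, 2, \dots \\
\text{(2)}\quad & f(\lambda \mid r, \alpha) = \frac{\alpha^{r} \lambda^{r-1} e^{-\lambda \alpha}}{\Gamma(r)}, \qquad \lambda > 0 \\
\text{(3)}\quad & P(\text{inactive immediately after the } j\text{-th transaction}) = p\,(1-p)^{j-1}, \qquad j = 1, 2, 3, \dots \\
\text{(4)}\quad & f(p \mid a, b) = \frac{p^{a-1}(1-p)^{b-1}}{B(a, b)}, \qquad 0 \le p \le 1
\end{aligned}
```

Note that (1) and (3) are probability mass functions, while (2) and (4) are densities.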

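So far the link to the Pareto and negative binomial distributions may not be obvious. It comes from mixing: an exponential distribution mixed over a gamma distribution gives a Pareto distribution, and a Poisson distribution mixed over a gamma distribution gives a negative binomial distribution.

As a simple intuition, think of tossing a "coin" to decide whether a customer drops out and rolling a "die" to decide how many transactions the customer makes.

In Pareto/NBD the "coin" is modeled with a Pareto distribution and the "die" with a negative binomial distribution; BG/NBD keeps the die and replaces the coin with a beta-geometric one.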
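The coin-and-die intuition can be made concrete with a small simulation. The following is a minimal sketch of the BG/NBD generative process using only the standard library; the function name `simulate_customer` is hypothetical, and the parameter values are simply the fitted values quoted later in this post, reused here for illustration.

```python
import random

def simulate_customer(r, alpha, a, b, T, rng):
    """Simulate one customer's repeat purchases under the BG/NBD assumptions.

    Returns (x, t_x): the number of repeat purchases in (0, T] and the time
    of the last one (0.0 if there were none).
    """
    # Transaction rate lambda ~ Gamma(shape=r, rate=alpha).
    # random.gammavariate takes a scale parameter, so scale = 1 / alpha.
    lam = rng.gammavariate(r, 1.0 / alpha)
    # Dropout probability p ~ Beta(a, b).
    p = rng.betavariate(a, b)
    if lam <= 0.0:  # guard against a degenerate draw
        return 0, 0.0

    x, t, t_x = 0, 0.0, 0.0
    while True:
        # Poisson purchasing is equivalent to exponential waiting times.
        t += rng.expovariate(lam)
        if t > T:
            break
        x += 1
        t_x = t
        # After each purchase the customer drops out with probability p.
        if rng.random() < p:
            break
    return x, t_x

rng = random.Random(42)
customers = [simulate_customer(0.243, 4.414, 0.793, 2.426, 39.0, rng)
             for _ in range(1000)]
share_no_repeat = sum(1 for x, _ in customers if x == 0) / len(customers)
print(f"share of simulated customers with no repeat purchase: {share_no_repeat:.2f}")
```

Each simulated customer is summarized by exactly the quantities the model is later fitted on: the repeat purchase count x and the time of the last purchase t_x within an observation window T.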

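Limitation:

Because the model only looks at the number of transactions in the period T and the date of the last transaction, it cannot represent customers who buy periodically. Depending on how t is chosen, the expected and actual transaction counts for such customers can differ substantially.

[Reference: How to Predict (Compute) Customer Value: the BG/NBD Model]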

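1.2 The Gamma-Gamma model

Reference: Customer Lifetime Value (CLV) Evaluation, Part 3: Gamma-Gamma Model, Python Simulation

As noted above, the Pareto/NBD and BG/NBD models only describe customer lifetime and transaction counts. They say nothing about the monetary value of future transactions.

The Gamma-Gamma model is an extension that addresses this gap.

The Gamma-Gamma model makes the following assumptions:

From the customer's point of view, the value of each transaction varies randomly around that customer's average transaction value. (This is not very convincing.)
The observed mean transaction value is an imperfect measure of the latent mean value E(M).
The mean transaction value varies across customers, but is stable over time for any given customer. (This is a very strong assumption.)
The distribution of average transaction values across customers is independent of the transaction process. In other words, monetary value can be modeled separately from purchase count and customer lifetime. (In real-world settings this assumption may well fail.)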
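To illustrate the second assumption, the Gamma-Gamma estimate of a customer's latent mean value is a weighted average of the population mean and the customer's own observed average, with more weight on the latter as more transactions are observed. The sketch below shows that shrinkage; the function name and the parameter values p, q, v are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def expected_average_value(x, m_x, p, q, v):
    """Conditional expected mean transaction value under Gamma-Gamma.

    x       : number of repeat transactions observed for the customer
    m_x     : observed average value of those transactions
    p, q, v : Gamma-Gamma parameters (q must exceed 1)
    """
    population_mean = v * p / (q - 1)
    # Weight placed on the customer's own observed average; grows with x.
    weight = p * x / (p * x + q - 1)
    return (1 - weight) * population_mean + weight * m_x

# Hypothetical parameters: the population mean is 15 * 6 / (4 - 1) = 30.
p, q, v = 6.0, 4.0, 15.0
print(expected_average_value(0, 0.0, p, q, v))     # no history: population mean
print(expected_average_value(2, 22.35, p, q, v))   # pulled toward 22.35
```

With no repeat transactions the estimate is just the population mean, and with many transactions it converges to the customer's own average.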

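2 Practical examples

2.1 lifetimes example 1: transactions of an online retail business

For details see: Python Data Analysis: Exploring Customer Lifetime Value (CLV) with lifetimes

This post only gives a simplified implementation; for the finer details it is better to go back to that article.

The Quickstart in the official documentation is also excellent material: Quickstart

Here is the whole implementation:

2.1.1 Understanding the data

Data: download link

This is a transnational data set containing all transactions that occurred between 2010-12-01 and 2011-12-09 for a UK-based and registered non-store online retailer. The company mainly sells unique all-occasion gifts, and many of its customers are wholesalers.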

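The fields are:

InvoiceNo: invoice number. Nominal, a 6-digit integer uniquely assigned to each transaction; a code starting with the letter 'c' indicates a cancellation.
StockCode: product (item) code. Nominal, a 5-digit integer uniquely assigned to each distinct product.
Description: product (item) name. Nominal.
Quantity: quantity of each product (item) per transaction. Numeric.
InvoiceDate: invoice date and time. Numeric, the date and time at which each transaction was generated.
UnitPrice: unit price. Numeric, product price per unit in pounds sterling.
CustomerID: customer number. Nominal, a 5-digit integer uniquely assigned to each customer.
Country: country name. Nominal, the name of the country/region where each customer resides.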

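Several key metrics are involved:

- recency = time between the customer's first purchase and last purchase
- frequency = number of periods in which the customer made a repeat purchase
    - freq = 0 => purchased once; freq = 1 => purchased twice
- T = time between the customer's first purchase and the last day in the data set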
This post skips the tedious preprocessing and goes straight to the key usage:

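2.1.2 BG/NBD: frequency/recency heatmap of expected transactions

Consider a customer who bought from you every day for three weeks, and from whom we have not heard for months.

How likely is it that they are still "alive"?

Very unlikely.

On the other hand, a customer who has historically bought from you once a quarter, and who bought last quarter, is probably still alive.

We can visualize this relationship with the frequency/recency matrix, which computes the expected number of transactions an artificial customer will make in the next time period, given their recency (age at last purchase) and frequency (number of repeat transactions).

Plot recency frequency matrix as heatmap. Plot a figure of expected transactions in T next units of time by a customer's frequency and recency.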

from lifetimes import BetaGeoFitter
bgf = BetaGeoFitter(penalizer_coef=1)
bgf.fit(data['frequency'], data['recency'], data['T'])
print(bgf)

# Visualize the frequency/recency matrix
from lifetimes.plotting import plot_frequency_recency_matrix
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(12,8))
plot_frequency_recency_matrix(bgf,T=1)

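This may raise an error:

ConvergenceError:The model did not converge. Try adding a larger penalizer to see if that helps convergence.

The fix is to change:

 penalizer_coef from 0 to 1, after which the fit runs normally

The resulting plot is a heatmap. The official Quickstart documentation interprets it as follows: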

We can see that if a customer has bought 25 times from you, and their latest purchase was when they were 35 weeks old (given the individual is 35 weeks old), then they are your best customer (bottom-right). Your coldest customers are those that are in the top-right corner: they bought a lot quickly, and we haven’t seen them in weeks.

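In other words, if a customer has bought from you 25 times and their latest purchase was when they were about 350 days old, they are your best customer (bottom-right): over the next T periods they have the highest expected number of transactions.

Your coldest customers are those in the top-right corner: they bought a lot quickly, and we have not seen them for weeks.

There is also a beautiful "tail" around (5, 250). It represents customers who buy infrequently but whom we have seen recently, so they might buy again. We cannot tell whether they are dead or just between purchases.

Inside the plotting function, the matrix is computed as follows: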

Z = np.zeros((max_recency + 1, max_frequency + 1))
for i, recency in enumerate(np.arange(max_recency + 1)):
    for j, frequency in enumerate(np.arange(max_frequency + 1)):
        Z[i, j] = model.conditional_expected_number_of_purchases_up_to_time(T, frequency, recency, max_recency)

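Here conditional_expected_number_of_purchases_up_to_time(self, t, frequency, recency, T) returns the conditional expected number of repeat purchases in the next t periods.

In practice, though, the bottom-right corner is not very informative: customers who buy frequently and have been around for a long time are obviously valuable customers.

The customers worth winning back are those in the middle "comet tail", who are still in an undecided state.

2.1.3 Customer probability-alive heatmap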

Plot probability alive matrix as heatmap.
Plot a figure of the probability a customer is alive based on their frequency and recency.

from lifetimes.plotting import plot_probability_alive_matrix
fig = plt.figure(figsize=(12,8))
plot_probability_alive_matrix(bgf)

t = 1
data['predicted_purchases'] = bgf.conditional_expected_number_of_purchases_up_to_time(t, 
                                  data['frequency'], data['recency'], data['T'])
data.sort_values(by='predicted_purchases', ascending=False).head()

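The model exposes a method that uses each customer's history to predict their expected purchases in the next period.

Above, customers are ranked from highest to lowest expected purchases in the next period: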

            frequency  recency      T  monetary_value  predicted_purchases
CustomerID                                                                
17949.0           2.0     82.0  371.0         432.000             0.001048
17838.0           2.0     80.0  372.0         485.270             0.000998
18061.0           2.0     76.0  366.0         206.085             0.000958
16775.0           2.0     76.0  367.0         172.600             0.000951
17526.0           2.0     75.0  365.0         315.500             0.000944

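Customers with 2 repeat purchases and a recency of roughly 75 to 82 days have the highest expected number of purchases for t = 1.

The probability of being alive is computed by the conditional_probability_alive function: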

    def conditional_probability_alive(
        self, 
        frequency, 
        recency, 
        T
    ):
        """
        Compute conditional probability alive.

        Compute the probability that a customer with history
        (frequency, recency, T) is currently alive.

        From http://www.brucehardie.com/notes/021/palive_for_BGNBD.pdf

        Parameters
        ----------
        frequency: array or scalar
            historical frequency of customer.
        recency: array or scalar
            historical recency of customer.
        T: array or scalar
            age of the customer.

        Returns
        -------
        array
            value representing a probability
        """

        r, alpha, a, b = self._unload_params("r", "alpha", "a", "b")

        log_div = (r + frequency) * np.log((alpha + T) / (alpha + recency)) + np.log(
            a / (b + np.maximum(frequency, 1) - 1)
        )

        return np.atleast_1d(np.where(frequency == 0, 1.0, expit(-log_div)))
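The same calculation can be followed step by step for a single customer without NumPy. This is a scalar sketch that mirrors the library code above; the function name `probability_alive` is hypothetical, and the parameter values are the fitted values reported later in this post, reused only for illustration.

```python
import math

def probability_alive(x, t_x, T, r, alpha, a, b):
    """P(alive | x, t_x, T) under BG/NBD for one customer (scalar version)."""
    if x == 0:
        # The model only allows dropout right after a repeat transaction,
        # so a customer with no repeat purchase is treated as alive.
        return 1.0
    log_div = (r + x) * math.log((alpha + T) / (alpha + t_x)) \
        + math.log(a / (b + x - 1))
    return 1.0 / (1.0 + math.exp(log_div))  # identical to expit(-log_div)

params = dict(r=0.243, alpha=4.414, a=0.793, b=2.426)
recent = probability_alive(2, 30.43, 38.86, **params)  # last purchase near T
stale = probability_alive(2, 5.0, 38.86, **params)     # long silence since t_x
print(f"recent buyer: {recent:.3f}, long-silent buyer: {stale:.3f}")
```

Holding frequency fixed, the longer the gap between the last purchase and T, the lower the probability that the customer is still alive.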

2.1.4 Model evaluation, method 1: validating the repeat-purchase frequency

from lifetimes.plotting import plot_period_transactions
plot_period_transactions(bgf)

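This compares the model's predicted transaction counts against the actual ones.

How does the model produce the prediction here?

Is it the function conditional_expected_number_of_purchases_up_to_time(self, t, frequency, recency, T)?

It is actually beta_geometric_nbd_model(T, r, alpha, a, b, size=1), which generates simulated data from the fitted parameters.

For details see: "Counting Your Customers" the Easy Way: An Alternative to the Pareto/NBD Model

2.1.5 Model training

Now split the data set into a calibration-period data set and a holdout data set.

This matters because we want to test how the model performs on data it has not yet seen (much like cross-validation in machine learning practice).

The timeline is divided by three dates: start 2010/12/1 -> 2010/12/8 -> 2011/2/23 (end)

Calibration (training) set - 2010/12/1 - 2010/12/8
Holdout (validation) set - 2010/12/9 - 2011/2/23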

from lifetimes.utils import calibration_and_holdout_data

summary_cal_holdout = calibration_and_holdout_data(df, 'CustomerID', 'InvoiceDate',
                                        calibration_period_end='2010-12-08',
                                        observation_period_end='2011-02-23' )   

summary_cal_holdout.head()

2.1.6 Prediction results

from lifetimes.plotting import plot_calibration_purchases_vs_holdout_purchases
bgf.fit(summary_cal_holdout['frequency_cal'], summary_cal_holdout['recency_cal'], summary_cal_holdout['T_cal'])
plot_calibration_purchases_vs_holdout_purchases(bgf, summary_cal_holdout)

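The plot groups all customers in the calibration period by their number of repeat purchases (x-axis) and then averages their repeat purchases in the holdout period (y-axis).

The orange and blue lines show the model prediction and the actual result on the y-axis, respectively. As we can see, the model predicts the behavior of the customer base quite accurately,

though it underestimates at 4 purchases and from 5 purchases onward.

2.1.7 Predicting a customer's transactions

Based on a customer's history, we can now predict that individual's future purchases: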

t = 10
individual = data.loc[12347]
bgf.predict(t, individual['frequency'], individual['recency'], individual['T'])
>>> 0.0012327964120794066

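The expected number of purchases for customer 12347 over the next 10 days is about 0.00123. Note that this is an expected count, not a probability.

2.1.8 Customer probability history

Given a customer's transaction history, we can use the trained model to compute the historical probability that they were alive.

For example, let us look at the transaction history of our best customer and see the probability of being alive: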

from lifetimes.plotting import plot_history_alive
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(12,8))
id = 14606
days_since_birth = 365
sp_trans = df.loc[df['CustomerID'] == id]
plot_history_alive(bgf, days_since_birth, sp_trans, 'InvoiceDate')

# Second plot
fig = plt.figure(figsize=(12,8))
id = 15858
days_since_birth = 365
sp_trans = df.loc[df['CustomerID'] == id]
plot_history_alive(bgf, days_since_birth, sp_trans, 'InvoiceDate')

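With a low purchase frequency, the probability of churn is high.

The plots above show that a customer's probability of being alive jumps sharply right after a transaction and then decays gradually.

2.1.9 Estimating customer lifetime value with the Gamma-Gamma model

The model has one important premise: purchase frequency and purchase amount are uncorrelated. See The Gamma-Gamma Model of Monetary Value (via: How to Compute Customer Lifetime Value (CLV))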

returning_customers_summary = data[data['frequency']>0]
returning_customers_summary.shape[0]
>>> 732

We only estimate for customers with at least one repeat purchase, which leaves 732 customers.

2.1.10 Checking the correlation

returning_customers_summary[['frequency','monetary_value']].corr()

                frequency  monetary_value
frequency         1.00000         0.03638
monetary_value    0.03638         1.00000

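The model we use to estimate the CLV of the customer base is called the Gamma-Gamma submodel, and it relies on an important assumption.

Specifically, the Gamma-Gamma submodel assumes there is no relationship between monetary value and purchase frequency.

In practice, we need to check that the Pearson correlation between the two vectors is close to 0 before using this model.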
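For reference, the Pearson check can be sketched with the standard library as below. The data is made up so that the correlation is exactly zero, and the 0.1 cut-off is an arbitrary illustrative threshold, not a value prescribed by the model or by lifetimes.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-customer frequency and average order value.
frequency = [1, 2, 3, 4, 5]
monetary_value = [30.0, 10.0, 20.0, 10.0, 30.0]

r_value = pearson_r(frequency, monetary_value)
ok_for_gamma_gamma = abs(r_value) < 0.1  # illustrative threshold only
print(f"Pearson r = {r_value:.3f}, usable: {ok_for_gamma_gamma}")
```

In the data set above the observed correlation is 0.036, which is why the post proceeds with the Gamma-Gamma submodel.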

2.1.10 Training the Gamma-Gamma model

from lifetimes import GammaGammaFitter
ggf = GammaGammaFitter(penalizer_coef = 0.1)

returning_customers_summary = returning_customers_summary[returning_customers_summary['monetary_value'] > 0]

ggf.fit(returning_customers_summary['frequency'],
        returning_customers_summary['monetary_value'])

# This lets us estimate each customer's average transaction value
ggf.conditional_expected_average_profit(
        returning_customers_summary['frequency'],
        returning_customers_summary['monetary_value']
    ).head()

Output:

CustomerID
12347.0   -2308.791783
12348.0   -1106.827152
12359.0   -8918.603492
12370.0   -1348.044172
12377.0   -3041.798727
dtype: float64

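Without the monetary_value > 0 filter above, the fit raises an error:

ValueError: There exist non-positive (<= 0) values in the monetary_value vector.

Fix: some average monetary_value entries are below 0, which is odd; they need to be removed.

Note that monetary_value is an average, not a sum. From the original documentation: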

monetary_value can be used to represent profit, or revenue, or any value as long as it is consistently calculated for each customer.

2.1.11 Computing CLV with the DCF method

DCF = Discounted Cash Flow

DCF discounts future cash flows to obtain the present value of the customers' total worth.
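The discounting idea itself is simple: the revenue expected in month i is divided by (1 + d)^i, where d is the monthly discount rate. The sketch below is a simplified illustration with hypothetical inputs (a flat 0.5 expected purchases per month and an average value of 20); the library's internal bookkeeping may differ in detail.

```python
def discounted_clv(expected_purchases_per_month, avg_value, monthly_discount_rate):
    """Sum monthly expected revenue, discounting month i by (1 + d) ** i."""
    return sum(
        n * avg_value / (1 + monthly_discount_rate) ** i
        for i, n in enumerate(expected_purchases_per_month, start=1)
    )

# A 1% monthly rate compounds to roughly 12.7% per year.
annual_rate = (1 + 0.01) ** 12 - 1

undiscounted = discounted_clv([0.5] * 12, 20.0, 0.0)
discounted = discounted_clv([0.5] * 12, 20.0, 0.01)
print(f"annual rate: {annual_rate:.4f}")
print(f"undiscounted: {undiscounted:.2f}, discounted: {discounted:.2f}")
```

The further out a purchase is expected, the less it contributes to today's CLV estimate.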

bgf.fit(returning_customers_summary['frequency'], returning_customers_summary['recency'], returning_customers_summary['T'])

print(ggf.customer_lifetime_value(
    bgf, #the model to use to predict the number of future transactions
    returning_customers_summary['frequency'],
    returning_customers_summary['recency'],
    returning_customers_summary['T'],
    returning_customers_summary['monetary_value'],
    time=12, # months
    discount_rate=0.01 # monthly discount rate ~ 12.7% annually
).head(10))

The result is:

CustomerID
12347.0   -125.996028
12348.0    -48.894498
12359.0   -286.912474
12370.0     -6.825802
12377.0   -133.733238
12383.0     20.717618
12386.0    -20.218854
12388.0    -16.024161
12393.0     -5.250052
12395.0      8.491594
Name: clv, dtype: float64

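2.2 BG/NBD model for customer base analysis

2.2.1 Loading the data

This model has already been implemented in Python in the lifetimes package by Cameron Davidson-Pilon of Shopify.

The purpose of this section, however, is to open up that framework and walk through the steps involved in using the model.

References:

BG-NBD MODEL FOR CUSTOMER BASE ANALYSIS IN PYTHON

Data Operations 36 Stratagems (6): Predicting Customer Lifetime LTV with the BG/NBD Probabilistic Model, Python Implementation

Data used:

recency = time between the customer's first purchase and last purchase - t_x
frequency = number of periods in which the customer made a repeat purchase - x
- freq = 0 => purchased once; freq = 1 => purchased twice
T = time between the customer's first purchase and the last day in the data set

For example, if the whole observation period is 39 weeks and a customer first buys in week 2 and buys again in week 3:

freq = 1
R = tx = 1
T = 37 = 39 - 2

2.2.2 Population-level dropout rate p: fitting the beta distribution

How the model works: the gamma and beta distributions are the prior distributions of the transaction rate λ and the dropout probability p, while the Poisson and geometric distributions are the sampling distributions, i.e. the likelihood.

We then build the joint likelihood over λ and p and use the Nelder-Mead simplex algorithm to solve for the gamma and beta parameters (r, α, a, b). Nelder-Mead is a heuristic, gradient-free search method, used here to minimize the negative log-likelihood cost function.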

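import numpy as np
from scipy.optimize import minimize

# _func_caller, negative_log_likelihood, scaled_recency, scaled_T and scale
# come from the referenced article and are not reproduced here.
current_init_params = np.array([1.0, 1.0, 1.0, 1.0])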
output = minimize(
    _func_caller,
    method="Nelder-Mead",
    tol=0.0001,
    x0=current_init_params,
    args=([df['x'], scaled_recency, scaled_T], negative_log_likelihood),
    options={'maxiter': 2000}
)

r = output.x[0]
alpha = output.x[1]
a = output.x[2]
b = output.x[3]

alpha /= scale

print("r = {}".format(r))
print("alpha = {}".format(alpha))
print("a = {}".format(a))
print("b = {}".format(b))

Result:

r = 0.24259412356864324
alpha = 4.413588131347807
a = 0.7929354716520993
b = 2.4259553697230176

This gives r = 0.243, α = 4.414, a = 0.793, b = 2.426.
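The objective handed to the optimizer is not shown in this post. As a rough sketch, assuming the standard individual-level BG/NBD likelihood, it could be written with the standard library as follows; the function names are hypothetical and this version is not necessarily identical to the one used in the referenced article.

```python
import math

def bgnbd_log_likelihood(r, alpha, a, b, x, t_x, T):
    """Log-likelihood of one customer's (x, t_x, T) under BG/NBD."""
    ln_a1 = math.lgamma(r + x) - math.lgamma(r) + r * math.log(alpha)
    ln_a2 = (math.lgamma(a + b) + math.lgamma(b + x)
             - math.lgamma(b) - math.lgamma(a + b + x))
    ln_a3 = -(r + x) * math.log(alpha + T)        # still alive at T
    if x == 0:
        return ln_a1 + ln_a2 + ln_a3
    # Dropped out right after the last purchase at t_x (only possible if x > 0).
    ln_a4 = math.log(a) - math.log(b + x - 1) - (r + x) * math.log(alpha + t_x)
    # log(A3 + A4) computed in log space for numerical stability.
    return ln_a1 + ln_a2 + ln_a3 + math.log1p(math.exp(ln_a4 - ln_a3))

def negative_log_likelihood(params, customers):
    """Cost function for a minimizer; customers is a list of (x, t_x, T)."""
    if min(params) <= 0:
        return float("inf")  # all four parameters must be positive
    r, alpha, a, b = params
    return -sum(bgnbd_log_likelihood(r, alpha, a, b, x, t_x, T)
                for x, t_x, T in customers)

# A few (x, t_x, T) rows in weeks, used purely as sample input.
sample = [(2, 30.43, 38.86), (1, 1.71, 38.86), (7, 29.43, 38.86), (0, 0.0, 38.86)]
print(negative_log_likelihood((0.243, 4.414, 0.793, 2.426), sample))
```

A function of this shape can be passed to scipy.optimize.minimize with method="Nelder-Mead", which only needs function values and no gradients.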

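2.2.3 Population-level forecast of total transactions

Next we use the four parameters above to build the forecasting model, i.e. to evaluate the expected number of transactions E(x). 2F1 denotes the Gaussian hypergeometric function.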
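Written out, the expectation that the function expected_sales_to_time_t evaluates is the following, reconstructed term by term from that code; t is the length of the forecast horizon.

```latex
E\bigl(X(t) \mid r, \alpha, a, b\bigr)
  = \frac{a + b - 1}{a - 1}
    \left[ 1 - \left( \frac{\alpha}{\alpha + t} \right)^{r}
      {}_2F_1\!\left( r,\, b;\; a + b - 1;\; \frac{t}{\alpha + t} \right) \right]
```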

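We can now compute the expected number of transactions for an individual in the cohort. Suppose we want to know how many purchases we can expect from an individual within one year (52 weeks):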

from scipy.special import hyp2f1

def expected_sales_to_time_t(t):
    hyp2f1_a = r
    hyp2f1_b = b
    hyp2f1_c = a + b - 1
    hyp2f1_z = t / (alpha + t)
    hyp_term = hyp2f1(hyp2f1_a, hyp2f1_b, hyp2f1_c, hyp2f1_z)
    
    return ((a + b - 1) / (a - 1)) * (1-(((alpha / (alpha+t)) ** r) * hyp_term))


expected_sales_to_time_t(52)

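The code above predicts 1.444 transactions for a single customer over the next 52 weeks, but what we want is the total number of purchases across all customers.

This cannot be obtained by simply multiplying the per-customer figure by the number of customers, because customers made their first purchase at different times.

Instead, count the n_s customers who made their first purchase at time s; those n_s customers all have the same expected number of transactions over any given future horizon.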

# Period of consideration is 39 weeks.
# T indicates the length of time since first purchase
n_s = (39 - df['T']).value_counts().sort_index()

n_s.head()

In this data, 18 customers made their first purchase at week 1/7 (day 1), 22 at week 2/7 (day 2), and 17 at week 3/7 (day 3).

import numpy as np
import pandas as pd

forecast_range = np.arange(0, 78, 1/7.0)

def cumulative_repeat_transactions_to_t(t):
    expected_transactions_per_customer = (t - n_s.index).map(lambda x: expected_sales_to_time_t(x) if x > 0 else 0)
    expected_transactions_all_customers = (expected_transactions_per_customer * n_s).values
    return expected_transactions_all_customers.sum()

cum_rpt_sales = pd.Series(map(cumulative_repeat_transactions_to_t, forecast_range), index=forecast_range)

cum_rpt_sales.tail(10)

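The final result is a total of 4156 transactions across all customers over the 78-week horizon.

Knowing the expected number of future transactions makes it possible to estimate future revenue, so that operations staff can budget promotion spend more accurately and plan campaigns accordingly.

2.2.4 Customer-level conditional forecast of transaction counts

To predict each customer's number of transactions over a future period, we derive the conditional expectation.

# Given a customer's historical transaction count and timing, and the distribution parameters obtained above, the conditional expectation is computed as follows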

def calculate_conditional_expectation(t, x, t_x, T):
    first_term = (a + b + x - 1) / (a-1)
    hyp2f1_a = r + x
    hyp2f1_b = b + x
    hyp2f1_c = a + b + x - 1
    hyp2f1_z = t / (alpha + T + t)
    hyp_term = hyp2f1(hyp2f1_a, hyp2f1_b, hyp2f1_c, hyp2f1_z)
    second_term = (1 - ((alpha + T) / (alpha + T + t)) ** (r + x) * hyp_term)
    delta = 1 if x > 0 else 0
    denominator = 1 + delta * (a / (b + x - 1)) * ((alpha + T) / (alpha + t_x)) ** (r+x)
    return first_term * second_term / denominator


calculate_conditional_expectation(39, 2, 30.43, 38.86)
>>> 1.225904664486748

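For a customer who made two repeat transactions (x = 2) over the past 38.86 weeks (T = 38.86),

with the second of them at week 30.43 (tx = 30.43),

the model predicts 1.226 transactions over the next 39 weeks.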
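For reference, the conditional expectation implemented by calculate_conditional_expectation can be written as follows, reconstructed term by term from that function. Here δ_{x>0} equals 1 when x > 0 and 0 otherwise, matching the delta variable in the code.

```latex
E\bigl(Y(t) \mid X = x, t_x, T\bigr)
  = \frac{\dfrac{a + b + x - 1}{a - 1}
      \left[ 1 - \left( \dfrac{\alpha + T}{\alpha + T + t} \right)^{r + x}
        {}_2F_1\!\left( r + x,\, b + x;\; a + b + x - 1;\; \dfrac{t}{\alpha + T + t} \right) \right]}
     {1 + \delta_{x > 0}\, \dfrac{a}{b + x - 1}
        \left( \dfrac{\alpha + T}{\alpha + t_x} \right)^{r + x}}
```

The denominator is the reciprocal of the probability that the customer is still alive at T, so the numerator is effectively weighted by that probability.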

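Predicting each customer's future transactions allows finer customer segmentation and helps identify the valuable target groups for tailored campaigns.

For example, if a customer is predicted to make only 1 transaction over the next 52 weeks, that customer is at risk of churning,

so running a promotion now (such as issuing promotional vouchers) to raise their transaction frequency reduces the churn risk.

3 Introduction to the Lifetimes package

GitHub: https://github.com/CamDavidsonPilon/lifetimes

Documentation: https://lifetimes.readthedocs.io/en/latest/Quickstart.html

A good introductory article: Lifetimes: Measuring Customer Lifetime Value in Python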

Lifetimes can be used to both estimate if these entities are still alive, and predict how much more they will interact based on their existing history. If this is too abstract, consider these situations:

Predicting how often a visitor will return to your website.
Understanding how frequently a patient may return to a hospital.
Predicting individuals who have “died” using only their usage history.

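Here is a brief recap of some interesting functions in lifetimes.

plot_frequency_recency_matrix(bgf) plots the frequency/recency heatmap of expected transactions
bgf.conditional_expected_number_of_purchases_up_to_time returns the conditional expected number of repeat purchases in the next t periods; this is one of the core functions

3.1 Function: summary_data_from_transaction_data

It computes recency, frequency and T automatically.

Let us take a look: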

from lifetimes.datasets import load_transaction_data
from lifetimes.utils import summary_data_from_transaction_data

transaction_data = load_transaction_data()
print(transaction_data.head())
"""
                  date  id
0  2014-03-08 00:00:00   0
1  2014-05-21 00:00:00   1
2  2014-03-14 00:00:00   2
3  2014-04-09 00:00:00   2
4  2014-05-21 00:00:00   2
"""

summary = summary_data_from_transaction_data(transaction_data, 'id', 'date', observation_period_end='2014-12-31')

print(summary.head())
"""
frequency  recency      T
id
0         0.0      0.0  298.0
1         0.0      0.0  224.0
2         6.0    142.0  292.0
3         0.0      0.0  147.0
4         2.0      9.0  183.0
"""

bgf.fit(summary['frequency'], summary['recency'], summary['T'])
# <lifetimes.BetaGeoFitter: fitted with 5000 subjects, a: 1.85, alpha: 1.86, b: 3.18, r: 0.16>

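With just date and id columns you can derive freq / recency / T. The key metrics are:

- recency = time between the customer's first purchase and last purchase
- frequency = number of periods in which the customer made a repeat purchase
    - freq = 0 => purchased once; freq = 1 => purchased twice
- T = time between the customer's first purchase and the last day in the data set

3.2 Standard input data format

See load_cdnow_summary_data_with_monetary_value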

from lifetimes.datasets import load_cdnow_summary_data_with_monetary_value

summary_with_money_value = load_cdnow_summary_data_with_monetary_value()
summary_with_money_value.head()
returning_customers_summary = summary_with_money_value[summary_with_money_value['frequency']>0]

print(returning_customers_summary.head())
"""
             frequency  recency      T  monetary_value
customer_id
1                    2    30.43  38.86           22.35
2                    1     1.71  38.86           11.77
6                    7    29.43  38.86           73.74
7                    1     5.00  38.86           11.77
9                    2    35.71  38.86           25.55
"""

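frequency / recency / T / monetary_value are the four standard metrics; note that monetary_value is an average, not a sum.

monetary_value can represent profit, revenue, or any other value, as long as it is calculated consistently for each customer.

References

Customer Lifetime Value (CLV) Evaluation, Part 2: BG/NBD Model, Python Simulation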


Originally published: 2021/04/22
