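Last week I shared an article on how deeply one needs to master machine learning algorithms.
The third level of mastery it described is implementing the mainstream machine learning models in Python.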
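Today I'd like to recommend a project recently open-sourced by David Bourgin, a postdoc at Princeton: hand-written NumPy implementations of all the mainstream ML models. Having looked through it, I found the code highly readable.
For each collection of code, the author provides reference material for the different implementations, such as example plots of model performance, reference papers, and reference links.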
Take linear regression as an example. In 500 lines of code, the author implements OLS, Ridge, Logistic, and Bayesian linear regression:
import numpy as np

# Relative import from inside the numpy-ml package; the LinearRegression
# class shown here does not use these helpers.
from ..utils.testing import is_symmetric_positive_definite, is_number


class LinearRegression:
    def __init__(self, fit_intercept=True):
        """
        An ordinary least squares regression model fit via the normal equation.

        Parameters
        ----------
        fit_intercept : bool
            Whether to fit an additional intercept term in addition to the
            model coefficients. Default is True.
        """
        self.beta = None
        self.fit_intercept = fit_intercept

    def fit(self, X, y):
        """
        Fit the regression coefficients via maximum likelihood.

        Parameters
        ----------
        X : :py:class:`ndarray <numpy.ndarray>` of shape `(N, M)`
            A dataset consisting of `N` examples, each of dimension `M`.
        y : :py:class:`ndarray <numpy.ndarray>` of shape `(N, K)`
            The targets for each of the `N` examples in `X`, where each target
            has dimension `K`.
        """
        # convert X to a design matrix if we're fitting an intercept
        if self.fit_intercept:
            X = np.c_[np.ones(X.shape[0]), X]

        # normal equation: beta = (X^T X)^{-1} X^T y
        pseudo_inverse = np.dot(np.linalg.inv(np.dot(X.T, X)), X.T)
        self.beta = np.dot(pseudo_inverse, y)

    def predict(self, X):
        """
        Use the trained model to generate predictions on a new collection of
        data points.

        Parameters
        ----------
        X : :py:class:`ndarray <numpy.ndarray>` of shape `(Z, M)`
            A dataset consisting of `Z` new examples, each of dimension `M`.

        Returns
        -------
        y_pred : :py:class:`ndarray <numpy.ndarray>` of shape `(Z, K)`
            The model predictions for the items in `X`.
        """
        # convert X to a design matrix if we're fitting an intercept
        if self.fit_intercept:
            X = np.c_[np.ones(X.shape[0]), X]
        return np.dot(X, self.beta)
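For reference, the closed-form solution that `fit` computes can be derived in a few lines. The symbols follow the code ($X$ is the design matrix, $y$ the targets, $\beta$ the coefficients), and the derivation assumes that $X^\top X$ is invertible:

```latex
\begin{aligned}
L(\beta) &= \lVert y - X\beta \rVert_2^2 \\
\nabla_\beta L(\beta) &= -2\, X^\top (y - X\beta) = 0 \\
X^\top X \beta &= X^\top y \\
\hat{\beta} &= (X^\top X)^{-1} X^\top y
\end{aligned}
```

The last line corresponds to the `pseudo_inverse` product in `fit`. Under a Gaussian noise model, minimizing this squared error gives the same coefficients as maximizing the likelihood, which is why the docstring describes the fit as maximum likelihood.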
The author also plots a comparison between the hand-written implementations and their sklearn counterparts:
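A numerical version of such a comparison can be sketched as follows. To keep the example free of extra dependencies, it assumes NumPy's own `np.linalg.lstsq` as the reference solver in place of sklearn. The helper name `fit_normal_equation` and the synthetic data are hypothetical and are not part of the numpy-ml project.

```python
import numpy as np


def fit_normal_equation(X, y, fit_intercept=True):
    """Hypothetical helper: OLS coefficients via the normal equation."""
    if fit_intercept:
        # prepend a column of ones so the first coefficient is the intercept
        X = np.c_[np.ones(X.shape[0]), X]
    # beta = (X^T X)^{-1} X^T y, the same formula as in the listing above
    return np.linalg.inv(X.T @ X) @ X.T @ y


# synthetic data (an assumption for illustration): y = 4 + X @ w + noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = 4.0 + X @ true_w + 0.1 * rng.normal(size=200)

beta_hand = fit_normal_equation(X, y)

# reference solution from NumPy's least-squares solver
X_design = np.c_[np.ones(X.shape[0]), X]
beta_ref, *_ = np.linalg.lstsq(X_design, y, rcond=None)

print("hand-written:", beta_hand)
print("lstsq       :", beta_ref)
print("max abs difference:", np.max(np.abs(beta_hand - beta_ref)))
```

On well-conditioned data like this, the two sets of coefficients should agree to within floating-point error. When the columns of `X` are nearly collinear, explicitly inverting `X.T @ X` becomes less numerically stable than a dedicated least-squares solver.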
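There is much more in the project worth digging into. I believe that after working through the full implementations yourself, your grasp of machine learning fundamentals will be very solid. I also recommend reading the author's documentation alongside the code for the best results.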
Project: https://github.com/ddbourgin/numpy-ml
Documentation: https://numpy-ml.readthedocs.io/