An Introduction to Python Libraries for Machine Learning

XXXX-user

Published 2019-09-12 12:11:27

This article is part of the column: 不仅仅是python (More Than Just Python)

Background

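As the name suggests, machine learning is the science of programming computers so that they can learn from different kinds of data. Arthur Samuel gave a more general definition: "Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed." It is commonly used to solve many kinds of real-world problems.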

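In the past, machine learning tasks were carried out by hand-coding every algorithm along with its mathematical and statistical formulas, which made the process time-consuming, tedious, and inefficient. Today, the many Python libraries, frameworks, and modules make it far simpler and more efficient than it used to be. Python is now one of the most popular programming languages for this work and has displaced many other languages in industry, in part because of its large collection of libraries. The Python libraries used in machine learning include: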

NumPy
SciPy
Scikit-learn
Theano
TensorFlow
Keras
PyTorch
Pandas
Matplotlib

NumPy

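NumPy is a very popular Python library for working with large multidimensional arrays and matrices, backed by a large collection of high-level mathematical functions. It is very useful for the basic scientific computing in machine learning, particularly for its linear algebra, Fourier transform, and random number capabilities. Higher-level libraries such as TensorFlow build on or interoperate with NumPy arrays when manipulating tensors.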

# Python program using NumPy for some basic mathematical operations
import numpy as np

# Creating two arrays of rank 2
x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

# Creating two arrays of rank 1
v = np.array([9, 10])
w = np.array([11, 12])

# Inner product of vectors
print(np.dot(v, w), "\n")

# Matrix and vector product
print(np.dot(x, v), "\n")

# Matrix and matrix product
print(np.dot(x, y))

Output:

219

[29 67]

[[19 22]
 [43 50]]
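
The linear algebra, Fourier transform, and random number features mentioned above can be sketched as follows. This is only an illustrative example: the matrix, the 5 Hz test signal, and the seed are arbitrary values chosen here, not taken from the original article.

```python
import numpy as np

# Linear algebra: solve the system A @ x = b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)
print(x)  # [2. 3.]

# Fourier transform: find the dominant frequency of a 5 Hz sine wave
# sampled at 100 Hz for one second (illustrative signal)
sample_rate = 100
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 5 * t)
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(signal.size, d=1 / sample_rate)
peak_freq = freqs[np.argmax(np.abs(spectrum))]
print(peak_freq)  # 5.0

# Random numbers: draw 1000 samples from a standard normal distribution
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=1000)
print(samples.mean(), samples.std())
```

The `np.random.default_rng` generator assumes NumPy 1.17 or newer.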

For more about NumPy, visit the official site: https://numpy.org/

SciPy

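SciPy is very popular among machine learning enthusiasts because it provides separate modules for optimization, linear algebra, integration, and statistics. Note that the SciPy library is not the same thing as the SciPy stack: the library is one of the core packages that make up the stack. SciPy is also very useful for image manipulation.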

# Python script using SciPy for image manipulation
# Note: imread, imsave and imresize were deprecated in SciPy 1.0 and removed
# from scipy.misc in later releases, so this script needs an older SciPy
# (with Pillow installed).
from scipy.misc import imread, imsave, imresize

# Read a JPEG image into a numpy array
img = imread('D:/Programs/cat.jpg')  # path of the image
print(img.dtype, img.shape)

# Tinting the image
img_tint = img * [1, 0.45, 0.3]

# Saving the tinted image
imsave('D:/Programs/cat_tinted.jpg', img_tint)

# Resizing the tinted image to be 300 x 300 pixels
img_tint_resize = imresize(img_tint, (300, 300))

# Saving the resized tinted image
imsave('D:/Programs/cat_tinted_resized.jpg', img_tint_resize)

(Figures: original image, tinted image, resized tinted image.)
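
The optimization, linear algebra, integration, and statistics modules mentioned above can be sketched briefly as follows. The functions and values are arbitrary illustrations chosen here, not part of the original article.

```python
import numpy as np
from scipy import integrate, linalg, optimize, stats

# Optimization: minimize f(x) = (x - 2)^2 + 1, whose minimum is at x = 2
res = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2 + 1.0)
print(res.x)

# Integration: the integral of sin(x) over [0, pi] is 2
area, abs_err = integrate.quad(np.sin, 0.0, np.pi)
print(area)

# Linear algebra: the determinant of [[1, 2], [3, 4]] is -2
det = linalg.det(np.array([[1.0, 2.0], [3.0, 4.0]]))
print(det)

# Statistics: cumulative probability of a standard normal at 1.96 (about 0.975)
cdf = stats.norm.cdf(1.96)
print(cdf)
```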

For more about SciPy, visit the official site: https://www.scipy.org/

Scikit-learn

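Scikit-learn is one of the most popular libraries for classical ML algorithms. It is built on top of two basic Python libraries, NumPy and SciPy. Scikit-learn supports most supervised and unsupervised learning algorithms, and it can also be used for data mining and data analysis, which makes it a good tool for getting started with ML.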

# Python script using Scikit-learn for a Decision Tree Classifier
from sklearn import datasets
from sklearn import metrics
from sklearn.tree import DecisionTreeClassifier

# Load the iris dataset
dataset = datasets.load_iris()

# Fit a CART model to the data
model = DecisionTreeClassifier()
model.fit(dataset.data, dataset.target)
print(model)

# Make predictions (on the same data the model was trained on)
expected = dataset.target
predicted = model.predict(dataset.data)

# Summarize the fit of the model
print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))

Output:

DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best')
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        50
           1       1.00      1.00      1.00        50
           2       1.00      1.00      1.00        50

   micro avg       1.00      1.00      1.00       150
   macro avg       1.00      1.00      1.00       150
weighted avg       1.00      1.00      1.00       150

[[50  0  0]
 [ 0 50  0]
 [ 0  0 50]]
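
The perfect scores above come from predicting on the same samples the tree was trained on. A more realistic estimate holds out part of the data for testing. The following is a minimal sketch of that idea; the 25% test split and the `random_state` values are arbitrary choices made here for reproducibility.

```python
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the iris dataset and hold out 25% of it for testing
dataset = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.25, random_state=0
)

# Fit the decision tree on the training portion only
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# Evaluate on samples the model has never seen
predicted = model.predict(X_test)
accuracy = metrics.accuracy_score(y_test, predicted)
print(accuracy)
print(metrics.confusion_matrix(y_test, predicted))
```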

For more about Scikit-learn, visit the official site: https://scikit-learn.org/

Theano

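As we all know, machine learning is basically mathematics and statistics. Theano is a popular Python library for defining, evaluating, and optimizing mathematical expressions involving multidimensional arrays efficiently, which it achieves by optimizing the use of the CPU and GPU. It also includes extensive unit-testing and self-verification to detect and diagnose many kinds of errors. Theano is a very powerful library that has long been used in large-scale, computationally intensive scientific projects, yet it is simple and approachable enough for individuals to use in their own projects.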

# Python program using Theano for computing a logistic function
import theano
import theano.tensor as T

x = T.dmatrix('x')
s = 1 / (1 + T.exp(-x))
logistic = theano.function([x], s)
logistic([[0, 1], [-1, -2]])

Output:

array([[0.5, 0.73105858],
       [0.26894142, 0.11920292]])

For more about Theano, visit http://deeplearning.net/software/theano/

TensorFlow

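TensorFlow is a very popular open-source library for high-performance numerical computation, developed by the Google Brain team at Google. As the name suggests, TensorFlow is a framework for defining and running computations that involve tensors. It can train and run deep neural networks, which can be used to build a variety of AI applications. TensorFlow is widely used in deep learning research and applications.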

# Python program using TensorFlow for multiplying two arrays
# Note: tf.Session is part of the TensorFlow 1.x API and was removed from
# the top-level namespace in TensorFlow 2.x.
import tensorflow as tf

# Initialize two constants
x1 = tf.constant([1, 2, 3, 4])
x2 = tf.constant([5, 6, 7, 8])

# Multiply
result = tf.multiply(x1, x2)

# Initialize the Session
sess = tf.Session()

# Print the result
print(sess.run(result))

# Close the session
sess.close()

Output:

[ 5 12 21 32]
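
The script above relies on the session-based TensorFlow 1.x API. As a rough sketch, and assuming TensorFlow 2.x with its default eager execution, the same multiplication could be written without a session:

```python
import tensorflow as tf

# Initialize two constants
x1 = tf.constant([1, 2, 3, 4])
x2 = tf.constant([5, 6, 7, 8])

# With eager execution the product is computed immediately, no Session needed
result = tf.multiply(x1, x2)

# Convert the tensor to a NumPy array for printing
print(result.numpy())  # [ 5 12 21 32]
```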

For more about TensorFlow, visit the official site: https://www.tensorflow.org/

Keras

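Keras is a very popular machine learning library for Python. It is a high-level neural network API capable of running on top of TensorFlow, CNTK, or Theano, and it runs seamlessly on both CPU and GPU. Keras makes it easy for ML beginners to build and design neural networks. One of its best features is that it allows easy and fast prototyping.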

Official site: https://keras.io/
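
To illustrate the kind of quick prototyping described above, here is a minimal sketch of a small classifier. It assumes the Keras API bundled with TensorFlow 2.x (`tensorflow.keras`). The synthetic data, layer sizes, and training settings are arbitrary choices made for illustration and do not come from the original article.

```python
import numpy as np
from tensorflow import keras

# Synthetic data (illustrative): label is 1 when the four features sum to > 0
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)).astype("float32")
y = (X.sum(axis=1) > 0).astype("float32")

# A small fully connected network defined layer by layer
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])

# Configure the optimizer, loss, and metric, then train for a few epochs
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
history = model.fit(X, y, epochs=5, batch_size=32, verbose=0)

# Predict class probabilities for the first three samples
preds = model.predict(X[:3], verbose=0)
print(preds.shape)  # (3, 1)
```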

PyTorch

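PyTorch is a popular open-source machine learning library for Python based on Torch, an open-source machine learning library implemented in C with a wrapper in Lua. It offers a wide range of tools and libraries that support computer vision, natural language processing (NLP), and many other ML applications. It lets developers perform computations on tensors with GPU acceleration and also helps with building computational graphs.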

# Python program using PyTorch for defining tensors, fitting a
# two-layer network to random data, and calculating the loss
import torch

dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0")  # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Randomly initialize weights
w1 = torch.randn(D_in, H, device=device, dtype=dtype)
w2 = torch.randn(H, D_out, device=device, dtype=dtype)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum().item()
    print(t, loss)

    # Backprop to compute gradients of w1 and w2 with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    # Update weights using gradient descent
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

Output:

0 47168344.0
1 46385584.0
2 43153576.0
...
...
...
497 3.987660602433607e-05
498 3.945609932998195e-05
499 3.897604619851336e-05
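
The program above derives the gradients by hand. PyTorch can also record the computational graph and compute them automatically. The following sketch shows the same two-layer network using autograd. The fixed seed and the `losses` list are additions made here for illustration.

```python
import torch

torch.manual_seed(0)  # fixed seed, chosen here only for reproducibility

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Random input and output data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# requires_grad=True asks autograd to track operations on the weights
w1 = torch.randn(D_in, H, requires_grad=True)
w2 = torch.randn(H, D_out, requires_grad=True)

learning_rate = 1e-6
losses = []
for t in range(500):
    # Forward pass: the computational graph is built as the ops run
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    loss = (y_pred - y).pow(2).sum()
    losses.append(loss.item())

    # Backward pass: autograd fills w1.grad and w2.grad
    loss.backward()

    # Update the weights without tracking the update itself, then reset grads
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        w1.grad.zero_()
        w2.grad.zero_()

print(losses[0], losses[-1])
```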

For more about PyTorch, visit https://pytorch.org/

Pandas

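Pandas is a popular Python library for data analysis. It is not directly related to machine learning, but a dataset has to be prepared before training, and this is where Pandas comes in handy, since it was developed specifically for data extraction and preparation. It provides high-level data structures and a wide variety of data analysis tools, including many built-in methods for grouping, combining, and filtering data.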
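
The grouping, combining, and filtering methods mentioned above might look like the following sketch. It reuses the article's country figures. The split into two tables, the `size` label, and the thresholds are assumptions introduced here for illustration, as is the reading of the numbers as millions of square kilometres and millions of people.

```python
import pandas as pd

# Two small tables sharing a "country" key (split chosen for illustration)
countries = pd.DataFrame({
    "country": ["Brazil", "Russia", "India", "China", "South Africa"],
    "area": [8.516, 17.10, 3.286, 9.597, 1.221],
    "population": [200.4, 143.5, 1252, 1357, 52.98],
})
capitals = pd.DataFrame({
    "country": ["Brazil", "Russia", "India", "China", "South Africa"],
    "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
})

# Combining: join the two tables on the shared key
merged = countries.merge(capitals, on="country")

# Filtering: keep only rows with population above 1000
big = merged[merged["population"] > 1000]
print(big["country"].tolist())  # ['India', 'China']

# Grouping: label each country by area, then total the population per label
merged["size"] = merged["area"].gt(5).map({True: "large", False: "small"})
totals = merged.groupby("size")["population"].sum()
print(totals)
```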

# Python program using Pandas for arranging a given set of data into a table
import pandas as pd

data = {"country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.10, 3.286, 9.597, 1.221],
        "population": [200.4, 143.5, 1252, 1357, 52.98]}

data_table = pd.DataFrame(data)
print(data_table)

Output:

Matplotlib

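Matplotlib is a very popular Python library for data visualization. Like Pandas, it is not directly related to machine learning. It is particularly useful when a programmer wants to visualize patterns in the data. It is a 2D plotting library for creating 2D graphs and plots. A module named pyplot makes plotting easy by providing features to control line styles, font properties, axis formatting, and so on. It offers many kinds of graphs and plots for data visualization, such as histograms, error charts, bar charts, and more.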
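
As a rough sketch of the pyplot features described above, the following draws a histogram, a bar chart, and an error-bar plot side by side. The data, titles, and output file name are made up for illustration, and the `Agg` backend is selected here only so the script can run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to a file, no window
import matplotlib.pyplot as plt
import numpy as np

# Illustrative data
rng = np.random.default_rng(0)
values = rng.normal(size=500)

fig, (ax_hist, ax_bar, ax_err) = plt.subplots(1, 3, figsize=(12, 3.5))

# Histogram of the random values
ax_hist.hist(values, bins=20, color="steelblue")
ax_hist.set_title("Histogram")

# Bar chart of made-up category counts
ax_bar.bar(["A", "B", "C"], [5, 9, 3], color="darkorange")
ax_bar.set_title("Bar chart")

# Error-bar plot: circle markers, dashed line, constant error of 2.0
x = np.arange(5)
ax_err.errorbar(x, x ** 2, yerr=2.0, fmt="o--", capsize=3)
ax_err.set_title("Error bars")
ax_err.set_xlabel("x")

fig.tight_layout()
fig.savefig("example_plots.png")
```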