# An Introduction to Python Libraries for Machine Learning

• NumPy
• SciPy
• Scikit-learn
• Theano
• TensorFlow
• Keras
• PyTorch
• Pandas
• Matplotlib

### NumPy

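NumPy is a very popular Python library for processing large multidimensional arrays and matrices, backed by a large collection of high-level mathematical functions. It is very useful for fundamental scientific computing in machine learning, particularly for its linear algebra, Fourier transform, and random number capabilities. High-end libraries such as TensorFlow use NumPy internally to manipulate tensors.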

```python
# Python program using NumPy
# for some basic mathematical
# operations

import numpy as np

# Creating two arrays of rank 2
x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

# Creating two arrays of rank 1
v = np.array([9, 10])
w = np.array([11, 12])

# Inner product of vectors
print(np.dot(v, w), "\n")

# Matrix and vector product
print(np.dot(x, v), "\n")

# Matrix and matrix product
print(np.dot(x, y))
```

Output:

```
219

[29 67]

[[19 22]
 [43 50]]
```
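The description above also mentions linear algebra, Fourier transforms, and random numbers. A minimal sketch of those three areas follows; the matrix, the test signal, and the seed are illustrative values chosen here, not part of the original example.

```python
import numpy as np

# Linear algebra: solve the linear system A @ sol = b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
sol = np.linalg.solve(A, b)
print(sol)

# Fourier transform: locate the dominant frequency of a sampled sine wave
n = 64                                  # samples taken over one second
t = np.arange(n) / n
signal = np.sin(2 * np.pi * 5 * t)      # 5 cycles per second
spectrum = np.abs(np.fft.rfft(signal))
peak_bin = int(np.argmax(spectrum))
print(peak_bin)

# Random numbers: a seeded generator gives reproducible draws
rng = np.random.default_rng(seed=0)
samples = rng.normal(loc=0.0, scale=1.0, size=1000)
print(samples.shape)
```

The solver returns x = 2, y = 3 for this system, and the spectrum peaks at frequency bin 5, matching the 5 Hz sine wave.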

### SciPy

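SciPy is a very popular library among machine learning enthusiasts because it contains modules for optimization, linear algebra, integration, and statistics. The SciPy library is distinct from the SciPy stack: the library is one of the core packages that make up the stack. SciPy is also very useful for image manipulation.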

```python
# Python script using SciPy
# for image manipulation
# Note: imread, imsave, and imresize were deprecated in SciPy 1.0 and
# removed in later releases, so this script needs an older SciPy.

from scipy.misc import imread, imsave, imresize

# Read a JPEG image into a numpy array
img = imread('D:/Programs/cat.jpg')  # path of the image
print(img.dtype, img.shape)

# Tinting the image
img_tint = img * [1, 0.45, 0.3]

# Saving the tinted image
imsave('D:/Programs/cat_tinted.jpg', img_tint)

# Resizing the tinted image to be 300 x 300 pixels
img_tint_resize = imresize(img_tint, (300, 300))

# Saving the resized tinted image
imsave('D:/Programs/cat_tinted_resized.jpg', img_tint_resize)
```

Figures (not shown): original image, tinted image, and resized tinted image.
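The `scipy.misc` functions `imread`, `imsave`, and `imresize` are no longer part of current SciPy releases, so the tint-and-resize steps can be sketched with NumPy and `scipy.ndimage.zoom` instead. This is a substitute under stated assumptions, not the original author's method: a synthetic random array stands in for `cat.jpg`, and reading and writing files is left to a separate image I/O library.

```python
import numpy as np
from scipy import ndimage

# Synthetic 200 x 400 RGB image standing in for cat.jpg (an assumption)
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(200, 400, 3), dtype=np.uint8)

# Tint the image by scaling down the green and blue channels
img_tint = img * np.array([1.0, 0.45, 0.3])

# Resize to 300 x 300 pixels; the channel axis keeps a zoom factor of 1
target_h, target_w = 300, 300
factors = (target_h / img.shape[0], target_w / img.shape[1], 1)
img_tint_resized = ndimage.zoom(img_tint, factors, order=1)

# Convert back to 8-bit values, ready to be written by an image I/O library
img_out = np.clip(img_tint_resized, 0, 255).astype(np.uint8)
print(img_out.dtype, img_out.shape)
```

Here `order=1` selects linear interpolation; a higher order gives smoother results at a higher cost.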

### Scikit-learn

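Scikit-learn is one of the most popular ML libraries for classical ML algorithms. It is built on top of two basic Python libraries, NumPy and SciPy. Scikit-learn supports most supervised and unsupervised learning algorithms. It can also be used for data mining and data analysis, which makes it a good tool for getting started with ML.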

```python
# Python script using Scikit-learn
# for Decision Tree Classifier

# Sample Decision Tree Classifier
from sklearn import datasets
from sklearn import metrics
from sklearn.tree import DecisionTreeClassifier

# load the iris dataset
dataset = datasets.load_iris()

# fit a CART model to the data
model = DecisionTreeClassifier()
model.fit(dataset.data, dataset.target)
print(model)

# make predictions
expected = dataset.target
predicted = model.predict(dataset.data)

# summarize the fit of the model
print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))
```

Output:

```
DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False, random_state=None,
                       splitter='best')
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        50
           1       1.00      1.00      1.00        50
           2       1.00      1.00      1.00        50

   micro avg       1.00      1.00      1.00       150
   macro avg       1.00      1.00      1.00       150
weighted avg       1.00      1.00      1.00       150

[[50  0  0]
 [ 0 50  0]
 [ 0  0 50]]
```
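The listing above evaluates the tree on the same samples it was trained on, which is why every score is 1.00. A hedged sketch of a held-out evaluation follows; the 70/30 split and the `random_state` values are arbitrary choices made here for illustration.

```python
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the iris dataset and hold out 30% of it for testing
dataset = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target,
    test_size=0.3, random_state=0, stratify=dataset.target)

# Fit on the training portion only
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# Score on samples the model has not seen
predicted = model.predict(X_test)
accuracy = metrics.accuracy_score(y_test, predicted)
print(accuracy)
print(metrics.confusion_matrix(y_test, predicted))
```

Scores measured this way are usually a little below the perfect training scores and give a more realistic picture of how the model generalizes.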

### Theano

```python
# Python program using Theano
# for computing a Logistic
# Function

import theano
import theano.tensor as T

x = T.dmatrix('x')
s = 1 / (1 + T.exp(-x))
logistic = theano.function([x], s)
logistic([[0, 1], [-1, -2]])
```

Output:

```
array([[0.5, 0.73105858],
       [0.26894142, 0.11920292]])
```

### TensorFlow

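TensorFlow is a very popular open-source library for high-performance numerical computation, developed by the Google Brain team at Google. As the name suggests, TensorFlow is a framework for defining and running computations that involve tensors. It can train and run deep neural networks, which can be used to develop a range of AI applications. TensorFlow is widely used in deep learning research and applications.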

```python
# Python program using TensorFlow
# for multiplying two arrays
# Note: tf.Session is the TensorFlow 1.x API; TensorFlow 2.x exposes it
# as tf.compat.v1.Session.

# import `tensorflow`
import tensorflow as tf

# Initialize two constants
x1 = tf.constant([1, 2, 3, 4])
x2 = tf.constant([5, 6, 7, 8])

# Multiply
result = tf.multiply(x1, x2)

# Initialize the Session
sess = tf.Session()

# Print the result
print(sess.run(result))

# Close the session
sess.close()
```

Output:

`[ 5 12 21 32]`
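The session-based listing above follows the TensorFlow 1.x style. In TensorFlow 2.x, eager execution is the default, so the same multiplication can be sketched without a session, assuming a 2.x installation:

```python
import tensorflow as tf

# Initialize two constants
x1 = tf.constant([1, 2, 3, 4])
x2 = tf.constant([5, 6, 7, 8])

# Multiply element-wise; with eager execution the result is available immediately
result = tf.multiply(x1, x2)

# Convert the tensor to a NumPy array for printing
values = result.numpy()
print(values)
```

This yields the same four products as the session-based version.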

### Keras

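Keras is a very popular Python machine learning library. It is a high-level neural network API that can run on top of TensorFlow, CNTK, or Theano. It runs seamlessly on both CPU and GPU. Keras makes it practical for ML beginners to build and design neural networks. One of its best features is that it allows easy and fast prototyping.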
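To give a feel for that prototyping style, here is a minimal sketch of a small classifier. It assumes the Keras that ships with TensorFlow 2.x (`from tensorflow import keras`), and the random data, layer sizes, and training settings are placeholders chosen for illustration only.

```python
import numpy as np
from tensorflow import keras

# Random stand-in data: 100 samples, 4 features, 3 classes (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4)).astype("float32")
y = rng.integers(0, 3, size=100)

# Stack layers into a model: one hidden layer, then a softmax output
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(3, activation="softmax"),
])

# Choose the optimizer, loss, and metric, then train briefly
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=2, batch_size=16, verbose=0)

# Predict class probabilities for the first five samples
probs = model.predict(X[:5], verbose=0)
print(probs.shape)
```

Defining, compiling, and fitting the model takes only a few calls, which is what makes quick experiments cheap.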

### PyTorch

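PyTorch is a popular open-source machine learning library for Python. It is based on Torch, an open-source machine learning library implemented in C with a wrapper in Lua. PyTorch offers a wide selection of tools and libraries that support computer vision, natural language processing (NLP), and many other ML programs. It lets developers perform computations on tensors with GPU acceleration and also helps in creating computational graphs.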

```python
# Python program using PyTorch
# for defining tensors, fitting a
# two-layer network to random
# data, and calculating the loss

import torch

dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0")  # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Randomly initialize weights
w1 = torch.randn(D_in, H, device=device, dtype=dtype)
w2 = torch.randn(H, D_out, device=device, dtype=dtype)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum().item()
    print(t, loss)

    # Backprop to compute gradients of w1 and w2 with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    # Update weights using gradient descent
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
```

Output:

```0 47168344.0
1 46385584.0
2 43153576.0
...
...
...
497 3.987660602433607e-05
498 3.945609932998195e-05
499 3.897604619851336e-05```
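The listing above derives every gradient by hand. PyTorch can also record the computational graph and compute gradients automatically through autograd. A small sketch on a toy linear model follows; the data shapes, seed, and learning rate are illustrative choices, not taken from the original example.

```python
import torch

torch.manual_seed(0)

# Random data for a toy linear model: 64 samples, 10 features, 1 target
x = torch.randn(64, 10)
y = torch.randn(64, 1)

# requires_grad=True asks autograd to track operations on the weights
w = torch.randn(10, 1, requires_grad=True)

learning_rate = 1e-2
losses = []
for t in range(100):
    # Forward pass builds the computational graph
    loss = (x.mm(w) - y).pow(2).mean()
    losses.append(loss.item())

    # Backward pass fills w.grad with d(loss)/d(w)
    loss.backward()

    # Update the weights outside the graph, then reset the gradient
    with torch.no_grad():
        w -= learning_rate * w.grad
        w.grad.zero_()

print(losses[0], losses[-1])
```

The `torch.no_grad()` block keeps the weight update itself out of the recorded graph, and zeroing the gradient prevents it from accumulating across iterations.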

### Pandas

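Pandas is a popular Python library for data analysis. It is not directly related to machine learning, but a dataset must be prepared before training, and pandas is very handy for that because it was developed specifically for data extraction and preparation. It provides high-level data structures and a wide variety of data analysis tools, including many built-in methods for grouping, combining, and filtering data.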

```python
# Python program using Pandas for
# arranging a given set of data
# into a table

# importing pandas as pd
import pandas as pd

data = {"country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.10, 3.286, 9.597, 1.221],
        "population": [200.4, 143.5, 1252, 1357, 52.98]}

data_table = pd.DataFrame(data)
print(data_table)
```

Output: a table with one row per country and the columns `country`, `capital`, `area`, and `population`.
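The filtering, combining, and grouping methods mentioned above can be sketched on the same table. The `currencies` table, the area threshold of 5, and the `size` labels are hypothetical additions made for illustration.

```python
import pandas as pd

data = {"country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.10, 3.286, 9.597, 1.221],
        "population": [200.4, 143.5, 1252, 1357, 52.98]}
data_table = pd.DataFrame(data)

# Filtering: keep rows whose area exceeds 5 (same units as the table)
large = data_table[data_table["area"] > 5]
print(large["country"].tolist())

# Combining: merge with a second, hypothetical table keyed on country
currencies = pd.DataFrame({
    "country": ["Brazil", "Russia", "India", "China", "South Africa"],
    "currency": ["Real", "Ruble", "Rupee", "Yuan", "Rand"],
})
merged = data_table.merge(currencies, on="country")
print(merged.shape)

# Grouping: total population of large versus small countries
size = data_table["area"].gt(5).map({True: "large", False: "small"})
pop_by_size = data_table.groupby(size)["population"].sum()
print(pop_by_size)
```

Each step returns a new object and leaves `data_table` unchanged, so the steps can be chained or reordered freely.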

### Matplotlib

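Matplotlib is a very popular Python library for data visualization. Like pandas, it is not directly related to machine learning. It is particularly useful when a programmer wants to visualize patterns in the data. It is a 2D plotting library for creating 2D graphs and plots. A module named pyplot makes plotting easier for programmers by providing features to control line styles, font properties, axis formatting, and so on. It offers many kinds of graphs and charts for data visualization, such as histograms, error charts, and bar charts.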

```python
# Python program using Matplotlib
# for forming a linear plot

# importing the necessary packages and modules
import matplotlib.pyplot as plt
import numpy as np

# Prepare the data
x = np.linspace(0, 10, 100)

# Plot the data
plt.plot(x, x, label='linear')

# Add a legend
plt.legend()

# Show the plot
plt.show()
```

Output: a plot of a single straight line labeled `linear`.
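The other chart types mentioned above (histograms, bar charts, and error charts) can be sketched in one figure. The data, colors, and output file name `chart_types.png` are placeholders chosen here, and the non-interactive `Agg` backend is an assumption so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; figures are written to a file
import matplotlib.pyplot as plt
import numpy as np

# Illustrative data
rng = np.random.default_rng(0)
values = rng.normal(size=500)

fig, (ax_hist, ax_bar, ax_err) = plt.subplots(1, 3, figsize=(12, 3.5))

# Histogram of the random values
ax_hist.hist(values, bins=20, color="steelblue")
ax_hist.set_title("Histogram")

# Bar chart of three categories
categories = ["A", "B", "C"]
heights = [3, 7, 5]
bars = ax_bar.bar(categories, heights, color="darkorange")
ax_bar.set_title("Bar chart")

# Error chart: points joined by a dashed line, with a fixed error of 1.5
x = np.arange(5)
ax_err.errorbar(x, x ** 2, yerr=1.5, fmt="o--", capsize=3)
ax_err.set_title("Error bars")

fig.tight_layout()
fig.savefig("chart_types.png")
```

The `fmt="o--"` string is an example of the line-style control that pyplot exposes: circle markers joined by a dashed line.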

