Tech | Three Simple Hands-On Exercises with Python Libraries for Machine Learning: Your Images, Created by You

Translator | 婉清

Editor | 姗姗

Produced by | 人工智能头条

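[Introduction] This article introduces some excellent and fun Python libraries for machine learning and deep learning, along with hands-on code tutorials for each. Where theory or academic content is involved, the relevant papers and blog posts are linked for reference.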

01 sg2im: Generating Images from Scene Graphs

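This excellent open-source code uses graph convolution to process the input graph, computes a scene layout by predicting bounding boxes and segmentation masks for the objects, and converts the layout to an image with a cascaded refinement network.

The code implements an end-to-end neural network whose input is a scene graph and whose output is an image. A scene graph is a structured representation of a visual scene, in which nodes represent the objects in the scene and edges represent the relationships between objects.

The input scene graph is processed with a graph convolution network, which passes information along edges to compute embedding vectors for all objects. These vectors are used to predict a bounding box and a segmentation mask for each object, and these are combined to form a coarse scene layout. The layout is passed to a cascaded refinement network, which generates the output image at increasing spatial scales. The model is trained adversarially against a pair of discriminator networks to ensure that the output images look realistic.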
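To make "passing information along edges" concrete, here is a toy sketch with no learned parameters. Each relationship lets its two endpoint objects exchange vectors, and each object's new vector is the average of its own vector and everything it received. This is only an illustration under simplifying assumptions: sg2im's actual graph convolution uses learned networks and also updates predicate vectors, and the name `propagate_once` is hypothetical.

```python
def propagate_once(vectors, relationships):
    """One round of mean-pooled message passing over scene-graph edges.

    vectors: list of equal-length lists of floats, one per object.
    relationships: list of [subject_index, predicate, object_index].
    """
    # Every object starts with its own vector in its inbox.
    inbox = [[vec] for vec in vectors]
    for subj, _predicate, obj in relationships:
        inbox[subj].append(vectors[obj])  # message from object to subject
        inbox[obj].append(vectors[subj])  # message from subject to object
    # Average everything each object received, component by component.
    return [
        [sum(component) / len(messages) for component in zip(*messages)]
        for messages in inbox
    ]


# Objects, in order: sky, grass, zebra (made-up 2-D vectors).
relationships = [[0, "above", 1], [2, "standing on", 1]]
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
updated = propagate_once(vectors, relationships)
print(updated)
```

Because "grass" takes part in both relationships, its updated vector mixes information from both "sky" and "zebra", which is the intuition behind computing object embeddings from the whole graph.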

Paper: https://arxiv.org/abs/1804.01622

GitHub: https://github.com/google/sg2im

For the cascaded refinement network, see: Photographic Image Synthesis with Cascaded Refinement Networks, https://arxiv.org/abs/1707.09405

▌How do you run and test the code?

First, clone the repository:

git clone https://github.com/google/sg2im.git

The original code was developed and tested on Ubuntu 16.04 with Python 3.5 and PyTorch 0.4. Running it inside a virtual environment is recommended; you can set one up as follows:

python3 -m venv env               # Create a virtual environment
source env/bin/activate           # Activate virtual environment
pip install -r requirements.txt   # Install dependencies
echo $PWD > env/lib/python3.5/site-packages/sg2im.pth  # Add current directory to python path
# Work for a while ...
deactivate  # Exit virtual environment
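The `echo` line hard-codes `python3.5` in the path, which will not match if the virtual environment uses a different Python version. One way around this, sketched here with only the standard library, is to ask the interpreter for its own site-packages directory. The helper name `write_pth` is an assumption for illustration and is not part of sg2im.

```python
import sysconfig
from pathlib import Path


def write_pth(project_dir, site_packages=None, name="sg2im.pth"):
    """Write a .pth file so that project_dir is added to sys.path.

    site_packages defaults to the running interpreter's own
    site-packages directory (the virtual environment's, if one is active).
    """
    if site_packages is None:
        site_packages = sysconfig.get_paths()["purelib"]
    pth_file = Path(site_packages) / name
    # A .pth file simply lists one directory per line.
    pth_file.write_text(str(Path(project_dir).resolve()) + "\n")
    return pth_file
```

With the virtual environment active, calling `write_pth(".")` from the repository root should have the same effect as the `echo` line, whatever the Python version.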

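Note: python-venv needs to be installed (on Ubuntu the package is typically named python3-venv). The following variant, which adds --without-pip and uses a Python 3.6 path, can serve as a reference: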

python3 -m venv --without-pip env # Added the --without-pip
source env/bin/activate           # Activate virtual environment
pip install -r requirements.txt   # Install dependencies
echo $PWD > env/lib/python3.6/site-packages/sg2im.pth  # Add current directory to python path
# Work for a while ...
deactivate  # Exit virtual environment

You also need to delete the line pkg-resources==0.0.0 from requirements.txt, otherwise the installation fails. For why this line appears and why it should be removed, see the link below.
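If you would rather not edit the file by hand, the removal can be scripted. The sketch below filters out any `pkg-resources` pin from the text of a requirements file. The function name and the sample package versions are made up for illustration.

```python
def drop_pkg_resources(requirements_text):
    """Return the requirements text without any pkg-resources pin."""
    kept = [
        line
        for line in requirements_text.splitlines()
        # Treat pkg_resources and pkg-resources as the same name.
        if not line.strip().lower().replace("_", "-").startswith("pkg-resources")
    ]
    return "\n".join(kept) + "\n"


# Illustrative content only; the real requirements.txt lists other pins.
example = "numpy==1.14.3\npkg-resources==0.0.0\ntorch==0.4.0\n"
cleaned = drop_pkg_resources(example)
print(cleaned)
```

To apply it to the real file, read `requirements.txt`, pass its contents through `drop_pkg_resources`, and write the result back before running `pip install -r requirements.txt`.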

Reference: https://stackoverflow.com/questions/39577984/what-is-pkg-resources-0-0-0-in-output-of-pip-freeze-command/39638060

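Next, run the pretrained models.

First run the script bash scripts/download_models.sh to download them; this requires about 355 MB of disk space. The following models are downloaded:

  • sg2im-models/coco64.pt: trained on the COCO-Stuff dataset to generate 64x64 images.
  • sg2im-models/vg64.pt: trained on the Visual Genome dataset to generate 64x64 images.
  • sg2im-models/vg128.pt: trained on the Visual Genome dataset to generate 128x128 images.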

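Reference paper: Image Generation from Scene Graphs, https://arxiv.org/pdf/1804.01622.pdf

The script scripts/run_model.py makes it easy to run any of the pretrained models on new scene graphs, which are written in a simple, human-readable JSON format. For example, to recreate the sheep images, run: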

python scripts/run_model.py \
  --checkpoint sg2im-models/vg128.pt \
  --scene_graphs scene_graphs/figure_6_sheep.json \
  --output_dir outputs
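If you want to compare checkpoints on the same scene graphs, the command above can also be assembled in Python. The sketch below only builds the argument lists, using the three flags shown above. The helper name `build_command` and the output directory names are assumptions, and it is restricted to the two Visual Genome checkpoints because the scene graph file uses that model family in the original command.

```python
import sys


def build_command(checkpoint, scene_graphs, output_dir):
    """Build the argument list for scripts/run_model.py (flags as used above)."""
    return [
        sys.executable, "scripts/run_model.py",
        "--checkpoint", "sg2im-models/" + checkpoint,
        "--scene_graphs", scene_graphs,
        "--output_dir", output_dir,
    ]


# One command per Visual Genome checkpoint; directory names are made up.
commands = [
    build_command(name, "scene_graphs/figure_6_sheep.json", "outputs_" + name[:-3])
    for name in ("vg64.pt", "vg128.pt")
]
for command in commands:
    print(" ".join(command))
    # To actually run it from the repository root:
    # import subprocess; subprocess.run(command, check=True)
```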

Running this command writes the generated images to the outputs directory.

Next, let's look at the scene graph JSON:

[
  {
    "objects": ["sky", "grass", "zebra"],
    "relationships": [
      [0, "above", 1],
      [2, "standing on", 1]
    ]
  },
  {
    "objects": ["sky", "grass", "sheep"],
    "relationships": [
      [0, "above", 1],
      [2, "standing on", 1]
    ]
  },
  {
    "objects": ["sky", "grass", "sheep", "sheep"],
    "relationships": [
      [0, "above", 1],
      [2, "standing on", 1],
      [3, "by", 2]
    ]
  },
  {
    "objects": ["sky", "grass", "sheep", "sheep", "tree"],
    "relationships": [
      [0, "above", 1],
      [2, "standing on", 1],
      [3, "by", 2],
      [4, "behind", 2]
    ]
  },
  {
    "objects": ["sky", "grass", "sheep", "sheep", "tree", "ocean"],
    "relationships": [
      [0, "above", 1],
      [2, "standing on", 1],
      [3, "by", 2],
      [4, "behind", 2],
      [5, "by", 4]
    ]
  },
  {
    "objects": ["sky", "grass", "sheep", "sheep", "tree", "ocean", "boat"],
    "relationships": [
      [0, "above", 1],
      [2, "standing on", 1],
      [3, "by", 2],
      [4, "behind", 2],
      [5, "by", 4],
      [6, "in", 5]
    ]
  },
  {
    "objects": ["sky", "grass", "sheep", "sheep", "tree", "ocean", "boat"],
    "relationships": [
      [0, "above", 1],
      [2, "standing on", 1],
      [3, "by", 2],
      [4, "behind", 2],
      [5, "by", 4],
      [6, "on", 1]
    ]
  }
]

Let's start with the first scene graph:

{
    "objects": ["sky", "grass", "zebra"],
    "relationships": [
      [0, "above", 1],
      [2, "standing on", 1]
    ]
  }

Objects: sky [0], grass [1], zebra [2]

Relationships: sky [0] is above grass [1] ("above")

zebra [2] is standing on grass [1] ("standing on")
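The same reading can be done mechanically. In this short sketch each relationship is a triple of subject index, predicate, and object index, and the indices point into the `objects` list. The function name `describe` is hypothetical.

```python
import json


def describe(scene_graph):
    """Turn one scene graph into readable 'subject predicate object' strings."""
    objects = scene_graph["objects"]
    return [
        "{} [{}] {} {} [{}]".format(objects[s], s, predicate, objects[o], o)
        for s, predicate, o in scene_graph["relationships"]
    ]


first_graph = json.loads("""
{
  "objects": ["sky", "grass", "zebra"],
  "relationships": [[0, "above", 1], [2, "standing on", 1]]
}
""")
sentences = describe(first_graph)
for sentence in sentences:
    print(sentence)
```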

You can also write a similar scene graph of your own to try the model out:

[{
    "objects": ["sky", "grass", "dog", "cat", "tree", "ocean", "boat"],
    "relationships": [
      [0, "above", 1],
      [2, "standing on", 1],
      [3, "by", 2],
      [4, "behind", 2],
      [5, "by", 4],
      [6, "on", 1]
    ]
  }]
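When writing scene graphs by hand, it is easy to mistype an index. The sketch below checks that every relationship points at an existing object before serializing the graphs to JSON. It is an illustrative helper, not part of sg2im, and it checks indices only: it does not verify that names such as "dog" are in a given checkpoint's vocabulary.

```python
import json


def validate_scene_graphs(scene_graphs):
    """Raise ValueError if any relationship points at a missing object."""
    for g, graph in enumerate(scene_graphs):
        count = len(graph["objects"])
        for subj, predicate, obj in graph["relationships"]:
            for index in (subj, obj):
                if not 0 <= index < count:
                    raise ValueError(
                        "graph {}: index {} in [{}, {!r}, {}] is out of range".format(
                            g, index, subj, predicate, obj))
    return scene_graphs


new_graphs = [{
    "objects": ["sky", "grass", "dog", "cat", "tree", "ocean", "boat"],
    "relationships": [
        [0, "above", 1], [2, "standing on", 1], [3, "by", 2],
        [4, "behind", 2], [5, "by", 4], [6, "on", 1],
    ],
}]
text = json.dumps(validate_scene_graphs(new_graphs), indent=2)
print(text)
```

The resulting `text` can then be written to a file under scene_graphs/ and passed to run_model.py through `--scene_graphs`.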

Save it as scene_graphs/figure_blog.json (the file name used in the command below) and run:

python scripts/run_model.py \
  --checkpoint sg2im-models/vg128.pt \
  --scene_graphs scene_graphs/figure_blog.json \
  --output_dir outputs

The resulting image looks a little strange, but the process is a lot of fun.

02 TheAlgorithms/Python: All Algorithms Implemented in Python

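Programming is an essential skill in data science, and this great knowledge repository offers implementations of many important algorithms. They are intended for demonstration only: for performance reasons, the Python standard library contains better implementations of many of them.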
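As a small illustration of that point (written for this article, not taken from the repository), here is a textbook insertion sort next to Python's built-in `sorted`. Both give the same answer, but the built-in is the one to use in real code.

```python
def insertion_sort(values):
    """Return a new sorted list using a textbook insertion sort, O(n^2)."""
    result = list(values)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Shift larger elements one slot to the right.
        while j >= 0 and result[j] > current:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result


data = [5, 2, 9, 1, 5, 6]
# The demonstration version and the standard library agree on the result.
assert insertion_sort(data) == sorted(data)
print(insertion_sort(data))
```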

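In this repository you can find machine learning code, neural networks, dynamic programming, sorting, hashing, and more. The code below shows how to build K-means from scratch in Python with NumPy (it also uses scikit-learn's pairwise_distances for the distance computation).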

'''README, Author - Anurag Kumar(mailto:anuragkumarak95@gmail.com)
Requirements:
  - sklearn
  - numpy
  - matplotlib
Python:
  - 3.5
Inputs:
  - X , a 2D numpy array of features.
  - k , number of clusters to create.
  - initial_centroids , initial centroid values generated by utility function(mentioned in usage).
  - maxiter , maximum number of iterations to process.
  - heterogeneity , empty list that will be filled with heterogeneity values if passed to kmeans func.
Usage:
  1. define 'k' value, 'X' features array and 'heterogeneity' empty list

  2. create initial_centroids,
        initial_centroids = get_initial_centroids(
            X, 
            k, 
            seed=0 # seed value for initial centroid generation, None for randomness(default=None)
            )
  3. find centroids and clusters using kmeans function.

        centroids, cluster_assignment = kmeans(
            X, 
            k, 
            initial_centroids, 
            maxiter=400,
            record_heterogeneity=heterogeneity, 
            verbose=True # whether to print logs in console or not.(default=False)
            )


  4. Plot the loss function, heterogeneity values for every iteration saved in heterogeneity list.
        plot_heterogeneity(
            heterogeneity, 
            k
        )

  5. Have fun..

'''
from __future__ import print_function
from sklearn.metrics import pairwise_distances
import numpy as np

TAG = 'K-MEANS-CLUST/ '

def get_initial_centroids(data, k, seed=None):
    '''Randomly choose k data points as initial centroids'''
    if seed is not None: # useful for obtaining consistent results
        np.random.seed(seed)
    n = data.shape[0] # number of data points

    # Pick K indices from range [0, N).
    rand_indices = np.random.randint(0, n, k)

    # Keep centroids as dense format, as many entries will be nonzero due to averaging.
    # As long as at least one document in a cluster contains a word,
    # it will carry a nonzero weight in the TF-IDF vector of the centroid.
    centroids = data[rand_indices,:]

    return centroids

def centroid_pairwise_dist(X,centroids):
    return pairwise_distances(X,centroids,metric='euclidean')

def assign_clusters(data, centroids):

    # Compute distances between each data point and the set of centroids.
    distances_from_centroids = centroid_pairwise_dist(data,centroids)

    # Assign each data point to its nearest centroid.
    cluster_assignment = np.argmin(distances_from_centroids,axis=1)

    return cluster_assignment

def revise_centroids(data, k, cluster_assignment):
    new_centroids = []
    for i in range(k):
        # Select all data points that belong to cluster i.
        member_data_points = data[cluster_assignment==i]
        # Compute the mean of the data points.
        centroid = member_data_points.mean(axis=0)
        new_centroids.append(centroid)
    new_centroids = np.array(new_centroids)

    return new_centroids

def compute_heterogeneity(data, k, centroids, cluster_assignment):

    heterogeneity = 0.0
    for i in range(k):

        # Select all data points that belong to cluster i.
        member_data_points = data[cluster_assignment==i, :]

        if member_data_points.shape[0] > 0: # check if i-th cluster is non-empty
            # Compute distances from the centroid to the data points.
            distances = pairwise_distances(member_data_points, [centroids[i]], metric='euclidean')
            squared_distances = distances**2
            heterogeneity += np.sum(squared_distances)

    return heterogeneity

from matplotlib import pyplot as plt
def plot_heterogeneity(heterogeneity, k):
    plt.figure(figsize=(7,4))
    plt.plot(heterogeneity, linewidth=4)
    plt.xlabel('# Iterations')
    plt.ylabel('Heterogeneity')
    plt.title('Heterogeneity of clustering over time, K={0:d}'.format(k))
    plt.rcParams.update({'font.size': 16})
    plt.show()

def kmeans(data, k, initial_centroids, maxiter=500, record_heterogeneity=None, verbose=False):
    '''This function runs k-means on given data and initial set of centroids.
       maxiter: maximum number of iterations to run.(default=500)
       record_heterogeneity: (optional) a list, to store the history of heterogeneity as function of iterations
                             if None, do not store the history.
       verbose: if True, print how many data points changed their cluster labels in each iteration'''
    centroids = initial_centroids[:]
    prev_cluster_assignment = None

    for itr in range(maxiter):        
        if verbose:
            print(itr, end='')

        # 1. Make cluster assignments using nearest centroids
        cluster_assignment = assign_clusters(data,centroids)

        # 2. Compute a new centroid for each of the k clusters, averaging all data points assigned to that cluster.
        centroids = revise_centroids(data,k, cluster_assignment)

        # Check for convergence: if none of the assignments changed, stop
        if prev_cluster_assignment is not None and \
          (prev_cluster_assignment==cluster_assignment).all():
            break

        # Print number of new assignments 
        if prev_cluster_assignment is not None:
            num_changed = np.sum(prev_cluster_assignment!=cluster_assignment)
            if verbose:
                print('    {0:5d} elements changed their cluster assignment.'.format(num_changed))   

        # Record heterogeneity convergence metric
        if record_heterogeneity is not None:
            score = compute_heterogeneity(data,k,centroids,cluster_assignment)
            record_heterogeneity.append(score)

        prev_cluster_assignment = cluster_assignment[:]

    return centroids, cluster_assignment

# Mock test below
if False: # change to True to run this test case.
    import sklearn.datasets as ds
    dataset = ds.load_iris()
    k = 3
    heterogeneity = []
    initial_centroids = get_initial_centroids(dataset['data'], k, seed=0)
    centroids, cluster_assignment = kmeans(dataset['data'], k, initial_centroids, maxiter=400,
                                        record_heterogeneity=heterogeneity, verbose=True)
    plot_heterogeneity(heterogeneity, k)
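One edge case is worth knowing about. If a cluster ends up with no points, `member_data_points.mean(axis=0)` in `revise_centroids` yields NaN values. A possible guard, sketched here as an assumption rather than as the repository's code, keeps the previous centroid for empty clusters. The function name and the extra `old_centroids` parameter are hypothetical.

```python
import numpy as np


def revise_centroids_safe(data, k, cluster_assignment, old_centroids):
    """Like revise_centroids, but an empty cluster keeps its old centroid."""
    new_centroids = np.array(old_centroids, dtype=float)  # np.array copies
    for i in range(k):
        members = data[cluster_assignment == i]
        if members.shape[0] > 0:
            new_centroids[i] = members.mean(axis=0)
    return new_centroids


data = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]])
assignment = np.array([0, 0, 2])  # cluster 1 has no members
old = np.array([[1.0, 1.0], [5.0, 5.0], [9.0, 9.0]])
centroids = revise_centroids_safe(data, 3, assignment, old)
print(centroids)
```

Other strategies exist, such as re-seeding an empty cluster with a random data point; keeping the old centroid is simply the smallest change to the code above.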

GitHub: https://github.com/TheAlgorithms

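03 mlens: ML-Ensemble, High-Performance Ensemble Learning

ML-Ensemble combines a Scikit-learn high-level API with a low-level computational graph framework to build memory-efficient, maximally parallelized ensemble networks in as few lines of code as possible. ML-Ensemble is thread-safe as long as the base learners are, and it can fall back on memory-mapped multiprocessing for memory-neutral, process-based concurrency. For tutorials and full documentation, visit the project website.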

Project website: http://ml-ensemble.com/

GitHub: https://github.com/flennerhag/mlens

▌Install from PyPI

ML-Ensemble is available on PyPI. Install it with:

pip install mlens

A simple example (the obligatory Iris example):

import numpy as np
from pandas import DataFrame
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris

seed = 2017
np.random.seed(seed)

data = load_iris()
idx = np.random.permutation(150)
X = data.data[idx]
y = data.target[idx]
from mlens.ensemble import SuperLearner
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# --- Build ---
# Passing a scoring function will create cv scores during fitting
# the scorer should be a simple function accepting two vectors and returning a scalar
ensemble = SuperLearner(scorer=accuracy_score, random_state=seed, verbose=2)

# Build the first layer
ensemble.add([RandomForestClassifier(random_state=seed), SVC()])

# Attach the final meta estimator
ensemble.add_meta(LogisticRegression())

# --- Use ---

# Fit ensemble
ensemble.fit(X[:75], y[:75])

# Predict
preds = ensemble.predict(X[75:])

This prints:

Fitting 2 layers
Processing layer-1             done | 00:00:00
Processing layer-2             done | 00:00:00
Fit complete                        | 00:00:00

Predicting 2 layers
Processing layer-1             done | 00:00:00
Processing layer-2             done | 00:00:00
Predict complete                    | 00:00:00

To check the performance of the estimators in each layer, use the data attribute. The attribute can be wrapped in a pandas.DataFrame.

print("Fit data:\n%r" % ensemble.data)

Output:

Fit data:
                                   score-m  score-s  ft-m  ft-s  pt-m  pt-s
layer-1  randomforestclassifier       0.84     0.06  0.05  0.00  0.00  0.00
layer-1  svc                          0.89     0.05  0.01  0.01  0.00  0.00

The results look good. Now look at the overall performance of the ensemble:

Prediction score: 0.960
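The example does not show how this score was produced. Presumably it compares `preds` with the held-out labels `y[75:]` using the `accuracy_score` function imported earlier, but that is an assumption. Accuracy itself is just the fraction of matching labels, as this standard-library sketch with made-up labels shows. With 75 held-out samples, a score of 0.960 corresponds to 72 correct predictions.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


# Illustrative labels only: 75 samples, 3 of them predicted incorrectly.
y_true = [0] * 25 + [1] * 25 + [2] * 25
y_pred = list(y_true)
y_pred[30], y_pred[40], y_pred[60] = 2, 2, 1
print("Prediction score: %.3f" % accuracy(y_true, y_pred))
```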

A more detailed tutorial for this part is available here:

http://ml-ensemble.com/info/tutorials/start.html

Original article:

https://towardsdatascience.com/weekly-python-digest-for-data-science-1st-week-july-83bbf0355c36

pandas.DataFrame reference:

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html#pandas.DataFrame

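Generating pictures from scene graphs is a fun process in which you can let your imagination run free and define images of your own. And even when complete from-scratch code is available for reference, it still pays to study it carefully in order to get more out of it. 人工智能头条 will keep recommending more useful, practical hands-on tutorials.

*This article was compiled and translated by 人工智能头条. To reprint, please contact the editor (WeChat: 1092722531).

Originally published on the WeChat public account 人工智能头条 (AI_Thinker)

Original publication date: 2018-07-13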
