Minimalism | Fast Image-Classification Transfer Learning and Prediction with Turicreate (Part 6)

悟乙己

Published 2019-05-26 19:55:33

Included in the column: 素质云笔记

Contents

0 GPU Usage
1 Preparing the Training Set
2 Training
3 Prediction
4 Model Evaluation

0 GPU Usage

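Turicreate uses the MXNet framework as its backend. It is not well suited to GPU image training: current MXNet releases are at 1.4.0+ with CUDA 10 support, while Turicreate still targets the much older mxnet 1.1.0, and this version mismatch causes many problems. A more workable approach is to run it from the official Docker image. Before enabling the GPU, read the official notes (linuxGPU.md):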

Turi Create does not require a GPU, but certain models can be accelerated by the use of a GPU. To enable GPU support in linux after installation of the turicreate package, please perform the following steps:

Install CUDA 8.0 (instructions)
Install cuDNN 5 for CUDA 8.0 (instructions)
Make sure to add the CUDA library path to your LD_LIBRARY_PATH environment variable. In the typical case, this means adding the following line to your ~/.bashrc file:

export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
If you installed the cuDNN files into a separate directory, make sure to separately add it as well. Next step is to uninstall mxnet and install the CUDA-enabled mxnet-cu80 package:

(venv) pip uninstall -y mxnet
(venv) pip install mxnet-cu80==1.1.0
Make sure you install the same version of MXNet as the one turicreate recommends (currently 1.1.0). If you have trouble setting up the GPU, the MXNet installation instructions may offer additional help.

In practice this does raise an error:

Downloading https://docs-assets.developer.apple.com/turicreate/models/resnet-50-symbol.json
Download completed: /var/tmp/model_cache/resnet-50-symbol.json
[13:44:53] src/nnvm/legacy_json_util.cc:190: Loading symbol saved by previous version v0.8.0. Attempting to upgrade...
[13:44:53] src/nnvm/legacy_json_util.cc:198: Symbol successfully upgraded!

ERROR: Incomplete installation for leveraging GPUs for computations.
Please make sure you have CUDA installed and run the following line in
your terminal and try again:

    pip uninstall -y mxnet && pip install mxnet-cu90==1.1.0

Adjust 'cu90' depending on your CUDA version ('cu75' and 'cu80' are also available).
You can also disable GPU usage altogether by invoking turicreate.config.set_num_gpus(0)
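
The error message above leaves two ways out: install the CUDA build of mxnet that matches the local CUDA version, or disable GPU use. As a minimal sketch using only the standard library, the check below tests whether a CUDA library directory is listed in LD_LIBRARY_PATH before deciding. The helper name `cuda_on_library_path` is hypothetical; the default directory `/usr/local/cuda/lib64` is taken from the quoted instructions and may differ on other machines.

```python
import os


def cuda_on_library_path(env=None, cuda_dir="/usr/local/cuda/lib64"):
    """Return True if cuda_dir appears as an entry in LD_LIBRARY_PATH."""
    env = os.environ if env is None else env
    entries = env.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    # Normalize trailing slashes so "/usr/local/cuda/lib64/" also matches.
    return cuda_dir.rstrip("/") in {e.rstrip("/") for e in entries if e}


if __name__ == "__main__":
    if cuda_on_library_path():
        print("CUDA library path found; a CUDA build of mxnet may work.")
    else:
        # Fall back to CPU, as the error message suggests:
        #   turicreate.config.set_num_gpus(0)
        print("CUDA library path not found; consider disabling GPU use.")
```

This only confirms that the path is configured, not that the installed CUDA, cuDNN and mxnet versions are compatible with each other.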

1 Preparing the Training Set

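Simply put the images of each class into their own folder, one folder per class (the code in this post uses paths such as train/pos/p89.jpg).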
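
With one folder per class, the label of an image can be read from the name of its parent folder. The sketch below uses only the standard library to list images and their labels; the function names `label_from_path` and `list_labeled_images` and the set of file extensions are illustrative assumptions, not part of Turicreate.

```python
from pathlib import Path


def label_from_path(path):
    """Use the name of the image's parent folder as its class label."""
    return Path(path).parent.name


def list_labeled_images(root, extensions=(".jpg", ".jpeg", ".png")):
    """Return sorted (path, label) pairs for every image under root/<label>/."""
    root = Path(root)
    pairs = []
    for file in sorted(root.glob("*/*")):
        # Skip sub-folders and non-image files such as notes or archives.
        if file.is_file() and file.suffix.lower() in extensions:
            pairs.append((str(file), label_from_path(file)))
    return pairs


print(label_from_path("train/pos/p89.jpg"))
```

Reading the parent folder name is a little stricter than testing whether "pos" occurs anywhere in the path, which would also match an unrelated directory whose name happens to contain "pos".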

2 Training

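import turicreate as tc

# Training folder, with images stored in one sub-folder per class.
# The value "train" is an assumption based on the paths used later in this post.
train_file = "train"

# Load images from the downloaded data
reference_data = tc.image_analysis.load_images(train_file)
reference_data = reference_data.add_row_number()

# Label each image from its path: "pos" if the path contains "pos", otherwise "neg"
reference_data["y"] = reference_data["path"].apply(lambda path: "pos" if "pos" in path else "neg")

dataBuffer = reference_data
trainingBuffers, testingBuffers = dataBuffer.random_split(0.9)

# Train the model
model = tc.image_classifier.create(trainingBuffers, target="y", model="resnet-50")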

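Here, tc.image_analysis.load_images can read an entire folder or a single local image. dataBuffer.random_split(0.9) randomly splits the dataset in a roughly 9:1 ratio. image_classifier.create builds the model: target selects the dependent variable (the label column), and model currently accepts the following options: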

Uses a pretrained model to bootstrap an image classifier:
- "resnet-50" : Uses a pretrained resnet model. Exported Core ML model will be ~90M.
- "squeezenet_v1.1" : Uses a pretrained squeezenet model. Exported Core ML model will be ~4.7M.
- "VisionFeaturePrint_Scene": Uses an OS internal feature extractor. Only available on iOS 12.0+, macOS 10.14+ and tvOS 12.0+. Exported Core ML model will be ~41K.

The signature of the create function (image_classifier.py):

def create(dataset, target, feature=None, model = 'resnet-50',
    l2_penalty=0.01, 
    l1_penalty=0.0,
    solver='auto', feature_rescaling=True,
    convergence_threshold = _DEFAULT_SOLVER_OPTIONS['convergence_threshold'],
    step_size = _DEFAULT_SOLVER_OPTIONS['step_size'],
    lbfgs_memory_level = _DEFAULT_SOLVER_OPTIONS['lbfgs_memory_level'],
    max_iterations = _DEFAULT_SOLVER_OPTIONS['max_iterations'],
    class_weights = None,
    validation_set = 'auto',
    verbose=True,
    seed=None,
    batch_size=64):

3 Prediction

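import turicreate as tc

# Option 1: read an image from a URL
img = tc.Image('http://img5.cache.netease.com/house/2014/1/7/2014010711263691ea9_550.jpg')

# Option 2: read an image from a local file
img = tc.Image('train/pos/p89.jpg')

# Option 3: load a local file as an SFrame
image_data = tc.image_analysis.load_images('train/pos/p89.jpg')

# Use the model trained above, or one restored with tc.load_model(...)
# after it was stored with model.save(...)
loaded_model = model

# Predict
predictions = loaded_model.predict(image_data, output_type='class', batch_size=64)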

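The output_type argument controls what predict returns, for example probability (the probability of the positive class) or class (the class name). As far as I can tell, the rank option described below belongs to predict_topk, which ranks the labels for each prediction, rather than to predict: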

# predictions
    # output_type：{'probability', 'margin', 'class', 'probability_vector'}
    # - `probability`: Probability associated with each label in the prediction.
    # - `rank`       : Rank associated with each label in the prediction.
    # - `margin`     : Margin associated with each label in the prediction.
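
One way to read these output types: starting from a vector of per-class probabilities, the predicted class is the most probable label, and a ranking is the same labels sorted by probability. The sketch below illustrates that relationship in plain Python. It is not Turicreate code; the function name `summarize_prediction`, the dictionary keys and the choice to start ranks at 0 are all assumptions made for illustration.

```python
def summarize_prediction(labels, probabilities, k=2):
    """Derive a class label and a top-k ranking from per-class probabilities."""
    if not labels or len(labels) != len(probabilities):
        raise ValueError("labels and probabilities must be non-empty and equal in length")
    # Sort labels from most to least probable.
    ranked = sorted(zip(labels, probabilities), key=lambda pair: pair[1], reverse=True)
    return {
        "class": ranked[0][0],
        "probability_vector": list(probabilities),
        "top_k": [
            {"class": label, "probability": prob, "rank": rank}
            for rank, (label, prob) in enumerate(ranked[:k])
        ],
    }


result = summarize_prediction(["neg", "pos"], [0.2, 0.8])
print(result["class"])
print(result["top_k"])
```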

4 Model Evaluation

# Evaluate the model and print the results
metrics = model.evaluate(testingBuffers)
print(metrics['accuracy'])

In the most recent version at the time of writing, evaluate is broken and raises an error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-11-77aa635d24e6> in <module>
      1 # Evaluate the model and print the results
----> 2 metrics = model.evaluate(testingBuffers[:10])
      3 print(metrics['accuracy'])

/usr/local/lib/python3.6/dist-packages/turicreate/toolkits/image_classifier/image_classifier.py in evaluate(self, dataset, metric, verbose, batch_size)
    798         vectors = map(lambda l: {'name': l, 'pos':list(sf_conf_mat[sf_conf_mat['target_label']==l].sort('predicted_label')['norm_prob'])},
    799                     labels)
--> 800         evaluation_result['sorted_labels'] = hclusterSort(vectors, l2Dist)[0]['name'].split("|")
    801 
    802         # Get recall and precision per label

/usr/local/lib/python3.6/dist-packages/turicreate/toolkits/image_classifier/image_classifier.py in hclusterSort(vectors, dist_fn)
    752                     distances.append({'from': v, 'to': new_vec, 'dist': total/len(v.get('members', [v]))/len(new_vec['members'])})
    753 
--> 754                 vecs.append(new_vec)
    755                 distances = sorted(distances, key=lambda d: d['dist'])
    756 

AttributeError: 'filter' object has no attribute 'append'
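
The traceback points to a Python 2 versus Python 3 difference: in Python 3, filter returns a lazy iterator instead of a list, so calling append on its result fails. The sketch below reproduces the same error outside Turicreate and shows the usual fix of wrapping the call in list(). The sample data and variable names are made up for illustration and are not the library's code.

```python
vectors = [{"name": "pos"}, {"name": "neg"}, {"name": "other"}]

# Python 3: filter() returns a lazy iterator, not a list.
vecs = filter(lambda v: v["name"] != "other", vectors)
try:
    vecs.append({"name": "pos|neg"})
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
print(error_message)

# Wrapping the call in list() restores the list behaviour of Python 2.
vecs = list(filter(lambda v: v["name"] != "other", vectors))
vecs.append({"name": "pos|neg"})
print([v["name"] for v in vecs])
```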

In that case, you can compute the metrics yourself with sklearn:

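from sklearn.metrics import classification_report, accuracy_score, recall_score, f1_score

# Predict on the held-out split and keep the predicted class next to the true label
test_data = testingBuffers
test_data['pre_y'] = model.predict(test_data, output_type='class')
y_true, y_pred = list(test_data['y']), list(test_data['pre_y'])

# String labels need an explicit positive class for the binary F1 score
print(f1_score(y_true, y_pred, pos_label='pos'))
print(accuracy_score(y_true, y_pred))
print(recall_score(y_true, y_pred, average='micro'))
print(classification_report(y_true, y_pred))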
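
For reference, the binary metrics above can also be written out by hand, which makes it clear what each number means. This is a minimal sketch in plain Python, assuming two classes with "pos" as the positive label; the function name `binary_metrics` is hypothetical and the sample labels are invented for illustration.

```python
def binary_metrics(y_true, y_pred, pos_label="pos"):
    """Compute accuracy, precision, recall and F1 for one positive label."""
    pairs = list(zip(y_true, y_pred))
    if not pairs:
        raise ValueError("y_true and y_pred must not be empty")
    tp = sum(1 for t, p in pairs if t == pos_label and p == pos_label)
    fp = sum(1 for t, p in pairs if t != pos_label and p == pos_label)
    fn = sum(1 for t, p in pairs if t == pos_label and p != pos_label)
    correct = sum(1 for t, p in pairs if t == p)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


scores = binary_metrics(["pos", "pos", "neg", "neg"], ["pos", "neg", "neg", "neg"])
print(scores)
```

In this made-up sample, one of the two positive images is missed, so recall is lower than precision even though three of the four predictions are correct.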

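This article is part of the Tencent Cloud self-media syndication program and is shared from the author's personal site/blog.

Originally published: 2019-05-22. In case of infringement, contact cloudcommunity@tencent.com for removal.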
