Original translation | Live from Apple's keynote: does the machine learning feature Create ML really have no obvious use?

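Editor's note: Apple announced no hardware at all at this year's WWDC, which has been called "the softest Apple keynote ever." Apple announced iOS 12 and macOS Mojave, and the integration between macOS and iOS makes for a real productivity tool, but many people overlooked Create ML, a feature Apple introduced for developers. This article covers it in detail. (More recommended past translations at the end.)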

At the keynote, Apple introduced a new feature for developers: Create ML.

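Machine learning is now a commonly used tool in the developer's kit, so it makes sense that Apple would want to improve the process. But what it has introduced is essentially local training, which does not seem particularly useful.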

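The most important step in building a machine learning model, such as one that detects faces or turns speech into text, is the "training." That is when the computer works through large amounts of data, such as photos or audio, and establishes correlations between the input (a voice) and the desired output (distinct words).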

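Training is extremely CPU-intensive. It generally requires orders of magnitude more computing power (and often storage) than an everyday desktop machine offers; think of the difference between rendering a 3D game and rendering a feature film. You could do it on a laptop, but a four-core Intel processor and onboard GPU would need hours or even days.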

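That is why training is usually done in the cloud, on machines set up specifically for the task, where the computing power of many computers can be pooled.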

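Create ML is about doing the training on your own computer. As shown onstage, you drag your data onto the interface, adjust a few settings, and, on a maxed-out iMac Pro, can have a trained model ready in as little as 20 minutes. It also compresses the model so you can more easily include it in an app (a feature that appears to be already included in Apple's ML tools). This is possible mainly because it applies Apple's own vision and language models rather than building new ones from scratch.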
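One common way to reuse an existing model, rather than building one from scratch, is to keep a pretrained feature extractor frozen and fit only a small classifier on top of its outputs. The article does not describe how Create ML works internally, so the following is only a minimal sketch of that general idea. The `pretrained_features` function is a hypothetical stand-in (a real system would run a large vision or language model there), and the nearest-centroid "head" and toy data are illustrative assumptions.

```python
from collections import defaultdict
from math import dist


def pretrained_features(sample):
    """Hypothetical stand-in for a frozen, pretrained feature extractor.

    This toy version maps a list of numbers to (mean, range). It is never
    modified during training, which is the point of the sketch.
    """
    return (sum(sample) / len(sample), max(sample) - min(sample))


def train_head(labeled_samples):
    """Fit only a small classifier head: one feature-space centroid per label."""
    grouped = defaultdict(list)
    for sample, label in labeled_samples:
        grouped[label].append(pretrained_features(sample))
    centroids = {}
    for label, feats in grouped.items():
        # Average each feature dimension across the samples of this label.
        centroids[label] = tuple(sum(axis) / len(feats) for axis in zip(*feats))
    return centroids


def predict(centroids, sample):
    """Return the label whose centroid is closest to the sample's features."""
    feats = pretrained_features(sample)
    return min(centroids, key=lambda label: dist(feats, centroids[label]))


# Toy, made-up training data: (sample, label) pairs.
training_data = [
    ([1, 2, 3], "low"),
    ([2, 3, 4], "low"),
    ([10, 12, 14], "high"),
    ([11, 13, 15], "high"),
]

head = train_head(training_data)
print(predict(head, [2, 2, 3]))
print(predict(head, [12, 12, 16]))
```

Because only the small head is fitted, the work grows with the number of labeled samples rather than with the size of the underlying model, which is one plausible reason training of this kind can finish in minutes on a desktop machine.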

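In practice, though, the quality of a model depends in large part on the nature, arrangement, and precision of the "layers" of the network being trained, and on how long it trains. Suppose, as a made-up figure, that an hour of training on a MacBook Pro amounts to 10 teraflop-hours of work. If you send that data to the cloud, you could split those 10 teraflop-hours across 10 computers and get the same result in six minutes, or let them run for an hour and get 100 teraflop-hours of training, which would almost certainly yield a better model.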
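The arithmetic behind this comparison can be sketched in a few lines. The 10-teraflop figure is the article's own made-up number; the function names, the assumption that each cloud machine is exactly as fast as the laptop, and the assumption of perfect linear scaling across machines are illustrative simplifications, not claims about any real service.

```python
def minutes_to_finish(work_tflop_hours, machines, tflops_per_machine):
    """Wall-clock minutes needed to complete a fixed amount of training work."""
    return work_tflop_hours * 60 / (machines * tflops_per_machine)


def work_done(hours, machines, tflops_per_machine):
    """Teraflop-hours of training completed in a given wall-clock time."""
    return hours * machines * tflops_per_machine


# Made-up throughput, chosen so one laptop-hour equals 10 teraflop-hours.
LAPTOP_TFLOPS = 10

# One hour on a single laptop.
laptop_work = work_done(hours=1, machines=1, tflops_per_machine=LAPTOP_TFLOPS)

# Option A: same work, spread over 10 equally fast cloud machines.
cloud_minutes = minutes_to_finish(laptop_work, machines=10,
                                  tflops_per_machine=LAPTOP_TFLOPS)

# Option B: same one-hour wait, but on 10 cloud machines.
cloud_work = work_done(hours=1, machines=10, tflops_per_machine=LAPTOP_TFLOPS)

print(f"laptop, 1 hour: {laptop_work} teraflop-hours")
print(f"10 machines, same work: {cloud_minutes} minutes")
print(f"10 machines, 1 hour: {cloud_work} teraflop-hours")
```

Real distributed training rarely scales perfectly, so these numbers are an upper bound on the benefit rather than a prediction.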

This flexibility is one of the core conveniences of computing as a service, which is why so much now runs on cloud platforms such as AWS and Azure.

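My colleagues suggested that people working with sensitive data, such as medical histories or X-rays, would not want to put it in the cloud. But I doubt that solo developers with little experience are likely, or even allowed, to have access to such privileged data in the first place. A hard drive loaded with the PET scans of 500,000 people is a catastrophe waiting to happen. So access control is what matters, and truly private data is stored centrally.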

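Research organizations, hospitals, and universities have partnerships with cloud services and may even have their own dedicated computing clusters. Their requirements are also different from, and more demanding than, what Apple's off-the-shelf tools can meet.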

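I may sound as if I am picking on local training for no reason. But the way Apple framed it suggests that anyone can easily move professional training onto their own laptop and get the same results. That is unrealistic. Perhaps as the platform diversifies, developers will find uses for local training, but for now it feels like a feature without a purpose.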

Original text

Apple’s Create ML is a nice feature with an unclear purpose

Apple announced a new feature for developers today called Create ML. Because machine learning is a commonly used tool in the developer kit these days, it makes sense that Apple would want to improve the process. But what it has here, essentially local training, doesn’t seem particularly useful.

The most important step in the creation of a machine learning model, like one that detects faces or turns speech into text, is the “training.” That’s when the computer is chugging through reams of data like photos or audio and establishing correlations between the input (a voice) and the desired output (distinct words).

This part of the process is extremely CPU-intensive, though. It generally requires orders of magnitude more computing power (and often storage) than you have sitting on your desk. Think of it like the difference between rendering a 3D game like Overwatch and rendering a Pixar film. You could do it on your laptop, but it would take hours or days for your measly four-core Intel processor and onboard GPU to handle.

That’s why training is usually done “in the cloud,” which is to say, on other people’s computers set up specifically for the task, equipped with banks of GPUs and special AI-inclined hardware.

Create ML is all about doing it on your own PC, though: as briefly shown onstage, you drag your data onto the interface, tweak some stuff and you can have a model ready to go in as little as 20 minutes if you’re on a maxed-out iMac Pro. It also compresses the model so you can more easily include it in apps (a feature already included in Apple ML tools, if I remember correctly). This is mainly possible because it’s applying Apple’s own vision and language models, not building new ones from scratch.

The quality of a model depends in great part on the nature, arrangement and precision of the “layers” of the training network, and how long it’s been given to cook. Given an hour of real time, a model trained on a MacBook Pro will have, let’s just make up a number, 10 teraflop-hours of training done. If you send that data to the cloud, you could choose to either have those 10 teraflop-hours split between 10 computers and have the same results in six minutes, or after an hour it could have 100 teraflop-hours of training, almost certainly resulting in a better model.

That kind of flexibility is one of the core conveniences of computing as a service, and why so much of the world runs on cloud platforms like AWS and Azure, and soon dedicated AI processing services like Lobe.

My colleagues suggested that people who are dealing with sensitive data in their models, for example medical history or x-rays, wouldn’t want to put that data in the cloud. But I don’t think that single developers with little or no access to cloud training services are the kind that are likely, or even allowed, to have access to privileged data like that. If you have a hard drive loaded with the PET scans of 500,000 people, that seems like a catastrophic failure waiting to happen. So access control is the name of the game, and private data is stored centrally.

Research organizations, hospitals and universities have partnerships with cloud services and perhaps even their own dedicated computing clusters for things like this. After all, they also need to collaborate, be audited and so on. Their requirements are also almost certainly different and more demanding than Apple’s off the shelf stuff.

I guess I sound like I’m ragging for no reason on a tool that some will find useful. But the way Apple framed it made it sound like anyone can just switch over from a major training service to their own laptop easily and get the same results. That’s just not true. Perhaps as the platform diversifies developers will find ways to make it useful, but for now it feels like a feature without a purpose.

Article editor: Xiao Liu (小柳)

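Recommended past translations:

Original translation | Chinese schools use AI to grade student essays, with scores nearly matching teachers'

Original translation | Researchers use virtual reality to train AI drones, reducing collisions for driverless vehicles

Original translation | Nvidia's Jensen Huang on the White House's AI initiative

Original translation | As bitcoin rises, a financial expert sues Facebook over get-rich-quick cryptocurrency ad scams

Original translation | Why AI can't solve Facebook's fake news problem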

Originally published on the WeChat official account 灯塔大数据 (DTbigdata)

Original publication date: 2018-06-06
