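Original Translation | Why AI Can't Solve Facebook's Fake News Problem

Repost notice

This article is original content from 灯塔大数据 (DTbigdata). Individuals are welcome to share it on WeChat Moments. Other organizations that repost it should add the following credit at the top of the article: “Source: 灯塔大数据; WeChat: DTbigdata”

Introduction: The previous installment covered applying AI to supply chains. Today we look at why AI cannot solve one of Facebook's thorniest problems.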

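Facebook faces many problems right now, but one of them will be with it for a long time: fake news. With a user base that has grown to more than a quarter of the world's population, the company struggles to control what those users post and share. For Facebook, unwanted content ranges from mild nudity to serious violence, but what has proved most sensitive and most damaging for the company is hoaxes and misinformation, especially when they have a political slant.

So how will Facebook respond? For now, the company does not appear to have a clear strategy. It has tried many approaches, including hiring more human moderators (around 7,500 as of February this year) and giving users more information about news sources. In a recent interview, Mark Zuckerberg also said the company might set up some kind of independent body to rule on what content is acceptable. Experts say that Facebook needs to be very careful about handing the whole job over to artificial intelligence.

In an interview with The New York Times about the Cambridge Analytica scandal, Zuckerberg revealed that during last year's special election in Alabama the company “deployed some new AI tools to identify fake accounts and false news.” He noted that these were Macedonian accounts (Macedonia being an established hub of the for-profit fake news business), and the company later clarified that it had deployed machine learning to find “suspicious behaviors without assessing the content itself.”

This is a smart choice, because when it comes to fake news, artificial intelligence is not all that “intelligent.”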

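1

AI does not understand fake news because AI does not understand how humans write

Applying AI to filter fake news automatically is genuinely hard. From a technical standpoint, AI cannot understand human writing the way people do, so it keeps running into walls. It can extract certain facts and perform a crude sentiment analysis (for example, guessing from keywords whether a piece of content is “happy” or “angry”), but it cannot take into account subtleties of tone or cultural context, let alone contact the people involved in a story to verify the information. Even if it could do all of that, it would only eliminate the most obvious misinformation and hoaxes, and it would still eventually run into confusing edge cases. If humans themselves cannot agree on what counts as “fake news,” we have no way of teaching a machine to make that judgment.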
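
To make the “crude sentiment analysis” mentioned above concrete, here is a minimal Python sketch of keyword counting. It is only an illustration: the word lists and the function name `crude_sentiment` are hypothetical and are not taken from Facebook or from any system described in the article.

```python
import re

# Hypothetical keyword lists, chosen only for illustration.
HAPPY_WORDS = {"great", "love", "wonderful", "happy"}
ANGRY_WORDS = {"outrage", "hate", "terrible", "angry"}


def crude_sentiment(text: str) -> str:
    """Guess 'happy', 'angry', or 'neutral' by counting keyword hits."""
    words = re.findall(r"[a-z']+", text.lower())
    happy = sum(w in HAPPY_WORDS for w in words)
    angry = sum(w in ANGRY_WORDS for w in words)
    if happy > angry:
        return "happy"
    if angry > happy:
        return "angry"
    return "neutral"


# A sincere sentence and a sarcastic one get the same label,
# because the function only sees keywords, not tone.
print(crude_sentiment("I love this wonderful news"))
print(crude_sentiment("Oh great, I just love being lied to"))
```

Both calls print `happy`: the sarcastic second sentence contains “great” and “love,” so keyword counting misreads it, which is the kind of tonal subtlety the paragraph above says AI misses.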

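Past attempts to handle fake news with AI quickly ran into problems. One example is the Fake News Challenge held last year, a competition to crowdsource machine learning solutions. (Crowdsourcing means that a company or organization takes work formerly done by employees and outsources it, on a free and voluntary basis, to an undefined and usually large network of people.) Dean Pomerleau of Carnegie Mellon University helped organize the challenge, but he and his team soon realized that AI could not solve the problem on its own.

“We actually started out with a more ambitious goal of creating a system that could answer the question ‘Is this fake news, yes or no?’ We quickly realized machine learning just wasn’t up to the task.”

Pomerleau stresses that comprehension is the key problem. Language is highly nuanced, especially online, and Tide Pods (laundry detergent capsules) are a good example. As Cornell professor James Grimmelmann explained in a recent essay on fake news and platform moderation, the internet's fondness for irony makes sincerity and intent hard to judge. Facebook and YouTube discovered this for themselves when they tried to remove Tide Pod Challenge videos in January this year. (The Tide Pod Challenge: people noticed that Tide Pods look a lot like a tasty snack, although they are a laundry product and must not be eaten.)

Image: the thumbnail of a YouTube video that could be endorsing the Tide Pod Challenge, opposing it, or both at once.

Grimmelmann explains that the companies faced a dilemma when deciding which videos to delete. “It’s easy to find videos of people holding up Tide Pods, sympathetically noting how tasty they look, and then giving a finger-wagging speech about not eating them because they’re dangerous. Are these sincere anti-pod-eating public service announcements? Or are they surfing the wave of interest in pod-eating by superficially claiming to denounce it? Both at once?”

The problem is too complex for AI to handle. In the end, Pomerleau's Fake News Challenge asked teams to complete only a simpler task: build an algorithm that just spots articles covering the same topic. This turned out to be something the algorithms were very good at.

With such a tool, a human can tag a story as fake news (for example, a claim that a certain celebrity has died), and the AI then removes all related coverage that repeats the falsehood. As Pomerleau puts it, the best that machine learning can do is help human fact-checkers do their jobs.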
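
The workflow described above (a human flags one story, and an algorithm finds other coverage of the same topic) can be sketched as follows. The article does not say which methods the challenge teams used, so this sketch substitutes a simple bag-of-words cosine similarity. The stopword list, the 0.5 threshold, the function names, and the example headlines are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a much fuller one.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "at", "has", "is", "was", "to", "and"}


def bag_of_words(text: str) -> Counter:
    """Lowercase the text, split it into words, and count the non-stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def same_topic(flagged: str, candidates: list[str], threshold: float = 0.5) -> list[str]:
    """Return the candidates whose wording overlaps strongly with the flagged story."""
    flagged_bag = bag_of_words(flagged)
    return [
        c for c in candidates
        if cosine_similarity(flagged_bag, bag_of_words(c)) >= threshold
    ]


# A human fact-checker flags the first story; the algorithm looks for repeats.
flagged_story = "Famous actor John Doe has died in a car crash"
other_stories = [
    "Actor John Doe died in car crash, sources say",
    "City council approves new budget for parks",
]
print(same_topic(flagged_story, other_stories))
```

Only the first candidate is returned, since it shares six words with the flagged story while the second shares none. Note that the human still decides what is false; the code only measures word overlap, so it would also match an article debunking the same claim.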

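2

Even with fact-checkers on the job, Facebook still relies on algorithms

This seems to be Facebook's preferred approach. For this year's Italian elections, for example, the company hired independent fact-checkers to flag fake news and hoaxes. Problematic links were not deleted, but when users shared them they were tagged with the label “Disputed by 3rd Party Fact Checkers.” Unfortunately, even this approach has problems: a recent report in the Columbia Journalism Review highlighted fact-checkers' many frustrations with Facebook. The journalists involved said it was often unclear why Facebook's algorithms asked them to check certain stories, while sites well known for spreading lies and conspiracy theories were never checked at all.

Still, algorithms do have a place in all this. AI cannot do the complex work of stamping out fake news, but it can do repetitive work similar to spam filtering. For example, anything with poor spelling and grammar can be filtered out, as can sites that imitate legitimate outlets to attract readers. During the special election in Alabama, Facebook targeted accounts “that were trying to spread false news,” and fake news is relatively easy to target when it comes from known trouble spots.

Experts say, however, that this is the limit of AI's current capabilities. Mor Naaman, an associate professor of information science at Cornell Tech, says that even these simple filters can cause problems: “Classification is often based on language patterns and other simple signals, which may ‘catch’ honest independent and local publishers together with producers of fake news and misinformation.”

Facebook also faces a potential dilemma. To avoid accusations of censorship, the social network should be open about the criteria its algorithms use to identify fake news. But if it is too open, people can work around the filters.

For Amanda Levendowski, a teaching fellow at New York University's law school, this is an example of what she calls the “Valley Fallacy.” Speaking about Facebook's AI, she says this is a common mistake, in which companies start saying, “We have a problem, we must do something,” without carefully considering whether this could create new problems. Levendowski adds that despite these problems, tech companies still have plenty of reasons to pursue AI, such as improving the user experience and even reducing the risk of legal liability.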

Original text

Why AI isn’t going to solve Facebook’s fake news problem

Facebook has a lot of problems right now, but one that’s definitely not going away any time soon is fake news. As the company’s user base has grown to include more than a quarter of the world’s population, it has (understandably) struggled to control what they all post and share. For Facebook, unwanted content can be anything from mild nudity to serious violence, but what’s proved to be most sensitive and damaging for the company is hoaxes and misinformation — especially when it has a political bent.

So what is Facebook going to do about it? At the moment, the company doesn’t seem to have a clear strategy. Instead, it’s throwing a lot at the wall and seeing what works. It’s hired more human moderators (as of February this year it had around 7,500); it’s giving users more information in-site about news sources; and in a recent interview, Mark Zuckerberg suggested that the company might set up some sort of independent body to rule on what content is kosher. (Which could be seen as democratic, an abandonment of responsibility, or an admission that Facebook is out of its depth, depending on your view.) But one thing experts say Facebook needs to be extremely careful about is giving the whole job over to AI.

So far, the company seems to be just experimenting with this approach. During an interview with The New York Times about the Cambridge Analytica scandal, Zuckerberg revealed that for the special election in Alabama last year, the company “deployed some new AI tools to identify fake accounts and false news.” He specified that these were Macedonian accounts (an established hub in the fake-news-for-profit business), and the company later clarified that it had deployed machine learning to find “suspicious behaviors without assessing the content itself.”

This is smart because when it comes to fake news, AI isn’t up to the job.

AI CAN'T UNDERSTAND FAKE NEWS BECAUSE AI CAN'T UNDERSTAND WRITING

The challenges of building an automated fake news filter with artificial intelligence are numerous. From a technical perspective, AI fails on a number of levels because it just can’t understand human writing the way humans do. It can pull out certain facts and do a crude sentiment analysis (guessing whether a piece of content is “happy” or “angry” based on keywords), but it can’t understand subtleties of tone, consider cultural context, or ring someone up to corroborate information. And even if it could do all this, which would knock out the most obvious misinformation and hoaxes, it would eventually run up against edge cases that confuse even humans. If people on the left and the right can’t agree on what is and is not “fake news,” there’s no way we can teach a machine to make that judgement for us.

In the past, efforts to deal with fake news using AI have quickly run into problems, as with the Fake News Challenge, a competition to crowdsource machine learning solutions held last year. Dean Pomerleau of Carnegie Mellon University, who helped organize the challenge, says that he and his team soon realized AI couldn't tackle this alone.

“We actually started out with a more ambitious goal of creating a system that could answer the question ‘Is this fake news, yes or no?’ We quickly realized machine learning just wasn’t up to the task.”

Pomerleau stresses that comprehension was the primary problem, and to understand why exactly language can be so nuanced, especially online, we can turn to the example set by Tide pods. As Cornell professor James Grimmelmann explained in a recent essay on fake news and platform moderation, the internet’s embrace of irony has made it extremely difficult to judge sincerity and intent. And Facebook and YouTube have found this out for themselves when they tried to remove Tide Pod Challenge videos in January this year.

As Grimmelmann explains, when it came to deciding which videos to delete, the companies would have been faced with a dilemma. “It’s easy to find videos of people holding up Tide Pods, sympathetically noting how tasty they look, and then giving a finger-wagging speech about not eating them because they’re dangerous,” he says. “Are these sincere anti-pod-eating public service announcements? Or are they surfing the wave of interest in pod-eating by superficially claiming to denounce it? Both at once?”

Considering this complexity, it’s no wonder that Pomerleau’s Fake News Challenge ended up asking teams to complete a simpler task: make an algorithm that can simply spot articles covering the same topic. Something they turned out to be pretty good at.

With this tool a human could tag a story as fake news (for example, claiming a certain celebrity has died) and then the algorithm would knock out any coverage repeating the lie. “We talked to real-life fact-checkers and realized they would be in the loop for quite some time,” says Pomerleau. “So the best we could do in the machine learning community would be to help them do their jobs.”

EVEN WITH HUMAN FACT-CHECKERS IN TOW, FACEBOOK RELIES ON ALGORITHMS

This seems to be Facebook’s preferred approach. For the Italian elections this year, for example, the company hired independent fact-checkers to flag fake news and hoaxes. Problematic links weren’t deleted, but when shared by a user they were tagged with the label “Disputed by 3rd Party Fact Checkers.” Unfortunately, even this approach has problems, with a recent report from the Columbia Journalism Review highlighting fact-checkers’ many frustrations with Facebook. The journalists involved said it often wasn’t clear why Facebook’s algorithms were telling them to check certain stories, while sites well-known for spreading lies and conspiracy theories never got checked at all.

However, there’s definitely a role for algorithms in all this. And while AI can’t do any of the heavy lifting in stamping out fake news, it can filter it in the same way spam is filtered out of your inbox. Anything with bad spelling and grammar can be knocked out, for example; or sites that rely on imitating legitimate outlets to entice readers. And as Facebook has shown with its targeting of Macedonian accounts “that were trying to spread false news” during the special election in Alabama, it can be relatively easy to target fake news when it’s coming from known trouble-spots.

Experts say, though, that is the limit of AI’s current capabilities. Mor Naaman, an associate professor of information science at Cornell Tech, adds that even these simpler filters can create problems. “Classification is often based on language patterns and other simple signals, which may ‘catch’ honest independent and local publishers together with producers of fake news and misinformation,” says Naaman.

And even here, there is a potential dilemma for Facebook. To avoid accusations of censorship, the social network should be open about the criteria its algorithms use to spot fake news. But if it’s too open, people could game the system, working around its filters.

For Amanda Levendowski, a teaching fellow at NYU Law, this is an example of what she calls the “Valley Fallacy.” Speaking about Facebook’s AI moderation, she suggests this is a common mistake, “where companies start saying, ‘We have a problem, we must do something, this is something, so we must do this,’ without carefully considering whether this could create new or different problems.” Levendowski adds that despite these problems, there are plenty of reasons tech firms will continue to pursue AI moderation, ranging from “improving users’ experiences to mitigating the risks of legal liability.”

These are surely temptations for Zuckerberg, but even then, it seems that leaning too hard on AI to solve its moderation problems would be unwise. And not something he would want to explain to Congress next week.

Article editor: 小柳

Originally published on the WeChat official account 灯塔大数据 (DTbigdata)

Original publication date: 2018-04-11
