
How Do You Test AI Systems?--你如何测试AI系统

Ron Schmelzer, Contributor

COGNITIVE WORLD Contributor Group(认知世界贡献者组)

Everyone who has ever worked on an application development project knows that you don't just simply put code and content out in production, to your customers, employees, or stakeholders, without first testing it to make sure it's not broken or dead on delivery. Quality Assurance (QA) is such a core part of any technology or business delivery that it's one of the essential components of any development methodology. You build. You test. You deploy. And the best way to do all this is in an agile fashion, in small, iterative chunks, so you make sure to respond to the continuously evolving and changing needs of the customer. Surely AI projects are no different. There are iterative design, development, testing, and delivery phases, as we've discussed in our previous content on AI methodologies.

每一个参与过应用程序开发项目的人都知道,您不会只是简单地将代码和内容发布到生产环境中,交给您的客户、员工或利益相关者,而不首先对其进行测试,以确保它在交付时没有损坏或失效。质量保证(Quality Assurance,QA)是任何技术或业务交付的核心部分,是任何开发方法的重要组成部分之一。你构建,你测试,你部署。最好的方法是以敏捷的方式,以小而迭代的增量进行,这样你就可以确保对客户不断变化的需求做出响应。当然人工智能项目也不例外。正如我们在前面关于人工智能方法的内容中所讨论的,它同样有迭代的设计、开发、测试和交付阶段。

However, AI operationalization is different from traditional deployment in that you don't just put machine learning models into "production". This is because models are continuously evolving and learning and must be continuously managed. Furthermore, models can end up on a wide variety of endpoints with different performance and accuracy metrics. AI projects are unlike traditional projects even with regards to QA. Simply put, you don't do QA for AI projects the way you QA other projects. This is because the concept of what we're testing, how we test, and when we test is significantly different for AI projects.


Testing and quality assurance in the training and inference phases of AI


Those experienced with machine learning model training know that testing is actually a core element of making AI projects work. You don't simply develop an AI algorithm, throw training data at it, and call it a day. You have to actually verify that the training data does a good enough job of accurately classifying or regressing data with sufficient generalization, without overfitting or underfitting the data. This is done using validation techniques and setting aside a portion of the training data to be used during the validation phase. In essence, this is a sort of QA testing in which you're making sure that the algorithm, the data, the hyperparameter configuration, and the associated metadata all work together to provide the predictive results you're looking for. If you get it wrong in the validation phase, you're supposed to go back, change the hyperparameters, and rebuild the model again, perhaps with better training data if you have it. After this is done, you go back and use other set-aside testing data to verify that the model indeed works as it is supposed to. While these are all aspects of testing and validation, they happen during the training phase of the AI project, before the AI model is put into operation.
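The set-aside workflow described above can be sketched in a few lines of scikit-learn. The dataset is synthetic and the model choice is an illustrative assumption; the point is the three-way split: tune against the validation set, and touch the test set only once at the end.

```python
# Sketch of the train / validation / test split described above.
# The data here is synthetic; only the splitting pattern matters.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# First carve off the final test set, then split the remainder
# into training and validation portions (60/20/20 overall).
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
val_acc = model.score(X_val, y_val)     # tune hyperparameters against this
test_acc = model.score(X_test, y_test)  # touch only once, at the very end
print(f"validation accuracy: {val_acc:.3f}, test accuracy: {test_acc:.3f}")
```

If validation accuracy is poor, you loop back and adjust hyperparameters or data; the test set stays untouched so it remains an honest final check.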



Even in the training phase, we're testing a few different things. First, we need to make sure the AI algorithm itself works. There's no sense in tweaking hyperparameters and training the model if the algorithm is implemented wrong. However, in reality, there's little excuse for a poorly implemented algorithm, because most of these algorithms are already baked into the various AI libraries. If you need K-Means clustering, different flavors of neural networks, Support Vector Machines, or K-Nearest Neighbors, you can simply call that library function in Python's scikit-learn or whatever your tool of choice is, and it should work. There's just one way to do the math! ML developers should not be coding those algorithms from scratch unless they have a really good reason to do so. That means if you're not coding them from scratch, there's very little to be tested as far as the actual code goes – assume that the algorithms have already passed their tests. In an AI project, QA will never be focused on the AI algorithm itself or the code, assuming it has all been implemented as it is supposed to be.

即使在训练阶段,我们也在测试一些不同的东西。首先,我们需要确保人工智能算法本身能够工作。如果算法实现错误,调整超参数和训练模型是没有意义的。然而,在现实中,几乎没有理由会出现实现得不好的算法,因为这些算法中的大多数已经内置在各种人工智能库中。如果您需要 K-Means 聚类、不同风格的神经网络、支持向量机或 K 近邻,您只需在 Python 的 scikit-learn 或您选择的任何工具中调用相应的库函数,它就会工作。数学计算只有一种正确的做法!机器学习开发人员不应该从头开始编写这些算法,除非有充分的理由这样做。这意味着如果你不是从头开始编码的话,就实际代码而言,几乎没有什么需要测试的——假设算法已经通过了测试。在人工智能项目中,QA 永远不会关注人工智能算法本身或代码,假设它们都已按预期实现。
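To make the "just call the library" point concrete, here is what the K-Means example mentioned above looks like in scikit-learn: the algorithm the article says you should never hand-implement is a single, already-tested library call (the synthetic blob data is an assumption for illustration).

```python
# The clustering math is already implemented and tested inside the
# library; the application code only supplies data and parameters.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Synthetic data with three obvious clusters, purely for illustration.
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
print(sorted(set(labels)))  # three cluster ids
```

There is nothing algorithmic here for QA to test; what remains testable is the data going in and the configuration (here, `n_clusters`).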

This leaves two things to be tested in the training phase for the AI model itself: the training data and the hyperparameter configuration data. In the latter case, we already addressed testing of hyperparameter settings through the use of validation methods, including K-fold cross-validation and other approaches. If you are doing any AI model training at all, then you should know how to do validation. This will help determine if your hyperparameter settings are correct. Knock another activity off the QA task list.
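The K-fold cross-validation mentioned above can be sketched as follows, used to compare candidate hyperparameter settings. The dataset and the values of `k` are illustrative assumptions, not from the article.

```python
# K-fold cross-validation as a hyperparameter check: each candidate
# setting is scored as its average accuracy across 5 folds.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

mean_scores = {}
for k in (1, 5, 15):
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    mean_scores[k] = scores.mean()

best_k = max(mean_scores, key=mean_scores.get)
print(mean_scores, "best k:", best_k)
```

A setting whose cross-validated score is clearly worse than the others is the signal to go back and reconfigure before the model ever reaches production.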


As such, all that then remains is testing the data itself for QA of the AI model. But what does that mean? It means not just data quality, but also completeness. Does the training data adequately represent the reality of what you're trying to generalize? Have you inadvertently included any informational or human-induced bias in your training data? Are you skipping over things that work in training but will fail during inference because the real-world data is more complex? QA for the AI model here has to do with making sure that the training data includes a representative sample of the real world and eliminates as much human bias as possible.
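One small, automatable piece of this data QA is checking class representation before any training happens. The function and thresholds below are hypothetical illustrations of such a check, not a standard API.

```python
# Hypothetical pre-training data check: flag any class that makes up
# less than a minimum share of the training set. Threshold is an
# illustrative assumption; real projects would set it per domain.
from collections import Counter

def check_representation(labels, min_share=0.05):
    """Return {class: share} for classes below the min_share threshold."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items() if n / total < min_share}

# A skewed training set: one outcome is badly underrepresented.
labels = ["approved"] * 950 + ["denied"] * 50
underrepresented = check_representation(labels, min_share=0.10)
print(underrepresented)  # {'denied': 0.05}
```

Checks like this only catch the mechanical part of representativeness; judging whether the data reflects the real world, and whether human bias crept in, still requires human review.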


Outside of the machine learning model, the other aspects of the AI system that need testing are actually external to the AI model. You need to test the code that puts the AI model into production – the operationalization component of the AI system. This can happen prior to the AI model being put into production, but then you're not actually testing the AI model. Instead, you're testing the systems that use the model. If the model itself is failing during this testing, then there is a problem somewhere with the training data or the configuration – something you should have picked up when you were testing the training data and doing validation as we discussed above.
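A common way to test the operationalization code separately from the model is to substitute a stub model, so any failure points at the surrounding pipeline rather than the weights. The serving function and stub below are hypothetical sketches of that pattern.

```python
# Test the code around the model, not the model: a stub stands in for
# the trained model so only the serving pipeline is under test.
def serve_prediction(model, raw_input):
    """Hypothetical operationalization code: validate, featurize, predict."""
    if not isinstance(raw_input, dict) or "age" not in raw_input:
        raise ValueError("missing required field: age")
    features = [float(raw_input["age"])]
    return {"prediction": model.predict(features)}

class StubModel:
    def predict(self, features):
        return "ok"  # canned answer; the real model is not under test here

result = serve_prediction(StubModel(), {"age": 42})
print(result)  # {'prediction': 'ok'}
```

With the stub in place, input validation, feature extraction, and response formatting can all be exercised with ordinary unit tests, before any real model is attached.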


To do QA for AI, you need to test in production


If you’ve followed along with what’s written above, then you know that a properly validated, well-generalizing system, using representative training data and algorithms from an already-tested and proven source, should produce the expected results. But what happens when you don’t get those expected results? Reality is obviously messy. Things happen in the real world that don’t happen in your test environment. We did everything we were supposed to do in the training phase and our model passed, meeting expectations, yet it’s not passing in the “inference” phase when the model is operationalized. This means we need a QA approach to deal with models in production.


Problems that arise with models in the inference phase are almost always issues of data, or mismatches between the way the model was trained and real-world data. We know the algorithm works. We know that our training data and hyperparameters were configured to the best of our ability. That means that when models are failing, we have data or real-world mismatch problems. Is the input data bad? If the problem is bad data, fix it. Is the model not generalizing well? Is there some nuance of the data that needs to be added to further train the model? If the answer is the latter, that means we need to go through a whole new cycle of developing an AI model with new training data and hyperparameter configurations to deal with the right level of fitting to that data. Regardless of the issue, organizations that operationalize AI models need a solid approach by which they can keep close tabs on how the AI models are performing and version-control which ones are in operation.
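"Keeping close tabs" on a model in production usually starts with detecting when live inputs stop resembling the training data. The check below is an illustrative sketch of that idea, not any specific monitoring product's API; the z-score threshold is an assumption.

```python
# Illustrative drift check: flag live input data whose mean has moved
# more than a tolerated number of training standard deviations.
from statistics import mean, stdev

def drifted(train_values, live_values, z_threshold=3.0):
    """True if the live mean is > z_threshold training std devs away."""
    mu, sigma = mean(train_values), stdev(train_values)
    return abs(mean(live_values) - mu) > z_threshold * sigma

train = [10.0, 11.0, 9.5, 10.5, 10.2, 9.8]
print(drifted(train, [10.1, 9.9, 10.4]))  # False: looks like training data
print(drifted(train, [25.0, 26.5, 24.8])) # True: inputs have shifted badly
```

A drift alarm distinguishes the two failure modes in the paragraph above: bad input data to be fixed upstream, versus a real-world shift that triggers a new training cycle.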


This is resulting in the emergence of a new field of technology called “ML ops” that focuses not on building or developing models, but rather on managing them in operation. ML ops is focused on model versioning, governance, security, iteration, and discovery. Basically, everything that happens after the models are trained and developed, while they are out in production.

这就产生了一个新的技术领域,称为“MLOps”,它的重点不是建立或开发模型,而是在运行中管理模型。ML ops主要关注模型版本控制、治理、安全性、迭代和发现。基本上,在模型经过训练和开发之后,以及在生产过程中发生的一切。
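To make the model-versioning concern concrete, here is a toy sketch of the kind of registry an ML ops tool maintains: an append-only version history plus a pointer to the version serving traffic, so a misbehaving model can be rolled back. The class and method names are hypothetical, not any specific product's API.

```python
# Toy model registry illustrating the ML ops versioning concern:
# every trained model is recorded, and production can roll forward or back.
class ModelRegistry:
    def __init__(self):
        self._versions = []          # append-only version history
        self._production_idx = None  # which version serves traffic

    def register(self, model, metrics):
        self._versions.append({"model": model, "metrics": metrics})
        return len(self._versions) - 1  # new version id

    def promote(self, version_id):
        self._production_idx = version_id

    def production_model(self):
        return self._versions[self._production_idx]["model"]

registry = ModelRegistry()
v0 = registry.register("model-v0", {"val_acc": 0.91})
v1 = registry.register("model-v1", {"val_acc": 0.94})
registry.promote(v1)   # roll the better validation score forward...
registry.promote(v0)   # ...or roll back if it misbehaves on live data
print(registry.production_model())  # model-v0
```

Real ML ops platforms add governance, access control, and audit trails on top of this core idea, but the roll-forward/roll-back mechanics are the same.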

AI projects are really unique in that they revolve around data. Data is the one thing in testing that is guaranteed to continuously grow and change. As such, you need to consider AI projects as also continuously growing and changing. This should give you a new perspective on QA in the context of AI.


本文分享自微信公众号 - 软件测试培训(iTestTrain),作者:软件测试培训
