
Testing an AI Automatic Speech Recognition System

Original
Author: 赵云龙龙
Revised 2020-06-15 10:46:01
From the column: python爱好部落

ASR (Automatic Speech Recognition) is a technology that converts human speech into text.

Our previous ASR was painful to use, so a team in Switzerland built a new one to replace it, reportedly AI-based and built on big data. My job was to test it so management could make a decision. I only measured response time; accuracy (noise handling, precision, and so on) was out of scope for this round. The developers integrated the ASR SDK into a demo app, and I tested through the UI rather than calling the API directly: simulate a user exercising it over and over, and judge whether the response time meets the requirement.

They first gave me an Android build. After trying it by hand for a while, it was clear that a fully manual test would waste far too much time.

So I settled on a strategy: use samples I recorded myself, run 4 samples on each device, 30 runs per sample, and record the response time of every run. Then I used UI automation to fully simulate the manual steps.
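As a back-of-the-envelope sketch of what gets recorded (the helper name and the numbers below are illustrative, not from the actual test code), each sample ends up with a list of response times that can be summarized like this:

```python
# Sketch: summarizing recorded response times (seconds) per sample.
# The helper name and demo numbers are illustrative, not from the real
# test run, which records 30 response times for each of 4 samples.
from statistics import mean

def summarize(times_per_sample):
    # return (average, worst) response time for each sample's runs
    return [(round(mean(runs), 2), max(runs)) for runs in times_per_sample]

# e.g. two samples with three recorded response times each
demo = [[1.8, 2.1, 1.9], [3.0, 2.8, 3.2]]
print(summarize(demo))  # -> [(1.93, 2.1), (3.0, 3.2)]
```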

The samples are these four sentences:

Due to delays, we need to reconsider our schedule this week.
As we've discussed, we need to put our most experienced staff on this.
Can you suggest an alternative to the restructuring?
We'll implement quality assurance processes before the final review.

I deliberately read them haltingly; each recording is about 13 seconds long. The recordings came out in m4a format, though, so I had to convert them first, using ffmpeg.

I. Installing ffmpeg

1. Download ffmpeg: http://ffmpeg.org/download.html

2. Unzip it to a directory and add the bin directory to PATH (Computer → Properties → Advanced system settings → Environment Variables → Path → New). Open a command prompt (Win+R, type cmd) and run: ffmpeg -version. If it prints output, the install succeeded.
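The same check can be done from Python before kicking off a batch job. This is a small sketch (not from the original article) that looks ffmpeg up on PATH the way the shell does:

```python
# Sketch: verify ffmpeg is reachable before running a batch conversion.
# shutil.which searches PATH like the shell; None means the bin
# directory was not added to the PATH environment variable.
import shutil
import subprocess

def ffmpeg_version():
    # return ffmpeg's version line, or None if ffmpeg is not on PATH
    path = shutil.which("ffmpeg")
    if path is None:
        return None
    out = subprocess.run([path, "-version"], capture_output=True, text=True)
    return out.stdout.splitlines()[0]

print(ffmpeg_version() or "ffmpeg not found on PATH - check the environment variable setup")
```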

II. Using ffmpeg

1. Convert a video's format: ffmpeg -i num.mp4 -codec copy num2.avi

This copies num.mp4 into num2.avi, remuxing the streams into the new container without re-encoding.

Note: the file after -i is the input file to operate on.

2. Make a GIF: ffmpeg -i num.mp4 -vframes 20 -y -f gif num3.gif

This turns the first 20 frames of num.mp4 into a GIF named num3.gif.

3. Cut a clip: ffmpeg -i num.mp4 -ss 0 -t 3 -codec copy cut1.mp4

The number after -ss is the start time; the number after -t is the clip duration.

Capture a single moment of the video as an image: ffmpeg -i num.mp4 -y -f image2 -ss 2 -t 0.001 -s 400x300 pic.jpg

This captures the frame at the 2-second mark as a 400x300 image named pic.jpg (the number after -ss is the timestamp).

4. Capture one image per second: ffmpeg -i num.mp4 -r 1 image%d.jpg

This extracts one frame per second from num.mp4, naming them image1.jpg, image2.jpg, and so on.

Note: the number after -r sets how many frames are captured per second of video.
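All of these commands can also be driven from Python. As a sketch (the paths are illustrative), building the command as an argument list for subprocess.run is a bit safer than pasting strings together for os.system, because file names containing spaces then need no manual quoting:

```python
# Sketch: invoking ffmpeg from Python with an argument list, so paths
# containing spaces need no quoting. Paths are illustrative, and ffmpeg
# must be on PATH for the conversion itself to run.
import subprocess

def build_cmd(src, dst):
    # -y overwrites an existing output file without prompting
    return ["ffmpeg", "-y", "-i", src, dst]

def convert(src, dst):
    # returns ffmpeg's exit code: 0 means success
    return subprocess.run(build_cmd(src, dst), capture_output=True, text=True).returncode

# convert("C:\\work\\audio\\1.m4a", "C:\\work\\audio\\1.mp3")
```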

Then a short script batch-converts all the recordings:

import os

# current_path = os.path.dirname(os.path.abspath(__file__))
current_path = "C:\\work\\code\\android"
audio_file = os.path.join(current_path, "audio")

m4a_files = os.listdir(audio_file)

for i, m4a in enumerate(m4a_files):
    if m4a.endswith(".m4a"):
        src = os.path.join(audio_file, m4a)
        dst = os.path.join(audio_file, str(i + 1) + ".mp3")
        os.system('ffmpeg -i "{}" "{}"'.format(src, dst))

After a bit of debugging, the script ran. The flow is: tap record, then play the audio. For audio playback in Python I tried a few libraries; pygame lets me control the playback duration myself. Using it looks roughly like this:

import pygame
from time import sleep

def play_audio(file, duration):
    # play the audio clip for a fixed duration, then stop
    pygame.mixer.init()
    pygame.mixer.music.load(file)
    pygame.mixer.music.play()
    sleep(duration)
    pygame.mixer.music.stop()

    # playsound(file)  # the playsound library is an alternative

The response time is measured by polling: the loop keeps checking whether the score has appeared on screen:

start_time = datetime.datetime.now()

score = "textScore"

# poll until the score element appears; that marks the end time
while not is_element_appear(driver, score):
    print("please wait, the response still hasn't come back at {}".format(datetime.datetime.now()))
    timeout_diff = datetime.datetime.now() - start_time

    print(timeout_diff.seconds)
    if timeout_diff.seconds >= TIME_OUT:
        print("overtime")
        # end_time = start_time.shift(seconds=+TIME_OUT)
        break

end_time = datetime.datetime.now()

used_time = (end_time - start_time).total_seconds()

The test runs many times, so I write a log as it goes; if something breaks mid-run, the earlier results are preserved:

def write_log(content):
    # append to a log file in case of a crash or any breakout
    with open(log_path, 'a+') as f:
        f.writelines(content)

def get_sentence_audio():
    # map each sentence to its converted mp3 file
    with open(sentence_file, "r") as f:
        for index, line in enumerate(f.readlines()):
            line = line.strip("\n")
            file = audio_file_path + "\\" + str(index + 1) + ".mp3"
            mutiply_times(line, file)

def mutiply_times(text, audio_file):
    # play each sample multiple times
    each_audio_play_time = []

    type_text(text)

    for i in range(TIMES):
        start_record(audio_file, each_audio_play_time)

    result.append(each_audio_play_time)
    # log immediately in case of a crash
    write_log("current result is {}{} {} {}".format(get_brand(), get_version(), text, each_audio_play_time))

Put together, the whole run looks roughly like this:

from appium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from time import sleep
import datetime
import pygame
import os
import numpy as np
import pandas as pd
from playsound import playsound

TIME_OUT = 30
TIMES = 30
element_time = 10
DURATION = 0.5
SLEEPTIME = 12

current_path = os.path.dirname(os.path.abspath(__file__))

audio_file_path = os.path.join(current_path, "audio")
sentence_file = current_path + "\\sentence.txt"
result_path = current_path + "\\result"
log_path = current_path + "\\result\\log.txt"

import subprocess

def cmd(cmd):
    return subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)


def get_device():
    for device in cmd('adb devices').stdout.readlines():
        if 'devices' not in str(device):
            device = device.decode('utf-8')
            return device.split('\t')[0]

def get_version():
    # parse "[ro.build.version.release]: [10]" style output from getprop
    result = cmd('adb shell getprop | findstr ro.build.version.release').stdout.readline()
    result = str(result).split(":")[1]
    return result[result.index("[") + 1:result.index("]")]


def get_brand():
    # parse "[ro.product.brand]: [Xiaomi]" style output from getprop
    result = cmd('adb shell getprop | findstr ro.product.brand').stdout.readline()
    result = str(result).split(":")[1]
    return result[result.index("[") + 1:result.index("]")]

desired_caps = {}

desired_caps["platformName"] = "android"
desired_caps["platformVersion"] = get_version()
desired_caps["deviceName"] = get_device()
desired_caps["appPackage"] = "com.library.speechscoringsdk"
desired_caps["appActivity"] = "com.library.speechscoringsdk.RootActivity"
desired_caps["autoGrantPermissions"] = True
desired_caps["automationName"] = "UiAutomator1"
desired_caps["noReset"] = True

driver = webdriver.Remote("http://localhost:4723/wd/hub", desired_caps)

result = []

def wait_element_appear(driver, element):
    try:
        # explicit wait: poll every DURATION seconds, up to element_time seconds
        wait = WebDriverWait(driver, element_time, DURATION)
        # wait until the element can be found
        wait.until(lambda driver: driver.find_element_by_id(element))
        return True
    except:
        return False

def is_element_appear(driver, element):
    try:
        driver.find_element_by_id(element)
        return True
    except:
        return False

def type_text(text):
    # dismiss the permission dialog if it shows up
    permission = "permission_allow_button"
    if wait_element_appear(driver, permission):
        driver.find_element_by_id(permission).click()

    driver.find_element_by_id("settings").click()
    sleep(1)
    driver.find_element_by_id("record").click()

def start_record(file, each_time):
    # add this part to clear the result in app

    # driver.hide_keyboard()
    driver.find_element_by_id("btnRecord").click()

    # play the audio
    pygame.mixer.init()
    pygame.mixer.music.load(file)
    pygame.mixer.music.play()
    sleep(SLEEPTIME)
    pygame.mixer.music.stop()

    #playsound(file)

    # start to calculate the start time after click the sentout button.
    driver.find_element_by_id("btnRecord").click()
    start_time = datetime.datetime.now()

    score = "textScore"

    # poll until the score element appears; that marks the end time
    while not is_element_appear(driver, score):
        print("please wait, the response still hasn't come back at {}".format(datetime.datetime.now()))
        timeout_diff = datetime.datetime.now() - start_time

        print(timeout_diff.seconds)
        if timeout_diff.seconds >= TIME_OUT:
            print("overtime")
            # end_time = start_time.shift(seconds=+TIME_OUT)
            break

    end_time = datetime.datetime.now()

    used_time = (end_time - start_time).total_seconds()
    print("time is {}".format(used_time))
    each_time.append(used_time)

    driver.find_element_by_id("settings").click()
    driver.find_element_by_id("record").click()


    # logging here would survive a crash, at some performance cost
    # content = get_brand() + get_version() + ": run {}".format(text) + " used time: {}".format(used_time) + "\r\n"
    # write_log(content)

def write_log(content):
    # append to a log file in case of a crash or any breakout
    with open(log_path, 'a+') as f:
        f.writelines(content)


def get_sentence_audio():
    # map each sentence to its converted mp3 file
    with open(sentence_file, "r") as f:
        for index, line in enumerate(f.readlines()):
            line = line.strip("\n")
            file = audio_file_path + "\\" + str(index + 1) + ".mp3"
            mutiply_times(line, file)

def mutiply_times(text, audio_file):
    # play each sample multiple times
    each_audio_play_time = []
    type_text(text)
    for i in range(TIMES):
        start_record(audio_file, each_audio_play_time)

    result.append(each_audio_play_time)
    # log immediately in case of a crash
    write_log("current result is {}{} {} {}".format(get_brand(), get_version(), text, each_audio_play_time))

if __name__ == "__main__":
    # start to play the samples and record the timings
    get_sentence_audio()

    # collect the result for this device
    result_array = np.array(result)
    result_pd = pd.DataFrame(result_array, columns=range(1, TIMES + 1))
    # add the average of each sample's runs
    result_pd['AVG'] = result_pd.mean(axis=1)  # axis=1 averages across each row

    # add the sample sentences to the result
    with open(sentence_file, 'r') as sentences:
        text = sentences.readlines()
    sentence_pd = pd.DataFrame(text, columns=["sentence"])
    result_final = sentence_pd.join(result_pd)

    # add the device type to the result
    device_array = np.array([get_brand() + get_version()] * len(text))
    device_pd = pd.DataFrame(device_array)
    result_final = device_pd.join(result_final)

    # write the result to excel, one file per device
    result_final.to_excel(result_path + "\\" + get_brand() + get_version() + ".xlsx", index=False)

To merge the multiple sheets of an Excel file into one table, you can do this:

# step 1: import pandas
import pandas as pd

# step 2: read the data; passing None as the sheet name loads every sheet into a dict
iris = pd.read_excel("C:\\work\\code\\android\\android.xlsx", None)
keys = list(iris.keys())

# step 3: concatenate the sheets
iris_concat = pd.DataFrame()
for i in keys:
    iris1 = iris[i]
    iris_concat = pd.concat([iris_concat, iris1])
iris_concat.to_excel("C:\\work\\code\\android\\result.xlsx", index=False)  # output path

Finally, the data-processing script is:

import os
import pandas as pd

test_path = "C:\\work\\code\\android\\result"

def get_result(path):
    # read every per-device excel file under the result folder
    final = []
    for index, file in enumerate(os.listdir(path)):
        pds = pd.read_excel(os.path.join(path, file))
        final.append(pds)
    return final

s = get_result(test_path)
print(s)

final_result = pd.concat(s)

current_path = os.path.dirname(os.path.abspath(__file__))
result_file = current_path + "\\result.xlsx"
final_result.to_excel(result_file, index=False)

After several rounds of testing, the CN server's response time turned out to be roughly half of the US server's, and offline mode took about a quarter of online. Devices with better hardware were slightly faster, but the difference was small. With these numbers in hand, the boss could make the call.

There is also an iOS build, plus other aspects of ASR testing, but that's for another post.

For more, follow the WeChat official account: python粉丝团

Original statement: this article was published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission. For infringement concerns, contact cloudcommunity@tencent.com for removal.

