Scraping Android XML Page Information with Python: Data Collection and Analysis

Source: 企鹅号 (Penguin Account) - Python研发测试

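Background

During app testing, data has to be captured from certain pages. With large amounts of data and many devices, manual capture is neither fast nor efficient, and long manual sessions tend to produce erroneous records, so an automated data collection service is needed.

How the program works

Android uiautomator dumps the layout of the current page together with the information of the elements it displays. A swipe then brings further content into view so that the XML of several pages can be captured. Simple scraper-style processing in Python extracts the required test data from the XML and consolidates it in one place.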

Required skills:

uiautomator

python

adb shell

bat scripts

Aggregating the data and compiling it into a CSV report

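[Figure: the page whose data is captured]

Implementation steps:

[Figure: overview of the code directory structure]

1. First, wrap the adb commands in a dump.bat script: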

adb pull /sdcard/ludashi/ue_benchmark_summary.txt
adb shell rm /sdcard/one.xml
adb shell rm /sdcard/two.xml
adb shell uiautomator dump /sdcard/one.xml
adb shell input swipe 200 1350 200 100 500
adb shell uiautomator dump /sdcard/two.xml
adb pull /sdcard/one.xml
adb pull /sdcard/two.xml
adb shell rm /sdcard/one.xml
adb shell rm /sdcard/two.xml

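What the script does:

1. Pull the raw data (ue_benchmark_summary.txt) from the device

2. Delete XML dumps left over from earlier runs

3. Dump the page XML, swipe the page, then dump the page XML again

4. Pull the captured XML dumps

5. Delete the dumps from the device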
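
The same sequence can also be driven from Python, which keeps the whole workflow in one language. The following is a minimal sketch, not part of the original project: the function names `build_commands` and `run_all` are hypothetical, and the commands and paths are copied from dump.bat. It assumes `adb` is on the PATH and a device is connected.

```python
import subprocess

# Device-side paths taken from dump.bat.
FIRST_DUMP = "/sdcard/one.xml"
SECOND_DUMP = "/sdcard/two.xml"
SWIPE = ["200", "1350", "200", "100", "500"]  # x1 y1 x2 y2 duration_ms


def build_commands():
    """Return the adb command lines in the order dump.bat runs them."""
    adb = ["adb"]
    return [
        adb + ["pull", "/sdcard/ludashi/ue_benchmark_summary.txt"],
        adb + ["shell", "rm", FIRST_DUMP],
        adb + ["shell", "rm", SECOND_DUMP],
        adb + ["shell", "uiautomator", "dump", FIRST_DUMP],
        adb + ["shell", "input", "swipe"] + SWIPE,
        adb + ["shell", "uiautomator", "dump", SECOND_DUMP],
        adb + ["pull", FIRST_DUMP],
        adb + ["pull", SECOND_DUMP],
        adb + ["shell", "rm", FIRST_DUMP],
        adb + ["shell", "rm", SECOND_DUMP],
    ]


def run_all(runner=subprocess.run):
    """Run every command; `runner` is injectable so the sequence can be dry-run."""
    for cmd in build_commands():
        # check=False: the initial rm fails harmlessly when no old dump exists.
        runner(cmd, check=False)
```

Calling `run_all()` executes the commands against the connected device. Passing a custom `runner` makes it possible to inspect the sequence without touching a device.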

Sample one.xml
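
A uiautomator dump is an XML tree of `node` elements whose attributes include `text` and `resource-id`. The fragment below is made up for illustration: the structure is abbreviated and the score values are invented, and only the two `resource-id` values come from the script in this article. As an alternative to regular expressions, a minimal sketch of reading those attributes with the standard library's `xml.etree.ElementTree` could look like this (`texts_by_id` is a hypothetical helper name):

```python
import xml.etree.ElementTree as ET

# Invented fragment in the general shape of a uiautomator dump.
SAMPLE = """<hierarchy rotation="0">
  <node index="0" text="" resource-id="" class="android.widget.FrameLayout">
    <node index="0" text="8.5" resource-id="com.ludashi.benchmark:id/tv_total_score" class="android.widget.TextView" />
    <node index="1" text="9.1" resource-id="com.ludashi.benchmark:id/tv_value" class="android.widget.TextView" />
    <node index="2" text="7.4" resource-id="com.ludashi.benchmark:id/tv_value" class="android.widget.TextView" />
  </node>
</hierarchy>"""


def texts_by_id(xml_text, resource_id):
    """Return the text attribute of every node with the given resource-id."""
    root = ET.fromstring(xml_text)
    return [
        node.get("text")
        for node in root.iter("node")
        if node.get("resource-id") == resource_id
    ]


values = texts_by_id(SAMPLE, "com.ludashi.benchmark:id/tv_value")
total = texts_by_id(SAMPLE, "com.ludashi.benchmark:id/tv_total_score")
```

Parsing the tree avoids depending on the order of attributes inside each node, which the regular-expression approach relies on.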

2. AnalysisOFXML.py code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Author : Benjamin
# @Time : 2018/6/24 1:26
import re
import os

onename = "one.xml"
twoname = "two.xml"


def remove_dir(path):
    # Recursively delete a file or a directory tree
    path = path.replace('\\', '/')
    if os.path.isdir(path):
        for p in os.listdir(path):
            remove_dir(os.path.join(path, p))
        if os.path.exists(path):
            os.rmdir(path)
    else:
        if os.path.exists(path):
            os.remove(path)


def TiYanData():
    # Read both XML dumps
    with open(onename, "rb") as f:
        onefiles = f.read().decode('utf-8')
    with open(twoname, "rb") as f:
        twofiles = f.read().decode('utf-8')
    # Regular expressions that match the scores
    reg = re.compile(r'text="([0-9][.][0-9]*).*?" resource-id="com.ludashi.benchmark:id/tv_value"')
    total_reg = re.compile(r'text="([0-9][.][0-9]*)" resource-id="com.ludashi.benchmark:id/tv_total_score"')
    # Run the regular expressions
    onexmldata = re.findall(reg, onefiles)
    twoxmldata = re.findall(reg, twofiles)
    total_score = re.findall(total_reg, onefiles)
    # Find where the last item of the first list appears in the second list,
    # then drop the repeated items from the front of the second list
    CheckNum = twoxmldata.index(onexmldata[-1])
    for i in range(CheckNum + 1):
        del twoxmldata[0]
    # Drop the duplicated item at the top of the first list
    del onexmldata[0]
    # Concatenate the data
    EndData = onexmldata + twoxmldata + total_score
    if len(EndData) == 24:
        print("swap listdata")
        # Reposition the app usage score
        AppUseScore = EndData.pop(1)
        EndData.insert(4, AppUseScore)
        # Reposition the web page loading and scrolling score
        PageLoadSlide = EndData.pop(5)
        EndData.insert(8, PageLoadSlide)
        # Reposition the photo viewing and manipulation score
        ImageViewDo = EndData.pop(9)
        EndData.insert(13, ImageViewDo)
        # Reposition the file copy and flash storage test score
        FileCopyTest = EndData.pop(14)
        EndData.insert(17, FileCopyTest)
        # Reposition the auto-start at boot score
        PowerBoot = EndData.pop(18)
        EndData.insert(22, PowerBoot)
        return EndData
    else:
        print("Data error!")
        print(onexmldata)
        print(onexmldata[-1])
        print(twoxmldata)

Code walkthrough:

1. Regular expressions match the score data

2. List operations: index, pop, insert, concatenation, and so on

3. ......
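
The list handling described above can be isolated into two small helpers. This is a simplified sketch under stated assumptions rather than the original code: `merge_pages` and `move` are hypothetical names, the last item of the first page is assumed to still be visible after the swipe, and every value is assumed to be unique (with repeated scores, `index` would find the first match, which may be the wrong one).

```python
def merge_pages(first, second):
    """Join two lists captured before and after a swipe, dropping the overlap.

    Everything in `second` up to and including the last item of `first`
    is treated as a repeat of what `first` already contains.
    """
    overlap_end = second.index(first[-1])  # raises ValueError if there is no overlap
    return first + second[overlap_end + 1:]


def move(items, src, dst):
    """Move one element with pop and insert, returning the same list."""
    value = items.pop(src)
    items.insert(dst, value)
    return items


merged = merge_pages(["a", "b", "c"], ["b", "c", "d", "e"])
reordered = move(list("abcdef"), 1, 4)
```

Note that `dst` is interpreted after the element has been popped, which is also how the chained `pop` and `insert` calls in the script behave.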

3. DOcsv.py code:

import AnalysisOFXML
import csv

filename = "ue_benchmark_summary.txt"

# Parse the KEY=value lines of the summary file into a dict
TiYanDict = {}
with open(filename, 'r') as files:
    for i in files.readlines():
        data = i.split("\n")[0].split("=")
        TiYanDict[data[0]] = data[1]

key = ["LAUNCHER_SCROLL_FPS","APP_BASIC_USE_SWITCH_TAB_FPS","APP_BASIC_USE_SWITCH_BANNER_FPS","APP_BASIC_USE_SCROLL_LIST_FPS"," / ","WEB_PARSE_DURATION","WEB_LOAD_DURATION","WEB_SCROLL_FPS"," / ","DECODE_BITMAP_DURATION","GALLERY_SCROLL_FPS","IMAGE_SCALE_FPS","SCREEN"," / ","SDCARD_COPY_SCORE","SDCARD_READ_SPEED","SDCARD_WRITE_SPEED"," / ","MEM_BOOT_APP","BOOT_APP_COUNT","MEM_TOTAL","MEM_FREE"]

# Look up each key; " / " entries are separators and become "/"
liststrs = []
for i in key:
    if i == " / ":
        liststrs.append("/")
    else:
        liststrs.append(TiYanDict[i])


def WriteCsv(FileName, lists, TiYanData, liststrs):
    name = str(FileName) + ".csv"
    # newline="" prevents blank rows between records on Windows
    with open(name, "w", newline="") as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(lists)
        writer.writerow(TiYanData)
        writer.writerow(liststrs)
    print("Write Success!")
    print(lists)
    print(TiYanData)
    print(liststrs)


FileName = input("Please input FileName (no Chinese characters): ")
WriteCsv(FileName, key, AnalysisOFXML.TiYanData(), liststrs)

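Code notes:

1. Read the txt file and split each line into fields

2. Python dict operations: adding entries by assignment and looking up values

3. List operations: appending data

4. Wrapping the csv operations in a function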
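
The parsing and writing steps listed above can be tried without a device or files on disk. The sketch below is an illustration only: `parse_summary` and `rows_to_csv` are hypothetical names, the sample values are invented, and the output goes to an in-memory buffer instead of a .csv file. Unlike the script, it skips lines that contain no `=` rather than raising an error.

```python
import csv
import io


def parse_summary(text):
    """Turn KEY=value lines into a dict, skipping blank or malformed lines."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # sep is empty when the line has no "="
            result[key.strip()] = value.strip()
    return result


def rows_to_csv(header, *rows):
    """Write a header row followed by data rows and return the CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Invented sample input; the key names appear in the script's key list.
summary = parse_summary("WEB_SCROLL_FPS=58\n\nMEM_TOTAL=3.7\nbroken line\n")
report = rows_to_csv(
    ["WEB_SCROLL_FPS", "MEM_TOTAL"],
    [summary["WEB_SCROLL_FPS"], summary["MEM_TOTAL"]],
)
```

Using `partition` instead of `split` keeps any further `=` characters inside the value.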

4. Finally, the delfile.bat cleanup script:

del *.txt
del *.xml
del *.csv
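
For machines without a Windows command prompt, the same cleanup can be sketched in Python. This is an assumed equivalent, not part of the original project; `clean` is a hypothetical name, and like the bat script it only touches files directly inside the given directory.

```python
from pathlib import Path


def clean(directory=".", patterns=("*.txt", "*.xml", "*.csv")):
    """Delete files matching `patterns` inside `directory`; return their names."""
    removed = []
    for pattern in patterns:
        # Materialize the matches first so deletion does not disturb iteration.
        for path in list(Path(directory).glob(pattern)):
            if path.is_file():
                path.unlink()
                removed.append(path.name)
    return sorted(removed)
```

Calling `clean()` in the project directory removes the pulled summary, the XML dumps, and any generated reports, so it should only be run after the CSV has been saved elsewhere.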

5. Program output and sample project:

Run output:

Output files:

Closing words

Empty talk harms the nation; hard work makes it prosper.

Published: 2018-06-26 17:22:53
Original link: https://kuaibao.qq.com/s/20180626G1AOOD00?refer=cp_1026
