
How to extract data from a JavaScript function call

Stack Overflow user
Asked 2018-06-20 17:14:23
Answers: 4 · Views: 103 · Followers: 0 · Votes: 0

I am using Scrapy in Python to scrape data from a website.

The data I need is inside a script tag, like this:

<script type="text/javascript">
getDetailsfrmBean("storePg","564","Berwyn, IL","7180 W CERMAK RD.","SPACE A1","","BERWYN","IL","US","60402","(708) 788-5097","{Monday-Saturday=10-9,sunday=11-6}","41.8507029","-87.8033709");
</script>

I can get this content with XPath, as follows:

item['lat'] = tree.xpath('//script[@type="text/javascript"]/text()').extract()[0].encode('utf-8')
item['long'] = tree.xpath('//script[@type="text/javascript"]/text()').extract()[0].encode('utf-8')

This gives:

item['lat'] = 'getDetailsfrmBean("storePg","564","Berwyn, IL","7180 W CERMAK RD.","SPACE A1","","BERWYN","IL","US","60402","(708) 788-5097","{Monday-Saturday=10-9,sunday=11-6}","41.8507029","-87.8033709");'

item['long'] = 'getDetailsfrmBean("storePg","564","Berwyn, IL","7180 W CERMAK RD.","SPACE A1","","BERWYN","IL","US","60402","(708) 788-5097","{Monday-Saturday=10-9,sunday=11-6}","41.8507029","-87.8033709");'

But how can I parse this content so that:

item['lat'] is equal to "41.8507029"
item['long'] is equal to "-87.8033709"
item['city'] is equal to "BERWYN"
item['state'] is equal to "IL"

Any suggestions on how to solve this?

4 Answers

Stack Overflow user

Answered 2018-06-20 17:26:21

Since this call is also valid Python syntax, we can parse it with the ast module. The arguments are all string literals, which makes things even simpler.

import ast

line = 'getDetailsfrmBean("storePg","564","Berwyn, IL","7180 W CERMAK RD.","SPACE A1","","BERWYN","IL","US","60402","(708) 788-5097","{Monday-Saturday=10-9,sunday=11-6}","41.8507029","-87.8033709");'

# On Python 3.8+ string literals are ast.Constant nodes; use .value (older versions used .s)
print([arg.value for arg in ast.parse(line).body[0].value.args])

Output:

['storePg', '564', 'Berwyn, IL', '7180 W CERMAK RD.', 'SPACE A1', '', 'BERWYN', 'IL', 'US', '60402', '(708) 788-5097', '{Monday-Saturday=10-9,sunday=11-6}', '41.8507029', '-87.8033709']

Explanation:

print([arg.value       # value of string literal (.s before Python 3.8)
       for arg in
       ast.parse(line)
      .body            # module (list of statements)
       [0]             # first statement (an Expr node)
      .value           # expression (a Call)
      .args            # arguments to function call
       ])
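
To get from the argument list to the fields the question asks for, the arguments can be picked out by position. The following is a minimal sketch, not part of the original answer: the `FIELD_INDEX` mapping and the helper names are hypothetical, and the positions are inferred only from the single sample call shown in the question, so they should be checked against other pages of the site.

```python
import ast

# Hypothetical position-to-field mapping, inferred from the sample call only.
FIELD_INDEX = {"city": 6, "state": 7, "lat": 12, "long": 13}


def parse_call_args(line):
    """Return the literal arguments of a single call expression."""
    call = ast.parse(line.strip()).body[0].value
    # literal_eval accepts an AST node and only evaluates literals.
    return [ast.literal_eval(arg) for arg in call.args]


def build_item(line):
    """Map the positional arguments onto named item fields."""
    args = parse_call_args(line)
    return {field: args[index] for field, index in FIELD_INDEX.items()}


line = 'getDetailsfrmBean("storePg","564","Berwyn, IL","7180 W CERMAK RD.","SPACE A1","","BERWYN","IL","US","60402","(708) 788-5097","{Monday-Saturday=10-9,sunday=11-6}","41.8507029","-87.8033709");'

item = build_item(line)
print(item)
```

This only works while the JavaScript call happens to be valid Python; a call containing, say, `null` or single-line JS comments would need a different parser.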
Votes: 2

Stack Overflow user

Answered 2018-06-20 17:32:06

Try this with re:

import re
temp_string = 'getDetailsfrmBean("storePg","564","Berwyn, IL","7180 W CERMAK RD.","SPACE A1","","BERWYN","IL","US","60402","(708) 788-5097","{Monday-Saturday=10-9,sunday=11-6}","41.8507029","-87.8033709");'
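# Split on runs of separators; filter(None, ...) drops empty strings.
# list() is needed on Python 3, where filter returns an iterator.
split_list = list(filter(None, re.split(r'[, \-!?:"]+', temp_string)))
print(split_list)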

This should produce the following output:

['getDetailsfrmBean(', 'storePg', '564', 'Berwyn', 'IL', '7180', 'W', 'CERMAK', 'RD.', 'SPACE', 'A1', 'BERWYN', 'IL', 'US', '60402', '(708)', '788', '5097', '{Monday', 'Saturday=10', '9', 'sunday=11', '6}', '41.8507029', '87.8033709', ');']

I learned this from the following answer: https://stackoverflow.com/a/23720594/5907969
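
Note that the split above also breaks on spaces and hyphens inside arguments, so "Berwyn, IL" becomes two tokens, the empty sixth argument disappears, and the longitude loses its minus sign. A possible variation, shown here as a sketch rather than as this answer's method, is to capture each double-quoted argument whole. It assumes the arguments never contain escaped double quotes.

```python
import re

temp_string = 'getDetailsfrmBean("storePg","564","Berwyn, IL","7180 W CERMAK RD.","SPACE A1","","BERWYN","IL","US","60402","(708) 788-5097","{Monday-Saturday=10-9,sunday=11-6}","41.8507029","-87.8033709");'

# Capture the text between each pair of double quotes (empty strings included).
quoted_args = re.findall(r'"([^"]*)"', temp_string)

print(len(quoted_args))
print(quoted_args[12], quoted_args[13])
```

With this approach the positions of the arguments stay stable, so latitude and longitude can be read from the last two entries.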

Votes: 1

Stack Overflow user

Answered 2018-06-20 17:34:16

You can use a simple regular expression to extract only the comma-separated, quoted-string part of the call:

import re

line = 'getDetailsfrmBean("storePg","564","Berwyn, IL","7180 W CERMAK RD.","SPACE A1","","BERWYN","IL","US","60402","(708) 788-5097","{Monday-Saturday=10-9,sunday=11-6}","41.8507029","-87.8033709");'

args_string = re.match(r'getDetailsfrmBean\((.+)\);$', line.strip()).group(1)
print(args_string)

Output:

"storePg","564","Berwyn, IL","7180 W CERMAK RD.","SPACE A1","","BERWYN","IL","US","60402","(708) 788-5097","{Monday-Saturday=10-9,sunday=11-6}","41.8507029","-87.8033709"

From there, there are several ways to parse a list of strings out of this kind of data:

import ast
import json
import csv

args_array = '[%s]' % args_string

assert (json.loads(args_array)
        == ast.literal_eval(args_array)
        == next(csv.reader([args_string]))
        == ['storePg', '564', 'Berwyn, IL', '7180 W CERMAK RD.', 'SPACE A1', '', 'BERWYN', 'IL', 'US', '60402',
            '(708) 788-5097', '{Monday-Saturday=10-9,sunday=11-6}', '41.8507029', '-87.8033709'])
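
In a real page the script text usually carries surrounding whitespace and possibly other statements, so `re.match` anchored at the start may not find the call. The sketch below searches for the call anywhere in the text and returns `None` when it is absent. The helper name `find_call_args` and the sample `script_text` are illustrative assumptions, and the JSON step assumes every argument is a double-quoted string, as in the question.

```python
import json
import re

# Non-greedy match up to the first ");" that closes the call.
CALL_RE = re.compile(r'getDetailsfrmBean\((.*?)\);', re.DOTALL)


def find_call_args(script_text):
    """Return the call's arguments as a list, or None if the call is absent."""
    match = CALL_RE.search(script_text)
    if match is None:
        return None
    # Wrap the argument string in brackets so it parses as a JSON array.
    return json.loads('[%s]' % match.group(1))


# Illustrative script body with leading and trailing whitespace.
script_text = '''
    getDetailsfrmBean("storePg","564","Berwyn, IL","7180 W CERMAK RD.","SPACE A1","","BERWYN","IL","US","60402","(708) 788-5097","{Monday-Saturday=10-9,sunday=11-6}","41.8507029","-87.8033709");
'''

args = find_call_args(script_text)
print(args[-2:])
print(find_call_args('var unrelated = 1;'))
```

Returning `None` lets the spider skip script tags that do not contain the call instead of raising on the first unrelated script.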
Votes: 1
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/50944529
