
Extracting data from text/javascript with Python

Stack Overflow user
Asked 2022-04-25 01:39:58
1 answer · 226 views · 0 followers · score -1

How can I extract 'https://www.example.com/get_file/4/b315c8e0d6fad43d89445378b5292eed6981a999ba/174000/174469/174469.mp4/?br=707' by using video_url?

[<script src="https://www.example.com/player/kt_player.js?v=5.5.1" type="text/javascript"></script>, <script type="text/javascript">
                            /* <![CDATA[ */
                                                            function getEmbed(width, height) {
                                    if (width && height) {
                                        return '<iframe width="' + width + '" height="' + height + '" src="https://www.example.com/embed/174469" frameborder="0" allowfullscreen webkitallowfullscreen mozallowfullscreen oallowfullscreen msallowfullscreen></iframe>';
                                    }
                                    return '<iframe width="852" height="480" src="https://www.example.com/embed/174469" frameborder="0" allowfullscreen webkitallowfullscreen mozallowfullscreen oallowfullscreen msallowfullscreen></iframe>';
                                }
                            
                            var flashvars = {
                                video_id: '174469',
                                video_categories: 'example_category1', 'example_category2',
                                video_tags: 'example_tag1', 'example_tag2',
                                license_code: '$603825119921245',
                                rnd: '1650848189',
                                video_url: 'https://www.example.com/get_file/4/b315c8e0d6fad43d89445378b5292eed6981a999ba/174000/174469/174469.mp4/?br=707',
                                postfix: '.mp4',
                                video_url_text: '480p',
                                video_alt_url: 'https://www.example.com/get_file/4/ffafbe6913656c2250c34bf20fd945a5f86898d749/174000/174469/174469_720p.mp4/?br=1290',
                                video_alt_url_text: '720p',
                                video_alt_url_hd: '1',
                                video_alt_url2: 'https://www.example.com/get_file/4/66c8876a9fd8cd3d823d06880c1797b1424f3200df/174000/174469/174469_1080p.mp4/?br=2559',
                                video_alt_url2_text: '1080p',
                                video_alt_url2_hd: '1',
                                preview_url: 'https://www.example.com/contents/videos_screenshots/174000/174469/preview.jpg',
                                preview_url1: 'https://www.example.com/contents/videos_screenshots/174000/174469/preview.mp4.jpg',
                                preview_height1: '480',
                                preview_url2: 'https://www.example.com/contents/videos_screenshots/174000/174469/preview_720p.mp4.jpg',
                                preview_height2: '720',
                                preview_url3: 'https://www.example.com/contents/videos_screenshots/174000/174469/preview_1080p.mp4.jpg',
                                preview_height3: '1080',
                                skin: 'youtube.css',
                                logo_position: '0,0',
                                logo_anchor: 'topleft',
                                hide_controlbar: '1',
                                hide_style: 'fade',
                                volume: '1',
                                related_src: 'https://www.example.com/related_videos_html/174469/',
                                adv_pre_vast: 'https://twinrdsrv.com/preroll.engine?id=613eb379-62dd-49ef-8299-db2b5b2af4d7&zid=12861&cvs={ClientVideoSupport}&time={TimeOffset}&stdtime={StdTimeOffset}&abr={IsAdblockRequest}&pageurl={PageUrl}&tid={TrackingId}&res={Resolution}&bw={BrowserWidth}&bh={BrowserHeight}&kw={Keywords}&referrerUrl={ReferrerUrl}&pw={PlayerWidth}&ph={PlayerHeight}',
                                adv_pre_skip_duration: '5',
                                adv_pre_skip_text_time: 'Skip ad in %time',
                                adv_pre_skip_text: 'Skip ad',
                                adv_post_vast: 'https://twinrdsrv.com/preroll.engine?id=613eb379-62dd-49ef-8299-db2b5b2af4d7&zid=12861&cvs={ClientVideoSupport}&time={TimeOffset}&stdtime={StdTimeOffset}&abr={IsAdblockRequest}&pageurl={PageUrl}&tid={TrackingId}&res={Resolution}&bw={BrowserWidth}&bh={BrowserHeight}&kw={Keywords}&referrerUrl={ReferrerUrl}&pw={PlayerWidth}&ph={PlayerHeight}',
                                adv_post_skip_duration: '5',
                                adv_post_skip_text_time: 'Skip ad in %time',
                                adv_post_skip_text: 'Skip ad',
                                lrcv: '1651572296480833989009946',
                                vast_timeout1: '10',
                                player_width: '882',
                                player_height: '496.9014084507',
                                embed: '1'
                            };
                                                        var player_obj = kt_player('kt_player', 'https://www.example.com/player/kt_player.swf?v=5.5.1', '100%', '100%', flashvars);
                                                                window.onload = function() {
                                        $('.pop-adv .btn').click(function(e) {
                                            player_obj.play();
                                        });
                                    };
/* ]]> */
                        </script>]

I tried this:

import json

script= """[<script src="https://www.example.com/player/kt_player.js?v=5.5.1" type="text/javascript"></script>, <script type="text/javascript">
                            /* <![CDATA[ */
                                                            function getEmbed(width, height) {
                                    if (width && height) {
                                        return '<iframe width="' + width + '" height="' + height + '" src="https://www.example.com/embed/174469" frameborder="0" allowfullscreen webkitallowfullscreen mozallowfullscreen oallowfullscreen msallowfullscreen></iframe>';
                                    }
                                    return '<iframe width="852" height="480" src="https://www.example.com/embed/174469" frameborder="0" allowfullscreen webkitallowfullscreen mozallowfullscreen oallowfullscreen msallowfullscreen></iframe>';
                                }
                            
                            var flashvars = {
                                video_id: '174469',
                                video_categories: 'example_category1, example_category2',
                                video_tags: 'example_tag1, esample_tag2',
                                license_code: '$603825119921245',
                                rnd: '1650848189',
                                video_url: 'https://www.example.com/get_file/4/b315c8e0d6fad43d89445378b5292eed6981a999ba/174000/174469/174469.mp4/?br=707',
                                postfix: '.mp4',
                                video_url_text: '480p',
                                video_alt_url: 'https://www.example.com/get_file/4/ffafbe6913656c2250c34bf20fd945a5f86898d749/174000/174469/174469_720p.mp4/?br=1290',
                                video_alt_url_text: '720p',
                                video_alt_url_hd: '1',
                                video_alt_url2: 'https://www.example.com/get_file/4/66c8876a9fd8cd3d823d06880c1797b1424f3200df/174000/174469/174469_1080p.mp4/?br=2559',
                                video_alt_url2_text: '1080p',
                                video_alt_url2_hd: '1',
                                preview_url: 'https://www.example.com/contents/videos_screenshots/174000/174469/preview.jpg',
                                preview_url1: 'https://www.example.com/contents/videos_screenshots/174000/174469/preview.mp4.jpg',
                                preview_height1: '480',
                                preview_url2: 'https://www.example.com/contents/videos_screenshots/174000/174469/preview_720p.mp4.jpg',
                                preview_height2: '720',
                                preview_url3: 'https://www.example.com/contents/videos_screenshots/174000/174469/preview_1080p.mp4.jpg',
                                preview_height3: '1080',
                                skin: 'youtube.css',
                                logo_position: '0,0',
                                logo_anchor: 'topleft',
                                hide_controlbar: '1',
                                hide_style: 'fade',
                                volume: '1',
                                related_src: 'https://www.example.com/related_videos_html/174469/',
                                adv_pre_vast: 'https://twinrdsrv.com/preroll.engine?id=613eb379-62dd-49ef-8299-db2b5b2af4d7&zid=12861&cvs={ClientVideoSupport}&time={TimeOffset}&stdtime={StdTimeOffset}&abr={IsAdblockRequest}&pageurl={PageUrl}&tid={TrackingId}&res={Resolution}&bw={BrowserWidth}&bh={BrowserHeight}&kw={Keywords}&referrerUrl={ReferrerUrl}&pw={PlayerWidth}&ph={PlayerHeight}',
                                adv_pre_skip_duration: '5',
                                adv_pre_skip_text_time: 'Skip ad in %time',
                                adv_pre_skip_text: 'Skip ad',
                                adv_post_vast: 'https://twinrdsrv.com/preroll.engine?id=613eb379-62dd-49ef-8299-db2b5b2af4d7&zid=12861&cvs={ClientVideoSupport}&time={TimeOffset}&stdtime={StdTimeOffset}&abr={IsAdblockRequest}&pageurl={PageUrl}&tid={TrackingId}&res={Resolution}&bw={BrowserWidth}&bh={BrowserHeight}&kw={Keywords}&referrerUrl={ReferrerUrl}&pw={PlayerWidth}&ph={PlayerHeight}',
                                adv_post_skip_duration: '5',
                                adv_post_skip_text_time: 'Skip ad in %time',
                                adv_post_skip_text: 'Skip ad',
                                lrcv: '1651572296480833989009946',
                                vast_timeout1: '10',
                                player_width: '882',
                                player_height: '496.9014084507',
                                embed: '1'
                            };
                                                        var player_obj = kt_player('kt_player', 'https://www.example.com/player/kt_player.swf?v=5.5.1', '100%', '100%', flashvars);
                                                                window.onload = function() {
                                        $('.pop-adv .btn').click(function(e) {
                                            player_obj.play();
                                        });
                                    };
/* ]]> */
                        </script>]"""

json_data= json.loads(script)
print(json_data['video_url'])

and got this error:

json.decoder.JSONDecodeError: Expecting value: line 1 column 2 (char 1)
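
For readers wondering where "char 1" comes from, the error can probably be reproduced on a much smaller input. The sketch below uses a made-up one-line `snippet` (an assumption, not the real page) that starts with `[<script`, like the string above. The leading `[` looks like the start of a JSON array, but the `<` that follows is not a JSON value, so parsing stops at index 1. The second string, `object_literal`, is also hypothetical; it suggests that even the `flashvars` object on its own would not be valid JSON, since its keys are unquoted and its strings use single quotes.

```python
import json

# Hypothetical stand-in for the scraped text; it begins with "[<script"
# just like the string in the question.
snippet = '[<script type="text/javascript">var flashvars = {};</script>]'

# Hypothetical JavaScript-style object literal: unquoted key, single quotes.
object_literal = "{video_url: 'x'}"

def decode_error(text):
    """Return the JSONDecodeError message for text, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return str(err)
    return None

print(decode_error(snippet))
print(decode_error(object_literal))
```

Both calls report a failure at char 1, which is why `json.loads` is not the right tool for this text and some other form of extraction is needed.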


Stack Overflow user

Accepted answer

Answered 2022-04-25 02:23:33

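This isn't an ideal solution, because it relies on the source document having very consistent formatting, but you could try parsing it "manually" with a regular expression.

The following assumes you have already set script to contain the downloaded text shown above.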

import re

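def extract(name, script):
    # Find name: '...' or name: "..." and return the text between the quotes.
    # If name is not found, re.search returns None and the [2] raises TypeError.
    return re.search(rf'\b{name}\s*:\s*(\'|")(.*?)\1', script)[2]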

extract('video_url', script)
'https://www.example.com/get_file/4/b315c8e0d6fad43d89445378b5292eed6981a999ba/174000/174469/174469.mp4/?br=707'

extract('video_alt_url', script)
'https://www.example.com/get_file/4/ffafbe6913656c2250c34bf20fd945a5f86898d749/174000/174469/174469_720p.mp4/?br=1290'

extract('video_alt_url2', script)
'https://www.example.com/get_file/4/66c8876a9fd8cd3d823d06880c1797b1424f3200df/174000/174469/174469_1080p.mp4/?br=2559'

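The regex works as follows:

  • \b{name}\s*:\s* matches the name: part, with variable spacing allowed
  • ('|")(.*?)\1 matches a 'some_text' or "some_text" style string; the \1 requires the closing quote to be the same character as the opening one
  • the [2] at the end selects the second group, which is the (.*?) that matches the text inside the quotes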
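
As a rough illustration of those pieces, here is a minimal sketch that runs the same pattern against a shortened, made-up `sample` (the shortened URLs are placeholders, not the real ones). The helpers `extract_or_none` and `extract_all` are hypothetical names, not part of the answer's code: the first returns `None` instead of raising when a field is missing and escapes the name with `re.escape`, and the second collects every quoted `key: 'value'` pair it finds into a dict. Both still assume each value sits on one line and contains no escaped quotes.

```python
import re

# Hypothetical, shortened stand-in for the flashvars block in the question.
sample = """
var flashvars = {
    video_id: '174469',
    video_url: 'https://www.example.com/get_file/4/abc/174469.mp4/?br=707',
    video_url_text: "480p",
    video_alt_url: 'https://www.example.com/get_file/4/def/174469_720p.mp4/?br=1290'
};
"""

def extract_or_none(name, script):
    """Return the quoted value that follows `name:`, or None if it is absent."""
    # re.escape keeps regex metacharacters in the field name from being interpreted.
    pattern = rf'\b{re.escape(name)}\s*:\s*(\'|")(.*?)\1'
    match = re.search(pattern, script)
    return match[2] if match else None

def extract_all(script):
    """Collect every `key: 'value'` or `key: "value"` pair into a dict."""
    # Group 1 is the key, group 2 the opening quote, group 3 the quoted text.
    pattern = r'\b(\w+)\s*:\s*(\'|")(.*?)\2'
    return {m[1]: m[3] for m in re.finditer(pattern, script)}

print(extract_or_none('video_url', sample))
print(extract_or_none('no_such_field', sample))
print(sorted(extract_all(sample)))
```

Because of the leading `\b` and the `\s*:` that must follow the name, asking for `video_url` does not accidentally match `video_url_text` or `video_alt_url`.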

Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/71993695
