
Beautiful Soup: extracting the first td

Stack Overflow user
Asked 2020-04-03 12:04:16
Answers: 2 · Views: 298 · Followers: 0 · Votes: 0

I am trying to extract data from an HTML table using Beautiful Soup.

import requests
import urllib.request
import time
from bs4 import BeautifulSoup

import webbrowser
import httplib2

import pyodbc 
from datetime import datetime
from pprint import pprint

quote_page = 'https://ph.investing.com/economic-calendar/'

# Fetch the page with a browser-like User-Agent header
req = urllib.request.Request(quote_page, headers={'User-Agent': "Magic Browser"})
resp = urllib.request.urlopen(req)
data = resp.read()
html = data.decode('ISO-8859-1')
#print(html)
soup = BeautifulSoup(html, 'html5lib')
print (soup.prettify())

table = soup.find_all('table', attrs={'id': 'economicCalendarData'})
print(table)

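# Collect the cell text of every row in the matched table(s)
res = []
for tbl in table:
    for tr in tbl.find_all('tr'):
        td = tr.find_all('td')
        row = [cell.get_text(strip=True) for cell in td]
        if row:
            res.append(row)
print(res)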

However, the first td of the table holds the date.

https://ph.investing.com/economic-calendar/

I want to save this date in a variable and then save the rest of the data into a table.

import pandas as pd
df = pd.DataFrame(res)
df

Thanks in advance.


2 Answers

Stack Overflow user

Accepted answer

Posted 2020-04-03 12:42:50

Use the following CSS selector to get the date value from the table's first column.

import requests
from bs4 import BeautifulSoup
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36'}
r=requests.get("https://ph.investing.com/economic-calendar/",headers=headers)
soup=BeautifulSoup(r.text,"html.parser")
table=soup.find('table',attrs={"id":"economicCalendarData"})
print(table.select_one('tbody>tr>td.theDay').text)

Alternatively, you can use either of the following:

print(soup.select_one('table#economicCalendarData>tbody>tr>td.theDay').text)

print(soup.select_one('table#economicCalendarData td.theDay').text)
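
The selectors above differ only in how strictly they describe the path from the table to the date cell. Here is a minimal offline sketch of that difference, assuming a simplified, hypothetical HTML fragment (`SAMPLE_HTML`) that mimics the page's structure; the real markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the page's table markup (assumption)
SAMPLE_HTML = """
<table id="economicCalendarData">
  <tbody>
    <tr><td class="theDay" colspan="3">Friday, April 3, 2020</td></tr>
    <tr><td>05:00</td><td>KRW</td><td>400.21B</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Child combinators (>) require the exact table > tbody > tr > td nesting
strict = soup.select_one("table#economicCalendarData>tbody>tr>td.theDay")
# The descendant combinator (space) matches the cell at any depth in the table
loose = soup.select_one("table#economicCalendarData td.theDay")

date_text = strict.get_text(strip=True)
print(date_text)
print(loose.get_text(strip=True) == date_text)
```

The descendant form is the more forgiving of the two: `html.parser` does not insert a missing `<tbody>` the way a browser or `html5lib` would, so the strict form returns `None` when the raw HTML omits that tag.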

To load the whole table into a DataFrame, print it, and export it to CSV:

import pandas as pd  # needed for pd.read_html below
import requests
from bs4 import BeautifulSoup
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36'}
r=requests.get("https://ph.investing.com/economic-calendar/",headers=headers)
soup=BeautifulSoup(r.text,"html.parser")
print(soup.select_one('table#economicCalendarData td.theDay').text)
table=soup.find('table',attrs={"id":"economicCalendarData"})
df=pd.read_html(str(table))[0]
df1=df.iloc[1:,:7]
print(df1)
df1.to_csv("index.csv", index=False)

**Output**:



   Friday, April 3, 2020
     Time Cur. Imp.  ...   Actual Forecast Previous
     Time Cur. Imp.  ...   Actual Forecast Previous
1   05:00  KRW  NaN  ...  400.21B      NaN  409.17B
2   05:30  AUD  NaN  ...     37.9      NaN     42.7
3   06:00  AUD  NaN  ...     38.5     39.8     49.0
4   08:01  EUR  NaN  ...     32.5      NaN     59.9
5   08:30  AUD  NaN  ...     0.5%     0.4%    -0.3%
6   08:30  JPY  NaN  ...     33.8     32.7     46.8
7   08:30  HKD  NaN  ...     34.9      NaN     33.1
8   09:45  CNY  NaN  ...     43.0      NaN     26.5
9   13:00  SGD  NaN  ...    -8.6%      NaN    -5.3%
10  13:00  SGD  NaN  ...    -8.9%      NaN     0.2%
11  14:30  SEK  NaN  ...     46.9      NaN     56.4
12  14:45  EUR  NaN  ...   -35.2B      NaN   -20.0B
13  15:00  EUR  NaN  ...    -1.3%     2.1%    -2.2%
14  15:15  EUR  NaN  ...     23.0     25.5     52.1
15  15:15  ZAR  NaN  ...     44.5      NaN     48.4
16  15:30  THB  NaN  ...    34.4B      NaN    35.1B
17  15:30  THB  NaN  ...   227.2B      NaN   219.9B
18  15:45  EUR  NaN  ...     20.2      NaN     50.7
19  15:45  EUR  NaN  ...     17.4     22.0     52.1
20  15:50  EUR  NaN  ...     28.9     30.2     52.0
21  15:50  EUR  NaN  ...     27.4     29.0     52.5
22  15:55  EUR  NaN  ...     35.0     36.8     50.7
23  15:55  EUR  NaN  ...     31.7     34.3     52.5
24  16:00  EUR  NaN  ...    -2.4%      NaN     2.2%
25  16:00  NOK  NaN  ...   10.70%   13.50%    2.30%
26  16:00  EUR  NaN  ...     29.7     31.4     51.6
27  16:00  EUR  NaN  ...     26.4     28.4     52.6
28  16:30  GBP  NaN  ...     36.0     36.2     53.0
29  16:30  GBP  NaN  ...     34.5     34.8     53.2
30  17:00  NOK  NaN  ...    1.50%      NaN    3.60%
31  17:00  EUR  NaN  ...     0.9%     0.1%     0.7%
32  17:00  EUR  NaN  ...     3.0%     1.7%     2.2%
33  19:30  INR  NaN  ...  475.56B      NaN  475.56B
34  20:30  USD  NaN  ...     3.1%     3.0%     3.0%
35  20:30  USD  NaN  ...     0.4%     0.2%     0.3%
36  20:30  USD  NaN  ...     34.2     34.1     34.4
37  20:30  USD  NaN  ...    12.0K      NaN    33.0K
38  20:30  USD  NaN  ...     -18K     -20K      13K
39  20:30  USD  NaN  ...    -701K    -100K     275K
40  20:30  USD  NaN  ...    62.7%    63.3%    63.4%
41  20:30  USD  NaN  ...    -713K    -163K     242K
42  20:30  USD  NaN  ...     8.7%      NaN     7.0%
43  20:30  USD  NaN  ...     4.4%     3.8%     3.5%
44  21:00  BRL  NaN  ...     37.6      NaN     50.9
45  21:00  BRL  NaN  ...     34.5      NaN     50.4
46  21:00  SGD  NaN  ...     45.4      NaN     48.7
47  21:45  USD  NaN  ...      NaN     40.5     49.6
48  21:45  USD  NaN  ...      NaN     39.1     49.4
49  22:00  USD  NaN  ...      NaN     45.0     57.8
50  22:00  USD  NaN  ...      NaN      NaN     55.6
51  22:00  USD  NaN  ...      NaN      NaN     63.1
52  22:00  USD  NaN  ...      NaN     44.0     57.3
53  22:00  USD  NaN  ...      NaN      NaN     50.8
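
If the date should travel with each data row rather than be printed separately, one option is to walk the rows with Beautiful Soup and remember the most recent `td.theDay` value. The sketch below is not the answer's method; it again assumes a hypothetical, simplified fragment (`SAMPLE_HTML`) and made-up column subsets.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the calendar table (assumption)
SAMPLE_HTML = """
<table id="economicCalendarData">
  <tr><th>Time</th><th>Cur.</th><th>Actual</th></tr>
  <tr><td class="theDay" colspan="3">Friday, April 3, 2020</td></tr>
  <tr><td>05:00</td><td>KRW</td><td>400.21B</td></tr>
  <tr><td>05:30</td><td>AUD</td><td>37.9</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("table", attrs={"id": "economicCalendarData"})

columns = [th.get_text(strip=True) for th in table.find_all("th")]
current_date = None
records = []
for tr in table.find_all("tr"):
    day_cell = tr.find("td", class_="theDay")
    if day_cell is not None:
        # A date row: remember it for the data rows that follow
        current_date = day_cell.get_text(strip=True)
        continue
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if cells:  # skips the header row, which has only <th> cells
        records.append([current_date] + cells)

df = pd.DataFrame(records, columns=["Date"] + columns)
print(current_date)
print(df)
```

Keeping the date in its own column also handles tables that contain more than one date row, since each data row is tagged with the date that precedes it.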
Votes: 0

Stack Overflow user

Posted 2020-04-03 12:45:57

import requests
import pandas as pd

headers = {'User-Agent': 'Mozila'}

r = requests.get(
    "https://ph.investing.com/economic-calendar/", headers=headers)

df = pd.read_html(r.content, attrs={'id': 'economicCalendarData'})[0]

# The first row, first column holds the date
date = df.iloc[0, 0]

print(date)

df.to_csv("data.csv", index=False)

Output:
Friday, April 3, 2020

data.csv: view online
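
To keep the date in a variable and the remaining rows in their own table, the first row can be split off by position. This is a minimal sketch on a hand-built DataFrame; it assumes `pd.read_html` repeated the date cell across all columns of the first row, and the values and column subset are illustrative only.

```python
import pandas as pd

# Hand-built stand-in for what pd.read_html might return (assumption)
df = pd.DataFrame(
    [
        ["Friday, April 3, 2020"] * 3,
        ["05:00", "KRW", "400.21B"],
        ["05:30", "AUD", "37.9"],
    ],
    columns=["Time", "Cur.", "Actual"],
)

date = df.iloc[0, 0]  # first row, first column
data = df.iloc[1:].reset_index(drop=True)  # every row after the date row

print(date)
print(data)
```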

Votes: 0

Original content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/61011513
