Q: Appending multiple reshaped lists to a pandas DataFrame

Stack Overflow user

Asked 2017-11-22 21:40:28

2 answers, 232 views, 0 followers, 1 vote

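I am scraping joint data for England, and when I process one hospital at a time I get the results in the format I want. Eventually I want to iterate over all the hospitals, but I decided to start with an array of three different hospitals and work out the iteration first.

With a single hospital, the code below gives me the final pandas DataFrame in the correct format: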

import requests
from bs4 import BeautifulSoup
import pandas
import numpy as np
r=requests.get("http://www.njrsurgeonhospitalprofile.org.uk/HospitalProfile?hospitalName=Norfolk%20and%20Norwich%20Hospital")
c=r.content
soup=BeautifulSoup(c,"html.parser")

all=soup.find_all(["div"],{"class":"toggle_container"})[1]

i=0
temp = []
for item in all.find_all("td"):
    if i%4 ==0:
        temp.append(soup.find_all("span")[4].text)
        temp.append(soup.find_all("h5")[0].text)
    temp.append(all.find_all("td")[i].text.replace("   ",""))
    i=i+1
table = np.array(temp).reshape(12,6)
final = pandas.DataFrame(table)
final

In my iterating version, I cannot find a way to append each result set to a final DataFrame:

hosplist = ["http://www.njrsurgeonhospitalprofile.org.uk/HospitalProfile?hospitalName=Norfolk%20and%20Norwich%20Hospital",
            "http://www.njrsurgeonhospitalprofile.org.uk/HospitalProfile?hospitalName=Barnet%20Hospital",
            "http://www.njrsurgeonhospitalprofile.org.uk/HospitalProfile?hospitalName=Altnagelvin%20Area%20Hospital"]
temp2 = []
df_final = pandas.DataFrame()
for item in hosplist:
    r=requests.get(item)
    c=r.content
    soup=BeautifulSoup(c,"html.parser")

    all=soup.find_all(["div"],{"class":"toggle_container"})[1]
    i=0
    temp = []
    for item in all.find_all("td"):
        if i%4 ==0:
            temp.append(soup.find_all("span")[4].text)
            temp.append(soup.find_all("h5")[0].text)
        temp.append(all.find_all("td")[i].text)
        i=i+1
    table = np.array(temp).reshape((int(len(temp)/6)),6)
    temp2.append(table)
    #df_final = pandas.DataFrame(df)

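At the end, `table` has all the data I want, but it is not easy to manipulate, so I want to put it in a DataFrame. However, I get a "ValueError: Must pass 2-d input" error.

I think the error is saying that I have 3 arrays, which would make the input 3-dimensional. This is just a practice iteration; there are over 400 hospitals whose data I plan to put into a dataframe, but I am stuck here for now.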
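The 3-dimensional reading of the error can be checked without touching the website. The sketch below is only an illustration with made-up values: two equally shaped 2-D arrays combine into one 3-D array, which `pandas.DataFrame` rejects, while stacking the same blocks row-wise with `np.vstack` stays 2-D. The array names are assumptions and do not come from the scraping code.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for two per-hospital tables (values are made up).
same_a = np.array([["a"] * 6] * 2)    # shape (2, 6)
same_b = np.array([["b"] * 6] * 2)    # shape (2, 6)

# Two equally shaped 2-D arrays combine into one 3-D array ...
cube = np.array([same_a, same_b])
print(cube.ndim)                      # 3

# ... which DataFrame refuses to accept.
try:
    pd.DataFrame(cube)
except ValueError as err:
    print("ValueError:", err)

# Stacking row-wise keeps the result 2-D, even when row counts differ.
longer_b = np.array([["b"] * 6] * 3)  # shape (3, 6)
stacked = np.vstack([same_a, longer_b])
df_final = pd.DataFrame(stacked)
print(df_final.shape)                 # (5, 6)
```

Applied to the loop above, this would mean calling `np.vstack(temp2)` once after the loop finishes, assuming every per-hospital table really has 6 columns.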

pandas

dataframe

beautifulsoup

iteration

python

2 Answers

Stack Overflow user

Accepted answer

Posted 2017-11-23 01:05:55

I reorganized the code a little and was able to create the dataframe without needing the encoding step.

Solution:

hosplist = ["http://www.njrsurgeonhospitalprofile.org.uk/HospitalProfile?hospitalName=Norfolk%20and%20Norwich%20Hospital",
            "http://www.njrsurgeonhospitalprofile.org.uk/HospitalProfile?hospitalName=Barnet%20Hospital",
            "http://www.njrsurgeonhospitalprofile.org.uk/HospitalProfile?hospitalName=Altnagelvin%20Area%20Hospital"]
temp = []
temp2 = []
df_final = pandas.DataFrame()
for item in hosplist:
    r=requests.get(item)
    c=r.content
    soup=BeautifulSoup(c,"html.parser")

    all=soup.find_all(["div"],{"class":"toggle_container"})[1]
    i=0

    for item in all.find_all("td"):
        if i%4 ==0:
            temp.append(soup.find_all("span")[4].text)
            temp.append(soup.find_all("h5")[0].text)
        temp.append(all.find_all("td")[i].text.replace("-","NaN").replace("+",""))
        i=i+1
temp2.append(temp)
table = np.array(temp2).reshape((int(len(temp2[0])/6)),6)
df_final = pandas.DataFrame(table, columns=['h1', 'h2', 'h3', 'h4', 'h5', 'h6'])
df_final
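An alternative arrangement, not taken from either answer, is to build one small DataFrame per hospital and join them with `pandas.concat`. The sketch below uses hypothetical flat lists in place of the scraped `temp` values, and the helper name `to_frame` is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

COLUMNS = ["h1", "h2", "h3", "h4", "h5", "h6"]

def to_frame(cells):
    """Reshape a flat list of cell strings into a DataFrame, one row per len(COLUMNS) cells."""
    width = len(COLUMNS)
    if len(cells) % width != 0:
        raise ValueError("cell count %d is not a multiple of %d" % (len(cells), width))
    table = np.array(cells, dtype=object).reshape(-1, width)
    return pd.DataFrame(table, columns=COLUMNS)

# Hypothetical flat lists, standing in for the scraped `temp` of each hospital.
scraped = [
    ["Hosp A", "Hips", "1", "2", "3", "4"],
    ["Hosp B", "Hips", "5", "6", "7", "8",
     "Hosp B", "Knees", "9", "10", "11", "12"],
]

# One frame per hospital, then a single concat with a fresh 0..n-1 index.
frames = [to_frame(cells) for cells in scraped]
df_final = pd.concat(frames, ignore_index=True)
print(df_final)
```

Collecting the frames in a list and concatenating once avoids growing a DataFrame inside the loop, and the length check reports a page whose cell count does not divide evenly into rows.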

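1 vote

Stack Overflow user

Posted 2017-11-23 00:10:55

The simple answer to your question is HERE.

The hardest part was taking your code and finding what was not correct.

Using your full code, I modified it as shown below. Please copy it and diff it against yours.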

import requests
from bs4 import BeautifulSoup
import pandas
import numpy as np

hosplist = ["http://www.njrsurgeonhospitalprofile.org.uk/HospitalProfile?hospitalName=Norfolk%20and%20Norwich%20Hospital",
            "http://www.njrsurgeonhospitalprofile.org.uk/HospitalProfile?hospitalName=Barnet%20Hospital",
            "http://www.njrsurgeonhospitalprofile.org.uk/HospitalProfile?hospitalName=Altnagelvin%20Area%20Hospital"]
temp2 = []
df_final = pandas.DataFrame()
for item in hosplist:
    r=requests.get(item)
    c=r.content
    soup=BeautifulSoup(c,"html.parser")

    all=soup.find_all(["div"],{"class":"toggle_container"})[1]
    i=0
    temp = []
    for item in all.find_all("td"):
        if i%4 ==0:
            temp.append(soup.find_all("span")[4].text)
            temp.append(soup.find_all("h5")[0].text)
        temp.append(all.find_all("td")[i].text)
        i=i+1
    table = np.array(temp).reshape((int(len(temp)/6)),6)
    for array in table:
        newArray = []
        for x in array:
            try:
                x = x.encode("ascii")
            except:
                x = 'cannot convert'
            newArray.append(x)
        temp2.append(newArray)

df_final = pandas.DataFrame(temp2, columns=['h1', 'h2', 'h3', 'h4', 'h5', 'h6'])
print(df_final)  # the parenthesized form runs on both Python 2 and Python 3

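I tried to use a list comprehension for the ascii conversion, which is absolutely necessary for the strings to display in the dataframe, but the comprehension was throwing an error, so I built in an exception handler, and the exception never shows.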
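A `try`/`except` cannot be written directly inside a list comprehension, which may be why the comprehension attempt failed. One way around this is to move the conversion into a small helper and call that from the comprehension. The sketch below assumes Python 3, where strings are already Unicode and the conversion is probably not needed for display at all; the helper name `to_ascii` and the sample row are hypothetical.

```python
def to_ascii(text, placeholder="cannot convert"):
    """Return `text` unchanged if it is pure ASCII, otherwise `placeholder`."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return placeholder
    return text

# Made-up row containing one non-ASCII cell.
row = ["Norfolk", "caf\u00e9", "12"]
cleaned = [to_ascii(x) for x in row]
print(cleaned)  # ['Norfolk', 'cannot convert', '12']
```

Catching `UnicodeEncodeError` rather than using a bare `except` keeps unrelated errors visible.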

1 vote

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/47444269
