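I am using the clinicaltrials.gov API to fetch a list of clinical trials as XML, and then parsing that data so I can export it to an Excel dataset.
The URL in my code returns 9 results, but my code extracts only 5 of the 9. I have realized this is because one of the fields (the detailed description) is present for only some of the trials. When I drop the detailed description and use only the other two fields (nctid and brief summary), I get 9/9. Is there anything I can do other than building a separate dataset for the detailed descriptions and merging it back in?
Bottom line: I am extracting 3 fields (nctid, brief summary, and detailed description) from an XML file containing 9 clinical trials, but my output contains only 5 of the 9 trials. How can I get all 9/9 in my output without dropping the detailed description field?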
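The 5/9 result is consistent with how Python's built-in `zip` behaves: the code that follows builds three separate lists and zips them, and `zip` stops at the shortest input. A minimal sketch with made-up placeholder values (the IDs and texts are hypothetical, not taken from the API response) shows both the truncation and a second problem, misaligned rows:

```python
# Hypothetical placeholder data: four trials, only two have a detailed description.
ids = ["NCT-A", "NCT-B", "NCT-C", "NCT-D"]
summaries = ["summary A", "summary B", "summary C", "summary D"]
details = ["details for A", "details for C"]  # B and D have none

# zip() stops as soon as the shortest list is exhausted.
rows = list(zip(ids, summaries, details))

print(len(rows))  # number of rows equals the length of the shortest list
print(rows[1])    # the second row pairs NCT-B with the description that belongs to C
```

So even the rows that survive may carry the wrong detailed description, because position in the flat list no longer corresponds to a particular trial.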
import requests
from bs4 import BeautifulSoup
import pandas as pd
out = []
url = 'https://clinicaltrials.gov/api/query/full_studies?expr=diabetes+telehealth+peer+support&+AREA%5BStartDate%5D+EXPAND%5BTerm%5D+RANGE%5B01%2F01%2F2020%2C+09%2F01%2F2020%5D&min_rnk=1&max_rnk=50&fmt=xml'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'lxml')
nctids = soup.find_all("field", {"name" : "NCTId"})
briefsummaries = soup.find_all("field", {"name" : "BriefSummary"}) if soup.find_all("field", {"name" : "BriefSummary"}) is not None else 'nothing'
detaileddescriptions = soup.find_all("field", {"name" : "DetailedDescription"}) if soup.find_all("field", {"name" : "DetailedDescription"}) is not None else 'nothing'
for nctid, briefsummary, detaileddescription in zip(nctids, briefsummaries, detaileddescriptions):
    data = {'nctid': nctid, 'briefsummary': briefsummary, 'detaileddescription': detaileddescription}
    out.append(data)
df = pd.DataFrame(out)
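df.to_excel('clinicaltrialstresults.xlsx')

Posted on 2021-11-19 03:49:36
You can loop over the list of studies instead, which needs only minor changes to your code. Each field is then looked up inside its own study, so a missing detailed description falls back to 'nothing' for that study only.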
import requests
from bs4 import BeautifulSoup
import pandas as pd
out = []
url = 'https://clinicaltrials.gov/api/query/full_studies?expr=diabetes+telehealth+peer+support&+AREA%5BStartDate%5D+EXPAND%5BTerm%5D+RANGE%5B01%2F01%2F2020%2C+09%2F01%2F2020%5D&min_rnk=1&max_rnk=50&fmt=xml'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'lxml')
study_list = soup.find_all("fullstudy")
for study in study_list:
    nctid = study.find("field", {"name" : "NCTId"})
    briefsummary = study.find("field", {"name" : "BriefSummary"}) if study.find("field", {"name" : "BriefSummary"}) is not None else 'nothing'
    detaileddescription = study.find("field", {"name" : "DetailedDescription"}) if study.find("field", {"name" : "DetailedDescription"}) is not None else 'nothing'
    data = {'nctid': nctid, 'briefsummary': briefsummary, 'detaileddescription': detaileddescription}
    out.append(data)
df = pd.DataFrame(out)
df.to_excel('clinicaltrialstresults.xlsx', index=False)

https://stackoverflow.com/questions/70029265
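The per-study lookup can also be tried offline. The following is a minimal sketch using only the standard library's `xml.etree.ElementTree` on a small inline sample. The sample XML is a simplified assumption about the response layout (`FullStudy` elements containing `Field` elements with a `Name` attribute), and `field_text` and `extract_rows` are hypothetical helper names introduced here, not part of any library.

```python
import xml.etree.ElementTree as ET

# Simplified, made-up sample; the real response is assumed to nest fields more deeply.
SAMPLE_XML = """
<FullStudiesResponse>
  <FullStudyList>
    <FullStudy>
      <Field Name="NCTId">NCT-A</Field>
      <Field Name="BriefSummary">Summary A</Field>
      <Field Name="DetailedDescription">Details A</Field>
    </FullStudy>
    <FullStudy>
      <Field Name="NCTId">NCT-B</Field>
      <Field Name="BriefSummary">Summary B</Field>
    </FullStudy>
  </FullStudyList>
</FullStudiesResponse>
"""


def field_text(study, name, default="nothing"):
    """Return the text of the first matching Field inside one study, or a default."""
    node = study.find(f".//Field[@Name='{name}']")
    if node is None or not node.text:
        return default
    return node.text.strip()


def extract_rows(xml_text):
    """Build one dict per study so a missing field never shifts the other columns."""
    root = ET.fromstring(xml_text.strip())
    return [
        {
            "nctid": field_text(study, "NCTId"),
            "briefsummary": field_text(study, "BriefSummary"),
            "detaileddescription": field_text(study, "DetailedDescription"),
        }
        for study in root.iter("FullStudy")
    ]


rows = extract_rows(SAMPLE_XML)
for row in rows:
    print(row)
```

Because the helper returns plain strings rather than parser objects, the resulting list of dicts can be passed to `pd.DataFrame(rows)` in the same way as `out` in the code above.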