I get this error: InvalidSchema("No connection adapters were found for {!r}".format(url))
when I try to run this code:
import pandas as pd
pd.set_option('display.max_colwidth', -1)
url_file = 'https://github.com/MarissaFosse/ryersoncapstone/raw/master/DailyNewsArticles.xlsx'
tstar_articles = pd.read_excel(url_file, "TorontoStar Articles", header=0) 
url_to_sents = {}
for url in tstar_articles:
  url = tstar_articles['URL']
  page = requests.get(url)
  soup = BeautifulSoup(page.content, 'html.parser')
  results = soup.find(class_='c-article-body__content') 
  results_text = [tag.get_text().strip() for tag in results]
  sentence_list = [sentence for sentence in results_text if not '\n' in sentence]
  sentence_list = [sentence for sentence in sentence_list if '.' in sentence]
  article = ' '.join(sentence_list)
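  url_to_sents[url] = article
I am trying to use requests.get() to read the URLs from an Excel file I created. I suspect the error is caused by invisible characters, but I don't know how to check for them.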
Answered on 2020-05-18 03:14:05
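When you iterate over the returned DataFrame, it yields only the column names. So your original loop first binds url to Date, then Category, and so on, none of which is a URL. The next line then overwrites url with the entire tstar_articles['URL'] column, so requests.get() receives a whole Series rather than a single URL string, hence the error.
By contrast, selecting a single column from a DataFrame returns a Series, and iterating over a Series yields its values. So when you need the URLs, iterate over tstar_articles['URL'] rather than over tstar_articles.
So instead of: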
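The difference between the two kinds of iteration can be seen in a small sketch. The DataFrame below is a made-up stand-in for the spreadsheet (the example.com URLs and the row values are assumptions for illustration only; the column names are the ones mentioned in this answer):

```python
import pandas as pd

# Hypothetical stand-in for the Excel sheet; values are invented for the demo.
df = pd.DataFrame({
    "Date": ["2020-05-01", "2020-05-02"],
    "Category": ["News", "Business"],
    "URL": ["https://example.com/a", "https://example.com/b"],
})

# Iterating over the DataFrame itself yields the column labels.
column_labels = [item for item in df]

# Iterating over a single column (a Series) yields that column's values.
url_values = [item for item in df["URL"]]

print(column_labels)  # ['Date', 'Category', 'URL']
print(url_values)     # ['https://example.com/a', 'https://example.com/b']
```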
for url in tstar_articles:
    url = tstar_articles['URL']
    page = requests.get(url)
use:
for url in tstar_articles['URL']:
    page = requests.get(url)
https://stackoverflow.com/questions/61857260
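As for the invisible-character suspicion raised in the question, one way to check is to look at the repr() of each value before passing it to requests.get(). The helper below is only a sketch: find_suspect_urls is a hypothetical name, not part of pandas or requests, and it assumes the URLs are expected to start with http:// or https://, the two prefixes requests handles by default.

```python
def find_suspect_urls(urls):
    """Return (position, repr) pairs for values that are unlikely to be usable URLs.

    Hypothetical helper for illustration; not part of pandas or requests.
    """
    suspects = []
    for position, value in enumerate(urls):
        # Non-strings (for example NaN from an empty cell) are not usable URLs.
        if not isinstance(value, str):
            suspects.append((position, repr(value)))
            continue
        # Assumption: only http:// and https:// URLs are expected here.
        has_http_prefix = value.lower().startswith(("http://", "https://"))
        # Whitespace or non-printable characters (BOM, zero-width space, ...)
        # anywhere in the string are flagged as well.
        has_odd_chars = any(ch.isspace() or not ch.isprintable() for ch in value)
        if not has_http_prefix or has_odd_chars:
            # repr() shows hidden characters as visible escape sequences.
            suspects.append((position, repr(value)))
    return suspects


# Made-up sample values; with the real data you would pass tstar_articles['URL'].
sample = [
    "https://example.com/a",
    " https://example.com/b",        # leading space
    "\ufeffhttps://example.com/c",   # leading byte-order mark
    "Date",                          # a column name, not a URL
]
for position, shown in find_suspect_urls(sample):
    print(position, shown)
```

An empty result would suggest the values themselves are fine and the problem lies in how the loop is written, as described above.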