Q: Eliminating the % symbol when using a Selenium scraper (Python)

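Stack Overflow user

Asked 2018-06-15 07:42:20

2 answers, 65 views, 0 followers, 0 votes

Below is a Selenium web scraper that loops through the different tabs of this page (https://www.fangraphs.com/leaders.aspx?pos=all&stats=bat&lg=all&qual=y&type=8&season=2018&month=0&season1=2018&ind=0), clicks the "Export Data" button, downloads the data, adds a yearid column, and then loads the data into a MySQL table.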

import sys
import pandas as pd
import os
import time
from datetime import datetime
from selenium import webdriver
from selenium.webdriver.firefox.firefox_profile import FirefoxProfile
from sqlalchemy import create_engine


button_text_to_url_type = {
    'dashboard': 8,
    'standard': 0,
    'advanced': 1,
    'batted_ball': 2,
    'win_probability': 3,
    'pitch_type': 4,
    'pitch_values': 7,
    'plate_discipline': 5,
    'value': 6
}

download_dir = os.getcwd()
profile = FirefoxProfile("C:/Users/PATHTOFIREFOX")
profile.set_preference("browser.helperApps.neverAsk.saveToDisk", 'text/csv')
profile.set_preference("browser.download.manager.showWhenStarting", False)
profile.set_preference("browser.download.dir", download_dir)
profile.set_preference("browser.download.folderList", 2)
driver = webdriver.Firefox(firefox_profile=profile)


today = datetime.today()
for button_text, url_type in button_text_to_url_type.items():

    default_filepath = os.path.join(download_dir, 'Fangraphs Leaderboard.csv')
    desired_filepath = os.path.join(download_dir,
                                    '{}_{}_{}_Leaderboard_{}.csv'.format(today.year, today.month, today.day,
                                                                         button_text))

    driver.get(
        "https://www.fangraphs.com/leaders.aspx?pos=all&stats=bat&lg=all&qual=0&type={}&season=2018&month=0&season1=2018&ind=0&team=&rost=&age=&filter=&players=".format(
            url_type))
    driver.find_element_by_link_text('Export Data').click()
    if os.path.isfile(default_filepath):
        os.rename(default_filepath, desired_filepath)
        print('Renamed file {} to {}'.format(default_filepath, desired_filepath))
    else:
        sys.exit('Error, unable to locate file at {}'.format(default_filepath))

    df = pd.read_csv(desired_filepath)
    df["yearid"] = datetime.today().year
    df.to_csv(desired_filepath)

    engine = create_engine("mysql+pymysql://{user}:{pw}@localhost/{db}"
                           .format(user="walker",
                                   pw="password",
                                   db="data"))
    df.to_sql(con=engine, name='fg_test_hitting_{}'.format(button_text), if_exists='replace')

time.sleep(10)
driver.quit()

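However, when I download the data, some columns arrive with a % sign after the number (e.g. 25%), which breaks my formatting in MySQL. When reading the scraped data into a pandas DataFrame, is there a way to change the columns that contain % signs so that they hold only the number? If so, how would I implement that inside the loop I created to scrape data from the site's different tabs? I would also like to exclude the first row of data from this process, since that is the row that holds my column names. Thanks in advance!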

python

mysql

pandas

selenium

web-scraping

2 Answers

Stack Overflow user

Answered 2018-06-15 07:46:35

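Once you have collected everything into a pandas DataFrame, you can strip all of the % signs with replace. Pass regex=True so that the % is matched inside each string rather than only when a cell is exactly '%', and assign the result back, since replace returns a new DataFrame.

df = df.replace('%', '', regex=True)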
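To illustrate how `DataFrame.replace` treats a `%` embedded in a longer string, here is a minimal sketch. The sample frame, its column names, and its values are made up for illustration and are not taken from the actual Fangraphs export.

```python
import pandas as pd

# Hypothetical sample data standing in for one exported leaderboard.
df = pd.DataFrame({"Name": ["A", "B"], "BB%": ["10.5 %", "8.0 %"]})

# Without regex=True, replace only matches cells whose entire value is '%',
# so strings such as '10.5 %' are left untouched.
unchanged = df.replace("%", "")

# With regex=True, the pattern is matched inside each string.
cleaned = df.replace("%", "", regex=True)

print(unchanged["BB%"].tolist())
print(cleaned["BB%"].tolist())
```

The cleaned values are still strings (possibly with stray whitespace), so a numeric conversion would still be needed before loading them into MySQL as numbers.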

Votes: 0

Stack Overflow user

Answered 2018-06-15 13:09:50

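Use pandas.Series.str.replace(). The .str accessor exists on a Series, not on a DataFrame, so apply it to one column at a time (col below is a placeholder for a column name).

df[col] = df[col].str.replace('%', '')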
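As a rough sketch of how this could fit into the loop from the question, the helper below strips `%` from every text column that contains one and converts the result to numbers. The function name `strip_percent_columns` and the sample data are hypothetical. It also assumes that any non-numeric column containing a `%` is meant to be numeric, which may not hold for every tab.

```python
import pandas as pd


def strip_percent_columns(df):
    """Return a copy of df with '%'-bearing text columns converted to numbers."""
    out = df.copy()
    for col in out.columns:
        # Skip columns pandas already parsed as numbers.
        if pd.api.types.is_numeric_dtype(out[col]):
            continue
        as_text = out[col].astype(str)
        # Only touch columns that actually contain a percent sign.
        if not as_text.str.contains("%", regex=False).any():
            continue
        stripped = as_text.str.replace("%", "", regex=False).str.strip()
        # errors="coerce" turns anything that is not a number into NaN.
        out[col] = pd.to_numeric(stripped, errors="coerce")
    return out


# Hypothetical sample standing in for pd.read_csv(desired_filepath).
sample = pd.DataFrame({"Name": ["A", "B"], "BB%": ["10.5 %", "8 %"], "HR": [3, 5]})
cleaned = strip_percent_columns(sample)
print(cleaned.dtypes)
```

In the question's loop, the call would go right after `df = pd.read_csv(desired_filepath)`, for example `df = strip_percent_columns(df)`. By default `pd.read_csv` treats the first line of the file as the header, so the column names should not show up as a data row and need no special handling.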

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/50867356
