我试着学习html抓取一个项目,我使用python和lxml。到目前为止,我已经成功地获得了我需要的数据,但现在我有了另一个问题。当您向下滚动时,我正在从(op.gg)中刮走的站点,它添加了包含更多信息的新表。当我运行我的脚本(下面),它只得到前50个条目,而没有更多。我的问题是,我怎样才能在页面上得到至少200个名字,或者是否有可能。
from lxml import html
import requests
page = requests.get('https://na.op.gg/ranking/ladder/')
tree = html.fromstring(page.content)
names = tree.xpath('//td[@class="SummonerName Cell"]/a/text()')
print (names)
发布于 2017-04-30 08:13:55
借用佩德罗的观点,https://na.op.gg/ranking/ajax2/ladders/start=number
会给你50张从number开始的唱片,例如:
https://na.op.gg/ranking/ajax2/ladders/start=0
get (1-50)
https://na.op.gg/ranking/ajax2/ladders/start=50
get (51-100),
https://na.op.gg/ranking/ajax2/ladders/start=100
get (101-150),
https://na.op.gg/ranking/ajax2/ladders/start=150
get (151-200),
等等..。
在此之后,您可以更改废代码,因为该页与原来的页不同,假设您想获得首200个名称,下面是修改后的代码:
from lxml import html
import requests
start_url = 'https://na.op.gg/ranking/ajax2/ladders/start='
names_200 = list()
for i in [0,50,100,150]:
dest_url = start_url + str(i)
page = requests.get(dest_url)
tree = html.fromstring(page.content)
names_50 = tree.xpath('//a[not(@target) and not(@onclick)]/text()')
names_200.extend(names_50)
print names_200
print len(names_200)
输出:
[u'am\xc3\xa9liorer', 'pireaNn', 'C9 Ray', 'P1 Pirean', 'Pobelter', 'mulgokizary', 'consensual clown', 'Jue VioIe Grace', 'Deep Learning', 'Keegun', 'Free Papa Chau', 'C9 Gun', 'Dhokla', 'Arrowlol', 'FOX Brandini', 'Jurassiq', 'Win or Learn', 'Acoldblazeolive', u'R\xc3\xa9venge', u'M\xc3\xa9ru', 'Imaqtpie', 'Rohammers', 'blaberfish2', 'qldurtms', u'd\xc3\xa0wolfsclaw', 'TheOddOrange', 'PandaTv 656826', 'stuntopolis', 'Butler Delta', 'P1 Shady', 'Entranced', u'Linsan\xc3\xadty', 'Ablazeolive', 'BukZacH', 'Anivia Kid', 'Contractz', 'Eitori', 'MistyStumpey', 'Prodedgy', 'Splitting', u'S\xc4\x99b B\xc4\x99rnal', 'N For New York', 'Naeun', '5tunt', 'C9 Winter', 'Doubtfull', 'MikeYeung', 'Rikara', u'RAH\xc3\x9cLK', ' Sudzzi', 'joong ki song', 'xWeixin VinLeous', 'rhubarbs', u'Ch\xc3\xa0se', 'XueGao', 'Erry', 'C9 EonYoung', 'Yeonbee', 'M ckg', u'Ari\xc3\xa1na Lovato', 'OmarGod', 'Wiggily', 'lmpactful', 'Str1fe', 'LL Stylish', '2017', 'FlREFLY', 'God Fist Monk', 'rWeiXin VinLeous', 'Grigne', 'fantastic ad', 'bobqinX', 'grigne 1v10', 'Sora1', 'Juuichi san ', 'duoking2', 'SandPaperX', 'Xinthus', 'TwichTv CoMMa', 'xFSN Rin', 'UBC CJ', 'PotIuck', 'DarkWingsForSale', 'Get After lt', 'old chicken', u'\xc4\x86ris', 'VK Deemo', 'Pekin Woof', 'YIlIlIlIlI', 'RiceLegend', 'Chimonaa1', 'DJNDREE5', u'CloudNguy\xc3\xa9n', 'Diamond 1 Khazix', 'dawolfsfang', 'clg imaqtpie69', 'Pyrites', 'Lava', 'Rathma', 'PieCakeLord', 'feed l0rd', 'Eygon', 'Autolycus1', 'FateFalls 20xx', 'nIsHIlEzHIlA', 'C9 Sword', 'TET Fear', 'a very bad time', u'Jur\xc3\xa1ssiq', 'Ginormous Noob', 'Saskioo', 'S D 2 NA', 'C9 Smoothie', 'dufTlalgkqtlek', 'Pants are Dragon', u'H\xc3\xb3llywood', 'Serenitty', 'Waggily ', 'never lucky help', u'insan\xc3\xadty', 'Joyul', 'TheeBrandini', 'FoTheWin', 'RyuShoryu', 'avi is me', 'iKingVex', 'PrismaI', 'An Obese Panda', 'TdollasAKATmoney', 'feud999', 'Soligo', 'Steel I', 'SNH48 Ruri', 'BillyBoss1', 'Annie Bot', 'Descraton', 'Cris', 'GrayHoves', 'RegisZZ', 'lron Pyrite', 'Zaion', 'Allorim', 't d', u'Alex \xc3\xafch', 'godrjsdnd', 'DOUBLELIFTSUCKS', 'John Mcrae', u'Lobo Solitari\xc3\xb3', 'MikeYeunglol', 'i xo u', 'NoahMost', 'Vsionz', 'GladeGleamBright', 'Tuesdayy', 'RealDarkness', 'CC Dean', 'na mid xd LFT', 'Piggy Kitten', 'Abou222', 'TG Strompest', 'MooseHater', 'Day after Day', 'bat8man', 'AxAxAxAxA', 'Boyfriend', 'EvanRL', '63FYWJMbam', 'Fiftygbl', u'Br\xc4\xb1an', 'MlST', u'S\xc3\xb8ren Bjerg', 'FOX Akaadian', '5word', 'tchikou', 'Hakuho', 'Noobkiller291', 'woxiangwanAD', 'Doublelift', 'Jlaol', u'z\xc3\xa3ts', 'Cow Goes Mooooo', u'Be Like \xc3\x91e\xc3\xb8\xc3\xb8', 'Liquid Painless', 'Zergy', 'Huge Rooster', 'Shiphtur', 'Nikkone', 'wiggily1', 'Dylaran', u'C\xc3\xa0m', 'byulbit', 'dirtybirdy82', 'FreeXpHere', u'V\xc2\xb5lcan', 'KaNKl', 'LCS Actor 4', 'bie sha wo', 'Mookiez', 'BKSMOOTH', 'FatMiku']
200
顺便说一句,你可以根据你的需求来扩展它。
https://stackoverflow.com/questions/43699975
复制相似问题