I want to get only the SAVE text from the HTML below using BeautifulSoup (bs4) and discard the DEL text. How can I do that?
<div>
<span class="chk_box">
<input id="subj5" name="subj" onclick="subjSel();" type="checkbox" value="5"/>
<label for="subj5">
SAVE1
</label>
</span>
<span class="chk_box">
<input id="subj6" name="subj" onclick="subjSel();" type="checkbox" value="6"/>
<label for="subj6">
SAVE2
</label>
</span>
<span class="chk_box">
<input disabled="" id="subj7" name="subj" onclick="subjSel();" type="checkbox" value="7"/>
<label for="" subj7""="">
DEL1
</label>
</span>
<span class="chk_box">
<input disabled="" id="subj8" name="subj" onclick="subjSel();" type="checkbox" value="8"/>
<label for="subj78">
DEL2
</label>
</span>
</div>
Posted on 2022-02-06 07:38:48
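It looks like the items you want to keep are the ones whose input has no disabled="" attribute, and that is what distinguishes them from the rest. So you can filter on it: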
from bs4 import BeautifulSoup
html = '''<div>
<span class="chk_box">
<input id="subj5" name="subj" onclick="subjSel();" type="checkbox" value="5"/>
<label for="subj5">
SAVE1
</label>
</span>
<span class="chk_box">
<input id="subj6" name="subj" onclick="subjSel();" type="checkbox" value="6"/>
<label for="subj6">
SAVE2
</label>
</span>
<span class="chk_box">
<input disabled="" id="subj7" name="subj" onclick="subjSel();" type="checkbox" value="7"/>
<label for="" subj7""="">
DEL1
</label>
</span>
<span class="chk_box">
<input disabled="" id="subj8" name="subj" onclick="subjSel();" type="checkbox" value="8"/>
<label for="subj78">
DEL2
</label>
</span>
</div>'''
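# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html, 'html.parser')
# Keep only <input> tags that have no disabled attribute, then read the text of the element that follows each one (its <label>)
results = [i.find_next_sibling().get_text().strip() for i in soup.find_all('input', {'disabled': None})]
print(results)
Output: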
['SAVE1', 'SAVE2']
https://stackoverflow.com/questions/71004863
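The same filter can probably also be written as a single CSS selector, which some people find easier to read. This is only a sketch, not part of the original answer. It assumes a bs4 version recent enough to bundle soupsieve (4.7 or later), so that select() understands :not() and the adjacent-sibling combinator +. The HTML here is a condensed stand-in for the markup in the question (the onclick, name and value attributes are dropped), and sample_html, enabled_labels and texts are illustrative names.

```python
from bs4 import BeautifulSoup

# Condensed stand-in for the question's markup (assumption: only the
# attributes that matter for the filter are kept).
sample_html = """
<div>
  <span class="chk_box">
    <input id="subj5" type="checkbox"/>
    <label for="subj5">SAVE1</label>
  </span>
  <span class="chk_box">
    <input id="subj6" type="checkbox"/>
    <label for="subj6">SAVE2</label>
  </span>
  <span class="chk_box">
    <input disabled="" id="subj7" type="checkbox"/>
    <label for="subj7">DEL1</label>
  </span>
  <span class="chk_box">
    <input disabled="" id="subj8" type="checkbox"/>
    <label for="subj8">DEL2</label>
  </span>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# "input:not([disabled]) + label" selects each <label> that comes
# immediately after an <input> carrying no disabled attribute.
enabled_labels = soup.select("input:not([disabled]) + label")

# get_text(strip=True) trims the whitespace around each label's text.
texts = [label.get_text(strip=True) for label in enabled_labels]
print(texts)
```

Compared with find_next_sibling(), the selector states the input/label relationship explicitly, so an enabled input followed by something other than a label is skipped instead of being picked up by accident.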