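I am looking for a way to strip some HTML out of a scraped HTML page. The page contains repeated markup that I want to remove, so I tried to use preg_replace() to delete the part that varies.
The data I want to strip: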
Producent:<td class="datatable__body__item" data-title="Producent">Example
Groep:<td class="datatable__body__item" data-title="Produkt groep">Example1
Type:<td class="datatable__body__item" data-title="Produkt type">Example2
....
...
Afterwards it has to look like this:
Producent:Example
Groep:Example1
Type:Example2
So apart from the words in the data-title attribute, the chunk is always the same. How can I remove this piece of markup?
I tried something like this:
$pattern = '/<td class=\"datatable__body__item\"(.*?)>/';
$tech_specs = str_replace($pattern,"", $tech_specs);
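But that does not work. Is there a solution for this?
Posted on 2018-08-27 17:57:18
OK, maybe my question was not written very well. I have a table that I need to scrape from a website. I need the information in the table, but some parts of it, as mentioned above, have to be cleaned up. The solution I ended up with is the one below, and it works. It still needs a few manual replacements, but that is because of the silly " they use for inches. ;-)
Solution: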
// find the table in the source code
foreach($techdata->find('table') as $table){
// filter out the rows
foreach($table->find('tr') as $row){
// take the innertext using simplehtmldom
$tech_specs = $row->innertext;
// strip some 'garbage'
$tech_specs = str_replace(" \t\t\t\t\t\t\t\t\t\t\t<td class=\"datatable__body__item\">","", $tech_specs);
// take the text of the first cell (the label) so it can be reused below
$spec1 = explode('</td>', $tech_specs)[0];
// use the label to replace the opening tag of the value cell with ":"
$tech_specs = str_replace("<td class=\"datatable__body__item\" data-title=\"" . $spec1 . "\">",":", $tech_specs);
// manual correction because of the " (inch mark) in the label
$tech_specs = str_replace("<td class=\"datatable__body__item\" data-title=\"tbv Montage benodigde 19\">",":", $tech_specs);
// manual correction because of the " (inch mark) in the label
$tech_specs = str_replace("<td class=\"datatable__body__item\" data-title=\"19\">",":", $tech_specs);
// strip some 'garbage'
$tech_specs = str_replace("\t\t\t\t\t\t\t\t\t\t","\n", $tech_specs);
$tech_specs = str_replace("</td>","", $tech_specs);
$tech_specs = str_replace(" ","", $tech_specs);
// put the clean row in an array ready for usage
$specs[] = $tech_specs;
}
}
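The per-row cleanup above could probably be shortened by matching the value cell with a regular expression instead of rebuilding its exact opening tag. The following is only a sketch under assumptions: clean_row() is a hypothetical helper, not part of simplehtmldom, and the sample row is a guess at what $row->innertext returns for this table. Since the pattern does not depend on the label text, it may also remove the need for the manual inch-mark corrections, but that has not been checked against the real page.

```php
<?php
// Hypothetical helper: turns the inner HTML of one table row into "Label:Value".
function clean_row(string $rowHtml): string
{
    // Replace the opening tag of the value cell, whatever its data-title is, with ":".
    $text = preg_replace('/<td class="datatable__body__item" data-title=".*?">/', ':', $rowHtml);
    // Drop the remaining tags (the label cell's <td> and both </td>).
    $text = strip_tags($text);
    // Remove the tabs and newlines around the first ":" only.
    $text = preg_replace('/\s*:\s*/', ':', $text, 1);
    // Remove leading and trailing whitespace.
    return trim($text);
}

// Assumed shape of one row's inner HTML (illustrative, not copied from the real page).
$row = "\t\t<td class=\"datatable__body__item\">Producent</td>\n"
     . "\t\t<td class=\"datatable__body__item\" data-title=\"Producent\">Example</td>\n";

echo clean_row($row), PHP_EOL;
```

Inside the loop this would be used as $specs[] = clean_row($row->innertext);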
Posted on 2018-08-28 07:25:38
Just use a wildcard:
$newstr = preg_replace('/<td class="datatable__body__item" data-title=".*?">/', '', $str);
.*? means match anything, but not greedily, so the match stops at the first "> that follows.
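As a quick check, here is a small sketch that runs this pattern on the three sample rows from the question. The variable $unchanged is only there for illustration: it shows why the attempt in the question had no effect, since str_replace() treats its search argument as a literal string rather than as a regular expression.

```php
<?php
// Sample input modelled on the rows quoted in the question.
$str = <<<HTML
Producent:<td class="datatable__body__item" data-title="Producent">Example
Groep:<td class="datatable__body__item" data-title="Produkt groep">Example1
Type:<td class="datatable__body__item" data-title="Produkt type">Example2
HTML;

$pattern = '/<td class="datatable__body__item" data-title=".*?">/';

// str_replace() searches for the pattern text literally, so nothing is replaced.
$unchanged = str_replace($pattern, '', $str);

// preg_replace() interprets the same string as a regular expression.
$newstr = preg_replace($pattern, '', $str);

echo $newstr, PHP_EOL;
```

With the sample input this prints the three lines in the form Producent:Example, Groep:Example1 and Type:Example2.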
https://stackoverflow.com/questions/51902407