Q: Trouble with preg_replace() in a foreach loop

Stack Overflow user

Asked 2016-02-06 20:39:22

1 answer · 215 views · 0 followers · 2 votes

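Long story short, my client has lost access to their server because of a dispute, and they need all of their club photos so that I can build them a new website. I have to download the photos by URL, and they are served through PHP output that provides different sizes to reduce server load.

There are more than 3,000 of them, and I'm not going to waste time fetching them one by one.

So I decided to write a quick, very dirty PHP script that uses DOMDocument to crawl the pages and look for links to images, going through each album and then through each album's subpages.

Everything works except for this particular part of the script, which looks on the album pages for:

(1) links to images, which look like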

<div class='imagethumb'>
    <a href="/gallery/index.php?album=blowout1&image=blahblah.jpg" title="Blahblah">
        <img src="/gallery/index.php?album=blowout1&image=blahblah_thumb.jpg" />
    </a>
</div>

(2) links to subsequent pages, which look like

<li>
    <a href="/gallery/index.php?album=beginning&amp;page=2" title="Page 2">2</a>
</li>

(3) the album's "last page", or "...", link

<li>
    <a href="/gallery/index.php?album=recognition&page=9" title="Page 9">...</a>
</li>

Here is the relevant part of the script:

//$url is an argument in the function wrapping this script

//look on albums for links
foreach ($album_links as $a_url) {
    $album_html = file_get_contents($a_url['url']);
    $album = new DOMDocument;
    $album->loadHTML($album_html);
    $i_links = $album->getElementsByTagName('a');
    $album_title = $album->getElementsByTagName('title')->item(0)->textContent;

    //to keep track of the number of sub-page links found, exclude page 1
    $num_page_lnks = 1;

    //search through all links on the page, look for:
    foreach ($i_links as $link) {

        //Links contained in div with class='imagethumb'
        if ($link->parentNode->getAttribute('class') == 'imagethumb' ) {
            array_push($image_links, ["album" => str_replace(" | ", "", $album_title), "title" => $link->getAttribute('title'), "url" => "http://" . parse_url($url, PHP_URL_HOST) . $link->getAttribute('href') . "&p=*full-image"]);
        }

        //links contained in li with no class, has a page number in the title, and is not a "..." link
        elseif ($link->parentNode->getAttribute('class') == '' && preg_match('/Page\040\d*/', $link->getAttribute('title')) && $link->textContent != "...") {
            //add to the number of sub page links found
            $num_page_lnks++;
            array_push($image_page_links,  "http://" . parse_url($url, PHP_URL_HOST) . $link->getAttribute('href'));
        }

        //links containing the text "..." (link to last album page, if more than 7 pages)
        elseif($link->textContent == "...") {

            //Parse the url into parts
            $url_parse=[];
            parse_str($link->getAttribute('href'), $url_parse);

            //Last Page links appear when greater than 7 pages, so start at 8 ($num_page_links + 1)
            for ($count = ($num_page_lnks + 1); $count < ($url_parse['page'] + 1); $count++) {
                array_push($image_page_links,  "http://" . parse_url($url, PHP_URL_HOST) . preg_replace("/[^\=]\d+$/", $count, $link->getAttribute('href')));
            }
        }
    }
    unset($album);
    unset($album_html);
    unset($i_links);
}

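Whenever the script finds a subpage link, it increments $num_page_lnks, so that when it reaches a "..." link it knows where to start when generating the links for the intermediate pages.

It returns: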

{
    "0": "http://club.website.com/gallery/index.php?album=beginning&page=2",
    "1": "http://club.website.com/gallery/index.php?album=beginning&page=3",
    "2": "http://club.website.com/gallery/index.php?album=history&page=2",
    "3": "http://club.website.com/gallery/index.php?album=history&page=3",
    "4": "http://club.website.com/gallery/index.php?album=history&page=4",
    "5": "http://club.website.com/gallery/index.php?album=history&page=5",
    "6": "http://club.website.com/gallery/index.php?album=history&page=6",
    "7": "http://club.website.com/gallery/index.php?album=history&page=7",
    "8": "http://club.website.com/gallery/index.php?album=memorial&page=2",
    "9": "http://club.website.com/gallery/index.php?album=memorial&page=3",
    "10": "http://club.website.com/gallery/index.php?album=memorial&page=4",
    "11": "http://club.website.com/gallery/index.php?album=memorial&page=5",
    "12": "http://club.website.com/gallery/index.php?album=memorial&page=6",
    "13": "http://club.website.com/gallery/index.php?album=memorial&page=7",
    "14": "http://club.website.com/gallery/index.php?album=memorial&page=9",
    "15": "http://club.website.com/gallery/index.php?album=memorial&page=9",
    "16": "http://club.website.com/gallery/index.php?album=members&page=2",
    "17": "http://club.website.com/gallery/index.php?album=members&page=3",
    "18": "http://club.website.com/gallery/index.php?album=members&page=4",
    "19": "http://club.website.com/gallery/index.php?album=members&page=5",
    "20": "http://club.website.com/gallery/index.php?album=members&page=6",
    "21": "http://club.website.com/gallery/index.php?album=members&page=7",
    "22": "http://club.website.com/gallery/index.php?album=members&page=8",
    "23": "http://club.website.com/gallery/index.php?album=members&page=9",
    "24": "http://club.website.com/gallery/index.php?album=members&page=10",
    "25": "http://club.website.com/gallery/index.php?album=members&page=11",
    "26": "http://club.website.com/gallery/index.php?album=toy_run&page=2",
    "27": "http://club.website.com/gallery/index.php?album=toy_run&page=3",
    "28": "http://club.website.com/gallery/index.php?album=toy_run&page=4",
    "29": "http://club.website.com/gallery/index.php?album=toy_run&page=5",
    "30": "http://club.website.com/gallery/index.php?album=toy_run&page=6",
    "31": "http://club.website.com/gallery/index.php?album=toy_run&page=7",
    "32": "http://club.website.com/gallery/index.php?album=toy_run&page=8",
    "33": "http://club.website.com/gallery/index.php?album=recognition&page=2",
    "34": "http://club.website.com/gallery/index.php?album=recognition&page=3",
    "35": "http://club.website.com/gallery/index.php?album=recognition&page=4",
    "36": "http://club.website.com/gallery/index.php?album=recognition&page=5",
    "37": "http://club.website.com/gallery/index.php?album=recognition&page=6",
    "38": "http://club.website.com/gallery/index.php?album=recognition&page=7",
    "39": "http://club.website.com/gallery/index.php?album=recognition&page=9",
    "40": "http://club.website.com/gallery/index.php?album=recognition&page=9",
    "41": "http://club.website.com/gallery/index.php?album=blowout1&page=2",
    "42": "http://club.website.com/gallery/index.php?album=blowout1&page=3",
    "43": "http://club.website.com/gallery/index.php?album=blowout1&page=4",
    "44": "http://club.website.com/gallery/index.php?album=blowout1&page=5",
    "45": "http://club.website.com/gallery/index.php?album=blowout1&page=6",
    "46": "http://club.website.com/gallery/index.php?album=blowout1&page=7",
    "47": "http://club.website.com/gallery/index.php?album=blowout1&page=8",
    "48": "http://club.website.com/gallery/index.php?album=blowout1&page=9",
    "49": "http://club.website.com/gallery/index.php?album=blowout1&page=10"
}

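The number of subpages in this object is correct, but here is the problem:

When there are 7 or fewer album pages (6 subpages), the script works fine
When there are 8 album pages (7 subpages), the script works fine
When there are 9 album pages (8 subpages: the current page, pages 2 through 7, and a "..." link to the last page, 9), the script lists page 9 twice
When there are 10 or more album pages, there is no problem.

I can't figure out what I'm doing wrong.

Edit:

Here is the source markup that $i_links is built from: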

<ul class="pagelist">
    <li class="prev"><span class="disabledlink">« prev</span></li>
    <li class="current"><a href="/gallery/index.php?album=recognition" title="Page 1 (Current Page)">1</a></li>
    <li><a href="/gallery/index.php?album=recognition&amp;page=2" title="Page 2">2</a></li>
    <li><a href="/gallery/index.php?album=recognition&amp;page=3" title="Page 3">3</a></li>
    <li><a href="/gallery/index.php?album=recognition&amp;page=4" title="Page 4">4</a></li>
    <li><a href="/gallery/index.php?album=recognition&amp;page=5" title="Page 5">5</a></li>
    <li><a href="/gallery/index.php?album=recognition&amp;page=6" title="Page 6">6</a></li>
    <li><a href="/gallery/index.php?album=recognition&amp;page=7" title="Page 7">7</a></li>
   <li><a href="/gallery/index.php?album=recognition&amp;page=9" title="Page 9">...</a></li>
    <li class="next"><a href="/gallery/index.php?album=recognition&amp;page=2" title="Next Page">next »</a></li>
</ul>

preg-replace

domdocument

php

foreach

1 Answer

Stack Overflow user

Accepted answer

Answered 2016-02-06 22:18:31

The problem is in your last nested loop:

//Last Page links appear when greater than 7 pages, so start at 8 ($num_page_links + 1)
for ($count = ($num_page_lnks + 1); $count < ($url_parse['page'] + 1); $count++) {
     array_push($image_page_links,  "http://" . parse_url($url, PHP_URL_HOST) . preg_replace("/[^\=]\d+$/", $count, $link->getAttribute('href')));
 }

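When you reach the seventh sub-link (the one with the text content "..."), the $num_page_lnks variable has the value 7 and $url_parse['page'] has the value 9. So there will be two iterations, in which $count is assigned 8 and then 9.

But the generated links stay the same: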

"http://club.website.com/gallery/index.php?album=recognition&page=9"
"http://club.website.com/gallery/index.php?album=recognition&page=9"

because your regex pattern does not make the expected replacement.

var_dump(preg_replace("/[^\=]\d+$/",8,"/gallery/index.php?album=recognition&amp;page=9"));
// will output:
string(47) "/gallery/index.php?album=recognition&amp;page=9"
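
To see why nothing is replaced: `[^\=]` has to consume one character that is not `=`, and `\d+` then needs at least one more digit before the end of the string. With `page=9`, the only character in front of the digit is `=`, so the pattern cannot match and `preg_replace()` returns the subject unchanged. With a two-digit page number, the first digit satisfies `[^\=]` and the rest satisfies `\d+`, which would explain why albums with 10 or more pages come out right. A minimal sketch of that behaviour, using sample hrefs modelled on the question (the variable names are illustrative):

```php
<?php
// Pattern from the question: one non-"=" character, then digits, at end of string.
$pattern = '/[^\=]\d+$/';

// Sample hrefs modelled on the question's markup (illustrative values).
$hrefs = [
    '/gallery/index.php?album=recognition&page=9',
    '/gallery/index.php?album=members&page=11',
];

$results = [];
foreach ($hrefs as $href) {
    // preg_replace() returns the subject unchanged when nothing matches.
    $results[] = preg_replace($pattern, '8', $href);
}

print_r($results);
// [0] keeps "page=9"   (no match: "=" directly precedes the single digit)
// [1] becomes "page=8" (match: "11" is replaced as a whole)
```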

Change the regex pattern to the following: /\d+$/, or consider some other logic.

Votes: 1
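
Both options can be sketched roughly as follows, assuming the href has already been read from the "..." link and that the starting page is `$num_page_lnks + 1` as in the question. The helper names `pageUrlsByRegex()` and `pageUrlsByQuery()` are hypothetical and not part of the original script. The second variant is one possible reading of "some other logic": it rebuilds the query string instead of pattern-matching.

```php
<?php
// Option 1: the corrected pattern. /\d+$/ matches only the trailing digits,
// so it works for single-digit and multi-digit page numbers alike.
function pageUrlsByRegex(string $host, string $href, int $firstPage): array
{
    // Read the last page number from the query string of the "..." link.
    parse_str((string) parse_url($href, PHP_URL_QUERY), $query);
    $lastPage = (int) $query['page'];

    $urls = [];
    for ($count = $firstPage; $count <= $lastPage; $count++) {
        $urls[] = 'http://' . $host . preg_replace('/\d+$/', (string) $count, $href);
    }
    return $urls;
}

// Option 2: rebuild the query string, so the position of "page"
// within the URL does not matter.
function pageUrlsByQuery(string $host, string $href, int $firstPage): array
{
    $path = parse_url($href, PHP_URL_PATH);
    parse_str((string) parse_url($href, PHP_URL_QUERY), $query);
    $lastPage = (int) $query['page'];

    $urls = [];
    for ($count = $firstPage; $count <= $lastPage; $count++) {
        $query['page'] = $count;
        $urls[] = 'http://' . $host . $path . '?' . http_build_query($query, '', '&');
    }
    return $urls;
}

$href = '/gallery/index.php?album=recognition&page=9';
// $num_page_lnks is 7 when the "..." link is reached, so generation starts at 8.
$byRegex = pageUrlsByRegex('club.website.com', $href, 8);
$byQuery = pageUrlsByQuery('club.website.com', $href, 8);
print_r($byRegex);
print_r($byQuery);
```

Note that `http_build_query()` URL-encodes parameter values, so the second variant may produce different output from the first for album names containing characters that need encoding.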

Original page content provided by Stack Overflow.

Original link: https://stackoverflow.com/questions/35246198
