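This code runs fine in the terminal when I run the file like this:
$ php webcrawler.php
However, I'd like to know what I need to do to make it run against a URL specified on the command line:
$ php webcrawler.php http://samplesite.com
Here is the complete code I have so far: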
class Ga_track
{
function get_ga_implemented($url)
{
$options = array(
CURLOPT_RETURNTRANSFER => TRUE, // return web page
CURLOPT_HEADER => TRUE, // include response headers in the output
CURLOPT_ENCODING => "", // handle all encodings
CURLOPT_USERAGENT => "Mozilla/5.0 (Windows NT 6.1; WOW64)", // who am i
CURLOPT_SSL_VERIFYHOST => FALSE, //ssl verify host
CURLOPT_SSL_VERIFYPEER => FALSE, //ssl verify peer
CURLOPT_NOBODY => FALSE
);
$ch = curl_init($url);
curl_setopt_array($ch, $options);
//2> Grab content of the url using CURL
$content = curl_exec($ch);
$flag1_trackpage = false; // Flag for the phrase '_trackPageview'
$flag2_ga_js = false; // Flag for the phrase 'ga.js'
// Script Regex
$script_regex = "/<script\b[^>]*>([\s\S]*?)<\/script>/i";
// UA_ID Regex
$ua_regex = "/UA-[0-9]{5,}-[0-9]{1,}/";
preg_match_all($script_regex, $content, $inside_script);
for ($i = 0; $i < count($inside_script[0]); $i++)
{
if (stristr($inside_script[0][$i], "ga.js"))
$flag2_ga_js = TRUE;
if (stristr($inside_script[0][$i], "_trackPageview"))
$flag1_trackpage = TRUE;
}
preg_match_all($ua_regex, $content, $ua_id);
//6> Check whether all 3 word phrases are present or not.
if ($flag2_ga_js && $flag1_trackpage && count($ua_id[0]) > 0)
return ($ua_id);
else
return (NULL);
}
}
$ga_obj = new Ga_track();
$url = "http://www.samplesite.com";
$ua_id = $ga_obj->get_ga_implemented($url);
if ($ua_id == NULL)
{
echo "USING GA: NO\r\n";
}
else
{
echo "USING GA: YES\r\n";
}
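Posted on 2015-05-06 04:14:01
So, I found a solution for using a URL from the command line. I had to use the CLI and pass in arguments. I'm still having some trouble reading the content, but that's a separate question. Here is the updated code: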
class track {
function __construct($arg1, $arg2, $arg3) {
if(!$this->isCli()) die("Please use php-cli!");
if (!function_exists('curl_init')) die("Please install cURL!");
$this->hp = $arg1;
$this->rlevel = $arg2;
$this->rmax = $arg3;
}
function isCli() {
return php_sapi_name()==="cli";
}
function getContent() {
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $this->hp);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$content = curl_exec($ch);
curl_close($ch);
return $content;
}
function checkGA($content) {
$script_regex = "/<script\b[^>]*>([\s\S]*?)<\/script>/i";
preg_match_all($script_regex, $content, $inside_script);
for ($i = 0; $i < count($inside_script[0]); $i++)
{
// A single script block that mentions ga.js is enough.
if (stristr($inside_script[0][$i], "ga.js"))
return TRUE;
}
// No script block mentioned ga.js.
return NULL;
}
}
/*
*
* Echo Output
*
*/
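// The URL comes from the first command-line argument.
if ($argc < 2) die("Usage: php webcrawler.php <url> [rlevel] [rmax]\r\n");
// rlevel and rmax are not used yet; the 0 defaults are placeholders.
$track_obj = new track($argv[1], isset($argv[2]) ? $argv[2] : 0, isset($argv[3]) ? $argv[3] : 0);
$content = $track_obj->getContent();
$ga = $track_obj->checkGA($content);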
if ($ga == NULL) {
echo "USING GA: NO\r\n";
}
else {
echo "USING GA: YES\r\n";
}
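Since the whole point is taking the URL from the command line, here is a minimal sketch of reading and sanity-checking that argument before handing it to cURL. The helper name `url_from_args` is hypothetical and not part of the code above, and restricting the scheme to http/https is my own assumption about what the crawler should accept.

```php
<?php
// Hypothetical helper (not part of the original code): returns the URL passed
// as the first command-line argument, or null if it is missing or unusable.
function url_from_args(array $args)
{
    // $args[0] is the script name, so the URL is expected at index 1.
    if (!isset($args[1])) {
        return null;
    }

    $url = trim($args[1]);

    // FILTER_VALIDATE_URL requires a scheme, so "samplesite.com" alone is rejected.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return null;
    }

    // Assumption: only http and https URLs make sense for this crawler.
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    if (!in_array($scheme, array('http', 'https'), true)) {
        return null;
    }

    return $url;
}
```

In webcrawler.php this could be called as `$url = url_from_args($argv);`, printing a usage message and exiting when it returns null, so that a missing or mistyped argument fails early instead of producing an empty cURL response.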
https://stackoverflow.com/questions/30044101