我正在尝试想出一个函数,它可以很好地清理某些字符串,这样它们就可以安全地在URL中使用(就像post slug一样),也可以安全地用作文件名。例如,当有人上传文件时,我希望确保删除名称中的所有危险字符。
到目前为止,我已经想出了下面的函数,我希望它能解决这个问题,并允许国外的UTF-8数据。
/**
* Convert a string to the file/URL safe "slug" form
*
* @param string $string the string to clean
* @param bool $is_filename TRUE will allow additional filename characters
* @return string
*/
function sanitize($string = '', $is_filename = FALSE)
{
// Replace all weird characters with dashes
$string = preg_replace('/[^\w\-'. ($is_filename ? '~_\.' : ''). ']+/u', '-', $string);
// Only allow one dash separator at a time (and make string lowercase)
return mb_strtolower(preg_replace('/--+/u', '-', $string), 'UTF-8');
}
有没有人有什么棘手的样本数据可以用来对比一下--或者有没有更好的方法来保护我们的应用程序不受坏名声的影响?
$is-filename允许一些额外的字符,如临时vim文件
更新:删除了星号,因为我想不出一个有效的用法
发布于 2010-04-23 09:00:24
对您的解决方案的一些观察:
模式末尾的
"word“字符是任何字母或数字或下划线字符,也就是说,可以是Perl "word”的任何字符。字母和数字的定义由PCRE的字符表控制,如果发生特定于区域设置的匹配,则可能会有所不同。例如,在"fr“(法语)区域设置中,一些大于128的字符代码用于重音字母,这些字符代码由\w.
匹配
创建slug
你可能不应该在你的帖子段塞中包含重音等字符,因为从技术上讲,它们应该是百分比编码的(根据URL编码规则),这样你就会有丑陋的URL。
因此,如果我是你,在小写之后,我会将任何“特殊”字符转换为它们的等效字符(例如é-> e),并将非a-z字符替换为“-”,限制为单个“-”的运行,就像你所做的那样。这里有一个转换特殊字符的实现:https://web.archive.org/web/20130208144021/http://neo22s.com/slug
一般情况下的卫生
OWASP有一个企业安全API的PHP实现,其中包括用于安全编码和解码应用程序中的输入和输出的方法。
编码器接口提供:
canonicalize (string $input, [bool $strict = true])
decodeFromBase64 (string $input)
decodeFromURL (string $input)
encodeForBase64 (string $input, [bool $wrap = false])
encodeForCSS (string $input)
encodeForHTML (string $input)
encodeForHTMLAttribute (string $input)
encodeForJavaScript (string $input)
encodeForOS (Codec $codec, string $input)
encodeForSQL (Codec $codec, string $input)
encodeForURL (string $input)
encodeForVBScript (string $input)
encodeForXML (string $input)
encodeForXMLAttribute (string $input)
encodeForXPath (string $input)
https://github.com/OWASP/PHP-ESAPI https://www.owasp.org/index.php/Category:OWASP_Enterprise_Security_API
发布于 2010-04-20 00:06:15
我在Chyrp代码中发现了这个更大的函数:
/**
* Function: sanitize
* Returns a sanitized string, typically for URLs.
*
* Parameters:
* $string - The string to sanitize.
* $force_lowercase - Force the string to lowercase?
* $anal - If set to *true*, will remove all non-alphanumeric characters.
*/
function sanitize($string, $force_lowercase = true, $anal = false) {
$strip = array("~", "`", "!", "@", "#", "$", "%", "^", "&", "*", "(", ")", "_", "=", "+", "[", "{", "]",
"}", "\\", "|", ";", ":", "\"", "'", "‘", "’", "“", "”", "–", "—",
"—", "–", ",", "<", ".", ">", "/", "?");
$clean = trim(str_replace($strip, "", strip_tags($string)));
$clean = preg_replace('/\s+/', "-", $clean);
$clean = ($anal) ? preg_replace("/[^a-zA-Z0-9]/", "", $clean) : $clean ;
return ($force_lowercase) ?
(function_exists('mb_strtolower')) ?
mb_strtolower($clean, 'UTF-8') :
strtolower($clean) :
$clean;
}
这是wordpress代码中的一个
/**
* Sanitizes a filename replacing whitespace with dashes
*
* Removes special characters that are illegal in filenames on certain
* operating systems and special characters requiring special escaping
* to manipulate at the command line. Replaces spaces and consecutive
* dashes with a single dash. Trim period, dash and underscore from beginning
* and end of filename.
*
* @since 2.1.0
*
* @param string $filename The filename to be sanitized
* @return string The sanitized filename
*/
function sanitize_file_name( $filename ) {
$filename_raw = $filename;
$special_chars = array("?", "[", "]", "/", "\\", "=", "<", ">", ":", ";", ",", "'", "\"", "&", "$", "#", "*", "(", ")", "|", "~", "`", "!", "{", "}");
$special_chars = apply_filters('sanitize_file_name_chars', $special_chars, $filename_raw);
$filename = str_replace($special_chars, '', $filename);
$filename = preg_replace('/[\s-]+/', '-', $filename);
$filename = trim($filename, '.-_');
return apply_filters('sanitize_file_name', $filename, $filename_raw);
}
更新2012年9月
Alix Axel在这一领域做了一些令人难以置信的工作。他的函数框架包括几个很好的文本过滤器和转换。
发布于 2010-04-28 16:00:50
这将使您的文件名安全...
$string = preg_replace(array('/\s/', '/\.[\.]+/', '/[^\w_\.\-]/'), array('_', '.', ''), $string);
一个更深层次的解决方案是:
// Remove special accented characters - ie. sí.
$clean_name = strtr($string, array('Š' => 'S','Ž' => 'Z','š' => 's','ž' => 'z','Ÿ' => 'Y','À' => 'A','Á' => 'A','Â' => 'A','Ã' => 'A','Ä' => 'A','Å' => 'A','Ç' => 'C','È' => 'E','É' => 'E','Ê' => 'E','Ë' => 'E','Ì' => 'I','Í' => 'I','Î' => 'I','Ï' => 'I','Ñ' => 'N','Ò' => 'O','Ó' => 'O','Ô' => 'O','Õ' => 'O','Ö' => 'O','Ø' => 'O','Ù' => 'U','Ú' => 'U','Û' => 'U','Ü' => 'U','Ý' => 'Y','à' => 'a','á' => 'a','â' => 'a','ã' => 'a','ä' => 'a','å' => 'a','ç' => 'c','è' => 'e','é' => 'e','ê' => 'e','ë' => 'e','ì' => 'i','í' => 'i','î' => 'i','ï' => 'i','ñ' => 'n','ò' => 'o','ó' => 'o','ô' => 'o','õ' => 'o','ö' => 'o','ø' => 'o','ù' => 'u','ú' => 'u','û' => 'u','ü' => 'u','ý' => 'y','ÿ' => 'y'));
$clean_name = strtr($clean_name, array('Þ' => 'TH', 'þ' => 'th', 'Ð' => 'DH', 'ð' => 'dh', 'ß' => 'ss', 'Œ' => 'OE', 'œ' => 'oe', 'Æ' => 'AE', 'æ' => 'ae', 'µ' => 'u'));
$clean_name = preg_replace(array('/\s/', '/\.[\.]+/', '/[^\w_\.\-]/'), array('_', '.', ''), $clean_name);
这假设您需要在文件名中使用一个点。如果您想将其转换为小写,只需使用
$clean_name = strtolower($clean_name);
最后一行。
https://stackoverflow.com/questions/2668854
复制相似问题