Detecting quotes within quotes using RegEx

Content sourced from Stack Overflow; translated and used under the CC BY-SA 3.0 license.

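I'm looking for a way to detect and remove quotes that appear inside quotes, for example: "something "something something" something"

In the example above, the inner phrase something something (shown in italics in the original post) is itself wrapped in double quotes. I want to strip the quotes from that inner string while keeping the outer pair.

So the expression should look for a quoted string that contains some text, then another quoted piece of text, then more text, and remove the quotes wrapping that inner piece.

This is my current code (PHP):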

    preg_match_all('/".*(".*").*"/', $text, $matches);
    if(is_array($matches[0])){
        foreach($matches[0] as $match){
            $text = str_replace($match, '"' . str_replace('"', '', $match) . '"', $text);
        }
    }
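
For comparison, the structure described above (opening quote, text, a quoted phrase, more text, closing quote) can be written as a single preg_replace call. This is only a minimal sketch, not the asker's code: it assumes exactly one quoted phrase nested inside each outer pair, and the function name removeNestedQuotes is hypothetical.

```php
<?php

// Hypothetical helper: removes the quotes around ONE phrase nested
// inside an outer pair of double quotes.
function removeNestedQuotes(string $text): string
{
    // "  text  "  inner  "  text  "
    // group 1 = text before the inner phrase
    // group 2 = the inner phrase itself
    // group 3 = text after the inner phrase
    $pattern = '/"([^"]*)"([^"]*)"([^"]*)"/';

    // Keep the outer quotes, drop the two inner ones.
    // preg_replace returns null on error; fall back to the input then.
    return preg_replace($pattern, '"$1$2$3"', $text) ?? $text;
}

echo removeNestedQuotes('"something "something something" something"'), PHP_EOL;
// prints: "something something something something"
```

A pattern like this cannot tell a nested phrase from two separate quoted phrases: an input such as "a" and "b" has the same four-quote shape and would be merged into "a and b". It is only safe when the input is known to follow the nested form.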

Answer:

<?php

$data = <<<DATA
something "something "something something" something" something
DATA;

# set up the needed variables
$needle = '"';
$lastPos = 0;
$positions = array();

# find all quotes
while (($lastPos = strpos($data, $needle, $lastPos)) !== false) {
    $positions[] = $lastPos;
    $lastPos = $lastPos + strlen($needle);
}

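# remove every quote except the first and the last;
# go from right to left so the remaining offsets stay valid
# (assigning "" to a string offset does not delete the character)
if (count($positions) > 2) {
    for ($i = count($positions) - 2; $i >= 1; $i--) {
        $data = substr_replace($data, '', $positions[$i], strlen($needle));
    }
}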

# check the result
echo $data;
?>

This produces:

something "something something something something" something

Or even, wrapped in a class:

class unquote {
    # set up the needed variables
    private $data = "";
    private $needle = "";
    private $positions = array();

    public function cleanData($string = "", $needle = '"') {
        $this->data = $string;
        $this->needle = $needle;
        # reset, so the object can be reused for several strings
        $this->positions = array();
        $this->searchPositions();
        $this->replace();
        return $this->data;
    }

    private function searchPositions() {
        $lastPos = 0;
        # find all quotes
        while (($lastPos = strpos($this->data, $this->needle, $lastPos)) !== false) {
            $this->positions[] = $lastPos;
            $lastPos = $lastPos + strlen($this->needle);
        }
    }

    private function replace() {
        # remove every quote except the first and the last;
        # go from right to left so the remaining offsets stay valid
        if (count($this->positions) > 2) {
            for ($i = count($this->positions) - 2; $i >= 1; $i--) {
                $this->data = substr_replace($this->data, '', $this->positions[$i], strlen($this->needle));
            }
        }
    }
}

Then use it like this:

$q = new unquote();
$data = $q->cleanData($data);
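
The same idea (keep the first and the last quote, drop every quote in between) can also be sketched without collecting positions, by locating the outer quotes with strpos and strrpos. This is a minimal sketch under the same assumption as the answer, namely that the first and last quote in the string are the outer pair. The function name stripInnerQuotes is hypothetical, and $quote is assumed to be a non-empty string.

```php
<?php

// Hypothetical helper: keeps the first and the last quote
// and removes every quote in between.
function stripInnerQuotes(string $text, string $quote = '"'): string
{
    $first = strpos($text, $quote);
    $last  = strrpos($text, $quote);

    // Fewer than two quotes: nothing to do.
    if ($first === false || $first === $last) {
        return $text;
    }

    $start  = $first + strlen($quote);
    $head   = substr($text, 0, $start);               // up to and including the first quote
    $middle = substr($text, $start, $last - $start);  // text between the outer quotes
    $tail   = substr($text, $last);                   // last quote and everything after it

    return $head . str_replace($quote, '', $middle) . $tail;
}

echo stripInnerQuotes('something "something "something something" something" something'), PHP_EOL;
// prints: something "something something something something" something
```

Like the loop version, this treats the whole string as one quoted region, so it is not suitable for text that contains several independent quoted passages.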

Answer:

If the string starts with a " and the double quotes in the string are always balanced, you could use:

^"(*SKIP)(*F)|"([^"]*)"

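This matches a double quote at the start of the string and then discards that match using (*SKIP)(*F). Everywhere else it matches a ", captures everything that is not a " in a group, and then matches the closing ".

In the replacement you can use capture group 1, $1: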

$pattern = '/^"(*SKIP)(*F)|"([^"]+)"/';
$str = "\"something \"something something\" and then \"something\" something\"";
echo preg_replace($pattern, "$1", $str); 
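
To illustrate why the start-of-string condition matters, here is a small sketch that runs the same pattern on two inputs. The test strings are made up for illustration and are not from the original answer.

```php
<?php

// Pattern from the answer above: skip a quote at the very start of the
// string, otherwise match a quoted run and capture its content.
$pattern = '/^"(*SKIP)(*F)|"([^"]+)"/';

// Precondition met: the string starts with a double quote,
// so only the inner pair is removed.
$ok = preg_replace($pattern, '$1', '"a "b c" d"');
echo $ok, PHP_EOL;   // "a b c d"

// Precondition not met: the string starts with other text, so the
// outer quotes are paired up like any others and removed as well.
$bad = preg_replace($pattern, '$1', 'lead "a "b c" d" trail');
echo $bad, PHP_EOL;  // lead a b c d trail
```

If the input can have text before the opening quote, the position-based approach from the first answer may be the safer choice.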
