How about combining all the words into a single regex so that everything is replaced in one pass? I haven't measured the performance, but it might be faster.

E.g.

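<pre><code>$quoted = array_map(function ($w) { return preg_quote($w, '/'); }, $badwords); // escape regex metacharacters
$text = preg_replace('/(' . implode('|', $quoted) . ')/i', '', $text);
</code></pre>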

Define "slow"? Anything that's going to be processing 30,000 articles is probably going to take a bit of time to complete.

That said, one option (which I have not benchmarked, just tossing it out there for consideration) would be to combine the words into a regex and run that through preg_replace (just using the <code>|</code> operator to put them together).
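A minimal sketch of that idea, also unbenchmarked. The helper name `buildBadwordPattern` and the word list are placeholders, and the `\b` word boundaries are an added assumption (they stop a short word from matching inside a longer one); drop them if substring matches are wanted.

```php
<?php
// Hypothetical helper: builds one case-insensitive pattern from a word list.
// Assumes $badwords is non-empty; an empty list would yield a pattern
// that matches at every word boundary.
function buildBadwordPattern(array $badwords): string
{
    // Escape regex metacharacters (and the "/" delimiter) in each word.
    $quoted = array_map(function ($w) { return preg_quote($w, '/'); }, $badwords);
    // Join the words with | and wrap them in word boundaries.
    return '/\b(?:' . implode('|', $quoted) . ')\b/i';
}

$badwords = ['darn', 'heck'];               // placeholder list
$pattern  = buildBadwordPattern($badwords);

$text = 'Well, heck. That is a DARN shame.';

// Detection only: preg_match stops at the first hit.
$hasBadWords = preg_match($pattern, $text) === 1;

// Removal: the fifth argument receives the number of replacements made.
$cleaned = preg_replace($pattern, '', $text, -1, $count);
```

Building the pattern once and reusing it across all articles avoids repeating the `implode` work per article.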

I used to work at my local newspaper office. Instead of modifying the text to delete bad words from the original files, I just ran a filter whenever a user requested to view an article. This way you preserve the original text should you ever need it, but still serve a clean version to your viewers. There should be no need to process 30,000 articles at once, unless I am misunderstanding something.
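Roughly, that view-time approach could look like the sketch below. The function name `renderArticle` and the sample data are hypothetical, and `str_replace` stands in for whatever filter is actually used.

```php
<?php
// Hypothetical view-time filter: the stored article is never modified.
function renderArticle(string $storedText, array $badwords): string
{
    // str_replace returns a new string; $storedText itself is left untouched.
    return str_replace($badwords, '', $storedText);
}

$stored   = 'This darn feed item is stored verbatim.'; // stand-in for a database row
$badwords = ['darn'];                                   // placeholder list

// Cleaned copy for the reader; $stored still holds the original text.
echo renderArticle($stored, $badwords), PHP_EOL;
```

The cost is then paid per page view rather than per import, so only articles that are actually read get filtered.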

In case these previous questions are useful:

<ul>
<li><a href="https://stackoverflow.com/questions/273516/how-do-you-implement-a-good-profanity-filter">How do you implement a good
profanity filter?</a></li>
<li><a href="https://stackoverflow.com/questions/1020451/how-do-i-replace-bad-words-with-php">How do I replace bad words with
php?</a></li>
<li><a href="https://stackoverflow.com/questions/1327112/blacklist-of-words-on-content-to-filter-message">Blacklist of words on content to
filter message.</a></li>
<li><a href="https://stackoverflow.com/questions/1215929/trouble-with-simple-php-profanity-filter">Trouble with simple PHP profanity
filter</a></li>
</ul>

The subject is probably not as clear as it could be, but I was struggling to think of a better way to easily describe it.

I am implementing a bad-word filter on some articles that we pick up from an XML feed. At the moment I have the bad words in an array and simply check the text like so:

<pre><code>str_replace($badwords, '', $text, $count); 
if ($count &gt; 0) // We have bad words... 
</code></pre>

But this is SLOW! So slow! And when I am trying to process 30,000+ articles at a time, I start wondering if there is a better way to achieve this. If only strpos supported arrays! Even then I don't think it'd be faster...
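For reference, an array-aware strpos check can be approximated with a loop that stops at the first hit. This is only a sketch: `containsAny` is a made-up name, and it uses `stripos` on the assumption that matching should be case-insensitive. Whether it beats `str_replace` would need measuring.

```php
<?php
// Hypothetical helper: returns true as soon as any listed word occurs in the text.
function containsAny(string $text, array $words): bool
{
    foreach ($words as $word) {
        // stripos is the case-insensitive strpos. Compare against false,
        // because a match at offset 0 is falsy. Empty needles are skipped.
        if ($word !== '' && stripos($text, $word) !== false) {
            return true; // stop at the first hit
        }
    }
    return false;
}

$badwords = ['darn', 'heck']; // placeholder list

if (containsAny('Darn it all', $badwords)) {
    // We have bad words...
}
```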

I'd love any suggestions. Thanks in advance!

EDIT:

I have now timed a few methods using calls to microtime() around each run:
str_replace() = 990 seconds
preg_match() = 1029 seconds (remember, I only need to identify bad words, not replace them)
no bad-word filtering = 1057 seconds (presumably because it has another thousand or so bad-worded articles to process)
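The totals above differ by less than 7%, so they seem to be dominated by the rest of the processing rather than by the filter. A sketch of timing the filter step in isolation is shown here; `timeFilter`, the word list, and the synthetic articles are all placeholders, not the code that produced the numbers above.

```php
<?php
// Hypothetical timing helper: runs $fn over every article, returns elapsed seconds.
function timeFilter(callable $fn, array $articles): float
{
    $start = microtime(true); // passing true returns seconds as a float
    foreach ($articles as $article) {
        $fn($article);
    }
    return microtime(true) - $start;
}

$badwords = ['darn', 'heck'];                                // placeholder list
$articles = array_fill(0, 1000, 'Some darn article text.');  // synthetic data

// Detection via str_replace and its replacement counter.
$viaReplace = timeFilter(function ($t) use ($badwords) {
    str_replace($badwords, '', $t, $count);
    return $count > 0;
}, $articles);

// Detection via a single combined regex.
$viaRegex = timeFilter(function ($t) {
    return preg_match('/darn|heck/i', $t) === 1;
}, $articles);

printf("str_replace: %.4fs, preg_match: %.4fs\n", $viaReplace, $viaRegex);
```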

Thanks for all the answers, I will just stick with str_replace. :)

Fast way to match an array of words with a block of text?
