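I want to use Microsoft's Presidio library in Python to recognize a custom pattern. When I pass a regular expression, I get this error: AttributeError: 'str' object has no attribute 'regex'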

from presidio_analyzer import PatternRecognizer

regex = ("^[2-9]{1}[0-9]{3}\\" +
         "s[0-9]{4}\\s[0-9]{4}$")
#p = re.compile(regex)
aadhar_number_recognizer = PatternRecognizer(supported_entity="AADHAR_NUMBER",
                                             patterns=[regex])

Posted on 2021-04-19 16:35:22
PatternRecognizer takes a list of Pattern objects as its 'patterns' argument. You are passing a plain regex string.
It should be:

from presidio_analyzer import PatternRecognizer, Pattern

aadhar_number_recognizer = PatternRecognizer(
    supported_entity="AADHAR_NUMBER",
    deny_list=[],
    # Each entry must be a Pattern object (name, regex, score), not a bare string.
    patterns=[Pattern(name="AADHAR Number", score=0.8,
                      regex="(^[2-9]{1}[0-9]{3}\\s[0-9]{4}\\s[0-9]{4}$)")],
    context=[])

For further reference, you can look at how Presidio implements PatternRecognizer in its built-in recognizers.
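
To see why a bare string triggers the error, here is a minimal stdlib-only sketch. `PatternStandIn` and `find_matches` are hypothetical stand-ins written for illustration, not Presidio's actual code; the only assumption, taken from the error message itself, is that the recognizer reads a `.regex` attribute from each entry in `patterns`.

```python
import re
from dataclasses import dataclass


@dataclass
class PatternStandIn:
    """Hypothetical stand-in for a pattern object with name, regex and score."""
    name: str
    regex: str
    score: float


def find_matches(text, patterns):
    """Simplified matching loop (an assumption, not Presidio's implementation)."""
    results = []
    for pattern in patterns:
        # Attribute access happens here; a plain str has no .regex attribute.
        for match in re.finditer(pattern.regex, text):
            results.append((match.start(), match.end(), pattern.score))
    return results


aadhaar_regex = r"^[2-9]{1}[0-9]{3}\s[0-9]{4}\s[0-9]{4}$"
text = "2345 6789 0123"

# Works: the pattern object exposes .regex and .score.
ok = find_matches(text, [PatternStandIn("AADHAR Number", aadhaar_regex, 0.8)])
print(ok)

# Fails the same way as in the question: the list holds a bare string.
error_message = ""
try:
    find_matches(text, [aadhaar_regex])
except AttributeError as exc:
    error_message = str(exc)
print(error_message)
```

Wrapping the regex in an object that carries a name and a score alongside it is what lets the recognizer report a confidence value for each match.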
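
Posted on 2021-12-06 09:05:25

I was working on the same thing. Although I use a different regex, you can use the code below as a template.
In this example I use the basic sentence: "Hi, Java is awesome!"
Using a custom Presidio regex, it gets "anonymized" to: "Hi, Python is awesome!"
The code below is only an example; if all you want is to replace "Java" with "Python", there are simpler ways. This was just the first thing that came to mind. In real anonymization it makes more sense to replace "Java" or "Python" with something like a generic placeholder.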

from presidio_analyzer import PatternRecognizer, Pattern
from presidio_anonymizer import AnonymizerEngine
from presidio_anonymizer.entities import OperatorConfig

base_sentence = "Hi, Java is awesome!"

# Define the regex pattern in a Presidio `Pattern` object:
java_pattern = Pattern(name="java_pattern", regex="Java", score=0.5)

# Define the recognizer with one or more patterns
# (given its own name so it does not shadow the Pattern above):
java_recognizer = PatternRecognizer(supported_entity="JAVA", patterns=[java_pattern])

# Run the recognizer on the sentence, looking only for the custom entity:
java_pattern_result = java_recognizer.analyze(text=base_sentence, entities=["JAVA"])

print("Sentence:", base_sentence)
print("Found:", java_pattern_result)
print()

# Now anonymize
# Initialize the engine:
engine = AnonymizerEngine()

# Replace every JAVA entity found by the recognizer with the string "Python":
anonymize_result = engine.anonymize(
    text=base_sentence,
    analyzer_results=java_pattern_result,
    operators={"JAVA": OperatorConfig("replace", {"new_value": "Python"})})

print("Anonymized result:")
print(anonymize_result)

This prints the following:

Sentence: Hi, Java is awesome!
Found: [type: JAVA, start: 4, end: 8, score: 0.5]

Anonymized result:
text: Hi, Python is awesome!
items:
[
    {'start': 4, 'end': 10, 'entity_type': 'JAVA', 'text': 'Python', 'operator': 'replace'}
]

Posted on 2021-03-23 18:44:01
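
Try the following:

from presidio_analyzer import PatternRecognizer, Pattern

# The name and score below are illustrative values; `patterns` needs Pattern objects, not bare strings.
aadhar_number_recognizer = PatternRecognizer(
    supported_entity="AADHAR_NUMBER",
    deny_list=[],
    patterns=[Pattern(name="aadhar_number", score=0.8,
                      regex=r"(^[2-9]{1}[0-9]{3}\s[0-9]{4}\s[0-9]{4}$)")],
    context=[])

According to the interface documented at https://microsoft.github.io/presidio/api/analyzer_python/#patternrecognizer
patterns, deny_list and context are all parameters of the PatternRecognizer constructor; patterns expects Pattern objects, and at least one of patterns or deny_list has to be supplied.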
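
One detail worth checking in regex literals like the one above is backslash escaping, since the same regex reads differently in a normal string and in a raw string. The sketch below uses only Python's `re` module; the sample strings are made up for illustration and are not real identifiers.

```python
import re

sample = "2345 6789 0123"

# In a normal string, "\\s" is an escaped backslash, so the regex engine sees \s.
normal_escaped = "^[2-9]{1}[0-9]{3}\\s[0-9]{4}\\s[0-9]{4}$"
# In a raw string, a single backslash is enough for the same regex.
raw_single = r"^[2-9]{1}[0-9]{3}\s[0-9]{4}\s[0-9]{4}$"
# In a raw string, "\\s" stays two backslash characters plus "s":
# the regex then looks for a literal backslash followed by the letter s.
raw_double = r"^[2-9]{1}[0-9]{3}\\s[0-9]{4}\\s[0-9]{4}$"

same_regex = normal_escaped == raw_single
matches_single = re.search(raw_single, sample) is not None
matches_double = re.search(raw_double, sample) is not None

# The ^ and $ anchors only allow a match when the number is the entire text.
embedded = "My number is 2345 6789 0123, thanks"
matches_embedded = re.search(raw_single, embedded) is not None

print(same_regex, matches_single, matches_double, matches_embedded)
```

If the number is expected to appear inside longer text, dropping the anchors or using word boundaries such as `\b` may be a better fit; whether that suits a given dataset is a judgment call.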
https://stackoverflow.com/questions/66761052