I am writing a regex to find the lines in a text file that contain certain Unicode characters.

!Regex.IsMatch(colCount.line, @"^[\p{IsBasicLatin}\p{IsLatinExtended-A}\p{IsLatinExtended-B}]+$")

Below is the complete code I have written:

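var _fileName = @"C:\text.txt";
BadLinesLst = File
    .ReadLines(_fileName, Encoding.UTF8)
    .Select((line, index) =>
    {
        // Column count = number of delimiters + 1.
        var count = line.Count(c => Delimiter == c) + 1;
        // Set the expected column count while NumberOfColumns is still negative.
        if (NumberOfColumns < 0)
            NumberOfColumns = count;
        return new
        {
            line = line,
            count = count,
            index = index
        };
    })
    // Keep lines with the wrong column count, or with a character outside the three Latin blocks.
    .Where(colCount => colCount.count != NumberOfColumns
        || Regex.IsMatch(colCount.line, @"[^\p{IsBasicLatin}\p{IsLatinExtended-A}\p{IsLatinExtended-B}]"))
    .Select(colCount => colCount.line)
    .ToList();

The file contains the following lines:
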
264162-03,66,JITK,2007,12,874.000 ,0.000 ,0.000
6420œ50-00,67,JITK,2007,12,2292.000 ,0.000 ,0.000
4804元75-00,67,JITK,2007,12,1810.000 ,0.000 ,0.000
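
If a line in the file contains any character outside BasicLatin, LatinExtended-A, or LatinExtended-B, I need to get that line. The regex above does not work properly: it also returns the lines that contain LatinExtended-A or LatinExtended-B characters.
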
Posted on 2016-06-23 09:19:54

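Just put the Unicode block classes into a negated character class.

if (Regex.IsMatch(colCount.line,
        @"[^\p{IsBasicLatin}\p{IsLatinExtended-A}\p{IsLatinExtended-B}]"))
{ /* Do sth here */ }

This regex finds partial matches, because Regex.IsMatch looks for the pattern anywhere inside a larger string. The pattern matches any single character that does not belong to the \p{IsBasicLatin}, \p{IsLatinExtended-A}, or \p{IsLatinExtended-B} Unicode blocks, so one such character is enough for the line to be reported.
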
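As a rough illustration, the negated class can be tried against the three sample lines from the question. This is only a sketch: the class name LatinBlockCheck and the method name HasCharOutsideLatinBlocks are made up for this example and do not appear in the original code.

```csharp
using System;
using System.Text.RegularExpressions;

static class LatinBlockCheck
{
    // Matches one character that lies outside all three Latin blocks.
    static readonly Regex OutsideLatin = new Regex(
        @"[^\p{IsBasicLatin}\p{IsLatinExtended-A}\p{IsLatinExtended-B}]");

    // Hypothetical helper: true if the line has at least one such character.
    public static bool HasCharOutsideLatinBlocks(string line)
    {
        return OutsideLatin.IsMatch(line);
    }

    static void Main()
    {
        string[] lines =
        {
            "264162-03,66,JITK,2007,12,874.000 ,0.000 ,0.000",
            "6420œ50-00,67,JITK,2007,12,2292.000 ,0.000 ,0.000",
            "4804元75-00,67,JITK,2007,12,1810.000 ,0.000 ,0.000"
        };

        for (int i = 0; i < lines.Length; i++)
        {
            // Prints False, False, True for the three lines:
            // œ (U+0153) is in Latin Extended-A, 元 (U+5143) is a CJK ideograph.
            Console.WriteLine("line {0}: {1}", i + 1, HasCharOutsideLatinBlocks(lines[i]));
        }
    }
}
```

Only the third line is flagged, which matches what the question asks for: the line with œ is accepted because that character belongs to Latin Extended-A.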
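You may also want to check the following code:

if (Regex.IsMatch(colCount.line,
        @"^[^\p{IsBasicLatin}\p{IsLatinExtended-A}\p{IsLatinExtended-B}]*$"))
{ /* Do sth here */ }

This returns true if the whole colCount.line string contains no characters from the 3 Unicode blocks listed in the negated character class, or if the string is empty (to reject empty strings, replace * with +).
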
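To make the difference from the partial match concrete, here is a small sketch of the anchored pattern. The class name WholeLineCheck, the method name IsEntirelyOutsideLatinBlocks, and the test string "元元" are assumptions introduced for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

static class WholeLineCheck
{
    // Anchored pattern: every character of the string must lie outside the three blocks.
    static readonly Regex OnlyOutsideLatin = new Regex(
        @"^[^\p{IsBasicLatin}\p{IsLatinExtended-A}\p{IsLatinExtended-B}]*$");

    // Hypothetical helper wrapping the anchored check.
    public static bool IsEntirelyOutsideLatinBlocks(string line)
    {
        return OnlyOutsideLatin.IsMatch(line);
    }

    static void Main()
    {
        // Contains ASCII digits and commas, so the anchored pattern fails: False
        Console.WriteLine(IsEntirelyOutsideLatinBlocks("4804元75-00,67,JITK,2007,12,1810.000 ,0.000 ,0.000"));
        // Consists only of CJK characters: True
        Console.WriteLine(IsEntirelyOutsideLatinBlocks("元元"));
        // The empty string matches because of the * quantifier: True
        Console.WriteLine(IsEntirelyOutsideLatinBlocks(""));
    }
}
```

Since every sample line in the question contains ASCII digits and commas, the anchored form would reject all of them, so the unanchored negated class is probably the one that fits the question's goal.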
https://stackoverflow.com/questions/37987284