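// Imports needed by all of the examples below
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternTest {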
private static Pattern pattern = Pattern.compile("Ben");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("Hello,my name is Ben.");
boolean result = matcher.find();
if (result) {
System.out.println(matcher.groupCount());
for (int i = 0;i <= matcher.groupCount();i++) {
System.out.println(matcher.group(i));
}
}
}
}
Result:
0
Ben
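`groupCount()` reports how many capturing groups the pattern contains, and `group(i)` returns the text captured by group `i`, with group 0 being the whole match. The following is a minimal sketch of that relationship; the class name `GroupDemo`, the helper `groupsOfFirstMatch`, and the pattern `(B)(en)` are illustrative assumptions, not part of the original examples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper class; the names here are hypothetical.
class GroupDemo {
    // Returns group 0 (the whole match) followed by every capturing group
    // of the first match, or an empty list when nothing matches.
    static List<String> groupsOfFirstMatch(String regex, String input) {
        List<String> groups = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (matcher.find()) {
            for (int i = 0; i <= matcher.groupCount(); i++) {
                groups.add(matcher.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // No parentheses in the pattern: groupCount() is 0, only group 0 exists.
        System.out.println(groupsOfFirstMatch("Ben", "Hello,my name is Ben."));
        // Two capturing groups: group 0 plus groups 1 and 2.
        System.out.println(groupsOfFirstMatch("(B)(en)", "Hello,my name is Ben."));
    }
}
```

With no parentheses in the pattern the list holds only the whole match; each pair of parentheses adds one more entry.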
public class PatternTest {
private static Pattern pattern = Pattern.compile("Be.");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("Hello,my name is Ben.");
boolean result = matcher.find();
if (result) {
System.out.println(matcher.groupCount());
for (int i = 0;i <= matcher.groupCount();i++) {
System.out.println(matcher.group(i));
}
}
}
}
Result:
0
Ben
public class PatternTest {
private static Pattern pattern = Pattern.compile("Be..");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("Hello,my name is Ben.");
boolean result = matcher.find();
if (result) {
System.out.println(matcher.groupCount());
for (int i = 0;i <= matcher.groupCount();i++) {
System.out.println(matcher.group(i));
}
}
}
}
Result:
0
Ben.
Now suppose the pattern should match three values:
public class PatternTest {
private static Pattern pattern = Pattern.compile(".e.");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("Hello,my name is Ben.");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
Result (the second match is "me" plus the space that follows it):
Hel
me
Ben
Now I only want Hel and Ben:
public class PatternTest {
private static Pattern pattern = Pattern.compile("[HB]e.");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("Hello,my name is Ben.");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
Result:
Hel
Ben
Suppose we have the string "x1.xml s2.xml f3.xml dd.xml d5.xml" and want to match the .xml names that start with s or d:
public class PatternTest {
private static Pattern pattern = Pattern.compile("[sd].\\.xml");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("x1.xml s2.xml f3.xml dd.xml d5.xml");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
Result:
s2.xml
dd.xml
d5.xml
Now the requirement changes: I only want the .xml names whose middle character is a digit, so the pattern becomes:
public class PatternTest {
private static Pattern pattern = Pattern.compile("[sd][0123456789]\\.xml");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("x1.xml s2.xml f3.xml dd.xml d5.xml");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
Result:
s2.xml
d5.xml
Of course, [0123456789] can be shortened to [0-9]:
public class PatternTest {
private static Pattern pattern = Pattern.compile("[sd][0-9]\\.xml");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("x1.xml s2.xml f3.xml dd.xml d5.xml");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
Result:
s2.xml
d5.xml
And [0-9] can in turn be written as \d (typed as \\d inside a Java string literal):
public class PatternTest {
private static Pattern pattern = Pattern.compile("[sd]\\d\\.xml");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("x1.xml s2.xml f3.xml dd.xml d5.xml");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
Result:
s2.xml
d5.xml
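Two layers of escaping are involved in these patterns: the regex engine needs `\d` and `\.`, and the Java compiler needs each backslash doubled inside a string literal. The sketch below illustrates this under the assumption of a small hypothetical helper, `findAll`, in a class named `EscapeDemo`; neither name comes from the original examples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper class; the names here are hypothetical.
class EscapeDemo {
    // Collects every match of regex in input, in order of appearance.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "x1.xml s2.xml f3.xml dd.xml d5.xml";
        // The Java literal "[sd]\\d\\.xml" is the regex [sd]\d\.xml
        System.out.println("[sd]\\d\\.xml");
        // \d and [0-9] select the same names for this input.
        System.out.println(findAll("[sd]\\d\\.xml", input));
        System.out.println(findAll("[sd][0-9]\\.xml", input));
        // An unescaped dot matches any character, so "s2-xml" slips through.
        System.out.println(findAll("[sd]\\d.xml", "s2-xml s2.xml"));
        System.out.println(findAll("[sd]\\d\\.xml", "s2-xml s2.xml"));
    }
}
```

Printing the literal is a quick way to see the regex the engine actually receives.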
Changing the requirement once more: I only want the .xml names whose middle character is a letter:
public class PatternTest {
private static Pattern pattern = Pattern.compile("[sd][a-z]\\.xml");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("x1.xml s2.xml f3.xml dd.xml d5.xml");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
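Result:
dd.xml
To accept either a digit or a lowercase letter in the middle position, combine both ranges in one class: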
public class PatternTest {
private static Pattern pattern = Pattern.compile("[sd][0-9a-z]\\.xml");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("x1.xml s2.xml f3.xml dd.xml d5.xml s#.xml");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
Result:
s2.xml
dd.xml
d5.xml
For this input, [0-9a-z] can be replaced by \w:
public class PatternTest {
private static Pattern pattern = Pattern.compile("[sd]\\w\\.xml");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("x1.xml s2.xml f3.xml dd.xml d5.xml s#.xml");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
Result:
s2.xml
dd.xml
d5.xml
Note that \w covers not only letters and digits but also the underscore _:
public class PatternTest {
private static Pattern pattern = Pattern.compile("[sd]\\w\\.xml");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("x1.xml s2.xml f3.xml dd.xml d5.xml s#.xml s_.xml");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
Result:
s2.xml
dd.xml
d5.xml
s_.xml
So when the underscore _ must not match, do not use \w; use [0-9a-zA-Z] instead (uppercase letters are included here).
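The difference between `\w` and an explicit class is easiest to see side by side. This is a small sketch under assumed names (`WordClassDemo` and `findAll` are hypothetical helpers, and the input string is shortened for illustration).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper class; the names here are hypothetical.
class WordClassDemo {
    // Collects every match of regex in input, in order of appearance.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "s2.xml dd.xml s_.xml s#.xml";
        // \w accepts letters, digits and the underscore.
        System.out.println(findAll("[sd]\\w\\.xml", input));
        // The explicit class leaves the underscore out.
        System.out.println(findAll("[sd][0-9a-zA-Z]\\.xml", input));
    }
}
```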
Now take the string "x1.xml s2.xml f3.xml dd.xml d5.xml s#.xml". I want the .xml names that start with s or d and whose middle character is not a letter:
public class PatternTest {
private static Pattern pattern = Pattern.compile("[sd][^a-z]\\.xml");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("x1.xml s2.xml f3.xml dd.xml d5.xml s#.xml");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
Result:
s2.xml
d5.xml
s#.xml
Likewise, I can ask for the .xml names whose middle character is not a digit:
public class PatternTest {
private static Pattern pattern = Pattern.compile("[sd][^0-9]\\.xml");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("x1.xml s2.xml f3.xml dd.xml d5.xml s#.xml");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
Result:
dd.xml
s#.xml
[^0-9] can also be written as \D:
public class PatternTest {
private static Pattern pattern = Pattern.compile("[sd]\\D\\.xml");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("x1.xml s2.xml f3.xml dd.xml d5.xml s#.xml");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
Result:
dd.xml
s#.xml
If I want neither a letter nor a digit:
public class PatternTest {
private static Pattern pattern = Pattern.compile("[sd][^0-9a-z]\\.xml");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("x1.xml s2.xml f3.xml dd.xml d5.xml s#.xml");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
Result:
s#.xml
For this input, [^0-9a-z] can also be written as \W:
public class PatternTest {
private static Pattern pattern = Pattern.compile("[sd]\\W\\.xml");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("x1.xml s2.xml f3.xml dd.xml d5.xml s#.xml");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
Result:
s#.xml
Note that \W excludes not only letters and digits but also the underscore _:
public class PatternTest {
private static Pattern pattern = Pattern.compile("[sd]\\W\\.xml");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("x1.xml s2.xml f3.xml dd.xml d5.xml s#.xml s_.xml");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
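Result:
s#.xml
So to exclude only letters and digits while still allowing the underscore _, keep using a negated class such as [^0-9a-zA-Z] (uppercase letters are included here). Only a leading ^ negates the class; a ^ anywhere else inside the brackets is a literal caret.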
public class PatternTest {
private static Pattern pattern = Pattern.compile("[sd][^0-9a-z]\\.xml");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("x1.xml s2.xml f3.xml dd.xml d5.xml s#.xml s_.xml");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
Result:
s#.xml
s_.xml
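One detail of negated classes is worth checking in code: a caret negates only when it is the first character inside the brackets. A class written as `[^0-9^a-z]` therefore also excludes the literal `^` character. The sketch below demonstrates this with an assumed input that contains `s^.xml`; the class name `CaretDemo` and the helper `findAll` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper class; the names here are hypothetical.
class CaretDemo {
    // Collects every match of regex in input, in order of appearance.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "s#.xml s_.xml s^.xml s2.xml dd.xml";
        // One leading caret: anything except a digit or lowercase letter.
        System.out.println(findAll("[sd][^0-9a-z]\\.xml", input));
        // The second caret is a literal, so "s^.xml" is now excluded too.
        System.out.println(findAll("[sd][^0-9^a-z]\\.xml", input));
    }
}
```

For inputs without a `^` character both forms behave the same, which is why the difference is easy to miss.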
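[] and . are regex metacharacters, so they cannot be matched literally as written; they have to be escaped.
Take the JavaScript snippet "var myArray = new Array();if (myArray[0] == 0) {" as an example. We want to match the array index [0]. If we write the pattern like this: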
public class PatternTest {
private static Pattern pattern = Pattern.compile("[0]");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("var myArray = new Array();if (myArray[0] == 0) {");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
Result:
0
0
it only matches the digit 0, not [0] itself, so the pattern has to be changed as follows:
public class PatternTest {
private static Pattern pattern = Pattern.compile("\\[0\\]");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("var myArray = new Array();if (myArray[0] == 0) {");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
Result:
[0]
To match every indexed array access, match any digit between the brackets:
public class PatternTest {
private static Pattern pattern = Pattern.compile("\\[[0-9]\\]");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("var myArray = new Array();if (myArray[0] == 0)" +
" { myArray[1] = 1;");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
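Result:
[0]
[1]
Likewise, \ is itself a regex metacharacter. To match a literal backslash the regex is \\, which is typed as "\\\\" in a Java string literal: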
public class PatternTest {
private static Pattern pattern = Pattern.compile("\\\\");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("\\home\\ben\\sales\\");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
Result:
\
\
\
\
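When a whole piece of text should be matched literally, `Pattern.quote` can do the escaping instead of hand-written backslashes. The sketch below compares the two approaches and also counts literal backslashes; the class name `QuoteDemo`, the helper `findAll`, and the sample inputs (including the two-digit index `[12]` and the path `\home\ben\sales\`) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper class; the names here are hypothetical.
class QuoteDemo {
    // Collects every match of regex in input, in order of appearance.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String js = "var myArray = new Array();if (myArray[0] == 0) { myArray[12] = 1;";
        // Pattern.quote turns "[0]" into a literal pattern, brackets included.
        System.out.println(findAll(Pattern.quote("[0]"), js));
        // Hand-escaped brackets with exactly one digit between them.
        System.out.println(findAll("\\[[0-9]\\]", js));
        // Adding + after the class also accepts multi-digit indexes.
        System.out.println(findAll("\\[[0-9]+\\]", js));
        // Regex \\ (Java literal "\\\\") matches one literal backslash.
        System.out.println(findAll("\\\\", "\\home\\ben\\sales\\").size());
    }
}
```

`Pattern.quote` is convenient when the literal text comes from user input, while manual escapes are needed as soon as the pattern mixes literals with metacharacters.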
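The whitespace characters discussed here are not the ordinary space but the following special characters:
Metacharacter | Description |
---|---|
[\b] | Backspace: move back (and delete) one character |
\f | Form feed |
\n | Line feed (newline) |
\r | Carriage return |
\t | Tab |
\v | Vertical tab |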
public class PatternTest {
private static Pattern pattern = Pattern.compile("\\r\\n\\tand");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("you are right\r\n\tand good");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
System.out.println("you are right\r\n\tand good");
}
}
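Result (the first println prints the match, which starts with a line break and a tab; the second prints the whole input):

    and
you are right
    and good
\s can stand in for any one of these whitespace characters: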
public class PatternTest {
private static Pattern pattern = Pattern.compile("\\s\\s\\sand");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("you are right\r\n\tand good");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
System.out.println("you are right\r\n\tand good");
}
}
Result:

    and
you are right
    and good
\S stands for any single non-whitespace character (whitespace here includes the ordinary space):
public class PatternTest {
private static Pattern pattern = Pattern.compile("\\Snd");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("you are right\r\n\tand good");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
System.out.println("you are right\r\n\tand good");
}
}
Result:
and
you are right
    and good
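Because whitespace matches are invisible when printed, it can help to render them with visible escapes. The sketch below does that for `\s` and `\S`; the class name `WhitespaceDemo` and the helpers `findAll` and `visible` are hypothetical names introduced only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper class; the names here are hypothetical.
class WhitespaceDemo {
    // Collects every match of regex in input, in order of appearance.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    // Replaces CR, LF and TAB with printable escapes so matches can be inspected.
    static String visible(String text) {
        return text.replace("\r", "\\r").replace("\n", "\\n").replace("\t", "\\t");
    }

    public static void main(String[] args) {
        String input = "you are right\r\n\tand good";
        // Three whitespace characters of any kind, then "and".
        for (String match : findAll("\\s\\s\\sand", input)) {
            System.out.println(visible(match));
        }
        // One non-whitespace character followed by "nd".
        System.out.println(findAll("\\Snd", input));
        // \s also matches the ordinary space, not only control characters.
        System.out.println(findAll("\\s", "a b\tc").size());
    }
}
```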
Matching the letter a by its hexadecimal code 0x61:
public class PatternTest {
private static Pattern pattern = Pattern.compile("\\x61..");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("you are 10 years");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
Result:
are
ars
Matching the letter a by its octal code 141 (written \0141 in the regex):
public class PatternTest {
private static Pattern pattern = Pattern.compile("\\0141..");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("you are 10 years");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
Result:
are
ars
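The hexadecimal and octal forms select exactly the same character, and Java's regex syntax also accepts a four-digit Unicode escape. The sketch below checks that all three spellings behave alike; the class name `CharCodeDemo` and the helper `findAll` are hypothetical, and the `\u0061` variant is an addition not shown in the original examples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper class; the names here are hypothetical.
class CharCodeDemo {
    // Collects every match of regex in input, in order of appearance.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "you are 10 years";
        // Hexadecimal: \x61 is the letter a.
        System.out.println(findAll("\\x61..", input));
        // Octal: \0141 is the same letter.
        System.out.println(findAll("\\0141..", input));
        // Unicode escape handled by the regex engine (note the doubled backslash).
        System.out.println(findAll("\\u0061..", input));
    }
}
```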
For example, matching an e-mail address:
public class PatternTest {
private static Pattern pattern = Pattern.compile("\\w+@\\w+\\.\\w+");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("my e-mail is boot@123.com");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
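Result:
boot@123.com
Here \\w+ matches one or more characters that are digits, letters, or the underscore _. The + is also a metacharacter, so matching a literal + requires the escape \+.
But what happens if the e-mail address is changed to ben.boot@123.ben.com?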
public class PatternTest {
private static Pattern pattern = Pattern.compile("\\w+@\\w+\\.\\w+");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("my e-mail is ben.boot@123.ben.com");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
Result:
boot@123.ben
That is not the e-mail address we want, so the regular expression has to be adjusted:
public class PatternTest {
private static Pattern pattern = Pattern.compile("[\\w.]+@[\\w.]+\\.\\w+");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("my e-mail is ben.boot@123.ben.com");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
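Result:
ben.boot@123.ben.com
[\w.]+ matches one or more characters that are letters, digits, underscores, or a dot. Inside a character class the dot is literal, so this is equivalent to [\w\.]+: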
public class PatternTest {
private static Pattern pattern = Pattern.compile("[\\w\\.]+@[\\w\\.]+\\.\\w+");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("my e-mail is ben.boot@123.ben.com");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
Result:
ben.boot@123.ben.com
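These patterns are deliberately simple and should be read as a teaching aid rather than a full e-mail validator. The sketch below compares the two patterns and shows one input that the looser class accepts even though it is not a plausible address; the class name `EmailDemo`, the helper `findAll`, and the input `..@...x` are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper class; the names here are hypothetical.
class EmailDemo {
    static final String STRICT = "\\w+@\\w+\\.\\w+";
    static final String LOOSE = "[\\w.]+@[\\w.]+\\.\\w+";

    // Collects every match of regex in input, in order of appearance.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "my e-mail is ben.boot@123.ben.com";
        // \w+ cannot cross a dot, so only part of the address is found.
        System.out.println(findAll(STRICT, text));
        // Allowing dots in the class captures the whole address.
        System.out.println(findAll(LOOSE, text));
        // The same class also accepts runs of dots, which real addresses do not allow.
        System.out.println(findAll(LOOSE, "..@...x"));
    }
}
```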
Now I have the string "@Mr.Li @@Mr.Li Mr.Li" and want to match all three forms. If the pattern is written like this:
public class PatternTest {
private static Pattern pattern = Pattern.compile("@+[\\w.]+");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("@Mr.Li @@Mr.Li Mr.Li");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
Result:
@Mr.Li
@@Mr.Li
Clearly it only matches the first two; the one without @ is not matched. After a change:
public class PatternTest {
private static Pattern pattern = Pattern.compile("@*[\\w.]+");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("@Mr.Li @@Mr.Li Mr.Li");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
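Result:
@Mr.Li
@@Mr.Li
Mr.Li
The result shows that, compared with +, * allows the preceding character to appear many times or not at all (zero times), whereas + requires it to appear at least once.
Now I want to match two URLs, one http and one https: "http://www.baidu.com/ https://www.baidu.com/".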
public class PatternTest {
private static Pattern pattern = Pattern.compile("https*://[\\w./]+");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("http://www.baidu.com/ https://www.baidu.com/");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
Result:
http://www.baidu.com/
https://www.baidu.com/
This matches both, but suppose the string also contains httpssssss://www.baidu.com/, which is not something I want:
public class PatternTest {
private static Pattern pattern = Pattern.compile("https*://[\\w./]+");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("http://www.baidu.com/ https://www.baidu.com/ httpssssss://www.baidu.com/");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
Result:
http://www.baidu.com/
https://www.baidu.com/
httpssssss://www.baidu.com/
After a change:
public class PatternTest {
private static Pattern pattern = Pattern.compile("https?://[\\w./]+");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("http://www.baidu.com/ https://www.baidu.com/ httpssssss://www.baidu.com/");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
Result:
http://www.baidu.com/
https://www.baidu.com/
The result shows that, compared with *, ? matches the preceding character zero times or once, while * matches it zero or more times.
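The three quantifiers can be compared on a single input. The sketch below applies `?`, `*`, and `+` to the same character; the class name `QuantifierDemo`, the helper `findAll`, and the shortened URLs are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper class; the names here are hypothetical.
class QuantifierDemo {
    // Collects every match of regex in input, in order of appearance.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "http://a.com https://a.com httpss://a.com";
        // ? : zero or one "s"
        System.out.println(findAll("https?://[\\w./]+", input));
        // * : zero or more "s"
        System.out.println(findAll("https*://[\\w./]+", input));
        // + : one or more "s"
        System.out.println(findAll("https+://[\\w./]+", input));
    }
}
```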
As we all know, an RGB color value is a 6-digit hexadecimal number. Given the string "#336633 #FFFFFF #1123FD335D", I want the first two RGB values; the third is not what we need:
public class PatternTest {
private static Pattern pattern = Pattern.compile("#[0-9a-zA-Z]+");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("#336633 #FFFFFF #1123FD335D");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
Result:
#336633
#FFFFFF
#1123FD335D
Clearly + also pulls in the third value. After a change:
public class PatternTest {
private static Pattern pattern = Pattern.compile("#[0-9a-zA-Z]{6}\\b");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("#336633 #FFFFFF #1123FD335D");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
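Result:
#336633
#FFFFFF
Note that without the trailing \b, #1123FD would also be matched; \b denotes a word boundary. #[0-9a-zA-Z]{6} means a # followed by exactly six characters from the letter-and-digit class. (A strict hexadecimal check would use [0-9a-fA-F] instead.)
Let's look at a date-matching example. The year must have 2 to 4 digits, and the input contains these formats: "4/8/03 10-6-2004 2/2/2 01-01-01".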
public class PatternTest {
private static Pattern pattern = Pattern.compile("\\d{1,2}[-/]\\d{1,2}[-/]\\d{2,4}");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("4/8/03 10-6-2004 2/2/2 01-01-01");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
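Result:
4/8/03
10-6-2004
01-01-01
Here \\d{1,2} means 1 to 2 digits and \\d{2,4} means 2 to 4 digits. Note that the lower bound inside {} may be 0, so ? is equivalent to {0,1}.
Suppose we have a list of amounts and need to match those of at least one hundred dollars: "$496.80 $1290.43 $24.25 $7.61 $414.32 $21.00".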
public class PatternTest {
private static Pattern pattern = Pattern.compile("\\$\\d{3,}\\.\\d{2}");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("$496.80 $1290.43 $24.25 $7.61 $414.32 $21.00");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
Result:
$496.80
$1290.43
$414.32
Here \\d{3,} means at least 3 digits, with no upper limit.
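The interval forms `{n}`, `{m,n}`, and `{n,}` can be exercised together. The sketch below tightens the color example to real hexadecimal digits and checks that `{0,1}` behaves like `?`; the class name `IntervalDemo`, the helper `findAll`, the extra input `#GGGGGG`, and the shortened URLs are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper class; the names here are hypothetical.
class IntervalDemo {
    // Collects every match of regex in input, in order of appearance.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String colors = "#336633 #FFFFFF #1123FD335D #GGGGGG";
        // Exactly six hexadecimal digits, then a word boundary.
        System.out.println(findAll("#[0-9a-fA-F]{6}\\b", colors));
        // The broader letter-and-digit class also lets "#GGGGGG" through.
        System.out.println(findAll("#[0-9a-zA-Z]{6}\\b", colors));
        // {0,1} is the interval spelling of ?.
        System.out.println(findAll("https{0,1}://[\\w./]+", "http://a.com https://a.com"));
        // {3,} : three or more digits before the decimal point.
        System.out.println(findAll("\\$\\d{3,}\\.\\d{2}", "$496.80 $24.25 $1290.43"));
    }
}
```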
An HTML file contains the fragment "<B>I like you</B> and <B>I love you</B>", and I want to match what lies between <B> and </B>.
public class PatternTest {
private static Pattern pattern = Pattern.compile("<B>.*</B>");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("<B>I like you</B> and <B>I love you</B>");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
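Result:
<B>I like you</B> and <B>I love you</B>
It matched "and" as well: the first <B> was paired with the last </B>. What we actually want is to match the tags pair by pair, without the "and" in the middle. After a change: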
public class PatternTest {
private static Pattern pattern = Pattern.compile("<B>.*?</B>");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("<B>I like you</B> and <B>I love you</B>");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
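Result:
<B>I like you</B>
<B>I love you</B>
The reason is that + and * are greedy metacharacters: when matching, they take as much as possible rather than stopping as soon as they can. Each has a lazy counterpart, formed by appending ? to the greedy form.
Greedy metacharacter | Lazy metacharacter |
---|---|
* | *? |
+ | +? |
{n,} | {n,}? |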
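Each row of the table can be checked on a tiny input. The sketch below runs a greedy quantifier and its lazy counterpart side by side; the class name `LazyDemo`, the helper `findAll`, and the inputs `aaa` and `aaaa` are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper class; the names here are hypothetical.
class LazyDemo {
    // Collects every match of regex in input, in order of appearance.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String html = "<B>I like you</B> and <B>I love you</B>";
        // Greedy .* runs to the last </B>; lazy .*? stops at the first one.
        System.out.println(findAll("<B>.*</B>", html));
        System.out.println(findAll("<B>.*?</B>", html));
        // Greedy + takes all three letters; lazy +? takes one at a time.
        System.out.println(findAll("a+", "aaa"));
        System.out.println(findAll("a+?", "aaa"));
        // Lazy {2,}? stops as soon as the minimum of two is reached.
        System.out.println(findAll("a{2,}", "aaaa"));
        System.out.println(findAll("a{2,}?", "aaaa"));
    }
}
```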
We said earlier that \b denotes a word boundary, but a lone - does not form a word:
public class PatternTest {
private static Pattern pattern = Pattern.compile("\\b-\\b");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("passkey color - coded");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
Running this prints nothing. To match the standalone -, change the pattern as follows:
public class PatternTest {
private static Pattern pattern = Pattern.compile("\\B-\\B");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("passkey color - coded");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
Result:
-
So \B matches at a position that is not a word boundary.
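Since both patterns match the same one-character text, the match position is what tells them apart. The sketch below records start offsets instead of matched text; the class name `BoundaryDemo`, the helper `matchStarts`, and the extended input containing `nine-digit` are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper class; the names here are hypothetical.
class BoundaryDemo {
    // Returns the start offset of every match of regex in input.
    static List<Integer> matchStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            starts.add(matcher.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String input = "passkey color - coded nine-digit";
        // \B-\B : a hyphen with no word character on either side (offset 14).
        System.out.println(matchStarts("\\B-\\B", input));
        // \b-\b : a hyphen squeezed between two word characters (offset 26).
        System.out.println(matchStarts("\\b-\\b", input));
        // \bcat\b only finds the standalone word, not "scattered" or "category".
        System.out.println(matchStarts("\\bcat\\b", "cat scattered category"));
        System.out.println(matchStarts("cat", "cat scattered category"));
    }
}
```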
Now we want to check whether a file's content is a valid MyBatis mapper XML file:
"<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n" +
"<!DOCTYPE mapper PUBLIC \"-//ibatis.apache.org//DTD Mapper 3.0//EN\"\n" +
"\t\t\"http://mybatis.org/dtd/mybatis-3-mapper.dtd\">"
If we only check like this:
public class PatternTest {
private static Pattern pattern = Pattern.compile("<\\?xml.*\\?>");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n" +
"<!DOCTYPE mapper PUBLIC \"-//ibatis.apache.org//DTD Mapper 3.0//EN\"\n" +
"\t\t\"http://mybatis.org/dtd/mybatis-3-mapper.dtd\">");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
Result:
<?xml version="1.0" encoding="UTF-8" ?>
What if some arbitrary characters are added at the front of the file content?
public class PatternTest {
private static Pattern pattern = Pattern.compile("<\\?xml.*\\?>");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("This is bad,real bad! <?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n" +
"<!DOCTYPE mapper PUBLIC \"-//ibatis.apache.org//DTD Mapper 3.0//EN\"\n" +
"\t\t\"http://mybatis.org/dtd/mybatis-3-mapper.dtd\">");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
Result:
<?xml version="1.0" encoding="UTF-8" ?>
The same declaration is still found, but the structure of the XML file is broken and it is no longer a valid XML file. Change the check as follows:
public class PatternTest {
private static Pattern pattern = Pattern.compile("^\\s*<\\?xml.*\\?>");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("This is bad,real bad! <?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n" +
"<!DOCTYPE mapper PUBLIC \"-//ibatis.apache.org//DTD Mapper 3.0//EN\"\n" +
"\t\t\"http://mybatis.org/dtd/mybatis-3-mapper.dtd\">");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
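Running this prints nothing, which shows that the content is not a valid XML file. When only whitespace precedes the declaration, the same pattern does match: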
public class PatternTest {
private static Pattern pattern = Pattern.compile("^\\s*<\\?xml.*\\?>");
public static void main(String[] args) {
Matcher matcher = pattern.matcher(" <?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n" +
"<!DOCTYPE mapper PUBLIC \"-//ibatis.apache.org//DTD Mapper 3.0//EN\"\n" +
"\t\t\"http://mybatis.org/dtd/mybatis-3-mapper.dtd\">");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
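Only when <?xml version="1.0" encoding="UTF-8" ?> sits at the start of the file can we say that this is a valid XML file; a few leading whitespace characters are still acceptable.
So ^ here acts as the start-of-string anchor.
There is, of course, a corresponding end anchor: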
public class PatternTest {
private static Pattern pattern = Pattern.compile("\"http:.*\\.dtd\">\\s*$");
public static void main(String[] args) {
Matcher matcher = pattern.matcher(" <?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n" +
"<!DOCTYPE mapper PUBLIC \"-//ibatis.apache.org//DTD Mapper 3.0//EN\"\n" +
"\t\t\"http://mybatis.org/dtd/mybatis-3-mapper.dtd\">");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
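Result:
"http://mybatis.org/dtd/mybatis-3-mapper.dtd">
Here \\s*$ anchors the match to the end of the string while allowing trailing whitespace.
If other (non-whitespace) characters are appended at the end, the match fails: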
public class PatternTest {
private static Pattern pattern = Pattern.compile("\"http:.*\\.dtd\">\\s*$");
public static void main(String[] args) {
Matcher matcher = pattern.matcher(" <?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n" +
"<!DOCTYPE mapper PUBLIC \"-//ibatis.apache.org//DTD Mapper 3.0//EN\"\n" +
"\t\t\"http://mybatis.org/dtd/mybatis-3-mapper.dtd\">This is bad,really bad");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
This time nothing is printed.
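The two anchored checks can be wrapped as small boolean helpers. The sketch below is one possible shape, assuming hypothetical method names (`startsWithXmlDeclaration`, `endsWithDtdReference`) and shortened sample content; it is not a complete XML or MyBatis validator.

```java
import java.util.regex.Pattern;

// Illustrative helper class; the names here are hypothetical.
class AnchorDemo {
    // ^ pins the match to the start of the input; \s* tolerates leading whitespace.
    private static final Pattern XML_DECLARATION = Pattern.compile("^\\s*<\\?xml.*\\?>");
    // $ pins the match to the end of the input; \s* tolerates trailing whitespace.
    private static final Pattern DTD_TAIL = Pattern.compile("\\.dtd\">\\s*$");

    static boolean startsWithXmlDeclaration(String content) {
        return XML_DECLARATION.matcher(content).find();
    }

    static boolean endsWithDtdReference(String content) {
        return DTD_TAIL.matcher(content).find();
    }

    public static void main(String[] args) {
        String good = "  <?xml version=\"1.0\" ?>\n<!DOCTYPE mapper \"x.dtd\">\n";
        String badStart = "junk <?xml version=\"1.0\" ?>\n<!DOCTYPE mapper \"x.dtd\">";
        String badEnd = "<?xml version=\"1.0\" ?>\n<!DOCTYPE mapper \"x.dtd\">oops";
        System.out.println(startsWithXmlDeclaration(good));     // leading spaces are fine
        System.out.println(startsWithXmlDeclaration(badStart)); // text before the declaration
        System.out.println(endsWithDtdReference(good));         // trailing newline is fine
        System.out.println(endsWithDtdReference(badEnd));       // text after the closing >
    }
}
```

Using `find()` with explicit anchors keeps the patterns identical to the ones in the examples above.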
Now I want to match every // comment line in a piece of code, together with the whitespace in front of the comment:
public class PatternTest {
private static Pattern pattern = Pattern.compile("(?m)^\\s*//.*$");
public static void main(String[] args) {
Matcher matcher = pattern.matcher("//这是一个开头\n" +
" public void print() {\n" +
" System.out.println(\"I am in Boot ClassLoader\");\n" +
" }\n" +
" //这是一个结尾");
List<String> list = new ArrayList<>();
while (matcher.find()) {
list.add(matcher.group());
}
list.stream().forEach(System.out::println);
}
}
Result:
//这是一个开头
 //这是一个结尾
With (?m) in front, ^ matches at the start of every line and $ matches at the end of every line.
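The inline `(?m)` flag has an equivalent in the API: passing `Pattern.MULTILINE` to `Pattern.compile`. The sketch below compares the anchored pattern with and without the flag, and also shows that `\s*` can swallow a preceding blank line, which `[ \t]*` avoids. The class name `MultilineDemo`, the helper `findAll`, and the sample source text are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper class; the names here are hypothetical.
class MultilineDemo {
    // Collects every match of an already compiled pattern in input.
    static List<String> findAll(Pattern pattern, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String code = "//head\n  int x = 1; // trailing\n  //tail";
        // With MULTILINE, ^ and $ work per line: both comment lines are found,
        // while the trailing comment is skipped because it does not start its line.
        System.out.println(findAll(Pattern.compile("^\\s*//.*$", Pattern.MULTILINE), code));
        // Without the flag, ^ and $ refer to the whole input, so nothing matches.
        System.out.println(findAll(Pattern.compile("^\\s*//.*$"), code));
        // \s* also matches line breaks, so a blank line becomes part of the match;
        // [ \t]* restricts the indentation to spaces and tabs.
        String withBlankLine = "a\n\n  //c";
        System.out.println(findAll(Pattern.compile("^\\s*//.*$", Pattern.MULTILINE), withBlankLine).get(0).length());
        System.out.println(findAll(Pattern.compile("^[ \\t]*//.*$", Pattern.MULTILINE), withBlankLine).get(0).length());
    }
}
```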