I'm writing some code to parse dates out of a very large dataset. I have the following regex to match different date variations:
"(((0?[1-9]|1[012])(/|-)(0?[1-9]|[12][0-9]|3[01])(/|-))|"
+"((january|february|march|april|may|june|july|august|september|october|november|december)"
+ "\\s*(0?[1-9]|[12][0-9]|3[01])(th|rd|nd|st)?,*\\s*))((19|20)\\d\\d)"
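which matches dates in the formats 'Month dd, yyyy', 'mm/dd/yyyy', and 'mm-dd-yyyy'. This works fine for those formats, but I'm now encountering dates in the European 'dd Month, yyyy' format. I tried adding (\d{1,2})? at the beginning of the regex and adding a ? quantifier after the current day-matching section of the regex, as such: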
"((\\d{1,2})?((0?[1-9]|1[012])(/|-)(0?[1-9]|[12][0-9]|3[01])(/|-))|"
+"((january|february|march|april|may|june|july|august|september|october|november|december)"
+ "\\s*(0?[1-9]|[12][0-9]|3[01])?(th|rd|nd|st)?,*\\s*))((19|20)\\d\\d)"
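But this doesn't quite work, because it sometimes captures numeric characters both before and after the month (e.g. '00 January 15, 2013') and sometimes neither ('January 2013'). Is there a way to ensure that exactly one of the two is captured?
Posted on 2014-08-02 02:32:06
Here is a Java implementation for your requirement (searching for dates in the input text):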
// requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
String input = "which matches dates of format 'january 31, 1976', '9/18/2013', "
+ "and '11-20-1988'. This works fine for those formats, but I'm now encountering dates "
+ "in the European '26th May, 2020' format. I tried adding (\\d{1,2})? at the "
+ "beginning of the regex and adding a ? quantifier after the current day matching section of the regex as such";
String months_t = "(january|february|march|april|may|june|july|august|september|october|november|december)";
String months_d = "(1[012]|0?[1-9])";
String days_d = "(3[01]|[12][0-9]|0?[1-9])"; // stricter than a plain "\\d{1,2}"
String year_d = "((19|20)\\d\\d)";
String days_d_a = "(" + days_d + "(th|rd|nd|st)?)";
// 'mm/dd/yyyy', and 'mm-dd-yyyy'
String regexp1 = "(" + months_d + "[/-]" + days_d + "[/-]"
+ year_d + ")";
// 'Month dd, yyyy', and 'dd Month, yyyy'
String regexp2 = "(((" + months_t + "\\s*" + days_d_a + ")|("
+ days_d_a + "\\s*" + months_t + "))[,\\s]+" + year_d + ")";
String regexp = "(?i)" + regexp1 + "|" + regexp2;
Pattern pMod = Pattern.compile(regexp);
Matcher mMod = pMod.matcher(input);
while (mMod.find()) {
    System.out.println(mMod.group(0));
}
The output is:
january 31, 1976
9/18/2013
11-20-1988
26th May, 2020
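The key idea in the answer is that "day before month" and "month before day" are written as two separate alternatives, so exactly one day is required in either order. As a minimal sketch of the same idea packaged for reuse, the class below adds two things that are my own assumptions and not part of the original answer: non-capturing groups, and `\b` word boundaries so that a match cannot start or end in the middle of a longer number or word. The names `DateFinder` and `findDates` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateFinder {
    // Building blocks, mirroring the answer's pieces but with non-capturing groups.
    private static final String MONTH_NAME =
            "(?:january|february|march|april|may|june|july|august"
            + "|september|october|november|december)";
    private static final String MONTH_NUM = "(?:1[012]|0?[1-9])";
    private static final String DAY = "(?:3[01]|[12][0-9]|0?[1-9])";
    private static final String DAY_ORD = DAY + "(?:th|rd|nd|st)?";
    private static final String YEAR = "(?:19|20)\\d\\d";

    // Three explicit alternatives; each one requires exactly one day component.
    // The \b anchors are an added assumption, not part of the original answer.
    private static final Pattern DATE = Pattern.compile(
            "\\b(?:"
            + MONTH_NUM + "[/-]" + DAY + "[/-]" + YEAR                 // mm/dd/yyyy, mm-dd-yyyy
            + "|" + MONTH_NAME + "\\s*" + DAY_ORD + "[,\\s]+" + YEAR   // Month dd, yyyy
            + "|" + DAY_ORD + "\\s*" + MONTH_NAME + "[,\\s]+" + YEAR   // dd Month, yyyy
            + ")\\b",
            Pattern.CASE_INSENSITIVE);

    // Returns every date-like substring found in the text, in order of appearance.
    static List<String> findDates(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DATE.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String sample = "00 January 15, 2013 and January 2013 and 26th May, 2020";
        for (String date : findDates(sample)) {
            System.out.println(date);
        }
    }
}
```

For the two problem cases from the question, this sketch should pick up only 'January 15, 2013' from '00 January 15, 2013' (the leading '00' is not a valid day) and nothing from 'January 2013' (no day is present). The word boundaries also keep an input such as '13/18/2013' from being reported as '3/18/2013'.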
https://stackoverflow.com/questions/24916882