
Regex for strings near other strings?

Stack Overflow user
Asked 2018-09-11 02:11:02

I want to write a flexible regular expression for grep that returns search terms found within a certain distance of each other.

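The desired behavior is like that of a research database: for example, you can search for articles in which `capital` and `GDP` occur within 15 words of each other, which would include articles where `capital` and `GDP` are separated by 5, 6, 7, etc. alphanumeric strings of unspecified length. The regex would need to include punctuation (e.g., commas, periods, hyphens), but also accents and diacritics. So the results would be places where `chechè` and `lavi` are separated by no more than five strings.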

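I imagine the expression would involve lookaheads and a quantifier like `{1,15}`, or perhaps piping one grep into another grep, but that loses the benefit of `GREP_OPTIONS='--color=auto'`. Building it is really beyond my skill level. I have a set of .txt documents I want to run searches on, but a regex that can easily be adjusted to change the distance between strings, or to truncate terms, would also be useful to others who keep fieldnotes or reading notes in a standard format.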

Edit

Below is a passage from the Bible.

Ye shall buy meat of them for money, that ye may eat; and ye shall also buy water of them for money, that ye may drink. For the Lord thy God hath blessed thee in all the works of thy hand: he knoweth thy walking through this great wilderness: these forty years the Lord thy God hath been with thee; thou hast lacked nothing... Thou shalt sell me meat for money, that I may eat; and give me water for money, that I may drink: only I will pass through on my feet: (as the children of Esau which dwell in Seir, and the Moabites which dwell in Ar, did unto me:) until I shall pass over Jordan into the land which the Lord our God giveth us. But Sihon king of Heshbon would not let us pass by him: for the Lord thy God hardened his spirit, and made his heart obstinate, that he might deliver him into thy hand, as appeareth this day. And the Lord said unto me, Behold, I have begun to give Sihon and his land before thee: begin to possess, that thou mayest inherit his land. Then Sihon came out against us, he and all his people, to fight at Jahaz. And the Lord our God delivered him before us; and we smote him, and his sons, and all his people. And if the way be too long for thee, so that thou art not able to carry it; or if the place be too far from thee, which the Lord thy God shall choose to set his name there, when the Lord thy God hath blessed thee: then shalt thou turn it into money, and bind up the money in thine hand, and shalt go unto the place which the Lord thy God shall choose: and thou shalt bestow that money for whatsoever thy soul lusteth after, for oxen, or for sheep, or for wine, or for strong drink, or for whatsoever thy soul desireth: and thou shalt eat there before the Lord thy God, and thou shalt rejoice, thou, and thine household, and the Levite that is within thy gates; thou shalt not forsake him: for he hath no part nor inheritance with thee... 
Now it came to pass, that at what time the chest was brought unto the king’s office by the hand of the Levites, and when they saw that there was much money, the king’s scribe and the high priest’s officer came and emptied the chest, and took it, and carried it to his place again. Thus they did day by day, and gathered money in abundance. And when they had finished it, they brought the rest of the money before the king and Jehoiada, whereof were made vessels for the house of the Lord , even vessels to minister, and to offer withal, and spoons, and vessels of gold and silver. And they offered burnt offerings in the house of the Lord continually all the days of Jehoiada. Thou hast bought me no sweet cane with money, neither hast thou filled me with the fat of thy sacrifices; but thou hast made me to serve with thy sins, thou hast wearied me with thine iniquities... Howbeit there were not made for the house of the Lord bowls of silver, snuffers, basins, trumpets, any vessels of gold, or vessels of silver, of the money that was brought into the house of the Lord: but they gave that to the workmen, and repaired therewith the house of the Lord. Moreover they reckoned not with the men, into whose hand they delivered the money to be bestowed on workmen: for they dealt faithfully. The trespass money and sin money was not brought into the house of the Lord: it was the priests’.

If I want grep to show the places where `shalt` and `money` occur within five words of each other (punctuation included), how would I write the regex?

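I'm not sure how to present the expected results, since `grep --context=1` would include extra text beyond the span with its 0 to 5 intervening strings, but I think the results would identify: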

```
shalt sell me meat for money
shalt thou turn it into money
money in thine hand, and shalt
shalt bestow that money
```

but would not return `shall buy meat of them for money,` because `money` appears as the sixth string.
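For comparison with the `{1,15}` idea in the question, here is a minimal single-grep sketch. It assumes GNU grep (for `-o` and the `\<`/`\>` word boundaries), and the function name `near` is hypothetical. The distance N counts whitespace-separated tokens, so `near shalt money 5` allows at most four strings between the two words, which matches the expected results above; punctuation stays attached to its token, so `hand,` counts as one string.

```shell
#!/usr/bin/env bash
# Hypothetical helper, assuming GNU grep: print each span where word A is
# followed by word B (or B by A) at most N whitespace-separated tokens later.
# Usage: near WORD_A WORD_B N [grep options] [files...]
near() {
  local a=$1 b=$2 n=$3
  shift 3
  # Up to N-1 intervening tokens (punctuation stays part of a token),
  # then whitespace before the second word: at most N whitespace gaps.
  local gap="([[:space:]]+[^[:space:]]+){0,$((n - 1))}[[:space:]]+"
  # \< and \> are GNU word boundaries, so "money" does not match "moneys".
  local re="\\<$a\\>$gap\\<$b\\>|\\<$b\\>$gap\\<$a\\>"
  grep -oE -e "$re" "$@"
}
```

Because it is a single grep, options can be appended, e.g. `near shalt money 5 -Hn --color=auto *.txt` (assuming GNU grep's usual handling of options placed after the pattern). A truncated term can be passed as a regex fragment such as `capital[[:alpha:]]*`. Two limitations remain: grep works one line at a time, so a pair split across a line break is missed, and grep resumes searching after the end of each match, so overlapping pairs are lost (on `shalt sell me meat for money in thine hand, and shalt` only the first span is printed).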


1 Answer

Stack Overflow user

Answered 2018-09-11 08:30:17

Well, it's not grep, but it seems to do what you're asking, using GNU awk for multi-char RS and word boundaries:

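```
$ cat tst.awk
BEGIN {
    RS = "^$"             # GNU awk: read each whole file as a single record
    split(words, word)    # word[1] and word[2] are the two search terms
}
{
    # Escape any existing @, { and } so that { and } can be used as markers
    gsub(/@/,"@A"); gsub(/{/,"@B"); gsub(/}/,"@C")
    # Replace every whole-word occurrence of each term with its marker
    gsub("\\<"word[1]"\\>","{")
    gsub("\\<"word[2]"\\>","}")
    # Next span running from one term to the other with no marker in between
    while ( match($0,/{[^{}]+}|}[^{}]+{/) ) {
        tgt = substr($0,RSTART,RLENGTH)
        # Put the original words and escaped characters back in the span
        gsub(/}/,word[2],tgt)
        gsub(/{/,word[1],tgt)
        gsub(/@C/,"}",tgt); gsub(/@B/,"{",tgt); gsub(/@A/,"@",tgt)
        # gsub() returns the number of whitespace runs, i.e. the word distance
        if ( gsub(/[[:space:]]+/,"&",tgt) <= range ) {
            print tgt
        }
        # Skip only the opening marker so the closing one can start the next span
        $0 = substr($0,RSTART+1)
    }
}

$ awk -v words='money shalt' -v range=5 -f tst.awk file
shalt sell me meat for money
shalt thou turn it into money
money in thine hand, and shalt
shalt bestow that money

$ awk -v words='and him' -v range=10 -f tst.awk file
him: for the Lord thy God hardened his spirit, and
and made his heart obstinate, that he might deliver him
him before us; and
and we smote him
him, and
```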

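Note that the above still works even on input like `shalt sell me meat for money in thine hand, and shalt`, where one word (`money`) appears 5 words after the first occurrence of the other word (`shalt`) and also 5 words before that word's second occurrence (`shalt` again):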

```
$ echo 'shalt sell me meat for money in thine hand, and shalt' |
    awk -v words='shalt money' -v range=5 -f tst.awk
shalt sell me meat for money
money in thine hand, and shalt
```
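The awk script handles this case because, after each span, it resumes searching from inside that span rather than after its end, so the word that closed one span can open the next. As a hedged comparison rather than the answer's method, the same overlapping behavior can be sketched in plain bash (no GNU awk needed) with a different technique: split the text into whitespace-separated tokens and, from every occurrence of either word, look ahead at most N tokens for the other one. The function `near_tokens` is hypothetical; it reads standard input only and reports no file names or line numbers.

```shell
#!/usr/bin/env bash
# Hypothetical token-window sketch (not the answer's awk method).
# Usage: near_tokens WORD_A WORD_B N < file
near_tokens() {
  local a=$1 b=$2 n=$3
  local -a tok
  # Read all of stdin into an array, splitting on spaces, tabs and newlines.
  read -r -d '' -a tok || true
  local i j wi wj
  for ((i = 0; i < ${#tok[@]}; i++)); do
    wi=${tok[i]//[[:punct:]]/}          # compare words without punctuation
    [[ $wi == "$a" || $wi == "$b" ]] || continue
    # Look ahead at most n tokens for the *other* word.
    for ((j = i + 1; j <= i + n && j < ${#tok[@]}; j++)); do
      wj=${tok[j]//[[:punct:]]/}
      if [[ ( $wi == "$a" && $wj == "$b" ) || ( $wi == "$b" && $wj == "$a" ) ]]; then
        echo "${tok[*]:i:j-i+1}"        # print tokens i..j, space-separated
        break                           # report only the nearest partner
      fi
    done
  done
}
```

Because every occurrence is an independent starting point, `echo 'shalt sell me meat for money in thine hand, and shalt' | near_tokens shalt money 5` reports both spans. Whitespace runs, including line breaks, are collapsed to single spaces in the output, so pairs split across lines are still found.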

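For colors, file names and line numbers:

Run this to see which colors are available in your terminal (each line is printed in a different color):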

```
$ for ((c=0; c<$(tput colors); c++)); do tput setaf "$c"; tput setaf "$c" | cat -v; echo "=$c"; done; tput sgr0
^[[30m=0
^[[31m=1
^[[32m=2
^[[33m=3
^[[34m=4
^[[35m=5
^[[36m=6
^[[37m=7
```

Now that you can see what those escape sequences and numbers mean, update the awk script to this (`\033` = `^[` = Esc):

```
$ cat tst.awk
BEGIN {
    RS = "^$"             # GNU awk: read each whole file as a single record
    split(words, word)
    # ANSI foreground colors (\033 = ^[ = Esc)
    c["black"]  = "\033[30m"
    c["red"]    = "\033[31m"
    c["green"]  = "\033[32m"
    c["yellow"] = "\033[33m"
    c["blue"]   = "\033[34m"
    c["pink"]   = "\033[35m"
    c["teal"]   = "\033[36m"
    c["grey"]   = "\033[37m"
    reset = "\033[0m"     # restore the terminal's default attributes
    for (color in c) {
        print c[color] color reset
    }
}
{
    lnum = 1    # each file is one record, so FNR is always 1; count
                # newlines instead to get the line where each span starts
    gsub(/@/,"@A"); gsub(/{/,"@B"); gsub(/}/,"@C")
    gsub("\\<"word[1]"\\>","{")
    gsub("\\<"word[2]"\\>","}")
    while ( match($0,/{[^{}]+}|}[^{}]+{/) ) {
        tgt = substr($0,RSTART,RLENGTH)
        pre = substr($0,1,RSTART-1)
        lnum += gsub(/\n/,"",pre)    # newlines before this span
        gsub(/}/,word[2],tgt)
        gsub(/{/,word[1],tgt)
        gsub(/@C/,"}",tgt); gsub(/@B/,"{",tgt); gsub(/@A/,"@",tgt)
        if ( gsub(/[[:space:]]+/,"&",tgt) <= range ) {
            print FILENAME, lnum, c["red"] tgt reset
        }
        $0 = substr($0,RSTART+1)
    }
}
```

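When you run it, you'll first see a dump of all the available colors, and then each target text preceded by the file name and its line number in that file, with the text itself shown in red.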
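The escape codes used in `tst.awk` can be checked directly from the shell. As a small sketch (the helper name `red` is hypothetical), this wraps a word in the same `\033[31m` red code, follows it with `\033[0m`, the standard reset code, and pipes the result through `cat -v`, which shows the Esc byte as `^[`, just as in the `tput` color listing:

```shell
#!/usr/bin/env bash
# Hypothetical helper: wrap the arguments in red (\033[31m, color 1 in the
# tput listing) and then reset all attributes with \033[0m.
red() { printf '\033[31m%s\033[0m\n' "$*"; }

# cat -v renders the Esc byte (octal 033) as ^[
red money | cat -v    # prints ^[[31mmoney^[[0m
```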

Original page content provided by Stack Overflow; translation support by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/52263306
