Removing Duplicates from a String in SAS

Author: 专业余码农
Published 2020-07-15 16:57:35
Included in the column: 老Z的博客

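SAS programmers occasionally need to remove repeated values from a string. Doing this with the usual character functions such as SCAN and SUBSTR can be laborious; a regular expression makes it much simpler. The program below repeatedly deletes the earlier copy of any sentence that appears again later in the string, until no duplicate remains: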

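data _null_;
    infile cards truncover;
    input STRING $32767.;
    /* REX1: if sentence \1 recurs later in the string, drop its first copy */
    REX1=prxparse('s/([a-z].+?\.\s+)(.*?)(\1+)/\2\3/i');
    /* REX2: same pattern without substitution, used to detect leftover duplicates */
    REX2=prxparse('/([a-z].+?\.\s+)(.*?)(\1+)/i');
    /* One pass may miss some duplicates, so repeat (at most 100 times) until none remain */
    do i=1 to 100;
        STRING=prxchange(REX1, -1, compbl(STRING));
        if not prxmatch(REX2, compbl(STRING)) then leave;
    end;
    put STRING=;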
cards;
a. The cow jumps over the moon.
a. The cow jumps over the moon. b. The chicken crossed the road. c. The quick brown fox jumped over the lazy dog. a. The cow jumps over the moon. 
b. The chicken crossed the road. a. The cow jumps over the moon. b. The chicken crossed the road. c. The quick brown fox jumped over the lazy dog.
a. The cow jumps over the moon. a. The cow jumps over the moon. b. The chicken crossed the road. b. The chicken crossed the road. c. The quick brown fox jumped over the lazy dog. c. The quick brown fox jumped over the lazy dog.
a. The cows jump over the moon. a. The cows jump over the moon. b. The chickens crossed the road. b. The chickens crossed the road. c. The quick brown foxes jumped over the lazy dog. c. The quick brown foxes jumped over the lazy dog.
a. The cow jumps over the moon. b. The chicken crossed the road.  c. The quick brown fox jumped over the lazy dog. a. The cow jumps over the moon.  b. The chicken crossed the road. c. The quick brown fox jumped over the lazy dog.
;
run;
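
The DO loop in the program above matters because a global substitution resumes scanning after the end of each match, so duplicates that fall inside an already-matched span are not compared in the same pass. The following is a minimal sketch of that effect, reusing the article's two patterns; the variable ONEPASS and the test string are illustrative assumptions, not part of the original program.

```sas
data _null_;
    /* Explicit lengths so both variables are blank-padded character values */
    length STRING ONEPASS $200;
    STRING='a. The cow jumps over the moon. b. The chicken crossed the road. '
        || 'a. The cow jumps over the moon. b. The chicken crossed the road.';
    REX1=prxparse('s/([a-z].+?\.\s+)(.*?)(\1+)/\2\3/i');
    REX2=prxparse('/([a-z].+?\.\s+)(.*?)(\1+)/i');

    /* A single global pass: the first match spans the first three sentences,
       so the trailing "b." sentence is never paired with its earlier copy */
    ONEPASS=prxchange(REX1, -1, compbl(STRING));
    put ONEPASS=;

    /* Repeating the substitution until REX2 finds nothing cleans up the rest */
    do i=1 to 100;
        STRING=prxchange(REX1, -1, compbl(STRING));
        if not prxmatch(REX2, compbl(STRING)) then leave;
    end;
    put STRING= i=;
run;
```

Comparing the two lines written to the log should show that ONEPASS still contains the `b.` sentence twice, while the looped result keeps one copy of each sentence. Note that the pattern keeps the last occurrence of a duplicate, not the first.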

In the example above, each duplicate is a whole sentence. If the duplicates are individual words, the expression has to change:

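data _null_;
    STRING='cow chicken fox cow chicken fox cows chickens foxes';
    /* REX1: if whole word \1 recurs later in the string, drop its first copy */
    REX1=prxparse('s/(\b\w+\b)(.*?)(\b\1+\b)/\2\3/i');
    /* REX2: same pattern without substitution, used to detect leftover duplicates */
    REX2=prxparse('/(\b\w+\b)(.*?)(\b\1+\b)/i');
    /* Repeat (at most 100 times) until no duplicate word remains */
    do i=1 to 100;
        STRING=prxchange(REX1, -1, compbl(STRING));
        if not prxmatch(REX2, compbl(STRING)) then leave;
    end;
    put STRING=;
run;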

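Note that in the expression above, the \b anchors in the first group make it match a whole word rather than individual letters or a fragment of a word. The \b anchors in the third group enforce an exact match: the repeat must itself be a whole, identical word, so cow is not treated as a duplicate of cows. Because of the /i modifier, the comparison ignores case.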
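
To see what the \b anchors in the third group buy, here is a small sketch that runs the word pattern with and without them on the same input. The variable names LOOSE and STRICT and the two-word test string are assumptions made for illustration; the patterns are the article's, with only the third group changed in the first call.

```sas
data _null_;
    length STRING LOOSE STRICT $50;
    STRING='cow cows';

    /* Third group without \b: the back-reference may match the "cow" inside "cows",
       so the standalone "cow" is treated as a duplicate and removed */
    LOOSE=prxchange('s/(\b\w+\b)(.*?)(\1+)/\2\3/i', -1, STRING);

    /* Third group with \b: only a whole-word repeat counts, so nothing is removed */
    STRICT=prxchange('s/(\b\w+\b)(.*?)(\b\1+\b)/\2\3/i', -1, STRING);

    put LOOSE= STRICT=;
run;
```

In the log, STRICT should still hold both words, whereas LOOSE is expected to have lost the standalone `cow`, which is the false positive the anchors are there to prevent.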

Originally published: 2016-11-26
