首页
学习
活动
专区
圈层
工具
发布
首页
学习
活动
专区
圈层
工具
MCP广场
社区首页 >问答首页 >在SAS中组合具有重叠数据范围的行

在SAS中组合具有重叠数据范围的行
EN

Stack Overflow用户
提问于 2019-10-19 03:00:56
回答 2查看 412关注 0票数 0

因为我是SAS的新手,所以我需要一些帮助来理解如何将重叠日期范围组合到一行中,我想在重叠日期范围具有匹配的Id时组合它们。如果日期不重叠,那么我想让它们保持原样。如果他们通过匹配Id和毒品代码来重叠,那么它应该合并到一行中。请看下面我拥有的相同的ple数据集和预期结果:

代码语言:javascript
运行
复制
            Current Data set:           
            ID  Drug Code   BEG_Date    End_Date
            1   100 1/1/2018    1/1/2019
            1   100 1/1/2018    3/1/2018
            1   100 2/1/2018    04/30/2018
            1   90  4/1/2018    04/30/2018
            1   100 5/1/2018    6/1/2018
            1   98  6/1/2018    8/31/2018
            1   100 9/1/2018    5/4/2019

            Expected results:           
            ID  Drug Code   BEG_Date    End_Date
            1   100 1/1/2018    3/31/2018
            1   90  4/1/2018    04/30/2018
            1   100 5/1/2018    6/1/2018
            1   98  6/2/2018    8/31/2018
            1   100 9/1/2018    5/4/2019

            I wrote some SAS code but I am combining the dates even when there is no overlap. I want to write some code which should work in SAS.           

            PROC SORT DATA=Want OUT=ONE;            
                BY PERSON_ID BEG_DATE DRUG_CODE END_DATE;       
            RUN;            
            data TWO (DROP=PERSON_ID2 DRUG_CODE2 BEG_DATE END_DATE          
                RENAME=(BEG2=BEG_DOS        
                END2=END_DOS));     
                SET ONE;        
                RETAIN BEG2 END2;       
                PERSON_ID2=LAG1(PERSON_ID);     
                DRUG_CODE2=LAG1(DRUG_CODE);     

                IF PERSON_ID2=PERSON_ID AND DRUG_CODE2=DRUG_CODE AND BEG_DATE LE(END2+1) THEN       
                    DO; 
                        BEG2=MIN(BEG_DATE,BEG2);
                        END2=MAX(END_DATE,END2);
                    END;    
                ELSE        
                    DO; 
                        SEG+1;
                        BEG2=BEG_DATE;
                        END2=END_DATE;
                    END;    

                FORMAT BEG2 END2 MMDDYY10.;     
            RUN;            

            DATA THREE(DROP=BEG_DOS END_DOS SEG);           
                RETAIN BEG_DATE END_DATE;       
                SET TWO;        
                BY PERSON_ID SEG;       
                FORMAT BEG_DATE END_DATE MMDDYY10.;     

                IF FIRST.SEG THEN       
                    DO; 
                        BEG_DATE=BEG_DOS;
                    END;    

                IF LAST.SEG THEN        
                    DO; 
                        END_DATE = END_DOS;
                        OUTPUT;
                    END;    
            RUN;    
EN

回答 2

Stack Overflow用户

发布于 2019-10-19 03:37:19

这就是我要做的。为每个ID、药物和日期创建obs。标记差距并按运行进行总结。

代码语言:javascript
运行
复制
data have;
   input ID  Drug_Code   (BEG End)(:mmddyy.);
   format BEG End mmddyyd10.;
   cards;
1   100 1/1/2018    3/1/2018
1   100 2/1/2018    04/30/2018
1   90  4/1/2018    04/30/2018
1   90  6/1/2018    8/15/2018
1   100 5/1/2018    6/1/2018
1   98  6/1/2018    8/31/2018
1   100 9/1/2018    5/4/2019
;;;;
   run;
proc print;
   run;
/*1   100 1/1/2018    1/1/2019*/

data exv/ view=exv;
   set have;
   do date = beg to end;
      output;
      end;
   drop beg end;
   format date mmddyyd10.;
   run;
proc sort data=exv out=ex nodupkey;
   by id drug_code date;
   run;
data breaksV / view=BreaksV;
   set ex;
   by id drug_code;
   dif = dif(date);
   if first.drug_code then do;  dif=1; run=1; end;
   if dif ne 1 then run+1;
   run;
proc summary data=breaksV nway missing;
   class id drug_code run;
   var date;
   output out=want(drop=_type_) min=Begin max=End;
   run;
Proc print;
   run;

票数 0
EN

Stack Overflow用户

发布于 2019-10-19 20:41:21

计算由重叠段范围组成的范围范围需要很好地理解范围条件(情况)。

考虑按开始日期排序时的场景(在任何更大的分组集G中,如iddrug)

within

  • Extent是组合范围,grows

  • Segment是当前
  • 中的范围[ ]是当前行的范围

案例1 -增长。在G段内,在扩展区结束之前开始

段将不贡献到扩展区或扩展它。

代码语言:javascript
运行
复制
   [####]       Extent
+    [#]        Segment range DOES NOT contribute
 --------
   [####]       Extent (do not output a row, still growing)

代码语言:javascript
运行
复制
   [####]       Extent
+    [#####]    Segment range DOES contribute
 --------
   [#######]    Extent (do not output a row, still growing)

案例2 -终端。3种可能性:

G段内的

  1. 扩展区结束后启动
  2. 下一个G到达(不同的id/drug组合),
  3. 到达数据结尾。

可以通过检查适当的last.标志来测试#2和#3。

代码语言:javascript
运行
复制
   [####]       Extent
+        ..[#]  Segment beyond Extent (gap is 2)
 --------
   [####]       output Extent
           [#]  reset Extent to Segment

您可以调整Segment being adjacent (gap=0)或close (gap < threshold)的规则,以表示范围已扩展,或输出并重置为段。

注意:对于现实世界中的以下情况,情况要复杂一些(未显示):

  • missing start表示数据段具有未知的开始日期(假定其为0=01JAN1960,或早于数据中所有日期的某个日期,或end end表示数据段今天处于活动状态(处理数据时,结束日期为date )

示例代码:

代码语言:javascript
运行
复制
data have;
  call streaminit(42);
  do id = 1 to 10;
    do _n_ = 1 to 50;
      drug = ceil(rand('UNIFORM', 10));
      beg_date = intnx ('MONTH', '01JAN2008'D, rand('UNIFORM',20));
      end_date = intnx ('DAY', beg_date, rand('UNIFORM',75));
      OUTPUT;

    end;
  end;
  format beg_date end_date yymmdd10.;
run;

proc sort data=have out=segments;
  by id drug beg_date end_date;
run;

data want;
  set segments;
  by id drug beg_date end_date;  * will error if incoming data is NOT sorted;

  retain ext_beg ext_end;
  retain gap_allowed 0; * set to 1 for contiguously adjacent segment ;

  if first.drug then do;
    ext_beg = beg_date;
    ext_end = end_date;
    segment_count = 0;
  end;

  if beg_date <= ext_end + gap_allowed then do;
    ext_end = max (ext_end, end_date);
    segment_count + 1;
  end;
  else do;
    extent_id + 1;
    OUTPUT;
    ext_beg = beg_date;
    ext_end = end_date;
    segment_count = 1;
  end;

  if last.drug then do;
    extent_id + 1;
    OUTPUT;
    * reset occurs implicitly;
    * it will happen at first. logic when control returns to top of step;
  end;

  format ext_: yymmdd10.;
  keep id drug ext_beg ext_end segment_count extent_id;
run;
票数 0
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/58456767

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档