因为我是SAS的新手,所以我需要一些帮助来理解如何将重叠日期范围组合到一行中,我想在重叠日期范围具有匹配的Id时组合它们。如果日期不重叠,那么我想让它们保持原样。如果他们通过匹配Id和毒品代码来重叠,那么它应该合并到一行中。请看下面我拥有的相同的ple数据集和预期结果:
Current Data set:
ID Drug Code BEG_Date End_Date
1 100 1/1/2018 1/1/2019
1 100 1/1/2018 3/1/2018
1 100 2/1/2018 04/30/2018
1 90 4/1/2018 04/30/2018
1 100 5/1/2018 6/1/2018
1 98 6/1/2018 8/31/2018
1 100 9/1/2018 5/4/2019
Expected results:
ID Drug Code BEG_Date End_Date
1 100 1/1/2018 3/31/2018
1 90 4/1/2018 04/30/2018
1 100 5/1/2018 6/1/2018
1 98 6/2/2018 8/31/2018
1 100 9/1/2018 5/4/2019
I wrote some SAS code but I am combining the dates even when there is no overlap. I want to write some code which should work in SAS.
PROC SORT DATA=Want OUT=ONE;
BY PERSON_ID BEG_DATE DRUG_CODE END_DATE;
RUN;
data TWO (DROP=PERSON_ID2 DRUG_CODE2 BEG_DATE END_DATE
RENAME=(BEG2=BEG_DOS
END2=END_DOS));
SET ONE;
RETAIN BEG2 END2;
PERSON_ID2=LAG1(PERSON_ID);
DRUG_CODE2=LAG1(DRUG_CODE);
IF PERSON_ID2=PERSON_ID AND DRUG_CODE2=DRUG_CODE AND BEG_DATE LE(END2+1) THEN
DO;
BEG2=MIN(BEG_DATE,BEG2);
END2=MAX(END_DATE,END2);
END;
ELSE
DO;
SEG+1;
BEG2=BEG_DATE;
END2=END_DATE;
END;
FORMAT BEG2 END2 MMDDYY10.;
RUN;
DATA THREE(DROP=BEG_DOS END_DOS SEG);
RETAIN BEG_DATE END_DATE;
SET TWO;
BY PERSON_ID SEG;
FORMAT BEG_DATE END_DATE MMDDYY10.;
IF FIRST.SEG THEN
DO;
BEG_DATE=BEG_DOS;
END;
IF LAST.SEG THEN
DO;
END_DATE = END_DOS;
OUTPUT;
END;
RUN;
发布于 2019-10-19 03:37:19
这就是我要做的。为每个ID、药物和日期创建obs。标记差距并按运行进行总结。
data have;
input ID Drug_Code (BEG End)(:mmddyy.);
format BEG End mmddyyd10.;
cards;
1 100 1/1/2018 3/1/2018
1 100 2/1/2018 04/30/2018
1 90 4/1/2018 04/30/2018
1 90 6/1/2018 8/15/2018
1 100 5/1/2018 6/1/2018
1 98 6/1/2018 8/31/2018
1 100 9/1/2018 5/4/2019
;;;;
run;
proc print;
run;
/*1 100 1/1/2018 1/1/2019*/
data exv/ view=exv;
set have;
do date = beg to end;
output;
end;
drop beg end;
format date mmddyyd10.;
run;
proc sort data=exv out=ex nodupkey;
by id drug_code date;
run;
data breaksV / view=BreaksV;
set ex;
by id drug_code;
dif = dif(date);
if first.drug_code then do; dif=1; run=1; end;
if dif ne 1 then run+1;
run;
proc summary data=breaksV nway missing;
class id drug_code run;
var date;
output out=want(drop=_type_) min=Begin max=End;
run;
Proc print;
run;
发布于 2019-10-19 20:41:21
计算由重叠段范围组成的范围范围需要很好地理解范围条件(情况)。
考虑按开始日期排序时的场景(在任何更大的分组集G中,如id
和drug
)
within
Extent
是组合范围,grows
Segment
是当前[
]
是当前行的范围案例1 -增长。在G段内,在扩展区结束之前开始
段将不贡献到扩展区或扩展它。
[####] Extent
+ [#] Segment range DOES NOT contribute
--------
[####] Extent (do not output a row, still growing)
或
[####] Extent
+ [#####] Segment range DOES contribute
--------
[#######] Extent (do not output a row, still growing)
案例2 -终端。3种可能性:
G段内的
id
/drug
组合),可以通过检查适当的last.
标志来测试#2和#3。
[####] Extent
+ ..[#] Segment beyond Extent (gap is 2)
--------
[####] output Extent
[#] reset Extent to Segment
您可以调整Segment being adjacent (gap=0)或close (gap < threshold)的规则,以表示范围已扩展,或输出并重置为段。
注意:对于现实世界中的以下情况,情况要复杂一些(未显示):
示例代码:
data have;
call streaminit(42);
do id = 1 to 10;
do _n_ = 1 to 50;
drug = ceil(rand('UNIFORM', 10));
beg_date = intnx ('MONTH', '01JAN2008'D, rand('UNIFORM',20));
end_date = intnx ('DAY', beg_date, rand('UNIFORM',75));
OUTPUT;
end;
end;
format beg_date end_date yymmdd10.;
run;
proc sort data=have out=segments;
by id drug beg_date end_date;
run;
data want;
set segments;
by id drug beg_date end_date; * will error if incoming data is NOT sorted;
retain ext_beg ext_end;
retain gap_allowed 0; * set to 1 for contiguously adjacent segment ;
if first.drug then do;
ext_beg = beg_date;
ext_end = end_date;
segment_count = 0;
end;
if beg_date <= ext_end + gap_allowed then do;
ext_end = max (ext_end, end_date);
segment_count + 1;
end;
else do;
extent_id + 1;
OUTPUT;
ext_beg = beg_date;
ext_end = end_date;
segment_count = 1;
end;
if last.drug then do;
extent_id + 1;
OUTPUT;
* reset occurs implicitly;
* it will happen at first. logic when control returns to top of step;
end;
format ext_: yymmdd10.;
keep id drug ext_beg ext_end segment_count extent_id;
run;
https://stackoverflow.com/questions/58456767
复制相似问题