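I have two data frames and I want to create their Cartesian product.
When I do this I get, as expected, mirrored pairs; for my purposes, Laptop and Radio is the same as Radio and Laptop.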
# Two data frames used to produce the Cartesian product
SaleItems<-data.frame(Appliance=c("Radio", "Laptop", "TV", "Fridge"))
SaleItems2<-data.frame(Appliance2=c("Radio", "Laptop", "TV", "Fridge"))
# Create the Cartesian product (no common column names, so merge() cross-joins)
SaleItems3<-merge(SaleItems,SaleItems2)
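What I want is to get rid of these mirrored pairs, so that, for example, the combination of Radio and Laptop appears only once.
Does anyone have suggestions for how to achieve this?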
Posted 2019-09-26 12:46:40
For a Cartesian join with merge, pass NULL to the by argument:
merge(SaleItems, SaleItems2, by=NULL)
Then, to remove the equivalent reversed duplicates (keeping one orientation of each pair), wrap it in subset:
subset(merge(SaleItems, SaleItems2, by=NULL),
Appliance <= Appliance2)
If the columns are factors, convert them to character first, because <= is not meaningful for factors:
subset(merge(SaleItems, SaleItems2, by=NULL),
as.character(Appliance) <= as.character(Appliance2))
# Appliance Appliance2
# 1 Radio Radio
# 2 Laptop Radio
# 4 Fridge Radio
# 6 Laptop Laptop
# 8 Fridge Laptop
# 9 Radio TV
# 10 Laptop TV
# 11 TV TV
# 12 Fridge TV
# 16 Fridge Fridge
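A minimal self-contained sketch of the subset() approach, assuming you want it as a reusable helper. The function name unordered_pairs() and its include_self argument are hypothetical, not part of base R. It converts both columns with as.character() so the comparison works whether or not data.frame() created factors (the default before R 4.0.0), and it switches between <= (keep Radio/Radio style pairs) and < (drop them). String comparison follows the locale's collation; for these four names, whose first letters all differ, the order is Fridge < Laptop < Radio < TV.

```r
# Hypothetical helper: one row per unordered pair, built from merge(by = NULL)
unordered_pairs <- function(x, y, include_self = TRUE) {
  # Cartesian product: every row of x combined with every row of y
  cp <- merge(x, y, by = NULL)
  a <- as.character(cp[[1]])  # column from x, as character (not factor)
  b <- as.character(cp[[2]])  # column from y
  # Keep one orientation of each pair; "<=" also keeps identical pairs
  keep <- if (include_self) a <= b else a < b
  cp[keep, , drop = FALSE]
}

SaleItems  <- data.frame(Appliance  = c("Radio", "Laptop", "TV", "Fridge"))
SaleItems2 <- data.frame(Appliance2 = c("Radio", "Laptop", "TV", "Fridge"))

with_self    <- unordered_pairs(SaleItems, SaleItems2)
without_self <- unordered_pairs(SaleItems, SaleItems2, include_self = FALSE)
nrow(with_self)     # 10
nrow(without_self)  # 6
```

With four items there are 4 * 3 / 2 = 6 distinct pairs, plus 4 identical pairs when include_self = TRUE.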
Posted 2019-09-26 11:40:31
One way is to use pmin/pmax to sort the values within each row alphabetically, and then keep only the distinct rows.
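library(dplyr)
SaleItems3 %>%
  # as.character() guards against factor columns (the data.frame() default before R 4.0.0)
  mutate(app = pmin(as.character(Appliance), as.character(Appliance2)),
         app1 = pmax(as.character(Appliance), as.character(Appliance2))) %>%
  dplyr::select(app, app1) %>%
  distinct()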
# app app1
#1 Radio Radio
#2 Laptop Radio
#3 Radio TV
#4 Fridge Radio
#5 Laptop Laptop
#6 Laptop TV
#7 Fridge Laptop
#8 TV TV
#9 Fridge TV
#10 Fridge Fridge
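The same pmin/pmax idea also works without dplyr. This is a base R sketch, assuming character columns (they are converted explicitly), with duplicated() standing in for distinct(); the intermediate names a, b, sorted and pairs are illustrative only.

```r
# Base R variant of the pmin/pmax approach (no dplyr needed)
SaleItems  <- data.frame(Appliance  = c("Radio", "Laptop", "TV", "Fridge"))
SaleItems2 <- data.frame(Appliance2 = c("Radio", "Laptop", "TV", "Fridge"))
SaleItems3 <- merge(SaleItems, SaleItems2)  # Cartesian product, 16 rows

a <- as.character(SaleItems3$Appliance)
b <- as.character(SaleItems3$Appliance2)

# Put the alphabetically smaller value first in every row
sorted <- data.frame(app = pmin(a, b), app1 = pmax(a, b), stringsAsFactors = FALSE)

# Drop rows that repeat an earlier (app, app1) combination
pairs <- sorted[!duplicated(sorted), ]
rownames(pairs) <- NULL
pairs
```

duplicated() on a data frame compares whole rows and keeps the first occurrence, so the rows come out in the same order as the distinct() result above.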
Posted 2019-09-26 12:30:41
Another way in base R, which also excludes the identical (same-same) matches:
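# Matrix of "row,column" strings for every combination of the two vectors
f <- outer(SaleItems$Appliance, SaleItems2$Appliance2, FUN = "paste", sep = ",")
# Upper triangle = each unordered pair once (diagonal excluded); split back into two columns
as.data.frame(do.call(rbind, strsplit(f[upper.tri(f)], ",")))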
V1 V2
1 Radio Laptop
2 Radio TV
3 Laptop TV
4 Radio Fridge
5 Laptop Fridge
6 TV Fridge
EDIT: To include the identical matches as well, pass diag = TRUE to upper.tri:
as.data.frame(do.call(rbind, strsplit(f[upper.tri(f, diag = TRUE)], ",")))
V1 V2
1 Radio Radio
2 Radio Laptop
3 Laptop Laptop
4 Radio TV
5 Laptop TV
6 TV TV
7 Radio Fridge
8 Laptop Fridge
9 TV Fridge
10 Fridge Fridge
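A swapped-in alternative, not from the answers above: when both sides come from the same item list (as in the question), base R's combn() returns the unordered pairs directly, without building the Cartesian product or pasting and splitting strings. This sketch assumes a single character vector items; unlike the strsplit() approach, it also works if an item name contains a comma.

```r
# Unordered pairs straight from the item vector with combn()
items <- c("Radio", "Laptop", "TV", "Fridge")

# All 2-element combinations, one per row (identical pairs excluded)
no_self <- as.data.frame(t(combn(items, 2)), stringsAsFactors = FALSE)
names(no_self) <- c("V1", "V2")

# Append the identical pairs (Radio/Radio, ...) if they are wanted as well
with_self <- rbind(no_self,
                   data.frame(V1 = items, V2 = items, stringsAsFactors = FALSE))

no_self
nrow(with_self)  # 10
```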
https://stackoverflow.com/questions/58115917