I have this data:
UserID Quiz_answers Quiz_Date
1 `a1,a2,a3`Positive 26-01-2017
1 `a1,a4,a3`Positive 26-01-2017
1 `a1,a2,a4`Negative 28-02-2017
1 `a1,a2,a3`Neutral 30-10-2017
1 `a1,a2,a4`Positive 30-11-2017
1 `a1,a2,a4`Negative 28-02-2018
2 `a1,a2,a3`Negative 27-01-2017
2 `a1,a7,a3`Neutral 28-08-2017
2 `a1,a2,a5`Negative 28-01-2017
I want to remove the duplicate rows.
The rule for duplicate rows is as follows:
UserID <- c(1, 1, 1, 1, 1, 1, 2, 2, 2)
Quiz_answers <- c("`a1,a2,a3`Positive", "`a1,a4,a3`Positive", "`a1,a2,a4`Negative", "`a1,a2,a3`Neutral", "`a1,a2,a4`Positive", "`a1,a2,a4`Negative", "`a1,a2,a3`Negative", "`a1,a7,a3`Neutral", "`a1,a2,a5`Negative")
Quiz_Date <- as.Date(c("26-01-2017", "26-01-2017", "28-02-2017", "30-10-2017", "30-11-2017", "28-02-2018", "27-01-2017", "28-08-2017", "28-01-2017"), '%d-%m-%Y')
data <- data.frame(UserID, Quiz_answers, Quiz_Date)
I wrote the following code:
data.removeDuplicates<- function(frames)
{
apply(frames[ ,c(grep("UserID", colnames(data)),grep("Quiz_answers", colnames(data)),grep("Quiz_Date", colnames(data)))],1,function(slice){
Outcome<-paste0("`",tail(strsplit(slice[2],split="`")[[1]],1))
cat("\n\n Searching for records: ",slice[1],Outcome,slice[3])
data<<-data[!( data$UserID == slice[1] & paste0("`",sapply(strsplit(as.character(data[,2]),'`'), tail, 1)) == c(Outcome) & data[,3]==c(slice[3])), ]
})
print(frames)
}
data.removeDuplicates(data)
print(data)
[1] UserID Quiz_answers Quiz_Date
<0 rows> (or 0-length row.names)
My expected output is:
UserID Quiz_answers Quiz_Date
1 `a1,a2,a3`Positive 26-01-2017
1 `a1,a2,a4`Negative 28-02-2017
1 `a1,a2,a3`Neutral 30-10-2017
1 `a1,a2,a4`Positive 30-11-2017
1 `a1,a2,a4`Negative 28-02-2018
2 `a1,a2,a3`Negative 27-01-2017
2 `a1,a7,a3`Neutral 28-08-2017
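2 `a1,a2,a5`Negative 28-01-2017
According to the rule, only the second row should be removed from the data frame, since it is the only row that satisfies the duplicate condition. What am I doing wrong?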
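A likely reason for the empty result, sketched on a two-row toy frame: the filter inside `apply` drops every row whose UserID, outcome label and Quiz_Date equal those of the current row, and each row always matches its own key. Rows with no duplicate are therefore removed as well. The names `toy`, `outcome` and `remaining` are hypothetical and only serve this illustration; the loop stands in for the `apply` call combined with `<<-`.

```r
# Two rows with different keys, so neither is a duplicate of the other
toy <- data.frame(
  UserID = c(1, 2),
  Quiz_answers = c("`a1,a2,a3`Positive", "`a1,a2,a3`Negative"),
  Quiz_Date = c("26-01-2017", "27-01-2017"),
  stringsAsFactors = FALSE
)

# Outcome label = text after the last backtick
outcome <- sub(".*`", "", toy$Quiz_answers)

# Mimic the question's logic: for each row, delete everything that matches its key
remaining <- toy
for (i in seq_len(nrow(toy))) {
  matches_i <- remaining$UserID == toy$UserID[i] &
    sub(".*`", "", remaining$Quiz_answers) == outcome[i] &
    remaining$Quiz_Date == toy$Quiz_Date[i]
  # Row i itself is among the matches, so it is deleted too
  remaining <- remaining[!matches_i, ]
}

nrow(remaining)
```

Keeping the first row of each key and dropping only the later ones avoids this, which is what the answers below do.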
Posted on 2017-10-26 16:50:41
Try this.
Your data
df <- read.table(text="UserID Quiz_answers Quiz_Date
1 `a1,a2,a3`Positive 26-01-2017
1 `a1,a4,a3`Positive 26-01-2017
1 `a1,a2,a4`Negative 28-02-2017
1 `a1,a2,a3`Neutral 30-10-2017
1 `a1,a2,a4`Positive 30-11-2017
1 `a1,a2,a4`Negative 28-02-2018
2 `a1,a2,a3`Negative 27-01-2017
2 `a1,a7,a3`Neutral 28-08-2017
2 `a1,a2,a5`Negative 28-01-2017", header = TRUE, stringsAsFactors=FALSE)
Solution and output
library(dplyr)
ans <- df %>%
mutate(grp = sub(".*`(\\D+)$", "\\1", Quiz_answers)) %>%
group_by(grp, UserID, Quiz_Date) %>%
slice(1) %>%
ungroup() %>%
select(-grp) %>%
arrange(UserID, Quiz_Date)
# A tibble: 8 x 3
# UserID Quiz_answers Quiz_Date
# <int> <chr> <chr>
# 1 1 `a1,a2,a3`Positive 26-01-2017
# 2 1 `a1,a2,a4`Negative 28-02-2017
# 3 1 `a1,a2,a4`Negative 28-02-2018
# 4 1 `a1,a2,a3`Neutral 30-10-2017
# 5 1 `a1,a2,a4`Positive 30-11-2017
# 6 2 `a1,a2,a3`Negative 27-01-2017
# 7 2 `a1,a2,a5`Negative 28-01-2017
# 8 2 `a1,a7,a3`Neutral 28-08-2017
Posted on 2017-10-26 17:11:16
You can use the sqldf package as shown below. First, find the Positive, Negative and Neutral groups. Then use GROUP BY to filter out the duplicates.
require("sqldf")
result <- sqldf("SELECT * FROM df WHERE Quiz_answers LIKE '%`Positive' GROUP BY UserID, Quiz_Date
UNION
SELECT * FROM df WHERE Quiz_answers LIKE '%`Negative' GROUP BY UserID, Quiz_Date
UNION
SELECT * FROM df WHERE Quiz_answers LIKE '%`Neutral' GROUP BY UserID, Quiz_Date")
After running, result is:
UserID Quiz_answers Quiz_Date
1 1 `a1,a2,a3`Neutral 30-10-2017
2 1 `a1,a2,a4`Negative 28-02-2017
3 1 `a1,a2,a4`Negative 28-02-2018
4 1 `a1,a2,a4`Positive 30-11-2017
5 1 `a1,a4,a3`Positive 26-01-2017
6 2 `a1,a2,a3`Negative 27-01-2017
7 2 `a1,a2,a5`Negative 28-01-2017
8 2 `a1,a7,a3`Neutral 28-08-2017
Posted on 2017-10-26 17:39:03
Here is a two-line solution that uses only base R:
data[,"group"] <- with(data, sub(".*`", "", Quiz_answers))
data <- data[as.integer(rownames(unique(data[, !(names(data) %in% "Quiz_answers") ]))), !(names(data) %in% "group")]
https://stackoverflow.com/questions/46959590
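As a hedged alternative to the two-line base R approach, the same key-based idea can be sketched with `duplicated()`, which flags every repeat of a key after its first occurrence. This is only a sketch, assuming a duplicate means the same UserID, the same outcome label after the last backtick, and the same Quiz_Date. The names `df`, `outcome`, `is_dup` and `deduped` are chosen here for illustration, and the dates are kept as plain strings.

```r
# Rebuild the question's data
df <- data.frame(
  UserID = c(1, 1, 1, 1, 1, 1, 2, 2, 2),
  Quiz_answers = c("`a1,a2,a3`Positive", "`a1,a4,a3`Positive", "`a1,a2,a4`Negative",
                   "`a1,a2,a3`Neutral", "`a1,a2,a4`Positive", "`a1,a2,a4`Negative",
                   "`a1,a2,a3`Negative", "`a1,a7,a3`Neutral", "`a1,a2,a5`Negative"),
  Quiz_Date = c("26-01-2017", "26-01-2017", "28-02-2017", "30-10-2017", "30-11-2017",
                "28-02-2018", "27-01-2017", "28-08-2017", "28-01-2017"),
  stringsAsFactors = FALSE
)

# Outcome label = text after the last backtick (Positive, Negative or Neutral)
outcome <- sub(".*`", "", df$Quiz_answers)

# TRUE for every row whose (UserID, outcome, Quiz_Date) key was already seen
is_dup <- duplicated(data.frame(df$UserID, outcome, df$Quiz_Date))

# Keep first occurrences only; the original row order is preserved
deduped <- df[!is_dup, ]
print(deduped)
```

Because `duplicated()` never flags the first occurrence of a key, rows without a duplicate are left alone and no helper column has to be added to the data frame.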