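I have one folder per subject in my study, each containing 2-5 csv files that I'd like to bind together. The file names are the same for every subject/folder.
I'd like to bind the data for each subject and was looking to create a loop. As I have 230 different subjects, doing this manually with rbind would be excessive. The folder name is the subjectID.
Any ideas?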
FolderStructure:
subject1/day1.csv
subject1/day2.csv
subject1/day3.csv
subject2/day1.csv
subject2/day2.csv
subject3/day1.csv
subject3/day2.csv
subject3/day3.csv
...
Answered 2022-07-08 17:00:53
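I'll demonstrate with dplyr::bind_rows, though data.table::rbindlist works just as well. The base-R variant do.call(rbind, ...) does not work as directly because it has no simple .id=/idcol= option (some elbow grease can work around that).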
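One possible form of that base-R workaround is sketched below. It is not from the answer itself: the demo list alldat and the column name file are made up for illustration. The idea is to attach each list name as a column before calling do.call(rbind, ...).

```r
# Hypothetical demo data: a named list of data frames keyed by file path
alldat <- list(
  "sub1/day1.csv" = data.frame(x = 1:2, y = c("a", "b")),
  "sub1/day2.csv" = data.frame(x = 3:4, y = c("c", "d"))
)

# Prepend the list name to each data frame as a column, then row-bind
with_id <- Map(function(dat, nm) cbind(file = nm, dat), alldat, names(alldat))
combined <- do.call(rbind, with_id)

# rbind builds row names from the list names; drop them
rownames(combined) <- NULL
combined
```

This does by hand what .id= does in dplyr::bind_rows and idcol= does in data.table::rbindlist.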
list_of_files ## make this however you want
# [1] "sub1/day1.csv" "sub1/day2.csv" "sub2/day1.csv" "sub2/day2.csv"
alldat <- lapply(setNames(nm=list_of_files), read.csv)
### fake data for demonstration
# alldat <- setNames(replicate(4, mtcars[sample(32,3),], simplify=FALSE), list_of_files)
lapply(split(alldat, sub("/.*", "", names(alldat))), dplyr::bind_rows, .id = "subj")
# $sub1
# subj mpg cyl disp hp drat wt qsec vs am gear carb
# Merc 230 sub1/day1.csv 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
# Lincoln Continental sub1/day1.csv 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
# Merc 450SE sub1/day1.csv 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
# Porsche 914-2 sub1/day2.csv 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
# Fiat 128 sub1/day2.csv 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
# Toyota Corona sub1/day2.csv 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
# $sub2
# subj mpg cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 sub2/day1.csv 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
# Fiat X1-9 sub2/day1.csv 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
# Lincoln Continental sub2/day1.csv 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
# Merc 240D sub2/day2.csv 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
# Merc 450SLC sub2/day2.csv 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
# Hornet 4 Drive sub2/day2.csv 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Now you have a list with one element per subject.
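The answer leaves the construction of list_of_files open. A minimal sketch of one way to build it follows, assuming the subject folders sit under a single root directory. The temporary root, the file contents and the name by_subject are all hypothetical, and dirname() stands in for the sub("/.*", "", ...) call used above.

```r
# Hypothetical layout created in a temporary directory for illustration
root <- file.path(tempdir(), "subjects_demo")
for (f in c("subject1/day1.csv", "subject1/day2.csv", "subject2/day1.csv")) {
  dir.create(file.path(root, dirname(f)), recursive = TRUE, showWarnings = FALSE)
  write.csv(data.frame(value = 1:2), file.path(root, f), row.names = FALSE)
}

# Relative paths such as "subject1/day1.csv"
list_of_files <- list.files(root, pattern = "\\.csv$", recursive = TRUE)

# Read via the full path, but keep the relative path as the list name
alldat <- lapply(setNames(file.path(root, list_of_files), list_of_files), read.csv)

# Group the data frames by their containing folder, i.e. the subject ID
by_subject <- split(alldat, dirname(names(alldat)))
names(by_subject)
```

Each element of by_subject can then be passed to dplyr::bind_rows as in the answer.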
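Answered 2022-07-10 19:53:20
You can solve this easily by using read_csv from the readr package and collecting the file paths inside each subjectID folder.
Let's assume I have three folders, subject1, subject2 and subject3, whose csv files have the same columns but different rows.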
library(readr)
subjects <- 3
paths <- sort(list.files(recursive = TRUE))
paths
[1] "subject1/cars1.csv" "subject1/cars2.csv" "subject1/cars3.csv"
[4] "subject2/cars1.csv" "subject2/cars2.csv" "subject3/cars1.csv"
[7] "subject3/cars2.csv" "subject3/cars3.csv"
So we can see that the subject1 folder has 3 csv files, subject2 has 2, and subject3 has 3. We need to group these paths by subject folder and pass each group of paths to read_csv.
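for (i in seq_len(subjects)) {
  # the trailing "/" keeps "subject1" from also matching "subject10", "subject11", ...
  subject_paths <- paths[startsWith(paths, paste0("subject", i, "/"))]
  file_name <- paste0("subject_", i, "_binded")
  # id = "paths" adds a column recording which file each row came from
  assign(file_name, readr::read_csv(subject_paths, id = "paths"))
}
This creates three bound data frames, one for each subject folder.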
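With 230 subjects, how the paths are matched to a folder matters: a plain substring match on "subject1" also hits "subject10", "subject100" and so on. The short sketch below, on made-up paths, compares a substring match with a prefix match that includes the trailing slash.

```r
# Hypothetical paths for three subjects whose IDs share a prefix
paths <- c("subject1/cars1.csv", "subject10/cars1.csv", "subject100/cars1.csv")

# Substring match: picks up all three folders
substring_hits <- grepl("subject1", paths)

# Prefix match including the "/" separator: picks up only subject1
prefix_hits <- startsWith(paths, "subject1/")

substring_hits
prefix_hits
```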
> subject_1_binded
# A tibble: 15 × 4
paths mpg cyl disp
<chr> <dbl> <dbl> <dbl>
1 subject1/cars1.csv 27.3 4 79
2 subject1/cars1.csv 21 6 160
3 subject1/cars1.csv 13.3 8 350
4 subject1/cars1.csv 19.2 8 400
5 subject1/cars1.csv 10.4 8 460
6 subject1/cars2.csv 14.7 8 440
7 subject1/cars2.csv 18.7 8 360
8 subject1/cars2.csv 30.4 4 95.1
9 subject1/cars2.csv 15.5 8 318
10 subject1/cars2.csv 16.4 8 276.
11 subject1/cars3.csv 21.5 4 120.
12 subject1/cars3.csv 15.5 8 318
13 subject1/cars3.csv 19.7 6 145
14 subject1/cars3.csv 14.3 8 360
15 subject1/cars3.csv 21.4 4 121
> subject_2_binded
# A tibble: 15 × 4
paths mpg cyl disp
<chr> <dbl> <dbl> <dbl>
1 subject2/cars1.csv 27.3 4 79
2 subject2/cars1.csv 21 6 160
3 subject2/cars1.csv 13.3 8 350
4 subject2/cars1.csv 19.2 8 400
5 subject2/cars1.csv 10.4 8 460
6 subject2/cars2.csv 14.7 8 440
7 subject2/cars2.csv 18.7 8 360
8 subject2/cars2.csv 30.4 4 95.1
9 subject2/cars2.csv 15.5 8 318
10 subject2/cars2.csv 16.4 8 276.
11 subject2/cars3.csv 21.5 4 120.
12 subject2/cars3.csv 15.5 8 318
13 subject2/cars3.csv 19.7 6 145
14 subject2/cars3.csv 14.3 8 360
15 subject2/cars3.csv 21.4 4 121
> subject_3_binded
# A tibble: 15 × 4
paths mpg cyl disp
<chr> <dbl> <dbl> <dbl>
1 subject3/cars1.csv 27.3 4 79
2 subject3/cars1.csv 21 6 160
3 subject3/cars1.csv 13.3 8 350
4 subject3/cars1.csv 19.2 8 400
5 subject3/cars1.csv 10.4 8 460
6 subject3/cars2.csv 14.7 8 440
7 subject3/cars2.csv 18.7 8 360
8 subject3/cars2.csv 30.4 4 95.1
9 subject3/cars2.csv 15.5 8 318
10 subject3/cars2.csv 16.4 8 276.
11 subject3/cars3.csv 21.5 4 120.
12 subject3/cars3.csv 15.5 8 318
13 subject3/cars3.csv 19.7 6 145
14 subject3/cars3.csv 14.3 8 360
15 subject3/cars3.csv 21.4 4 121
So in your case, with 230 subjects, you will end up with 230 bound data frames.
Note the use of assign() in the last line of the for loop: in your case it will create 230 data.frame (more properly, tbl_df) objects in the global environment.
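If 230 separate objects in the global environment is more than you want, one alternative is to keep the results in a single named list. The sketch below is not part of the original answer. It assumes readr 2.0.0 or later (for the id and show_col_types arguments), and the temporary root and the names paths_by_subject and bound are made up for the demo.

```r
library(readr)

# Hypothetical layout created in a temporary directory for illustration
root <- file.path(tempdir(), "assign_demo")
for (f in c("subject1/cars1.csv", "subject1/cars2.csv", "subject2/cars1.csv")) {
  dir.create(file.path(root, dirname(f)), recursive = TRUE, showWarnings = FALSE)
  write.csv(data.frame(mpg = c(21, 22.8), cyl = c(6, 4)),
            file.path(root, f), row.names = FALSE)
}

paths <- list.files(root, pattern = "\\.csv$", recursive = TRUE, full.names = TRUE)

# One character vector of paths per subject folder
paths_by_subject <- split(paths, basename(dirname(paths)))

# read_csv() accepts a vector of paths and row-binds the files it reads
bound <- lapply(paths_by_subject, read_csv, id = "paths", show_col_types = FALSE)

names(bound)
bound[["subject1"]]
```

A single subject is then available as bound[["subject1"]], and the whole collection can be looped over with lapply().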
https://stackoverflow.com/questions/72913073