我应该将硕士论文的数据加载到一个R-dataframe中,它存储在74个excel工作簿中。每个工作簿都有4个工作表,分别称为: animals、features、r_words、verbs。所有的工作表都有相同的12个变量(starttime、word、endtime、ID、...等等)。我希望将前面的每个工作表连接在一起,因此得到的数据帧应该有12列,行数取决于74个主题产生的答案的数量。我想使用tidyverse的readxl-package,并遵循本文:https://readxl.tidyverse.org/articles/articles/readxl-workflows.html#csv-caching-and-iterating-over-sheets。我面临的第一个问题是如何使用read_excel(path,sheet = "animals","features","r_words","verbs")读取所有4个工作表。这只适用于第一个工作表,所以我尝试列出所有的工作表名称(object sheet)。这也不起作用。当我尝试在一个工作表中使用以下代码时,下一行抛出一个错误: Error in basename(.):一个字符矢量参数,因此,这是我的代码的一部分,希望能满足要求:
filenames <- list.files("data", pattern = '\\.xlsm',full.names = TRUE)
# indices
subfile_nos <- 1:length(filenames)
# function to read all the sheets in at once and cache to csv
read_then_csv <- function(sheet, path) {
for (i in 1:length(filenames)){
sheet <- excel_sheets(filenames[i])
len.sheet <- 1:length(sheet)
path <- read_excel(filenames[i], sheet = sheet[i]) #only reading in the first sheet
pathbase <- path %>%
basename() %>% #Error in basename(.) : a character vector argument expected
tools::file_path_sans_ext()
path %>%
read_excel(sheet = sheet) %>%
write_csv(paste0(pathbase, "-", sheet, ".csv"))
}
}发布于 2019-03-08 22:28:23
您应该执行双循环或嵌套映射,如下所示:
library(dplyr)
library(purrr)
library(readxl)
# I suggest looking at
?purrr::map_df
# Function to read all the sheets in at once and save as csv
read_then_csv <- function(input_filenames, output_file) {
# Iterate over files and concatenate results
map_df(input_filenames, function(f){
# Iterate over sheets and concatenate results
excel_sheets(f) %>%
map_df(function(sh){
read_excel(f, sh)
})
}) %>%
# Write csv
write_csv(output_file)
}
# Test function
filenames <- list.files("data", pattern = '\\.xlsm',full.names = TRUE)
read_then_csv(filenames, 'my_output.csv')发布于 2019-03-13 05:27:45
你说...“我想把之前的每个工作表连接起来”...下面的脚本将合并所有文件中的所有图纸。在你的数据副本上测试它,以防它不能做你想要/需要它做的事情。
# load names of excel files
files = list.files(path = "C:\\your_path_here\\", full.names = TRUE, pattern = ".xlsx")
# create function to read multiple sheets per excel file
read_excel_allsheets <- function(filename, tibble = FALSE) {
sheets <- readxl::excel_sheets(filename)
sapply(sheets, function(f) as.data.frame(readxl::read_excel(filename, sheet = f)),
simplify = FALSE)
}
# execute function for all excel files in "files"
all_data <- lapply(files, read_excel_allsheets)https://stackoverflow.com/questions/55063552
复制相似问题