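When I upload my CSV file to R, one row and one column are added to the data frame, even though that row and column are empty in the CSV file. The data frame should have 2005 obs. of 49 variables, but after uploading I get a data frame with 2006 obs. of 50 variables. In addition, R fills some fields with NA after the upload.
This is the code I use to upload the file to R: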
Dev_REITs_MTBV <- read.csv2("Developed_REITS_MTBV.csv", na="NA")
Here is the CSV file:
Code run before uploading:
pkgs <- c("readxl","akima","rgl","scatterplot3d","car","MASS","ISLR","stargazer","urca","rpart","ggplot2","e1071","randomForest",
"quantreg","mgcv","gamlss","rlang","gplots","psych","ggridges","viridis","caTools","caret","forecast", "shape", "diagram",
"writexl", "openxlsx", "maptools", "ggridges", "calibrate", "modelr", "XLConnect")
for (pkg in pkgs) {if (! (pkg %in% rownames(installed.packages()))) { install.packages(pkg) }}
lapply(pkgs, require, character.only = TRUE)
Below are pictures of the CSV rows and columns I entered and of the resulting data frame:
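Thank you very much for your help!
Posted on 2020-09-25 20:24:29
Have you tried read_csv() from the tidyverse? Without the actual file this may be hard to solve, but simply trying another package might fix it. You could also try fread() from the data.table package.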
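Before switching readers, it may help to check whether the extra row and column are simply empty. The base R sketch below is not the asker's data: it writes a hypothetical miniature file in which every line ends with a stray ';' and the last line contains only separators, reads it with read.csv2(), and then drops rows and columns that are entirely NA. The file contents and the names raw and clean are assumptions made for illustration.

```r
# Hypothetical miniature of the problem (NOT the asker's file):
# every line ends with a stray ';' and the last line holds only separators.
tmp <- tempfile(fileext = ".csv")
writeLines(c("Name;1980;1981;",
             "A;1,5;2,5;",
             "B;3,0;4,0;",
             ";;;"), tmp)

# Treat empty fields as NA so that empty rows/columns become all-NA.
raw <- read.csv2(tmp, na.strings = c("NA", ""))
dim(raw)    # 3 rows, 4 columns: one extra row and one extra column

# Keep only rows and columns that contain at least one non-NA value.
keep_rows <- rowSums(!is.na(raw)) > 0
keep_cols <- colSums(!is.na(raw)) > 0
clean <- raw[keep_rows, keep_cols, drop = FALSE]
dim(clean)  # 2 rows, 3 columns
```

If the real file behaves the same way, the same two filters applied to Dev_REITs_MTBV should bring it back to the expected dimensions; if not, the cause lies elsewhere in the file.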
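Later edit/addition:
Your data is fairly messy: it uses ',' instead of '.' as the decimal separator and ';' as the column separator, the last column carries a whole string of trailing commas, and the variable names are numbers (years). However, the code below should fix this: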
library(tidyverse) # you need dplyr 1.0.0 or later (for across())
# load data
dataset <- read_delim("Developed_REITS_MTBV.csv", delim = ";") %>%
# rename final column
rename(`2019` = `2019,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,`) %>%
# delete all trailing commas in last column (but keep the decimal comma)
mutate(`2019` = gsub("^,*|(?<=,),|,*$", "", `2019`, perl = TRUE)) %>%
# convert the year columns to numeric after switching the commas to points
mutate(across(c(`1980`:`2019`), ~as.numeric(gsub(",", ".", .))))
Part of the code is taken from: Removing multiple commas and trailing commas using gsub
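To see what the gsub() pattern in the pipeline does, here is a small base R sketch applied to a few made-up strings. The sample values are hypothetical and only imitate a last column with a decimal comma followed by trailing commas; the names x, cleaned and values are assumptions for illustration.

```r
# Hypothetical values: a decimal comma followed by a run of trailing commas.
x <- c("1,25,,,,,", "0,8,,,", "3,,,,")

# The pattern has three alternatives:
#   ^,*      commas at the start of the string
#   (?<=,),  a comma that directly follows another comma (needs perl = TRUE)
#   ,*$      commas at the end of the string
cleaned <- gsub("^,*|(?<=,),|,*$", "", x, perl = TRUE)
cleaned     # "1,25" "0,8" "3"

# Switch the decimal comma to a point and convert to numeric.
values <- as.numeric(gsub(",", ".", cleaned, fixed = TRUE))
values      # 1.25 0.80 3.00
```

The decimal comma survives because it sits between two digits, so none of the three alternatives matches it.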
https://stackoverflow.com/questions/64063918