R: Reading old 13F txt files from the SEC EDGAR database with the R edgar package

Stack Overflow user

Asked 2021-10-06 06:06:01

Answers: 2 | Views: 239 | Followers: 0 | Votes: 0

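Hello, I am trying to read 13F filings from the SEC EDGAR database with the R edgar package.

The challenge is that the filings I am looking at are old (around 2000): https://www.sec.gov/edgar/browse/?CIK=1087699

They are in a messy txt format, unlike today's 13F filings, and cannot be read with the readtxt function.

An example filing is here: https://www.sec.gov/Archives/edgar/data/1087699/000108769999000001/0001087699-99-000001.txt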

library(edgar)

F13<-
  getFilings(
  cik.no = "0001087699",
  form.type = "13F-HR",
  1999,
  quarter=c(1,2,3),
  useragent="myname@gmail.com"
)

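I tried this, and R reports that it is busy downloading for a long time even though the txt file is not large, so something is off. When it finally finishes, it says that no filing information was found for the given CIK and form type, yet I can clearly see the file. If the edgar package is not designed to handle this, how can I deal with it?

My end goal is to get the file into a tidy data frame, with columns for the stock symbol and price and one row per stock. Please help.

Is there a scraping workaround available? I used Chrome's inspect tool to highlight elements, but they look strange to me (sorry, I am not good at scraping at all).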

web-scraping

txt

edgar

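2 Answers

Stack Overflow user

Accepted answer

Answered 2021-10-12 12:59:28

I parsed the file you linked as an example. I first copied the data from the filing into a txt file. The file copied.txt needs to be in the current working directory. This should give you an idea of how to proceed.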

library(tidyverse)

df <- read_file("copied.txt") %>%
  # trying to extract data only from the table
  (function(x){
    tbl_beg <- str_locate(x, "Managers Sole")[2] + 1
    tbl_end <- str_locate(x, "\r\n</TABLE>")[1]
    str_sub(x, tbl_beg, tbl_end)
    }) %>%
  # removing some unwanted characters from the beginning and the end of the extracted string
  str_sub(start = 4, end = -3) %>%
  # splitting for individual lines
  str_split('\"\r\n\"') %>% unlist() %>%
  # removing broken line break
  str_remove("\r\n") %>%
  # replacing the multi-word trailing text with a single underscore-joined token,
  # because the rows are later split into columns on spaces
  str_replace_all("Sole   Managers Sole", " Managers_Sole") %>%
  # removing extra spaces
  str_squish() %>%
  # reversing each line so that it can be split from the right (the company name contains additional spaces)
  # once the company name is the last field, it is okay that it contains spaces
  stringi::stri_reverse() %>%
  str_split(pattern = " ", n = 6, simplify = TRUE) %>%
  # reversing each field back to its original character order
  apply(MARGIN = 2, FUN = stringi::stri_reverse) %>%
  as_tibble() %>%
  select(c(6:1)) %>%
  set_names(nm = c("name_of_issuer", "title_of_cl", "cusip_number", "fair_market_value", "shares",  "shares_of_princip_mngrs"))

# A tibble: 47 x 6
   name_of_issuer   title_of_cl cusip_number fair_market_value shares  shares_of_princip_mngrs
   <chr>            <chr>       <chr>        <chr>             <chr>   <chr>                  
 1 America Online   COM         02364J104    2,940,000         20,000  Managers_Sole          
 2 Anheuser Busch   COM         35229103     3,045,000         40,000  Managers_Sole          
 3 At Home          COM         45919107     787,500           5,000   Managers_Sole          
 4 AT&T             COM         1957109      5,985,937         75,000  Managers_Sole          
 5 Bank Toyko       COM         65379109     700,000           50,000  Managers_Sole          
 6 Bay View Capital COM         07262L101    14,958,437        792,500 Managers_Sole          
 7 Broadcast.com    COM         111310108    2,954,687         25,000  Managers_Sole          
 8 Chase Manhattan  COM         16161A108    10,578,750        130,000 Managers_Sole          
 9 Chase Manhattan  4/85C       16161A9DQ    59,375            500     Managers_Sole          
10 Cisco Systems    COM         17275R102    4,930,312         45,000  Managers_Sole
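
All six columns come back as character, including the comma-formatted numbers. Since the goal in the question is a data frame with a price per stock, here is a minimal base R sketch of the conversion. The `holdings` data frame is a hand-typed stand-in for the first three rows above, `to_number` is a hypothetical helper, and the per-share figure assumes `fair_market_value` is reported in plain dollars, which should be checked against the filing header.

```r
# Stand-in for the parsed result above (first three rows, typed by hand).
holdings <- data.frame(
  name_of_issuer    = c("America Online", "Anheuser Busch", "At Home"),
  fair_market_value = c("2,940,000", "3,045,000", "787,500"),
  shares            = c("20,000", "40,000", "5,000"),
  stringsAsFactors  = FALSE
)

# Hypothetical helper: strip thousands separators, then convert to numeric.
to_number <- function(x) as.numeric(gsub(",", "", x, fixed = TRUE))

holdings$fair_market_value <- to_number(holdings$fair_market_value)
holdings$shares            <- to_number(holdings$shares)

# Implied value per share, ASSUMING fair_market_value is in plain dollars.
holdings$value_per_share <- holdings$fair_market_value / holdings$shares

print(holdings)
```

The same `to_number` call can be applied to the `fair_market_value` and `shares` columns of the tibble produced by the pipeline above.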

Votes: 1

Stack Overflow user

Answered 2021-10-11 09:21:25

You can use the httr package to request the page:

> install.packages("httr")
# follow instructions etc

Then, in the R shell (you may need to restart it):

> httr::GET("https://www.sec.gov/Archives/edgar/data/1087699/000108769999000001/0001087699-99-000001.txt")

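This downloads the file successfully. My R is not fluent enough to parse the text, but it looks simple: split the text on <TABLE>, split the result on newlines, then split each line on spaces.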
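
The three steps just described can be sketched roughly as follows. This is only an illustration under stated assumptions: `fetch_filing`, `extract_table_lines` and `split_fields` are hypothetical helper names, the User-Agent string is a placeholder, and the filing is assumed to contain a single `<TABLE>` ... `</TABLE>` block. Header rows inside the table and issuer names that contain spaces still need extra handling, for example by splitting from the right.

```r
# Download the raw filing text with httr (not called in this sketch).
# A User-Agent identifying the requester is sent; the value is a placeholder.
fetch_filing <- function(url, agent = "Your Name your.email@example.com") {
  resp <- httr::GET(url, httr::user_agent(agent))
  httr::stop_for_status(resp)
  httr::content(resp, as = "text", encoding = "UTF-8")
}

# Return the non-empty lines between the first <TABLE> and </TABLE> markers.
extract_table_lines <- function(txt) {
  lines <- strsplit(txt, "\r?\n")[[1]]
  start <- grep("<TABLE>", lines, fixed = TRUE)[1]
  end   <- grep("</TABLE>", lines, fixed = TRUE)[1]
  if (is.na(start) || is.na(end) || end <= start + 1) return(character(0))
  body <- trimws(lines[(start + 1):(end - 1)])
  body[nzchar(body)]
}

# Split one line into fields on runs of whitespace.
split_fields <- function(line) strsplit(line, "\\s+")[[1]]

# Small inline sample standing in for the downloaded text.
sample_txt <- paste0(
  "header text\r\n<TABLE>\r\n",
  "AT&T   COM   1957109   5,985,937   75,000\r\n",
  "\r\n",
  "Cisco Systems COM 17275R102 4,930,312 45,000\r\n",
  "</TABLE>\r\n"
)

rows   <- extract_table_lines(sample_txt)
fields <- lapply(rows, split_fields)
print(fields)
```

With network access, `rows <- extract_table_lines(fetch_filing(url))` would apply the same steps to the real filing. The second sample row splits into six fields because "Cisco Systems" contains a space, which is why the accepted answer splits from the right.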

Votes: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/69460581
