I am trying to download an image (URL) from a website (URL) based on a specific person's name, in R. I get the following error:
Error in read_xml.raw(raw, encoding = encoding, base_url = base_url, as_html = as_html, :
  CHAR() can only be applied to a 'CHARSXP', not a 'NULL'
Here is the traceback:
19. read_xml.raw(raw, encoding = encoding, base_url = base_url, as_html = as_html,
      options = options)
18. read_xml.connection(con, encoding = encoding, ..., as_html = as_html,
      base_url = x, options = options)
17. read_xml.character(x, encoding = encoding, ..., as_html = TRUE,
      options = options)
16. read_xml(x, encoding = encoding, ..., as_html = TRUE, options = options)
15. withCallingHandlers(expr, warning = function(w) if (inherits(w,
      classes)) tryInvokeRestart("muffleWarning"))
14. suppressWarnings(read_xml(x, encoding = encoding, ..., as_html = TRUE,
      options = options))
13. read_html.default(., image_page)
12. read_html(., image_page)
11. html_nodes(., "img")
10. xml2::xml_attr(x, name, default = default)
9. html_attr(., "src")
8. handle_url(handle, url, ...)
7. httr::GET(.)
6. is.response(x)
5. stopifnot(is.response(x))
4. httr::content(., "raw")
3. writeBin(., paste0("~/", ceo_name, ".jpg"))
2. paste0(site, image_page) %>% read_html(image_page) %>% html_nodes("img") %>%
      html_attr("src") %>% {
      grep("gstatic", ., value = TRUE)
      } %>% `[`(1) %>% httr::GET() %>% httr::content("raw") %>% writeBin(paste0("~/", ...
1. get_image(as.character("Mark Lloyd"))
I don't understand why. Could someone please enlighten me? Many thanks.
Code:
library(rvest)
library(httr)

get_image <- function(ceo_name)
{
  site <- "https://www.icobench.com"
  query <- paste0(site, "/ico/max-crowdfund/team", url_escape(ceo_name))

  image_page <- read_html(query) %>%
    html_nodes(xpath = "//a[contains(text(), 'Images')]") %>%
    html_attr("href")

  paste0(site, image_page) %>%
    read_html(image_page) %>%
    html_nodes("img") %>%
    html_attr("src") %>%
    {grep("gstatic", ., value = TRUE)} %>%
    `[`(1) %>%
    httr::GET() %>%
    httr::content("raw") %>%
    writeBin(paste0("~/", ceo_name, ".jpg"))
}
get_image(as.character("Mark Lloyd"))

Posted on 2021-03-05 17:35:37
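I would make sure to include a ? to mark the query string in the URL. Then I would use a shorter CSS selector to pick out the right image on the page. You really should also check the response status to make sure the page was found.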
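A minimal sketch of the first and last points, assuming httr and xml2 are installed. The two paste0() calls show how the missing ? changes the URL, and fetch_html() is a hypothetical helper (not part of either package) that calls httr::stop_for_status() before parsing, so a missing page fails at the request rather than later in the pipe. The actual request is left commented out because it needs network access.

```r
library(httr)
library(xml2)

site <- "https://icobench.com"
ceo_name <- "Mark Lloyd"

# Without "?" the escaped name is glued onto the path segment
no_query <- paste0(site, "/ico/max-crowdfund/team", xml2::url_escape(ceo_name))

# With "?" the escaped name is sent as a query string
query <- paste0(site, "/ico/max-crowdfund/team?", xml2::url_escape(ceo_name))

# Hypothetical helper: fetch a URL and parse it as HTML only if the request
# succeeded. stop_for_status() raises an R error for an unsuccessful status
# such as 404, so a page that was not found is reported right here.
fetch_html <- function(url) {
  resp <- httr::GET(url)
  httr::stop_for_status(resp)
  # Decode the body as text, then parse it as an HTML document
  xml2::read_html(httr::content(resp, as = "text", encoding = "UTF-8"))
}

# page <- fetch_html(query)   # requires network access
```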
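Once I have the right node, I would extract the URL with a regex pattern and pass it, prefixed with the protocol and domain by url_absolute, directly to httr. A regex is needed because on that page the URL sits inside the node's style attribute.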
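The extraction step can be tried offline against a small HTML fragment. The markup below is an assumption made for illustration (a div carrying the CEO's name in its title attribute, wrapping an .image_background element whose inline style holds a relative .jpg path); the real page may be structured differently.

```r
library(rvest)
library(stringr)

site <- "https://www.icobench.com"

# Hypothetical markup imitating the structure the answer's selector targets
html <- '<html><body>
  <div title="Mark Lloyd">
    <div class="image_background"
         style="background-image: url(\'/images/uploads/mark_lloyd.jpg\')"></div>
  </div>
</body></html>'

img_url <- read_html(html) %>%
  html_node('[title="Mark Lloyd"] .image_background') %>%  # first matching node
  html_attr("style") %>%                                    # the inline CSS string
  str_extract("/.*\\.jpg") %>%                              # relative path inside url(...)
  xml2::url_absolute(site)                                  # prepend protocol + domain

print(img_url)
```

The regex is greedy and starts at the first slash, so it works here because the style value contains no other slash before the path; a style holding a full https:// URL would need a different pattern.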
Ideally you should also test whether any node matched before trying to extract attribute values from it.
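A minimal sketch of that check, using a hypothetical helper first_attr_or_stop() and an inline HTML string in place of the real page. When nothing matches, html_attr() on the empty node set quietly returns character(0), so the helper stops with an explicit message instead of letting the empty value flow into the next step.

```r
library(rvest)

# Hypothetical helper: attribute of the first node matching an XPath,
# or an informative error when the XPath matches nothing.
first_attr_or_stop <- function(doc, xpath, attr) {
  nodes <- html_nodes(doc, xpath = xpath)
  if (length(nodes) == 0) {
    stop("No node matched XPath: ", xpath, call. = FALSE)
  }
  html_attr(nodes[[1]], attr)
}

page <- read_html('<html><body><a href="/gallery">Images</a></body></html>')

first_attr_or_stop(page, "//a[contains(text(), 'Images')]", "href")
```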
Because you only want one image, you can use html_node for the CEO node, which gains the efficiency of stopping after the first match.
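The difference between the two functions can be seen on a toy document (the markup is invented for illustration). html_nodes() returns every match as a node set, while html_node() returns only the first match, which makes a separate `[`(1) step unnecessary; with no match it returns a missing node, for which html_attr() gives NA.

```r
library(rvest)

page <- read_html('<html><body>
  <img src="first.jpg"><img src="second.jpg">
</body></html>')

# Every match, as a node set
all_imgs <- html_nodes(page, "img")

# Only the first match, as a single node
first_src <- html_attr(html_node(page, "img"), "src")

# No match: html_node() yields a missing node and html_attr() returns NA
missing_src <- html_attr(html_node(page, "video"), "src")
```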
This assumes the actual landing URL, complete with the ?, should be:
https://icobench.com/ico/max-crowdfund/team?Mark%20Lloyd
For that page, your XPath expression doesn't work.
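One plausible reading of the traceback, offered as an inference rather than something the answer states: in the original pipe, paste0(site, image_page) %>% read_html(image_page) expands to read_html(paste0(site, image_page), image_page), so image_page lands in read_html()'s second argument, encoding (frame 13 shows read_html.default(., image_page)). If the XPath matches nothing, image_page is character(0), which would reach the parser as an empty encoding. The sketch below demonstrates only the pipe semantics, using a hypothetical stand-in fake_read_html() with the same first two arguments.

```r
library(magrittr)

# Hypothetical stand-in mirroring read_html(x, encoding = "")
fake_read_html <- function(x, encoding = "") {
  list(x = x, encoding = encoding)
}

site <- "https://www.icobench.com"
image_page <- character(0)   # what html_attr() returns when nothing matched

# The piped value becomes `x`; the explicit argument becomes `encoding`
res <- paste0(site, image_page) %>% fake_read_html(image_page)

# Piping without the extra argument leaves `encoding` at its default
res_fixed <- paste0(site, image_page) %>% fake_read_html()
```

Note that paste0() silently drops the zero-length image_page, so the URL itself still looks valid; only the stray encoding argument carries the empty value.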
Not tested with other CEOs.
library(rvest)
library(httr)
library(stringr)

get_image <- function(ceo_name) {
  site <- "https://www.icobench.com"
  # The "?" makes the escaped name a query string
  query <- paste0(site, "/ico/max-crowdfund/team?", url_escape(ceo_name))

  # CEO's image block -> inline style -> relative .jpg path -> absolute URL
  image_page <- read_html(query) %>%
    html_node(paste0('[title="', ceo_name, '"] .image_background')) %>%
    html_attr("style") %>%
    stringr::str_extract("/.*\\.jpg") %>%
    url_absolute(site)

  # Download the image and write the raw bytes to ~/<name>.jpg
  image_page %>%
    httr::GET() %>%
    httr::content("raw") %>%
    writeBin(paste0("~/", ceo_name, ".jpg"))
}
get_image(as.character("Mark Lloyd"))

https://stackoverflow.com/questions/66491530