
Downloading the latest file from a server with wget

Stack Overflow user

Asked 2014-05-01 20:29:47

2 answers, 2.1K views, 0 followers, score 1

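Good afternoon, everyone,

I'm trying to figure out how to use wget on my Linux system to download the most recent file from a server. The files are 5-minute radar data, so the filenames advance in 5-minute steps up to the most recent one, e.g. 1930.grib2, 1935.grib2, 1940.grib2, and so on.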
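Because a new file appears every 5 minutes, the newest one should correspond to the current time rounded down to a multiple of 5 minutes. A minimal C++ sketch of that rounding, assuming the hour and minute are in the same time zone as the filenames (presumably UTC, since the script's directories end in `Z`); `latestSlot` is a hypothetical helper name:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Round a time of day down to the most recent 5-minute slot and format it
// as "HHMM", the suffix used by the radar files (1930, 1935, 1940, ...).
// Assumption: hour/minute are already in the time zone used by the filenames.
std::string latestSlot(int hour, int minute) {
    assert(hour >= 0 && hour < 24 && minute >= 0 && minute < 60);
    int slot = minute - minute % 5;  // e.g. 43 -> 40
    char buf[16];
    std::snprintf(buf, sizeof buf, "%02d%02d", hour, slot);
    return std::string(buf);
}
```

The file for that slot may not have been posted yet, so this gives a starting point rather than a guarantee.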

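Currently I have the following code in a bash script. It downloads every file starting from the top of each hour, which is not an efficient way to get the latest file: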

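```bash
# YMD, HOMEDIR and PATH1 are set earlier in the script (not shown).
HR=$(date +%H)
padtowidth=2
START=0
END=55
i=${START}

while [[ ${i} -le ${END} ]]
do
    # Zero-pad the minute: 0 -> 00, 5 -> 05, ...
    tau=$(printf "%0*d" "${padtowidth}" "${i}")

    URL1="http://thredds.ucar.edu/thredds/fileServer/grib/nexrad/composite/unidata/files/${YMD}/Level_3_Composite_N0R_${YMD}_${HR}${tau}.grib2"

    # -N: only download if the remote file is newer than the local copy
    wget -P "${HOMEDIR}${PATH1}${YMD}/${HR}Z/" -N "${URL1}"

    ((i = i + 5))
done
```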

Tags: ubuntu, wget, linux, bash, unix

2 Answers

Stack Overflow user

Accepted answer

Answered 2014-05-01 23:23:25

If there is an index of all the files, you could download that first and then parse it to find the most recent file.

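If that's not possible, you could count backwards from the current time (using `date +%M` in addition to `date +%H`) and stop as soon as wget manages to fetch a file (e.g. when wget exits with status 0).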
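The count-backwards idea could be sketched in C++ as follows. It assumes a POSIX system where `time_t` counts seconds since the Unix epoch (so multiples of 300 s fall on 5-minute clock boundaries) and that the filenames use UTC. `slotStamp`, `findLatest` and the `tryDownload` callback are hypothetical names; the callback is where you would actually run wget.

```cpp
#include <cassert>
#include <ctime>
#include <functional>
#include <string>

// Build the "YYYYMMDD_HHMM" stamp (UTC) for the 5-minute slot that lies
// `stepsBack` slots before `now`. Doing the arithmetic on time_t means
// crossing an hour or midnight boundary is handled automatically.
// Assumption: time_t is an integer count of seconds since the epoch (POSIX).
std::string slotStamp(std::time_t now, int stepsBack) {
    assert(stepsBack >= 0);
    std::time_t slot = now - now % 300 - static_cast<std::time_t>(stepsBack) * 300;
    std::tm utc = *std::gmtime(&slot);
    char buf[32];
    std::strftime(buf, sizeof buf, "%Y%m%d_%H%M", &utc);
    return buf;
}

// Walk backwards from the current slot and return the first stamp for which
// `tryDownload` reports success (for example, a wrapper that runs wget and
// checks for exit status 0). Returns "" if nothing succeeds within maxSteps.
std::string findLatest(std::time_t now, int maxSteps,
                       const std::function<bool(const std::string&)>& tryDownload) {
    for (int k = 0; k < maxSteps; ++k) {
        std::string stamp = slotStamp(now, k);
        if (tryDownload(stamp)) return stamp;
    }
    return "";
}
```

In practice `tryDownload` could build the URL from the stamp and return `std::system(cmd.c_str()) == 0`; on POSIX a zero return means the command exited with status 0. Capping the search with `maxSteps` (12 covers one hour) keeps it from walking back indefinitely if the server is down.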

Hope that helps!

Example of parsing the index:

```bash
filename=$(wget -q -O - "http://thredds.ucar.edu/thredds/catalog/grib/nexrad/composite/unidata/NEXRAD_Unidata_Reflectivity-20140501/files/catalog.html" \
    | grep '<a href=' \
    | head -n 1 \
    | sed -e 's/.*\(Level3_Composite_N0R_[0-9]*_[0-9]*\.grib2\).*/\1/')
```

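This fetches the page, greps for `<a href=`, keeps only the first matching line, and extracts the filename from it with sed. It therefore assumes that the first link on the page is the newest file.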
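As a variant that does not depend on the order of links in the catalog, a minimal C++ sketch could collect every matching filename and keep the largest. It assumes names follow the `Level3_Composite_N0R_YYYYMMDD_HHMM.grib2` pattern from the sed expression above (the question's script uses `Level_3_...`, so adjust the regex to whatever the server actually serves); `newestInCatalog` is a hypothetical helper name.

```cpp
#include <regex>
#include <string>

// Return the newest Level3_Composite_N0R_YYYYMMDD_HHMM.grib2 name found
// anywhere in the catalog HTML, or "" if none is present. Because the date
// and time fields are zero-padded and fixed-width, the lexicographically
// largest name is also the most recent one.
std::string newestInCatalog(const std::string& html) {
    static const std::regex pattern(R"(Level3_Composite_N0R_[0-9]{8}_[0-9]{4}\.grib2)");
    std::string newest;
    for (std::sregex_iterator it(html.begin(), html.end(), pattern), end; it != end; ++it) {
        std::string name = it->str();
        if (name > newest) newest = name;
    }
    return newest;
}
```

The catalog HTML could come from `wget -q -O catalog.html ...` and be read into a string before calling the function.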

Score: 2

Stack Overflow user

Answered 2018-07-24 12:12:39

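I wrote a C++ console program to automate this task; the full code is below. Use wget to fetch the catalog file, then run this program in the same directory. It creates a BAT file that you can launch whenever you like to download the latest file. I wrote it specifically for the Unidata THREDDS server, so I know it works there. Edit and important note: this is for the latest GOES-16 data, so you will have to adjust the substring values for other products.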

```cpp
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
using namespace std;

int main()
{
    // First, open catalog.html (downloaded earlier with wget) and read the entire file into a string.
    ifstream inFile("catalog.html");
    if (!inFile) {
        cerr << "Could not open catalog.html" << endl;
        return 1;
    }
    stringstream strStream;
    strStream << inFile.rdbuf();   // read the whole file
    string str = strStream.str();  // str holds the content of the file

    cout << str << endl;  // the string contains the entire catalog; you can do anything with it

    // Build the full URL automatically. Step 1: the known base URL ("first").
    string first = "http://thredds-test.unidata.ucar.edu/thredds/fileServer/satellite/goes16/GRB16/ABI/CONUS/Channel02/current/";

    // "second" is the actual filename. To my knowledge its position in the HTML file never changes,
    // but watch this in case it DOES change in the future. substr() extracts it.
    const string::size_type offset = 252784, length = 76;
    if (str.size() < offset + length) {
        cerr << "catalog.html is shorter than expected; the substring offset needs updating" << endl;
        return 1;
    }
    string second = str.substr(offset, length);

    // Create a batch file containing "wget <base url><filename>", which downloads the latest file when launched.
    ofstream myfile2("downloadGOESLatest.bat");
    myfile2 << "wget " << first << second << endl;

    return 0;
}
```
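The hard-coded `substr(252784, 76)` breaks as soon as the catalog page changes length. A minimal sketch of a more tolerant alternative, assuming the filename sits between two marker strings that you read off your own catalog.html (the helper `extractBetween` and the `<tt>` markers in the test are hypothetical placeholders):

```cpp
#include <string>

// Return the text that follows the first occurrence of `startMarker` and ends
// just before the next `endMarker`, or "" if either marker is missing.
// Unlike a fixed offset, this keeps working when the page grows or shrinks,
// as long as the markers themselves stay the same.
std::string extractBetween(const std::string& text,
                           const std::string& startMarker,
                           const std::string& endMarker) {
    std::string::size_type start = text.find(startMarker);
    if (start == std::string::npos) return "";
    start += startMarker.size();
    std::string::size_type stop = text.find(endMarker, start);
    if (stop == std::string::npos) return "";
    return text.substr(start, stop - start);
}
```

For example, `string second = extractBetween(str, startMarker, endMarker);` could replace the `substr` call. Like the original, it still takes the first match on the page, so it assumes the newest file is listed first.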

Score: 0

Original page content provided by Stack Overflow; translation support by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/23416050
