Q: Read the complete HTML content of a URL and save it to a text file

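Stack Overflow user

Asked 2014-03-22 07:27:59

Answers: 3, Views: 9.5K, Followers: 0, Votes: 4

Requirements:

Read the HTML from any website, say "http://www.twitter.com". Print the retrieved HTML and save it to a text file on the local machine.

Code: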

import java.net.*;
import java.io.*;

public class oddless {
    public static void main(String[] args) throws Exception {

        URL oracle = new URL("http://www.fetagracollege.org");
        BufferedReader in = new BufferedReader(new InputStreamReader(oracle.openStream()));

        OutputStream os = new FileOutputStream("/Users/Rohan/new_sourcee.txt");

        String inputLine;
        while ((inputLine = in.readLine()) != null)
            System.out.println(inputLine);
        in.close();
    }
}

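The code above retrieves the data, prints it to the console, and saves it to a text file, but most of the time it retrieves only half of the markup (I suspect because of the blank lines in the HTML). It does not save anything beyond that point.

Question:

How can I save the complete HTML? Are there any other options?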
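
A quick way to test the blank-line theory, offered as a minimal sketch rather than as part of the original question: `BufferedReader.readLine()` returns an empty string for a blank line and returns `null` only at the end of the stream, so blank lines alone should not end the loop early. The class name `BlankLineDemo` and the sample markup are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class BlankLineDemo {
    // Counts the lines readLine() returns before it reports end of stream (null).
    static int countLines(String text) throws IOException {
        int count = 0;
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            while (reader.readLine() != null) {
                count++; // blank lines come back as "", not null, so they are counted too
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample with several blank lines in the middle.
        String html = "<html>\n\n\n<body>\n\n</body>\n</html>\n";
        System.out.println(countLines(html)); // prints 7
    }
}
```

If the loop in the question stops early, the cause is therefore more likely to be the response itself than the blank lines in it.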

java

urlconnection

3 Answers

Stack Overflow user

Answered 2014-03-22 07:46:58

I used a different approach but got the same output as you. Could there be a problem on the server side for this URL?

CloseableHttpClient httpclient = HttpClients.createDefault();
HttpGet httpGet = new HttpGet("http://www.fetagracollege.org");
CloseableHttpResponse response1 = httpclient.execute(httpGet);
try {
    System.out.println(response1.getStatusLine());
    HttpEntity entity1 = response1.getEntity();
    String content = EntityUtils.toString(entity1);
    System.out.println(content);
} finally {
    response1.close();
}

The output ends with:

    </table>
    <p><br>

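Update: this engineering and technology college simply does not have a well-formed home page. The content you received is complete, and your code works fine. The commenters are right, though: you should use try/catch/finally blocks.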
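
One way to follow that advice is try-with-resources, which closes the stream even when an exception is thrown. The sketch below is only an illustration under stated assumptions: the class name `PageSaver` and the relative output path `new_source.txt` are hypothetical, and the printed text assumes the page is UTF-8. It copies the raw response bytes, so blank lines and line endings reach the file unchanged.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class PageSaver {
    // Copies the response body byte-for-byte into the target file and
    // returns the number of bytes written. The stream is always closed.
    static long save(URL url, Path target) throws IOException {
        try (InputStream in = url.openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        URL url = new URL("http://www.fetagracollege.org"); // URL from the question
        Path target = Paths.get("new_source.txt");          // hypothetical output path
        long bytes = save(url, target);
        System.out.println("Saved " + bytes + " bytes to " + target);

        // Print the saved HTML; UTF-8 is an assumption about the page's encoding.
        System.out.println(new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
    }
}
```

Saving first and printing from the file keeps the two requirements separate, so a console encoding problem cannot affect what is stored on disk.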

Votes: 0

Stack Overflow user

Answered 2014-03-22 08:14:27

I use this code whenever I connect to a website from Java:

import java.io.*;
import java.net.*;

public class Connection
{
    public static void main(String[] args) throws Exception
    {
        new Connection();
    }
    public Connection() throws Exception
    {
        URL url = new URL("http://www.fetagracollege.org"); //The URL
        HttpURLConnection huc = connect(url); //Connects to the website
        huc.connect(); //Opens the connection
        String str = readBody(huc); //Reads the response
        huc.disconnect(); //Closes
        System.out.println(str); //Prints all output to the console
    }

    private String readBody(HttpURLConnection huc) throws Exception //Reads the response
    {
        InputStream is = huc.getInputStream(); //Inputstream
        BufferedReader rd = new BufferedReader(new InputStreamReader(is)); //BufferedReader
        String line;
        StringBuffer response = new StringBuffer();
        while ((line = rd.readLine()) != null)
        {
            response.append(line); //Append the line
            response.append('\n'); //and a new line
        }
        rd.close();
        return response.toString();
    }

    private HttpURLConnection connect(URL url) throws Exception //Connect to the URL
    {
        HttpURLConnection huc = (HttpURLConnection) url.openConnection(); //Opens connection to the website
        huc.setReadTimeout(15000); //Read timeout - 15 seconds
        huc.setConnectTimeout(15000); //Connecting timeout - 15 seconds
        huc.setUseCaches(false); //Don't use cache
        HttpURLConnection.setFollowRedirects(true); //Follow redirects if there are any
        huc.addRequestProperty("Host", "www.fetagracollege.org"); //www.fetagracollege.org is the host
        huc.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.57 Safari/537.36"); //Chrome user agent
        return huc;
    }
}

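This is how the website's output ends, so I think the problem is on the server side, because the same code works for other websites (tested with Twitter and Google):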

                            </font>&copy; fetaca 2011 </td>
                    </tr>
            </table>
    <p><br>
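
To check whether a response was cut short or the page really ends there, one rough heuristic is to compare the number of bytes received with the `Content-Length` header. This is a sketch under assumptions, not a definitive diagnosis: the class name `CompletenessCheck` and its helper methods are hypothetical, and the comparison says nothing when the server omits `Content-Length` (for example with chunked transfer).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class CompletenessCheck {
    // Reads the stream until end of stream and returns every byte received.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    // Compares the declared length with what arrived; a negative declared
    // length means the header was absent, so no conclusion can be drawn.
    static String verdict(long declared, long received) {
        if (declared < 0) {
            return "unknown";
        }
        return declared == received ? "complete" : "truncated";
    }

    public static void main(String[] args) throws IOException {
        URL url = new URL("http://www.fetagracollege.org"); // URL from the question
        HttpURLConnection huc = (HttpURLConnection) url.openConnection();
        huc.setConnectTimeout(15000);
        huc.setReadTimeout(15000);
        try (InputStream in = huc.getInputStream()) {
            byte[] body = readAll(in);
            System.out.println("HTTP " + huc.getResponseCode());
            System.out.println(verdict(huc.getContentLengthLong(), body.length));
        } finally {
            huc.disconnect();
        }
    }
}
```

A "complete" verdict together with markup that stops at `<p><br>` would support the view that the page itself is simply unfinished.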

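Votes: 0

Stack Overflow user

Answered 2015-06-17 11:32:08

To read content from a URL you can use jsoup, and then use ordinary file handling (OutputStream out = ...) to write the content out. Reading the content with jsoup looks like this: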

String url = "URL"; // the URL to fetch
Document doc = Jsoup.connect(url).get(); // fetch the page and parse it into a Document
String line = doc.toString(); // the document's HTML as a String

Now that you have the content in a String, you can easily flush it to a file.

For this you need the jsoup jar and three imports: import org.jsoup.Jsoup; import org.jsoup.nodes.Document; import org.jsoup.select.Elements;
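
The file-writing step is not shown in the answer, so here is a minimal sketch of it using only the standard library. The class name `HtmlFileWriter`, the output path `page.html`, and the choice of UTF-8 are assumptions; the placeholder string stands in for the HTML obtained from jsoup.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class HtmlFileWriter {
    // Writes the HTML string to the target file as UTF-8, creating the file
    // if it does not exist and replacing its contents if it does.
    static void writeHtml(String html, Path target) throws IOException {
        Files.write(target, html.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Placeholder for the string produced by the jsoup snippet.
        String html = "<html><body><p>placeholder</p></body></html>";
        Path target = Paths.get("page.html"); // hypothetical output path
        writeHtml(html, target);
        System.out.println("Wrote " + Files.size(target) + " bytes to " + target);
    }
}
```

Note that a string taken from a parsed jsoup document is the parser's normalized version of the markup, so it may differ from the bytes the server sent.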

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/22574764
