Requirement:
Read the HTML from any website, say "http://www.twitter.com". Print the retrieved HTML and save it to a text file on the local machine.
Code:
import java.net.*;
import java.io.*;
public class oddless {
    public static void main(String[] args) throws Exception {
        URL oracle = new URL("http://www.fetagracollege.org");
        BufferedReader in = new BufferedReader(new InputStreamReader(oracle.openStream()));
        PrintWriter out = new PrintWriter(new FileWriter("/Users/Rohan/new_sourcee.txt"));
        String inputLine;
        while ((inputLine = in.readLine()) != null) {
            System.out.println(inputLine); // print to the console
            out.println(inputLine);        // and write the same line to the file
        }
        out.close();
        in.close();
    }
}

The code above retrieves the data, prints it to the console, and saves it to a text file, but most of the time it only retrieves half of the page (apparently because of blank lines in the HTML). It does not save anything beyond that point.
Question:
How can I save the complete HTML code? Are there any alternatives?
Posted on 2014-03-22 07:46:58
I used a different approach, but I got the same output as you. Could there be a problem on the server side of this URL?
import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

CloseableHttpClient httpclient = HttpClients.createDefault();
HttpGet httpGet = new HttpGet("http://www.fetagracollege.org");
CloseableHttpResponse response1 = httpclient.execute(httpGet);
try {
System.out.println(response1.getStatusLine());
HttpEntity entity1 = response1.getEntity();
String content = EntityUtils.toString(entity1);
System.out.println(content);
} finally {
response1.close();
}

The output ends with:
</table>
<p><br>

Update: this engineering and technology college simply doesn't have a well-formed home page. The content is actually complete and your code works fine. But the commenter is right, you should use a try/catch/finally block.
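On Java 7 and later, the try/catch/finally advice can be expressed more safely with try-with-resources, which closes the reader even when an exception is thrown mid-read. A minimal sketch of the pattern (it uses an in-memory `StringReader` in place of the network stream so it runs anywhere; the class and method names are just examples):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class TryWithResourcesDemo {
    // Reads every line from the reader and joins them with '\n'.
    static String readAll(BufferedReader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            sb.append(line).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String html = "<html>\n<body>hello</body>\n</html>";
        // The reader is closed automatically when the try block exits,
        // whether normally or via an exception.
        try (BufferedReader in = new BufferedReader(new StringReader(html))) {
            System.out.print(readAll(in));
        }
    }
}
```

The same `try (...)` header works with the `InputStreamReader` over `oracle.openStream()` from the question, since any `AutoCloseable` qualifies.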
Posted on 2014-03-22 08:14:27
I use this code whenever I connect to a website through Java:
import java.io.*;
import java.net.*;
public class Connection
{
public static void main(String[] args) throws Exception
{
new Connection();
}
public Connection() throws Exception
{
URL url = new URL("http://www.fetagracollege.org"); //The URL
HttpURLConnection huc = connect(url); //Connects to the website
huc.connect(); //Opens the connection
String str = readBody(huc); //Reads the response
huc.disconnect(); //Closes
System.out.println(str); //Prints all output to the console
}
private String readBody(HttpURLConnection huc) throws Exception //Reads the response
{
InputStream is = huc.getInputStream(); //Inputstream
BufferedReader rd = new BufferedReader(new InputStreamReader(is)); //BufferedReader
String line;
StringBuffer response = new StringBuffer();
while ((line = rd.readLine()) != null)
{
response.append(line); //Append the line
response.append('\n'); //and a new line
}
rd.close();
return response.toString();
}
private HttpURLConnection connect(URL url) throws Exception //Connect to the URL
{
HttpURLConnection huc = (HttpURLConnection) url.openConnection(); //Opens connection to the website
huc.setReadTimeout(15000); //Read timeout - 15 seconds
huc.setConnectTimeout(15000); //Connecting timeout - 15 seconds
huc.setUseCaches(false); //Don't use cache
HttpURLConnection.setFollowRedirects(true); //Follow redirects if there are any
huc.addRequestProperty("Host", "www.fetagracollege.org"); //www.fetagracollege.org is the host
huc.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.57 Safari/537.36"); //Chrome user agent
return huc;
}
}

This is how that site's output ends, so I think the problem is on the server side, because other websites work with this code (tested with Twitter and Google):
</font>© fetaca 2011 </td>
</tr>
</table>
<p><br>

Posted on 2015-06-17 11:32:08
To read content from a URL you can use jsoup, and then use ordinary file handling (an `OutputStream`, etc.) to write it out. Reading the content with jsoup looks like this:
String url = "URL";                       // the page URL
Document doc = Jsoup.connect(url).get();  // fetch and parse the page as a Document
String content = doc.toString();          // the whole document as a String

Now that the content is in a String, you can easily flush it to a file.
For this you need the jsoup jar on the classpath, and these three imports: import org.jsoup.Jsoup; import org.jsoup.nodes.Document; import org.jsoup.select.Elements;
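The "flush it to a file" step can be done in one call with java.nio. A minimal sketch of that part alone (the file name and class name are just examples; in the answer above, `content` would come from `doc.toString()`):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SavePage {
    // Writes the page content to the given file, creating or overwriting it.
    static Path save(String content, String file) throws IOException {
        return Files.write(Paths.get(file), content.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        String content = "<html><body>example</body></html>";
        save(content, "page.html");
        System.out.println("wrote " + Files.size(Paths.get("page.html")) + " bytes");
    }
}
```

Unlike the question's manual loop, `Files.write` handles opening and closing the stream itself, so nothing is left half-saved if an exception occurs.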
https://stackoverflow.com/questions/22574764