
You can't specify how much data will actually be downloaded.

The web server that serves your request will open the requested file and send its entire content (preceded by the HTTP response headers) over the TCP connection.

That means the whole file will be sent to you, and there is nothing you can do about it except close the underlying connection at just the right time. That won't be easy to do, and it certainly won't work reliably. In practice: you read the 5760 bytes from the input stream (which, at this point, has already received more than those 5760 bytes!) and then close the stream and the connection, but by then a lot more data may already have been received.

To find out how much you actually received, you have to read your input stream completely and check its length.
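One way to do that, sketched under the assumption that you only want a diagnostic count (<code>StreamLength</code> and <code>drainAndCount</code> are hypothetical names, not a library API), is to read until end-of-stream and add up what each <code>read</code> call returns. Reading to the end necessarily downloads the whole body, and the count covers only the body bytes the stream delivers, not the HTTP headers or any chunked-encoding framing.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration; not part of any library.
public final class StreamLength {

    /** Reads the stream to end-of-stream and returns how many bytes it delivered. */
    public static long drainAndCount(InputStream in) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        // read() returns -1 only at end-of-stream; every other return value is a byte count
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return total;
    }
}
```

For example, <code>long received = StreamLength.drainAndCount(url.openStream());</code>. On Java 11 and later, <code>in.transferTo(OutputStream.nullOutputStream())</code> returns the same count without a hand-written loop.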


<blockquote>
 Does Java first download the whole GZIP file and then decompress it, or does it download just the necessary amount of data to fill the byte[5760] buffer? 
</blockquote>

It is closer to the latter. Java does not read the entire file first. Instead, <code>url.openStream()</code> gives you a "socket stream" that reads data directly from the socket.

There is likely to be some data buffered in the kernel-side socket data structures, and possibly more in the <code>GZIPInputStream</code>. But it is definitely a bounded amount. So the server will probably send more data than your application actually consumes, but it is unlikely to send the entire (megabyte-sized) file.
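To see that bound from the application side, a minimal sketch (again with hypothetical names) can place a counting wrapper between the raw stream and <code>GZIPInputStream</code>. It reports how many compressed bytes the decompressor pulled in order to produce the requested number of decompressed bytes. It cannot see data still sitting in kernel socket buffers or in transit, so it measures what the application consumed, not what the server sent.

```java
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical helper for illustration: counts bytes pulled through it from the wrapped stream.
public class CountingInputStream extends FilterInputStream {
    private long count;

    public CountingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) count++;
        return b;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int n = super.read(b, off, len);
        if (n > 0) count += n;
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        count += skipped;
        return skipped;
    }

    public long getCount() {
        return count;
    }

    /**
     * Reads n decompressed bytes from a gzip stream and returns how many
     * compressed bytes were consumed to get them. The caller owns (and closes) compressed.
     */
    public static long compressedBytesNeeded(InputStream compressed, int n) throws IOException {
        CountingInputStream counter = new CountingInputStream(compressed);
        DataInputStream ds = new DataInputStream(new GZIPInputStream(counter));
        ds.readFully(new byte[n]);
        return counter.getCount();
    }
}
```

Passing <code>url.openStream()</code> as <code>compressed</code> with <code>n = 5760</code> should report a small fraction of a multi-megabyte file, because <code>GZIPInputStream</code> refills its internal buffer (512 bytes by default) only when the inflater needs more input.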

<blockquote>
 How can I find how much data was actually downloaded from the HTTP server?
</blockquote>

It is difficult to measure, and indeed even difficult to define. Based on the context, it seems that you are really interested in how much the server sends. The only practical way to measure that is on the server side, and even that is difficult. (If you don't really need to find this out, I recommend that you don't bother trying ...)


If the web server supports byte-range requests, then you may be able to tell it to send just the first (say) 10 kB of compressed data (to ensure you get at least 5760 bytes when you decompress it):

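<pre><code>import java.io.DataInputStream;
import java.net.URL;
import java.net.URLConnection;
import java.util.zip.GZIPInputStream;

URL url = new URL("http://example.com/file123.gz");
URLConnection connection = url.openConnection();
// Ask the server for only the first 10,000 bytes of the compressed file
connection.setRequestProperty("Range", "bytes=0-9999");
byte[] header = new byte[5760];
try (DataInputStream ds = new DataInputStream(
        new GZIPInputStream(connection.getInputStream()))) {
    // Throws EOFException if the range decompresses to fewer than 5760 bytes
    ds.readFully(header);
}
</code></pre>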

You may need to catch any exceptions thrown in this process and retry without the <code>Range</code> header (though a server that doesn't understand the header ought to just send the whole file anyway).
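A sketch of that retry, assuming <code>HttpURLConnection</code> (<code>RangedHeaderReader</code> and its methods are hypothetical names): it first requests a byte range and treats both <code>206 Partial Content</code> and a plain <code>200 OK</code> as usable, since a server that ignores <code>Range</code> sends the whole file but the code still reads only <code>n</code> decompressed bytes. If the partial body decompresses to fewer than <code>n</code> bytes, <code>GZIPInputStream</code> reports that as an <code>EOFException</code>, and the sketch retries without <code>Range</code>. If the file itself decompresses to fewer than <code>n</code> bytes, the retry throws the same exception.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.zip.GZIPInputStream;

// Hypothetical helper for illustration; not part of any library.
public final class RangedHeaderReader {

    /**
     * Returns the first n decompressed bytes of the gzip resource at url,
     * first asking the server for only the first rangeBytes compressed bytes.
     */
    public static byte[] readHeader(URL url, int n, int rangeBytes) throws IOException {
        try {
            return read(url, n, "bytes=0-" + (rangeBytes - 1));
        } catch (EOFException e) {
            // The partial body ran out before n decompressed bytes: retry with no Range header.
            return read(url, n, null);
        }
    }

    private static byte[] read(URL url, int n, String range) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        if (range != null) {
            conn.setRequestProperty("Range", range);
        }
        try {
            int status = conn.getResponseCode();
            // 206: the server honoured Range; 200: it ignored Range and is sending the whole file.
            if (status != HttpURLConnection.HTTP_PARTIAL && status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status);
            }
            byte[] header = new byte[n];
            try (InputStream raw = conn.getInputStream();
                 DataInputStream ds = new DataInputStream(new GZIPInputStream(raw))) {
                // Reads only n decompressed bytes, even when the server is sending the full file.
                ds.readFully(header);
            }
            return header;
        } finally {
            // Per the HttpURLConnection docs, this may close the socket instead of keeping it for reuse.
            conn.disconnect();
        }
    }
}
```

For the question's case this would be called as <code>RangedHeaderReader.readHeader(new URL("http://example.com/file123.gz"), 5760, 10_000)</code>.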

I have a set of thousands of GZIP files that I'm accessing over HTTP. Each file may be up to a few hundred MB in size. I need to read the first few kilobytes (a header) of the file inside each of these compressed files.

This is my current approach:

<pre><code>import java.io.DataInputStream;
import java.net.URL;
import java.util.zip.GZIPInputStream;

URL url = new URL("http://example.com/file123.gz");
DataInputStream ds = new DataInputStream(new GZIPInputStream(url.openStream()));
byte[] header = new byte[5760];
ds.readFully(header);
</code></pre>

What I need to do is read the first 5760 bytes of the file inside this GZIP file, but I do not want Java to download the whole file (which is usually more than a few MB).

My question is: does Java first download the whole GZIP file and then decompress it, or does it download just the amount of data needed to fill the <code>byte[5760]</code> buffer? And how can I find out how much data was actually downloaded from the HTTP server?

GZIPInputStream: Read first n bytes from decompressed file
