
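Trying to parse chunked transfer encoding, but the decoded file is completely unreadable

Asked by a Stack Overflow user on 2021-03-31 14:12:39

I am trying to parse data that a REST service sends with chunked transfer encoding. When I print the pieces as strings I can see that they contain data, so I expected this to work, but when I write the accumulated data to a file, the file is completely unreadable. The code below uses the Boost library, and I explain my thinking in the comments. It starts at the point where the response is handled. I don't know what I am doing wrong.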

   // Send the request.
    boost::asio::write(socket, request);

    // Read the response status line. The response streambuf will automatically
    // grow to accommodate the entire line. The growth may be limited by passing
    // a maximum size to the streambuf constructor.
    boost::asio::streambuf response;
    boost::asio::read_until(socket, response, "\r\n");

    // Check that response is OK.
    std::istream response_stream(&response);
    std::string http_version;
    response_stream >> http_version;
    unsigned int status_code;
    response_stream >> status_code;
    std::string status_message;
    std::getline(response_stream, status_message);
    if (!response_stream || http_version.substr(0, 5) != "HTTP/")
    {
        //std::cout << "Invalid response\n";
        return 9002;
         
    }
    if (status_code != 200)
    {
        //std::cout << "Response returned with status code " << status_code << "\n";
        return 9003;
    }
    
    // Read the response headers, which are terminated by a blank line.
    boost::asio::read_until(socket, response, "\r\n\r\n");

    // Process the response headers.
    // Here I try to extract the file name, which is carried in the
    // Content-Disposition header of the response.
    std::string header;
    std::string fullHeader = "";
    string zipfilename="", txtfilename="";
    bool foundfilename = false;
    while (std::getline(response_stream, header) && header != "\r")
    {
        fullHeader.append(header).append("\n");
        std::transform(header.begin(), header.end(), header.begin(),
            [](unsigned char c){ return std::tolower(c); });
        string containstr = "content-disposition";
        string containstr2 = "filename";
        string quotestr = "\"";
        if (header.find(containstr) != std::string::npos && header.find(containstr2) != std::string::npos)
        {
            int countquotes = 0;
            bool foundquote = true;
            
            std::size_t startpos = 0, beginpos, endpos;
            while (foundquote)
            {
                
                std::size_t myfound = header.find(quotestr, startpos);
                if (myfound != std::string::npos)
                {
                    if (countquotes % 2 == 0)
                        beginpos = myfound;
                    else
                    {
                        endpos = myfound;
                        foundfilename = true;
                    }

                    startpos = myfound + 1;
                    
                }
                else
                   foundquote = false;

                countquotes++;
            }

            if (endpos > beginpos && foundfilename)
            {
                size_t zipfileleng = endpos - beginpos;
                zipfilename = header.substr(beginpos+1, zipfileleng-1);
                txtfilename = header.substr(beginpos+1, zipfileleng-5);
            }
            else
                return 9004;

        }
    }

    if (foundfilename == false || zipfilename.length() == 0 || txtfilename.length() == 0)
        return 9005;

     // Once the zip file name has been found, read the data from the response body.
     // The response uses chunked transfer encoding, which I tried to parse myself.
     // From what I read on Wikipedia it is not complicated: one line holds the length
     // of the data, the next line holds the data, and this repeats over and over.
     // So I split the whole response body on "\r\n" into a vector<string> and then
     // read the data from that vector.

      // Write whatever content we already have to output.
    std::string fullResponse = "";
    if (response.size() > 0)
    {
        std::stringstream ss;
        ss << &response;
        fullResponse = ss.str();
     
    
    }
    //tried split the entire body of response into a vector<string>

     vector<string> allresponsedata;
    split_regex(allresponsedata, fullResponse, boost::regex("(\r\n)+"));
    
    //tried to merge the data of response
    string zipfiledata;
    int myindex = 0;
    for (auto &x : allresponsedata) {
        std::cout << "Split: " << x << std::endl;// I tried to print the data, I did see the value in the variable of x

        if (myindex % 2 != 0)
        {
            zipfiledata = zipfiledata + x;//tried to accumulate the datas
        }


        myindex++;
    }
    
    //tried to write the data into a file
    std::ofstream zipfilestream(zipfilename, ios::out | ios::binary);
    zipfilestream.write(zipfiledata.c_str(), zipfiledata.length());
    zipfilestream.close();

    // The zip file is created, but it cannot be opened; the zip utility reports it as damaged.

I even tried another approach, similar to the one in "slow http client based on boost::asio - (Chunked Transfer)", but that did not work either. Visual Studio says:

  1 IntelliSense: no instance of overloaded function "boost::asio::read" matches the argument list
        argument types are: (boost::asio::ip::tcp::socket, boost::asio::streambuf, boost::asio::detail::transfer_exactly_t, std::error_code)    

It simply fails to compile on the following line:

size_t n = asio::read(socket, response, asio::transfer_exactly(chunk_bytes_to_read), error);

I have read the asio::transfer_exactly reference at https://www.boost.org/doc/libs/1_57_0/doc/html/boost_asio/reference/transfer_exactly.html, but it contains no example like this.

Any ideas?

Accepted answer

Answered by a Stack Overflow user on 2021-03-31 16:11:45

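I see you are not reading the format correctly: https://en.wikipedia.org/wiki/Chunked_transfer_encoding#Format

Before you accumulate the full response body, you need to read each chunk's length (in hexadecimal) and any optional chunk extensions.

This has to be done up front, because the sequence \r\n that you split on can easily occur inside the chunk data itself.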
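To make the format concrete, here is a minimal sketch of length-driven decoding, assuming the complete chunked body is already in memory as a std::string. The function name `decode_chunked` is hypothetical (it is not part of Boost), trailer headers after the last chunk are ignored, and the size line is parsed strictly (no surrounding whitespace), so treat it as an illustration of the format rather than a robust parser.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a Boost API): decodes a complete chunked body held
// in `raw` into `body`. Returns false if the input is malformed or truncated.
bool decode_chunked(const std::string& raw, std::string& body) {
    body.clear();
    std::size_t pos = 0;
    for (;;) {
        // The chunk-size line runs up to the first CRLF after `pos`.
        const std::size_t eol = raw.find("\r\n", pos);
        if (eol == std::string::npos) return false;
        std::string size_line = raw.substr(pos, eol - pos);

        // Drop optional chunk extensions, e.g. "1a;name=value".
        size_line = size_line.substr(0, size_line.find(';'));
        if (size_line.empty()) return false;

        // Parse the hexadecimal chunk size (overflow is not checked here).
        std::size_t size = 0;
        for (char c : size_line) {
            int digit;
            if (c >= '0' && c <= '9') digit = c - '0';
            else if (c >= 'a' && c <= 'f') digit = c - 'a' + 10;
            else if (c >= 'A' && c <= 'F') digit = c - 'A' + 10;
            else return false;
            size = size * 16 + static_cast<std::size_t>(digit);
        }
        pos = eol + 2;

        // A zero-sized chunk ends the body; any trailers are ignored.
        if (size == 0) return true;

        // Take exactly `size` bytes, whatever they contain (CRLF included),
        // then require the CRLF that terminates the chunk.
        const std::size_t remaining = raw.size() - pos;
        if (size > remaining || remaining - size < 2) return false;
        body.append(raw, pos, size);
        pos += size;
        if (raw.compare(pos, 2, "\r\n") != 0) return false;
        pos += 2;
    }
}
```

For the Wikipedia sample "4\r\nWiki\r\n6\r\npedia \r\nE\r\nin \r\n\r\nchunks.\r\n0\r\n\r\n", the third chunk contains CRLF bytes of its own, which is exactly the case where splitting on \r\n corrupts the data while counting bytes does not.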

Again, I suggest simply using Beast's support for this, which makes it trivial:

 http::response<http::string_body> response;
 boost::asio::streambuf buf;
 http::read(socket, buf, response);

You then get the headers fully parsed and interpreted (including Trailer headers!), and the content is available as a std::string from response.body().
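Once the body is a std::string, saving the download is a matter of picking the file name and writing the bytes unchanged. The sketch below uses only the standard library. Both helper names are hypothetical, and the file-name parser assumes the simple quoted form `filename="..."` with a lowercase parameter name; it does not handle unquoted values or the `filename*=` form.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: returns the quoted file name from a Content-Disposition
// value such as  attachment; filename="report.zip"  or "" if none is found.
// The value itself is not lowercased, so the file name keeps its original case.
std::string filename_from_content_disposition(const std::string& value) {
    const std::string key = "filename=\"";
    std::size_t begin = value.find(key);
    if (begin == std::string::npos) return "";
    begin += key.size();
    const std::size_t end = value.find('"', begin);
    if (end == std::string::npos) return "";
    return value.substr(begin, end - begin);
}

// Hypothetical helper: writes `bytes` to `path` without any text translation.
// Returns false if the file could not be opened or written.
bool write_binary_file(const std::string& path, const std::string& bytes) {
    std::ofstream out(path, std::ios::out | std::ios::binary | std::ios::trunc);
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return static_cast<bool>(out.flush());
}
```

With Beast, the value passed as `bytes` would be response.body(), and the header value would come from the parsed response headers.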

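It does the right thing even if the server does not use chunked encoding, or combines it with other encoding options.

There is simply no reason to reinvent the wheel.

Full demo

This demonstrates it with the chunked encoding test URL from https://jigsaw.w3.org/HTTP/.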

#include <boost/asio.hpp>
#include <boost/beast.hpp>
#include <iostream>
namespace http = boost::beast::http;
using boost::asio::ip::tcp;

int main() {
    http::response<http::string_body> response;

    boost::asio::io_context ctx;
    tcp::socket socket(ctx);

    connect(socket, tcp::resolver{ctx}.resolve("jigsaw.w3.org", "http"));

    http::write(
            socket,
            http::request<http::empty_body>(
                http::verb::get, "/HTTP/ChunkedScript", 11));

    boost::asio::streambuf buf;
    http::read(socket, buf, response);

    std::cout << response.body() << "\n";
    std::cout << "Effective headers are:" << response.base() << "\n";
}

Prints

This output will be chunked encoded by the server, if your client is HTTP/1.1
Below this line, is 1000 repeated lines of 0-9.
-------------------------------------------------------------------------
01234567890123456789012345678901234567890123456789012345678901234567890
01234567890123456789012345678901234567890123456789012345678901234567890
...996 lines removed ...
01234567890123456789012345678901234567890123456789012345678901234567890
01234567890123456789012345678901234567890123456789012345678901234567890

Effective headers are:HTTP/1.1 200 OK
cache-control: max-age=0
date: Wed, 31 Mar 2021 20:09:50 GMT
transfer-encoding: chunked
content-type: text/plain
etag: "1j3k6u8:tikt981g"
expires: Wed, 31 Mar 2021 20:09:49 GMT
last-modified: Mon, 18 Mar 2002 14:28:02 GMT
server: Jigsaw/2.3.0-beta3

Original page content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/66889515
