1. Background
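Recently a developer reported that a PHP client calling a Java interface failed whenever a parameter was large (the report mentioned roughly 56K) and worked in all other cases. When asked for the offending parameters, they supplied one whose length was 81360, which is 0x13DD0 in hexadecimal. A packet capture showed that the length actually sent to the Java side was only 0x3DD0,
that is, line 7 of the capture shown above. What is going on? Start with the Hessian protocol: the length field of a string chunk is only 2 bytes, so a single chunk can carry at most 65535 characters (the field counts 16-bit characters, not bytes). A longer string has to be split into several chunks. The official specification says: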
string ::= x52 b1 b0 <utf8-data> string
       ::= S b1 b0 <utf8-data>
       ::= [x00-x1f] <utf8-data>
       ::= [x30-x33] b0 <utf8-data>
A 16-bit unicode character string encoded in UTF-8. Strings are encoded in chunks. x53 ('S') represents the final chunk and x52 ('R') represents any non-final chunk. Each chunk has a 16-bit unsigned integer length value.
The length is the number of 16-bit characters, which may be different than the number of bytes.
String chunks may not split surrogate pairs.
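There are several cases:
if the length is between 0 and 0x1f (31), a single length byte is followed directly by the string data;
lengths from 32 to 1023 use a second encoding (the exact encoding appears in the code later on);
lengths from 1024 to 65535 use a third encoding;
if the length exceeds 65535, every chunk except the last starts with 'R', and only the last chunk starts with 'S', which marks the end of the string.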
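The first three cases above can be sketched as a small header function. This is only an illustration under our reading of the grammar quoted earlier; the class and method names (StringHeaderSketch, header, hex) are hypothetical and do not come from any Hessian library.

```java
// Hypothetical helper for illustration; not part of any Hessian library.
final class StringHeaderSketch {

    /** Header bytes for a single-chunk string of `length` 16-bit characters. */
    static byte[] header(int length) {
        if (length < 0 || length > 0xFFFF) {
            // Longer strings need 'R' chunks and are out of scope here.
            throw new IllegalArgumentException("needs chunking: " + length);
        }
        if (length <= 0x1F) {
            // [x00-x1f]: the tag byte is the length itself.
            return new byte[] { (byte) length };
        }
        if (length <= 0x3FF) {
            // [x30-x33] b0: high 2 bits go into the tag, low 8 bits into b0.
            return new byte[] { (byte) (0x30 + (length >> 8)), (byte) length };
        }
        // 'S' b1 b0: 16-bit big-endian length.
        return new byte[] { (byte) 'S', (byte) (length >> 8), (byte) length };
    }

    /** Formats bytes as upper-case hex, e.g. {0x33, 0xE8} becomes "33E8". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(header(5)));     // prints 05
        System.out.println(hex(header(1000)));  // prints 33E8
        System.out.println(hex(header(1024)));  // prints 530400
    }
}
```

Note that 1000 is 0x3E8, so the tag becomes 0x30 + 3 and the second byte is 0xE8.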
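For example, a string of length 81360 should be encoded like this:
R,FFFF,<first 65535 characters>
S,3DD1,<last 15825 characters>
The commas do not exist in the actual stream; they are only separators to make the example readable.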
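The arithmetic in this example is easy to check with a few lines of code. The sketch below is hypothetical (ChunkPlanSketch, plan and truncated are names made up for illustration) and, for simplicity, always writes the last chunk in the 'S' form even when it would fit one of the compact encodings. It also shows what happens when 81360 is squeezed into a 16-bit field without chunking.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of any Hessian library.
final class ChunkPlanSketch {
    static final int MAX_CHUNK = 0xFFFF; // largest value a 16-bit length can hold

    /** Lists the chunk headers ("R FFFF", "S 3DD1", ...) for a string of `length` characters. */
    static List<String> plan(int length) {
        List<String> chunks = new ArrayList<>();
        int left = length;
        while (left > MAX_CHUNK) {
            chunks.add(String.format("R %04X", MAX_CHUNK)); // non-final chunk
            left -= MAX_CHUNK;
        }
        chunks.add(String.format("S %04X", left));          // final chunk
        return chunks;
    }

    /** What a bare 16-bit length field ends up holding when no chunking is done. */
    static int truncated(int length) {
        return length & 0xFFFF;
    }

    public static void main(String[] args) {
        System.out.println(plan(81360));                          // prints [R FFFF, S 3DD1]
        System.out.println(Integer.toHexString(truncated(81360))); // prints 3dd0
    }
}
```

The second line of output is the same 3DD0 seen in the packet capture, which suggests the sender wrote a single chunk with a truncated length.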
For the details, see the official documentation:
http://hessian.caucho.com/doc/hessian-serialization.html#anchor32
2. Code Analysis and Fix
Here is the PHP implementation:
function writeString($value){
    $len = HessianUtils::stringLength($value);
    if($len < 32){
        return pack('C', $len)
            . $this->writeStringData($value);
    } else
    if($len < 1024){
        $b0 = 0x30 + ($len >> 8);
        $stream = pack('C', $b0);
        $stream .= pack('C', $len);
        return $stream . $this->writeStringData($value);
    } else {
        $total = $len;
        $stream = '';
        $tag = 'S';
        $stream .= $tag . pack('n', $len);
        $stream .= $this->writeStringData($value);
        return $stream;
    }
}
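As the last else branch shows, the code never checks whether the length exceeds 65535. pack('n', $len) writes an unsigned 16-bit big-endian value, so for 81360 (0x13DD0) only the low 16 bits, 0x3DD0, are written, which matches what the capture showed. The modified code is as follows: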
function writeString($value)
{
    $len = HessianUtils::stringLength($value);
    if ($len < 32) {
        // Compact form: one length byte, then the data
        return pack('C', $len)
            . $this->writeStringData($value);
    } else if ($len < 1024) {
        // Short form: high 2 bits in the tag, low 8 bits in the next byte
        $b0 = 0x30 + ($len >> 8);
        $stream = pack('C', $b0);
        $stream .= pack('C', $len);
        return $stream . $this->writeStringData($value);
    } else if ($len < 65536) {
        // Fits in a single final 'S' chunk with a 16-bit length
        $total = $len;
        $stream = '';
        $tag = 'S';
        $stream .= $tag . pack('n', $len);
        $stream .= $this->writeStringData($value);
        return $stream;
    } else {
        $left = $len;
        $offset = 0;
        // Split the data into non-final 'R' chunks and one final 'S' chunk.
        // NOTE: substr() slices by bytes. If stringLength() counts characters,
        // this matches only for single-byte data; multibyte UTF-8 input would
        // need a character-based slice such as mb_substr().
        $stream = '';
        while ($left > 0) {
            if ($left > 65535) {
                $tag = 'R';
                $stream .= $tag . pack('n', 65535);
                $stream .= $this->writeStringData(substr($value, $offset, 65535));
                $offset += 65535;
                $left -= 65535;
            } else {
                $tag = 'S';
                $stream .= $tag . pack('n', $left);
                $stream .= $this->writeStringData(substr($value, $offset, $left));
                $left = 0;
            }
        }
        return $stream;
    }
}
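To see why the receiver depends on both the R/S tags and on lengths counted in characters, here is a minimal sketch of a reader that reassembles such a stream. It is a simplification written for this article, not the Hessian library's parser: ChunkReaderSketch and read are hypothetical names, only the 'R'/'S' forms are handled (not the compact encodings), and only UTF-8 sequences of one to three bytes are decoded.

```java
// Hypothetical reader for illustration; not the Hessian library's parser.
final class ChunkReaderSketch {

    /** Reassembles a string from 'R' (non-final) and 'S' (final) chunks. */
    static String read(byte[] in) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (true) {
            int tag = in[pos++] & 0xFF;
            if (tag != 'R' && tag != 'S') {
                throw new IllegalArgumentException("unexpected tag: " + tag);
            }
            // 16-bit big-endian length, counted in characters, not bytes.
            int len = ((in[pos] & 0xFF) << 8) | (in[pos + 1] & 0xFF);
            pos += 2;
            for (int i = 0; i < len; i++) {
                int b = in[pos++] & 0xFF;
                if (b < 0x80) {                      // 1-byte sequence
                    out.append((char) b);
                } else if ((b & 0xE0) == 0xC0) {     // 2-byte sequence
                    int b1 = in[pos++] & 0x3F;
                    out.append((char) (((b & 0x1F) << 6) | b1));
                } else if ((b & 0xF0) == 0xE0) {     // 3-byte sequence
                    int b1 = in[pos++] & 0x3F;
                    int b2 = in[pos++] & 0x3F;
                    out.append((char) (((b & 0x0F) << 12) | (b1 << 6) | b2));
                } else {
                    throw new IllegalArgumentException("unsupported UTF-8 byte: " + b);
                }
            }
            if (tag == 'S') {
                return out.toString();               // final chunk ends the string
            }
        }
    }

    public static void main(String[] args) {
        byte[] stream = { 'R', 0, 2, 'a', 'b', 'S', 0, 1, 'c' };
        System.out.println(read(stream)); // prints abc
    }
}
```

With a reader like this, a single 'S' chunk whose length was truncated to 0x3DD0 ends after 15824 characters, and the rest of the data is then misread as whatever comes next in the stream.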
The Java implementation is also a useful reference. com.caucho.hessian.io.Hessian2Output does the Hessian encoding; here is how it writes a string:
while (length > 0x8000) {
    int sublen = 0x8000;
    offset = _offset;
    if (SIZE <= offset + 16) {
        flushBuffer();
        offset = _offset;
    }
    // chunk can't end in high surrogate
    char tail = value.charAt(strOffset + sublen - 1);
    if (0xd800 <= tail && tail <= 0xdbff)
        sublen--;
    buffer[offset + 0] = (byte) BC_STRING_CHUNK;
    buffer[offset + 1] = (byte) (sublen >> 8);
    buffer[offset + 2] = (byte) (sublen);
    _offset = offset + 3;
    printString(value, strOffset, sublen);
    length -= sublen;
    strOffset += sublen;
}
offset = _offset;
if (SIZE <= offset + 16) {
    flushBuffer();
    offset = _offset;
}
if (length <= STRING_DIRECT_MAX) {
    buffer[offset++] = (byte) (BC_STRING_DIRECT + length);
}
else if (length <= STRING_SHORT_MAX) {
    buffer[offset++] = (byte) (BC_STRING_SHORT + (length >> 8));
    buffer[offset++] = (byte) (length);
}
else {
    buffer[offset++] = (byte) ('S');
    buffer[offset++] = (byte) (length >> 8);
    buffer[offset++] = (byte) (length);
}
_offset = offset;
printString(value, strOffset, length);
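Here 0x8000 (32768) is the maximum number of characters per chunk, not bytes. While more than 32768 characters remain, the loop keeps emitting non-final chunks tagged BC_STRING_CHUNK ('R'), shortening a chunk by one character if it would otherwise end on a high surrogate; only the remainder is encoded according to the length rules described earlier. The real implementation is somewhat more complex because it writes through a buffer, which we will not go into here.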
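The splitting rule used by that loop can be isolated from the buffering. The following sketch only computes the chunk lengths; SurrogateSafeSplitSketch, chunkLengths and repeat are hypothetical names, and the 0x8000 limit and the surrogate check are taken from the excerpt above rather than from any documented API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; it mirrors the loop shown above without the buffer.
final class SurrogateSafeSplitSketch {
    static final int CHUNK = 0x8000; // chunk size used in the excerpt above

    /** Chunk lengths in 16-bit characters; the last entry is the final chunk. */
    static List<Integer> chunkLengths(String value) {
        List<Integer> lengths = new ArrayList<>();
        int offset = 0;
        int length = value.length();
        while (length > CHUNK) {
            int sublen = CHUNK;
            // A chunk must not end on a high surrogate, or the pair would be split.
            if (Character.isHighSurrogate(value.charAt(offset + sublen - 1))) {
                sublen--;
            }
            lengths.add(sublen);
            offset += sublen;
            length -= sublen;
        }
        lengths.add(length);
        return lengths;
    }

    /** Builds a string of `n` copies of `c` (works on older Java versions too). */
    static String repeat(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) {
        // Plain text: one full chunk, then a one-character final chunk.
        System.out.println(chunkLengths(repeat('a', 0x8001)));  // prints [32768, 1]
        // A surrogate pair straddling the boundary moves entirely into the next chunk.
        String s = repeat('a', 0x7FFF) + "\uD83D\uDE00" + "b";
        System.out.println(chunkLengths(s));                    // prints [32767, 3]
    }
}
```

The PHP fix earlier in this section uses 65535 as the chunk size instead of 0x8000; both fit in the 16-bit length field.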