Zend_string and Copy-on-Write

程序员小饭
Published 2020-09-07 15:30:15
Included in the column: golang+php

The structure of a string

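struct _zend_string {
    zend_refcounted_h gc;   /* garbage collection: reference count and type info */
    zend_ulong        h;    /* cached hash value (trades space for time) */
    size_t            len;  /* length; together with val it fully describes the string */
    char              val[1];
};
/* A C string ends at its first \0, so it is not binary-safe. A PHP string is binary-safe because len and val together describe it. */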

The zend_refcounted_h structure:

typedef struct _zend_refcounted_h {
    uint32_t         refcount;          /* reference counter 32-bit */
    union {
        struct {
            ZEND_ENDIAN_LOHI_3(
                zend_uchar    type,
                zend_uchar    flags,    /* used for strings & objects */
                uint16_t      gc_info)  /* keeps GC root number (or 0) and color */
        } v;
        uint32_t type_info;
    } u;
} zend_refcounted_h;

Now let's look at what each member does:

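  • gc: the _zend_refcounted_h structure. It holds the reference count and records what kind of value this is (type and flags).
  • h: the string's hash value. It is computed only when the string is first used as an array key and is then cached, so a string used as a key many times is hashed only once.
  • len: the length of the string in bytes.
  • val: char[1] does not mean that only one byte is stored. This is the flexible-array idiom: the zend_string is allocated with enough extra memory after the header that val holds all len bytes, followed by a terminating \0.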
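The following sketch makes the flexible-array idea concrete. It allocates a string header and its bytes in a single block, in the spirit of PHP's zend_string_alloc. The type demo_string and the function demo_string_init are hypothetical. The struct is a heavily simplified stand-in that uses a bare refcount instead of zend_refcounted_h. It also uses the C99 spelling char val[] rather than PHP's older char val[1] idiom.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for zend_string (not PHP's definition). */
typedef struct {
    uint32_t refcount;  /* stands in for the zend_refcounted_h gc header */
    uint64_t h;         /* cached hash; 0 means "not computed yet" */
    size_t   len;       /* number of bytes in val, not counting the final \0 */
    char     val[];     /* C99 flexible array member; PHP writes val[1] */
} demo_string;

/* Allocate header + len bytes + a trailing \0 in one malloc call. */
static demo_string *demo_string_init(const char *bytes, size_t len) {
    assert(bytes != NULL || len == 0);
    demo_string *s = malloc(sizeof(demo_string) + len + 1);
    if (s == NULL) {
        return NULL;
    }
    s->refcount = 1;
    s->h = 0;
    s->len = len;
    if (len > 0) {
        memcpy(s->val, bytes, len);  /* copies embedded \0 bytes as well */
    }
    s->val[len] = '\0';              /* terminator kept for C compatibility */
    return s;
}
```

memcpy copies exactly len bytes, so demo_string_init("aaa\0b", 5) keeps the embedded \0 and len stays 5. Meanwhile strlen(s->val) would still stop at 3.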

Binary safety of strings

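Anyone who has studied C knows that a C string may not contain \0 anywhere except as its last character. Any earlier \0 is taken as the end of the string. This imposes real limits on C strings: for example, they cannot hold binary data such as images or file contents. PHP has no such restriction. Its strings can hold arbitrary binary data without any error, and this property is called binary safety.

C code snippet: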

#include <stdio.h>
#include <string.h>

int main(void) {
    char a[] = "aaa\0b";            /* string containing \0 */
    printf("%zu\n", strlen(a));     /* prints 3: the b after \0 is ignored */
    return 0;
}

PHP code snippet:

<?php
    $a = "aaa\0b";
    echo strlen($a);    // outputs 5
?>

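But isn't PHP itself written in C? Why is PHP unaffected? Recall the zend_string structure and its len member: it is the key to binary safety. C finds the end of a string by scanning for \0. PHP instead reads the length from len, so a \0 inside the data is just another byte.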
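Here is a minimal sketch of why carrying the length matters. It compares two strings that differ only after an embedded \0. The bs_view type and bs_equals function are hypothetical illustrations, not PHP APIs. The comparison is similar in spirit to PHP's zend_string_equals, which checks the lengths and then runs memcmp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical length-carrying string, standing in for zend_string's len + val. */
typedef struct {
    const char *val;
    size_t      len;
} bs_view;

/* Binary-safe equality: compare lengths first, then every byte with memcmp,
   which (unlike strcmp) does not stop at \0. */
static bool bs_equals(bs_view a, bs_view b) {
    assert(a.val != NULL && b.val != NULL);
    return a.len == b.len && memcmp(a.val, b.val, a.len) == 0;
}
```

With x = { "aaa\0b", 5 } and y = { "aaa\0c", 5 }, strcmp reports the two as equal because it stops at the first \0. bs_equals correctly reports them as different.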

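Copy-on-write: after an assignment such as $b = $a, the zvals of $a and $b point to the same zend_string. Only one copy of the string exists in memory, which saves space, and the sharing is recorded by incrementing refcount in gc. Let's verify this with the following code, inspecting each echo in gdb: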

<?php
$c = "hello world";
echo $c;

$a = time()." string";
echo $a;

// copy-on-write

$b = $a;
echo $a;
echo $b;

$b = "hello";
echo $a;
echo $b;

$c = "hello world"; echo $c;

(gdb) p *z
$1 = {value = {lval = 140737314684512, dval = 6.9533472273566172e-310, counted = 0x7ffff5a5fe60, str = 0x7ffff5a5fe60, arr = 0x7ffff5a5fe60, obj = 0x7ffff5a5fe60,
    res = 0x7ffff5a5fe60, ref = 0x7ffff5a5fe60, ast = 0x7ffff5a5fe60, zv = 0x7ffff5a5fe60, ptr = 0x7ffff5a5fe60, ce = 0x7ffff5a5fe60, func = 0x7ffff5a5fe60, ww = {w1 = 4121296480,
      w2 = 32767}}, u1 = {v = {type = 6 '\006', type_flags = 0 '\000', const_flags = 0 '\000', reserved = 0 '\000'}, type_info = 6}, u2 = {next = 0, cache_slot = 0, lineno = 0,
    num_args = 0, fe_pos = 0, fe_iter_idx = 0, access_flags = 0, property_guard = 0, extra = 0}}
(gdb) p *$1.value.str
$2 = {gc = {refcount = 0, u = {v = {type = 6 '\006', flags = 2 '\002', gc_info = 0}, type_info = 518}}, h = 13876786532495509697, len = 11, val = "h"}
(gdb) p *$1.value.str.val@11
$3 = "hello world"

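refcount=0 and flags=2 mark an interned string (flags=2 is IS_STR_INTERNED in PHP 7.1). The literal "hello world" was interned at compile time and is not reference counted at all, so its refcount simply stays 0. gdb prints only val = "h" because val is declared as char[1]; appending @11 prints all 11 bytes.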
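gdb prints both the individual fields and the packed type_info, which is 518 for the literal. Assume the little-endian layout that ZEND_ENDIAN_LOHI_3 produces on x86-64: type in the lowest byte, then flags, then the 16-bit gc_info. Under that assumption, the fields can be recovered with shifts and masks. The helper names below are hypothetical. The constants (IS_STRING = 6, IS_STR_INTERNED = 1 << 1) reflect our understanding of PHP 7.1's zend_types.h; check them against your PHP version. The same byte layout explains the zval u1.type_info of 5126 seen for $a below: type 6 with type_flags 20.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed little-endian layout of a packed type_info word:
   bits 0-7 type, bits 8-15 flags, bits 16-31 gc_info. */
static uint8_t info_type(uint32_t type_info) {
    return (uint8_t)(type_info & 0xffu);
}

static uint8_t info_flags(uint32_t type_info) {
    return (uint8_t)((type_info >> 8) & 0xffu);
}

static uint16_t info_gc_info(uint32_t type_info) {
    return (uint16_t)(type_info >> 16);
}

/* Values as we understand them from PHP 7.1's zend_types.h (assumption). */
#define DEMO_IS_STRING       6
#define DEMO_IS_STR_INTERNED (1u << 1)

/* True if a zend_string header's type_info says "interned string". */
static bool is_interned_string(uint32_t type_info) {
    assert(info_type(type_info) == DEMO_IS_STRING);
    return (info_flags(type_info) & DEMO_IS_STR_INTERNED) != 0;
}
```

Decoding 518 (0x206) gives type 6 and flags 2, so the literal is interned. The ordinary string's type_info of 6 has flags 0.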

$a = time()." string"; echo $a;

(gdb) p *z
$4 = {value = {lval = 140737314684672, dval = 6.9533472273645223e-310, counted = 0x7ffff5a5ff00, str = 0x7ffff5a5ff00, arr = 0x7ffff5a5ff00, obj = 0x7ffff5a5ff00,
    res = 0x7ffff5a5ff00, ref = 0x7ffff5a5ff00, ast = 0x7ffff5a5ff00, zv = 0x7ffff5a5ff00, ptr = 0x7ffff5a5ff00, ce = 0x7ffff5a5ff00, func = 0x7ffff5a5ff00, ww = {w1 = 4121296640,
      w2 = 32767}}, u1 = {v = {type = 6 '\006', type_flags = 20 '\024', const_flags = 0 '\000', reserved = 0 '\000'}, type_info = 5126}, u2 = {next = 0, cache_slot = 0,
    lineno = 0, num_args = 0, fe_pos = 0, fe_iter_idx = 0, access_flags = 0, property_guard = 0, extra = 0}}
(gdb) p $4.value.str
$5 = (zend_string *) 0x7ffff5a5ff00
(gdb) p *$4.value.str
$6 = {gc = {refcount = 1, u = {v = {type = 6 '\006', flags = 0 '\000', gc_info = 0}, type_info = 6}}, h = 0, len = 17, val = "1"}
(gdb) p *$4.value.str.val@17
$7 = "1580219624 string"
(gdb)

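refcount=1 and flags=0 indicate an ordinary, reference-counted string. Its contents depend on time(), so it is built at run time and is not interned. Its address is (zend_string *) 0x7ffff5a5ff00. Next:

$b = $a; echo $a; echo $b;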

(gdb)  p *z
$8 = {value = {lval = 140737314684672, dval = 6.9533472273645223e-310, counted = 0x7ffff5a5ff00, str = 0x7ffff5a5ff00, arr = 0x7ffff5a5ff00, obj = 0x7ffff5a5ff00,
    res = 0x7ffff5a5ff00, ref = 0x7ffff5a5ff00, ast = 0x7ffff5a5ff00, zv = 0x7ffff5a5ff00, ptr = 0x7ffff5a5ff00, ce = 0x7ffff5a5ff00, func = 0x7ffff5a5ff00, ww = {w1 = 4121296640,
      w2 = 32767}}, u1 = {v = {type = 6 '\006', type_flags = 20 '\024', const_flags = 0 '\000', reserved = 0 '\000'}, type_info = 5126}, u2 = {next = 0, cache_slot = 0,
    lineno = 0, num_args = 0, fe_pos = 0, fe_iter_idx = 0, access_flags = 0, property_guard = 0, extra = 0}}
(gdb) p $8.value.str
$9 = (zend_string *) 0x7ffff5a5ff00
(gdb) p *$8.value.str
$10 = {gc = {refcount = 2, u = {v = {type = 6 '\006', flags = 0 '\000', gc_info = 0}, type_info = 6}}, h = 0, len = 17, val = "1"}
(gdb) p *$8.value.str.val@17
$11 = "1580219624 string"
(gdb) c
Continuing.
1580219624 string
Breakpoint 1, ZEND_ECHO_SPEC_CV_HANDLER () at /download/php-7.1.9/Zend/zend_vm_execute.h:34699
34699       SAVE_OPLINE();
(gdb) n
34700       z = _get_zval_ptr_cv_undef(execute_data, opline->op1.var);
(gdb) n
34702       if (Z_TYPE_P(z) == IS_STRING) {
(gdb) p *z
$12 = {value = {lval = 140737314684672, dval = 6.9533472273645223e-310, counted = 0x7ffff5a5ff00, str = 0x7ffff5a5ff00, arr = 0x7ffff5a5ff00, obj = 0x7ffff5a5ff00,
    res = 0x7ffff5a5ff00, ref = 0x7ffff5a5ff00, ast = 0x7ffff5a5ff00, zv = 0x7ffff5a5ff00, ptr = 0x7ffff5a5ff00, ce = 0x7ffff5a5ff00, func = 0x7ffff5a5ff00, ww = {w1 = 4121296640,
      w2 = 32767}}, u1 = {v = {type = 6 '\006', type_flags = 20 '\024', const_flags = 0 '\000', reserved = 0 '\000'}, type_info = 5126}, u2 = {next = 0, cache_slot = 0,
    lineno = 0, num_args = 0, fe_pos = 0, fe_iter_idx = 0, access_flags = 0, property_guard = 0, extra = 0}}
(gdb) p $12.value.str
$13 = (zend_string *) 0x7ffff5a5ff00
(gdb) p *$12.value.str
$14 = {gc = {refcount = 2, u = {v = {type = 6 '\006', flags = 0 '\000', gc_info = 0}, type_info = 6}}, h = 0, len = 17, val = "1"}
(gdb) p *$12.value.str.val@17
$15 = "1580219624 string"

Both echoes show refcount=2 and flags=0, and both $a and $b point to (zend_string *) 0x7ffff5a5ff00. So $a and $b share a single block of memory; the only extra work is the incremented refcount. Next:
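The sharing observed above can be modelled in a few lines of C. Assignment copies a pointer and increments refcount. Releasing a reference decrements it, and the block is freed only when the count reaches zero. This is a hypothetical, heavily simplified model: rc_string, rc_new, rc_assign and rc_release are not PHP APIs. PHP does the equivalent bookkeeping on zvals and the zend_refcounted_h header.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical refcounted string; not PHP's zend_string. */
typedef struct {
    uint32_t refcount;
    size_t   len;
    char     val[];
} rc_string;

static rc_string *rc_new(const char *bytes, size_t len) {
    rc_string *s = malloc(sizeof(rc_string) + len + 1);
    assert(s != NULL);
    s->refcount = 1;
    s->len = len;
    memcpy(s->val, bytes, len);
    s->val[len] = '\0';
    return s;
}

/* Models $b = $a: share the same block and bump the count; no bytes are copied. */
static rc_string *rc_assign(rc_string *src) {
    src->refcount++;
    return src;
}

/* Drop one reference; free the block when the last reference goes away. */
static void rc_release(rc_string *s) {
    assert(s->refcount > 0);
    if (--s->refcount == 0) {
        free(s);
    }
}
```

After rc_assign, both "variables" hold the same pointer and refcount is 2, matching the gdb output. Releasing one of them brings the count back to 1 without freeing anything.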

$b = "hello"; echo $a; echo $b;

(gdb) p *z
$2 = {value = {lval = 140737314684672, dval = 6.9533472273645223e-310, counted = 0x7ffff5a5ff00, str = 0x7ffff5a5ff00, arr = 0x7ffff5a5ff00, obj = 0x7ffff5a5ff00,
    res = 0x7ffff5a5ff00, ref = 0x7ffff5a5ff00, ast = 0x7ffff5a5ff00, zv = 0x7ffff5a5ff00, ptr = 0x7ffff5a5ff00, ce = 0x7ffff5a5ff00, func = 0x7ffff5a5ff00, ww = {w1 = 4121296640,
      w2 = 32767}}, u1 = {v = {type = 6 '\006', type_flags = 20 '\024', const_flags = 0 '\000', reserved = 0 '\000'}, type_info = 5126}, u2 = {next = 0, cache_slot = 0,
    lineno = 0, num_args = 0, fe_pos = 0, fe_iter_idx = 0, access_flags = 0, property_guard = 0, extra = 0}}
(gdb) p *$2.value.str
$3 = {gc = {refcount = 1, u = {v = {type = 6 '\006', flags = 0 '\000', gc_info = 0}, type_info = 6}}, h = 0, len = 17, val = "1"}
(gdb) p *$2.value.str.val@17
$4 = "1580220167 string"

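When $a is echoed, refcount has dropped back to 1 and the address is still 0x7ffff5a5ff00. (This is a separate gdb run, hence the different timestamp.) But when $b is echoed: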

(gdb) p *z
$5 = {value = {lval = 140737314302720, dval = 6.9533472084935861e-310, counted = 0x7ffff5a02b00, str = 0x7ffff5a02b00, arr = 0x7ffff5a02b00, obj = 0x7ffff5a02b00,
    res = 0x7ffff5a02b00, ref = 0x7ffff5a02b00, ast = 0x7ffff5a02b00, zv = 0x7ffff5a02b00, ptr = 0x7ffff5a02b00, ce = 0x7ffff5a02b00, func = 0x7ffff5a02b00, ww = {w1 = 4120914688,
      w2 = 32767}}, u1 = {v = {type = 6 '\006', type_flags = 0 '\000', const_flags = 0 '\000', reserved = 0 '\000'}, type_info = 6}, u2 = {next = 0, cache_slot = 0, lineno = 0,
    num_args = 0, fe_pos = 0, fe_iter_idx = 0, access_flags = 0, property_guard = 0, extra = 0}}
(gdb) p *$5.value.str
$6 = {gc = {refcount = 0, u = {v = {type = 6 '\006', flags = 2 '\002', gc_info = 0}, type_info = 518}}, h = 9223372247569412249, len = 5, val = "h"}
(gdb) p *$5.value.str.val@5
$7 = "hello"
(gdb)

Now refcount=0 and flags=2: $b points to the interned literal "hello" at a different address, 0x7ffff5a02b00, while $a keeps the original string.
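In this demo, $b = "hello" is a rebinding, not an in-place write. The old string simply loses one reference (2 → 1), $b now points at the interned literal, and nothing is copied. A copy is only required when a shared string must be modified in place, for example $b[0] = 'x' while $a still references the same string. The hypothetical model below illustrates both cases; cow_string and the cow_* functions are not PHP's API. cow_separate plays a role similar to what PHP's source calls separation. Interned strings are modelled as never counted or freed, like the refcount=0/flags=2 strings above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical simplified string; interned ones are never counted or freed. */
typedef struct {
    uint32_t refcount;
    bool     interned;
    size_t   len;
    char     val[];
} cow_string;

static cow_string *cow_new(const char *bytes, size_t len, bool interned) {
    cow_string *s = malloc(sizeof(cow_string) + len + 1);
    assert(s != NULL);
    s->refcount = interned ? 0 : 1;
    s->interned = interned;
    s->len = len;
    memcpy(s->val, bytes, len);
    s->val[len] = '\0';
    return s;
}

static void cow_release(cow_string *s) {
    if (s->interned) {
        return;                      /* interned: nothing to count */
    }
    assert(s->refcount > 0);
    if (--s->refcount == 0) {
        free(s);
    }
}

/* Rebinding ($b = "hello"): take the new reference first, then drop the old one. */
static void cow_rebind(cow_string **var, cow_string *value) {
    if (!value->interned) {
        value->refcount++;
    }
    cow_release(*var);
    *var = value;
}

/* Before an in-place write, make sure this variable holds the only reference;
   if the string is shared or interned, copy it first (the "copy" in copy-on-write). */
static cow_string *cow_separate(cow_string **var) {
    cow_string *s = *var;
    if (s->interned || s->refcount > 1) {
        cow_string *copy = cow_new(s->val, s->len, false);
        cow_release(s);
        *var = copy;
    }
    return *var;
}
```

Incrementing before releasing in cow_rebind keeps self-assignment safe. cow_separate leaves an unshared string untouched, so repeated writes to the same variable copy at most once.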

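This article is part of the Tencent Cloud self-media sharing program and was shared from the author's personal site/blog.
In case of infringement, please contact cloudcommunity@tencent.com for removal.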
