专栏首页程序员升级之路Skywalking Php注册不上问题排查

Skywalking Php注册不上问题排查

Skywalking是一款分布式追踪应用,具体介绍可以参考 skywalking

最近公司的一个Php应用在Skywalking后台查不到数据了:

登录到某台服务器上发现注册不上,启动时就报错了:

先来整理下Skywalking php的整个流程,php扩展在系统启动时注册应用和实例,然后在每次请求拦截相关调用,将相关调用情况保存下来;注册相关代码在skywalking.c的module_init中:

static void module_init() {

    application_instance = -100000;
    application_id = -100000;

    int i = 0;

    do {
        application_id = serviceRegister(SKYWALKING_G(grpc), SKYWALKING_G(app_code));

        if(application_id == -100000) {
            sleep(1);
        }

        i++;
    } while (application_id == -100000 && i <= 1);

    if (application_id == -100000) {
        sky_close = 1;
        return;
    }

    char *ipv4s = _get_current_machine_ip();

    char hostname[100] = {0};
    if (gethostname(hostname, sizeof(hostname)) < 0) {
        strcpy(hostname, "");
    }

    char *l_millisecond = get_millisecond();
    long millisecond = zend_atol(l_millisecond, strlen(l_millisecond));
    efree(l_millisecond);

    i = 0;
    do {
        application_instance = serviceInstanceRegister(SKYWALKING_G(grpc), application_id, millisecond, SKY_OS_NAME,
                                                       hostname, getpid(),
                                                       ipv4s);
        if(application_instance == -100000) {
            sleep(2);
        }
        i++;
    } while (application_instance == -100000 && i <= 3);


    if (application_instance == -100000) {
        sky_close = 1;
    php_error(E_WARNING, "skywalking: register service error");
        return;
    }
  
    php_error(E_WARNING, "skywalking: register service success");
}

可以看到,注册应用是调用serviceRegister函数注册,然后调用serviceInstanceRegister来注册实例的,后者会调用GreeterClient::serviceInstanceRegister以下函数完成注册:

int serviceInstanceRegister(int applicationid, long registertime, char *osname, char *hostname, int processno,
                                char *ipv4s) {
        ServiceInstances request;
        ServiceInstance *s = request.add_instances();

        if (uuid == NULL) {
            std::string uuid_str = boost::uuids::to_string(boost_uuid);
            uuid = (char *) malloc(uuid_str.size() + 1);
            bzero(uuid, uuid_str.size() + 1);
            strncpy(uuid, uuid_str.c_str(), uuid_str.size() + 1);
        }

        s->set_serviceid(applicationid);
        s->set_instanceuuid(std::string(uuid));
        s->set_time(registertime);

        KeyStringValuePair *os = s->add_properties();
        KeyStringValuePair *host = s->add_properties();
        KeyStringValuePair *process = s->add_properties();
        KeyStringValuePair *ipv4 = s->add_properties();
        KeyStringValuePair *language = s->add_properties();

        os->set_key("os_name");
        os->set_value(osname);
        host->set_key("host_name");
        host->set_value(hostname);
        process->set_key("process_no");
        process->set_value(std::to_string(processno));
        ipv4->set_key("ipv4");
        ipv4->set_value(ipv4s);
        language->set_key("language");
        language->set_value("php");

        ServiceInstanceRegisterMapping reply;

        ClientContext context;

        Status status = stub_->doServiceInstanceRegister(&context, request, &reply);

        if (status.ok()) {
            for (int i = 0; i < reply.serviceinstances_size(); i++) {
                const KeyIntValuePair &kv = reply.serviceinstances(i);
//                std::cout << "Register Instance:"<< std::endl;
//                std::cout << kv.key() << ": " << kv.value() << std::endl;

                if (kv.key() == uuid) {
                    return kv.value();
                }
            }
        }

        return -100000;

    }

通过gdb的断点,发现注册应用是成功的,注册实例失败了,然后在GreeterClient::serviceInstanceRegister加上相应的日志:

if (status.ok()) {
            std::cout << "size:" << reply.serviceinstances_size() << std::endl;
            for (int i = 0; i < reply.serviceinstances_size(); i++) {
                const KeyIntValuePair &kv = reply.serviceinstances(i);
                std::cout << "Register Instance:"<< std::endl;
                std::cout << kv.key() << ": " << kv.value() << std::endl;

                if (kv.key() == uuid) {
                    return kv.value();
                }
            }
        }else{
                printf("instance register error");
        }

这里也把UUID打印出来了,发现响应里每次都是空的,所以导致失败。

客户端已经没有线索了,只好从服务端入手,因为服务端是Java实现的,不大方便调试,因此在本地搭了个环境想调试下,哪知服务端跑起来了,Php客户端死活编译不上,因为Skywalking依赖protobuf、grpc等组件,这些组件之间有版本依赖关系的,官方文档也没有说明,一时陷入困境。

因之前服务端维护的同学走了,只好自己硬着头皮看代码,发现注册入口代码在RegisterServiceHandler::doServiceInstanceRegister中:

@Override 
    public void doServiceInstanceRegister(ServiceInstances request,
        StreamObserver<ServiceInstanceRegisterMapping> responseObserver) {

        ServiceInstanceRegisterMapping.Builder builder = ServiceInstanceRegisterMapping.newBuilder();

        request.getInstancesList().forEach(instance -> {
            ServiceInventory serviceInventory = serviceInventoryCache.get(instance.getServiceId());

            JsonObject instanceProperties = new JsonObject();
            List<String> ipv4s = new ArrayList<>();

            for (KeyStringValuePair property : instance.getPropertiesList()) {
                String key = property.getKey();
                switch (key) {
                    case HOST_NAME:
                        instanceProperties.addProperty(HOST_NAME, property.getValue());
                        break;
                    case OS_NAME:
                        instanceProperties.addProperty(OS_NAME, property.getValue());
                        break;
                    case LANGUAGE:
                        instanceProperties.addProperty(LANGUAGE, property.getValue());
                        break;
                    case "ipv4":
                        ipv4s.add(property.getValue());
                        break;
                    case PROCESS_NO:
                        instanceProperties.addProperty(PROCESS_NO, property.getValue());
                        break;
                }
            }
            instanceProperties.addProperty(IPV4S, ServiceInstanceInventory.PropertyUtil.ipv4sSerialize(ipv4s));

            String instanceName = serviceInventory.getName();
            if (instanceProperties.has(PROCESS_NO)) {
                instanceName += "-pid:" + instanceProperties.get(PROCESS_NO).getAsString();
            }
            if (instanceProperties.has(HOST_NAME)) {
                instanceName += "@" + instanceProperties.get(HOST_NAME).getAsString();
            }

            int serviceInstanceId = serviceInstanceInventoryRegister.getOrCreate(instance.getServiceId(), instanceName, instance.getInstanceUUID(), instance.getTime(), instanceProperties);

            if (serviceInstanceId != Const.NONE) {
                logger.info("register service instance id={} [UUID:{}]", serviceInstanceId, instance.getInstanceUUID());
                builder.addServiceInstances(KeyIntValuePair.newBuilder().setKey(instance.getInstanceUUID()).setValue(serviceInstanceId));
            }
        });

        responseObserver.onNext(builder.build());
        responseObserver.onCompleted();
    }

关键是这行代码来生成实例id的:

int serviceInstanceId = serviceInstanceInventoryRegister.getOrCreate(instance.getServiceId(), instanceName, instance.getInstanceUUID(), instance.getTime(), instanceProperties);

再跟进去:

@Override public int getOrCreate(int serviceId, String serviceInstanceName, String uuid, long registerTime,
        JsonObject properties) {
        if (logger.isDebugEnabled()) {
            logger.debug("Get or create service instance by service instance name, service id: {}, service instance name: {},uuid: {}, registerTime: {}", serviceId, serviceInstanceName, uuid, registerTime);
        }

        int serviceInstanceId = getServiceInstanceInventoryCache().getServiceInstanceId(serviceId, uuid);

        if (serviceInstanceId == Const.NONE) {
            ServiceInstanceInventory serviceInstanceInventory = new ServiceInstanceInventory();
            serviceInstanceInventory.setServiceId(serviceId);
            serviceInstanceInventory.setName(serviceInstanceName);
            serviceInstanceInventory.setInstanceUUID(uuid);
            serviceInstanceInventory.setIsAddress(BooleanUtils.FALSE);
            serviceInstanceInventory.setAddressId(Const.NONE);

            serviceInstanceInventory.setRegisterTime(registerTime);
            serviceInstanceInventory.setHeartbeatTime(registerTime);

            serviceInstanceInventory.setProperties(properties);

            InventoryStreamProcessor.getInstance().in(serviceInstanceInventory);
        }
        return serviceInstanceId;
    }

这里的逻辑就比较清晰了,先从缓存中拿实例ID:

getServiceInstanceInventoryCache().getServiceInstanceId(serviceId, uuid);

拿不到则加入后台任务处理生成ID。

再跟进getServiceInstanceId方法,

 if (Objects.isNull(serviceInstanceId) || serviceInstanceId == Const.NONE) {
            serviceInstanceId = getCacheDAO().getServiceInstanceId(serviceId, uuid);
            if (serviceId != Const.NONE) {
                serviceInstanceNameCache.put(ServiceInstanceInventory.buildId(serviceId, uuid), serviceInstanceId);
            }
        }

从缓存中拿不到则从DAO中拿,

GetResponse response = getClient().get(ServiceInstanceInventory.INDEX_NAME, id);
            if (response.isExists()) {
                return (int)response.getSource().getOrDefault(RegisterSource.SEQUENCE, 0);
            } else {
                return Const.NONE;
            }

后者从ES索引service_instance_inventory去拿。

为了证实上述逻辑无误,从ES中读取数据试下,果然实例ID都注册在ES里面:

再从客户端证实下,既然实例ID是写入ES的,那么用以前的ID肯定是能注册成功的,因此修改客户端代码,将UUID写死注册试下:

 int serviceInstanceRegister(int applicationid, long registertime, char *osname, char *hostname, int processno,
                                char *ipv4s) {
        ServiceInstances request;
        ServiceInstance *s = request.add_instances();
        uuid= "7e22c317-e2e2-4f81-a53d-fe011013e0a3";
        if (uuid == NULL) {
            std::string uuid_str = boost::uuids::to_string(boost_uuid);
            uuid = (char *) malloc(uuid_str.size() + 1);
            bzero(uuid, uuid_str.size() + 1);
            strncpy(uuid, uuid_str.c_str(), uuid_str.size() + 1);
        }

马上注册成功了:

7e22c317-e2e2-4f81-a53d-fe011013e0a3
size:1
Register Instance:
7e22c317-e2e2-4f81-a53d-fe011013e0a3: 3386041
PHP Warning:  skywalking: register service success in Unknown on line 0
PHP Warning:  skywalking: hook redis handler success in Unknown on line 0
PHP Warning:  skywalking: hook session handler success in Unknown on line 0

再回到这个问题,原因已经知道了,如何解决呢,有两个办法:

1、加大注册时等待时间,如等待到100秒;

2、记录最近一次注册成功的UUID并且持久化,下次启动时直接用上次的;

因为2涉及到改代码,因此先用方案1解决问题。

本文分享自微信公众号 - 程序员升级之路(gh_1fab42db66cb),作者:刘江华

原文出处及转载信息见文内详细说明,如有侵权,请联系 yunjia_community@tencent.com 删除。

原始发表时间:2020-09-19

本文参与腾讯云自媒体分享计划,欢迎正在阅读的你也加入,一起分享。

我来说两句

0 条评论
登录 后参与评论

相关文章

  • Skywalking Php注册不上问题排查

    Skywalking是一款分布式追踪应用,具体介绍可以参考 skywalking。

    心平气和
  • FastDFS不同步怎么破

    FastDFS是一款开源的分布式文件系统,具体介绍就不说了,有兴趣的可以自行百度下。

    心平气和
  • Raft算法之选举篇

    Follower(跟随者):系统启动时默认的角色,一般来说不参与客户端读、写请求,接受Leader发送过来的心跳追加日志,在Leader挂了之后转变为Candi...

    心平气和
  • Skywalking Php注册不上问题排查

    Skywalking是一款分布式追踪应用,具体介绍可以参考 skywalking。

    心平气和
  • 小甲鱼《零基础学习Python》课后笔记(七、八):了不起的分支和循环1

    assert这个关键字我们称之为“断言”,当这个关键字后边的条件为假的时候,程序自动崩溃并抛出AssertionError的异常。

    小火柴棒
  • Android EditText长按菜单中分享功能的隐藏方法

    像华为/oppo等手机,该菜单实际是谷歌系统的即没有改过源代码,像小米的菜单则是自定义,该部分的源代码改动过。 两方面修改:

    砸漏
  • python实现单工、半双工、全双工聊天室

    半双工实现是连接建立以后,服务器等待客户端发送消息,客户端发送消息后等待接收服务器,这样一来一回循环往复下去。直到出现quit,关闭连接。

    Ewdager
  • python入门(三)判断语句

    python中的常用判断语句if....elif....else,while if if的用法:

    py3study
  • PHP PDOStatement::getAttribute讲解

    PDOStatement::getAttribute — 检索一个语句属性(PHP 5 = 5.1.0, PECL pdo = 0.2.0)

    砸漏
  • ed命令

    ed命令是文本编辑器,用于文本编辑,ed是Linux中功能最简单的文本编辑程序,一次仅能编辑一行而非全屏幕方式的操作。ed命令并不是一个常用的命令,一般使用比较...

    WindrunnerMax

扫码关注云+社区

领取腾讯云代金券