Hive Source Code Series (9): An Overall Look at Semantic Analysis in the Compilation Module

数据仓库践行者 · 2020-04-20

Semantic analysis is mainly about converting the AST into a QueryBlock (QB). Why convert to QueryBlock at all? As the earlier posts in this series showed, the AST is still very abstract and carries no information about tables or columns. Semantic analysis carves the AST up by clause, stores the pieces into a QueryBlock, and attaches the corresponding metadata, laying the groundwork for generating the logical plan.
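
To see how little the AST itself carries, take a simple query such as select a.id from src a where a.value > 10. Its AST, shown here roughly in the parenthesized form of ASTNode.toStringTree() (exact token names may differ slightly across Hive versions), looks like:

(TOK_QUERY
  (TOK_FROM (TOK_TABREF (TOK_TABNAME src) a))
  (TOK_INSERT
    (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE))
    (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) id)))
    (TOK_WHERE (> (. (TOK_TABLE_OR_COL a) value) 10))))

Nothing in this tree says whether src is a table or a view, which columns it has, or what their types are. That is exactly the information semantic analysis pulls from the metastore and attaches to the QueryBlock.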

A quick walkthrough of semantic analysis

The entry point of the SQL compiler:

      BaseSemanticAnalyzer sem = SemanticAnalyzerFactory.get(queryState, tree);
      List<HiveSemanticAnalyzerHook> saHooks =
          getHooks(HiveConf.ConfVars.SEMANTIC_ANALYZER_HOOK,
              HiveSemanticAnalyzerHook.class);

      // Flush the metastore cache.  This assures that we don't pick up objects from a previous
      // query running in this same thread.  This has to be done after we get our semantic
      // analyzer (this is when the connection to the metastore is made) but before we analyze,
      // because at that point we need access to the objects.
      Hive.get().getMSC().flushCache();

      // Do semantic analysis and plan generation
      if (saHooks != null && !saHooks.isEmpty()) { // Hive's hook mechanism: hooks get to pre-screen/adjust the statement
        HiveSemanticAnalyzerHookContext hookCtx = new HiveSemanticAnalyzerHookContextImpl();
        hookCtx.setConf(conf);
        hookCtx.setUserName(userName);
        hookCtx.setIpAddress(SessionState.get().getUserIpAddress());
        hookCtx.setCommand(command);
        for (HiveSemanticAnalyzerHook hook : saHooks) {
          tree = hook.preAnalyze(hookCtx, tree);
        }
        sem.analyze(tree, ctx);
        hookCtx.update(sem);
        for (HiveSemanticAnalyzerHook hook : saHooks) {
          hook.postAnalyze(hookCtx, sem.getAllRootTasks());
        }
      } else {
        sem.analyze(tree, ctx); // no hooks configured: go straight into compilation
      }

Before entering SQL compilation, Hive first checks whether the hive.semantic.analyzer.hook parameter is set. This is Hive's hook mechanism: by implementing the HiveSemanticAnalyzerHook interface you can pre-screen statements, and its preAnalyze and postAnalyze methods run before and after compilation respectively. Here, though, what we care most about is the compilation call itself: sem.analyze(tree, ctx).
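
As a concrete illustration, here is a minimal sketch of such a hook. The HiveSemanticAnalyzerHook interface (and the AbstractSemanticAnalyzerHook convenience base class) is the real Hive API; the class name and the veto rule are made up for this example:

import java.io.Serializable;
import java.util.List;

import org.apache.hadoop.hive.ql.exec.Task;
import org.apache.hadoop.hive.ql.parse.ASTNode;
import org.apache.hadoop.hive.ql.parse.AbstractSemanticAnalyzerHook;
import org.apache.hadoop.hive.ql.parse.HiveParser;
import org.apache.hadoop.hive.ql.parse.HiveSemanticAnalyzerHookContext;
import org.apache.hadoop.hive.ql.parse.SemanticException;

// Hypothetical hook that vetoes DROP TABLE statements before compilation.
public class NoDropTableHook extends AbstractSemanticAnalyzerHook {

  @Override
  public ASTNode preAnalyze(HiveSemanticAnalyzerHookContext context, ASTNode ast)
      throws SemanticException {
    // Called before sem.analyze(tree, ctx); may inspect, rewrite, or reject the AST.
    if (ast.getType() == HiveParser.TOK_DROPTABLE) {
      throw new SemanticException("DROP TABLE is blocked by NoDropTableHook");
    }
    return ast; // the (possibly rewritten) tree is what the analyzer will compile
  }

  @Override
  public void postAnalyze(HiveSemanticAnalyzerHookContext context,
      List<Task<? extends Serializable>> rootTasks) throws SemanticException {
    // Called after analysis with the generated root tasks; handy for auditing.
  }
}

Enabling it is then just a matter of pointing the parameter at the class (which must be on Hive's classpath), e.g. set hive.semantic.analyzer.hook=com.example.NoDropTableHook;.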

sem is obtained from BaseSemanticAnalyzer sem = SemanticAnalyzerFactory.get(queryState, tree). This is the classic factory pattern from Java design patterns:

public static BaseSemanticAnalyzer get(QueryState queryState, ASTNode tree)
      throws SemanticException {
    if (tree.getToken() == null) {
      throw new RuntimeException("Empty Syntax Tree");
    } else {
      HiveOperation opType = commandType.get(tree.getType());
      queryState.setCommandType(opType);
      switch (tree.getType()) {
      case HiveParser.TOK_EXPLAIN:
        return new ExplainSemanticAnalyzer(queryState);
      case HiveParser.TOK_EXPLAIN_SQ_REWRITE:
        return new ExplainSQRewriteSemanticAnalyzer(queryState);
      case HiveParser.TOK_LOAD:
        return new LoadSemanticAnalyzer(queryState);
      case HiveParser.TOK_EXPORT:
        return new ExportSemanticAnalyzer(queryState);
      case HiveParser.TOK_IMPORT:
        return new ImportSemanticAnalyzer(queryState);
      case HiveParser.TOK_ALTERTABLE: {
        Tree child = tree.getChild(1);
        switch (child.getType()) {
          case HiveParser.TOK_ALTERTABLE_RENAME:
          case HiveParser.TOK_ALTERTABLE_TOUCH:
          case HiveParser.TOK_ALTERTABLE_ARCHIVE:
          case HiveParser.TOK_ALTERTABLE_UNARCHIVE:
          case HiveParser.TOK_ALTERTABLE_ADDCOLS:
          case HiveParser.TOK_ALTERTABLE_RENAMECOL:
          case HiveParser.TOK_ALTERTABLE_REPLACECOLS:
          case HiveParser.TOK_ALTERTABLE_DROPPARTS:
          case HiveParser.TOK_ALTERTABLE_ADDPARTS:
          case HiveParser.TOK_ALTERTABLE_PARTCOLTYPE:
          case HiveParser.TOK_ALTERTABLE_PROPERTIES:
          case HiveParser.TOK_ALTERTABLE_DROPPROPERTIES:
          case HiveParser.TOK_ALTERTABLE_EXCHANGEPARTITION:
          case HiveParser.TOK_ALTERTABLE_SKEWED:
          case HiveParser.TOK_ALTERTABLE_DROPCONSTRAINT:
          case HiveParser.TOK_ALTERTABLE_ADDCONSTRAINT:
          queryState.setCommandType(commandType.get(child.getType()));
          return new DDLSemanticAnalyzer(queryState);
        }
        opType =
            tablePartitionCommandType.get(child.getType())[tree.getChildCount() > 2 ? 1 : 0];
        queryState.setCommandType(opType);
        return new DDLSemanticAnalyzer(queryState);
      }
      case HiveParser.TOK_ALTERVIEW: {
        Tree child = tree.getChild(1);
        switch (child.getType()) {
          case HiveParser.TOK_ALTERVIEW_PROPERTIES:
          case HiveParser.TOK_ALTERVIEW_DROPPROPERTIES:
          case HiveParser.TOK_ALTERVIEW_ADDPARTS:
          case HiveParser.TOK_ALTERVIEW_DROPPARTS:
          case HiveParser.TOK_ALTERVIEW_RENAME:
            opType = commandType.get(child.getType());
            queryState.setCommandType(opType);
            return new DDLSemanticAnalyzer(queryState);
        }
        // TOK_ALTERVIEW_AS
        assert child.getType() == HiveParser.TOK_QUERY;
        queryState.setCommandType(HiveOperation.ALTERVIEW_AS);
        return new SemanticAnalyzer(queryState);
      }
      case HiveParser.TOK_CREATEDATABASE:
      case HiveParser.TOK_DROPDATABASE:
      case HiveParser.TOK_SWITCHDATABASE:
      case HiveParser.TOK_DROPTABLE:
      case HiveParser.TOK_DROPVIEW:
      case HiveParser.TOK_DESCDATABASE:
      case HiveParser.TOK_DESCTABLE:
      case HiveParser.TOK_DESCFUNCTION:
      case HiveParser.TOK_MSCK:
      case HiveParser.TOK_ALTERINDEX_REBUILD:
      case HiveParser.TOK_ALTERINDEX_PROPERTIES:
      case HiveParser.TOK_SHOWDATABASES:
      case HiveParser.TOK_SHOWTABLES:
      case HiveParser.TOK_SHOWCOLUMNS:
      case HiveParser.TOK_SHOW_TABLESTATUS:
      case HiveParser.TOK_SHOW_TBLPROPERTIES:
      case HiveParser.TOK_SHOW_CREATEDATABASE:
      case HiveParser.TOK_SHOW_CREATETABLE:
      case HiveParser.TOK_SHOWFUNCTIONS:
      case HiveParser.TOK_SHOWPARTITIONS:
      case HiveParser.TOK_SHOWINDEXES:
      case HiveParser.TOK_SHOWLOCKS:
      case HiveParser.TOK_SHOWDBLOCKS:
      case HiveParser.TOK_SHOW_COMPACTIONS:
      case HiveParser.TOK_SHOW_TRANSACTIONS:
      case HiveParser.TOK_ABORT_TRANSACTIONS:
      case HiveParser.TOK_SHOWCONF:
      case HiveParser.TOK_CREATEINDEX:
      case HiveParser.TOK_DROPINDEX:
      case HiveParser.TOK_ALTERTABLE_CLUSTER_SORT:
      case HiveParser.TOK_LOCKTABLE:
      case HiveParser.TOK_UNLOCKTABLE:
      case HiveParser.TOK_LOCKDB:
      case HiveParser.TOK_UNLOCKDB:
      case HiveParser.TOK_CREATEROLE:
      case HiveParser.TOK_DROPROLE:
      case HiveParser.TOK_GRANT:
      case HiveParser.TOK_REVOKE:
      case HiveParser.TOK_SHOW_GRANT:
      case HiveParser.TOK_GRANT_ROLE:
      case HiveParser.TOK_REVOKE_ROLE:
      case HiveParser.TOK_SHOW_ROLE_GRANT:
      case HiveParser.TOK_SHOW_ROLE_PRINCIPALS:
      case HiveParser.TOK_SHOW_ROLES:
      case HiveParser.TOK_ALTERDATABASE_PROPERTIES:
      case HiveParser.TOK_ALTERDATABASE_OWNER:
      case HiveParser.TOK_TRUNCATETABLE:
      case HiveParser.TOK_SHOW_SET_ROLE:
      case HiveParser.TOK_CACHE_METADATA:
        return new DDLSemanticAnalyzer(queryState);

      case HiveParser.TOK_CREATEFUNCTION:
      case HiveParser.TOK_DROPFUNCTION:
      case HiveParser.TOK_RELOADFUNCTION:
        return new FunctionSemanticAnalyzer(queryState);

      case HiveParser.TOK_ANALYZE:
        return new ColumnStatsSemanticAnalyzer(queryState);

      case HiveParser.TOK_CREATEMACRO:
      case HiveParser.TOK_DROPMACRO:
        return new MacroSemanticAnalyzer(queryState);

      case HiveParser.TOK_UPDATE_TABLE:
      case HiveParser.TOK_DELETE_FROM:
        return new UpdateDeleteSemanticAnalyzer(queryState);

      case HiveParser.TOK_START_TRANSACTION:
      case HiveParser.TOK_COMMIT:
      case HiveParser.TOK_ROLLBACK:
      case HiveParser.TOK_SET_AUTOCOMMIT:
      default: {
        SemanticAnalyzer semAnalyzer = HiveConf // if CBO is enabled, use CalcitePlanner; otherwise the plain SemanticAnalyzer
            .getBoolVar(queryState.getConf(), HiveConf.ConfVars.HIVE_CBO_ENABLED) ?
                new CalcitePlanner(queryState) : new SemanticAnalyzer(queryState);
        return semAnalyzer;
      }
      }
    }
  }

Hive compiles different kinds of SQL with different analyzers: explain goes to ExplainSemanticAnalyzer, DDL goes to DDLSemanticAnalyzer, load goes to LoadSemanticAnalyzer, and so on. The factory pattern keeps these functions isolated from one another, which decouples them to a degree and adds extensibility: if one day a new kind of statement needs its own compilation path, you develop a new analyzer class, register it in SemanticAnalyzerFactory, and you're done.
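
Registering such a new analyzer amounts to adding one more case to the switch shown above. A hypothetical sketch (both the token and the analyzer class are made up for illustration):

      // Hypothetical: route a new statement type to its own analyzer.
      case HiveParser.TOK_MYNEWCOMMAND:               // made-up token
        return new MyNewSemanticAnalyzer(queryState); // made-up analyzer class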

In practice, however, most statements are queries, and this source walkthrough is also centered on queries, so we land in the default branch. There we find a check on hive.cbo.enable: if it is true we go through the CalcitePlanner class, otherwise through the SemanticAnalyzer class. CBO is cost-based optimization; it is very powerful, Hive 2.x invested heavily in it, and it is enabled by default. While studying the source we first switch CBO off (for example with set hive.cbo.enable=false;); we will come back to CBO in a dedicated post.

The compilation of HQL centers on the SemanticAnalyzer.analyzeInternal method:

void analyzeInternal(ASTNode ast, PlannerContext plannerCtx) throws SemanticException {
    // 1. Generate Resolved Parse tree from syntax tree
    LOG.info("Starting Semantic Analysis");
    if (!genResolvedParseTree(ast, plannerCtx)) { // semantic analysis
      return;
    }

    // 2. Gen OP Tree from resolved Parse Tree
    Operator sinkOp = genOPTree(ast, plannerCtx); // generate the logical plan (operator tree)

   ...
   ...

    // 7. Perform Logical optimization
    if (LOG.isDebugEnabled()) {
      LOG.debug("Before logical optimization\n" + Operator.toString(pCtx.getTopOps().values()));
    }
    Optimizer optm = new Optimizer();
    optm.setPctx(pCtx);
    optm.initialize(conf);
    pCtx = optm.optimize(); // optimize the logical plan
    if (pCtx.getColumnAccessInfo() != null) {
      // set ColumnAccessInfo for view column authorization
      setColumnAccessInfo(pCtx.getColumnAccessInfo());
    }
    FetchTask origFetchTask = pCtx.getFetchTask();
    if (LOG.isDebugEnabled()) {
      LOG.debug("After logical optimization\n" + Operator.toString(pCtx.getTopOps().values()));
    }

    // 8. Generate column access stats if required - wait until column pruning
    // takes place during optimization
    boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
        && HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_AUTHORIZATION_ENABLED);
    if (isColumnInfoNeedForAuth
        || HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
      ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
      // view column access info is carried by this.getColumnAccessInfo().
      setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess(this.getColumnAccessInfo()));
    }

    // 9. Optimize Physical op tree & Translate to target execution engine (MR,
    // TEZ..)
    if (!ctx.getExplainLogical()) {
      TaskCompiler compiler = TaskCompilerFactory.getCompiler(conf, pCtx);
      compiler.init(queryState, console, db);
      compiler.compile(pCtx, rootTasks, inputs, outputs); // generate and optimize the physical plan
      fetchTask = pCtx.getFetchTask();
    }
    LOG.info("Completed plan generation");

    // 10. put accessed columns to readEntity
    if (HiveConf.getBoolVar(this.conf, HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
      putAccessedColumnsToReadEntity(inputs, columnAccessInfo);
    }

    // 11. if desired check we're not going over partition scan limits
    if (!ctx.getExplain()) {
      enforceScanLimits(pCtx, origFetchTask);
    }

    return;
  }
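
Step 9 above hands off to TaskCompilerFactory.getCompiler, which picks a task compiler for the configured execution engine. A simplified sketch of that dispatch (paraphrased from the Hive 2.x source; details vary by version), driven by hive.execution.engine:

// Simplified sketch of TaskCompilerFactory.getCompiler (Hive 2.x flavor).
public static TaskCompiler getCompiler(HiveConf conf, ParseContext parseContext) {
  String engine = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_EXECUTION_ENGINE);
  if ("tez".equals(engine)) {
    return new TezCompiler();        // compile the operator tree into a Tez DAG
  } else if ("spark".equals(engine)) {
    return new SparkCompiler();      // compile into Spark work
  }
  return new MapReduceCompiler();    // default: classic MapReduce tasks
}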

Semantic analysis is step 1 in the code above, the genResolvedParseTree method:

boolean genResolvedParseTree(ASTNode ast, PlannerContext plannerCtx) throws SemanticException {
    ASTNode child = ast;
    this.ast = ast;
    viewsExpanded = new ArrayList<String>();
    ctesExpanded = new ArrayList<String>();

    .....
    
    // 4. continue analyzing from the child ASTNode.
    Phase1Ctx ctx_1 = initPhase1Ctx();
    preProcessForInsert(child, qb);
    if (!doPhase1(child, qb, ctx_1, plannerCtx)) { // core step 1
      // if phase1Result false return
      return false;
    }
    LOG.info("Completed phase 1 of Semantic Analysis");

    // 5. Resolve Parse Tree
    // Materialization is allowed if it is not a view definition
    getMetaData(qb, createVwDesc == null); // core step 2
    LOG.info("Completed getting MetaData in Semantic Analysis");

    plannerCtx.setParseTreeAttr(child, ctx_1);

    return true;
  }

genResolvedParseTree is a fairly long method; its two most important calls are doPhase1 and getMetaData.

doPhase1 is responsible for carving the AST into pieces and storing them into the corresponding QB; getMetaData is responsible for fetching table and column metadata from the metastore and storing it into the QB as well.
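
To make that division of labor concrete, here is a heavily simplified, self-contained sketch of the idea. These are toy classes, not Hive's actual QB or doPhase1 (the real doPhase1 is a large recursive switch over token types), but the shape of the work is the same:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy stand-ins for Hive's QB / doPhase1 / getMetaData, to illustrate the shape of the work.
public class Phase1Sketch {

  // Minimal AST node type for the sketch.
  static class ToyAst {
    String token, alias, text;
    List<ToyAst> children = new ArrayList<>();
  }

  // A miniature "QueryBlock": clause subtrees plus alias -> table bookkeeping.
  static class ToyQB {
    Map<String, String> aliasToTable = new HashMap<>();     // filled by phase 1
    ToyAst selectExpr, whereExpr;                           // subtrees carved out by phase 1
    Map<String, Object> aliasToMetadata = new HashMap<>();  // filled by getMetaData
  }

  // Phase 1: walk the AST and file each clause into the QB. No metastore access yet.
  static void doPhase1(ToyAst node, ToyQB qb) {
    switch (node.token) {
      case "TOK_FROM":   qb.aliasToTable.put(node.alias, node.text); break;
      case "TOK_SELECT": qb.selectExpr = node; break;
      case "TOK_WHERE":  qb.whereExpr = node; break;
      default: break;
    }
    for (ToyAst child : node.children) {
      doPhase1(child, qb);
    }
  }

  // Phase 2: resolve every alias against the (pretend) metastore.
  static void getMetaData(ToyQB qb) {
    for (Map.Entry<String, String> e : qb.aliasToTable.entrySet()) {
      qb.aliasToMetadata.put(e.getKey(), "table-descriptor-for:" + e.getValue());
    }
  }
}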

Once all of that has run, the semantic analysis module is done.

To summarize, the code path of semantic analysis is:

Driver.compile

--> SemanticAnalyzerFactory.get(queryState, tree)

--> SemanticAnalyzer.analyzeInternal

--> SemanticAnalyzer.genResolvedParseTree

--> SemanticAnalyzer.doPhase1

--> SemanticAnalyzer.getMetaData

Describing it as a single linear path isn't perfectly precise, but what matters is that the overall flow is clear.
