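Goal: add an additional WHERE clause to a given ClickHouse statement.
I'm using the following ANTLR grammars to generate the Java classes for the lexer and parser.
Lexer grammar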
https://github.com/ClickHouse/ClickHouse/blob/master/utils/antlr/ClickHouseLexer.g4
Parser grammar
https://github.com/ClickHouse/ClickHouse/blob/master/utils/antlr/ClickHouseParser.g4
Problem: I can't figure out how to interact with the ANTLR-generated classes, or how to create the appropriate POJOs to use with them.
Sample statement
String query = "INSERT INTO t VALUES (1, 'Hello, world'), (2, 'abc'), (3, 'def')";
Goal for the SQL (enriched code)
String enrichedQuery = SqlParser.enrich(query);
System.out.println(enrichedQuery);
//Output
>> INSERT INTO t VALUES (1, 'Hello, world'), (2, 'abc'), (3, 'def') (WHERE X IN USERS)
I have the following Java:
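import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTreeWalker;

public class Hello {
    public static void main(String[] args) throws Exception {
        String query = "INSERT INTO t VALUES (1, 'Hello, world'), (2, 'abc'), (3, 'def')";
        // Lexer and parser classes generated by ANTLR from the two grammars above
        ClickHouseLexer lexer = new ClickHouseLexer(CharStreams.fromString(query));
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        ClickHouseParser parser = new ClickHouseParser(tokens);
        ParseTreeWalker walker = new ParseTreeWalker();
    }
}
Posted on 2021-08-07 20:52:29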
I'd suggest taking a look at TokenStreamRewriter.
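To make the idea concrete, here is a toy model of rewriting over a token stream, written with the standard library only. It is not ANTLR's TokenStreamRewriter: the class `RewriterSketch`, the `Tok` type and the `demo` helper are all made up for illustration, and the token list is built by hand instead of by a lexer. It is only meant to show two things: the original tokens are never modified (insertions are recorded by token index and spliced in when the text is rendered), and whitespace or comments survive only if they are still present in the token list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of rewriting over a token stream. This is NOT ANTLR's
// TokenStreamRewriter; every name here is hypothetical.
public class RewriterSketch {
    // A token's text plus a flag for the hidden channel (whitespace, comments).
    static final class Tok {
        final String text;
        final boolean hidden;
        Tok(String text, boolean hidden) { this.text = text; this.hidden = hidden; }
    }

    private final List<Tok> tokens;
    // Pending insertions, keyed by the index of the token they follow.
    private final Map<Integer, String> insertsAfter = new HashMap<>();

    RewriterSketch(List<Tok> tokens) { this.tokens = tokens; }

    // Records an insertion; the token list itself is left untouched.
    void insertAfter(int tokenIndex, String text) {
        insertsAfter.merge(tokenIndex, text, String::concat);
    }

    // Index of the last token that is not on the hidden channel, or -1.
    int lastVisibleIndex() {
        for (int i = tokens.size() - 1; i >= 0; i--) {
            if (!tokens.get(i).hidden) return i;
        }
        return -1;
    }

    // Replays the tokens in order, splicing in the recorded insertions.
    String getText() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            sb.append(tokens.get(i).text);
            String extra = insertsAfter.get(i);
            if (extra != null) sb.append(extra);
        }
        return sb.toString();
    }

    // Hand-built token list; keepHidden=false mimics a lexer that drops
    // whitespace and comments instead of routing them to a hidden channel.
    static String demo(boolean keepHidden) {
        String[] parts = {"INSERT", " ", "INTO", " ", "t", " ", "VALUES", " ", "/* (1, 'a') */"};
        List<Tok> toks = new ArrayList<>();
        for (String p : parts) {
            boolean hidden = p.trim().isEmpty() || p.startsWith("/*");
            if (keepHidden || !hidden) toks.add(new Tok(p, hidden));
        }
        RewriterSketch r = new RewriterSketch(toks);
        r.insertAfter(r.lastVisibleIndex(), " (WHERE X IN USERS)");
        return r.getText();
    }

    public static void main(String[] args) {
        System.out.println(demo(true));   // whitespace and comment preserved
        System.out.println(demo(false));  // tokens run together
    }
}
```

With the hidden tokens kept, the rendered text reads like the original statement plus the inserted clause; with them dropped, the keywords run together. That is the motivation for the lexer change in step 1.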
First, let's get the grammar ready.
1 - For TokenStreamRewriter we want to preserve whitespace, so let's change the -> skip directives to -> channel(HIDDEN).
At the end of the lexer grammar:
// Comments and whitespace
MULTI_LINE_COMMENT: '/*' .*? '*/' -> channel(HIDDEN);
SINGLE_LINE_COMMENT: '--' ~('\n'|'\r')* ('\n' | '\r' | EOF) -> channel(HIDDEN);
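WHITESPACE: [ \u000B\u000C\t\r\n] -> channel(HIDDEN); // '\n' can be part of multiline single query
2 - The C++-specific stuff only prevents a keyword from being used more than once. For your purposes you don't really need that check (if you do, it can be done in a listener after parsing). So let's drop the language-specific stuff: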
engineClause: engineExpr (
        orderByClause
        | partitionByClause
        | primaryKeyClause
        | sampleByClause
        | ttlClause
        | settingsClause
    )*
    ;
and
dictionaryAttrDfnt
    : identifier columnTypeExpr (
        DEFAULT literal
        | EXPRESSION columnExpr
        | HIERARCHICAL
        | INJECTIVE
        | IS_OBJECT_ID
    )*
    ;
dictionaryEngineClause
    : dictionaryPrimaryKeyClause? (
        sourceClause
        | lifetimeClause
        | layoutClause
        | rangeClause
        | dictionarySettingsClause
    )*
    ;
Note: there appears to be a problem in the grammar, in that it does not accept the actual values of an insert statement:
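insertStmt
    : INSERT INTO TABLE? (
        tableIdentifier
        | FUNCTION tableFunctionExpr
    ) columnsClause? dataClause
    ;
columnsClause
    : LPAREN nestedIdentifier (COMMA nestedIdentifier)* RPAREN
    ;
dataClause
    : FORMAT identifier                 # DataClauseFormat
    | VALUES                            # DataClauseValues // <- problem on this line
    | selectUnionStmt SEMICOLON? EOF    # DataClauseSelect
    ;
(I'm not going to try to fix that part, so I've commented out the values in your input to accommodate it.)
(It would also help if the top-level rule required an EOF token. Without it, ANTLR simply stops parsing after VALUES and ignores the rest of the input. Ending the root rule with EOF is considered a best practice for exactly this reason.)
The main program: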
import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.TokenStreamRewriter;
import org.antlr.v4.runtime.tree.ParseTreeWalker;

public class TSWDemo {
    public static void main(String... args) {
        new TSWDemo().run(CharStreams.fromString("INSERT INTO t VALUES /* (1, 'Hello, world'), (2, 'abc'), (3, 'def') */"));
    }

    public void run(CharStream charStream) {
        var lexer = new ClickHouseLexer(charStream);
        var tokenStream = new CommonTokenStream(lexer);
        var parser = new ClickHouseParser(tokenStream);
        // The rewriter records edits against the token stream without modifying it
        var tsw = new TokenStreamRewriter(tokenStream);
        var listener = new TSWDemoListener(tsw);
        var queryStmt = parser.queryStmt();
        ParseTreeWalker.DEFAULT.walk(listener, queryStmt);
        // getText() renders the original tokens with the recorded edits applied
        System.out.println(tsw.getText());
    }
}
The listener:
import org.antlr.v4.runtime.TokenStreamRewriter;

public class TSWDemoListener extends ClickHouseParserBaseListener {
    private final TokenStreamRewriter tsw;

    public TSWDemoListener(TokenStreamRewriter tsw) {
        this.tsw = tsw;
    }

    @Override
    public void exitInsertStmt(ClickHouseParser.InsertStmtContext ctx) {
        // Insert the extra clause right after the last token of the insert statement
        tsw.insertAfter(ctx.getStop(), " (WHERE X IN USERS)");
    }
}
Output:
INSERT INTO t VALUES (WHERE X IN USERS) /* (1, 'Hello, world'), (2, 'abc'), (3, 'def') */
https://stackoverflow.com/questions/68687065