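Each of my flow files contains 2000 records. I want to parse 01/01/2000 into a column year = 2000, a column month = Jan and a column day = 01.
That is, split the input column 01/01/2000 into 3 comma-separated values: 01, Jan, 2000.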
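Outside NiFi, the requested transformation is just "parse the date once, then format each part separately". Here is a minimal Python sketch of that idea. It assumes month-first MM/dd/yyyy input, because 01/01/2000 is ambiguous. The helper name `split_date` is hypothetical. Month names come from a fixed English table so the result does not depend on the process locale.

```python
from datetime import datetime

# Fixed English abbreviations; strftime("%b") would follow the current locale.
MONTH_ABBR = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def split_date(value: str) -> dict:
    """Hypothetical helper: split an 'MM/dd/yyyy' string into year, month, day.

    Assumes month-first input; use "%d/%m/%Y" instead for day-first data.
    """
    parsed = datetime.strptime(value, "%m/%d/%Y")
    return {
        "year": str(parsed.year),
        "month": MONTH_ABBR[parsed.month - 1],
        "day": f"{parsed.day:02d}",  # keep the leading zero, e.g. "01"
    }


print(split_date("01/01/2000"))
```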
Posted on 2018-08-04 00:10:46
Suppose you have a schema like this one for a person with a birthday, and you want to split the birthday:
{
"name": "person",
"namespace": "nifi",
"type": "record",
"fields": [
{ "name": "first_name", "type": "string" },
{ "name": "last_name", "type": "string" },
{ "name": "birthday", "type": "string" }
]
}
You need to modify the schema so that it contains the fields you want to add:
{
"name": "person",
"namespace": "nifi",
"type": "record",
"fields": [
{ "name": "first_name", "type": "string" },
{ "name": "last_name", "type": "string" },
{ "name": "birthday", "type": "string" },
{ "name": "birthday_year", "type": ["null", "string"] },
{ "name": "birthday_month", "type": ["null", "string"] },
{ "name": "birthday_day", "type": ["null", "string"] }
]
}
Suppose the input record contains the following text:
bryan,bende,1980-01-01
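We can use UpdateRecord with a CsvReader and a CsvWriter. UpdateRecord can populate the three new fields by parsing the original birthday field.
If we send the output to LogAttribute (with Log Payload enabled), we should now see the following: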
first_name,last_name,birthday,birthday_year,birthday_month,birthday_day
bryan,bende,1980-01-01,1980,01,01
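To make the per-row effect of that UpdateRecord step concrete, here is a plain-Python imitation. It is not NiFi code. It assumes the CSV has a header line, and the function name `add_birthday_parts` is hypothetical. For each row it parses `birthday` as yyyy-MM-dd, which is roughly what a RecordPath `toDate` does. It then formats the year, month and day into the three new columns, reproducing the output shown above.

```python
import csv
import io
from datetime import datetime


def add_birthday_parts(csv_text: str) -> str:
    """Hypothetical stand-in for UpdateRecord: append year/month/day columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames) + [
        "birthday_year", "birthday_month", "birthday_day"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Parse the original string field once (similar in spirit to toDate).
        parsed = datetime.strptime(row["birthday"], "%Y-%m-%d")
        # Format each part as a zero-padded string, like the example output.
        row["birthday_year"] = parsed.strftime("%Y")
        row["birthday_month"] = parsed.strftime("%m")
        row["birthday_day"] = parsed.strftime("%d")
        writer.writerow(row)
    return out.getvalue()


source = "first_name,last_name,birthday\nbryan,bende,1980-01-01\n"
print(add_birthday_parts(source), end="")
```

The original `birthday` column is kept, matching the modified schema. Drop it from `fieldnames` if only the split fields are wanted.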
Here is a link to the Record Path Guide for details on the toDate and format functions:
https://nifi.apache.org/docs/nifi-docs/html/record-path-guide.html
Posted on 2018-08-04 02:53:10
You can use UpdateRecord for this. Assuming your input record has a date column named "myDate", set Replacement Value Strategy to Record Path Value, and your user-defined properties might look like this:
/day    format(/myDate, "dd")
/month  format(/myDate, "MMM")
/year   format(/myDate, "yyyy")
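The format patterns follow Java date-pattern letters. In those patterns, lowercase `yyyy` is the calendar year, while uppercase `YYYY` is the week-based year. Depending on the locale's week rules, the week-based year can differ from the calendar year for dates near 1 January. Python has no Java-style patterns, so the sketch below uses ISO-8601 week numbering as an analogy for the same pitfall. Java's `YYYY` uses the default locale's week rules, which are not necessarily ISO.

```python
from datetime import date

# 1 January 2000 was a Saturday, so under ISO-8601 week numbering
# it falls in week 52 of week-based year 1999.
d = date(2000, 1, 1)
calendar_year = d.year               # analogous to Java "yyyy"
iso_week_year = d.isocalendar()[0]   # analogous to a week-based "YYYY"
print(calendar_year, iso_week_year)  # 2000 1999
```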
Your output schema would look like this:
{
"namespace": "nifi",
"name": "myRecord",
"type": "record",
"fields": [
{"name": "day","type": "int"},
{"name": "month","type": "string"},
{"name": "year","type": "int"}
]
}
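Note that this schema declares `day` and `year` as `int`, while `format()` produces strings such as "01". The following sketch assumes the record writer coerces such a string to an integer, so the leading zero is lost. It builds a Python dict shaped like `myRecord` to show what each field would hold. The function name `to_output_record` is hypothetical.

```python
from datetime import datetime

MONTH_ABBR = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def to_output_record(my_date: datetime) -> dict:
    """Hypothetical: mimic day:int, month:string, year:int from the schema."""
    return {
        "day": int(my_date.strftime("%d")),      # "01" becomes 1 in an int field
        "month": MONTH_ABBR[my_date.month - 1],  # "Jan", like pattern "MMM"
        "year": int(my_date.strftime("%Y")),     # calendar year, like "yyyy"
    }


print(to_output_record(datetime(2000, 1, 1)))
```

If the two-digit day "01" from the question must be preserved, declaring `day` as `string` instead of `int` avoids the conversion.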
https://stackoverflow.com/questions/51675050