前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >Power Query 系列 (08) - 行转列案例

Power Query 系列 (08) - 行转列案例

原创
作者头像
StoneWM
修改2021-03-25 09:45:44
1.1K0
修改2021-03-25 09:45:44
举报
文章被收录于专栏:Stone的专栏

行转列是一种常见的数据处理操作,所以对如何在 SQL 语句中、如何在 pandas 中实现这种行转列做过一些总结。请参考我之前写的博文:

今天介绍在 Power Query (PQ) 实现行转列的操作方法,顺便说是稍微复杂一点的 IF 语句如何编写。

新建一个空查询,导入我已经放在 github 上的示例数据:

代码语言:txt
复制
let
    source = Csv.Document(
    	Web.Contents("https://raw.githubusercontent.com/stonewm/python-practice-projects/master/pandas%20sample%20data/project-listing.csv"),
   	 [Delimiter=",", Columns=10, Encoding=65001, QuoteStyle=QuoteStyle.None])
in
    source

可以在高级编辑器中拷贝以上代码,或者根据 url 从 Web 导入数据,因为数据是 csv 格式,所以 PQ 用 Csv.Document() 函数来读取数据的内容。

将第一行作为标题,PQ 自动识别数据类型:

因为公式栏高级编辑器对中文的支持不好,建议将所有步骤改成英文,步骤名称最好不要有空格。

这一步骤完成后,进入高级编辑器,M 脚本代码如下:

代码语言:txt
复制
let
    source = Csv.Document(
    	Web.Contents("https://raw.githubusercontent.com/stonewm/python-practice-projects/master/pandas%20sample%20data/project-listing.csv"),
   	 [Delimiter=",", Columns=10, Encoding=65001, QuoteStyle=QuoteStyle.None]),
    firstLineAsTitle = Table.PromoteHeaders(source, [PromoteAllScalars=true]),
    changedTitle = Table.TransformColumnTypes(firstLineAsTitle,
        {{"ID", Int64.Type}, 
        {"YEAR", Int64.Type}, 
        {"MONTH", Int64.Type}, 
        {"ENTITY", Int64.Type}, 
        {"ACCOUNT", type text}, 
        {"DIRECTION", type text}, 
        {"CURRENCY", type text}, 
        {"PROJ_TYPE", type text}, 
        {"PROJ_CODE", type text}, 
        {"AMOUNT", type number}})
in
    changedTitle

注意 PQ 将 ACCOUNT 字段识别为 Int64.Type,我改为了 text

接下来增加条件列来进行行转列的操作,说说规则:

代码语言:txt
复制
- BEGIN_COST (年初成本: 科目为 140401,期间  < 2018 年
- BEGIN_VAR (年初成本变化额:科目为 140404,期间 < 2018 年
- PER_COST_ADD (期间成本增加:科目为 140401,方向为 S, Year = 2018, Month <= 4
- PER_VAR_ADD (期间公允价值增加:科目为 140404,方向为 S, Year = 2018, Month <= 4
- PER_COST_DECT (期间成本减少:科目为 140401,方向为 H, Year = 2018, Month <= 4
- PER_VAR_ADD (期间公允价值减少:科目为 140404,方向为 H, Year = 2018, Month <= 4

增加一个条件列,我们可以看到,PQ 并不能表达 [ACCOUNT] = "140401" and [YEAR] < 2018,对话框中只能根据某一个列来设置条件:

我们需要点击确定后,在公式栏或高级编辑器中编写条件表达式:

选中步骤 addedBeginCost,右键,在后面插入一个步骤

因为步骤的公式类似,可以从上一步骤的公式拷贝过来进行改写:

用类似的方法添加步骤,计算出其他栏位。完成后 M 语言脚本如下:

代码语言:txt
复制
let
    source = Csv.Document(
    	Web.Contents("https://raw.githubusercontent.com/stonewm/python-practice-projects/master/pandas%20sample%20data/project-listing.csv"),
   	 [Delimiter=",", Columns=10, Encoding=65001, QuoteStyle=QuoteStyle.None]),
    firstLineAsTitle = Table.PromoteHeaders(source, [PromoteAllScalars=true]),
    changedTitle = Table.TransformColumnTypes(firstLineAsTitle,
        {{"ID", Int64.Type}, 
        {"YEAR", Int64.Type}, 
        {"MONTH", Int64.Type}, 
        {"ENTITY", Int64.Type}, 
        {"ACCOUNT", type text}, 
        {"DIRECTION", type text}, 
        {"CURRENCY", type text}, 
        {"PROJ_TYPE", type text}, 
        {"PROJ_CODE", type text}, 
        {"AMOUNT", type number}}),
    
    addedBeginCost = Table.AddColumn(changedTitle, "BEGIN_COST", each if [ACCOUNT] = "140401" and [YEAR] < 2018 then [AMOUNT] else 0),
    addedBeginVar = Table.AddColumn(addedBeginCost, "BEGIN_VAR", each if [ACCOUNT] = "140404" and [YEAR] < 2018 then [AMOUNT] else 0),
    
    addedPerCostAdd = Table.AddColumn(
        addedBeginVar, "PER_COST_ADD", each if [ACCOUNT] = "140401" and [DIRECTION] = "S" and [YEAR] = 2018 and [MONTH] = 4 then [AMOUNT] else 0),
    
    addedPerVarAdd = Table.AddColumn(
        addedPerCostAdd, "PER_VAR_ADD", each if [ACCOUNT] = "140404" and [DIRECTION] = "S" and [YEAR] = 2018 and [MONTH] = 4 then [AMOUNT] else 0),
    
    addedPerCostDect = Table.AddColumn(
        addedPerVarAdd, "PER_COST_DECT",each if [ACCOUNT] = "140401" and [DIRECTION] = "H" and [YEAR] = 2018 and [MONTH] = 4 then [AMOUNT] else 0),
    
    addedPerVarDect = Table.AddColumn(
        addedPerCostDect, "PER_VAR_DECT", each if [ACCOUNT] = "140404" and [DIRECTION] = "H" and [YEAR] = 2018 and [MONTH] = 4 then [AMOUNT] else 0)
in
    addedPerVarDect

根据 PROJ_CODE 进行分组:

完成之后数据格式如下:

完成之后 M 语言脚本如下:

代码语言:txt
复制
let
    source = Csv.Document(
    	Web.Contents("https://raw.githubusercontent.com/stonewm/python-practice-projects/master/pandas%20sample%20data/project-listing.csv"),
   	 [Delimiter=",", Columns=10, Encoding=65001, QuoteStyle=QuoteStyle.None]),
    firstLineAsTitle = Table.PromoteHeaders(source, [PromoteAllScalars=true]),
    筛选的行 = Table.SelectRows(firstLineAsTitle, each true),
    changedTitle = Table.TransformColumnTypes(筛选的行,
        {{"ID", Int64.Type}, 
        {"YEAR", Int64.Type}, 
        {"MONTH", Int64.Type}, 
        {"ENTITY", Int64.Type}, 
        {"ACCOUNT", type text}, 
        {"DIRECTION", type text}, 
        {"CURRENCY", type text}, 
        {"PROJ_TYPE", type text}, 
        {"PROJ_CODE", type text}, 
        {"AMOUNT", type number}}),
    
    addedBeginCost = Table.AddColumn(changedTitle, "BEGIN_COST", each if [ACCOUNT] = "140401" and [YEAR] < 2018 then [AMOUNT] else 0),
    addedBeginVar = Table.AddColumn(addedBeginCost, "BEGIN_VAR", each if [ACCOUNT] = "140404" and [YEAR] < 2018 then [AMOUNT] else 0),
    
    addedPerCostAdd = Table.AddColumn(
        addedBeginVar, "PER_COST_ADD", each if [ACCOUNT] = "140401" and [DIRECTION] = "S" and [YEAR] = 2018 and [MONTH] <= 4 then [AMOUNT] else 0),
    
    addedPerVarAdd = Table.AddColumn(
        addedPerCostAdd, "PER_VAR_ADD", each if [ACCOUNT] = "140404" and [DIRECTION] = "S" and [YEAR] = 2018 and [MONTH] <= 4 then [AMOUNT] else 0),
    
    addedPerCostDect = Table.AddColumn(
        addedPerVarAdd, "PER_COST_DECT",each if [ACCOUNT] = "140401" and [DIRECTION] = "H" and [YEAR] = 2018 and [MONTH] <= 4 then [AMOUNT] else 0),
    
    addedPerVarDect = Table.AddColumn(
        addedPerCostDect, "PER_VAR_DECT", each if [ACCOUNT] = "140404" and [DIRECTION] = "H" and [YEAR] = 2018 and [MONTH] <= 4 then [AMOUNT] else 0),
    
    grouped = Table.Group(addedPerVarDect, {"PROJ_CODE"}, {
            {"BEGIN_COST", each List.Sum([BEGIN_COST]), type number}, 
            {"BEGIN_VAR", each List.Sum([BEGIN_VAR]), type number}, 
            {"PER_COST_ADD", each List.Sum([PER_COST_ADD]), type number}, 
            {"PER_VAR_ADD", each List.Sum([PER_VAR_ADD]), type number}, 
            {"PER_COST_DECT", each List.Sum([PER_COST_DECT]), type number}, 
            {"PER_VAR_DECT", each List.Sum([PER_VAR_DECT]), type number}
        }),
        
    sortedLines = Table.Sort(grouped,{{"PROJ_CODE", Order.Ascending}})
in
    sortedLines

本示例贴了较多代码,作用 1 是结合操作步骤观察和理解 M 代码,作用二是可以直接将代码粘贴到高级编辑器,生成查询,直接学习。

Table.Group() 函数语法:

代码语言:txt
复制
Table.Group(table as table, 
            key as any, 
            aggregatedColumns as list, 
            optional groupKind as nullable number, 
            optional comparer as nullable function) as table

原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。

如有侵权,请联系 cloudcommunity@tencent.com 删除。

原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。

如有侵权,请联系 cloudcommunity@tencent.com 删除。

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
相关产品与服务
腾讯云 BI
腾讯云 BI(Business Intelligence,BI)提供从数据源接入、数据建模到数据可视化分析全流程的BI能力,帮助经营者快速获取决策数据依据。系统采用敏捷自助式设计,使用者仅需通过简单拖拽即可完成原本复杂的报表开发过程,并支持报表的分享、推送等企业协作场景。
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档