Q: How to flatten a simple JSON file and convert it to Parquet in an Azure Synapse Analytics Spark notebook

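Stack Overflow user

Asked 2022-02-11 18:10:43

Answers: 1 · Views: 477 · Followers: 0 · Votes: 0

I need to flatten a simple JSON file (JSON Lines) and convert it to Parquet format in a Spark notebook in Azure Synapse Analytics. No column has more than one level of nested objects. However, I found that getting the schema of the DataFrame does not return the schema of the nested objects. I am using C# so that the other developers at my company do not have to learn one of the other supported languages.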

json

parquet

azure-synapse

spark-notebook

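1 Answer

Stack Overflow user

Posted 2022-02-11 18:10:43

The code below works for the scenario described above. Hopefully it saves someone else a few hours. After adding the child properties to the parent DataFrame as columns, it also drops the parent column from the DataFrame.

I did not need to make this code traverse the schema recursively, because our data has no objects nested more than one level deep.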

using System;
using System.Collections.Generic;
using System.Diagnostics;
using Microsoft.Spark.Sql;
using Microsoft.Spark.Sql.Types;
using static Microsoft.Spark.Sql.Functions; // provides Col()

// `spark` is assumed to be the SparkSession that the notebook provides.
var df = spark.Read().Json("{Your source file path here}");

// Get the schema of the DataFrame.
var dfSchema = df.Schema();

// Traverse the top-level fields of the schema.
foreach (var parentSchemaField in dfSchema.Fields)
{
    if (parentSchemaField.DataType is StructType)
    {
        // Get a new DataFrame that contains only the parent's child fields.
        var childFrame = df.Select($"{parentSchemaField.Name}.*");

        // Traverse the schema of the child DataFrame.
        foreach (var childSchemaField in childFrame.Schema().Fields)
        {
            // Add one column to the parent DataFrame per child property.
            // The new column name contains a literal dot, so quote it with
            // backticks when referencing it later.
            var childPath = $"{parentSchemaField.Name}.{childSchemaField.Name}";
            df = df.WithColumn(childPath, Col(childPath));
        }

        // Drop the parent column from the DataFrame; it is no longer needed.
        df = df.Drop(parentSchemaField.Name);
    }
}

df.Write().Parquet("{Your sink file path here}");
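
If the data ever does contain objects nested more than one level deep, the same idea can be repeated until no struct columns remain. The following is only a sketch under assumptions, not part of the original answer: `JsonFlattener` and `FlattenStructs` are hypothetical names, child columns are joined with an underscore (for example `address_city`) instead of a dot so they can be referenced without backticks, and arrays of objects are left untouched. It also assumes that no generated name collides with an existing column.

```csharp
using System.Collections.Generic;
using Microsoft.Spark.Sql;
using Microsoft.Spark.Sql.Types;
using static Microsoft.Spark.Sql.Functions;

// Hypothetical helper, not from the original answer.
public static class JsonFlattener
{
    // Repeatedly expands struct columns into top-level columns until
    // the DataFrame has no struct columns left.
    public static DataFrame FlattenStructs(DataFrame df)
    {
        bool foundStruct;
        do
        {
            foundStruct = false;
            var columns = new List<Column>();

            foreach (var field in df.Schema().Fields)
            {
                if (field.DataType is StructType)
                {
                    foundStruct = true;

                    // Same technique as the answer: read the child names
                    // from the schema of a "parent.*" selection.
                    var children = df.Select($"{field.Name}.*").Schema().Fields;
                    foreach (var child in children)
                    {
                        // Assumption: "parent_child" does not clash with
                        // the name of an existing column.
                        columns.Add(
                            Col($"{field.Name}.{child.Name}")
                                .Alias($"{field.Name}_{child.Name}"));
                    }
                }
                else
                {
                    // Non-struct columns are carried over unchanged.
                    columns.Add(Col(field.Name));
                }
            }

            // Project once per pass; skip the projection when nothing changed.
            if (foundStruct)
            {
                df = df.Select(columns.ToArray());
            }
        } while (foundStruct);

        return df;
    }
}
```

Assuming the same `spark` session and placeholder paths as above, it could be called as `JsonFlattener.FlattenStructs(spark.Read().Json("{Your source file path here}")).Write().Parquet("{Your sink file path here}");`. Each pass removes one level of nesting, so the loop runs once per nesting level plus a final pass that finds nothing to expand.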

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/71084684
