df.withColumn("storeInfo", struct($"store", struct($"inhand", $"storeQuantity")))
.groupBy("sku").agg(collect_list("storeInfo").as("info"))
.show(false)
+---+---------------------------------------------------+
|sku|info |
+---+---------------------------------------------------+
|1 |[{2222, {3, 34}}, {3333, {5, 45}}] |
|2 |[{4444, {5, 56}}, {5555, {6, 67}}, {6666, {7, 67}}]|
+---+---------------------------------------------------+
When I send it to Couchbase, the document looks like this:
{
"SKU": "1",
"info": [
{
"col2": {
"inhand": "3",
"storeQuantity": "34"
},
"Store": "2222"
},
{
"col2": {
"inhand": "5",
"storeQuantity": "45"
},
"Store": "3333"
}
]}
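Can we rename col2 to the value of store? I want it to look like the document below, so that the key of each nested struct is the value of the store field.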
{
"SKU": "1",
"info": [
{
"2222": {
"inhand": "3",
"storeQuantity": "34"
},
"Store": "2222"
},
{
"3333": {
"inhand": "5",
"storeQuantity": "45"
},
"Store": "3333"
}
]}
Posted on 2021-07-30 01:57:34
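In short, we cannot construct a column the way you are asking. There are two limitations:
1. The field names of a struct type must be fixed. We can change 'col2' to another name (e.g. "fixedFieldName" in Demo 1), but the name cannot be dynamic (much like the field names of a class).
2. The keys of a map type can be dynamic, but all values of a map must have the same type; see the exception in Demo 2.
Perhaps you should change the schema instead; see the output of Demos 1 and 3.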
Demo 1
df.withColumn(
"storeInfo", struct($"store", struct($"inhand", $"storeQuantity").as("fixedFieldName"))).
groupBy("sku").agg(collect_list("storeInfo").as("info")).
toJSON.show(false)
// output:
//+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
//|value |
//+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
//|{"sku":1,"info":[{"store":2222,"fixedFieldName":{"inhand":3,"storeQuantity":34}},{"store":3333,"fixedFieldName":{"inhand":5,"storeQuantity":45}}]} |
//|{"sku":2,"info":[{"store":4444,"fixedFieldName":{"inhand":5,"storeQuantity":56}},{"store":5555,"fixedFieldName":{"inhand":6,"storeQuantity":67}},{"store":6666,"fixedFieldName":{"inhand":7,"storeQuantity":67}}]}|
//+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
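A minimal plain-Scala sketch of the Demo 1 shape, assuming a hand-written JSON renderer in place of Spark's toJSON: every element of info carries the same two keys, store and the fixed alias fixedFieldName, because the alias is part of the schema rather than the data. The object and helper names (Demo1Shape, qtyJson, infoJson, rowJson) are hypothetical.

```scala
// Plain-Scala sketch of the Demo 1 shape: each element has the same two keys,
// "store" and a fixed name for the nested struct ("fixedFieldName").
object Demo1Shape {
  final case class StoreQty(inhand: Int, storeQuantity: Int)
  // The nested field name is fixed by the type, just like .as("fixedFieldName").
  final case class StoreInfo(store: Int, fixedFieldName: StoreQty)

  // Hand-written rendering that mirrors the toJSON layout shown above for this schema.
  def qtyJson(q: StoreQty): String =
    s"""{"inhand":${q.inhand},"storeQuantity":${q.storeQuantity}}"""

  def infoJson(i: StoreInfo): String =
    s"""{"store":${i.store},"fixedFieldName":${qtyJson(i.fixedFieldName)}}"""

  def rowJson(sku: Int, info: Seq[StoreInfo]): String =
    s"""{"sku":$sku,"info":[${info.map(infoJson).mkString(",")}]}"""

  def main(args: Array[String]): Unit =
    println(rowJson(1, Seq(StoreInfo(2222, StoreQty(3, 34)), StoreInfo(3333, StoreQty(5, 45)))))
}
```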
Demo 2
df.withColumn(
"storeInfo",
map($"store", struct($"inhand", $"storeQuantity"), lit("Store"), $"store")).
groupBy("sku").agg(collect_list("storeInfo").as("info")).
toJSON.show(false)
// output exception:
// The given values of function map should all be the same type, but they are [struct<inhand:int,storeQuantity:int>, int]
Demo 3
df.withColumn(
"storeInfo",
map($"store", struct($"inhand", $"storeQuantity"))).
groupBy("sku").agg(collect_list("storeInfo").as("info")).
toJSON.show(false)
//+---------------------------------------------------------------------------------------------------------------------------------------------+
//|value |
//+---------------------------------------------------------------------------------------------------------------------------------------------+
//|{"sku":1,"info":[{"2222":{"inhand":3,"storeQuantity":34}},{"3333":{"inhand":5,"storeQuantity":45}}]} |
//|{"sku":2,"info":[{"4444":{"inhand":5,"storeQuantity":56}},{"5555":{"inhand":6,"storeQuantity":67}},{"6666":{"inhand":7,"storeQuantity":67}}]}|
//+---------------------------------------------------------------------------------------------------------------------------------------------+
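For readers without a Spark session, here is a plain-Scala sketch of what Demo 3 computes, assuming input rows reconstructed from the question's output table: group the rows by sku and turn each row into a single-entry map whose key is the store value. The names Demo3Shape, Row, storeEntry, entryJson and toJsonLines are hypothetical. One design note: this sketch keeps input order within each sku, whereas Spark's collect_list does not guarantee any particular order after a shuffle.

```scala
// Plain-Scala sketch of Demo 3: group rows by sku and turn each row into a
// single-entry map keyed by the store value.
object Demo3Shape {
  final case class Row(sku: Int, store: Int, inhand: Int, storeQuantity: Int)

  // Analogue of map($"store", struct($"inhand", $"storeQuantity")):
  // the key is data (the store id); every value has the same type.
  def storeEntry(r: Row): (String, (Int, Int)) =
    (r.store.toString, (r.inhand, r.storeQuantity))

  def entryJson(e: (String, (Int, Int))): String = {
    val (store, (inhand, qty)) = e
    s"""{"$store":{"inhand":$inhand,"storeQuantity":$qty}}"""
  }

  // Analogue of groupBy("sku").agg(collect_list(...)) followed by toJSON.
  // Scala's groupBy keeps input order within each group; output is sorted by sku.
  def toJsonLines(rows: Seq[Row]): Seq[String] =
    rows.groupBy(_.sku).toSeq.sortBy(_._1).map { case (sku, rs) =>
      s"""{"sku":$sku,"info":[${rs.map(r => entryJson(storeEntry(r))).mkString(",")}]}"""
    }

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      Row(1, 2222, 3, 34), Row(1, 3333, 5, 45),
      Row(2, 4444, 5, 56), Row(2, 5555, 6, 67), Row(2, 6666, 7, 67))
    toJsonLines(rows).foreach(println)
  }
}
```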
https://stackoverflow.com/questions/68582447