Dataset<Row> ds = spark.read()
        .option("multiLine", true)
        .option("mode", "PERMISSIVE")
        .json("/user/administrador/prueba_diario.txt")
        .toDF();
ds.printSchema();

Dataset<Row> ds2 = ds.select("articles").toDF();
ds2.printSchema();

spark.sql("drop table if exists table1");
ds2.write().saveAsTable("table1");
I have JSON in this format:
root
|-- articles: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- author: string (nullable = true)
| | |-- content: string (nullable = true)
| | |-- description: string (nullable = true)
| | |-- publishedAt: string (nullable = true)
| | |-- source: struct (nullable = true)
| | | |-- id: string (nullable = true)
| | | |-- name: string (nullable = true)
| | |-- title: string (nullable = true)
| | |-- url: string (nullable = true)
| | |-- urlToImage: string (nullable = true)
|-- status: string (nullable = true)
|-- totalResults: long (nullable = true)
I want to save the articles array as a Hive table. An example of the Hive table I want:
author (string)
content (string)
description (string)
publishedat (string)
source (struct<id:string,name:string>)
title (string)
url (string)
urltoimage (string)
The problem is that when the table is saved it has only one column, named articles, and all the content ends up inside that single column.
Posted on 2018-09-20 05:55:09
It is a bit convoluted, but I found that this works:
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.explode;

ds.select(explode(col("articles")).as("exploded")).select("exploded.*").toDF();
I tested it with
{
  "articles": [
    {
      "author": "J.K. Rowling",
      "title": "Harry Potter and the goblet of fire"
    },
    {
      "author": "George Orwell",
      "title": "1984"
    }
  ]
}
and it returns (after collecting it into an array):
result = {Arrays$ArrayList@13423} size = 2
0 = {GenericRowWithSchema@13425} "[J.K. Rowling,Harry Potter and the goblet of fire]"
1 = {GenericRowWithSchema@13426} "[George Orwell,1984]"
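Putting the question's code and this answer together, the whole flow might look like the sketch below. It inlines the two-article sample JSON so it is self-contained (on a real cluster you would read the question's path instead, and `master("local[*]")` plus the `ExplodeArticlesDemo` class name are just assumptions for a local test run; Spark must be on the classpath):

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.explode;

import java.util.Collections;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ExplodeArticlesDemo {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("ExplodeArticlesDemo")
                .master("local[*]")   // local mode for testing; drop on a cluster
                .getOrCreate();

        // The two-article sample from the answer, inlined so the demo
        // needs no external file.
        String json = "{\"articles\":["
                + "{\"author\":\"J.K. Rowling\",\"title\":\"Harry Potter and the goblet of fire\"},"
                + "{\"author\":\"George Orwell\",\"title\":\"1984\"}]}";
        Dataset<Row> ds = spark.read().json(
                spark.createDataset(Collections.singletonList(json), Encoders.STRING()));

        // explode() emits one row per array element; "exploded.*" then lifts
        // every struct field (author, title, ...) into its own top-level column.
        Dataset<Row> articles = ds
                .select(explode(col("articles")).as("exploded"))
                .select("exploded.*");

        articles.printSchema();

        // Saving this dataset yields a table with one column per field,
        // instead of the single "articles" array column from the question.
        spark.sql("drop table if exists table1");
        articles.write().saveAsTable("table1");

        spark.stop();
    }
}
```

With the sample input, `articles` has two rows and the columns `author` and `title`, so `table1` ends up with one column per struct field, which is the shape the question asks for.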
https://stackoverflow.com/questions/52414644