Adding partition projection to an AWS Athena table using CloudFormation

Stack Overflow user

Asked 2022-04-08 07:39:45

Answers: 2, Views: 589, Followers: 0, Votes: 1

I have defined an Athena table in a CloudFormation template like this:

CloudFormation definition

EventsTable:
  Type: AWS::Glue::Table
  Properties:
    CatalogId: !Ref AWS::AccountId
    DatabaseName: !Ref DatabaseName
    TableInput:
      Description: "My Table"
      Name: !Ref TableName
      TableType: EXTERNAL_TABLE
      StorageDescriptor:
        Compressed: True
        Columns:
          - Name: account_id
            Type: string
            Comment: "Account Id of the account making the request"
            ...
        InputFormat: org.apache.hadoop.mapred.TextInputFormat
        SerdeInfo:
          SerializationLibrary: org.openx.data.jsonserde.JsonSerDe
        OutputFormat: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
        Location: !Sub "s3://${EventsBucketName}/events/"

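This works and deploys fine. I also found that I can set up partition projection by following this document and this other document.

I can also get it working by creating the table directly in SQL, roughly like this:

SQL definition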

CREATE EXTERNAL TABLE `performance_data.events`
(
  `account_id`  string,
...
)
   PARTITIONED BY (
     `day` string)
    ROW FORMAT SERDE
        'org.openx.data.jsonserde.JsonSerDe'
    STORED AS INPUTFORMAT
        'org.apache.hadoop.mapred.TextInputFormat'
        OUTPUTFORMAT
          'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
    LOCATION
        's3://my-bucket/events/'
    TBLPROPERTIES (
        'has_encrypted_data' = 'false',
        'projection.enabled' = 'true',
        'projection.day.type' = 'date',
        'projection.day.format' = 'yyyy/MM/dd',
        'projection.day.range' = '2020/01/01,NOW',
        'projection.day.interval' = '1',
        'projection.day.interval.unit' = 'DAYS',
        'storage.location.template' = 's3://my-bucket/events/${day}/'
)

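However, I cannot find any documentation on how to translate this into CloudFormation. So my question is: how do I implement the partition projection shown in the SQL code above in CloudFormation?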

amazon-web-services

amazon-cloudformation

athena

Stack Overflow user

Answered 2022-04-08 08:01:06

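According to the CloudFormation reference for the Glue Table TableInput, you can specify PartitionKeys and Parameters. These are the equivalents of PARTITIONED BY and TBLPROPERTIES in the query.

Edit

For an example, you can refer to this article. The example below shows how to define PartitionKeys and how to define the JSON for Parameters. In your case, just add the projection keys (such as projection.enabled) and their values (such as true).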

# Create an Amazon Glue table
  CFNTableFlights:
    # Creating the table waits for the database to be created
    DependsOn: CFNDatabaseFlights
    Type: AWS::Glue::Table
    Properties:
      CatalogId: !Ref AWS::AccountId
      DatabaseName: !Ref CFNDatabaseName
      TableInput:
        Name: !Ref CFNTableName1
        Description: Define the first few columns of the flights table
        TableType: EXTERNAL_TABLE
        Parameters: {
          "classification": "csv"
        }
#       ViewExpandedText: String
        PartitionKeys:
        # Data is partitioned by month
        - Name: mon
          Type: bigint
        StorageDescriptor:
          OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
          Columns:
          - Name: year
            Type: bigint
          - Name: quarter
            Type: bigint
          - Name: month
            Type: bigint
          - Name: day_of_month
            Type: bigint            
          InputFormat: org.apache.hadoop.mapred.TextInputFormat
          Location: s3://crawler-public-us-east-1/flight/2016/csv/
          SerdeInfo:
            Parameters:
              field.delim: ","
            SerializationLibrary: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
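
Applied to the table from the question, the same idea might look like the sketch below. It has not been deployed and is only an illustration: it reuses the names from the question (EventsTable, DatabaseName, TableName, EventsBucketName), writes every parameter value as a string, and assumes that intrinsic functions such as !Sub are resolved inside Parameters. One detail to watch: inside !Sub, a literal ${day} has to be written as ${!day}, otherwise CloudFormation tries to substitute a template variable named day.

```yaml
# Sketch only: the question's EventsTable with partition projection added.
EventsTable:
  Type: AWS::Glue::Table
  Properties:
    CatalogId: !Ref AWS::AccountId
    DatabaseName: !Ref DatabaseName
    TableInput:
      Description: "My Table"
      Name: !Ref TableName
      TableType: EXTERNAL_TABLE
      # Equivalent of PARTITIONED BY (`day` string)
      PartitionKeys:
        - Name: day
          Type: string
      # Equivalent of TBLPROPERTIES; values are written as strings
      Parameters:
        has_encrypted_data: "false"
        projection.enabled: "true"
        projection.day.type: "date"
        projection.day.format: "yyyy/MM/dd"
        projection.day.range: "2020/01/01,NOW"
        projection.day.interval: "1"
        projection.day.interval.unit: "DAYS"
        # ${!day} makes Fn::Sub emit a literal ${day} for Athena to fill in
        storage.location.template: !Sub "s3://${EventsBucketName}/events/${!day}/"
      StorageDescriptor:
        Compressed: True
        Columns:
          - Name: account_id
            Type: string
            Comment: "Account Id of the account making the request"
          # remaining columns as in the question (elided there as well)
        InputFormat: org.apache.hadoop.mapred.TextInputFormat
        SerdeInfo:
          SerializationLibrary: org.openx.data.jsonserde.JsonSerDe
        OutputFormat: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
        Location: !Sub "s3://${EventsBucketName}/events/"
```

The partition column day is declared only under PartitionKeys and not repeated under Columns, which mirrors how PARTITIONED BY keeps it out of the column list in the SQL version.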

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/71793344
