SELECT
DATE,
EXTRACT(YEAR FROM DATE) AS year,
FIPS as Country,
LOCATIONS,
AVG(TONE) as Avg_Tone,
AVG(Positive Score) as PositiveS,
AVG(Negative Score) as NegativeS,
COUNT(*),
From `gdelt-bq.gdeltv2.gkg_partitioned`,
`gdelt-bq.extra.sourcesbycountry` country,
Where
DATE(_PARTITIONTIME) BETWEEN TIMESTAMP('2002-01-01') AND TIMESTAMP('2020-12-31')
AND SourceCommonName=country.Domain
AND Location like '%CH%'
GROUP BY Year,Country
ORDER BY Year,Country
码本链接是http://data.gdeltproject.org/documentation/GDELT-Global_Knowledge_Graph_Codebook-V2.1.pdf,V1.5TONE有音调、正分和负分等。我想按年数平均音调。如何从大查询中获得它。
发布于 2022-03-20 17:00:58
有必要先浇铸和分割字段。
date
是一个格式为“yyyymmdd.”的值。因此,我建议将值转换为字符串,并将前四个字符作为年份。V1.5TONE
列,只有V2Tone
。它由一个由逗号分隔的数字数字组成的字符串组成。字符串必须先分开。然后每个组件都需要抛出一个十进制数。gdelt-bq.extra.sourcesbycountry
应该将一个url映射到一个国家。它把国家复制成一个网址!若要消除至少一些重复的值,请使用内部select
和by组。要获得按年份分组的值,需要对国家所有其他维度列进行注释。
SELECT
#DATE,
substr(cast(date as string),0,4) AS year,
FIPS as Country,
#LOCATIONS,
AVG(cast(split(V2Tone,",")[safe_offset(0)] as decimal )) as Avg_Tone,
AVG(cast(split(V2Tone,",")[safe_offset(1)] as decimal )) as PositiveS,
AVG(cast(split(V2Tone,",")[safe_offset(2)] as decimal )) as NegativeS,
COUNT(*) as counts,
From `gdelt-bq.gdeltv2.gkg_partitioned`
left join
(SELECT Domain, FIPS
from `gdelt-bq.extra.sourcesbycountry`
group by 1,2) country
on SourceCommonName=country.Domain
Where
DATE(_PARTITIONTIME) BETWEEN DATE('2020-01-01') AND DATE('2020-01-31')
AND Locations like '%CH%'
GROUP BY Year,Country
ORDER BY Year,Country
同样在where
段中:不要混淆日期和时间戳。
https://stackoverflow.com/questions/71546963
复制相似问题