我想在“莎士比亚”( samples -> BigQuery,又称bigquery-public-data.samples.shakespeare
)上的可公开访问的莎士比亚数据库中找到给定单词的总字数。
模式如下所示:
Field name | Type | Mode | Description
---------------------------------------------------
word | STRING | REQUIRED | A single unique word (where whitespace is the delimiter) extracted from a corpus.
word_count | INTEGER | REQUIRED | The number of times this word appears in this corpus.
corpus | STRING | REQUIRED | The work from which this word was extracted.
corpus_date| INTEGER | REQUIRED | The year in which this corpus was published.
我已经成功地组合了corpus
所有值中的所有小写单词,然后将所有corpus
实例组合到一个新列found_in
中,同时还将它们的单词计数合并到列total_word_count
中。
我的查询如下所示:
SELECT
STRING_AGG(DISTINCT corpus) AS found_in,
LOWER(word),
SUM(word_count) AS total_word_count
FROM
`bigquery-public-data.samples.shakespeare`
GROUP BY
LOWER(word)
ORDER BY
total_word_count DESC
LIMIT
1000
输出列为
Row found_in f0_ total_word_count
我的问题是重命名f0_
列。这是一个问题,因为我希望将整个内容包装在另一个查询中,这样我就可以执行类似SELECT * FROM {{that previous query}} WHERE word="thou"
的操作。
我不明白的是
WHERE
子句中引用"word“。LOWER(word)
部分,就像我对STRING_AGG
和(LOWER(word)
)所做的那样(使用AS
)。我尝试了以下几种方法:
SELECT
*
FROM
(
SELECT
STRING_AGG(DISTINCT corpus) AS found_in,
LOWER(word),
SUM(word_count) AS total_word_count
FROM
`bigquery-public-data.samples.shakespeare`
GROUP BY
LOWER(word)
ORDER BY
total_word_count DESC
LIMIT
1000
)
WHERE word = 'thou'
然而,我在最后一行得到了一个错误:Unrecognized name: word
。
因此,我尝试使用AS
SELECT
*
FROM
(
SELECT
STRING_AGG(DISTINCT corpus) AS found_in,
LOWER(word) AS lowered_word,
SUM(word_count) AS total_word_count
FROM
`bigquery-public-data.samples.shakespeare`
GROUP BY
LOWER(word)
ORDER BY
total_word_count DESC
LIMIT
1000
)
WHERE word = 'and'
但是在使用LOWER(word)
时,我得到了错误SELECT list expression references column word which is neither grouped nor aggregated
。
这让我感到困惑,因为我看到GROUP BY
引用了word
。
如何才能正确地引用较低的(单词),以便在第二题中引用它?
发布于 2019-06-13 03:40:46
我想这就是你想要的:
SELECT *
FROM (SELECT STRING_AGG(DISTINCT corpus) AS found_in,
LOWER(word) AS lowered_word,
SUM(word_count) AS total_word_count
FROM `bigquery-public-data.samples.shakespeare`
GROUP BY lowered_word
ORDER BY total_word_count DESC
LIMIT 1000
) w
WHERE lowered_word = 'and';
备注:
word
的东西,因此使用外部lowered_word
可以在BigQuery中按列别名进行聚合。LIMIT
似乎是任意的。我不认为这会提高性能或降低查询成本。https://stackoverflow.com/questions/56569097
复制相似问题