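I have a table with a single column in a Postgres 13.1 database. It consists of many rows holding comma-separated values, up to about 20 elements per row.
I want to split the data into multiple columns. But I only have a limited number of columns, say 5, and some rows hold more than 5 CSV values, so the excess values must move to a new (next) row. How can I do this?
Example: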
a1, b1, c1
a2, b2, c2, d2, e2, f2
a3, b3, c3, d3, e3, f3, g3, h3, i3, j3
a4
a5, b5, c5
'
'
'
There are only 5 columns, so the output would look like this:
c1 c2 c3 c4 c5
---------------
a1 b1 c1
a2 b2 c2 d2 e2
f2
a3 b3 c3 d3 e3
f3 g3 h3 i3 j3
a4
a5 b5 c5
'
'
'
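Answered 2021-02-20 20:31:16
Storing CSV values in a single column is typically bad design. If at all possible, use an array or a properly normalized design instead.
While you are stuck with your current situation ...
For a known, small maximum number of elements
A simple solution without trickery or recursion should do: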
SELECT id, 1 AS rnk
, split_part(csv, ', ', 1) AS c1
, split_part(csv, ', ', 2) AS c2
, split_part(csv, ', ', 3) AS c3
, split_part(csv, ', ', 4) AS c4
, split_part(csv, ', ', 5) AS c5
FROM tbl
WHERE split_part(csv, ', ', 1) <> '' -- skip empty rows
UNION ALL
SELECT id, 2
, split_part(csv, ', ', 6)
, split_part(csv, ', ', 7)
, split_part(csv, ', ', 8)
, split_part(csv, ', ', 9)
, split_part(csv, ', ', 10)
FROM tbl
WHERE split_part(csv, ', ', 6) <> '' -- skip empty rows
-- three more blocks to cover a maximum "around 20"
ORDER BY id, rnk;
db<>fiddle here
id is the primary key of the original table. This assumes ', ' as the separator, obviously. You can easily adapt it.
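The WHERE filters in the query above rely on how split_part() behaves when the requested field does not exist. A minimal sketch on a literal string (no table involved; the column aliases are only illustrative):

```sql
-- split_part() returns an empty string, not NULL, when the requested
-- field number is greater than the number of fields in the input.
SELECT split_part('a1, b1, c1', ', ', 2)       AS second_field  -- 'b1'
     , split_part('a1, b1, c1', ', ', 6)       AS sixth_field   -- '' (empty string)
     , split_part('a1, b1, c1', ', ', 6) <> '' AS kept;         -- false
```

That is why each block only needs to test its first field (1, 6, 11, ...) against '' to decide whether it produces a row.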
For an unknown number of elements
There are various ways. One way uses regexp_replace() to replace every fifth separator before unnesting ...
-- for any number of elements
SELECT t.id, c.rnk
, split_part(c.csv5, ', ', 1) AS c1
, split_part(c.csv5, ', ', 2) AS c2
, split_part(c.csv5, ', ', 3) AS c3
, split_part(c.csv5, ', ', 4) AS c4
, split_part(c.csv5, ', ', 5) AS c5
FROM tbl t
, unnest(string_to_array(regexp_replace(csv, '((?:.*?,){4}.*?),', '\1;', 'g'), '; ')) WITH ORDINALITY c(csv5, rnk)
ORDER BY t.id, c.rnk;
db<>fiddle here
This assumes that the chosen separator ; never appears in your strings. (Just like , must never appear inside a value.)
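The regular expression pattern is the key: '((?:.*?,){4}.*?),'
(?:) ... "non-capturing" set of parentheses
() ... "capturing" set of parentheses
*? ... non-greedy quantifier
{4} ... sequence of exactly 4 matches
The replacement '\1;' contains the back-reference \1, which stands for the text matched by the capturing parentheses.
'g' as the fourth function parameter is required for repeated replacement.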
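To see what the replacement does in isolation, here is a minimal sketch that runs regexp_replace() on one literal row from the question. It assumes the default setting standard_conforming_strings = on, so the backslash in '\1;' reaches the function unchanged.

```sql
-- Every fifth comma is swapped for a semicolon; the other commas stay.
SELECT regexp_replace('a3, b3, c3, d3, e3, f3, g3, h3, i3, j3'
                    , '((?:.*?,){4}.*?),'
                    , '\1;'
                    , 'g') AS csv_chunked;
-- a3, b3, c3, d3, e3; f3, g3, h3, i3, j3
```

Splitting that result on '; ' then yields one string per output row, each holding at most five values.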
Other ways to solve this include a recursive CTE or a set-returning function ...
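As one possible sketch of the set-returning-function route (not taken from the answer itself), the values can be unnested with their position and then regrouped five at a time. The inline tbl data and the names elem, ord, pos, and sub are assumptions made only for this illustration.

```sql
-- Sample data standing in for the real table (assumed columns: id, csv).
WITH tbl(id, csv) AS (
   VALUES (1, 'a1, b1, c1')
        , (2, 'a2, b2, c2, d2, e2, f2')
   )
SELECT id, rnk
     , max(elem) FILTER (WHERE pos = 1) AS c1
     , max(elem) FILTER (WHERE pos = 2) AS c2
     , max(elem) FILTER (WHERE pos = 3) AS c3
     , max(elem) FILTER (WHERE pos = 4) AS c4
     , max(elem) FILTER (WHERE pos = 5) AS c5
FROM  (
   SELECT t.id, e.elem
        , (e.ord - 1) / 5 + 1 AS rnk  -- output row within the same id (integer division)
        , (e.ord - 1) % 5 + 1 AS pos  -- target column 1 .. 5
   FROM   tbl t
   CROSS  JOIN LATERAL unnest(string_to_array(t.csv, ', '))
                       WITH ORDINALITY AS e(elem, ord)
   ) sub
GROUP  BY id, rnk
ORDER  BY id, rnk;
```

Unlike the split_part() variants, unused columns come back as NULL rather than as empty strings, and rows with an empty csv produce no output at all.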
Fill from right to left
(Like you asked in How to put values starting from the right side into columns?)
Simply count down like this:
SELECT t.id, c.rnk
, split_part(c.csv5, ', ', 5) AS c1
, split_part(c.csv5, ', ', 4) AS c2
, split_part(c.csv5, ', ', 3) AS c3
, split_part(c.csv5, ', ', 2) AS c4
, split_part(c.csv5, ', ', 1) AS c5
FROM ...
db<>fiddle here
Answered 2021-02-20 21:47:38
CREATE UNLOGGED TABLE foo( x TEXT );
\copy foo FROM stdin
a1, b1, c1
a2, b2, c2, d2, e2, f2
a3, b3, c3, d3, e3, f3, g3, h3, i3, j3
a4
a5, b5, c5
\.
From rows to a single column ...
SELECT (ROW_NUMBER() OVER () - 1)/5 AS r, u FROM (SELECT unnest(string_to_array(x,', ')) u from foo) y;
r | u
---+----
0 | a1
0 | b1
0 | c1
0 | a2
0 | b2
1 | c2
1 | d2
...etc
... and back to rows of known length.
SELECT r,array_agg(u) a FROM (
SELECT (ROW_NUMBER() OVER () - 1)/5 AS r, u FROM (
SELECT unnest(string_to_array(x,', ')) u from foo) y) y1
GROUP BY r ORDER BY r;
r | a
---+------------------
0 | {a1,b1,c1,a2,b2}
1 | {c2,d2,e2,f2,a3}
2 | {b3,c3,d3,e3,f3}
3 | {g3,h3,i3,j3,a4}
4 | {a5,b5,c5}
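After that, you can subscript the array (a[1] through a[5]) to get each column when inserting into a table. How to handle the last, shorter row is left as an exercise for the reader ...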
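A small sketch of that subscripting step, using the short last array from the output above as a literal (the alias last_row is only illustrative). Postgres arrays are 1-based, and a subscript beyond the end of the array returns NULL instead of raising an error.

```sql
SELECT a[1] AS c1, a[2] AS c2, a[3] AS c3, a[4] AS c4, a[5] AS c5
FROM  (SELECT '{a5,b5,c5}'::text[] AS a) AS last_row;
-- c1 = a5, c2 = b5, c3 = c5; c4 and c5 are NULL
```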
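Answered 2021-02-23 22:12:47
This answers the related question: How to put values starting from the right side into columns?
The excellent accepted answer from @ErwinBrandstetter can easily be adapted to the desired right-to-left output.
You only need to change the order of the split parts. So instead of returning parts 1-5 and 6-10, you return parts 5-1 and 10-6: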
SELECT id, 1 AS rnk
, split_part(csv, ', ', 5) AS c1
, split_part(csv, ', ', 4) AS c2
, split_part(csv, ', ', 3) AS c3
, split_part(csv, ', ', 2) AS c4
, split_part(csv, ', ', 1) AS c5
FROM tbl
WHERE split_part(csv, ', ', 1) <> '' -- skip empty rows
UNION ALL
SELECT id, 2
, split_part(csv, ', ', 10)
, split_part(csv, ', ', 9)
, split_part(csv, ', ', 8)
, split_part(csv, ', ', 7)
, split_part(csv, ', ', 6)
FROM tbl
WHERE split_part(csv, ', ', 6) <> '' -- skip empty rows
-- more?
ORDER BY id, rnk;
https://stackoverflow.com/questions/66291134