我有下表
CREATE TABLE Test (
ID INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
FIRST VARCHAR(10) NOT NULL,
SECOND VARCHAR(10) NOT NULL
)表格中充满了一些重复的数据。TestTarget表具有相同的结构,它使用以下过程算法填充:
DECLARE @first varchar(10), @second varchar(10)
DECLARE c CURSOR FAST_FORWARD
FOR
SELECT first, second FROM Test ORDER BY id
OPEN c
FETCH NEXT FROM c INTO @first, @second
WHILE @@fetch_status = 0
BEGIN
IF NOT EXISTS(SELECT 1 FROM TestTarget WHERE first=@first OR second=@second)
INSERT INTO TestTarget (first, second) VALUES(@first, @second)
FETCH NEXT FROM c INTO @first, @second
END
CLOSE c
DEALLOCATE c 简单地说,我们在插入之前检查目标表是否已经包含这样的‘第一个’或‘第二个’值。
示例:
源表
ID FIRST SECOND
1 A 2
2 A 1
3 A 3
4 B 2
5 B 1
6 B 3
7 B 2
8 B 4
9 C 2
10 C 3
INSERT INTO Test (first, second)
VALUES ('A', '2'),
('A', '1'),
('A', '3'),
('B', '2'),
('B', '1'),
('B', '3'),
('B', '2'),
('B', '4'),
('C', '2'),
('C', '3')目标表
ID FIRST SECOND
1 A 2
5 B 1
10 C 3实际源表有x*100 k行,相同的“第一”或“第二”列至少有2行。我正在寻找基于集合的解决方案,如果可能的话,或者至少请一些比这样的循环更快的解决方案,因为这需要我的实际情况下的几个小时。
注意,通过分区/联接/等进行的经典重复删除不是这样的,因为即使在不同的最终行数下,它也会产生不同的结果。
发布于 2018-06-04 10:23:36
INSERT INTO TestTarget (first, second)
SELECT first,second
FROM Test t
WHERE NOT EXISTS
(
SELECT 1
FROM Test t2
WHERE t2.id>t.id and (t2.first=t.first or t2.second=t.second)
)发布于 2018-06-04 11:57:57
恐怕我想不出任何简单的基于集合的解决方案来解决您的问题,但我希望下面的内容会比您现有的游标快得多:
declare @test table
(id int,
first varchar(1),
second varchar(1))
declare @target table
(id int,
first varchar(1),
second varchar(1))
declare @temp table
(id int,
first varchar(1),
second varchar(1))
INSERT INTO @Test (id, first, second)
VALUES (1, 'A', '2'),
(2, 'A', '1'),
(3, 'A', '3'),
(4, 'B', '2'),
(5, 'B', '1'),
(6, 'B', '3'),
(7, 'B', '2'),
(8, 'B', '4'),
(9, 'C', '2'),
(10, 'C', '3')
declare @firsts table
(first varchar(1))
declare @seconds table
(second varchar(1))
INSERT INTO @firsts
SELECT DISTINCT first FROM @test
INSERT INTO @seconds
SELECT DISTINCT second FROM @test
declare @firstcnt int = (SELECT count(*) FROM @firsts)
declare @secondcnt int = (SELECT count(*) FROM @firsts)
WHILE (@firstcnt > 0 AND @secondcnt > 0)
BEGIN
DELETE FROM @temp
INSERT INTO @temp
SELECT TOP 1 t.id, t.first, t.second FROM @test t
INNER JOIN @firsts f On t.first = f.first
INNER JOIN @seconds s On t.second = s.second
ORDER BY id
INSERT INTO @target
SELECT * FROM @temp
DELETE FROM @firsts WHERE first = (SELECT first FROM @temp)
SET @firstcnt = @firstcnt - 1
DELETE FROM @seconds WHERE second = (SELECT second FROM @temp)
SET @secondcnt = @secondcnt - 1
END
SELECT * FROM @target这确实产生了所需的值,而且我预计它会更快,因为while循环只需要运行惟一值对的总数,而不必遍历整个表。
它还提供了10C3作为最后一行,我认为这是正确的,尽管@Gordon的评论。如果我正确理解这个问题,id顺序优先:也就是说,尽管'A‘和'B’有以'3‘作为第二个值的条目,但这些条目的ID比另一个可以合法插入的第二个值要大。
HTH
发布于 2018-06-05 11:37:07
使用递归CTE,
declare @Target table(col1 varchar(20),col2 int)
declare @Test table(col1 varchar(20),col2 int)
INSERT INTO @Test (col1, col2
VALUES ('A', '2')
('A', '1')
('A', '3'),
('B', '1')
('B', '2'),
('B', '3'),
('B', '2'),
('B', '4'),
('C', '2'),
('C', '3')
;With CTE as
(
select col1 ,col2
,DENSE_RANK()over( ORDER by col1)rn1
from @Test
)
,cte1 AS(
select top 1 c.col1,c.col2,rn1 from cte c where rn1=1
union ALL
select c.col1,c.col2,c.rn1 from cte c
inner join cte1 c1
on c.rn1>c1.rn
where c.col2!=c1.col2
)
insert into @Target
select col1,col2 FROM(
select *,ROW_NUMBER()over(partition by col1 order by (select null)) rn2 from cte1
)t4
where rn2=1
select * from @Targethttps://stackoverflow.com/questions/50678328
复制相似问题