I have these items in my database:
When a user searches for "hary poter" (instead of "Harry Potter"), how can I return the above items?


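Posted on 2016-09-22 23:28:01
It's hard to find anything in SQL Server that works well for this kind of thing. Fuzzy matching is difficult to get right when you need to tolerate misspellings without also returning bad matches.
For example, here's one way you could try to do it: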
DECLARE @ TABLE (id INT IDENTITY(1, 1), blah NVARCHAR(255));
INSERT @ VALUES ('Harry Potter and the Chamber of Secrets')
,('Harry Potter and the Deathly Hallows: Part 1')
,('Harry Potter and the Deathly Hallows: Part 2')
,('Harry Potter and the Goblet of Fire')
,('Harry Potter and the Half-Blood Prince')
,('Harry Potter and the Order of the Phoenix')
,('Harry Potter and the Prisoner of Azkaban')
,('Harry Potter and the Sorcerer''s Stone');
DECLARE @myVar NVARCHAR(255) = 'deadly halow'; -- returns 2 matches (both parts of Deathly Hallows)
-- SET @myVar = 'hary poter'; -- returns 8 matches, all of them
-- SET @myVar = 'order'; -- returns 1 match (Order of the Phoenix)
-- SET @myVar = 'phoneix'; -- returns 2 matches (Order of the Phoenix and Half-blood Prince, the latter due to a fuzzy match on 'prince')
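WITH CTE AS (
    -- All titles, plus the search term as an extra row with id = 0
    SELECT id, blah
    FROM @
    UNION ALL
    SELECT 0, @myVar
)
, CTE2 AS (
    -- Anchor: the first space-delimited word of each string
    SELECT id
         , blah
         , SUBSTRING(blah, 1, ISNULL(NULLIF(CHARINDEX(' ', blah), 0) - 1, LEN(blah))) individualWord
         , NULLIF(CHARINDEX(' ', blah), 0) cIndex -- position of the space after this word; NULL when no space is left
         , 1 L -- ordinal position of the word within the string
    FROM CTE
    UNION ALL
    -- Recursive step: the next word, starting just after the previous space
    SELECT CTE.id
         , CTE.blah
         , SUBSTRING(CTE.blah, cIndex + 1, ISNULL(NULLIF(CHARINDEX(' ', CTE.blah, cIndex + 1), 0) - 1 - cIndex, LEN(CTE.blah)))
         , NULLIF(CHARINDEX(' ', CTE.blah, cIndex + 1), 0)
         , L + 1
    FROM CTE2
    JOIN CTE ON CTE.id = CTE2.id
    WHERE cIndex IS NOT NULL
)
SELECT blah
FROM (
    -- Pair each title word with each search word that sounds similar (DIFFERENCE >= 3);
    -- RN = 1 keeps a single row per title and search word
    SELECT X.blah, ROW_NUMBER() OVER (PARTITION BY X.ID, Y.L ORDER BY (SELECT NULL)) RN, Y.wordCount
    FROM CTE2 X
    JOIN (SELECT *, COUNT(*) OVER() wordCount FROM CTE2 WHERE id = 0) Y ON DIFFERENCE(X.individualWord, Y.individualWord) >= 3 AND X.id <> 0) T
WHERE RN = 1
GROUP BY blah
-- Keep only titles in which every search word found a similar-sounding word
HAVING COUNT(*) = MAX(wordCount);
This splits the search term into individual words, splits each title into individual words, and then uses the DIFFERENCE() function, which compares the SOUNDEX() values of two strings and returns a score from 0 to 4 for how similar they sound. For example, SOUNDEX('Halow') is 'H400' and SOUNDEX('Hallows') is 'H420', so the difference here is 3 (because the H, the 4, and one zero match). A perfect match has a difference of 4, and close matches generally have a difference of 3 or higher.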
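To see the scores the query relies on, a small standalone check like the one below can be run. It is only a sketch: the word pairs are examples taken from the titles and test inputs above, and the aliases (p, searchWord, titleWord, searchCode, titleCode, score) are made up for illustration.

```sql
-- Inspect the SOUNDEX codes and the DIFFERENCE score for a few word pairs.
-- The pairs are illustrative; substitute your own search and title words.
SELECT p.searchWord
     , p.titleWord
     , SOUNDEX(p.searchWord)                 AS searchCode
     , SOUNDEX(p.titleWord)                  AS titleCode
     , DIFFERENCE(p.searchWord, p.titleWord) AS score -- 0 (least similar) to 4 (most similar)
FROM (VALUES ('Halow', 'Hallows')
           , ('hary',  'Harry')
           , ('poter', 'Potter')) AS p(searchWord, titleWord);
```

Rows with a score of 3 or 4 are the ones the join condition DIFFERENCE(...) >= 3 would accept.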
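Unfortunately, because you need to check for close matches, you'll sometimes get false positives. For example, I tested it with 'phoneix' as input and got a match on 'Half-Blood Prince' because of a fuzzy match between 'Prince' and 'Phoenix'. I'm sure there are ways this could be improved, but something like this should serve as a basis for what you're trying to achieve.
Posted on 2016-09-22 22:52:18
You can use this query.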
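When a result looks wrong, it can help to list the per-word scores for that one title. The following is a diagnostic sketch, not part of the query above: the variable @searchWord and the aliases are hypothetical, and the title words are typed in by hand, split on spaces the same way CTE2 splits them.

```sql
-- Score one search word against every space-delimited word of a single title,
-- to see which word is responsible for a match. Both inputs are examples.
DECLARE @searchWord NVARCHAR(255) = 'phoneix';

SELECT w.titleWord
     , SOUNDEX(w.titleWord)                 AS titleCode
     , SOUNDEX(@searchWord)                 AS searchCode
     , DIFFERENCE(w.titleWord, @searchWord) AS score
FROM (VALUES ('Harry'), ('Potter'), ('and'), ('the'), ('Half-Blood'), ('Prince')) AS w(titleWord)
ORDER BY score DESC;
```

Any word scoring 3 or more would satisfy the join condition. Raising the threshold to 4 is one possible way to reject more false positives, but it would also reject looser misspellings such as 'halow', which scores 3 against 'Hallows' in the example above.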
create table #test (v varchar(50) )
insert into #test (v) values
('Harry Potter and the Chamber of Secrets' )
,('Harry Potter and the Deathly Hallows: Part 1' )
,('Harry Potter and the Deathly Hallows: Part 2' )
,('Harry Potter and the Goblet of Fire' )
,('Harry Potter and the Half-Blood Prince' )
,('Harry Potter and the Order of the Phoenix' )
,('Harry Potter and the Prisoner of Azkaban' )
,('Harry Potter and the Sorcerer''s Stone' )
select * from #test
where PATINDEX('%[Hh]%ar[r]%y [pP]%ot[t]%er%', v)>0
https://stackoverflow.com/questions/39649715