I have a large number of descriptions that can be anywhere from 5 to 20 sentences each. I am trying to put a script together that will locate and remove a sentence that contains a word with numbers before or after it.
Before example: Hello world. Todays department has 345 employees. Have a good day.
After example: Hello world. Have a good day.
My main problem right now is identifying the violation.
Here "345 employees" is what causes the sentence to be removed. However, each description will have a different number and possibly a different variation of the word employee. I would like to avoid having to create a table of all the different variations of employee.
JTB
3 solutions
#1
3
This would make a good SQL Puzzle.
Disclaimer: there are probably TONS of edge cases that would blow this up
This would take a string, split it out into a table with a row for each sentence, then remove the rows that matched a condition, and then finally join them all back into a string.
CREATE FUNCTION dbo.fn_SplitRemoveJoin(@Val VARCHAR(2000), @FilterCond VARCHAR(100))
RETURNS VARCHAR(2000)
AS
BEGIN
    DECLARE @tbl TABLE (rid INT IDENTITY(1,1), val VARCHAR(2000))
    DECLARE @t VARCHAR(2000)

    -- Split into table @tbl, one row per sentence (each row keeps its trailing period)
    WHILE CHARINDEX('.', @Val) > 0
    BEGIN
        SET @t = LEFT(@Val, CHARINDEX('.', @Val))
        INSERT @tbl (val) VALUES (@t)
        SET @Val = RIGHT(@Val, LEN(@Val) - LEN(@t))
    END
    -- Keep any trailing text that has no closing period
    IF (LEN(@Val) > 0)
        INSERT @tbl VALUES (@Val)

    -- Filter out condition
    DELETE FROM @tbl WHERE val LIKE @FilterCond

    -- Join back into 1 string, in the original sentence order
    DECLARE @i INT, @rv VARCHAR(2000)
    SET @i = 1
    WHILE @i <= (SELECT MAX(rid) FROM @tbl)
    BEGIN
        SELECT @rv = IsNull(@rv,'') + IsNull(val,'') FROM @tbl WHERE rid = @i
        SET @i = @i + 1
    END
    RETURN @rv
END
GO

CREATE TABLE #tmp (rid INT IDENTITY(1,1), sentence VARCHAR(2000))
INSERT #tmp (sentence) VALUES ('Hello world. Todays department has 345 employees. Have a good day.')
INSERT #tmp (sentence) VALUES ('Hello world. Todays department has 15 emps. Have a good day. Oh and by the way there are 12 employees somewhere else')

SELECT
    rid, sentence, dbo.fn_SplitRemoveJoin(sentence, '%[0-9] Emp%')
FROM #tmp t
returns
rid | sentence | (no column name)
1 | Hello world. Todays department has 345 employees. Have a good day. | Hello world. Have a good day.
2 | Hello world. Todays department has 15 emps. Have a good day. Oh and by the way there are 12 employees somewhere else | Hello world. Have a good day.
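To avoid listing every variation of "employee", the filter could be widened to any digit separated by a single space from a letter, on either side. The query below is only a sketch under that assumption and is not part of the function above; the sample rows and the verdict alias are made up for illustration, and how [a-z] treats upper and lower case depends on the collation.

```sql
-- Classify three sample sentences with a generic "number next to a word" test
SELECT s.sentence,
       CASE
         WHEN s.sentence LIKE '%[0-9] [a-z]%'   -- digit, space, then a letter
           OR s.sentence LIKE '%[a-z] [0-9]%'   -- letter, space, then a digit
         THEN 'remove'
         ELSE 'keep'
       END AS verdict
FROM (VALUES
        ('Hello world.'),
        (' Todays department has 345 employees.'),
        (' Have a good day.')
     ) AS s(sentence);
```

Only the middle sentence should come back as 'remove'. Either pattern can be passed as @FilterCond to dbo.fn_SplitRemoveJoin. Keep in mind that it would also drop a sentence such as "I have 2 cats.", so it only fits if any number next to a word counts as a violation.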
#2
2
I've used the split/remove/join technique as well.
The main points are:
- This uses a pair of recursive CTEs, rather than a UDF.
- This will work with all English sentence endings: . or ! or ?
- This removes whitespace to make the comparison for "digit then employee" so you don't have to worry about multiple spaces and such.
Here's the SqlFiddle demo, and the code:
-- Split descriptions into sentences (could use period, exclamation point, or question mark)
-- Delete any sentences that, without whitespace, are like '%[0-9]employ%'
-- Join sentences back into descriptions
;with Splitter as (
    select ID
        , ltrim(rtrim(Data)) as Data
        , cast(null as varchar(max)) as Sentence
        , 0 as SentenceNumber
    from Descriptions -- Your table here
    union all
    select ID
        , case when Data like '%[.!?]%' then right(Data, len(Data) - patindex('%[.!?]%', Data)) else null end
        , case when Data like '%[.!?]%' then left(Data, patindex('%[.!?]%', Data)) else Data end
        , SentenceNumber + 1
    from Splitter
    where Data is not null
), Joiner as (
    select ID
        , cast('' as varchar(max)) as Data
        , 0 as SentenceNumber
    from Splitter
    group by ID
    union all
    select j.ID
        , j.Data +
            -- Don't want "digit+employ" sentences, remove whitespace to search
            case when replace(replace(replace(replace(s.Sentence, char(9), ''), char(10), ''), char(13), ''), char(32), '') like '%[0-9]employ%' then '' else s.Sentence end
        , s.SentenceNumber
    from Joiner j
        join Splitter s on j.ID = s.ID and s.SentenceNumber = j.SentenceNumber + 1
)
-- Final Select
select a.ID, a.Data
from Joiner a
    join (
        -- Only get max SentenceNumber
        select ID, max(SentenceNumber) as SentenceNumber
        from Joiner
        group by ID
    ) b on a.ID = b.ID and a.SentenceNumber = b.SentenceNumber
order by a.ID, a.SentenceNumber
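The whitespace-stripped comparison used inside the Joiner CTE can be tried on its own. This is a minimal sketch, assuming SQL Server 2008 or later; the variable @s and the sample sentence (with extra spaces and a tab before the word) are made up for illustration.

```sql
-- Hypothetical sample: two spaces and a tab between the number and the word
DECLARE @s VARCHAR(200) = 'Todays department has 345  ' + CHAR(9) + 'employees.';

-- Strip tab, line feed, carriage return and space,
-- then look for a digit immediately followed by "employ"
SELECT CASE
         WHEN REPLACE(REPLACE(REPLACE(REPLACE(@s, CHAR(9), ''), CHAR(10), ''), CHAR(13), ''), CHAR(32), '')
              LIKE '%[0-9]employ%'
         THEN 'remove'
         ELSE 'keep'
       END AS Verdict;
```

After stripping, the string reads 'Todaysdepartmenthas345employees.', so the check should flag it for removal no matter how much whitespace separated the number from the word.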
#3
0
Here is one way to do this. Please note that it only works if there is just one number across all the sentences.
declare @d VARCHAR(1000) = 'Hello world. Todays department has 345 employees. Have a good day.'
declare @dr VARCHAR(1000)
set @dr = REVERSE(@d)
SELECT REVERSE(RIGHT(@dr,LEN(@dr) - CHARINDEX('.',@dr,PATINDEX('%[0-9]%',@dr))))
+ RIGHT(@d,LEN(@d) - CHARINDEX('.',@d,PATINDEX('%[0-9]%',@d)) + 1)