如何在SQL Server中从字符串中剥离所有非字母字符?


Answers:


362

试试这个功能:

Create Function [dbo].[RemoveNonAlphaCharacters](@Temp VarChar(1000))
Returns VarChar(1000)
AS
Begin

    Declare @KeepValues as varchar(50)
    Set @KeepValues = '%[^a-z]%'
    While PatIndex(@KeepValues, @Temp) > 0
        Set @Temp = Stuff(@Temp, PatIndex(@KeepValues, @Temp), 1, '')

    Return @Temp
End

这样称呼它:

Select dbo.RemoveNonAlphaCharacters('abc1234def5678ghi90jkl')

一旦理解了代码,就应该看到更改代码以删除其他字符也相对简单。您甚至可以使它足够动态以传递您的搜索模式。

希望能帮助到你。


9
此代码删除非字母字符(因此数字也被删除)。如果要保留数字(除去非字母数字字符),则...用^ az ^ 0-9替换^ az该搜索字符串出现在代码中的两个不同位置。确保同时更换它们。
乔治·马斯特罗

26
从Jeff的评论来看:我认为,如果要剥离所有非字母和非数字,则需要'^ a-z0-9'(与'^ az ^ 0-9'相对,后者会将^保留在字符串中) 。
甚至Mien

1
+1乔治。这是“基于集合”的代码和内联标量函数的使用在击败逐行时非常困难的地方之一。做得很好。几年来,我还一直在使用具有相同基本形式的“初始上限”功能。
杰夫·摩登

6
@Lynchie将'%[^ az]%'更改为'%[^ az]%'基本上,只需在z后面放置一个空格字符。
乔治·马斯特罗斯

8
变量名KeepValues实际上与它的含义相反。KeepValues列出了字符需要排除..
nee21

167

参数化版本摹Mastros ' 真棒答案

CREATE FUNCTION [dbo].[fn_StripCharacters]
(
    @String NVARCHAR(MAX), 
    @MatchExpression VARCHAR(255)
)
RETURNS NVARCHAR(MAX)
AS
BEGIN
    SET @MatchExpression =  '%['+@MatchExpression+']%'

    WHILE PatIndex(@MatchExpression, @String) > 0
        SET @String = Stuff(@String, PatIndex(@MatchExpression, @String), 1, '')

    RETURN @String

END

仅按字母顺序:

SELECT dbo.fn_StripCharacters('a1!s2@d3#f4$', '^a-z')

仅数字:

SELECT dbo.fn_StripCharacters('a1!s2@d3#f4$', '^0-9')

仅字母数字:

SELECT dbo.fn_StripCharacters('a1!s2@d3#f4$', '^a-z0-9')

非字母数字:

SELECT dbo.fn_StripCharacters('a1!s2@d3#f4$', 'a-z0-9')

3
我更喜欢此版本,并在向下滚动以投票支持之前,改编了G Mastros的答案!
earnshavian

regex模式似乎不适用于所有空格。如果我想剥离除字母数字字符和空格之外的所有特殊字符,我希望使用SELECT dbo.fn_StripCharacters('a1!s2 spaces @d3# f4$', '^a-z0-9\s')仍会剥离空格的字符。我也尝试使用,[[:blank:]]但这破坏了功能并且从字符串中没有删除任何内容。我得到的最接近的Ive通过使用:(SELECT dbo.fn_StripCharacters('a1!s2 spaces @d3# f4$', '^a-z0-9 ')在正则表达式模式中硬编码一个空格)。但是,这并不能消除换行符。
比利·麦基

2
@BillyMcKee在开头添加空格,而不是在正则表达式的结尾添加空格。SELECT dbo.fn_StripCharacters('a1!s2 spaces @d3# f4$', '^ a-z0-9')
迈克

8

信不信由你,在我的系统中,这个丑陋的功能比G Mastros优雅的功能要好。

CREATE FUNCTION dbo.RemoveSpecialChar (@s VARCHAR(256)) 
RETURNS VARCHAR(256) 
WITH SCHEMABINDING
    BEGIN
        IF @s IS NULL
            RETURN NULL
        DECLARE @s2 VARCHAR(256) = '',
                @l INT = LEN(@s),
                @p INT = 1

        WHILE @p <= @l
            BEGIN
                DECLARE @c INT
                SET @c = ASCII(SUBSTRING(@s, @p, 1))
                IF @c BETWEEN 48 AND 57
                   OR  @c BETWEEN 65 AND 90
                   OR  @c BETWEEN 97 AND 122
                    SET @s2 = @s2 + CHAR(@c)
                SET @p = @p + 1
            END

        IF LEN(@s2) = 0
            RETURN NULL

        RETURN @s2

常见的逗号,句号,空格等如何?
sojim

如果您ASCII在这里不使用整数,然后直接将输出SUBSTRING与某些字符进行比较,则有多少不同,例如:SET @ch=SUBSTRING(@s, @p, 1)IF @ch BETWEEN '0' AND '9' OR @ch BETWEEN 'a' AND 'z' OR @ch BETWEEN 'A' AND 'Z' ...
S.Serpooshan

像您的函数一样,将WITH SCHEMABINDING添加到他的函数中。您正在使用VARCHAR,他的功能正在使用NVARCHAR。如果要传递给他的函数的参数是VARCHAR,则应在其函数内使用VARCHAR而不是NVARCHAR,否则,系统需要先将字符串值从VARCHAR强制转换为NVARCHAR,然后才能执行该函数,这会增加开销。即使进行了这些更改,您的功能仍然可能会更快,但是这些只是一些示例,我可以看到他的功能在您遇到的情况下可能会为您带来较慢的性能。
埃里克

1
他的函数也使用NVARCHAR(MAX),而您的函数使用VARCHAR(256)。如果只需要256,则将其功能也更改为使用VARCHAR(256),他的功能将为您更快地工作。
埃里克

5

我知道SQL在字符串操作方面很糟糕,但是我并不认为这会很困难。这是一个简单的函数,可从字符串中去除所有数字。会有更好的方法来执行此操作,但这只是一个开始。

CREATE FUNCTION dbo.AlphaOnly (
    @String varchar(100)
)
RETURNS varchar(100)
AS BEGIN
  RETURN (
    REPLACE(
      REPLACE(
        REPLACE(
          REPLACE(
            REPLACE(
              REPLACE(
                REPLACE(
                  REPLACE(
                    REPLACE(
                      REPLACE(
                        @String,
                      '9', ''),
                    '8', ''),
                  '7', ''),
                '6', ''),
              '5', ''),
            '4', ''),
          '3', ''),
        '2', ''),
      '1', ''),
    '0', '')
  )
END
GO

-- ==================
DECLARE @t TABLE (
    ColID       int,
    ColString   varchar(50)
)

INSERT INTO @t VALUES (1, 'abc1234567890')

SELECT ColID, ColString, dbo.AlphaOnly(ColString)
FROM @t

输出量

ColID ColString
----- ------------- ---
    1 abc1234567890 abc

第二轮-数据驱动黑名单

-- ============================================
-- Create a table of blacklist characters
-- ============================================
IF EXISTS (SELECT * FROM sys.tables WHERE [object_id] = OBJECT_ID('dbo.CharacterBlacklist'))
  DROP TABLE dbo.CharacterBlacklist
GO
CREATE TABLE dbo.CharacterBlacklist (
    CharID              int         IDENTITY,
    DisallowedCharacter nchar(1)    NOT NULL
)
GO
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'0')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'1')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'2')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'3')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'4')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'5')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'6')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'7')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'8')
INSERT INTO dbo.CharacterBlacklist (DisallowedCharacter) VALUES (N'9')
GO

-- ====================================
IF EXISTS (SELECT * FROM sys.objects WHERE [object_id] = OBJECT_ID('dbo.StripBlacklistCharacters'))
  DROP FUNCTION dbo.StripBlacklistCharacters
GO
CREATE FUNCTION dbo.StripBlacklistCharacters (
    @String nvarchar(100)
)
RETURNS varchar(100)
AS BEGIN
  DECLARE @blacklistCt  int
  DECLARE @ct           int
  DECLARE @c            nchar(1)

  SELECT @blacklistCt = COUNT(*) FROM dbo.CharacterBlacklist

  SET @ct = 0
  WHILE @ct < @blacklistCt BEGIN
    SET @ct = @ct + 1

    SELECT @String = REPLACE(@String, DisallowedCharacter, N'')
    FROM dbo.CharacterBlacklist
    WHERE CharID = @ct
  END

  RETURN (@String)
END
GO

-- ====================================
DECLARE @s  nvarchar(24)
SET @s = N'abc1234def5678ghi90jkl'

SELECT
    @s                  AS OriginalString,
    dbo.StripBlacklistCharacters(@s)   AS ResultString

输出量

OriginalString           ResultString
------------------------ ------------
abc1234def5678ghi90jkl   abcdefghijkl

我对读者的挑战:您能否提高效率?那使用递归呢?


您可能会使用sommarskog.se/arrays-in-sql-2005.html#tblnum数字表加入黑名单表中而没有循环就可以编写出更好的dbo.StripBlacklistCharacters() ,但今天我懒得尝试我自己...
KM。

4

如果您像我一样,并且无权仅向生产数据中添加功能,但仍然希望执行这种过滤,那么这是一个纯SQL解决方案,使用PIVOT表将过滤后的片段重新放在一起。

注意:我将表格硬编码为40个字符,如果要过滤的字符串较长,则必须添加更多字符。

SET CONCAT_NULL_YIELDS_NULL OFF;

with 
    ToBeScrubbed
as (
    select 1 as id, '*SOME 222@ !@* #* BOGUS !@*&! DATA' as ColumnToScrub
),

Scrubbed as (
    select 
        P.Number as ValueOrder,
        isnull ( substring ( t.ColumnToScrub , number , 1 ) , '' ) as ScrubbedValue,
        t.id
    from
        ToBeScrubbed t
        left join master..spt_values P
            on P.number between 1 and len(t.ColumnToScrub)
            and type ='P'
    where
        PatIndex('%[^a-z]%', substring(t.ColumnToScrub,P.number,1) ) = 0
)

SELECT
    id, 
    [1]+ [2]+ [3]+ [4]+ [5]+ [6]+ [7]+ [8] +[9] +[10]
    +  [11]+ [12]+ [13]+ [14]+ [15]+ [16]+ [17]+ [18] +[19] +[20]
    +  [21]+ [22]+ [23]+ [24]+ [25]+ [26]+ [27]+ [28] +[29] +[30]
    +  [31]+ [32]+ [33]+ [34]+ [35]+ [36]+ [37]+ [38] +[39] +[40] as ScrubbedData
FROM (
    select 
        *
    from 
        Scrubbed
    ) 
    src
    PIVOT (
        MAX(ScrubbedValue) FOR ValueOrder IN (
        [1], [2], [3], [4], [5], [6], [7], [8], [9], [10],
        [11], [12], [13], [14], [15], [16], [17], [18], [19], [20],
        [21], [22], [23], [24], [25], [26], [27], [28], [29], [30],
        [31], [32], [33], [34], [35], [36], [37], [38], [39], [40]
        )
    ) pvt

对于我来说,此解决方案比在一组235K行上使用函数快2.3倍。我还必须进行2次更换,并且总共使用了四个CTE。像冠军一样工作。
JJS 2014年

4

看完所有给定的解决方案后,我认为必须有一种纯SQL方法,该函数不需要函数或CTE / XML查询,并且不涉及难以维护嵌套的REPLACE语句。这是我的解决方案:

SELECT 
  x
  ,CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 1, 1) + '%' THEN '' ELSE SUBSTRING(x, 1, 1) END
    + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 2, 1) + '%' THEN '' ELSE SUBSTRING(x, 2, 1) END
    + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 3, 1) + '%' THEN '' ELSE SUBSTRING(x, 3, 1) END
    + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 4, 1) + '%' THEN '' ELSE SUBSTRING(x, 4, 1) END
    + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 5, 1) + '%' THEN '' ELSE SUBSTRING(x, 5, 1) END
    + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, 6, 1) + '%' THEN '' ELSE SUBSTRING(x, 6, 1) END
-- Keep adding rows until you reach the column size 
    AS stripped_column
FROM (SELECT 
        column_to_strip AS x
        ,'ABCDEFGHIJKLMNOPQRSTUVWXYZ' AS a 
      FROM my_table) a

这样做的好处是,有效字符包含在子查询的一个字符串中,从而可以轻松地为另一组字符重新配置。

缺点是您必须为每个字符添加一行SQL,直到列的大小为止。为了简化该任务,我仅使用下面的Powershell脚本,该示例用于VARCHAR(64):

1..64 | % {
  "    + CASE WHEN a NOT LIKE '%' + SUBSTRING(x, {0}, 1) + '%' THEN '' ELSE SUBSTRING(x, {0}, 1) END" -f $_
} | clip.exe

3
在一般情况下很尴尬,但对于带有狭窄列的一次性查询而言,它很容易且有用。
埃里克·J.

3

这是使用删除非字母字符的另一种方法iTVF。首先,您需要一个基于模式的字符串拆分器。这是摘自德恩·坎普的文章

-- PatternSplitCM will split a string based on a pattern of the form 
-- supported by LIKE and PATINDEX 
-- 
-- Created by: Chris Morris 12-Oct-2012 
CREATE FUNCTION [dbo].[PatternSplitCM]
(
       @List                VARCHAR(8000) = NULL
       ,@Pattern            VARCHAR(50)
) RETURNS TABLE WITH SCHEMABINDING 
AS 

RETURN
    WITH numbers AS (
        SELECT TOP(ISNULL(DATALENGTH(@List), 0))
            n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
        FROM
        (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n),
        (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e (n),
        (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) f (n),
        (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) g (n)
    )

    SELECT
        ItemNumber = ROW_NUMBER() OVER(ORDER BY MIN(n)),
        Item = SUBSTRING(@List,MIN(n),1+MAX(n)-MIN(n)),
        [Matched]
    FROM (
        SELECT n, y.[Matched], Grouper = n - ROW_NUMBER() OVER(ORDER BY y.[Matched],n)
        FROM numbers
        CROSS APPLY (
            SELECT [Matched] = CASE WHEN SUBSTRING(@List,n,1) LIKE @Pattern THEN 1 ELSE 0 END
        ) y
    ) d
    GROUP BY [Matched], Grouper

现在您有了基于模式的拆分器,您需要拆分与模式匹配的字符串:

[a-z]

然后将它们串联起来以获得所需的结果:

SELECT *
FROM tbl t
CROSS APPLY(
    SELECT Item + ''
    FROM dbo.PatternSplitCM(t.str, '[a-z]')
    WHERE Matched = 1
    ORDER BY ItemNumber
    FOR XML PATH('')
) x (a)

样品

结果:

| Id |              str |              a |
|----|------------------|----------------|
|  1 |    testte d'abc |     testtedabc |
|  2 |            anr¤a |           anra |
|  3 |  gs-re-C“te d'ab |     gsreCtedab |
|  4 |         Mfe, DF |          MfeDF |
|  5 |           Rtemd |          Rtemd |
|  6 |          jadji |          jadji |
|  7 |      Cje y ret¢n |       Cjeyretn |
|  8 |        Jklbalu |        Jklbalu |
|  9 |       lene-iokd |       leneiokd |
| 10 |   liode-Pyrnie |    liodePyrnie |
| 11 |         Vs Gta |          VsGta |
| 12 |        Sƒo Paulo |        SoPaulo |
| 13 |  vAstra gAtaland | vAstragAtaland |
| 14 |  ¥uble / Bio-Bio |     ubleBioBio |
| 15 | Upln/ds VAsb-y |    UplndsVAsby |

与其他答案相比,使用它有什么好处吗?
S.Serpooshan

2

此解决方案受艾伦先生的解决方案启发,需要一个Numbers整数表(如果要进行性能良好的严肃查询操作,则应具有整数表)。它不需要CTE。您可以更改NOT IN (...)表达式以排除特定字符,或者将其更改为IN (...)OR LIKE表达式以仅保留某些字符。

SELECT (
    SELECT  SUBSTRING([YourString], N, 1)
    FROM    dbo.Numbers
    WHERE   N > 0 AND N <= CONVERT(INT, LEN([YourString]))
        AND SUBSTRING([YourString], N, 1) NOT IN ('(',')',',','.')
    FOR XML PATH('')
) AS [YourStringTransformed]
FROM ...

一个无关问题的有趣解决方案。
2016年

2

这是一个不需要创建函数或列出所有要替换字符实例的解决方案。它结合使用递归WITH语句和PATINDEX来查找不需要的字符。它将替换列中所有不需要的字符-任何给定的字符串中最多包含100个唯一的坏字符。(例如,EG“ ABC123DEF234”将包含4个错误字符1、2、3和4)。100的限制是WITH语句中允许的最大递归数,但这并不对要处理的行数施加限制。仅受可用内存的限制。
如果您不想获得DISTINCT结果,则可以从代码中删除这两个选项。

-- Create some test data:
SELECT * INTO #testData 
FROM (VALUES ('ABC DEF,K.l(p)'),('123H,J,234'),('ABCD EFG')) as t(TXT)

-- Actual query:
-- Remove non-alpha chars: '%[^A-Z]%'
-- Remove non-alphanumeric chars: '%[^A-Z0-9]%'
DECLARE @BadCharacterPattern VARCHAR(250) = '%[^A-Z]%';

WITH recurMain as (
    SELECT DISTINCT CAST(TXT AS VARCHAR(250)) AS TXT, PATINDEX(@BadCharacterPattern, TXT) AS BadCharIndex
    FROM #testData
    UNION ALL
    SELECT CAST(TXT AS VARCHAR(250)) AS TXT, PATINDEX(@BadCharacterPattern, TXT) AS BadCharIndex
    FROM (
        SELECT 
            CASE WHEN BadCharIndex > 0 
                THEN REPLACE(TXT, SUBSTRING(TXT, BadCharIndex, 1), '')
                ELSE TXT 
            END AS TXT
        FROM recurMain
        WHERE BadCharIndex > 0
    ) badCharFinder
)
SELECT DISTINCT TXT
FROM recurMain
WHERE BadCharIndex = 0;

1

我把这放在调用PatIndex的两个地方。

PatIndex('%[^A-Za-z0-9]%', @Temp)

对于RemoveNonAlphaCharacters上方的自定义函数,并将其重命名为RemoveNonAlphaNumericCharacters


1

-首先创建一个功能

CREATE FUNCTION [dbo].[GetNumericonly]
(@strAlphaNumeric VARCHAR(256))
RETURNS VARCHAR(256)
AS
BEGIN
     DECLARE @intAlpha INT
     SET @intAlpha = PATINDEX('%[^0-9]%', @strAlphaNumeric)
BEGIN
     WHILE @intAlpha > 0
   BEGIN
          SET @strAlphaNumeric = STUFF(@strAlphaNumeric, @intAlpha, 1, '' )
          SET @intAlpha = PATINDEX('%[^0-9]%', @strAlphaNumeric )
   END
END
RETURN ISNULL(@strAlphaNumeric,0)
END

现在像这样调用此函数

select [dbo].[GetNumericonly]('Abhi12shek23jaiswal')

其结果像

1223

1

从性能角度来看,我将使用内联函数:

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION [dbo].[udf_RemoveNumericCharsFromString]
(
@List NVARCHAR(4000)
)
RETURNS TABLE 
AS RETURN

    WITH GetNums AS (
       SELECT TOP(ISNULL(DATALENGTH(@List), 0))
        n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
        FROM
          (VALUES (0),(0),(0),(0)) d (n),
          (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e (n),
          (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) f (n),
          (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) g (n)
            )

    SELECT StrOut = ''+
        (SELECT Chr
         FROM GetNums
            CROSS APPLY (SELECT SUBSTRING(@List , n,1)) X(Chr)
         WHERE Chr LIKE '%[^0-9]%' 
         ORDER BY N
         FOR XML PATH (''),TYPE).value('.','NVARCHAR(MAX)')


   /*How to Use
   SELECT StrOut FROM dbo.udf_RemoveNumericCharsFromString ('vv45--9gut')
   Result: vv--gut
   */

我知道这个线程很旧,但是,内联表值函数是可行的方法。解决方案的问题是,因为您只返回数字,所以不需要此代码:),TYPE).value('。','NVARCHAR(MAX)'),它将使函数运行速度降低约50%
艾伦·伯斯坦(Alan Burstein)

1

这里还有一个递归CTE的解决方案,基于@Gerhard魏斯的答案在这里。您应该能够将整个代码块复制并粘贴到SSMS中,然后在其中运行。结果包括一些额外的列,以帮助我们了解发生了什么。我花了一段时间才了解PATINDEX(RegEx)和递归CTE的所有功能。

DECLARE @DefineBadCharPattern varchar(30)
SET @DefineBadCharPattern = '%[^A-z]%'  --Means anything NOT between A and z characters (according to ascii char value) is "bad"
SET @DefineBadCharPattern = '%[^a-z0-9]%'  --Means anything NOT between a and z characters or numbers 0 through 9 (according to ascii char value) are "bad"
SET @DefineBadCharPattern = '%[^ -~]%'  --Means anything NOT between space and ~ characters (all non-printable characters) is "bad"
--Change @ReplaceBadCharWith to '' to strip "bad" characters from string
--Change to some character if you want to 'see' what's being replaced. NOTE: It must be allowed accoring to @DefineBadCharPattern above
DECLARE @ReplaceBadCharWith varchar(1) = '#'  --Change this to whatever you want to replace non-printable chars with 
IF patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN, @ReplaceBadCharWith) > 0
    BEGIN
        RAISERROR('@ReplaceBadCharWith value (%s) must be a character allowed by PATINDEX pattern of %s',16,1,@ReplaceBadCharWith, @DefineBadCharPattern)
        RETURN
    END
--A table of values to play with:
DECLARE @temp TABLE (OriginalString varchar(100))
INSERT @temp SELECT ' 1hello' + char(13) + char(10) + 'there' + char(30) + char(9) + char(13) + char(10)
INSERT @temp SELECT '2hello' + char(30) + 'there' + char(30)
INSERT @temp SELECT ' 3hello there'
INSERT @temp SELECT ' tab' + char(9) + ' character'
INSERT @temp SELECT 'good bye'

--Let the magic begin:
;WITH recurse AS (
    select
    OriginalString,
    OriginalString as CleanString,
    patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN, OriginalString) as [Position],
    substring(OriginalString,patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN, OriginalString),1) as [InvalidCharacter],
    ascii(substring(OriginalString,patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN, OriginalString),1)) as [ASCIICode]
    from @temp
   UNION ALL
    select
    OriginalString,
    CONVERT(varchar(100),REPLACE(CleanString,InvalidCharacter,@ReplaceBadCharWith)),
    patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString) as [Position],
    substring(CleanString,patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString),1),
    ascii(substring(CleanString,patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString),1))
    from recurse
    where patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString) > 0
)
SELECT * FROM recurse
--optionally comment out this last WHERE clause to see more of what the recursion is doing:
WHERE patindex(@DefineBadCharPattern COLLATE Latin1_General_BIN,CleanString) = 0

0

使用CTE生成的数字表检查每个字符,然后使用FOR XML连接到一串保留值,您可以...

CREATE FUNCTION [dbo].[PatRemove](
    @pattern varchar(50),
    @expression varchar(8000) 
    )
RETURNS varchar(8000)
AS
BEGIN
    WITH 
        d(d) AS (SELECT d FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) digits(d)),
        nums(n) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM d d1, d d2, d d3, d d4),
        chars(c) AS (SELECT SUBSTRING(@expression, n, 1) FROM nums WHERE n <= LEN(@expression))
    SELECT 
        @expression = (SELECT c AS [text()] FROM chars WHERE c NOT LIKE @pattern FOR XML PATH(''));

    RETURN @expression;
END

0
DECLARE @vchVAlue NVARCHAR(255) = 'SWP, Lettering Position 1: 4 Ω, 2: 8 Ω, 3: 16 Ω, 4:  , 5:  , 6:  , Voltage Selector, Solder, 6, Step switch, : w/o fuseholder '


WHILE PATINDEX('%?%' , CAST(@vchVAlue AS VARCHAR(255))) > 0
  BEGIN
    SELECT @vchVAlue = STUFF(@vchVAlue,PATINDEX('%?%' , CAST(@vchVAlue AS VARCHAR(255))),1,' ')
  END 

SELECT @vchVAlue

0

这种方式对我不起作用,因为我试图保留阿拉伯字母以替换正则表达式,但也无效。我写了另一种在ASCII级别上工作的方法,因为这是我唯一的选择,而且行得通。

 Create function [dbo].[RemoveNonAlphaCharacters] (@s varchar(4000)) returns varchar(4000)
   with schemabinding
begin
   if @s is null
      return null
   declare @s2 varchar(4000)
   set @s2 = ''
   declare @l int
   set @l = len(@s)
   declare @p int
   set @p = 1
   while @p <= @l begin
      declare @c int
      set @c = ascii(substring(@s, @p, 1))
      if @c between 48 and 57 or @c between 65 and 90 or @c between 97 and 122 or @c between 165 and 253 or @c between 32 and 33
         set @s2 = @s2 + char(@c)
      set @p = @p + 1
      end
   if len(@s2) = 0
      return null
   return @s2
   end


-1

尽管帖子有些陈旧,但我想说以下内容。上述解决方案的问题在于它不会过滤掉ç,ë,ï等字符。我按如下方式修改了函数(我仅使用80 varchar字符串来节省内存):

create FUNCTION dbo.udf_Cleanchars (@InputString varchar(80)) 
RETURNS varchar(80) 
AS 

BEGIN 
declare @return varchar(80) , @length int , @counter int , @cur_char char(1) 
SET @return = '' 
SET @length = 0 
SET @counter = 1 
SET @length = LEN(@InputString) 
IF @length > 0 
BEGIN WHILE @counter <= @length 

BEGIN SET @cur_char = SUBSTRING(@InputString, @counter, 1) IF ((ascii(@cur_char) in (32,44,46)) or (ascii(@cur_char) between 48 and 57) or (ascii(@cur_char) between 65 and 90) or (ascii(@cur_char) between 97 and 122))
BEGIN SET @return = @return + @cur_char END 
SET @counter = @counter + 1 
END END 

RETURN @return END

谢谢你,埃里克。如您所说,标记为答案的帖子非常好,但是不会去除½等愚蠢的“数字”字符。
troy

-3

如果您正在使用Oracle 10g,我刚刚发现它已内置。我必须删除所有特殊字符以进行电话号码比较。

regexp_replace(c.phone, '[^0-9]', '')

5
“ SQL Server”专门指Microsoft的产品。
没人
By using our site, you acknowledge that you have read and understand our Cookie Policy and Privacy Policy.
Licensed under cc by-sa 3.0 with attribution required.