String Manipulation — REPLACE, PATINDEX and Regular Expressions

Finding the exact patter can be like trying to a certain snowflake.
Finding the exact patter can be like trying to a certain snowflake.

In the previous SQL posts, I showed you SUBSTRING, then I showed you REPLACE. Now, with REPLACE, you had to know what characters you wanted to replace.  What if you only knew what characters you wanted to keep?  Well, let me introduce you to PATINDEX.  This is an advanced command that is easy to learn, difficult to master.  I’ll bring this tool out of my tool belt in future posts to provide you more ways to use it.

PATINDEX let’s you define a string pattern.  Many strings you’ll store in your database will have a pattern.  A phone number (in america) has a very distinct pattern 3 digits of area code, 3 digits of prefix, and 4 digits telroot.  You may store this in several different ways, but you’ll always store the same pattern every time, otherwise, how will your users interpret the data your storing.

The example I’m going to use to introduce you to PATINDEX is based on the idea of a setting up a fixed format or pattern for your phone numbers.

But first, I need to include my standard disclaimer:

I would like to point out that you have to be careful when and how often you use the techniques below.  A good rule to keep in mind is text manipulation is slow and painful to a server.  If you can leave the text manipulation to your middle-ware or front end, that would be better.  But we all have been stuck in a situation where we needed to alter a string before those two points, and so I bring you the following lesson.

The Problem

96615551222 Ext. 249
+44 7930 555271

Looking at the examples in the column we know there characters in there other than 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, and 0.  How can we easily remove the non numeric values from this column?  We can use regular expressions.  We can say, if it isn’t 0-9, remove it.

Now, the solution to this problem will use a WHILE loop.  This is something I’m not really going to go into too much detail on, but using a while loop can really eat up processor time and memory.  This is one of those times where this solution should only be used as a last resort, and only during non-peak server usage.  Before implementing this solution, I’d be sure to spend time putting together a fix on the front end, to scrub out these bad characters before they got into my database.  But, let’s come off the soapbox and discuss the solution, shall we?

The Solution

Let’s first look at the syntax for the PATINDEX function.

PATINDEX ('%pattern%', sourceString)

The PATINDEX function will search for a pattern in the SourceString, and return a bigint of where that pattern is found in the sourceString.  If it’s not found, then you get 0.  With this you can also provide a pattern ‘%[^0-9]%’ and it will search for characters that are not numbers.

But that’s only part of the solution.  You then need to couple that with REPLACE, so you can remove anything that is not a number.

--variable to hold the current phone number being updated
--variable to hold the original value (so we can update it)
DECLARE @originalPhone VARCHAR(20)
--if there is a phone number with something other than a number in it
        SELECT TOP 1
        FROM sourceTable (READCOMMITTED)
               PATINDEX('%[^0-9]%', phone) ) > 0)
        --store the original number
        SELECT TOP 1
               @originalPhone = phone
        FROM sourceTable (READCOMMITTED)
               PATINDEX('%[^0-9]%', phone) > 0
        --then copy it to a temporary variable
        SET @phone = @originalPhone
        --spin through the phone number one character at a time
        -- if there is a character other than a number
        WHILE(PATINDEX('%[^0-9]%', @phone) ) > 0
               --then remove that one character, then continue
         SET @phone = REPLACE(@phone
               , SUBSTRING(@phone, PATINDEX('%[^0-9]%', @phone), 1)
              , '')
      --once all the non numeric characters are gone, update
        UPDATE sourceTable SET
               Phone = @phone
               Phone = @originalPhone

If you’re having difficulties with the WHILE loop, search my blog for the entry on WHILE loops.  They’re pretty easy once you get to know them.

If you wanted to get fancy you could make the replace portion of this code a scalar function, and call it in an update step.  It would get rid of the explicit outer loop.  In future versions of SQL this may improve, but for SQL 2000, this is the best I’ve come up with.

There are many more string manipulation techniques to cover, so stay tuned to this same bat-channel!


By Shannon Lowder

Shannon Lowder is the Database Engineer you've been looking for! Look no further for expertise in: Business Analysis to gather the business requirements for the database; Database Architecting to design the logical design of the database; Database Development to actually build the objects needed by the business logic; finally, Database Administration to keep the database running in top form, and making sure there is a disaster recovery plan.


  1. I saw the some sql functions. i have very usefull the site. thank u.

  2. I am using SQL Server 2008 R2.

    I found a typo in the above code.

    –store the original number
    @originalPhone = phone
    FROM sourceTable (READCOMMITTED)
    PATINDEX(‘%[^0-9]%’, phone)

    Should be:

    –store the original number
    @originalPhone = phone
    FROM sourceTable (READCOMMITTED)
    PATINDEX(‘%[^0-9]%’, phone) > 0

    The missing code is the > 0 on the PATINDEX line. Just FYI.

  3. it does not work if there is sql character like ‘%’ in input string so what is the solution.
    i used replace but its not feasible solution as how many character replaced is a question.
    Collapse | Copy Code
    declare @phone varchar(50)
    set @phone=’212%@@f’

    select REPLACE(@phone
    , SUBSTRING(@phone, PATINDEX(‘%[^0-9a-zA-Z .()/-]%’, @phone), 1)
    , ”)

    1. Am I right in understanding that you’re looking to remove any characters that are not an alpha, number, space, period, left or right parenthesis, forward slash, or dash?

      If so, you’ll need a loop to fire your replace multiple times, because it’s only going to find the first instance of your PATINDEX. Try this:

      DECLARE @phone VARCHAR(50)
      SET @phone='212%@@f'

      WHILE PATINDEX('%[^0-9a-zA-Z .()/-]%', @phone) > 0
      SET @phone = REPLACE(
      , SUBSTRING(@phone, PATINDEX('%[^0-9a-zA-Z .()/-]%', @phone), 1)
      , '')
      SELECT @phone

      1. Hi Shannon,

        Thanks for the answer. For the sake of efficiency it would be better to store your PATINDEX call in a variable so that it only has to be called once each loop. To do that you’d have to add (something like) SET @Loc = PATINDEX(…) once before the WHILE loop and once just before the END statement in the loop.


        1. Thanks fo the heads up. Efficiency in coding solutions is definitely something I need to pick up as I move from database administration to data science. Any little improvement that can be made gets magnified when dealing with billions of rows. Looks like I need to go back and brush up on my Big-O notation.

Leave a comment

Your email address will not be published. Required fields are marked *