Solved

Replace all non numerical characters in a varchar field

Posted on 2013-05-21
12
1,306 Views
Last Modified: 2013-05-24
I'm trying to strip out everything except numbers from ptmorphcast. I have a select case statement and sort by ptmorphcast, ptmorphraw
SELECT CASE 
WHEN a.post_title LIKE '%Single Stream%' THEN REPLACE( a.post_title, 'Single Stream', '1 Stream' ) 
WHEN a.post_title LIKE '%Double Stream%' THEN REPLACE( a.post_title, 'Double Stream', '2 Stream' ) 
WHEN a.post_title LIKE '%Triple Stream%' THEN REPLACE( a.post_title, 'Triple Stream', '3 Stream' ) 
WHEN a.post_title LIKE '%Quad Stream%' THEN REPLACE( a.post_title, 'Quad Stream', '4 Stream' ) 
WHEN a.post_title LIKE '%Single Unit%' THEN REPLACE( a.post_title, 'Single Unit', '1 Unit' ) 
WHEN a.post_title LIKE '%Double Unit%' THEN REPLACE( a.post_title, 'Double Unit', '2 Unit' ) 
WHEN a.post_title LIKE '%Triple Unit%' THEN REPLACE( a.post_title, 'Triple Unit', '3 Unit' ) 
WHEN a.post_title LIKE '%Single Unit%' THEN REPLACE( a.post_title, 'Quad Unit', '4 Unit' ) 
WHEN a.post_title LIKE '%Single Series%' THEN REPLACE( a.post_title, 'Single Series', '1 Series' ) 
WHEN a.post_title LIKE '%Double Series%' THEN REPLACE( a.post_title, 'Double Series', '2 Series' ) 
WHEN a.post_title LIKE '%Triple Series%' THEN REPLACE( a.post_title, 'Triple Series', '3 Series' ) 
WHEN a.post_title LIKE '%Quad Series%' THEN REPLACE( a.post_title, 'Quad Series', '4 Series' ) 
ELSE a.post_title END AS ptmorphraw, 
REPLACE(a.post_title, '[^0-9]+', '') as ptmorphcast, a.post_title as ptmorph, a.* 
FROM posts a 
ORDER By ptmorphcast, ptmorphraw

Open in new window


Replace() function for ptmorphcast isn't updating any values.
0
Comment
Question by:m2ew
12 Comments
 
LVL 22

Expert Comment

by:Ivo Stoykov
ID: 39186704
there is MySQL user-defined function REGEXP_REPLACE.

HTH

Ivo Stoykov
0
 
LVL 34

Expert Comment

by:James0628
ID: 39186815
I haven't used MySQL, but from a little checking, it appears that the standard REPLACE doesn't support regexp's, so you would need another function, possibly user-created, like the one that ivostoykov mentioned.

 However, I think your regexp is also wrong.  I believe that's looking for two character strings that consist of a single character that's not a number, followed by a "+" ("a+", "B+", ".+", "++", etc.).  That doesn't sound like what you wanted, but I guess it would depend on your data.  Maybe that was supposed to be '[^0-9+]'  ?

 James
0
 

Author Comment

by:m2ew
ID: 39187936
@James0628 yes I just wanted to replace the numerical values; its really an issue where I have a bunch of products that include 100+ in the title (i.e. Product 102 Gallons) as they have to be moved so they appear after all the smaller sizes (i.e. Product 45 Gallons). I've considered either adding padding to smaller sizes.

Not sure which would be better,
Right now I'm working on a user defined function (below) to pull only numbers out and then reworking it into the replace function

DELIMITER $$
CREATE FUNCTION wwpSTRIP_NON_DIGIT(input VARCHAR(255))
   RETURNS VARCHAR(255)
BEGIN
  DECLARE output   VARCHAR(255) DEFAULT '';
  DECLARE iterator INT          DEFAULT 1;
  WHILE iterator < (LENGTH(input) + 1) DO
     IF SUBSTRING(input, iterator, 1) IN ( '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' ) THEN
        SET output = CONCAT(output, SUBSTRING(input, iterator, 1));
     END IF;
     SET iterator = iterator + 1;
  END WHILE;
  RETURN output;
END;
$$
DELIMITER ;

Open in new window

0
 
LVL 45

Expert Comment

by:aikimark
ID: 39188065
I think the easiest and quickest solution might involve a tally table (number table).  If you aren't familiar with these, read this article:
http:A_3573-A-MySQL-Tidbit-Quick-Numbers-Table-Generation.html

=========
Basically, you need a small (0-9 values only) table of numbers that you join up with your posts table in an Update query.  You use the number values in the Replace() function in the SET clause.

You could improve the performance of the query if you add a WHERE clause condition that looks for the numeric digit character in the post_title field.
0
 

Author Comment

by:m2ew
ID: 39188140
@aikimark I read over the query. I can try that method. It gets complex because there is the overall A-Z sort, Then Size (Gallons), and Stream count

Ie. Product 102 Gallon 4 Stream
0
How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

 
LVL 45

Expert Comment

by:aikimark
ID: 39188200
@ m2ew

I don't know what your data looks like and don't understand how you are trying to sort.  I suspect that your data is not sufficiently normalized or tables improperly designed.  Some more detailed information would be helpful to the participating experts.
0
 

Author Comment

by:m2ew
ID: 39188228
@aikimark I'm working within wordpress setup, so I was referring to post_title field for sorting. The format "General Name ### Gallon # of Streams" is a general format based of the that field. I could split it into strings as the sorting priority goes

A-Z on General Name
Then ### Gallon
Then # Streams
0
 
LVL 45

Accepted Solution

by:
aikimark earned 500 total points
ID: 39188350
@m2ew

This now seems like a parsing problem, since you need to extract different bits of the title in order to sort in the orders you described.  First, start by reading Kevin's related article, where he uses the tally/numbers table to parse a string.
http:A_3622-A-MySQL-Tidbit-In-line-CSV-Parsing.html

Do you have any other fields you can work with?  It would be much better if we were able to do this parsing once and save the parsed data in some other fields on the row.

If you have three fields, my approach would be to first split the title into two fields, based on the string before and after "Gallons".  Trim the result.  Populate a Gallons field with the number string following the last space in the first parsed string. Populate a Streams field with the number string before the first space in the second parsed string.  The raw General Name field is the trimmed string before the Gallons digits in the first parsed string.

========
Another way to do this is to go back to my first post and double the number of 'number' items by prepending and appending a space to our 0-9 values.  What you do, in essence, is to add your own delimiters to the title that you can split on with the Replace() function.

"Product Name 102 Gallon 45 of Streams" becomes
"Product Name^102^Gallon^45^of Streams"

Then you can use the splitting technique in Kevin's second article to split the string based on the "^" delimiter.  His article uses a comma delimiter, but I didn't want to take a chance that your general names might include a comma character.
0
 

Author Comment

by:m2ew
ID: 39188422
@aikimark Thanks for pointing me in the right direction.

I was thinking along the second option.

Concat() function (within replace()) actually came in handy with adding a leading character for the gallon size. I simplified the WHEN to check the digit length > 2 and exclude or filter streams (compartments) over 5; otherwise, wwpSTRIP_NON_DIGIT function would add that value. The majority of items are listed as Single Stream, Double Stream, Triple Stream & Quad Stream so its not an issue. None the less anything 5 & greater was creating a potential problem. Once I had the LENGTH() check, I set either "Z" or "0" as padding to the Gallon number. Query ran fast at  0.0348 sec as I'm limiting to 16 records on the pagination. I have to tweek this a bit more for 5 -8 streams, but works prefect for smaller sizes.

CASE 
   WHEN 
   LENGTH(wwpSTRIP_NON_DIGIT(a.post_title)) > 2 AND a.post_title NOT LIKE '%5 Stream%' AND a.post_title NOT LIKE '%6 Stream%'  AND a.post_title NOT LIKE '%7 Stream%'  AND a.post_title NOT LIKE '%8 Stream%'    
   || LENGTH(wwpSTRIP_NON_DIGIT(REPLACE( a.post_title, '5 Stream', 'Stream' ))) > 2 
   || LENGTH(wwpSTRIP_NON_DIGIT(REPLACE( a.post_title, '6 Stream', 'Stream' ))) > 2 
   || LENGTH(wwpSTRIP_NON_DIGIT(REPLACE( a.post_title, '7 Stream', 'Stream' ))) > 2 
   || LENGTH(wwpSTRIP_NON_DIGIT(REPLACE( a.post_title, '8 Stream', 'Stream' ))) > 2
  THEN 
    REPLACE 
       ( REPLACE 
           ( REPLACE
               (REPLACE 
                  (REPLACE (a. post_title, wwpSTRIP_NON_DIGIT (REPLACE ( a. post_title, 'Quad Stream', 'Stream' )), CONCAT("Z",wwpSTRIP_NON_DIGIT(REPLACE( a.post_title, 'Quad Stream', 'Stream' )))), 
                  'Quad Stream', '4 Stream'),
                'Triple Stream', '3 Stream'),
             'Double Stream', '2 Stream'),
          'Single Stream', '1 Stream')
   ELSE
     REPLACE 
       ( REPLACE 
           ( REPLACE
               (REPLACE 
                  (REPLACE (a. post_title, wwpSTRIP_NON_DIGIT (REPLACE ( a. post_title, 'Quad Stream', 'Stream' )), CONCAT("0",wwpSTRIP_NON_DIGIT(REPLACE( a.post_title, 'Quad Stream', 'Stream' )))), 
                  'Quad Stream', '4 Stream'),
                'Triple Stream', '3 Stream'),
             'Double Stream', '2 Stream'),
          'Single Stream', '1 Stream')
END as ptmorphraw

Open in new window

0
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 39194943
I got a neglected question alert on this, and even though it's already closed, I wanted to share my thoughts.  Considering this:

General Name ### Gallon # of Streams

I see a "name" - just a string field that should not carry any inherent meaning.  The meaning of the field could be normalized out of this by taking the information in the ## components and putting it into a separate column.

The problem with trying to get "102 gallon" to come out after "45 gallon" is that the natural sorting order for character strings sees 1 as less than 4.  But if you have this number in a column of type INT, your WHERE clause will work to your advantage.
0
 
LVL 45

Expert Comment

by:aikimark
ID: 39194957
@Ray

I agree.  That was the gist of my table-normalization comment http:#a39188200
0

Featured Post

How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

Join & Write a Comment

Suggested Solutions

Load balancing is the method of dividing the total amount of work performed by one computer between two or more computers. Its aim is to get more work done in the same amount of time, ensuring that all the users get served faster.
Shadow IT is coming out of the shadows as more businesses are choosing cloud-based applications. It is now a multi-cloud world for most organizations. Simultaneously, most businesses have yet to consolidate with one cloud provider or define an offic…
The viewer will learn how to dynamically set the form action using jQuery.
This tutorial will teach you the core code needed to finalize the addition of a watermark to your image. The viewer will use a small PHP class to learn and create a watermark.

758 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

20 Experts available now in Live!

Get 1:1 Help Now