Solved

Replace all non numerical characters in a varchar field

Posted on 2013-05-21
12
1,414 Views
Last Modified: 2013-05-24
I'm trying to strip out everything except numbers from ptmorphcast. I have a select case statement and sort by ptmorphcast, ptmorphraw
SELECT CASE 
WHEN a.post_title LIKE '%Single Stream%' THEN REPLACE( a.post_title, 'Single Stream', '1 Stream' ) 
WHEN a.post_title LIKE '%Double Stream%' THEN REPLACE( a.post_title, 'Double Stream', '2 Stream' ) 
WHEN a.post_title LIKE '%Triple Stream%' THEN REPLACE( a.post_title, 'Triple Stream', '3 Stream' ) 
WHEN a.post_title LIKE '%Quad Stream%' THEN REPLACE( a.post_title, 'Quad Stream', '4 Stream' ) 
WHEN a.post_title LIKE '%Single Unit%' THEN REPLACE( a.post_title, 'Single Unit', '1 Unit' ) 
WHEN a.post_title LIKE '%Double Unit%' THEN REPLACE( a.post_title, 'Double Unit', '2 Unit' ) 
WHEN a.post_title LIKE '%Triple Unit%' THEN REPLACE( a.post_title, 'Triple Unit', '3 Unit' ) 
WHEN a.post_title LIKE '%Single Unit%' THEN REPLACE( a.post_title, 'Quad Unit', '4 Unit' ) 
WHEN a.post_title LIKE '%Single Series%' THEN REPLACE( a.post_title, 'Single Series', '1 Series' ) 
WHEN a.post_title LIKE '%Double Series%' THEN REPLACE( a.post_title, 'Double Series', '2 Series' ) 
WHEN a.post_title LIKE '%Triple Series%' THEN REPLACE( a.post_title, 'Triple Series', '3 Series' ) 
WHEN a.post_title LIKE '%Quad Series%' THEN REPLACE( a.post_title, 'Quad Series', '4 Series' ) 
ELSE a.post_title END AS ptmorphraw, 
REPLACE(a.post_title, '[^0-9]+', '') as ptmorphcast, a.post_title as ptmorph, a.* 
FROM posts a 
ORDER By ptmorphcast, ptmorphraw

Open in new window


Replace() function for ptmorphcast isn't updating any values.
0
Comment
Question by:m2ew
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
12 Comments
 
LVL 22

Expert Comment

by:Ivo Stoykov
ID: 39186704
there is MySQL user-defined function REGEXP_REPLACE.

HTH

Ivo Stoykov
0
 
LVL 35

Expert Comment

by:James0628
ID: 39186815
I haven't used MySQL, but from a little checking, it appears that the standard REPLACE doesn't support regexp's, so you would need another function, possibly user-created, like the one that ivostoykov mentioned.

 However, I think your regexp is also wrong.  I believe that's looking for two character strings that consist of a single character that's not a number, followed by a "+" ("a+", "B+", ".+", "++", etc.).  That doesn't sound like what you wanted, but I guess it would depend on your data.  Maybe that was supposed to be '[^0-9+]'  ?

 James
0
 

Author Comment

by:m2ew
ID: 39187936
@James0628 yes I just wanted to replace the numerical values; its really an issue where I have a bunch of products that include 100+ in the title (i.e. Product 102 Gallons) as they have to be moved so they appear after all the smaller sizes (i.e. Product 45 Gallons). I've considered either adding padding to smaller sizes.

Not sure which would be better,
Right now I'm working on a user defined function (below) to pull only numbers out and then reworking it into the replace function

DELIMITER $$
CREATE FUNCTION wwpSTRIP_NON_DIGIT(input VARCHAR(255))
   RETURNS VARCHAR(255)
BEGIN
  DECLARE output   VARCHAR(255) DEFAULT '';
  DECLARE iterator INT          DEFAULT 1;
  WHILE iterator < (LENGTH(input) + 1) DO
     IF SUBSTRING(input, iterator, 1) IN ( '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' ) THEN
        SET output = CONCAT(output, SUBSTRING(input, iterator, 1));
     END IF;
     SET iterator = iterator + 1;
  END WHILE;
  RETURN output;
END;
$$
DELIMITER ;

Open in new window

0
Containers and Docker for Everyone

Containers are an incredibly powerful technology that can provide you and/or your engineering team with huge productivity gains. Using containers, you can deploy, back up, replicate, and move apps and their dependencies quickly and easily.

 
LVL 45

Expert Comment

by:aikimark
ID: 39188065
I think the easiest and quickest solution might involve a tally table (number table).  If you aren't familiar with these, read this article:
http:A_3573-A-MySQL-Tidbit-Quick-Numbers-Table-Generation.html

=========
Basically, you need a small (0-9 values only) table of numbers that you join up with your posts table in an Update query.  You use the number values in the Replace() function in the SET clause.

You could improve the performance of the query if you add a WHERE clause condition that looks for the numeric digit character in the post_title field.
0
 

Author Comment

by:m2ew
ID: 39188140
@aikimark I read over the query. I can try that method. It gets complex because there is the overall A-Z sort, Then Size (Gallons), and Stream count

Ie. Product 102 Gallon 4 Stream
0
 
LVL 45

Expert Comment

by:aikimark
ID: 39188200
@ m2ew

I don't know what your data looks like and don't understand how you are trying to sort.  I suspect that your data is not sufficiently normalized or tables improperly designed.  Some more detailed information would be helpful to the participating experts.
0
 

Author Comment

by:m2ew
ID: 39188228
@aikimark I'm working within wordpress setup, so I was referring to post_title field for sorting. The format "General Name ### Gallon # of Streams" is a general format based of the that field. I could split it into strings as the sorting priority goes

A-Z on General Name
Then ### Gallon
Then # Streams
0
 
LVL 45

Accepted Solution

by:
aikimark earned 500 total points
ID: 39188350
@m2ew

This now seems like a parsing problem, since you need to extract different bits of the title in order to sort in the orders you described.  First, start by reading Kevin's related article, where he uses the tally/numbers table to parse a string.
http:A_3622-A-MySQL-Tidbit-In-line-CSV-Parsing.html

Do you have any other fields you can work with?  It would be much better if we were able to do this parsing once and save the parsed data in some other fields on the row.

If you have three fields, my approach would be to first split the title into two fields, based on the string before and after "Gallons".  Trim the result.  Populate a Gallons field with the number string following the last space in the first parsed string. Populate a Streams field with the number string before the first space in the second parsed string.  The raw General Name field is the trimmed string before the Gallons digits in the first parsed string.

========
Another way to do this is to go back to my first post and double the number of 'number' items by prepending and appending a space to our 0-9 values.  What you do, in essence, is to add your own delimiters to the title that you can split on with the Replace() function.

"Product Name 102 Gallon 45 of Streams" becomes
"Product Name^102^Gallon^45^of Streams"

Then you can use the splitting technique in Kevin's second article to split the string based on the "^" delimiter.  His article uses a comma delimiter, but I didn't want to take a chance that your general names might include a comma character.
0
 

Author Comment

by:m2ew
ID: 39188422
@aikimark Thanks for pointing me in the right direction.

I was thinking along the second option.

Concat() function (within replace()) actually came in handy with adding a leading character for the gallon size. I simplified the WHEN to check the digit length > 2 and exclude or filter streams (compartments) over 5; otherwise, wwpSTRIP_NON_DIGIT function would add that value. The majority of items are listed as Single Stream, Double Stream, Triple Stream & Quad Stream so its not an issue. None the less anything 5 & greater was creating a potential problem. Once I had the LENGTH() check, I set either "Z" or "0" as padding to the Gallon number. Query ran fast at  0.0348 sec as I'm limiting to 16 records on the pagination. I have to tweek this a bit more for 5 -8 streams, but works prefect for smaller sizes.

CASE 
   WHEN 
   LENGTH(wwpSTRIP_NON_DIGIT(a.post_title)) > 2 AND a.post_title NOT LIKE '%5 Stream%' AND a.post_title NOT LIKE '%6 Stream%'  AND a.post_title NOT LIKE '%7 Stream%'  AND a.post_title NOT LIKE '%8 Stream%'    
   || LENGTH(wwpSTRIP_NON_DIGIT(REPLACE( a.post_title, '5 Stream', 'Stream' ))) > 2 
   || LENGTH(wwpSTRIP_NON_DIGIT(REPLACE( a.post_title, '6 Stream', 'Stream' ))) > 2 
   || LENGTH(wwpSTRIP_NON_DIGIT(REPLACE( a.post_title, '7 Stream', 'Stream' ))) > 2 
   || LENGTH(wwpSTRIP_NON_DIGIT(REPLACE( a.post_title, '8 Stream', 'Stream' ))) > 2
  THEN 
    REPLACE 
       ( REPLACE 
           ( REPLACE
               (REPLACE 
                  (REPLACE (a. post_title, wwpSTRIP_NON_DIGIT (REPLACE ( a. post_title, 'Quad Stream', 'Stream' )), CONCAT("Z",wwpSTRIP_NON_DIGIT(REPLACE( a.post_title, 'Quad Stream', 'Stream' )))), 
                  'Quad Stream', '4 Stream'),
                'Triple Stream', '3 Stream'),
             'Double Stream', '2 Stream'),
          'Single Stream', '1 Stream')
   ELSE
     REPLACE 
       ( REPLACE 
           ( REPLACE
               (REPLACE 
                  (REPLACE (a. post_title, wwpSTRIP_NON_DIGIT (REPLACE ( a. post_title, 'Quad Stream', 'Stream' )), CONCAT("0",wwpSTRIP_NON_DIGIT(REPLACE( a.post_title, 'Quad Stream', 'Stream' )))), 
                  'Quad Stream', '4 Stream'),
                'Triple Stream', '3 Stream'),
             'Double Stream', '2 Stream'),
          'Single Stream', '1 Stream')
END as ptmorphraw

Open in new window

0
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 39194943
I got a neglected question alert on this, and even though it's already closed, I wanted to share my thoughts.  Considering this:

General Name ### Gallon # of Streams

I see a "name" - just a string field that should not carry any inherent meaning.  The meaning of the field could be normalized out of this by taking the information in the ## components and putting it into a separate column.

The problem with trying to get "102 gallon" to come out after "45 gallon" is that the natural sorting order for character strings sees 1 as less than 4.  But if you have this number in a column of type INT, your WHERE clause will work to your advantage.
0
 
LVL 45

Expert Comment

by:aikimark
ID: 39194957
@Ray

I agree.  That was the gist of my table-normalization comment http:#a39188200
0

Featured Post

Free Tool: SSL Checker

Scans your site and returns information about your SSL implementation and certificate. Helpful for debugging and validating your SSL configuration.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This article was originally published on Monitis Blog, you can check it here . Today it’s fairly well known that high-performing websites and applications bring in more visitors, higher SEO, and ultimately more sales. By the same token, downtime…
This post contains step-by-step instructions for setting up alerting in Percona Monitoring and Management (PMM) using Grafana.
This tutorial demonstrates how to identify and create boundary or building outlines in Google Maps. In this example, I outline the boundaries of an enclosed skatepark within a community park.  Login to your Google Account, then  Google for "Google M…
Learn how to create flexible layouts using relative units in CSS.  New relative units added in CSS3 include vw(viewports width), vh(viewports height), vmin(minimum of viewports height and width), and vmax (maximum of viewports height and width).

724 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question