• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1603
  • Last Modified:

Replace all non numerical characters in a varchar field

I'm trying to strip out everything except numbers from ptmorphcast. I have a select case statement and sort by ptmorphcast, ptmorphraw
SELECT CASE 
WHEN a.post_title LIKE '%Single Stream%' THEN REPLACE( a.post_title, 'Single Stream', '1 Stream' ) 
WHEN a.post_title LIKE '%Double Stream%' THEN REPLACE( a.post_title, 'Double Stream', '2 Stream' ) 
WHEN a.post_title LIKE '%Triple Stream%' THEN REPLACE( a.post_title, 'Triple Stream', '3 Stream' ) 
WHEN a.post_title LIKE '%Quad Stream%' THEN REPLACE( a.post_title, 'Quad Stream', '4 Stream' ) 
WHEN a.post_title LIKE '%Single Unit%' THEN REPLACE( a.post_title, 'Single Unit', '1 Unit' ) 
WHEN a.post_title LIKE '%Double Unit%' THEN REPLACE( a.post_title, 'Double Unit', '2 Unit' ) 
WHEN a.post_title LIKE '%Triple Unit%' THEN REPLACE( a.post_title, 'Triple Unit', '3 Unit' ) 
WHEN a.post_title LIKE '%Single Unit%' THEN REPLACE( a.post_title, 'Quad Unit', '4 Unit' ) 
WHEN a.post_title LIKE '%Single Series%' THEN REPLACE( a.post_title, 'Single Series', '1 Series' ) 
WHEN a.post_title LIKE '%Double Series%' THEN REPLACE( a.post_title, 'Double Series', '2 Series' ) 
WHEN a.post_title LIKE '%Triple Series%' THEN REPLACE( a.post_title, 'Triple Series', '3 Series' ) 
WHEN a.post_title LIKE '%Quad Series%' THEN REPLACE( a.post_title, 'Quad Series', '4 Series' ) 
ELSE a.post_title END AS ptmorphraw, 
REPLACE(a.post_title, '[^0-9]+', '') as ptmorphcast, a.post_title as ptmorph, a.* 
FROM posts a 
ORDER By ptmorphcast, ptmorphraw

Open in new window


Replace() function for ptmorphcast isn't updating any values.
0
m2ew
Asked:
m2ew
1 Solution
 
Ivo StoykovCommented:
there is MySQL user-defined function REGEXP_REPLACE.

HTH

Ivo Stoykov
0
 
James0628Commented:
I haven't used MySQL, but from a little checking, it appears that the standard REPLACE doesn't support regexp's, so you would need another function, possibly user-created, like the one that ivostoykov mentioned.

 However, I think your regexp is also wrong.  I believe that's looking for two character strings that consist of a single character that's not a number, followed by a "+" ("a+", "B+", ".+", "++", etc.).  That doesn't sound like what you wanted, but I guess it would depend on your data.  Maybe that was supposed to be '[^0-9+]'  ?

 James
0
 
m2ewAuthor Commented:
@James0628 yes I just wanted to replace the numerical values; its really an issue where I have a bunch of products that include 100+ in the title (i.e. Product 102 Gallons) as they have to be moved so they appear after all the smaller sizes (i.e. Product 45 Gallons). I've considered either adding padding to smaller sizes.

Not sure which would be better,
Right now I'm working on a user defined function (below) to pull only numbers out and then reworking it into the replace function

DELIMITER $$
CREATE FUNCTION wwpSTRIP_NON_DIGIT(input VARCHAR(255))
   RETURNS VARCHAR(255)
BEGIN
  DECLARE output   VARCHAR(255) DEFAULT '';
  DECLARE iterator INT          DEFAULT 1;
  WHILE iterator < (LENGTH(input) + 1) DO
     IF SUBSTRING(input, iterator, 1) IN ( '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' ) THEN
        SET output = CONCAT(output, SUBSTRING(input, iterator, 1));
     END IF;
     SET iterator = iterator + 1;
  END WHILE;
  RETURN output;
END;
$$
DELIMITER ;

Open in new window

0
Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
aikimarkCommented:
I think the easiest and quickest solution might involve a tally table (number table).  If you aren't familiar with these, read this article:
http:A_3573-A-MySQL-Tidbit-Quick-Numbers-Table-Generation.html

=========
Basically, you need a small (0-9 values only) table of numbers that you join up with your posts table in an Update query.  You use the number values in the Replace() function in the SET clause.

You could improve the performance of the query if you add a WHERE clause condition that looks for the numeric digit character in the post_title field.
0
 
m2ewAuthor Commented:
@aikimark I read over the query. I can try that method. It gets complex because there is the overall A-Z sort, Then Size (Gallons), and Stream count

Ie. Product 102 Gallon 4 Stream
0
 
aikimarkCommented:
@ m2ew

I don't know what your data looks like and don't understand how you are trying to sort.  I suspect that your data is not sufficiently normalized or tables improperly designed.  Some more detailed information would be helpful to the participating experts.
0
 
m2ewAuthor Commented:
@aikimark I'm working within wordpress setup, so I was referring to post_title field for sorting. The format "General Name ### Gallon # of Streams" is a general format based of the that field. I could split it into strings as the sorting priority goes

A-Z on General Name
Then ### Gallon
Then # Streams
0
 
aikimarkCommented:
@m2ew

This now seems like a parsing problem, since you need to extract different bits of the title in order to sort in the orders you described.  First, start by reading Kevin's related article, where he uses the tally/numbers table to parse a string.
http:A_3622-A-MySQL-Tidbit-In-line-CSV-Parsing.html

Do you have any other fields you can work with?  It would be much better if we were able to do this parsing once and save the parsed data in some other fields on the row.

If you have three fields, my approach would be to first split the title into two fields, based on the string before and after "Gallons".  Trim the result.  Populate a Gallons field with the number string following the last space in the first parsed string. Populate a Streams field with the number string before the first space in the second parsed string.  The raw General Name field is the trimmed string before the Gallons digits in the first parsed string.

========
Another way to do this is to go back to my first post and double the number of 'number' items by prepending and appending a space to our 0-9 values.  What you do, in essence, is to add your own delimiters to the title that you can split on with the Replace() function.

"Product Name 102 Gallon 45 of Streams" becomes
"Product Name^102^Gallon^45^of Streams"

Then you can use the splitting technique in Kevin's second article to split the string based on the "^" delimiter.  His article uses a comma delimiter, but I didn't want to take a chance that your general names might include a comma character.
0
 
m2ewAuthor Commented:
@aikimark Thanks for pointing me in the right direction.

I was thinking along the second option.

Concat() function (within replace()) actually came in handy with adding a leading character for the gallon size. I simplified the WHEN to check the digit length > 2 and exclude or filter streams (compartments) over 5; otherwise, wwpSTRIP_NON_DIGIT function would add that value. The majority of items are listed as Single Stream, Double Stream, Triple Stream & Quad Stream so its not an issue. None the less anything 5 & greater was creating a potential problem. Once I had the LENGTH() check, I set either "Z" or "0" as padding to the Gallon number. Query ran fast at  0.0348 sec as I'm limiting to 16 records on the pagination. I have to tweek this a bit more for 5 -8 streams, but works prefect for smaller sizes.

CASE 
   WHEN 
   LENGTH(wwpSTRIP_NON_DIGIT(a.post_title)) > 2 AND a.post_title NOT LIKE '%5 Stream%' AND a.post_title NOT LIKE '%6 Stream%'  AND a.post_title NOT LIKE '%7 Stream%'  AND a.post_title NOT LIKE '%8 Stream%'    
   || LENGTH(wwpSTRIP_NON_DIGIT(REPLACE( a.post_title, '5 Stream', 'Stream' ))) > 2 
   || LENGTH(wwpSTRIP_NON_DIGIT(REPLACE( a.post_title, '6 Stream', 'Stream' ))) > 2 
   || LENGTH(wwpSTRIP_NON_DIGIT(REPLACE( a.post_title, '7 Stream', 'Stream' ))) > 2 
   || LENGTH(wwpSTRIP_NON_DIGIT(REPLACE( a.post_title, '8 Stream', 'Stream' ))) > 2
  THEN 
    REPLACE 
       ( REPLACE 
           ( REPLACE
               (REPLACE 
                  (REPLACE (a. post_title, wwpSTRIP_NON_DIGIT (REPLACE ( a. post_title, 'Quad Stream', 'Stream' )), CONCAT("Z",wwpSTRIP_NON_DIGIT(REPLACE( a.post_title, 'Quad Stream', 'Stream' )))), 
                  'Quad Stream', '4 Stream'),
                'Triple Stream', '3 Stream'),
             'Double Stream', '2 Stream'),
          'Single Stream', '1 Stream')
   ELSE
     REPLACE 
       ( REPLACE 
           ( REPLACE
               (REPLACE 
                  (REPLACE (a. post_title, wwpSTRIP_NON_DIGIT (REPLACE ( a. post_title, 'Quad Stream', 'Stream' )), CONCAT("0",wwpSTRIP_NON_DIGIT(REPLACE( a.post_title, 'Quad Stream', 'Stream' )))), 
                  'Quad Stream', '4 Stream'),
                'Triple Stream', '3 Stream'),
             'Double Stream', '2 Stream'),
          'Single Stream', '1 Stream')
END as ptmorphraw

Open in new window

0
 
Ray PaseurCommented:
I got a neglected question alert on this, and even though it's already closed, I wanted to share my thoughts.  Considering this:

General Name ### Gallon # of Streams

I see a "name" - just a string field that should not carry any inherent meaning.  The meaning of the field could be normalized out of this by taking the information in the ## components and putting it into a separate column.

The problem with trying to get "102 gallon" to come out after "45 gallon" is that the natural sorting order for character strings sees 1 as less than 4.  But if you have this number in a column of type INT, your WHERE clause will work to your advantage.
0
 
aikimarkCommented:
@Ray

I agree.  That was the gist of my table-normalization comment http:#a39188200
0

Featured Post

Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Tackle projects and never again get stuck behind a technical roadblock.
Join Now