Link to home
Start Free TrialLog in
Avatar of Enyimba
Enyimba

asked on

Need SQL script to manipulate a substring to contain decimal point. Can anyone help with this?

Dear SQL gurus

I am looking to come up with a simple SQL that will scan DB2 records(rows) for some character string that begins with "PAT" followed by one or more numbers. Upon finding the character string
and numbers, the sql should a decimal point at exactly 2 decimal position from the right just as if one writing 20.00. The decimal point remains the same regardles of the size of the numbers.


The format of the record looks as follows:

Current:
Current:
LGG20140115092300,SNO0388,CUP20316700000,DNO20,QTY1,AMT1732,PAT1025,CUP85162600203  <<<___ See "PAT" + numbers
LGG20140113094400,SNO0388,CUP20476000000,DNO30,QTY1,AMT10122,PAT17156,CUP3151
LGG20140108150500,SNO0388,CUP20476000000,DNO30,QTY1,AMT90114,PAT152736,CUP3199 <<<<___ See "PAT" + numbers
LGG20140108150500,SNO0388,CUP20476000000,DNO30,QTY1,AMT90114,PAT173,CUP1099
LGG20140113094400,SNO0388,CUP20476000000,DNO30,QTY1,AMT10122,PAT052,CUP0051
LGG20140113094400,SNO0388,CUP20476000000,DNO30,QTY1,AMT10122,PAT52,CUP0151 <<<___ See "PAT" + numbers


The result of what I am looking for should look like the folllowing:

Current:
LGG20140115092300,SNO0388,CUP20316700000,DNO20,QTY1,AMT1732,PAT10.25,CUP85162600203
LGG20140113094400,SNO0388,CUP20476000000,DNO30,QTY1,AMT10122,PAT171.56,CUP3151  <<<___ See "PAT" + numbers(large numbers)
LGG20140108150500,SNO0388,CUP20476000000,DNO30,QTY1,AMT90114,PAT1527.36,CUP3199  <<<___ See "PAT" + numbers(large numbers)
LGG20140108150500,SNO0388,CUP20476000000,DNO30,QTY1,AMT90114,PAT1.73,CUP1099
LGG20140113094400,SNO0388,CUP20476000000,DNO30,QTY1,AMT10122,PAT0.52,CUP0051
LGG20140113094400,SNO0388,CUP20476000000,DNO30,QTY1,AMT10122,PAT.52,CUP0151

Can someone help rovide a SQL solution? Any help in resolving request will be much appreciated.


Thanks

Enyimba
Avatar of Koen Van Wielink
Koen Van Wielink
Flag of Netherlands image

Hi Enyimba,

Just to be sure I understand your problem correctly, is the entire string for each row contained in a single column, or do multiple columns have to be searched?
Avatar of Enyimba
Enyimba

ASKER

The entire string for each row can contained in a single column as feed to DB2.  Existing DB2 table data and multiple columns will also need to searched. Is that possible?

Thanks

Enyimba
SOLUTION
Avatar of Koen Van Wielink
Koen Van Wielink
Flag of Netherlands image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of Enyimba

ASKER

Hello, thanks for the response and post. At this moment, my system is unavailable for maintenance so testing your code is on hold. I will have to wait to do my testing...

I will report back with my results...thanks a million!

Enyimba
Sure, no problem. Just let us know when you had a chance to run it.
Avatar of Enyimba

ASKER

Koen Van Wielink,

If you don't mind, educate me a little what the SQL code that you posted is doing. Why the "Select" immediately followed by "insert"? Locate, okay I can guess what that is but do tell it your own way. What is "CODEUNITS32), CODEUNITS32)" doing? I hope I'm not being a pain?

Thanks

Emyimba
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of Kent Olsen
Hi Enyimba,

Any SQL that parses a string like these needs a lot of sanity checks.  

-- What happens if 'PAT' is not found in the string?
-- What happens if no digits follow 'PAT'?
-- What happens if exactly 1 digit follows 'PAT'
-- Is PATnnnn always followed by a comma?  Can it end the string?
etc..

Here's a script that takes your examples and walks through the process.  

Kent


WITH strings (text)
AS
(
  SELECT 'LGG20140115092300,SNO0388,CUP20316700000,DNO20,QTY1,AMT1732,PAT1025,CUP85162600203' as text from sysibm.sysdummy1
  UNION ALL
  SELECT 'LGG20140113094400,SNO0388,CUP20476000000,DNO30,QTY1,AMT10122,PAT17156,CUP3151' from sysibm.sysdummy1
  UNION ALL
  SELECT 'LGG20140108150500,SNO0388,CUP20476000000,DNO30,QTY1,AMT90114,PAT152736,CUP3199' from sysibm.sysdummy1
  UNION ALL
  SELECT 'LGG20140108150500,SNO0388,CUP20476000000,DNO30,QTY1,AMT90114,PAT173,CUP1099' from sysibm.sysdummy1
  UNION ALL
  SELECT 'LGG20140113094400,SNO0388,CUP20476000000,DNO30,QTY1,AMT10122,PAT052,CUP0051' from sysibm.sysdummy1
  UNION ALL
  SELECT 'LGG20140113094400,SNO0388,CUP20476000000,DNO30,QTY1,AMT10122,PAT52,CUP0151' from sysibm.sysdummy1
),
patstrings (text)
as
(
  SELECT case when locate_in_string (text, 'PAT') = 0 then '0.00'
              else substr (text, locate_in_string (text, 'PAT'))
         end
  FROM strings
),
targetstrings (text)
as
(
  SELECT substr (case when locate (',', text) = 0 then text
                 else left (text, locate (',', text)-1)
                 end, 4)
  FROM patstrings
)
select case length (text) 
            when 0 then '0.00'
            when 1 then '0.0' || text
            when 2 then '0.' || text
            else left (text, length (text) - 2) || '.' || right (text, 2)
       end
from targetstrings;

Open in new window

Avatar of Enyimba

ASKER

Hi Kent,

Thank you for providing the code example. Much appreciated. The example records that I used in my initial post is jus.t to display the format of the records to be processed by way of SQL singleton select from a table or from a websphere Queue (MQ). So that I can understand this solution better, could you please rework the SQL  code to process one record (row) at a time? Can I just remove all the union statement? I am going to test it without the union statements but I thght it just as well to ask now...

Again thanks for the continued interest and your assistance.

Enyimba
Avatar of Enyimba

ASKER

Koen Van Wielink,

Thank you Sir for the detail explanation what the SQL is doing. I have not had a chance to test the code and I hope to do that this thursday as soon as I get the tes data ready. I am glad I could have a couple of ways to solve my problem...what an invaluable way to learn the advance SQL from seasoned profesionals like yourself and Mr. Kent.

To the both of you, I am thankful.

What happens if I locate "PAT", extract up to a maximum of six sumbers, CAST this numbers to integer, divide this integer number by 100 to get the desired result? For example, if I have "PAT152736", 152736/100=1527.36.
How can I modify your code to do that...

Thanks

Anyanso
Avatar of Enyimba

ASKER

Hello Koen Van Wielink,
I have tested the SQL code that you presented and it threw an error as shown below. Below that is the data that I loaded into the test_table  and the column name I am interogating is named MEMO.

select      insert(memo, LOCATE('.', memo, POSITION('PAT', memo, CODEUNITS32), CODEUNITS32) - 2, 0, '.')
from      rapm.test_table
where      POSITION('PAT', memo, CODEUNITS32) <> 0

A numeric argument of a built-in string function is out of range.. SQLCODE=-138, SQLSTATE=22011, DRIVER=3.68.6

Can you help again with this error?

Thanks

Enyimba



INSERT INTO RAPM.TEST_TABLE VALUES
                                                                                              -------Column Name=MEMO-------------------------------
(3769910,20141220,2210,20141219,215800,38,300,604,2,'KEY412227,QTY1,WGT173,UPC85162600203'),
(3769911,20141220,2210,20141219,215600,232,300,155,2,'KEY412227,QTY1,WGT52,UPC3151'),
(3769910,20141220,2210,20141219,215800,38,300,604,2,'KEY412227,QTY1,WGT173,UPC85162600203'),
(3769911,20141220,2210,20141219,215600,232,300,155,2,'KEY412227,QTY1,WGT52,UPC3151'),
(3769912,20141220,2210,20141219,215800,38,300,604,2,'KEY412227,QTY1,WGT173,UPC85162600203'),
(3769913,20141220,2210,20141219,215600,232,300,155,2,'KEY412227,QTY1,WGT52,UPC3151'),
(3769914,20141220,2210,20141219,215800,38,300,604,2,'KEY412227,QTY1,WGT173,UPC85162600203'),
(3769915,20141220,2210,20141219,215600,232,300,155,2,'KEY412227,QTY1,WGT52,UPC3151'),
(3769916,20141220,2210,20141219,215800,38,300,604,2,'KEY412227,QTY1,WGT1025,UPC85162600203'),
(3769917,20141220,2210,20141219,215600,232,300,155,2,'KEY412227,QTY1,WGT52,UPC3151'),
(3769918,20141220,2210,20141219,215800,38,300,604,2,'KEY412227,QTY1,WGT173,UPC85162600203'),
(3769919,20141220,2210,20141219,215600,232,300,155,2,'KEY412227,QTY1,WGT17156,UPC3151'),
(3769900,20141220,2210,20141219,215800,38,300,604,2,'KEY412227,QTY1,WGT173,UPC85162600203'),
(3769901,20141220,2210,20141219,215600,232,300,155,2,'KEY412227,QTY1,WGT52,UPC3151'),
(3769800,20141220,2210,20141219,215800,38,300,604,2,'KEY412227,QTY1,WGT152736,UPC85162600203'),
(3719911,20141220,2210,20141219,215600,232,300,155,2,'KEY412227,QTY1,WGT52,UPC3151'),
(3729910,20141220,2210,20141219,215800,38,300,604,2,'KEY412227,QTY1,WGT173,UPC85162600203'),
(3739911,20141220,2210,20141219,215600,232,300,155,2,'KEY412227,QTY1,WGT52,UPC3151'),
(3749910,20141220,2210,20141219,215800,38,300,604,2,'KEY412227,QTY1,WGT173,UPC85162600203'),
(3759911,20141220,2210,20141219,215600,232,300,155,2,'KEY412227,QTY1,WGT52,UPC3151'),
(3769910,20141220,2210,20141219,215800,38,300,604,2,'KEY412227,QTY1,WGT173,UPC85162600203'),
(3779911,20141220,2210,20141219,215600,232,300,155,2,'KEY412227,QTY1,WGT52,UPC3151'),
(3789910,20141220,2210,20141219,215800,38,300,604,2,'KEY412227,QTY1,WGT173,UPC85162600203'),
(3799911,20141220,2210,20141219,215600,232,300,155,2,'KEY412227,QTY1,WGT52,UPC3151'),
(3709910,20141220,2210,20141219,215800,38,300,604,2,'KEY412227,QTY1,WGT173,UPC85162600203'),
(3719911,20141220,2210,20141219,215600,232,300,155,2,'KEY412227,QTY1,WGT52,UPC3151'),
(3729910,20141220,2210,20141219,215800,38,300,604,2,'KEY412227,QTY1,WGT173,UPC85162600203'),
(3739911,20141220,2210,20141219,215600,232,300,155,2,'KEY412227,QTY1,WGT52,UPC3151'),
(3749910,20141220,2210,20141219,215800,38,300,604,2,'KEY412227,QTY1,WGT173,UPC85162600203'),
(3759911,20141220,2210,20141219,215600,232,300,155,2,'KEY412227,QTY1,WGT52,UPC3151'),
(3769910,20141220,2210,20141219,215800,38,300,604,2,'KEY412227,QTY1,WGT173,UPC85162600203'),
(3779911,20141220,2210,20141219,215600,232,300,155,2,'KEY412227,QTY1,WGT52,UPC3151'),
(3789910,20141220,2210,20141219,215800,38,300,604,2,'KEY412227,QTY1,WGT173,UPC85162600203'),
(3799911,20141220,2210,20141219,215600,232,300,155,2,'KEY412227,QTY1,WGT52,UPC3151');
Avatar of Enyimba

ASKER

kdo,

I have tested the sql code you presented and it worked. the result of the test is shown below:
 1
 -------
 0.52
 1.73
 1527.36
 171.56
 10.25
 0.52

Now, how can I modified the code to update the rows of the table? Can you help with this?

Thanks

Enyimba
Hi Enyimba,

You can replace the final SELECT statement with another SQL verb, such as UPDATE.  :)

What do you need to change?  A column in the same table?  In a different table?  If in a different table what is the join key between them?  Also, about how many rows are in your source table and the table that you need to update.  If it's a large number, we may want to change the algorithm a bit for better performance.


Kent
Avatar of Enyimba

ASKER

Kent,
Wat needs to change is to update a column in the same table ad more specifically, a substring itin the column. The column is declared VARCHAR(2000).  This is more of "inplace update" hopefully with no need for unload/load to do the "etl-like" update. There wil be no join needed but the  volume may be significant - about 600K every 3 minutes!! I will think that i large, you agree?

kdo, again thanks for your continuing interest and assistance. Look forward to perhaps, the final cut, as they say in the movies

Thanks

Enyimba
Ok.  It's certainly possible.  :)

Let's get the entire problem on the table.  Once that 'PATxxxx' substring is converted to 'xx.xx', does it get replaced into that same string?  Does it replace the PATxxxx substring?  Does it get appended to the string, etc...
Avatar of Enyimba

ASKER

It get replaced into the same string. It's like shifting the numbers two positions to the right and insert the decimal point. Obviously the rest of the data has to shift right also. Also, 99.9% rows will contain the WGT character and some may not. Infact the size of this table is in the millions of rows!!.

Thank you

Enyimba
A clarification...

Does

 xxxxx,PAT123,xxxxx

become

 xxxxx,1.23,xxxxx

or

 xxxxx,PAT1.23,xxxxx
Does the table have a primary key?  It's probably most efficient to write an UPDATE statement that makes the updates on the fly.  It's simpler SQL if the new strings can be generated in a subquery and edited into the column in the UPDATE portion of the statement.


Kent
Avatar of Enyimba

ASKER

It becomes xxxxx,PAT1.23,xxxxx . The only difference is the insertion of the decimal point.

Thanks

Enyimba
Ok.  That's actually less work than the path that I had started down.  All we have to do is find the place to insert the decimal point (and possibly a zero or two).

This code finds the start position of the numeric string after 'PAT' and the position of the comma after the last digit.  Inserting the decimal point is now trivial.

But as I said, it can get wordy due to the special case handling.  Does the table have a primary key?



WITH strings (text)
AS
(
  SELECT 'LGG20140115092300,SNO0388,CUP20316700000,DNO20,QTY1,AMT1732,PAT1025,CUP85162600203' as text from sysibm.sysdummy1
  UNION ALL
  SELECT 'LGG20140113094400,SNO0388,CUP20476000000,DNO30,QTY1,AMT10122,PAT17156,CUP3151' from sysibm.sysdummy1
  UNION ALL
  SELECT 'LGG20140108150500,SNO0388,CUP20476000000,DNO30,QTY1,AMT90114,PAT152736,CUP3199' from sysibm.sysdummy1
  UNION ALL
  SELECT 'LGG20140108150500,SNO0388,CUP20476000000,DNO30,QTY1,AMT90114,PAT173,CUP1099' from sysibm.sysdummy1
  UNION ALL
  SELECT 'LGG20140113094400,SNO0388,CUP20476000000,DNO30,QTY1,AMT10122,PAT052,CUP0051' from sysibm.sysdummy1
  UNION ALL
  SELECT 'LGG20140113094400,SNO0388,CUP20476000000,DNO30,QTY1,AMT10122,PAT52,CUP0151' from sysibm.sysdummy1
),
patstrings (fc, lc)
as
(
  SELECT 
    case when locate_in_string (text, 'PAT') = 0 then 0
         else locate_in_string (text, 'PAT')+3
    end,
    case when locate_in_string (text, 'PAT') = 0 then 0
         else locate_in_string (text, ',', locate_in_string (text, 'PAT'))
         end
  FROM strings
)
select * from patstrings;

Open in new window

Avatar of Enyimba

ASKER

Oh , also, I believe the table has a primary key and yes, I'll take your word that it is simpler  to generate  the new string in a subquery and edited into the column.  

Update: I now have about 23 millions rows to update between Friday afternoon and batch window that begins at 11 pm EST.

I will like to test about 5k rowsbefore I deploy to QA.

Thanks

Enyimba
Avatar of Enyimba

ASKER

Oh, I forgat to sat that yes the table has a key and the key is the ID column - the first det of numbers in each row.
Let me ask you this, would it matter key or key if you are going to process the entire table regardless? Wouldn't db2 chose to table Scan through the data it determines that the entire table will be processed? At least that is what I am reading up on how DB2 works. But, trust me I'll defer to your experience.

Thank you

Enyimba
If the SQL uses the primary key to join temporary results, a table scan will be done to generate the the derived table, and another will be done on the derived table to join it to the permanent table.  Skipping 1 of the table scans will cut a lot of time from the query...

Here's a query and test for it.  There are two possible exceptions that are not handled.  If the PAT string already has a period in it, it will now have 2.  Also, if the PAT string is the last clause on the line (no trailing comma) the query will fail.

It's pretty easy to write a query to see if there are exceptions that need to be managed.

If you need to break the production update into chunks, modify the BETWEEN clause.  Perhaps chunks of 50,000 at a time.  (Again, don't let the ranges overlap or there will be a double period.)

Good Luck!
Kent



DROP TABLE tt;
CREATE TABLE tt (id integer generated always as identity, text varchar (200));
INSERT INTO tt (text) values  'LGG20140115092300,SNO0388,CUP20316700000,DNO20,QTY1,AMT1732,PAT1025,CUP85162600203';
INSERT INTO tt (text) values  'LGG20140113094400,SNO0388,CUP20476000000,DNO30,QTY1,AMT10122,PAT17156,CUP3151';
INSERT INTO tt (text) values  'LGG20140108150500,SNO0388,CUP20476000000,DNO30,QTY1,AMT90114,PAT152736,CUP3199';
INSERT INTO tt (text) values  'LGG20140108150500,SNO0388,CUP20476000000,DNO30,QTY1,AMT90114,PAT173,CUP1099';
INSERT INTO tt (text) values  'LGG20140113094400,SNO0388,CUP20476000000,DNO30,QTY1,AMT10122,PAT052,CUP0051';
INSERT INTO tt (text) values  'LGG20140113094400,SNO0388,CUP20476000000,DNO30,QTY1,AMT10122,PAT52,CUP0151';
INSERT INTO tt (text) values  'LGG20140113094400,SNO0388,CUP20476000000,DNO30,QTY1,AMT10122,PAT5,CUP0151';
INSERT INTO tt (text) values  'LGG20140113094400,SNO0388,CUP20476000000,DNO30,QTY1,AMT10122,PAT,CUP0151';
COMMIT;
  
SELECT * FROM tt;

UPDATE tt t0
  SET t0.text =
  (
    SELECT
      case lc - fc
           when 0 then substr (text, 1, fc-1) || '0.00' || substr (text, fc)
           when 1 then substr (text, 1, fc-1) || '0.0' || substr (text, fc)
           when 2 then substr (text, 1, fc-1) || '0.' || substr (text, fc)
           else substr (text, 1, lc-3) || '.' || substr (text, lc-2)
      end text
    FROM
    (
      SELECT
        ID, 
        case when locate_in_string (text, 'PAT') = 0 then 0
             else locate_in_string (text, 'PAT')+3
        end fc,
        case when locate_in_string (text, 'PAT') = 0 then 0
             else locate_in_string (text, ',', locate_in_string (text, 'PAT'))
             end lc,
        text
      FROM tt 
      WHERE id between 1 and 8
    ) t1
    WHERE t0.id = t1.id
  ) 
  WHERE id between 1 and 8
;

SELECT * FROM tt;

Open in new window

Avatar of Enyimba

ASKER

kdo,

Thank you for the modified code. You mentioned that they are two exception that may need to be handled -  If the PAT string already has a period in it, it will now have 2.  Also, if the PAT string is the last clause on the line (no trailing comma) the query will fail.

Can this two exceptions be checked in your code. The first exception is the more troubling. Yes, if the data file is processed in ranges, the problem can be avoided, but as you, "the best laid out plan of mice and men..." . So how can I handle the double period exception in your code. Please help if you can.

Thanks

Enyimba
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of Enyimba

ASKER

Unparalleled help in solving my sql problem