Solved

Round 2: SQL to manipulate a substring to contain decimal point running very slow and not updating table. Can anyone help with this?

Posted on 2015-01-14
18
335 Views
Last Modified: 2015-02-12
Dear Gurus...kdo,

Format of my table (11 columns), last column is of interest and column name is "TEXT". Size of table 32 million rows. Planning to run script in chunks like monthly range (Jan1 - Jan 30) thru Dec 1 - Dec 31 2014.
Even when I run the script to process only one day's worth of data, it is still running way too long (25 minutes to process 252,800 rows!!) and had to be cancelled each time.
The last column, col11, is the column that contains the string I want to update where needed.
The first column is the ID key and the SQL that I have is shown below followed by the data format and examples. Could someone help me tweak this sql or perhaps happens to have a SQL script that have the potential of running somewhat faster!

The current being used is shown below:
UPDATE tt t0
  SET t0.text =
  (
    SELECT
      case when pc <> 0     then text
           when lc = 0      then
           case length (text) - fc
                when -1     then text || '0.00'
                when 0      then left (text, fc-1) || '0.0' || right (text, 1)
                when 1      then left (text, fc-1) || '0.' || right (text, 2)
                else             left (text, length (text) - 2) || '.' || right (text, 2)
           end
           when lc - fc = 0 then substr (text, 1, fc-1) || '0.00' || substr (text, fc)
           when lc - fc = 1 then substr (text, 1, fc-1) || '0.0' || substr (text, fc)
           when lc - fc = 2 then substr (text, 1, fc-1) || '0.' || substr (text, fc)
           else substr (text, 1, lc-3) || '.' || substr (text, lc-2)
      end text
    FROM
    (
      SELECT
        ID,
        case when locate_in_string (text, 'WGT') = 0 then 0
             else locate_in_string (text, 'WGT')+3
        end fc,
        case when locate_in_string (text, 'WGT') = 0 then 0
             else locate_in_string (text, ',', locate_in_string (text, 'WGT'))
             end lc,
        case when locate_in_string (text, 'WGT') = 0 then 0
             else locate_in_string (text, '.', locate_in_string (text, 'WGT'))
             end pc,
        text
      FROM tt
--      WHERE id between 1 and 8
    ) t1
--  WHERE t0.id = t1.id

where t0.id = t1.id and Tran_Date>='20140101' and Tran_Date<='20140105'
      and TRAN_TYPE='SALE'
  )
--  WHERE id between 1 and 8
;
Can anyone (kdo are around?) help with making this script run faster and also really do the update because it is not currently updating when I run it against my table...

Thanks

Enyimba
0
Comment
Question by:Enyimba
18 Comments
 

Author Comment

by:Enyimba
ID: 40549513
The data format for this post is attached...
0
 
LVL 24

Expert Comment

by:Tomas Helgi Johannsson
ID: 40550761
Hi!

Do you have index on the table tt that are on these columns in this order

(id asc ,Tran_type asc, Tran_Date asc,Text asc )

This index could help your script run faster.

Regards,
     Tomas Helgi
0
 
LVL 27

Expert Comment

by:tliotta
ID: 40552290
The data format for this post is attached...
Attached to what? I see no attachment.

Tom
0
 

Author Comment

by:Enyimba
ID: 40556989
Tom, all,

I have attached the data format and sample data for this thread. I am a bit despate for a solution to speed up the update for a 32 million row table. The TEXT column that you see is the data format is the column of interest. The column has a three character substring + up to 6 digit numbers. What I am lookig to do is locate the substring which will have the characters PAT and numbers and I want to do a inplace insert a decimal period  in the numbers two places from the right of these numbers. For example, if I have PAT1, I want to update the substruing to PAT0.01,
PAT11=PAT0.11,  PAT111=PAT1.11,  PAT1234=PAT12.34,  PAT12345=PAT123. 45,  PAT123456=PAT1234.56
The current SQL show in this thread originally looked as it will solve my problem but I have since found out that it never seems to finish even one day's worth of data - about 459,000 rows of data. I will appreciate a very helpful and sql solution as soon as someone can help out eith this..

Data Format:

ID          POST_DATE   POST_TIME   TRAN_DATE   TRAN_TIME   TRAN_TYPE PRODUCT_KEY QTY                      AMOUNT      LOCATION_KEY   TEXT                                                                                                                                                                                                                                                            
----------- ----------- ----------- ----------- ----------- --------- ----------- ------------------------ ----------- ----------- ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
196620    20140421      164456    20140101        4700 SALE             1141   +1.00000000000000E+000        1099         262 KEY1692518,QTY1                                                                                                                                                                                                                                                 
     196621    20140421      164456    20140101        4700 SALE             1141   +1.00000000000000E+000        1099         262 KEY1692518,QTY1                                                                                                                                                                                                                                                 
     196691    20140421      164456    20140101        4600 SALE            14744   +1.00000000000000E+000         699         262 KEY1692518,QTY1                                                                                                                                                                                                                                                 
     196709    20140421      164455    20140101       95600 SALE              147   +1.00000000000000E+000         399         262 KEY1692522,QTY1                                                                                                                                                                                                                                                 
     196710    20140421      164455    20140101       95600 SALE              147   +1.00000000000000E+000         399         262 KEY1692522,QTY1                                                                                                                                                                                                                                                 
     196711    20140421      164456    20140101      100400 SALE              174   +1.00000000000000E+000         370         262 KEY1692522,PAT53,QTY                                                                                                                                                                                                                                       
     196712    20140421      164456    20140101      101300 SALE              251   +1.00000000000000E+000         489         262 KEY1692522,QTY1                                                                                                                                                                                                                                                 
     196713    20140421      164455    20140101      100300 SALE              435   +1.00000000000000E+000        1611         262 KEY1692522,PAT124,QTY1                                                                                                                                                                                                                                          
     196714    20140421      164455    20140101      100300 SALE              573   +1.00000000000000E+000        2041         262 KEY1692522,PAT465,QTY1                                                                                                                                                                                                                                          
     196715    20140421      164456    20140101      101500 SALE              575   +1.00000000000000E+000         399         262 KEY1692522,QTY1                                                                                                                                                                                                                                                 
     196716    20140421      164503    20140101      102300 SALE              625   +1.00000000000000E+000         413         262 KEY1692522,PAT78,QTY1                                                                                                                                                                                                                                           
     196717    20140421      164503    20140101      102300 SALE              625   +1.00000000000000E+000         402         262 KEY1692522,PAT76,QTY1                                                                                                                                                                                                                                           
     196718    20140421      164455    20140101      100400 SALE              647   +1.00000000000000E+000         176         262 KEY1692522,PAT104,QTY1                                                                                                                                                                                                                                          
     196719    20140421      164455    20140101       95300 SALE              661   +1.00000000000000E+000         120         262 KEY1692522,PAT71,QTY1                                                                                                                                                                                                                                           
     196720    20140421      164503    20140101      101900 SALE              728   +1.00000000000000E+000         703         262 KEY1692522,PAT130,QTY1                                                                                                                                                                                                                                          
     196721    20140421      164456    20140101      101200 SALE              810   +1.00000000000000E+000         192         262 KEY1692522,PAT107,QTY1                                                                                                                                                                                                                                          
     196722    20140421      164503    20140101      102200 SALE              836   +1.00000000000000E+000         283         262 KEY1692522,PAT142,QTY1                                                                                                                                                                                                                                          
     196723    20140421      164455    20140101       95900 SALE              836   +1.00000000000000E+000         185         262 KEY1692522,PAT93,QTY1                                                                                                                                                                                                                                           
     196724    20140421      164456    20140101      101800 SALE             1172   +1.00000000000000E+000         597         262 KEY1692522,PAT171,QTY1                                                                                                                                                                                                                                          
     196725    20140421      164455    20140101      100000 SALE             1226   +1.00000000000000E+000         890         262 KEY1692522,PAT255,QTY1                                                                                                                                                                                                                                          
     196726    20140421      164455    20140101      100400 SALE             1241   +1.00000000000000E+000         499         262 KEY1692522,QTY1                                                                                                                                                                                                                                                 
     196727    20140421      164503    20140101      102300 SALE             1537   +1.00000000000000E+000         199         262 KEY1692522,QTY1                                                                                                                                                                                                                                                 

Open in new window

Thanks
Enyimba
0
 
LVL 45

Accepted Solution

by:
Kdo earned 500 total points
ID: 40559912
You're probably going to want to break this query up to manage smaller chunks.  You're also need to be VERY careful that the filter is applied correctly.

UPDATE tt t0
  SET t0.text =
  (
    SELECT
      case when pc <> 0     then text
           when lc = 0      then
           case length (text) - fc
                when -1     then text || '0.00'
                when 0      then left (text, fc-1) || '0.0' || right (text, 1)
                when 1      then left (text, fc-1) || '0.' || right (text, 2)
                else             left (text, length (text) - 2) || '.' || right (text, 2)
           end
           when lc - fc = 0 then substr (text, 1, fc-1) || '0.00' || substr (text, fc)
           when lc - fc = 1 then substr (text, 1, fc-1) || '0.0' || substr (text, fc)
           when lc - fc = 2 then substr (text, 1, fc-1) || '0.' || substr (text, fc)
           else substr (text, 1, lc-3) || '.' || substr (text, lc-2)
      end text
    FROM
    (
      SELECT
        ID,
        case when locate_in_string (text, 'WGT') = 0 then 0
             else locate_in_string (text, 'WGT')+3
        end fc,
        case when locate_in_string (text, 'WGT') = 0 then 0
             else locate_in_string (text, ',', locate_in_string (text, 'WGT'))
             end lc,
        case when locate_in_string (text, 'WGT') = 0 then 0
             else locate_in_string (text, '.', locate_in_string (text, 'WGT'))
             end pc,
        text
      FROM tt
--  First critical filter
--      WHERE id between 1 and 8
    ) t1
--  WHERE t0.id = t1.id
where t0.id = t1.id and Tran_Date>='20140101' and Tran_Date<='20140105'
      and TRAN_TYPE='SALE'
  )
--  Second critical filter
--  WHERE id between 1 and 8
;

Open in new window


The table has 32M rows.  Depending on the database structure on content, that could mean more than 131G written to the log files, and of course, that much read and updated in the database.  That's a worst case scenario, but if you're using 4K pages with an average row size over 2K, it's a reality.  If the database has an average row length around 200 characters, the I/O drops from 131G logged and updated to 13+G.  That will take some time as the update will require several full-table scans in addition to the logging/updating requirements.

I suggest that the query be broking up into smaller chunks.  The primary key is probably the best, but any indexed column will do.

I've reposted the SQL and inserted to comment lines to show the location where you'll want matching filters.  If the filter is applied only on the inner query, the filters become an implied outer join and all rows not selected in the inner query will get set to NULL.  If the filter is applied only on the outer query, the full inner query may prepare the data for all 32M rows, but only the rows selected in the outer filter will be updated.  That's a lot of wasted processing.

That may solve by rewriting the UPDATE as a MERGE statement, but since we know that the update works, modifying it to work on up 1,000,000 rows at a time seems reasonable, and duplicating the filter isn't egregious.


Kent
0
 

Author Comment

by:Enyimba
ID: 40560178
Kent',

you said that "...since we know that the update works,"...that's my problem. The update is not working! I am currently running this latest SQL rewrite on a small subset(250,000) of the 32million rows table and it has been running for over 10 minutes now! I think you'll agree that it is taking a bit too long. My test table has no index and I am going to add some indexes just as I have it in the QA environment which is where I want to ultimately run and update the table.
Any index suggestion, sql changes?

Thanks

Enyimba
0
 
LVL 45

Expert Comment

by:Kdo
ID: 40560217
You should only need 1 index for this query.

Make that the IDENTITY column or primary key.  That becomes the key item used in the filters and the join operations.

Kent
0
 

Author Comment

by:Enyimba
ID: 40560371
Kent,
The ID column is the identity key and yes I have created the index. I will report the result as soon I have them...I had to cancel the first run without indexes...

Stay tuned..
0
 

Author Comment

by:Enyimba
ID: 40560717
Kent, well the sql is still running long even after the ID key index is defined. Not only is it running slow, itis not updating the table either?

What to do, I have no clue right now...anything you want me to try?

Thanks

Enyimba
0
IT, Stop Being Called Into Every Meeting

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

 
LVL 45

Expert Comment

by:Kdo
ID: 40561003
Are you trying to run it on the entire table or have you applied filters to limit the number of rows?
0
 

Author Comment

by:Enyimba
ID: 40561186
Kent,.

I applied some filter using tran_date and tran_type.  What beats me is that if you run your code without the update, it runs relatiely fast but when the update option is introduced, the code practically runs without end.  I start thinking is the code in a loop, a cartesian join or just hang...If I ever get this to work, obviously I will follow your earlier suggestion to run the process in chunks.

Anyway, the importan thing is to get it to at least update the table and I can begin to find a way to use parrallel jobs to process the entire 32million row table...I don't think I am going to have fun doing that...

If you think of anything else to try, I guess you will advise on that...

Thanks

Enyimba
0
 
LVL 45

Expert Comment

by:Kdo
ID: 40561189
Are tran_date and tran_type indexed?  The filter AND the join key need to be indexed.
0
 
LVL 45

Expert Comment

by:Kdo
ID: 40561190
If you'll include a filter on the IDENTITY column, you should see a huge improvement, but probably not enough to want to run it over 32M rows.

WHERE id_column between 1 and 1000
  AND tran_date = xxxx
  AND tran_type = yyyy


Kent
0
 

Author Comment

by:Enyimba
ID: 40561294
Thanks Kent...let me add that and see what I get...
0
 

Author Comment

by:Enyimba
ID: 40561302
Kent,

What do you mean by "The filter AND the join key need to be indexed"? Please explain?
I have an index on ID, TRAN_DATE, TRAN_TYPE, and MEMO. MEMO is questionable, but I added anyway.


Enyimba
0
 

Author Comment

by:Enyimba
ID: 40561350
Kent,

Does this look like what you are advusing to do? I ran this and got no record found. The message tell me that I have the where clause in the wrong places(s).
0
 
LVL 45

Expert Comment

by:Kdo
ID: 40561845
Your sample data (posted above) doesn't show any line with 'WGT' in it.  That would certainly result in 'No Records Found'.

The index on Memo won't be used.
0
 

Author Closing Comment

by:Enyimba
ID: 40568709
Again, absolutely unsurpassed technical assistance...here's to you, kdo!!
0

Featured Post

Highfive + Dolby Voice = No More Audio Complaints!

Poor audio quality is one of the top reasons people don’t use video conferencing. Get the crispest, clearest audio powered by Dolby Voice in every meeting. Highfive and Dolby Voice deliver the best video conferencing and audio experience for every meeting and every room.

Join & Write a Comment

Displaying an arrayList in a listView using the default adapter is rarely the best solution. To get full control of your display data, and to be able to refresh it after editing, requires the use of a custom adapter.
Entering a date in Microsoft Access can be tricky. A typo can cause month and day to be shuffled, entering the day only causes an error, as does entering, say, day 31 in June. This article shows how an inputmask supported by code can help the user a…
In this fourth video of the Xpdf series, we discuss and demonstrate the PDFinfo utility, which retrieves the contents of a PDF's Info Dictionary, as well as some other information, including the page count. We show how to isolate the page count in a…
In this fifth video of the Xpdf series, we discuss and demonstrate the PDFdetach utility, which is able to list and, more importantly, extract attachments that are embedded in PDF files. It does this via a command line interface, making it suitable …

746 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

12 Experts available now in Live!

Get 1:1 Help Now