?
Solved

sybase iq 12.7 incremental loadind

Posted on 2011-02-14
11
Medium Priority
?
1,369 Views
Last Modified: 2012-05-11
hello!

how can I do data loading in Sybase IQ 12.7, but I want to load only the modified/deleted/added records. I can't use Sybase ETL. I want to use only SQL.
Can I do this efficiently using Java support from Sybase?

Thanks in advanece.
0
Comment
Question by:i_viorel2010
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 6
  • 5
11 Comments
 
LVL 24

Expert Comment

by:Joe Woodhouse
ID: 34893325
This is a rather broad question - it usually takes a few months in a project to get this stuff working as a custom in-house solution.

Broadly speaking you need:

1

to be able to identify changed rows in your source system(s)

2

some way to deal with multiple operations on a single row - if the same row gets updated twice you need to consolidate this into one update; if a row gets inserted, updated and then deleted what do you want to have happen in your IQ?

3

to then apply a batched insert, batched delete and multiple update statements in IQ
Note this will be very slow. IQ is not designed to deal well with single-row updates. Whenever I hear someone talking about wanting to do updates and deletes in IQ I want to ask them if they're sure they're doing the right thing. What is your business requirement for this? It sounds to me like you are making the mistake of thinking of IQ as a transactional RDBMS - which of course it isn't.

Many sites have instead chosen to only ever insert, with some kind of "DateLoadedAt" column to track when any version of a row was inserted. You can then define views that show you only currently live rows, plus you also get the ability to go back in time and do "as at" queries for a specific point in time. Not only is this a lot more powerful for the users, it will be much more efficient in loading because you're only ever inserting, which is IQ's strength. Yes it means more data but do we care about that in IQ?

So the quick answer to your question is yes you can do this, yes you can build your own custom solution, but it will take you a while, it will never be very fast, and it is probably the wrong solution anyway.
0
 

Author Comment

by:i_viorel2010
ID: 34894345
Thanks for the response.
My idea is :
-create a stored procedure
-use load table based sql syntax  to load data in buffers
-for every buffer:
  -add new records from buffer to system data
  -update the records  
  -delete from system data the records marked as deleted in buffer
.
Suppose you have the following tables: create table myData(ID int, Sales biginy);
 and the following data in file sal.csv

3 1000 A
4 1200 A
3 1500 U
5 1600 A
4 1300 U
3 1500 D

A is for new record, U is for updated record and D for deleted record
Let be tmp_data the following temporary table.

create tmp_data(ID int, Sales int, Op char);

load table tmp_data(ID,Saled,Op,'\n')
from'd:\\sal.csv'
QUOTES OFF
ESCAPES OFF
FORMAT ascii
delimited by ';'    
skip 1;

How modify load table to load data in buffers?
 
0
 
LVL 24

Expert Comment

by:Joe Woodhouse
ID: 34894373
I'm not sure what you're asking...
0
Percona Live Europe 2017 | Sep 25 - 27, 2017

The Percona Live Open Source Database Conference Europe 2017 is the premier event for the diverse and active European open source database community, as well as businesses that develop and use open source database software.

 

Author Comment

by:i_viorel2010
ID: 34894816
Ok. I want to implement the following algorithm:

declare start_id int;
declare block_size int;
declare ok int;
set skip_records=0;
set block_size=1000; %we load data in batch of 1000 records
set ok=1;
create tmp_data(ID int, Sales int, Op char);

while ok>0 loop
delete from tmp_data;
load table tmp_data(ID,Saled,Op,'\n')
from'd:\\sal.csv'
QUOTES OFF
ESCAPES OFF
BLOCKSIZE block_size
SKIP skip_records
FORMAT ascii
delimited by ';'    

insert into myData
select id,sales
where Op='A'

update myData
set myData.sales=tmp_Data.sales
from tmp_data inner join myData on
(tmp_Data.id=myData.id) and (tmp_Data.Op='U')

delete from myData
inner join tmp_Data
on (tmp_data.id=myData.id) and (tmp_data.Op='D')

set skip_records=skip_records+block_size;
select count(*) in OK
from tmp_data;

end loop

Can be optimized?It's a good approach?
0
 
LVL 24

Expert Comment

by:Joe Woodhouse
ID: 34894890
Um.

Firstly the BLOCKSIZE parameter in the IQ LOAD TABLE statement has nothing to do with how many rows are in your batch. That is a specific measure for how memory should be allocated during loads and should not be changed unless you have very specific technical reasons for doing so. You haven't said anything to suggest you're in any of those circumstances.

Secondly a batch size of 1000 is absurdly small for IQ. I suspect you are in the habit of tuning batch jobs in ASE where indeed a smaller batch size helps things like transaction logging. This is not a factor in IQ and you kinda need to reverse all of your standard RDBMS habits. :) Load in as large a batch as you can.

Do you have a specific business requirement for only working on 1000 changes at a time?

Before we even get into how to optimise and implement your pseudocode, you need to go back to the drawing board. How will you deal with multiple operations on the same primary key? What happens if the same row is updated three times? Your current approach will fail because it will appear in your temp table three times and thus no longer be unique.

Don't even think about commands, syntax & tuning just yet - you still have quite a way to go before you're ready to look at physical implementation. You need to figure out many logical design issues like how to deal with (or prevent) multiple operations to the same primary key, how to detect and deal with errors at any stage of the process, whether it even makes sense to load the deltas directly into IQ (where is the data coming from? It might make more sense to have IQ connect directly to that source).

You also need to have a good think about whether it makes sense to do any updates or deletes at all. Not everyone agrees with this but I believe that if someone is talking about deletes in a data warehouse there is a problem with their business analysis and design. I suspect you are still thinking in relational transactional database terms. A data warehouse is not a database the what makes sense for one does not make sense for the other. Why would we willingly destroy information? Hint: what reasons are there for deleting data in your source systems? Most of those reasons will not apply to a data warehouse.

There's a reason I say this sort of thing takes months to do right. You are not going to get the answer in just a couple of hours from a web site, this is a huge job. ETL often takes 50% of a data warehouse project!
0
 

Author Comment

by:i_viorel2010
ID: 34895153
Thanks for the response. I
 It's about BLOCK FACTOR. It's my mistake. The batch size can be huge. But for 3-4 TB data,
1 GB(in the number of records approximately) batch size can be good?  

What happens if the same row is updated three times? Your current approach will fail because it will appear in your temp table three times and thus no longer be unique.

If the id is unique and the change of a field different from primary key is recorderd in the Op column,
for example the data is in the following form:

3 1500 U
5 1600 A
3 1300 U

can I suppose the order in sequentially form for update statement?

I will remove the delete operation.

The code become:


declare start_id int;
declare block_size int;
declare ok int;
set skip_records=0;
set block_size=1000000000; %we load data in batch of 1 billion records
set ok=1;

create tmp_data(ID int, Sales int, Op char);
while ok>0 loop
delete from tmp_data;
load table tmp_data(ID,Saled,Op,'\n')
from'd:\\sal.csv'
QUOTES OFF
ESCAPES OFF
block factor block_size
SKIP skip_records
FORMAT ascii
delimited by ';'    

insert into myData
select id,sales
where Op='A'

update myData
set myData.sales=tmp_Data.sales
from tmp_data inner join myData on
(tmp_Data.id=myData.id) and (tmp_Data.Op='U')

set skip_records=skip_records+block_size;
select count(*) in OK
from tmp_data;

end loop

0
 
LVL 24

Expert Comment

by:Joe Woodhouse
ID: 34895203
I think you're being distracted by the technology. :) I do that too.

It's way too soon to be writing a single line of code. Get the design right. The coding is the easy part but it has to be done last or it all goes to hell.

Re. multiple updates: ordering sequentially won't work, because you are proposing to update the permanent rows based on a join to the temp table - and your temp table will have multiple rows in it. You have several approaches - either collapse the updates into one update per row (difficult - you have to detect which column(s), figure out what the final after-image should look like), or you can enforce a rule that only one change to a row is applied in any one load (difficult - what if there are dependencies on this row? You'll have to identify those and push them into the next load also).

And even then, how do you want to handle a row being inserted, then updated, then deleted?

Also if you're thinking in terms of loops your code isn't ready for a database yet. :) Do as much as possible in set-based operations. If you're batching up your loads - what do you want to happen if the second batch fails? Do you still want to do the third? What if the third contains an update to a row that was inserted in the second (failed) batch and now therefore won't work?

It's way too soon to be writing code. It'll be too soon to write code next week, too. :) Like I keep saying, this takes months to get right. If you're only doing this for one table then perhaps it will only take weeks. :)
0
 

Author Comment

by:i_viorel2010
ID: 34895357
Thanks for the advice.

The idea is that primary key doesn't change.
The problem with selection of last update can be solved with an inner join and rowid.


And even then, how do you want to handle a row being inserted, then updated, then deleted?

The insertion suppose a new ID.

The size of  batch can be reduced if I will delete all records that have the state add and delete.

I can use transaction for batch insert.

But the mean question is: Can I use approach with while?
They are many problems and I suppose that batch load with transaction can help me.
Or is better idea to load all data in a table and to make changes at final?
In Sybase IQ 12.7 I don't have incremental loading and I try to implement this facillity.
0
 
LVL 24

Accepted Solution

by:
Joe Woodhouse earned 2000 total points
ID: 34895419
Right, the PK won't change, but what if you have multiple changes to the same row?

There's no need for any WHILE loop here.

IQ will be very fast for all reads and inserts, slow for deletes and very very slow for updates. You want to minimise those.
0
 

Author Comment

by:i_viorel2010
ID: 34903491
Thanks for the advices. i will give 500 points.
0
 
LVL 24

Expert Comment

by:Joe Woodhouse
ID: 34903538
Hope that helped! Sorry there isn't a quick & easy answer here, this is actually the most difficult part of any DW implementation...
0

Featured Post

Moving data to the cloud? Find out if you’re ready

Before moving to the cloud, it is important to carefully define your db needs, plan for the migration & understand prod. environment. This wp explains how to define what you need from a cloud provider, plan for the migration & what putting a cloud solution into practice entails.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Active Directory can easily get cluttered with unused service, user and computer accounts. In this article, I will show you the way I like to implement ADCleanup..
This month, Experts Exchange’s free Course of the Month is focused on CompTIA IT Fundamentals.
In this brief tutorial Pawel from AdRem Software explains how you can quickly find out which services are running on your network, or what are the IP addresses of servers responsible for each service. Software used is freeware NetCrunch Tools (https…
This is my first video review of Microsoft Bookings, I will be doing a part two with a bit more information, but wanted to get this out to you folks.

765 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question