Link to home
Start Free TrialLog in
Avatar of FamousMortimer
FamousMortimerFlag for United States of America

asked on

Synchronize Tables from AS400 to SQL Server on Schedule

Hi Experts,

The title basically says it all.  We have a AS400 located at a remote site so i would like to sync a few of the tables to our SQL server to improve the speed of queries.  I am really not sure where to start on this one, but any help would be greatly appreciated.

The tables can be huge (hundreds of thousands of rows) so it would be best to only update the new rows since the previous sync.

Thanks
-Jeremy
SOLUTION
Avatar of ScottParker
ScottParker
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Oh the SSIS script thing is for MSSQL server 2005 and above.   If your using SQL 2000, then they call it a DTS.
Avatar of Gary Patterson, CISSP
Lots of solutions, of varying complexities, costs, and levels of technical expertise.

You can buy packaged software that handles these types of ETL chores for you.
You can build your own Windows-based "pull" solution using BIDS, SSIS, etc.
You can build your own AS/400 based "push" solution using RPG, COBOL, SQL, Java, triggers, and/or journaling
You can build a Java solution using JDBC.

The AS/400's DB2 database supports access, import, and export via a lot of interfaces:

You can export data to flat files or delimited files (like CSV's) using the CPYTOIMPF command, and then download the exported stream file using FTP or other tools.
You can access the database remotely using ODBC, JDBC, OleDB, ADO.NET, DRDA, or DDM.  IBM provides drivers for all of these interfaces.  All of these interface choices means that you can write programs in your Windows language of choice, or use just about any commercial or open source ETL tool you choose.

I can help you narrow it down, but you'll need to give me more to go on.  

I work for a consulting company, and I've done this type of project dozens of times now across numerous platforms, and for various needs.  

Things you'll want to consider:

Build or buy?  Lots of commercial products in this space, but they can be expensive.
Timeline to implement?
Volume of data for initial load, and how much is the daily delta?
If you plan to build, what technologies are the best fit for your shop?  Do you understand the complexities of building applications that have to maintain synchronized data sets?  What happens when the sets get out of sync?  How will you know?  How will you correct it?
How will you deal with data type differences between the two databases?
How do you plan to identify all of the changed rows?  Adds are usually easy, assuming there is a timestamp or sequential key, but updates and deletes can be trickier, of that is an issue for you.  Comparing two tables on two different systems is often very tricky, especially if you haven't done this sort of thing before.
How critical is the data?  Is there any tolerance for error between the two systems?
Which system is considered the "master".
Large volume batch data transfers can be taxing on both the source and target systems.  Do you have a low-usage good window adequate for the daily data transfer?
How are the two systems physically connected?  You said "remote".  Do you have adequate bandwidth to transfer the volume of data in the window you have available?
Would a near-real-time or real-time data replication solution be more appropriate then a batch-oriented ETL process?

Anyway, post back if you want to discuss further.

Regards

- Gary Patterson
Avatar of FamousMortimer

ASKER

Wow, lots said, and lots to answer.

I would prefer to build a solution if possible.  The data is real-time (it is production data from our manufacturing plant, i.e. what the machines are doing).  The reason to sync the data is mainly because our manufacturing monitors query every minute or two large amounts of data and it can be lengthy because of latency.  If the data ever goes out of sync, it would not be a huge ordeal, although the more accurate is better.  For most applications we are concerned about the most recent data (from 'today')

We would have to use a Windows-based 'pull' solution, although if a 'push' solution is recommended, i may be able to have our 400 admin develop something for me.  We are using SQL 2008 Standard.  There is no specific timeline to implement.  The data type difference is handled through the manufacturing monitor program.  AS/400 is the master system.  Securing a time window and bandwidth is not a problem.  Closer to real-time the better.

I think i answered most question.  With that info, do either of you have any specific recommendations or examples?
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Hybrid

Note, too, that you can use a hybrid approach.  For example, you can use a journal or trigger based process on the AS/400 to build a table or table that contains only the keys of the records that need to be updated, for example, or copies of the records themselves.

Then, when your target-system "pull" process runs, it doesn't have to touch the underlying "live" database, or at least it doesn't have to run massive "search for added records, search for changed records, search for deleted records" query batches.  Instead, it can just use the "staging" table to determine what records need to be added/updated/deleted.  This help eliminate the "deleted records problem" inherent in the "pull" architecture.

- Gary Patterson
Thanks a lot for all of the info.  This gives me a good step in the right direction.  I still am not sure exactly how we are going to go about it, but me and our AS/400 admin are going to review all of this info an discuss our options.