Sybase bcp in tuning needed

learning_sybase asked:

Hi, my bcp in is taking too long. It's running in batches of 5000 rows but is still very slow. The bcp file is 30GB, taken from a Sybase bcp out. The table has text data too; the table size is 31GB and the file size is 30GB.
I used packet sizes of -A 8192 and 16384 too.
I also increased the number of pre-allocated extents to 32, but it didn't help.

Experts, please advise; otherwise this bcp will take ages to complete.

Author

Commented:
Guys, please help...waiting for responses

Author

Commented:
Dear Joe Woodhouse,

Request you to help me with this. The source table is 30GB and has 3-4 columns of type text. I am doing a migration from ASE 15.7 (2K page) to ASE 16 (8K page), hence I need to bcp the data out and bcp it into ASE 16 (8K page).
I used the below command, but it's taking too long:

bcp tablename in "filename.bcp" -c -t '|' -r '@' -J utf8 -Y -A 8192 -e "errorfilename" -S db_server -Ulogin -b5000

I also see messages like :

blk_rowxfer(): blk layer: internal BLK-Library error: Data truncated while doing local character set conversion.  col = 1
CTLIB Message:  - L1/O2/S1/N42/1/0:
blk_rowxfer(): blk layer: internal BLK-Library error: Data truncated while doing local character set conversion.  col = 1
CTLIB Message:  - L1/O2/S1/N42/1/0:


Kindly advise how to resolve these errors and how to improve the performance of the bcp in.
Joe Woodhouse, Principal Consultant
Most Valuable Expert 2012
Commented:
There are actually three issues here:

1) bcp performance
2) character conversion
3) data loss

Looking at the first section in detail because that's what you asked for...

1) bcp performance

The main things that affect bcp performance are:

1a) bcp logging mode (slow/fully logged vs. fast/minimally logged)
1b) bcp portability mode (native mode vs. character mode)
1c) network packet size
1d) transaction batch size
1e) bcp concurrency
1f) general ASE performance of the target ASE

1a) bcp logging mode (slow/fully logged vs. fast/minimally logged)

Do the tables receiving these bcp jobs have indexes or triggers? Is the "select into" database option set?
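
For example (a sketch only - "target_db" is a placeholder for your database name), you can enable fast bcp from isql before the load, after dropping or disabling any indexes and triggers on the target table:

use master
go
exec sp_dboption target_db, 'select into/bulkcopy/pllsort', true
go
use target_db
go
checkpoint
go

With no indexes or triggers and this option set, bcp runs minimally logged; otherwise every inserted row is fully logged.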

1b) bcp portability mode (native mode vs. character mode)

You're using character mode with terminators. Are the source and target ASEs both the same platform (running on the same CPU hardware and same OS)?

It doesn't matter that they're different ASE page sizes, as bcp neither knows nor cares about page size of either the source or the destination.
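
If the platforms do match, a sketch of a native-mode copy (src_server is a placeholder; the other names follow your command):

bcp tablename out "filename.bcp" -n -S src_server -Ulogin
bcp tablename in "filename.bcp" -n -S db_server -Ulogin -b5000

Native mode is usually noticeably faster than character mode because rows need no terminator parsing, but only use it when the platforms genuinely match.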

1c) network packet size

You've set an 8Kb network packet size on the bcp command line, but is your target ASE configured to accept connections at this packet size? (At a minimum, set "max network packet size" to this number, and set "additional network memory" to be 3 * this number * number of connections at this packet size, rounded up to the next integer multiple of 2048).

That said, 8Kb is not that large a packet size for a bcp run locally to this ASE. (You are running this bcp local to your target ASE, right?) You can easily set this to 64Kb or even 128Kb. Remember to set the ASE server-side config to support this, as above.
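
A sketch of the matching server-side settings, run from isql as sa (the sizes are examples only: 64Kb packets and 4 concurrent bcp connections gives 3 * 65536 * 4 = 786432, already a multiple of 2048; note "max network packet size" is static in many ASE versions and needs a restart to take effect):

exec sp_configure 'max network packet size', 65536
go
exec sp_configure 'additional network memory', 786432
go

Then pass -A 65536 on the bcp command line.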

1d) transaction batch size

It's good that you're using batching. I've had good success with larger batches... when the ASE is able to support heavier writing (see below). I'd suggest trying -b25000 or even -b50000 if you have a well-tuned ASE and storage system.
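
For instance, your command with just the batch and packet sizes raised (assuming the server-side network settings above are in place):

bcp tablename in "filename.bcp" -c -t '|' -r '@' -J utf8 -Y -A 65536 -e "errorfilename" -S db_server -Ulogin -b50000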

1e) bcp concurrency

How many tables are you having to bcp? In principle you can run multiple concurrent bcp jobs, one per table, so long as your ASE and storage systems can handle it (see below). You should see good scaling up to N concurrent jobs where N=number of ASE engines OR number of physical CPUs/2, whichever is smaller. (32 CPUs and 30 ASE engines = run 16 concurrent bcp jobs.) This is because you need CPU for both the bcp job itself and a receiving ASE engine. You can cautiously try higher than N but you won't see linear scaling and after a point you'll hurt more than you help. I generally won't go above # of bcp jobs = # of CPUs.

If you have a small number of very large tables you can investigate trying concurrent bcp on these. This works best when the table is partitioned and where you have already split the bcp file into multiple files. It's difficult to do this well and improve performance, but not impossible. (Simply using -F and -L isn't enough - you need partitioned tables and partitioned files.)
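
A sketch of the simpler one-job-per-table approach in bash (tables.txt, db_server and login are placeholders, and you'd add your site's password handling; this caps concurrency at 4 jobs):

#!/bin/bash
# Run one "bcp in" per table listed in tables.txt, at most N at a time.
N=4
i=0
while read -r table; do
    bcp "$table" in "$table.bcp" -c -t '|' -r '@' -A 65536 -b50000 \
        -e "$table.err" -S db_server -Ulogin &
    i=$((i+1))
    # Once N jobs are running, wait for the whole batch before starting more.
    if [ "$i" -ge "$N" ]; then wait; i=0; fi
done < tables.txt
wait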

1f) general ASE write performance of the target ASE

Bottom line: bcp looks to the target ASE like either an INSERT statement (slow bcp) or SELECT INTO (fast bcp). So everything that speeds up writes also speeds up bcp and allows you to push the bcp harder.

Does the database you are writing to have its data and log logically separated? (As in, sp_helpdb doesn't show "mixed data and log"?)
Does the database you are writing to have its data and log physically separated? (As in, different mount points/LUNs?)
Are the ASE devices well tuned for writes? (As in, dsync=false, directio=true?)
If using file system devices for data and/or logs, are the file systems well tuned for writes? (At a minimum: noatime,nodiratime. Preferred, if using a sufficient UPS: nobarrier, and if possible filesystem journaling fully disabled?)
Does the database you are writing to have a log cache (defined as logonly and relaxed)?
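
On that last point, a sketch of a dedicated log cache (the cache name, 100M size, and target_db are placeholders; binding syslogs generally requires the database to be quiet, and cache changes may need a restart):

exec sp_cacheconfig 'log_cache', '100M', logonly, relaxed
go
exec sp_bindcache 'log_cache', target_db, syslogs
go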


Re. character conversion and data loss... do your source and destination ASEs use different character sets? bcp character set conversion is intended for when the client's (i.e. bcp's) character set doesn't match that of the server (ASE) the client is talking to.
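
A quick way to check is to run sp_helpsort in isql on each server; it reports the default character set and sort order:

exec sp_helpsort
go

If both servers use the same character set you may be able to drop -J entirely and skip the conversion layer; if they differ, -J should name the character set the bcp file was written in, not the target server's.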

If your CPU hardware is different then you have a potential problem with approximate datatypes like floats and reals because those are platform-dependent.

You also say that you have text/image data, but I don't see you setting a maximum text size (-T). Set this to something you know is bigger than any text/image data that you have. I like to use "-T999999999999" just to be absolutely sure there is no truncation.
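
For example, your command with a generous maximum text size added (all other flags unchanged):

bcp tablename in "filename.bcp" -c -t '|' -r '@' -J utf8 -Y -A 8192 -T999999999999 -e "errorfilename" -S db_server -Ulogin -b5000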

One last issue that everyone forgets is if you have NULL data in any column that has since had a default added to it. Think it through... you bcp out a row, and it properly writes a NULL in this column. Then you bcp in the row... ASE sees the NULL... and applies the default. It will do this silently, and change your data.
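
A sketch of a query to find the at-risk columns ahead of time (standard ASE system tables; run it in the target database):

-- Columns that allow NULLs and have a default bound:
-- a NULL bcp'd into these silently becomes the default value.
select object_name(c.id) as table_name, c.name as column_name
from syscolumns c, sysobjects o
where c.id = o.id
  and o.type = 'U'            -- user tables only
  and c.cdefault != 0         -- a default is bound to the column
  and (c.status & 8) = 8      -- the column allows NULLs
go
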
Joe Woodhouse, Principal Consultant
Most Valuable Expert 2012

Commented:
Of course the fastest way to do something is to just not do it at all.

Have you considered a Replication-based alternative? You can take as long as you like to get the data copied without any impact to production.
