
INSERT INTO too slow in Sybase

I am trying to run a query in Sybase 11 and have the following code:

create table #temp_prod(
                     tin                       char(10) NULL,
                     tin9                       char(9)  NULL,
                     address_type            char(1)  NULL,  
                     company_name            char(50) NULL,
                     use_ind                 char(3)  NULL,
                     encrypt_tin                  char(10) NULL)

create nonclustered index i_producer_tin on #temp_prod(tin)


INSERT INTO #temp_prod (tin, tin9, address_type, company_name, use_ind, encrypt_tin)
SELECT p.tin, substring(p.tin,1,9), "",
"company_name" =
CASE
   WHEN p.company_name = "" then "No Company Name"
   ELSE p.company_name
END
,
u.use_ind, p.encrypt_tin
FROM producer p, producer_use u
WHERE p.tin = u.tin AND p.term_dt = "12/31/9999" AND u.use_ind IN ("ALL", "PHYS")

When I run the query, because I have over 1,000,000 rows in the producer table, it takes about 45 minutes to run. Is there a more efficient way of running the query like BEGIN TRANS, WHILE Loop, etc?

Thanks,

Mark
Asked by: Mark1110

1 Solution
 
grant300Commented:
First, SELECT INTO is much faster than INSERT SELECT.  Second, putting the index on the table before it has rows in it is really a waste of resources.  The fastest way to build an index is against a populated table.

FYI, SELECT INTO is much faster because it is a minimally logged operation.  It does the equivalent of a Fast Bulk Copy into the table and only logs the metadata and space allocation information.

So....

 - Get rid of the CREATE TABLE
 - Change to a SELECT INTO
 - Move the CREATE INDEX down in the code so it is after the SELECT INTO

A note about indexes.  You do not specify the index as UNIQUE.  Unique indexes are generally much faster than non-unique.  If the TIN is unique, definitely make the index unique.  If TIN is not unique, add an additional field (or fields) until you can make a unique index.
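As a sketch of that advice (use_ind here is only an illustration of an extra column; pick whichever columns actually make the key unique in your data):

```sql
-- If tin is unique by itself:
CREATE UNIQUE NONCLUSTERED INDEX #i_producer_tin ON #temp_prod(tin)

-- If not, widen the key until the combination is unique
-- (use_ind is just an example of an additional column):
CREATE UNIQUE NONCLUSTERED INDEX #i_producer_tin ON #temp_prod(tin, use_ind)
```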

BTW, avoid the use of double quotes altogether.  The only time you use double quotes in SQL is when you have to use a Quoted Identifier; that is, a column or table name that is a reserved word that would otherwise cause a syntax error.  Use single quotes so the query processor does not have to search through the list of all identifiers before deciding you have a literal value.

SELECT p.tin, substring(p.tin,1,9) AS tin9, '' AS address_type,
             CASE WHEN p.company_name = ''
                        THEN 'No Company Name'
                        ELSE p.company_name
               END AS company_name,
             u.use_ind, p.encrypt_tin
    INTO #temp_prod
   FROM producer p, producer_use u
WHERE p.tin = u.tin
     AND p.term_dt = '12/31/9999'
     AND u.use_ind IN ('ALL', 'PHYS')

CREATE UNIQUE NONCLUSTERED INDEX #i_producer_tin ON #temp_prod(tin)

There may also be some query tuning you can do on the Producer and Producer_Use tables.  Are there any indexes that might help?  What is the cardinality of Producer_Use?  Depending on the situation, you might want to make Producer_Use the driving table, particularly if it is significantly smaller than Producer.  You can also hint the index, assuming there is one on Producer_Use.tin.  You might also consider adding use_ind to the tin index on Producer_Use.  While you are undoubtedly doing a table scan on Producer, you want to make sure you are doing an indexed lookup on Producer_Use.
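A sketch of forcing the join order and hinting the index (the index name i_use_tin is an assumption; substitute whatever index actually exists on producer_use.tin):

```sql
set forceplan on   -- join the tables in the order listed in the FROM clause
go
-- Hint the assumed tin index on producer_use with the (index ...) clause
SELECT p.tin, u.use_ind
  FROM producer_use u (index i_use_tin), producer p
 WHERE p.tin = u.tin
   AND u.use_ind IN ('ALL', 'PHYS')
go
set forceplan off
go
```

forceplan is a blunt instrument; check the showplan output before and after to confirm it actually helped.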

Oh, I almost forgot.  As with any tuning effort on Sybase, start by updating the statistics on all the tables involved.
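For the tables in the question, that is simply:

```sql
-- Refresh optimizer statistics on both source tables before tuning
update statistics producer
update statistics producer_use
```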

The best speed you will get will depend on how long the underlying SELECT takes to run.  You will find it will stuff the results into a temp table faster than it can return them to a client application, e.g. isql.

Finally, prefix the index name with a '#' character when you create it on #temp_prod.  It probably does not make a difference; however, there is a chance you are otherwise creating a permanent index against a temp table.

If you are still having problems, run the query with SET SHOWPLAN ON and SET NOEXEC ON and post the results.
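A sketch of that diagnostic session (with NOEXEC on, the query is compiled and its plan printed, but not executed):

```sql
set showplan on
set noexec on
go
-- the query to diagnose goes here
go
set noexec off
set showplan off
go
```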

Best of luck,
Bill
 
ghostdog74Commented:
i suggest using bcp instead.
get all your data from the query:

SELECT p.tin, substring(p.tin,1,9), "",
"company_name" =
CASE
   WHEN p.company_name = "" then "No Company Name"
   ELSE p.company_name
END
,
u.use_ind, p.encrypt_tin
FROM producer p, producer_use u
WHERE p.tin = u.tin AND p.term_dt = "12/31/9999" AND u.use_ind IN ("ALL", "PHYS")


into a bcp file, then use bcp.
 
grant300Commented:
ghostdog74:

That technique has a couple of problems.  First, you cannot BCP into a temporary table.  Second, it will be slower than the SELECT/INTO.  This is because you have to extract all the data from the context of the database engine (TDS and all the rest of the network protocol), write it to a flat file (a bunch of disk I/Os), incur the overhead of converting the binary data types to characters, then read the flat file (yet more disk I/Os), convert the character data types back to the internal format, and do the bulk load of the table.  If the table already has the index on it, you will be doing "slow" bcp and logging the changes to the index along the way.  The BCP in cannot even start until the query is completely finished, either.

You can get around the data type conversions by creating a view of the query and doing a native BCP out but there is still a lot going against it, not the least of which is that it needs to be done outside the database in scripts.

The SELECT/INTO does the equivalent of a Fast BCP without all the additional overhead.  In fact, the database option you have to have enabled to do SELECT/INTO is called "select into/bulkcopy".
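If that option is not already enabled, it is set with sp_dboption from master ('your_db' is a placeholder; for #temp tables the relevant database is tempdb, where the option is typically on by default):

```sql
use master
go
-- Enable minimally logged SELECT INTO / fast bcp for the database
sp_dboption your_db, 'select into/bulkcopy', true
go
use your_db
go
checkpoint
go
```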

Regards,
Bill

 
Mark1110Author Commented:
I tried a SELECT INTO but I got this error message: "SELECT INTO failed because column 2 in table #temp_prod has a null column name. Null column names are not allowed".  Is there a switch I can use to let the query accept NULLs?
 
grant300Commented:
It is not that the column does not accept nulls; it is that the "column name cannot be null".

If you cut and paste my version of the query, you will see that I aliased all of the columns, including tin9:

.......substring(p.tin,1,9) AS tin9........

Try that and see if it resolves the error.

Regards,
Bill
 
Jan FranekCommented:
Just one quick question - is it the INSERT or the SELECT that is slow?  You wrote that you have over a million rows in producer, but you didn't specify how many of them you are inserting into #temp_prod.

Can you run this query and see how long it takes to execute and how many rows it returns?

SELECT count(1)
FROM producer p, producer_use u
WHERE p.tin = u.tin AND p.term_dt = "12/31/9999" AND u.use_ind IN ("ALL", "PHYS")
 
Joe WoodhousePrincipal ConsultantCommented:
Agree with all the advice so far, nothing to add there.

I can't entirely resist one quibble though:

> The BCP in cannot even start until the query is completely finished either.

That turns out not to be the case. The only way to bcp "out of a query" is to create a view based on the query, and bcp out of that. Depending on the type of query it may be able to return rows as soon as they are processed. I don't see anything in this SELECT that would rule that out (the usual suspects are GROUP BY, DISTINCT, ORDER BY, HAVING, perhaps some kinds of outer joins, etc).

If we're clever with UNIX named pipes, we could in fact create a named pipe, bcp *in* to the destination table from the named pipe, then start the bcp *out* of the view to the named pipe. Depending on how the platform implements named pipes, and whether the bcp in can keep up with the bcp out, we might do it entirely in memory and not need any disk I/Os at all.

Still, I don't think it's the best solution, too many moving parts, and it probably won't outperform a SELECT INTO. Not being able to bcp into a temp table is also an issue as Bill says. 8-)
 
grant300Commented:
Of course, Joe, you're correct.  It can be made to work.  (BTW, we haven't heard from you in a while.  Welcome back.)

Besides the fact that it is a very complex way to emulate the SELECT/INTO, my experience with named pipes is that their performance is not always as good as one might hope.  The buffer size on most platforms defaults to just 4K, which makes for a pretty inelastic link between two processes, so there tends to be a lot of 'hurry up and wait'.

The one place it might make sense from a performance standpoint is between two database instances.  It might (big if) be faster to NFS-mount a remote named pipe so you could bcp out of one database and directly into another.  Of course, you have to test this against CIS to be certain.  I have built a migration tool that uses CIS and had problems getting the database to turn off the distributed transaction stuff.  Oh, well; we digress.

Regards,
Bill
 
Joe WoodhousePrincipal ConsultantCommented:
My limited experience says a well tuned CIS method beats a named pipe over NFS, but the named pipe is better than two local bcps with an ftp/rcp in the middle. (And tremendously better than scp unless you use no security, in which case why use scp?)

Anyway, none of this is helping the actual problem here. I have nothing to add to what you and Jan have already covered. 8-)
