
INSERT INTO too slow in Sybase

I am trying to run a query in Sybase 11 and have the following code:

create table #temp_prod(
                     tin                       char(10) NULL,
                     tin9                       char(9)  NULL,
                     address_type            char(1)  NULL,  
                     company_name            char(50) NULL,
                     use_ind                 char(3)  NULL,
                     encrypt_tin                  char(10) NULL)

create nonclustered index i_producer_tin on #temp_prod(tin)


INSERT INTO #temp_prod (tin, tin9, address_type, company_name, use_ind, encrypt_tin)
SELECT p.tin, substring(p.tin,1,9), "",
"company_name" =
CASE
   WHEN p.company_name = "" then "No Company Name"
   ELSE p.company_name
END
,
u.use_ind, p.encrypt_tin
FROM producer p, producer_use u
WHERE p.tin = u.tin AND p.term_dt = "12/31/9999" AND u.use_ind IN ("ALL", "PHYS")

When I run the query, because I have over 1,000,000 rows in the producer table, it takes about 45 minutes to run. Is there a more efficient way of running the query like BEGIN TRANS, WHILE Loop, etc?

Thanks,

Mark
Asked by: Mark1110

1 Solution
 
grant300Commented:
First, SELECT INTO is much faster than INSERT SELECT.  Second, putting the index on the table before it has rows in it is really a waste of resources.  The fastest way to build an index is against a populated table.

FYI, SELECT INTO is much faster because it is a minimally logged operation.  It does the equivalent of a Fast Bulk Copy into the table and only logs the metadata and space allocation information.

So....

 - Get rid of the CREATE TABLE
 - Change to a SELECT INTO
 - Move the CREATE INDEX down in the code so it is after the SELECT INTO

A note about indexes.  You do not specify the index as UNIQUE.  Unique indexes are generally much faster than non-unique.  If the TIN is unique, definitely make the index unique.  If TIN is not unique, add an additional field (or fields) until you can make a unique index.
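As a sketch of that advice (use_ind here is only an illustration of an extra column; pick whichever columns actually make the key unique in your data):

```sql
-- If tin is unique by itself:
CREATE UNIQUE NONCLUSTERED INDEX #i_producer_tin ON #temp_prod(tin)

-- If not, widen the key until the combination is unique
-- (use_ind is just an example of an additional column):
CREATE UNIQUE NONCLUSTERED INDEX #i_producer_tin ON #temp_prod(tin, use_ind)
```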

BTW, avoid the use of double quotes altogether.  The only time you use double quotes in SQL is when you have to use a Quoted Identifier; that is, a column or table name that is a reserved word that would otherwise cause a syntax error.  Use single quotes so the query processor does not have to search through the list of all identifiers before deciding you have a literal value.

SELECT p.tin, substring(p.tin,1,9) AS tin9, '' AS address_type,
             CASE WHEN p.company_name = ''
                        THEN 'No Company Name'
                        ELSE p.company_name
               END AS company_name,
             u.use_ind, p.encrypt_tin
    INTO #temp_prod
   FROM producer p, producer_use u
WHERE p.tin = u.tin
     AND p.term_dt = '12/31/9999'
     AND u.use_ind IN ('ALL', 'PHYS')

CREATE UNIQUE NONCLUSTERED INDEX #i_producer_tin ON #temp_prod(tin)

There may also be some query tuning you can do on the Producer and Producer_Use tables.  Are there any indexes that might help?  What is the cardinality of Producer_Use?  Depending on the situation, you might want to make Producer_Use the driving table, particularly if it is significantly smaller than Producer.  You can also hint the index, assuming there is one on Producer_Use.tin.  You might also consider adding use_ind to the tin index on Producer_Use.  While you are undoubtedly doing a table scan on Producer, you want to make sure you are doing an indexed lookup on Producer_Use.
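A sketch of forcing the join order and hinting the index (the index name i_use_tin is an assumption; substitute whatever index actually exists on producer_use.tin):

```sql
set forceplan on   -- join the tables in the order listed in the FROM clause
go
-- Hint the assumed tin index on producer_use with the (index ...) clause
SELECT p.tin, u.use_ind
  FROM producer_use u (index i_use_tin), producer p
 WHERE p.tin = u.tin
   AND u.use_ind IN ('ALL', 'PHYS')
go
set forceplan off
go
```

forceplan is a blunt instrument; check the showplan output before and after to confirm it actually helped.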

Oh, I almost forgot.  As with any tuning effort on Sybase, start by updating the statistics on all the tables involved.
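For the tables in the question, that is simply:

```sql
-- Refresh optimizer statistics on both source tables before tuning
update statistics producer
update statistics producer_use
```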

The best speed you will get will depend on how long the underlying SELECT takes to run.  You will find it will stuff the results into a temp table faster than it can return them to a client application, e.g. isql.

Finally, prefix the index name with a '#' character when you create it on #temp_prod.  It probably does not make a difference; however, there is a chance you are otherwise creating a permanent index against a temp table.

If you are still having problems, run the query with SET SHOWPLAN ON and SET NOEXEC ON and post the results.
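A sketch of that diagnostic session (with NOEXEC on, the query is compiled and its plan printed, but not executed):

```sql
set showplan on
set noexec on
go
-- the query to diagnose goes here
go
set noexec off
set showplan off
go
```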

Best of luck,
Bill
 
ghostdog74Commented:
i suggest using bcp instead.
get all your data from the query:

SELECT p.tin, substring(p.tin,1,9), "",
"company_name" =
CASE
   WHEN p.company_name = "" then "No Company Name"
   ELSE p.company_name
END
,
u.use_ind, p.encrypt_tin
FROM producer p, producer_use u
WHERE p.tin = u.tin AND p.term_dt = "12/31/9999" AND u.use_ind IN ("ALL", "PHYS")


into a bcp file, then use bcp.
 
grant300Commented:
ghostdog74:

That technique has a couple of problems.  First, you cannot BCP into a temporary table.  Second, it will be slower than the SELECT/INTO.  This is because you have to extract all the data from the context of the database engine (TDS and all the rest of the network protocol), write it to a flat file (a bunch of disk I/Os), incur the overhead of converting the binary data types to characters, then read the flat file (yet more disk I/Os), convert the character data types back to the internal format, and do the bulk load of the table.  If the table already has the index on it, you will be doing "slow" bcp and logging the changes to the index along the way.  The BCP in cannot even start until the query is completely finished, either.

You can get around the data type conversions by creating a view of the query and doing a native BCP out but there is still a lot going against it, not the least of which is that it needs to be done outside the database in scripts.

The SELECT/INTO does the equivalent of a Fast BCP without all the additional overhead.  In fact, the database option you have to have enabled to do SELECT/INTO is called "select into/bulkcopy".
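If that option is not already enabled, it is set with sp_dboption from master ('your_db' is a placeholder; for #temp tables the relevant database is tempdb, where the option is typically on by default):

```sql
use master
go
-- Enable minimally logged SELECT INTO / fast bcp for the database
sp_dboption your_db, 'select into/bulkcopy', true
go
use your_db
go
checkpoint
go
```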

Regards,
Bill

 
Mark1110Author Commented:
I tried a SELECT INTO but I got this error message: "SELECT INTO failed because column 2 in table #temp_prod has a null column name. Null column names are not allowed".  Is there a switch I can use to let the query accept NULLs?
 
grant300Commented:
It is not that the column does not accept nulls; it is that the "column name cannot be null".

If you cut and paste my version of the query, you will see that I aliased all of the columns, including tin9:

.......substring(p.tin,1,9) AS tin9........

Try that and see if it resolves the error.

Regards,
Bill
 
Jan FranekCommented:
Just one quick question - is it the INSERT or the SELECT that is slow?  You wrote that you have over a million rows in producer, but you didn't specify how many of them you are inserting into #temp_prod.

Can you run this query and see how long it takes to execute and how many rows it returns?

SELECT count(1)
FROM producer p, producer_use u
WHERE p.tin = u.tin AND p.term_dt = "12/31/9999" AND u.use_ind IN ("ALL", "PHYS")
 
Joe WoodhousePrincipal ConsultantCommented:
Agree with all the advice so far, nothing to add there.

I can't entirely resist one quibble though:

> The BCP in cannot even start until the query is completely finished either.

That turns out not to be the case. The only way to bcp "out of a query" is to create a view based on the query, and bcp out of that. Depending on the type of query it may be able to return rows as soon as they are processed. I don't see anything in this SELECT that would rule that out (the usual suspects are GROUP BY, DISTINCT, ORDER BY, HAVING, perhaps some kinds of outer joins, etc).

If we're clever with UNIX named pipes, we could in fact create a named pipe, bcp *in* to the destination table from the named pipe, then start the bcp *out* of the view to the named pipe. Depending on how the platform implements named pipes, and whether the bcp in can keep up with the bcp out, we might do it entirely in memory and not need any disk I/Os at all.

Still, I don't think it's the best solution, too many moving parts, and it probably won't outperform a SELECT INTO. Not being able to bcp into a temp table is also an issue as Bill says. 8-)
 
grant300Commented:
Of course, Joe, you're correct.  It can be made to work.  (BTW, we haven't heard from you in a while.  Welcome back.)

Besides the fact that it is a very complex way to emulate the SELECT/INTO, my experience with named pipes is that their performance is not always as good as one might hope.  The buffer size on most platforms defaults to just 4K, which makes for a pretty inelastic link between two processes, so there tends to be a lot of 'hurry up and wait'.

The one place it might make sense from a performance standpoint is between two database instances.  It might (big if) be faster to NFS-mount a remote named pipe so you could bcp out of one database and directly into another.  Of course, you have to test this against CIS to be certain.  I have built a migration tool that uses CIS and had problems getting the database to turn off the distributed transaction stuff.  Oh, well; we digress.

Regards,
Bill
 
Joe WoodhousePrincipal ConsultantCommented:
My limited experience says a well tuned CIS method beats a named pipe over NFS, but the named pipe is better than two local bcps with an ftp/rcp in the middle. (And tremendously better than scp unless you use no security, in which case why use scp?)

Anyway, none of this is helping the actual problem here. I have nothing to add to what you and Jan have already covered. 8-)
