VB.NET iterative MS SQL 2005 lookup

I am pretty novice at VB.NET, so I ask your patience.
I have made a Visual Studio 2003 VB.NET application that essentially splits a CSV file out into an array,
does some conditional processing, and creates an XML file for each row. The CSV file contains about 40,000 records (rows).
These XML files contain a pointer (path) to an image file, for the purpose of importing both the metadata contained in the XML file (the descriptor) and the image into a content management system.
The part I am unsure about: for each record, I want to do a lookup into an MS SQL 2005 table using one of the array values. I then add the products of the lookup to the appropriate elements in the XML file,
and then proceed to the next row.
My question is: should I be using a data adapter, a data reader, or just a plain query or stored procedure?
I have it working with a data adapter, but when I run the application on the whole 40,000-record file, the application looks like it freezes. It actually completes, and there are 40,000 XML files with the appropriate results, but it takes about 1.5 hours to process the entire 40,000 records, and the processor is maxed out the whole time. Does this sound normal?
If someone could advise me whether I am using the correct methodology for this, I would be very grateful.
Thank you
Jeff Certain commented:
I'm not entirely sure I understand your process. However, I'll make some comments and see if we can't improve your application.

1. Any time you go out to a file, it's slower than working in memory. If the only purpose of the XML metadata file is to be put into the database, you could probably just keep that data in memory.
2. 40000 files may take a lot of time to process. However, I'd guess your real issue is going to be in the database connections/network latency. In this particular scenario, I'd likely do all my processing into an in-memory datatable, then open the database connection, and update the table. [As a side note, there are mechanisms to do bulk copy in .NET 2.0 -- it might be worth upgrading just to get these]
3. A data reader cannot put data in the database. It only reads.
4. If you're updating a single record at a time, stored procedures should work just fine. Often, using data adapters for updates just causes more confusion, since people don't use them correctly.
5. Check your memory usage as well as CPU time. If you have a lot of applications running and you're memory-constrained, you may end up with "thrashing" -- which will make your application take much longer to run.
6. Run this application on the CMS/DB box if possible. While there are concerns with this in some cases, the more you can minimize your network latency, the better.
7. How big are your images? If they're obnoxiously large, you may want to consider resizing them if appropriate. (If they're 1 MB each, you're transferring 40 GB of data... that can take a while on a slow pipe.)
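For the single-record case in point 4, a parameterized lookup with a data reader is the lightest option. A minimal VB.NET sketch, assuming an already-open SqlConnection and a hypothetical table `LegacyRecords` with a 9-character key column `DocKey` (none of these names are from the original post):

```
Imports System.Data
Imports System.Data.SqlClient

Public Function LookupRecord(ByVal conn As SqlConnection, ByVal key As String) As SqlDataReader
    ' Parameterized query: the plan is cached and the key is never
    ' concatenated into the SQL text.
    Dim cmd As New SqlCommand( _
        "SELECT Title, Author, DocDate FROM LegacyRecords WHERE DocKey = @key", conn)
    cmd.Parameters.Add("@key", SqlDbType.VarChar, 9).Value = key
    ' SingleRow hints to the provider that at most one record comes back.
    Return cmd.ExecuteReader(CommandBehavior.SingleRow)
End Function
```

The caller would Read() once, copy the columns into the XML descriptor, and Close() the reader before the next lookup.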
mstucchi (Author) commented:
The images are really not part of the equation, as the XML merely describes their location, but they are typically around 50 to 150 KB.
Basically I am retrieving a value from the CSV file, looking it up in a SQL table, and returning what's found to
the XML file. There is no writing to the database, only to the XML file that I create.
I need only return one row from my lookup, then retrieve some values from that row to write to the XML file representing that record.
This process is part of a legacy data/image migration into a more current document management system.
My question really centers on the best way to iteratively pass a single unique parameter to query a table, return the row, retrieve some values, write them to an XML file, and then
repeat that query with another row from the CSV file, around 40,000 times.
Read the values from the CSV file, query the table, write what's found to the XML. That's pretty much it.
I am looking for a recommendation for the best way to do that.
Jeff Certain commented:
The best way is to NOT do what you describe. You should NOT query the database 40K times... you should ask it once for all the data, put it in memory, and then refer to the local copy of your data.
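When the lookup table fits in memory, that one-query approach might look like this in VB.NET. This is a sketch only: the table name `LegacyRecords`, the key column `DocKey`, the returned columns, and the open connection `conn` are all assumptions for illustration.

```
Imports System.Data
Imports System.Data.SqlClient

' Fill the whole lookup table once, up front.
Dim table As New DataTable
Dim da As New SqlDataAdapter("SELECT DocKey, Title, Author FROM LegacyRecords", conn)
da.Fill(table)

' A primary key gives Rows.Find an indexed in-memory lookup.
table.PrimaryKey = New DataColumn() {table.Columns("DocKey")}

' Then, inside the per-CSV-row loop: no round trip to the server.
Dim row As DataRow = table.Rows.Find(csvKey)
If Not row Is Nothing Then
    ' Write row("Title"), row("Author") into the XML descriptor here.
End If
```

Whether this is feasible depends entirely on how large the lookup table is, since the whole result set has to fit in the application's memory.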
mstucchi (Author) commented:
Aaah, that's what I was looking for.
This is the first time I have done this on this scale, so please forgive me if I appear dense.
I want to do this the right way.
So now let me ask you this: this table has 18 million records. Should I load that into a data adapter
and then query it?
Jeff Certain commented:
Ouch... 18M records? Any way of knowing which 40K you'll need?
mstucchi (Author) commented:
No, it is a data extract from an old mainframe application. What I am looking for could be anywhere in those 18 million records.
After pondering your advice, I thought I might try to read and process maybe 100 rows at a time, get 100 parameters, and use an IN clause in my SELECT query.
But I am not sure how to keep my place in the CSV file.
Jeff Certain commented:
Well, let's try something a little different.

Read the whole 40K records in from the csv file. Unless there's something really unexpected there, that won't be prohibitive. Then you can process the resultant dataset a hundred rows at a time.

Make sure you leave your actual database connection open the whole time.

If you want to get fancy, you could give the list of keys to a BackgroundWorker to go off and process them on its own thread. Just be careful of your connection if you're taking this approach.

Keep in mind that your SQL query (including the IN statement) cannot exceed 4096 characters.
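The batching idea above can be sketched in VB.NET as follows. Assumptions for illustration: `keys` is the array of 9-character keys already read from the CSV, `conn` is an open SqlConnection, and `LegacyRecords`/`DocKey` are hypothetical names.

```
Imports System.Text
Imports System.Data
Imports System.Data.SqlClient

Const BatchSize As Integer = 100
Dim i, j As Integer

For i = 0 To keys.Length - 1 Step BatchSize
    ' Build a quoted, comma-separated list for this batch of keys.
    Dim sb As New StringBuilder
    For j = i To Math.Min(i + BatchSize, keys.Length) - 1
        If sb.Length > 0 Then sb.Append(",")
        sb.Append("'").Append(keys(j).Replace("'", "''")).Append("'")
    Next

    ' One round trip per batch instead of one per record;
    ' conn stays open for the whole run.
    Dim cmd As New SqlCommand( _
        "SELECT DocKey, Title, Author FROM LegacyRecords WHERE DocKey IN (" & _
        sb.ToString() & ")", conn)
    Dim dr As SqlDataReader = cmd.ExecuteReader()
    While dr.Read()
        ' Match dr("DocKey") back to its CSV row and write that XML descriptor.
    End While
    dr.Close()
Next
```

Note that the server does not guarantee the order of the rows it returns, so each result row has to be matched back to its CSV record by key, not by position.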
mstucchi (Author) commented:
I appreciate the time you are spending on this, so thank you.
The fields are fixed width, so the parameters would be nine characters each, plus a delimiter.
Can I read the entire 40,000 into an array so I can count through 100 or so at a time?
Jeff Certain commented:
A dataset would probably be better than an array, although either would work.

Sounds like you should be able to do a few hundred at a time without any problem from the SQL query string length.

I'd highly advise making sure that the database has an index on whatever you're searching against. If it's a primary key, then you've already got it; otherwise, it's worth making sure.

mstucchi (Author) commented:
Thank you for your help; I think you've set me on the right track. I've got my loops working through my CSV file and keeping track of where I've been. Now I just have to plug in the rest of the logic to process the groups I have isolated. It should reduce the trips to the database from 40,000 to about 400,
which hopefully will reduce the I/O and resource utilization.
Thanks again.