

VB.NET iterative MS SQL 2005 lookup

Posted on 2007-04-10
Medium Priority
Last Modified: 2013-11-25
I am pretty novice at VB.NET, so I ask your patience.
I have made a Visual Studio 2003 VB.NET application that essentially splits a CSV file out into an array, does some conditional processing, and creates an XML file for each row. The CSV file contains about 40,000 records (rows).
These XML files contain a pointer (path) to an image file, for the purpose of importing both the metadata contained in the XML file (the descriptor) and the image into a content management system.
The part I am unsure about: for each record, I want to look up one of the array values in an MS SQL 2005 table, add the results of the lookup to the appropriate elements in the XML file, and then proceed to the next row.
My question is: should I be using a data adapter, a data reader, or just a plain query or stored procedure?
I have it working with a data adapter, but when I run the application on the whole 40,000-record file, the application looks like it freezes. It actually completes: after about 1.5 hours there are 40,000 XML files with the appropriate results, and the processor is maxed the whole time. Does this sound normal?
So if someone could advise me whether I am using the correct methodology for this, I would be very grateful.
Thank you
Question by:mstucchi
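[Editor's note: as described, the current implementation amounts to one database round trip per CSV row. A rough sketch of that pattern in .NET 1.1-era VB.NET — table and column names (`Parts`, `PartKey`) are invented placeholders, not from the question:]

```vbnet
' Sketch of a per-row lookup: one SQL round trip for every CSV row.
' Placeholder schema: Parts table keyed on a 9-character PartKey.
Imports System.Data
Imports System.Data.SqlClient

Module PerRowLookup
    Sub ProcessCsv(ByVal lines() As String, ByVal connStr As String)
        Dim conn As New SqlConnection(connStr)
        conn.Open()
        Try
            For Each line As String In lines
                Dim fields() As String = line.Split(","c)
                Dim cmd As New SqlCommand( _
                    "SELECT * FROM Parts WHERE PartKey = @key", conn)
                cmd.Parameters.Add("@key", SqlDbType.VarChar, 9).Value = fields(0)
                Dim dt As New DataTable
                Dim da As New SqlDataAdapter(cmd)
                da.Fill(dt)    ' one network round trip per CSV row
                ' ... write dt.Rows(0) values into this record's XML file ...
            Next
        Finally
            conn.Close()
        End Try
    End Sub
End Module
```

With 40,000 rows, the per-row `Fill` round trips alone can plausibly account for much of the 1.5-hour runtime.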
LVL 24

Expert Comment

by:Jeff Certain
ID: 18885899
I'm not entirely sure I understand your process. However, I'll make some comments and see if we can't improve your application.

1. Any time you go out to a file, it's slower than working in memory. If the only purpose of the XML metadata file is to be put into the database, you could probably just keep that data in memory.
2. 40000 files may take a lot of time to process. However, I'd guess your real issue is going to be in the database connections/network latency. In this particular scenario, I'd likely do all my processing into an in-memory datatable, then open the database connection, and update the table. [As a side note, there are mechanisms to do bulk copy in .NET 2.0 -- it might be worth upgrading just to get these]
3. A data reader cannot put data in the database. It only reads.
4. If you're updating a single record at a time, stored procedures should work just fine. Often, using data adapters for updates just causes more confusion, since people don't use them correctly.
5. Check your memory usage as well as CPU time. If you have a lot of applications running and you're memory-constrained, you may end up with "thrashing" -- which will make your application take much longer to run.
6. Run this application on the CMS/DB box if possible. While there are concerns with this in some cases, the more you can minimize your network latency, the better.
7. How big are your images? If they're obnoxiously large, you may want to consider resizing them if appropriate. (If they're 1 MB each, you're transferring 40 GB of data... that can take a while on a slow pipe.)

Author Comment

ID: 18886023
The images are really not part of the equation, as the XML merely describes their location, but they are typically around 50 to 150 KB.
Basically I am retrieving a value from the CSV file, looking it up in a SQL table, and returning what's found to the XML file. There is no writing to the database, only to the XML file that I create.
I need only return one row from my lookup, and then retrieve some values from that row to write to the XML file representing that record.
This process is part of a legacy data/image migration into a more current document management system.
My question really centers on the best way to iteratively pass a single unique parameter to query a table, return the row, retrieve some values, write them to an XML file, and then repeat that query with another row from the CSV file, around 40,000 times.
Read the values from the CSV file, query the table, write what's found to the XML. That's pretty much it.
I am looking for a recommendation for the best way to do that.
LVL 24

Expert Comment

by:Jeff Certain
ID: 18886046
The best way is to NOT do what you describe. You should NOT query the database 40K times... you should ask it once for all the data, put it in memory, and then refer to the local copy of your data.
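[Editor's note: a minimal sketch of that "query once, look up locally" approach, assuming the lookup set fits in memory — the `Parts` table and its columns are invented placeholders:]

```vbnet
' Fill a DataTable once, key it, then resolve each CSV value in memory.
Imports System.Data
Imports System.Data.SqlClient

Module LocalLookup
    Function LoadLookup(ByVal connStr As String) As DataTable
        Dim da As New SqlDataAdapter( _
            "SELECT PartKey, Description FROM Parts", _
            New SqlConnection(connStr))
        Dim dt As New DataTable
        da.Fill(dt)    ' single round trip for the whole lookup set
        dt.PrimaryKey = New DataColumn() {dt.Columns("PartKey")}
        Return dt
    End Function

    Sub WriteRecord(ByVal lookup As DataTable, ByVal key As String)
        Dim row As DataRow = lookup.Rows.Find(key)    ' in-memory lookup
        If Not row Is Nothing Then
            ' ... copy row("Description") etc. into this record's XML ...
        End If
    End Sub
End Module
```

Setting `PrimaryKey` lets `Rows.Find` use the table's internal index, so each local lookup stays fast.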

Author Comment

ID: 18886102
Aaah, that's what I was looking for.
This is the first time I have done this on this scale, so please forgive me if I appear dense. I want to do this the right way.
So now let me ask you this: the table has 18 million records. Should I load that into a data adapter and then query it?
LVL 24

Expert Comment

by:Jeff Certain
ID: 18886513
Ouch... 18M records? Any way of knowing which 40K you'll need?

Author Comment

ID: 18886539
No, it is a data extract from an old mainframe application. What I am looking for could be anywhere in those 18 million records.
After pondering your advice, I thought I might try to read and process maybe 100 rows at a time: get 100 parameters and use an IN clause in my SELECT query.
But I am not sure how to keep my place in the CSV file.
LVL 24

Expert Comment

by:Jeff Certain
ID: 18886602
Well, let's try something a little different.

Read the whole 40K records in from the csv file. Unless there's something really unexpected there, that won't be prohibitive. Then you can process the resultant dataset a hundred rows at a time.

Make sure you leave your actual database connection open the whole time.

If you want to get fancy, you could hand the list of keys to a BackgroundWorker to process on its own thread. Just be careful with your connection if you're taking this approach.

Keep in mind that your SQL query (including the IN statement) cannot exceed 4096 characters.
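[Editor's note: putting the advice above together, a rough sketch of the batched version. Placeholder names again; because the keys are fixed-width data under your control, the IN list is built with simple quoting here — with untrusted input you would validate or parameterize instead:]

```vbnet
' Read all 40K keys up front, then query in chunks of 100 with an
' IN clause, over a single connection held open for the whole run.
Imports System.Data
Imports System.Data.SqlClient

Module BatchLookup
    Sub ProcessInBatches(ByVal keys() As String, ByVal connStr As String)
        Dim conn As New SqlConnection(connStr)
        conn.Open()    ' stays open across all batches
        Try
            Dim i As Integer = 0
            While i < keys.Length
                Dim count As Integer = Math.Min(100, keys.Length - i)
                Dim quoted(count - 1) As String
                For j As Integer = 0 To count - 1
                    quoted(j) = "'" & keys(i + j).Replace("'", "''") & "'"
                Next
                Dim sql As String = _
                    "SELECT PartKey, Description FROM Parts " & _
                    "WHERE PartKey IN (" & String.Join(",", quoted) & ")"
                Dim dt As New DataTable
                Dim da As New SqlDataAdapter(sql, conn)
                da.Fill(dt)    ' one round trip per 100 keys
                For Each row As DataRow In dt.Rows
                    ' ... write this record's XML file from row ...
                Next
                i += count    ' our "place" is just the index into keys()
            End While
        Finally
            conn.Close()
        End Try
    End Sub
End Module
```

The index `i` into the in-memory key array replaces any need to keep a place in the CSV file itself.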

Author Comment

ID: 18886790
I appreciate the time you are spending on this, so thank you.
The fields are fixed-width, so the parameters would each be 9 characters plus a delimiter.
Can I read the entire 40,000 into an array so I can count through 100 or so at a time?
LVL 24

Accepted Solution

Jeff Certain earned 500 total points
ID: 18888495
A dataset would probably be better than an array, although either would work.

Sounds like you should be able to do a few hundred at a time without any problem from the SQL query string length.

I'd highly advise making sure that the database has an index on whatever you're searching against. If it's a primary key, then you've already got it; otherwise, it's worth making sure.
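[Editor's note: for illustration, assuming the lookup column were named `PartKey` on a `Parts` table (both invented names), the index would look like:]

```sql
-- Hypothetical names; substitute the real table and column.
-- An index on the lookup column turns each IN-clause probe into a
-- seek instead of a scan over all 18M rows.
CREATE INDEX IX_Parts_PartKey ON dbo.Parts (PartKey);
```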

Author Comment

ID: 18899732
Thank you for your help. I think you've set me on the right track: I've got my loops working through my CSV file and keeping track of where I've been. Now I just have to plug in the rest of the logic to process the groups I have isolated. It should reduce the trips to the database from 40,000 to 400.
Hopefully that will reduce the I/O and resource utilization.
Thanks again.


Question has a verified solution.
