Solved

VB.NET iterative MS SQL 2005 lookup

Posted on 2007-04-10
273 Views
Last Modified: 2013-11-25
I am pretty novice at VB.NET, so I ask your patience.
I have made a Visual Studio 2003 VB.NET application that essentially splits out a CSV file to an array,
does some conditional processing, and creates an XML file for each row; the CSV file contains about 40,000 records (rows).
These XML files contain a pointer (path) to an image file, for the purpose of importing both the metadata contained in the XML file (the descriptor) and the image into a content management system.
The part I am unsure about is that I want to look up one of the array values in an MS SQL 2005 table, for each record. I then add the products of the lookup to the appropriate elements in the XML file,
and then proceed to the next row.
My question is: should I be using a data adapter, a data reader, or just a plain query or stored procedure?
I have it working with a data adapter, but when I run the application on the whole 40,000-record file, the application looks like it freezes. It actually completes, and there are 40,000 XML files with the appropriate results, but it takes about 1.5 hours to process the entire 40,000 records, and the processor is maxed. Does this sound normal?
So if someone could advise me whether I am using the correct methodology for this, I would be very grateful.
Thank you
Question by:mstucchi
10 Comments
 
LVL 24

Expert Comment

by:Jeff Certain
ID: 18885899
I'm not entirely sure I understand your process. However, I'll make some comments and see if we can't improve your application.

1. Any time you go out to a file, it's slower than working in memory. If the only purpose of the XML metadata file is to be put into the database, you could probably just keep that data in memory.
2. 40,000 files may take a lot of time to process. However, I'd guess your real issue is going to be the database connections/network latency. In this particular scenario, I'd likely do all my processing into an in-memory datatable, then open the database connection and update the table; see the sketch after this list. [As a side note, there are mechanisms to do bulk copy in .NET 2.0 -- it might be worth upgrading just to get these.]
3. A data reader cannot put data in the database. It only reads.
4. If you're updating a single record at a time, stored procedures should work just fine. Often, using data adapters for updates just causes more confusion, since people don't use them correctly.
5. Check your memory usage as well as CPU time. If you have a lot of applications running and you're memory-constrained, you may end up with "thrashing" -- which will make your application take much longer to run.
6. Run this application on the CMS/DB box if possible. While there are concerns with this in some cases, the more you can minimize your network latency, the better.
7. How big are your images? If they're obnoxiously large, you may want to consider resizing them if appropriate. (If they're 1 MB each, you're transferring 40 GB of data.... that can take a while on a slow pipe.)
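
To illustrate point 2, a minimal sketch of the in-memory DataTable plus bulk-copy idea, assuming .NET 2.0's SqlBulkCopy. The table, column, and connection-string values here are placeholders, not anything taken from the question:

Imports System.Data
Imports System.Data.SqlClient

Module BulkCopySketch
    Sub Main()
        ' Accumulate all the processed rows in memory first.
        Dim staging As New DataTable("LegacyImport")
        staging.Columns.Add("KeyValue", GetType(String))
        staging.Columns.Add("ImagePath", GetType(String))

        ' ... populate staging.Rows from the CSV processing loop ...
        staging.Rows.Add("000123456", "\\server\images\000123456.tif")

        ' Then push everything to the server in one operation.
        Using conn As New SqlConnection("Data Source=dbserver;Initial Catalog=CMS;Integrated Security=SSPI")
            conn.Open()
            Using bulk As New SqlBulkCopy(conn)
                bulk.DestinationTableName = "LegacyImport"
                bulk.WriteToServer(staging)   ' one round trip instead of thousands of inserts
            End Using
        End Using
    End Sub
End Module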
 

Author Comment

by:mstucchi
ID: 18886023
The images are really not part of the equation, as the XML merely describes their location; they are typically around 50 to 150 KB anyway.
Basically, I am retrieving a value from the CSV file, looking it up in a SQL table, and returning what's found to
the XML file. There is no writing to the database, only to the XML file that I create.
I need to return only one row from my lookup, and then retrieve some values from that row to write to the XML file representing that record.
This process is part of a legacy data/image migration into a more current document management system.
My question is really centered on the best way to iteratively pass a single unique parameter to query a table, return the row, retrieve some values, write them to an XML file, and then
repeat that query with another value from the CSV file, around 40,000 times.
Read the values from the CSV file, query the table, write what's found to the XML; that's pretty much it.
I am looking for a recommendation for the best way to do that.
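
For reference, the per-row lookup being described might look like the sketch below, written for .NET 1.1/VS 2003 (so no Using blocks). The table, the columns, and the WriteXmlDescriptor helper are hypothetical stand-ins; the next comment explains why issuing this query 40,000 times is the likely bottleneck:

Imports System.Data
Imports System.Data.SqlClient

Module PerRowLookupSketch
    Sub Main()
        ' The 9-character lookup values read from the CSV (abbreviated here).
        Dim csvKeys() As String = {"000123456", "000123457"}

        Dim conn As New SqlConnection("Data Source=dbserver;Initial Catalog=Legacy;Integrated Security=SSPI")
        Dim cmd As New SqlCommand("SELECT Title, Author FROM LegacyRecords WHERE KeyValue = @key", conn)
        cmd.Parameters.Add("@key", SqlDbType.Char, 9)

        conn.Open()
        Try
            For Each keyValue As String In csvKeys   ' one database round trip per CSV row
                cmd.Parameters("@key").Value = keyValue
                Dim reader As SqlDataReader = cmd.ExecuteReader()
                If reader.Read() Then
                    WriteXmlDescriptor(keyValue, reader("Title").ToString(), reader("Author").ToString())
                End If
                reader.Close()
            Next
        Finally
            conn.Close()
        End Try
    End Sub

    ' Hypothetical stand-in for the code that writes one XML descriptor file.
    Sub WriteXmlDescriptor(ByVal key As String, ByVal title As String, ByVal author As String)
        Console.WriteLine("{0}: {1} / {2}", key, title, author)
    End Sub
End Module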
 
LVL 24

Expert Comment

by:Jeff Certain
ID: 18886046
The best way is to NOT do what you describe. You should NOT query the database 40K times... you should ask it once for all the data, put it in memory, and then refer to the local copy of your data.
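
A sketch of that idea, reusing conn, csvKeys, and WriteXmlDescriptor from the sketch above, and assuming the whole lookup table fits in memory (the comments below deal with the case where it does not):

' One query up front, then purely in-memory lookups while walking the CSV.
Dim table As New DataTable
Dim adapter As New SqlDataAdapter("SELECT KeyValue, Title, Author FROM LegacyRecords", conn)
adapter.Fill(table)   ' single round trip; Fill opens and closes the connection itself

' Index the rows by key so each CSV row costs a hash lookup, not a query.
Dim index As New System.Collections.Hashtable
For Each row As DataRow In table.Rows
    index(row("KeyValue").ToString()) = row
Next

For Each keyValue As String In csvKeys
    Dim match As DataRow = CType(index(keyValue), DataRow)
    If Not match Is Nothing Then
        WriteXmlDescriptor(keyValue, match("Title").ToString(), match("Author").ToString())
    End If
Next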

 

Author Comment

by:mstucchi
ID: 18886102
Aaah, that's what I was looking for.
This is the first time I have done this on this scale, so please forgive me if I appear dense;
I want to do this the right way.
So now let me ask you this: the table has 18 million records. Should I load that into a data adapter
and then query it?
 
LVL 24

Expert Comment

by:Jeff Certain
ID: 18886513
Ouch.... 18M records? Any way of knowing which 40K you'll need?
 

Author Comment

by:mstucchi
ID: 18886539
No, it is a data extract from an old mainframe application, and what I am looking for could be anywhere in those 18 million records.
After pondering your advice, I thought I might try to read and process maybe 100 rows at a time: get 100 parameters and use an IN clause in my SELECT query.
But I am not sure how to keep my place in the CSV file.
 
LVL 24

Expert Comment

by:Jeff Certain
ID: 18886602
Well, let's try something a little different.

Read in the whole 40K records from the CSV file. Unless there's something really unexpected there, that won't be prohibitive. Then you can process the resultant dataset a hundred rows at a time.

Make sure you leave your actual database connection open the whole time.

If you want to get fancy, you could give the list of keys to a BackgroundWorker to go off and process them on its own thread. Just be careful with your connection if you're taking this approach.

Keep in mind that your SQL query (including the IN statement) cannot exceed 4096 characters.
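
Putting those pieces together, a sketch of the batched lookup, again reusing the hypothetical names from the earlier sketches (with csvKeys now holding all 40,000 keys read up front from the CSV). Binding one SqlParameter per key keeps the values out of the SQL text itself, so the query string stays short and fixed-length regardless of the data:

' Process the keys in fixed-size batches over a single open connection.
Const batchSize As Integer = 100
Dim conn As New SqlConnection("Data Source=dbserver;Initial Catalog=Legacy;Integrated Security=SSPI")
conn.Open()   ' opened once and left open for every batch
Try
    For start As Integer = 0 To csvKeys.Length - 1 Step batchSize
        Dim count As Integer = Math.Min(batchSize, csvKeys.Length - start)

        ' Build "@k0,@k1,..." and bind one parameter per key in this batch.
        Dim names(count - 1) As String
        Dim cmd As New SqlCommand
        cmd.Connection = conn
        For i As Integer = 0 To count - 1
            names(i) = "@k" & i.ToString()
            cmd.Parameters.Add(names(i), SqlDbType.Char, 9).Value = csvKeys(start + i)
        Next
        cmd.CommandText = "SELECT KeyValue, Title, Author FROM LegacyRecords " & _
                          "WHERE KeyValue IN (" & String.Join(",", names) & ")"

        ' One round trip returns every matching row in the batch.
        Dim reader As SqlDataReader = cmd.ExecuteReader()
        While reader.Read()
            WriteXmlDescriptor(reader("KeyValue").ToString(), _
                               reader("Title").ToString(), reader("Author").ToString())
        End While
        reader.Close()
    Next
Finally
    conn.Close()
End Try

Note that the rows come back in no particular order, and keys with no match simply do not appear in the results, so keep the current batch of keys around if missing records need to be flagged.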
 

Author Comment

by:mstucchi
ID: 18886790
I appreciate the time you are spending on this, so thank you.
The fields are fixed width, so the parameters would be 9 characters each, plus a delimiter.
Can I read the entire 40,000 into an array so I can count through 100 or so at a time?
 
LVL 24

Accepted Solution

by:
Jeff Certain earned 125 total points
ID: 18888495
A dataset would probably be better than an array, although either would work.

Sounds like you should be able to do a few hundred at a time without any problem from the SQL query string length.

I'd highly advise making sure that the database has an index on whatever you're searching against. If it's a primary key, then you've already got it; otherwise, it's worth making sure.
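
If the lookup column turns out not to be indexed, a one-time statement along these lines (the index, table, and column names are the same hypothetical ones as above) can be run before the migration; against an 18-million-row table, every un-indexed lookup would otherwise be a full scan:

' One-time setup; harmless to re-run thanks to the existence check.
Dim idx As New SqlCommand( _
    "IF NOT EXISTS (SELECT * FROM sys.indexes WHERE name = 'IX_LegacyRecords_KeyValue') " & _
    "CREATE INDEX IX_LegacyRecords_KeyValue ON LegacyRecords (KeyValue)", conn)
idx.ExecuteNonQuery()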
 

Author Comment

by:mstucchi
ID: 18899732
Thank you for your help; I think you've set me on the right track. I've got my loops working through my CSV file and keeping track of where I've been; now I just have to plug in the rest of the logic to process the groups I have isolated. It should reduce the trips to the database from 40,000 to 400.
Hopefully that will reduce the I/O and resource utilization.
Thanks again.
