Solved

VB.NET iterative MS SQL 2005 lookup

Posted on 2007-04-10
269 Views
Last Modified: 2013-11-25
I am pretty novice at VB.NET, so I ask your patience.
I have made a Visual Studio 2003 VB.NET application that essentially splits a CSV file out into an array, does some conditional processing, and creates an XML file for each row. The CSV file contains about 40,000 records (rows).
Each XML file contains a pointer (path) to an image file, for the purpose of importing both the metadata contained in the XML file (the descriptor) and the image into a content management system.
The part I am unsure about is this: for each record, I want to look up one of the array values in an MS SQL 2005 table, add the results of the lookup to the appropriate elements in the XML file, and then proceed to the next row.
My question is: should I be using a data adapter, a data reader, a plain query, or a stored procedure?
I have it working with a data adapter, but when I run the application on the whole 40,000-record file, the application looks like it freezes. It actually completes, and there are 40,000 XML files with the appropriate results, but it takes about 1.5 hours to process the entire 40,000 records, and the processor is maxed out the whole time. Does this sound normal?
If someone could advise me whether I am using the correct methodology for this, I would be very grateful.
Thank you.
Question by: mstucchi
10 Comments
 
LVL 24

Expert Comment

by:Jeff Certain
ID: 18885899
I'm not entirely sure I understand your process. However, I'll make some comments and see if we can't improve your application.

1. Any time you go out to a file, it's slower than working in memory. If the only purpose of the XML metadata file is to be put into the database, you could probably just keep that data in memory.
2. 40,000 files may take a lot of time to process. However, I'd guess your real issue is going to be the database connections/network latency. In this particular scenario, I'd likely do all my processing into an in-memory DataTable, then open the database connection and update the table (see the sketch after this list). [As a side note, there are mechanisms to do bulk copy in .NET 2.0 -- it might be worth upgrading just to get these.]
3. A data reader cannot put data in the database. It only reads.
4. If you're updating a single record at a time, stored procedures should work just fine. Often, using data adapters for updates just causes more confusion, since people don't use them correctly.
5. Check your memory usage as well as CPU time. If you have a lot of applications running and you're memory-constrained, you may end up with "thrashing" -- which will make your application take much longer to run.
6. Run this application on the CMS/DB box if possible. While there are concerns with this in some cases, the more you can minimize your network latency, the better.
7. How big are your images? If they're obnoxiously large, you may want to consider resizing them where appropriate. (If they're 1 MB each, you're transferring 40 GB of data... that can take a while on a slow pipe.)
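
For what it's worth, a minimal sketch of the in-memory approach from point 2, assuming the .NET 2.0 upgrade mentioned there (SqlBulkCopy is new in 2.0) and hypothetical table, column, and connection-string names:

Imports System.Data
Imports System.Data.SqlClient

Module BulkLoadSketch
    Sub Main()
        ' Build all the metadata rows in memory first, one per CSV record.
        Dim docs As New DataTable("Documents")
        docs.Columns.Add("DocKey", GetType(String))
        docs.Columns.Add("ImagePath", GetType(String))
        ' ... loop over the CSV here, adding one row per record ...
        docs.Rows.Add("000000001", "\\imgserver\scans\000000001.tif")

        ' Then open one connection and push everything across in one shot.
        Using conn As New SqlConnection("Data Source=dbserver;Initial Catalog=CMS;Integrated Security=SSPI")
            conn.Open()
            Using bulk As New SqlBulkCopy(conn)
                bulk.DestinationTableName = "dbo.Documents"
                bulk.WriteToServer(docs)
            End Using
        End Using
    End Sub
End Module

(This only applies if you end up writing the metadata to the database rather than to files.)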
 

Author Comment

by:mstucchi
ID: 18886023
The images are really not part of the equation, as the XML merely describes their location, but they are typically around 50 to 150 KB.
Basically, I am retrieving a value from the CSV file, looking it up in a SQL table, and returning what's found to the XML file. There is no writing to the database, only to the XML file that I create.
I need to return only one row from my lookup and then retrieve some values from that row to write to the XML file representing that record.
This process is part of a legacy data/image migration into a more current document management system.
My question is really centered around the best way to iteratively pass a single unique parameter to query a table, return the row, retrieve some values, write them to an XML file, and then repeat that query for another row in the CSV file, around 40,000 times.
Read the values from the CSV file, query the table, write what's found to the XML. That's pretty much it.
I am looking for a recommendation for the best way to do that.
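
For reference, a sketch of the per-row lookup being described here, done with a SqlDataReader rather than a data adapter; the table, column, and connection-string names are made up:

Imports System.Data
Imports System.Data.SqlClient

Module PerRowLookupSketch
    ' Called once per CSV row; cmd is a parameterized command reused across calls.
    Sub LookupOne(ByVal cmd As SqlCommand, ByVal csvKey As String)
        cmd.Parameters("@key").Value = csvKey
        Dim reader As SqlDataReader = cmd.ExecuteReader()
        If reader.Read() Then
            Dim title As String = reader.GetString(0)   ' values destined for the XML file
            Dim author As String = reader.GetString(1)
            ' ... write them into this record's XML ...
        End If
        reader.Close()
    End Sub

    Sub Main()
        Dim conn As New SqlConnection("Data Source=dbserver;Initial Catalog=Legacy;Integrated Security=SSPI")
        conn.Open()
        Dim cmd As New SqlCommand("SELECT Title, Author FROM dbo.LegacyIndex WHERE DocKey = @key", conn)
        cmd.Parameters.Add("@key", SqlDbType.Char, 9)
        LookupOne(cmd, "000123456")   ' in reality, looped 40,000 times
        conn.Close()
    End Sub
End Module

A reader is cheaper than an adapter for read-only work like this, but either way you still pay for 40,000 round trips.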
 
LVL 24

Expert Comment

by:Jeff Certain
ID: 18886046
The best way is to NOT do what you describe. You should NOT query the database 40K times... you should ask it once for all the data, put it in memory, and then refer to the local copy of your data.
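
A sketch of that load-once, look-up-locally pattern, with hypothetical names; note this assumes the lookup table is small enough to hold in memory:

Imports System.Data
Imports System.Data.SqlClient

Module LoadOnceSketch
    Sub Main()
        ' One query up front; Fill opens and closes the connection itself.
        Dim lookup As New DataTable
        Dim conn As New SqlConnection("Data Source=dbserver;Initial Catalog=Legacy;Integrated Security=SSPI")
        Dim da As New SqlDataAdapter("SELECT DocKey, Title, Author FROM dbo.LegacyIndex", conn)
        da.Fill(lookup)

        ' A primary key makes Rows.Find a fast in-memory lookup.
        lookup.PrimaryKey = New DataColumn() {lookup.Columns("DocKey")}

        ' Per CSV row: no database round trip at all.
        Dim hit As DataRow = lookup.Rows.Find("000123456")
        If Not hit Is Nothing Then
            Console.WriteLine(hit("Title"))
        End If
    End Sub
End Module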
 

Author Comment

by:mstucchi
ID: 18886102
Aaah, that's what I was looking for.
This is the first time I have done this on this scale, so please forgive me if I appear dense. I want to do this the right way.
So now let me ask you this: the table has 18 million records. Should I load that into a data adapter and then query it?
 
LVL 24

Expert Comment

by:Jeff Certain
ID: 18886513
Ouch.... 18M records? Any way of knowing which 40K you'll need?
 

Author Comment

by:mstucchi
ID: 18886539
No, it is a data extract from an old mainframe application. What I am looking for could be anywhere in those 18 million records.
After pondering your advice, I thought I might try to read and process maybe 100 rows at a time: get 100 parameters and use an IN clause in my SELECT query.
But I am not sure how to keep my place in the CSV file.
 
LVL 24

Expert Comment

by:Jeff Certain
ID: 18886602
Well, let's try something a little different.

Read the whole 40K records in from the CSV file. Unless there's something really unexpected there, that won't be prohibitive. Then you can process the resultant dataset a hundred rows at a time.

Make sure you leave your actual database connection open the whole time.

If you want to get fancy, you could give the list of keys to a BackgroundWorker to go off and process them on its own thread. Just be careful of your connection if you're taking this approach.

Keep in mind that there is a limit on how long your SQL query string (including the IN clause) can get, so keep the batches modest.
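
A rough sketch of that batched lookup, with hypothetical table, column, and connection-string names:

Imports System.Collections
Imports System.Data
Imports System.Data.SqlClient

Module BatchLookupSketch
    Sub Main()
        Dim keys As New ArrayList          ' all 40,000 keys, read from the CSV up front
        ' ... populate keys from the CSV file ...

        Dim conn As New SqlConnection("Data Source=dbserver;Initial Catalog=Legacy;Integrated Security=SSPI")
        conn.Open()                        ' one connection, held open for every batch

        Const BatchSize As Integer = 100
        Dim i As Integer
        For i = 0 To keys.Count - 1 Step BatchSize
            ' Build the quoted IN list for this batch: 'key1','key2',...
            Dim inList As New System.Text.StringBuilder
            Dim j As Integer
            For j = i To Math.Min(i + BatchSize, keys.Count) - 1
                If inList.Length > 0 Then inList.Append(",")
                inList.Append("'").Append(CStr(keys(j)).Replace("'", "''")).Append("'")
            Next

            Dim cmd As New SqlCommand( _
                "SELECT DocKey, Title, Author FROM dbo.LegacyIndex " & _
                "WHERE DocKey IN (" & inList.ToString() & ")", conn)
            Dim reader As SqlDataReader = cmd.ExecuteReader()
            While reader.Read()
                ' ... match DocKey back to its CSV row and write that record's XML ...
            End While
            reader.Close()
        Next

        conn.Close()
    End Sub
End Module

Each pass sends one IN query of about 100 quoted keys, so the 40,000 lookups collapse into roughly 400 round trips over a single open connection.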
 

Author Comment

by:mstucchi
ID: 18886790
I appreciate the time you are spending on this, so thank you.
The fields are fixed width, so the parameters would be 9 characters each, plus the delimiter.
Can I read the entire 40,000 into an array so I can count through 100 or so at a time?
 
LVL 24

Accepted Solution

by: Jeff Certain (earned 125 total points)
ID: 18888495
A DataSet would probably be better than an array, although either would work.

Sounds like you should be able to do a few hundred at a time without any problem from the SQL query string length.

I'd highly advise making sure that the database has an index on whatever you're searching against. If it's the primary key, then you've already got it; otherwise, it's worth checking.
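
Reading the whole CSV into memory first might look something like this; the file path and the position of the key field are made up:

Imports System.Collections
Imports System.IO

Module CsvKeysSketch
    ' Load every key from the CSV so batches can be sliced off 100 at a time.
    Function LoadKeys(ByVal path As String) As ArrayList
        Dim keys As New ArrayList
        Dim sr As New StreamReader(path)
        Dim line As String = sr.ReadLine()
        While Not line Is Nothing
            Dim fields() As String = line.Split(","c)
            keys.Add(fields(0).Trim())   ' assumes the 9-character key is the first field
            line = sr.ReadLine()
        End While
        sr.Close()
        Return keys
    End Function

    Sub Main()
        Dim keys As ArrayList = LoadKeys("C:\export\legacy.csv")
        Console.WriteLine(keys.Count & " keys loaded")
    End Sub
End Module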
 

Author Comment

by:mstucchi
ID: 18899732
Thank you for your help; I think you've set me on the right track. I've got my loops working through my CSV file and keeping track of where I've been. Now I just have to plug in the rest of the logic to process the groups that I have isolated. It should reduce the trips to the database from 40,000 to 400, and hopefully that will reduce the I/O and resource utilization.
Thanks again.
