Autoeforms

asked on

string management

My problem is simple.
I need to import data and insert/update my database tables.
I am keeping track of the adds, updates and deletes.

My current structure first reads all the key values from the target table into a dataset. During the import I scan this table for each key: if found, it is an update; if not found, I add the key and update the database; if it is a delete, I remove the key from the table.
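A minimal sketch of that pattern, assuming a hypothetical `keyTable` DataTable pre-loaded with the target table's keys (all names here are illustrative, not the asker's actual code):

```vb
' Illustrative sketch of the key-tracking loop described above.
' keyTable is a DataTable already filled with the target table's key values.
Dim found As DataRow() = keyTable.Select("KeyValue = '" & importKey & "'")

If found.Length > 0 Then
    ' Key already exists in the target table: send an UPDATE.
    SendUpdateViaWebService(importKey, importValues)
Else
    ' Key not present: record it locally and send an INSERT.
    Dim row As DataRow = keyTable.NewRow()
    row("KeyValue") = importKey
    keyTable.Rows.Add(row)
    SendInsertViaWebService(importKey, importValues)
End If
```

Note that `DataTable.Select` does a linear scan over the rows unless an index exists, which is consistent with the slowdown described below as the table fills up.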

This works and eliminates hits on my database and web site. My database is accessed via web services, so it is critical not to make too many hits on the database.

I ran into a problem the other day when I tried to import 200,000 records into a blank table. As the dataset did repeated adds, my performance went into the tank.

Can anyone suggest the best way to build an in-memory table of keys, with up to a quarter of a million records, that allows fast add, delete and search? Key size can be up to 40 characters.

By the way, I know it is the dataset management, because I disabled it on my initial import where the table was blank, and I was able to update my table at a rate of 100+ records a second. With the dataset enabled, my performance dropped to 5 records a second.

Hope this all makes sense.
Thanks in advance.

Bob Learned

What is the underlying database system type (SQL Server, Oracle, ...)?

Bob
Autoeforms

ASKER

It is MySQL, but could be MSSQL; it is client-selectable.

But why does it matter?

greg
Greg,

If you have VB.NET 2005 and SQL Server, then there is an optimized way of doing a bulk insert with a single DataTable and the SqlBulkCopy class, versus doing multiple single-record inserts.
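A rough sketch of what that looks like for a direct SQL Server connection (the connection string and table/column names are placeholders, not anything from this thread):

```vb
Imports System.Data
Imports System.Data.SqlClient

Module BulkInsertSketch
    Sub Main()
        ' Build a DataTable in memory, then push it in one round trip.
        Dim table As New DataTable("ImportBuffer")
        table.Columns.Add("KeyValue", GetType(String))
        table.Columns.Add("Payload", GetType(String))

        ' ... fill the table from the import file ...
        table.Rows.Add("KEY-0001", "first record")

        Using connection As New SqlConnection("Server=myServer;Database=myDb;Integrated Security=True")
            connection.Open()
            Using bulk As New SqlBulkCopy(connection)
                bulk.DestinationTableName = "dbo.TargetTable"
                bulk.WriteToServer(table)   ' one bulk operation instead of 200,000 single inserts
            End Using
        End Using
    End Sub
End Module
```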

Bob
Thanks, but my DB does not need to be SQL Server.

It is also not a direct connection; the updates happen on the remote server via web services. Unless I am missing something, is there a better way to be doing this without web services?

greg
Where is the remote server located? Web services provide a simple approach to database connectivity, but they are not the "perfect" solution, since they require SOAP serialization/deserialization in order to work. They can really simplify things if you are working with Oracle, since Oracle otherwise requires client software to be installed on each client machine.

Bob
Not using Oracle, and in our model we don't want database connectivity from the clients; we have centralized it for management, control and security.

The web services are actually working fine. My problem is more with the local dataset that I am inserting new keys into. It is dragging the system and the process into the ground after inserting about 10,000 keys into the dataset.

I am thinking about changing it to a collection, but I am looking for the best practice for managing and searching what could be a quarter of a million strings, each up to 40 characters in length. My best guess is that the alloc and re-alloc of memory is causing the problem.

Is this making sense?
Thanks for the help; I am glad you are the one that jumped on this.

greg


Oracle was only an example of why you would use a web service. There are plenty of other reasons for web services, but if it is for security, then there are better approaches than using a web service.

DataSets can add a fair amount of overhead, and whether they are the right choice depends on what the end result of adding the data to the database needs to be. With hash table-type structures, you get O(1) amortized insert and O(1) lookup.

Bob
I know I can test it, but I am under the gun :). Will the hash table fragment memory, or can I reserve a huge block?

What direction would you point me in for better options than web services?

greg
What kind of security are you wanting to achieve with web services?

With 2005, there is a generic, type-safe version of the Hashtable structure: Dictionary(Of String, String). One of the constructors for the Dictionary takes a capacity argument. If you don't specify an initial capacity, the Dictionary starts small and roughly doubles in size whenever additional capacity is required.

Example:
Dim dataList As New Dictionary(Of String, String)(250000)
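Building on that constructor, a hedged sketch of how the key tracking could look with a pre-sized Dictionary (the variable and key names are illustrative):

```vb
Imports System.Collections.Generic

' Pre-size the dictionary so a large import avoids repeated internal resizing.
Dim keys As New Dictionary(Of String, String)(250000)

' Lookup is O(1), so this stays fast even at a quarter of a million keys.
If keys.ContainsKey(importKey) Then
    ' Existing key: treat this record as an update.
Else
    keys.Add(importKey, importKey)  ' New key: treat this record as an insert.
End If

' Deletes: remove the key from the in-memory set as well.
keys.Remove(importKey)
```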

Bob
We have two conversations running.

The security is for my databases; that is what the web services are managing.

My client does need the security on the keys. I just need a structure that is super fast.

To recap: I started with an empty table and needed to insert all the keys. By the time I got to 10,000, the system was crawling. This is where I am guessing memory was really busted into thousands of little pieces (I am an old-time C programmer and can only imagine what was going on behind the covers).

I am going to try this
Dim dataList As New Dictionary(Of String, String)(250000)


Can I change the realloc point, so I could, say, make the size 5,000 and realloc in 5,000-entry blocks automatically, so I don't waste space for smaller imports?

Thanks for the help, by the way.
g
ASKER CERTIFIED SOLUTION
Bob Learned

No, I could only guesstimate; it is a delimited file terminated with CRLF.
g
Do you read that file into memory before you process it? You could borrow a WinForms RichTextBox control to process the text. Then, you could get the number of lines to determine the size.
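An alternative way to get a line count on .NET 2.0, without a RichTextBox, is a quick first pass with a StreamReader (the file name is a placeholder); the count can then pre-size the Dictionary:

```vb
Imports System.IO
Imports System.Collections.Generic

' First pass: count CRLF-terminated lines to estimate the key count.
Dim lineCount As Integer = 0
Using reader As New StreamReader("import.txt")
    While reader.ReadLine() IsNot Nothing
        lineCount += 1
    End While
End Using

' Second pass: process for real, with the dictionary sized up front
' so it never has to grow mid-import.
Dim keys As New Dictionary(Of String, String)(lineCount)
```

This trades one extra sequential read of the file for a dictionary that never resizes during the import, which matters most on the large (hundreds of thousands of lines) files described here.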

Bob
Nah, I read line by line; I was trying to conserve memory, realizing the file could be hundreds of thousands of lines.

g
I would read one of the large files in and see what kind of memory usage you get before you say something like that. If you use a local RichTextBox control, it should be garbage-collected fairly quickly, or you could even implement IDisposable if you don't want to wait.

Bob