Want to protect your cyber security and still get fast solutions? Ask a secure question today.Go Premium

x
?
Solved

load data into a hashtable

Posted on 2006-05-25
13
Medium Priority
?
663 Views
Last Modified: 2012-08-14
I need to perform lookups comparisons against a large set of data. The data is comprised of three columns and over 80,000 rows.  I am not sure if it is possible, but I want to load this data in a hashtable once the application is initiated.  I think this will greatly improve the performance of a slow application I have been using.

Thanks for your time and expertise,
Gr8life
0
Comment
Question by:gr8life
  • 7
  • 5
13 Comments
 
LVL 10

Expert Comment

by:heintalus
ID: 16765958
Where is this data stored? would it not be better to do the work at the DB side & only return the info your after or am I missing something?   If all the data does need to be within the app then couldn't you use a DataTable & then use DataViews on that to filter what you want?
0
 

Author Comment

by:gr8life
ID: 16766106
I need to use the hashtable to determine a country name from an IP address.  I am currently using a DB and have tried numerous methods to increase performance but all have made little or negative impact on the applications overall performance.

Sample of what I want to store in the hashtable (this will have ~80,000 total rows):

333333 4444444 US
555555 7777777 CN

Note this values are not accurate I'm just tried to illustrate my objective.

I want to use this hashtable to determine values in a raw file:

data data 3.3.3.3 data data 4.4.4.4 data data
data data 3.3.4.4 data data 5.5.5.5 data data

Note this values are not accurate I'm just tried to illustrate my objective.

Thank you for taking the time to read this post,
Gr8life
0
 

Author Comment

by:gr8life
ID: 16766121
Also the data I am processing is about 2 gig in size

This portion below:

data data 3.3.3.3 data data 4.4.4.4 data data
data data 3.3.4.4 data data 5.5.5.5 data data

Thank you again,
Gr8life
0
Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 34

Expert Comment

by:Sancler
ID: 16768983
Gr8life

This looks like it is connected with your earlier question at

http://www.experts-exchange.com/Programming/Programming_Languages/Dot_Net/VB_DOT_NET/Q_21847301.html

Is that right?  If so, and although I'm probably butting in on something that doesn't concern me and is off-topic as well, I wonder whether rather than asking "how can I do this", it might be worth first considering "whether I should be doing this".

So far as I can see you are processing many files, some of them up to 2 gig, to insert into them values read from an ~80,000, 3 column look-up table.  Again so far as I can see the data you are inserting is Country names - in string form - which equate to IP addresses already in the file.  Again so far as I can see the look-up process involves translating the IP address from the file you are processing into an integer value and finding, in the look-up table, the Country name that equates to the range within which that integer value falls.

I may have got some (or even all) of that description wrong but, in so far as I have got it right, it raises in my mind the following questions

1)  What practical use is going to be made of a 2 gig file?  No human is going to be able to cope with it without machine assistance - e.g. some sort of search facility - and on any one occasion when a human (or machine) does need to get information from it that information is likely to consist of a small (even minute) proportion of the total number of records that a 2 gig file must contain.  The file already contains, without the processing that you are doing at the moment, the IP address from which the Country name is derivable. So why is this processing being done for all records now rather than for the records which are actually going to be used when that usage occurs?  

2)  The answer to the question at the end of (1) might be because the files concerned are being stored for historical purposes, and the look-up table will change over time.  But, if that is the case, an alternative answer would be to store different look-up tables as they do change.  I imagine there must be date stamps on the relevant files so that each could then be linked to the appropriate version of the look-up table.

3)  having to translate each IP address from a 2 gig file (and, I understand, many other files that need to be similarly processed) into an integer value so that it can be compared with integer values in an ~80,000 3-field file containing integers might be tackling (part of) the problem from the wrong end.  Would it not be possible instead to translate the integers in the look-up table to IP addresses?  If so, then the translation has to be done once - for each of two fields in ~80,000 records - and not twice for each record in every file (including those up to 2 gig) that is processed.

4)  however fast any reading and processing algorithms may be, simply writing a 2 gig file is going to take a "long" time.  Have you checked, in relation to the whole process (not just the hashtable angle covered by this specific question), how long simply writing a file of the length concerned would take on the system your process is running on?  Because that - however "long" or "short" - is, it seems to be, going to be the irreducible minimum you will face.  And if that is "too" long, then all the detailed efforts in relation to reading and processing algorithms may be wasted.

As I say, I realise this is not what your question is about, and it's really nothing to do with me.  So you can ignore this, and/or I won't be at all offended if you tell me to mind my own b... business.

Roger
0
 

Author Comment

by:gr8life
ID: 16769211
Roger thank you for your time in reading this post.  You are right in assuming this is connected to the link to the question you posted.  Also you are right no human will interpret each line of data.  I use a software product for that.  However I researched as much as I could on the problems I was having with time of processing and believe the answer is the approach of loading the data to a hashtable.  I believe my approach to the problem was flawed and I am attempting to start from scratch with a more educated approach.  I haven't been programming very long so forgive my lack of knowledge, but I believe this process can be reduced greatly by a different approach.  "There is more than one way to skin a cat." If you have some incite into helping me with the hashtable issue I would greatly appreciate it. If you believe I am wasting my time I also appreciate that response because you are the experts and I am relying on you for your opinions.
Thank you,
Gr8life
0
 
LVL 34

Expert Comment

by:Sancler
ID: 16771827
A hashtable is a collection of key/value pairs.  Each key has to be unique.  You can, if you know the key, quickly find the value that is paired with it.  Both the key and the value can be any object, although it is usual for the key to be a string or number.  The value might be, and often is, a class or a structure.  Before going into the detail of loading ~80,000 such key/value pairs into a hashtable we need to consider how you are then intending to use it.  As each hashtable entry can only have TWO elements, the key and the value, and you have got THREE columns in each record, what is going to be the key and what (structured in what way) is going to be the value?  If you want to use the hashtable for look-up purposes, the key needs to be something that you already know - in this case, I think, an IP address from the record you are processing, either in its raw form or translated into an integer - which (a) will definitely exist among the keys you have put into the hashtable and (b) will lead directly to the lookup answer that you want.  I'm not sure how, from what I currently understand of the setup, condition (a) is going to be met.

My picture of the look up file is that one of the columns is the (integer-translated) start of a range of IP addresses, a second is the (integer-translated) end of that range, and the third is the name of the Country associated with that range.  If that picture is correct, then any IP address that you get from the record you are processing will not necessarily exist in the hashtable keys whichever of the columns you use for as the key.  If your first record in the look-up table was 0, 10, Ruritania and you had a (translated) value of 5 for the IP address in one of the records you were processing, your hashtable would not produce the answer Ruritania for you.  This is because the only key relating to that range would be 0, if you used the first column, or 10, if you used the last column, and 5 is neither of those.

So, I'll certainly be happy, in principle, to help with loading the data into a hashtable if that's what you want.  But is it what you want?  And, if it is, what data is to go where?

Roger
0
 

Author Comment

by:gr8life
ID: 16773543
Prior to this posting I was provided some excellent advice (from several experts here) which lead me to rereading several books as well as researching online.  I found this C# code and thought that I might be able to use this approach and if so I believed the application could be made process faster.  I also agree that the writing portion of the process can only go as fast as it can go. Anyway I am going to read/research the recommendations you posted.
Thank you very much for your time,
Gr8life

http://www.eggheadcafe.com/articles/20051109.asp
0
 
LVL 34

Expert Comment

by:Sancler
ID: 16775024
I've had a look at the link you posted and now see why you were asking about a hashtable.  But in that code what is loaded into the hashtable is a CountryCode/CountryName key/value pair.  The data for it (in the code as displayed) comes from this part of the code

  private static string[] countryCodes =
   {
    "AP", "EU", "AD", "AE", "AF" ....
   };

  private static string[] countryNames =
   {
    "Asia/Pacific Region", "Europe", "Andorra", "United Arab Emirates", "Afghanistan", ...
   };

That is, it is hardcoded into the program (and incomplete in the code displayed) rather than being loaded from the 3-column table that, I think, is equivalent to the one you are using.  For the purposes of the program the 3-column table equivalent to yours is in the form StartOfIPAddressRange, EndOfIPAddressRange, CountryCode and the hashtable is just used to translate CountryCode from that into CountryName.  It does not appear to me to contribute anything to the efficiency of actually finding, for any given single IP Address, into which range (hence into which Country) in the 3-column table it fits.  Indeed (and the article comments on this in paragraph immediately following the code) it tackles that part of the problem by simply starting at the beginning of the 3-column table and looping through every record until it hits the range concerned.  It says it does this because

>>
there can be a ("gasp!") BUG in the DataTable Select method which prevents the use of the AND keyword (such as "ip <= EndIp AND ip >=BeginIp")
<<

The code in your Public Function IPLookup(ByVal DottedIP As String) As String - from your earlier post - adopts that alternative approach

        sql = lngIP.ToString & ">=ipl1 and " & lngIP.ToString & " <= ipl2"
        foundrows = ds.Tables("Country").Select(sql)

So far as I know, you have not experienced any such "bug" problems with it and, in theory, your approach ought to be faster than the iteration approach used in the code from the link.

Indeed, overall (although I have not gone through it in great detail, as I am not an adept at C#), the approach in the link looks to me to be, so far as processing is concerned, very similar to the one you have already.  The essential point of the article, as I understand it, is the saving and compressing of the look-up table(s) as part of the app rather than accessing/loading them - as separate database files - at run-time.  Whilst I don't doubt that that could make a difference, the essence of your problem as I understand it is that you are wanting to process so much data.  Speeding up the initial loading of the look-up data may be significant, in relative terms, when (as the demo app is concerned with) information for just one or two IP addresses is being sought.  But, in relative terms, its effect will be very marginal when information about thousands, even millions, of IP addresses is being sought.

Roger
0
 

Author Comment

by:gr8life
ID: 16793109
Roger, after researching all weekend I came to the conclusion you are right. I can not use a hashtable to load the resource database.  However if it is not too much trouble could you point me in the right direction as to load data in a hashtable.  I have spent so much energy reading about I would really like to be able be able to experiment with it.  

Thank you for your time and expertise,
Gr8life
0
 
LVL 34

Expert Comment

by:Sancler
ID: 16793923
I'll try.  I'm for an early bed tonight, but I'll look at it tomorrow.

What data do you want to load into a hashtable?  Where will it come from and, remembering what I said a couple of posts up, what will be the key and what the value?

Roger
0
 

Author Comment

by:gr8life
ID: 16794937
How about loading data like the one from the post that dealt with the code below.  Also if you have any good reference urls or snippets that would be great.


private static string[] countryCodes =
   {
    "AP", "EU", "AD", "AE", "AF" ....
   };

  private static string[] countryNames =
   {
    "Asia/Pacific Region", "Europe", "Andorra", "United Arab Emirates", "Afghanistan", ...
   };

Thank you,
Gr8life
0
 
LVL 34

Accepted Solution

by:
Sancler earned 2000 total points
ID: 16796784
All that happens in that code is that two one-dimension arrays - countryCode and CountryNames - are created and filled by the lines

  private static string[] countryCodes =
   {
    "AP", "EU", "AD", "AE", "AF" ....
   };

  private static string[] countryNames =
   {
    "Asia/Pacific Region", "Europe", "Andorra", "United Arab Emirates", "Afghanistan", ...
   };

The equivalent - taking out the "...", which just indicated that the code was incomplete: it was just for purposes of example - in VB.NET would be

        Dim CountryCodes() As String = New String() {"AP", "EU", "AD", "AE", "AF"}
        Dim CountryNames() As String = New String() {"Asia/Pacific Region", "Europe", "Andorra", "United Arab Emirates", "Afghanistan"}

In the code from that the link the hashtable is declared by the line

  private Hashtable htCountryNames = new Hashtable();

and it is loaded by these lines

   for (int i = 0; i < CountryLookup.countryCodes.Length; i++)
   {
    this.htCountryNames.Add(CountryLookup.countryCodes[i], CountryLookup.countryNames[i]);
   }

The equivalents in VB.NET would be

        Dim htCountryNames As New Hashtable

and

        For i As Integer = 0 To CountryCodes.Length - 1
            htCountryNames.Add(CountryCodes(i), CountryNames(i))
        Next

And the hashtable was used by this line

     this.foundCountryLong = (string) this.htCountryNames[foundCountry];

which would translate into VB.NET (assuming that foundCountryLong had been declared As String and foundCountry was one of the String codes that had been loaded into htCountryNames as the key - e.g. "AP", "EU" etc.)

        foundCountryLong = htCountryNames(foundCountry)

The example is a bit artificial.  It would be most unusual in practice for a program to hard-code the values for two one-dimensional string arrays and then fill a hashtable by looping through those arrays.  Much more likely would be a persistent data source - e.g. a database table or a .csv file - with two fields which would then be read into a hashtable perhaps with a datareader on the lines

       While myReader.Read
          myHashTable.Add(myReader(0), myReader(1))
       End While

As for general guidance, you could do far worse than studying this

http://msdn2.microsoft.com/en-us/system.collections.hashtable.aspx

Roger      
0
 

Author Comment

by:gr8life
ID: 16797646
Thank you very much for your expertise,
Gr8life
0

Featured Post

Free Tool: SSL Checker

Scans your site and returns information about your SSL implementation and certificate. Helpful for debugging and validating your SSL configuration.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Microsoft Reports are based on a report definition, which is an XML file that describes data and layout for the report, with a different extension. You can create a client-side report definition language (*.rdlc) file with Visual Studio, and build g…
Creating an analog clock UserControl seems fairly straight forward.  It is, after all, essentially just a circle with several lines in it!  Two common approaches for rendering an analog clock typically involve either manually calculating points with…
Integration Management Part 2
This lesson discusses how to use a Mainform + Subforms in Microsoft Access to find and enter data for payments on orders. The sample data comes from a custom shop that builds and sells movable storage structures that are delivered to your property. …
Suggested Courses

564 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question