James Hancock asked:

Data mining bot DB upload suggestions? anyone?

Hi
My data mining bot is finished. It is staggeringly cunning, if I may say.
All I need now is a way to save and retrieve a large amount of data.
I expect to collect thousands of email addresses to sort. Is text not considered large? I have a GoDaddy web-space.
For a simple Java data structure like
class EmailsNode {
    int addressCount;      // number of addresses stored
    String[] addresses;    // large String[] array
}
Must I use SQL? This is all in .java, not PHP. I doubt a simple .txt file in my web-space root directory is a good idea?
Is there an idiot-proof page on SQL data upload and download?
Is it just "create an SQL database in my GoDaddy panel and paste in the upload code"?
Thoughts?
Thanks
mwochnick

A simple database and table should be fine. If you have a 1 GB database and you assume 1 kB per email address (which seems large), you would be able to store approximately 1 million email addresses.

Plus, with a SQL solution you'd have the ability to use SQL for querying, sorting, etc.
James Hancock (ASKER)

Maybe, since I don't mind how the email addresses are stored, I could have 2 or 3 databases and pick from them at random when needed, meaning 3 million addresses?

How do I upload the address data to a database and then query a field, in Java? Since it is just simple string arrays, isn't it trivial? What is a good walkthrough page?

Should I download the entire string array each time, add a new address to the end, and re-upload the data, or is there an append-type function?
Thanks
My GoDaddy panel says I have used 7 GB out of 250 GB, so I suspect I don't have to be too concerned about ceilings.
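
On the append question: in SQL, an INSERT statement adds one row without touching the existing rows, so there is no need to download and re-upload the whole list. Below is a minimal sketch in Java using JDBC. The table name `emails`, the column name `address`, and the class name `EmailStore` are all hypothetical, and the sketch assumes the host offers a database reachable over JDBC with a suitable driver on the classpath, which this thread does not confirm.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Sketch only. Assumes a table created beforehand, for example:
//   CREATE TABLE emails (address VARCHAR(254) NOT NULL)
// The table and column names are made up for illustration.
final class EmailStore {
    static final String INSERT_SQL = "INSERT INTO emails (address) VALUES (?)";
    static final String SELECT_SQL = "SELECT address FROM emails ORDER BY address";

    private final Connection connection;

    EmailStore(Connection connection) {
        this.connection = connection;
    }

    /** Appends one address as a new row; existing rows are left untouched. */
    void add(String address) throws SQLException {
        try (PreparedStatement ps = connection.prepareStatement(INSERT_SQL)) {
            ps.setString(1, address); // bound parameter, not string concatenation
            ps.executeUpdate();
        }
    }

    /** Reads every address back, sorted by the database rather than in Java. */
    List<String> allSorted() throws SQLException {
        List<String> result = new ArrayList<>();
        try (PreparedStatement ps = connection.prepareStatement(SELECT_SQL);
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                result.add(rs.getString(1));
            }
        }
        return result;
    }
}
```

A `Connection` would typically come from `DriverManager.getConnection(url, user, password)`, with the URL and credentials taken from the hosting panel. The class takes the connection as a constructor argument so the same code works whichever database the host provides.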
CEHJ
"If you have a 1 GB database and you assume 1kB per email address - which seems large -"
It certainly does seem large ;). I would have thought an average of 32 bytes per address, not 1000.

Of course, you might or might not have additional info along with the address
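
The two estimates in this thread can be compared with a quick calculation. This is a rough sketch only: it assumes decimal units (1 GB = 1,000,000,000 bytes) and ignores row, index, and page overhead, which any real database adds. The class and method names are made up for illustration.

```java
// Rough capacity estimate: how many addresses fit in a fixed storage budget.
// Ignores per-row and index overhead, so real numbers will be lower.
final class CapacityEstimate {

    /** Integer division: whole addresses that fit in the budget. */
    static long addressesThatFit(long budgetBytes, long bytesPerAddress) {
        if (bytesPerAddress <= 0) {
            throw new IllegalArgumentException("bytesPerAddress must be positive");
        }
        return budgetBytes / bytesPerAddress;
    }

    public static void main(String[] args) {
        long oneGb = 1_000_000_000L; // decimal gigabyte, an assumption
        System.out.println(addressesThatFit(oneGb, 1_000)); // 1 kB per address
        System.out.println(addressesThatFit(oneGb, 32));    // 32 bytes per address
    }
}
```

Under these assumptions the 1 kB figure gives about 1 million addresses and the 32-byte figure gives about 31 million, so either way a 1 GB database is far more than "thousands" of addresses need.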
"If you have a 1 GB database and you assume 1kB per email address"

From BOL (SQL Server Books Online):
http://msdn.microsoft.com/en-us/library/ms176089.aspx

char [ ( n ) ]
Fixed-length, non-Unicode string data. n defines the string length and must be a value from 1 through 8,000. The storage size is n bytes. The ISO synonym for char is character.
varchar [ ( n | max ) ]
Variable-length, non-Unicode string data. n defines the string length and can be a value from 1 through 8,000. max indicates that the maximum storage size is 2^31-1 bytes (2 GB). The storage size is the actual length of the data entered + 2 bytes. The ISO synonyms for varchar are char varying or character varying.
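
To make the quoted storage rules concrete, here is a small sketch that applies them to one address. It assumes single-byte, non-Unicode characters, as the quoted definitions do. The column width of 254 and the sample address are assumptions chosen for illustration, not values from this thread.

```java
// Storage size per the quoted definitions:
//   char(n)    always uses n bytes
//   varchar(n) uses the actual length of the data plus 2 bytes
final class ColumnStorage {

    /** Bytes used by a char(n) column, regardless of the value stored. */
    static int charBytes(int n) {
        if (n < 1 || n > 8000) {
            throw new IllegalArgumentException("n must be from 1 through 8,000");
        }
        return n;
    }

    /** Bytes used by a varchar column for this value (ASCII text assumed). */
    static int varcharBytes(String value) {
        return value.length() + 2;
    }

    public static void main(String[] args) {
        String address = "someone@example.com"; // made-up sample, 19 characters
        System.out.println(charBytes(254));        // fixed width
        System.out.println(varcharBytes(address)); // actual length + 2
    }
}
```

For this 19-character sample, a varchar column needs 21 bytes while a char(254) column needs 254, which is why a variable-length type is the usual fit for email addresses.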
I briefly considered just putting all the addresses in one big public static String array in a Java class,
and reading that class's data member whenever I need an email address:

String email = giantEmailListClass.emails[0];

SQL is probably more reliable?
I have never set up an SQL database.
The data is simple, just a giant array of indexed Strings.
So I go to GoDaddy, add a new DB in my web-space, and bingo? Do I have to specify its data structure there?
My database is just a giant list of (index, String) pairs.
Can I make the first item in the DB the count? Or is that unnecessary, since they are indexed?
Size = Highest Index+1;

Thanks
"SQL is probably more reliable"
You need to define what you mean by reliable as far as your application is concerned. SQL Server and most DBMSs provide not just data retrieval but also out-of-the-box handling of concurrent user transactions, as well as built-in security and 30 years of compiler optimizations. But perhaps you don't need all that.

"I have never set up an SQL database."
Well, if you decide to do that, look at it as an opportunity to round out your toolset with something from the data layer.

"Can I make the first item in the DB the count? or is that unnecessary, since they are indexed.
Size = Highest Index+1;"

Yes. You have a variety of persistent incremental features when using a table. You may use, for instance, the IDENTITY feature to create and store that increment with each new row entered in the table.

Hope this helps.
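
As a rough illustration of the IDENTITY suggestion, the sketch below defines a table whose `id` column is filled in automatically, and gets the size with a count query instead of storing it as the first item. The table, column, and class names are hypothetical. The DDL uses SQL Server syntax; on MySQL the comparable feature is AUTO_INCREMENT.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch only: names are made up, and the DDL is SQL Server syntax.
final class EmailCount {
    // IDENTITY(1,1) starts at 1 and adds 1 for each inserted row.
    static final String CREATE_SQL =
        "CREATE TABLE emails ("
        + "id INT IDENTITY(1,1) PRIMARY KEY, "
        + "address VARCHAR(254) NOT NULL)";

    static final String COUNT_SQL = "SELECT COUNT(*) FROM emails";

    /** Asks the database for the current number of rows. */
    static long count(Connection connection) throws SQLException {
        try (Statement st = connection.createStatement();
             ResultSet rs = st.executeQuery(COUNT_SQL)) {
            rs.next(); // COUNT(*) always returns exactly one row
            return rs.getLong(1);
        }
    }
}
```

One caveat on "Size = Highest Index+1": identity values can have gaps, for example after rows are deleted, so the highest id is not a dependable size. Counting rows avoids that problem and removes the need to keep a separate count in sync.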