Data mining bot DB upload suggestions? anyone?

Hi
My data mining bot is finished. It is staggeringly cunning, if I may say so.
All I need now is a way to save and retrieve a large amount of data.
I expect to collect thousands of email addresses to sort. Is that much text considered large? I have GoDaddy web space.
For a simple Java data structure like
class EmailsNode {
    int addressCount;
    String[] addresses; // large String array
}
Must I use SQL? This is all in .java, not PHP. I doubt a simple .txt file in my web-space root directory is a good idea.
Is there an idiot-proof page on SQL data upload and download?
Is it just "create an SQL database in my godaddy panel and paste in the upload-code?"
thoughts?
Thanks
beavoidAsked:
 
mwochnickCommented:
A simple database and table should be fine. If you have a 1 GB database and assume 1 kB per email address, which seems large, you would be able to store approximately 1 million email addresses.

Plus, with a SQL solution you'd be able to use SQL for querying, sorting, etc.
 
beavoidAuthor Commented:
Maybe, since I don't mind how the email addresses are stored, I could have 2 databases, or 3 and just pick them off randomly, when needed? - meaning 3 million addresses?

How do I upload the addresses data to a database, and then query a field, in Java? Since it is just simple string arrays, isn't it trivial? What is a good walk-thru page?

Should I download the entire string array each time and add a new address to the end of the array and re-upload the data, or do they have an append type function?
Thanks
 
beavoidAuthor Commented:
My Godaddy Panel says I have used 7GB out of 250GB
So, I suspect I don't have to be too concerned about ceilings.
 
mwochnickConnect With a Mentor Commented:
insert a row for each address
each email address is its own row in the database table
do a simple search and update each time an email address is captured

here are some tutorials on using JDBC to write records to a database
http://alvinalexander.com/java/edu/pj/jdbc/jdbc0002
http://www.tutorialspoint.com/jdbc/jdbc-insert-records.htm
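
As a rough illustration of the "search, then insert one row per address" approach, here is a minimal JDBC sketch. It is not taken from the tutorials above. The table name `emails`, its `address` column, and the `EmailStore` class are all hypothetical, and the sketch assumes you already have an open `java.sql.Connection` to your hosted database.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper: one row per address, in a table assumed to exist as
//   CREATE TABLE emails (address VARCHAR(254) NOT NULL)
class EmailStore {
    static final String FIND_SQL = "SELECT 1 FROM emails WHERE address = ?";
    static final String INSERT_SQL = "INSERT INTO emails (address) VALUES (?)";

    // Strip surrounding whitespace picked up while collecting the address.
    static String normalize(String raw) {
        return raw.trim();
    }

    // Returns true if the address was new and a row was inserted,
    // false if it was already in the table.
    static boolean saveIfNew(Connection conn, String rawAddress) throws SQLException {
        String address = normalize(rawAddress);
        try (PreparedStatement find = conn.prepareStatement(FIND_SQL)) {
            find.setString(1, address);
            try (ResultSet rs = find.executeQuery()) {
                if (rs.next()) {
                    return false; // already stored, nothing to do
                }
            }
        }
        try (PreparedStatement insert = conn.prepareStatement(INSERT_SQL)) {
            insert.setString(1, address);
            return insert.executeUpdate() == 1;
        }
    }
}
```

The connection would typically come from `DriverManager.getConnection(url, user, password)`, using the JDBC URL and credentials from your hosting panel and the matching driver jar on the classpath. There is no need to download and re-upload the whole list: each `INSERT` appends a single row.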
 
CEHJCommented:
If you have a 1 GB database and you assume 1kB per email address - which seems large -
It certainly does seem large ;). I would have thought an average of 32 bytes per address, not 1000.

Of course, you might or might not have additional info along with the address
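
To make the two estimates concrete, here is a small back-of-the-envelope sketch. It assumes a decimal gigabyte (1,000,000,000 bytes) and ignores per-row overhead and indexes, so treat the numbers as rough upper bounds. The `CapacityEstimate` class name is just for illustration.

```java
class CapacityEstimate {
    // How many fixed-size records fit in the given number of bytes.
    static long addressesThatFit(long databaseBytes, long bytesPerAddress) {
        return databaseBytes / bytesPerAddress;
    }

    public static void main(String[] args) {
        long oneGb = 1_000_000_000L; // decimal gigabyte, an assumption
        System.out.println(addressesThatFit(oneGb, 1_000)); // prints 1000000
        System.out.println(addressesThatFit(oneGb, 32));    // prints 31250000
    }
}
```

So at 32 bytes per address the same 1 GB holds roughly 31 million addresses instead of 1 million.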
 
Racim BOUDJAKDJIDatabase Architect - Dba - Data ScientistCommented:
If you have a 1 GB database and you assume 1kB per email address

From BOL:
http://msdn.microsoft.com/en-us/library/ms176089.aspx

char [ ( n ) ]
Fixed-length, non-Unicode string data. n defines the string length and must be a value from 1 through 8,000. The storage size is n bytes. The ISO synonym for char is character.
varchar [ ( n | max ) ]
Variable-length, non-Unicode string data. n defines the string length and can be a value from 1 through 8,000. max indicates that the maximum storage size is 2^31-1 bytes (2 GB). The storage size is the actual length of the data entered + 2 bytes. The ISO synonyms for varchar are char varying or character varying.
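
As a quick check of the varchar rule quoted above (actual length + 2 bytes), here is a tiny sketch. It applies to SQL Server's non-Unicode varchar as documented; other databases may count differently. The `VarcharSize` class is hypothetical and assumes a single-byte encoding, so one character is one byte.

```java
import java.nio.charset.StandardCharsets;

class VarcharSize {
    // Storage size per the quoted rule: length of the data in bytes + 2.
    // Assumes a single-byte (non-Unicode) encoding.
    static int storageBytes(String value) {
        return value.getBytes(StandardCharsets.ISO_8859_1).length + 2;
    }

    public static void main(String[] args) {
        // "someone@example.com" is 19 characters long.
        System.out.println(storageBytes("someone@example.com")); // prints 21
    }
}
```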
 
Racim BOUDJAKDJIConnect With a Mentor Database Architect - Dba - Data ScientistCommented:
SQL Server already ships with an ETL tool called SSIS (SQL Server Integration Services) for moving data in and out of a database.  You should use this tool rather than redevelop a whole set of functionality that already exists and works very well.  SSIS can load up to 2TB of text based data within an hour.

More info on SSIS

http://technet.microsoft.com/en-us/library/ms141026.aspx
http://en.wikipedia.org/wiki/SQL_Server_Integration_Services
 
beavoidAuthor Commented:
I briefly considered just putting all the addresses in one big public static String array in a Java class,
and reading that class's data member whenever I need an email address.

String email = giantEmailListClass.emails[0];

SQL is probably more reliable?
I have never set up an SQL database.
It is simple, just a giant array of indexed Strings.
So I go to Godaddy and add a new DB in my webspace and bingo? Do I have to specify its data structure there?
My database is just a giant list of (index, String) pairs.
Can I make the first item in the DB the count? or is that unnecessary, since they are indexed.
Size = Highest Index+1;

Thanks
 
Racim BOUDJAKDJIDatabase Architect - Dba - Data ScientistCommented:
SQL is probably more reliable
You need to define what you mean by reliable as far as your application is concerned.  SQL Server and most DBMSs provide not just data retrieval capabilities but also out-of-the-box concurrent user transaction handling, embedded security, and 30 years of compiler optimizations.  But perhaps you don't need all that.

I have never set up an SQL database.
Well if you decide to do that, look at it as an opportunity to complete your toolset with something out of the data layer.

Can I make the first item in the DB the count? or is that unnecessary, since they are indexed.
Size = Highest Index+1;

Yes.  You have a variety of persistent auto-increment features when using a table.  You may use, for instance, the IDENTITY feature to generate and store that increment with each new row entered in the table.
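
One caveat on "Size = Highest Index+1": auto-generated identity values can have gaps, for example after deleted rows or failed inserts, so it is usually safer to ask the database for the count than to store it yourself. A minimal sketch, assuming a hypothetical table named `emails` and an already open JDBC connection:

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class EmailCount {
    // Assumes a table named "emails"; the name is hypothetical.
    static final String COUNT_SQL = "SELECT COUNT(*) FROM emails";

    // Returns the number of rows currently in the table.
    static long count(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(COUNT_SQL)) {
            rs.next(); // COUNT(*) without GROUP BY yields exactly one row
            return rs.getLong(1);
        }
    }
}
```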

Hope this helps.
Question has a verified solution.
