Solved

Data mining bot DB upload suggestions? anyone?

Posted on 2013-06-29
Last Modified: 2016-02-11
Hi
My data mining bot is finished. It is staggeringly cunning, if I may say.
All I need to do is have a method to save and retrieve large data.
I expect to collect thousands of email addresses to sort. Is text not considered large? I have a Godaddy web-space.
For a simple Java data structure like

class EmailsNode {
    int addressCount;
    String[] addresses; // large String array
}

must I use SQL? This is all in .java, not PHP. I doubt a simple .txt file in my web-space root directory is a good idea.
Is there an idiot proof page of SQL data upload and download?
Is it just "create an SQL database in my godaddy panel and paste in the upload-code?"
thoughts?
Thanks
Question by:beavoid
10 Comments
 
LVL 12

Expert Comment

by:mwochnick
ID: 39287641
A simple database and table should be fine. If you have a 1 GB database and you assume 1 kB per email address, which seems large, you would be able to store approximately 1 million email addresses.

Plus, with a SQL solution you'd have the ability to use SQL for querying, sorting, etc.
0
 

Author Comment

by:beavoid
ID: 39287654
Maybe, since I don't mind how the email addresses are stored, I could have 2 databases, or 3 and just pick them off randomly, when needed? - meaning 3 million addresses?

How do I upload the addresses data to a database, and then query a field, in Java? Since it is just simple string arrays, isn't it trivial? What is a good walk-thru page?

Should I download the entire string array each time and add a new address to the end of the array and re-upload the data, or do they have an append type function?
Thanks
0
 

Author Comment

by:beavoid
ID: 39287656
My Godaddy Panel says I have used 7GB out of 250GB
So, I suspect I don't have to be too concerned about ceilings.
0
 
LVL 12

Assisted Solution

by:mwochnick
mwochnick earned 333 total points
ID: 39287667
insert a row for each address
each email address is its own row in the database table
do a simple search and update each time an email address is captured

here are some tutorials on using JDBC to write records to a database
http://alvinalexander.com/java/edu/pj/jdbc/jdbc0002
http://www.tutorialspoint.com/jdbc/jdbc-insert-records.htm
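
For what it's worth, the "one row per address, search then insert" idea above might look roughly like the sketch below. It is only a sketch under assumptions: the table name emails, its address column, and the EmailStore class are made up for illustration, and lower-casing the address is a simplification (it treats addresses as case-insensitive). Only standard java.sql calls are used.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Locale;

/** Sketch: one row per address, appended with INSERT (no re-upload of the whole list). */
class EmailStore {
    // Hypothetical table: CREATE TABLE emails (address VARCHAR(254) NOT NULL UNIQUE)
    static final String EXISTS_SQL = "SELECT 1 FROM emails WHERE address = ?";
    static final String INSERT_SQL = "INSERT INTO emails (address) VALUES (?)";

    /** Trims and lower-cases (assumption: addresses are treated as case-insensitive). */
    static String normalize(String raw) {
        return raw.trim().toLowerCase(Locale.ROOT);
    }

    /** Returns true if a new row was inserted, false if the address was already stored. */
    static boolean saveIfNew(Connection conn, String rawAddress) throws SQLException {
        String address = normalize(rawAddress);
        // Step 1: the "simple search" for an existing row.
        try (PreparedStatement exists = conn.prepareStatement(EXISTS_SQL)) {
            exists.setString(1, address);
            try (ResultSet rs = exists.executeQuery()) {
                if (rs.next()) {
                    return false; // already captured, nothing to add
                }
            }
        }
        // Step 2: append a single new row.
        try (PreparedStatement insert = conn.prepareStatement(INSERT_SQL)) {
            insert.setString(1, address);
            return insert.executeUpdate() == 1;
        }
    }
}
```

The Connection would typically come from DriverManager.getConnection(url, user, password), with the JDBC driver for the host's database on the classpath; the URL and credentials depend on what the hosting panel gives you.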
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 39287971
If you have a 1 GB database and you assume 1kB per email address - which seems large -
It certainly does seem large ;). I would have expected an average of about 32 bytes per address, not 1,000.

Of course, you might or might not have additional info along with the address
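
To put numbers on the two estimates in this thread, here is a back-of-envelope calculation. It assumes a decimal gigabyte and ignores row and index overhead, so treat the results as rough upper bounds; the class and method names are just for illustration.

```java
/** Back-of-envelope capacity: how many addresses fit in a given amount of storage. */
class CapacityEstimate {
    static long addressesThatFit(long storageBytes, long bytesPerAddress) {
        return storageBytes / bytesPerAddress;
    }

    public static void main(String[] args) {
        long oneGb = 1_000_000_000L; // decimal gigabyte, no per-row overhead counted
        System.out.println(addressesThatFit(oneGb, 1_000)); // 1000000  (1 kB per address)
        System.out.println(addressesThatFit(oneGb, 32));    // 31250000 (32 bytes per address)
    }
}
```

Either way, a few thousand addresses are nowhere near the limit.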
0

 
LVL 23

Expert Comment

by:Racim BOUDJAKDJI
ID: 39287984
If you have a 1 GB database and you assume 1kB per email address

From BOL:
http://msdn.microsoft.com/en-us/library/ms176089.aspx

char [ ( n ) ]
Fixed-length, non-Unicode string data. n defines the string length and must be a value from 1 through 8,000. The storage size is n bytes. The ISO synonym for char is character.
varchar [ ( n | max ) ]
Variable-length, non-Unicode string data. n defines the string length and can be a value from 1 through 8,000. max indicates that the maximum storage size is 2^31-1 bytes (2 GB). The storage size is the actual length of the data entered + 2 bytes. The ISO synonyms for varchar are char varying or character varying.
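
As a rough illustration of the two storage rules quoted above, the sketch below compares a fixed-width column with a variable-width one for a single address. The column width of 254 and the sample address are assumptions picked for the example, and one byte per character is assumed (non-Unicode data).

```java
/** Storage sizes as described in the quoted documentation (single-byte, non-Unicode data). */
class StorageSize {
    /** char(n): always n bytes, however short the stored value is. */
    static int charBytes(int n) {
        return n;
    }

    /** varchar(n): actual length of the data entered + 2 bytes. */
    static int varcharBytes(String value) {
        return value.length() + 2; // assumes one byte per character
    }

    public static void main(String[] args) {
        String address = "someone@example.com";   // 19 characters
        System.out.println(charBytes(254));        // 254
        System.out.println(varcharBytes(address)); // 21
    }
}
```

So with varchar, a typical address costs a few dozen bytes rather than the full declared width.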
0
 
LVL 23

Assisted Solution

by:Racim BOUDJAKDJI
Racim BOUDJAKDJI earned 167 total points
ID: 39287986
SQL Server already ships with an ETL tool, SSIS (SQL Server Integration Services), for moving data into and out of a database. You should use this tool rather than redevelop a whole set of functionality that already exists and works very well. SSIS can load up to 2TB of text-based data within an hour.

More info on SSIS

http://technet.microsoft.com/en-us/library/ms141026.aspx
http://en.wikipedia.org/wiki/SQL_Server_Integration_Services
0
 

Author Comment

by:beavoid
ID: 39290384
I briefly considered just putting all the addresses in a big public static String array in a Java class,
and reading that class's data member whenever I need an email address.

String email = giantEmailListClass.emails[0];

SQL is probably more reliable?
I have never set up an SQL database.
It is simple, just a giant array of indexed Strings.
So I go to Godaddy and add a new DB in my webspace and bingo? Do I have to specify its data structure there?
My database is just a giant list of index, String
Can I make the first item in the DB the count? or is that unnecessary, since they are indexed.
Size = Highest Index+1;

Thanks
0
 
LVL 23

Expert Comment

by:Racim BOUDJAKDJI
ID: 39290411
SQL is probably more reliable
You need to define what you mean by reliable as far as your application is concerned. SQL Server and most DBMSs provide not just data retrieval capabilities but also out-of-the-box concurrent user transaction handling, as well as embedded security and 30 years of compiler optimizations. But perhaps you don't need all that.

I have never set up an SQL database.
Well if you decide to do that, look at it as an opportunity to complete your toolset with something out of the data layer.

Can I make the first item in the DB the count? or is that unnecessary, since they are indexed.
Size = Highest Index+1;

Yes. You have a variety of persistent incremental features when using a table. You may use, for instance, the IDENTITY feature to create and store that increment with each new row entered in the table.
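
One caveat on "Size = Highest Index+1": with an auto-incrementing id the numbers are normally not reused after a delete, so the highest id and the row count can drift apart. The in-memory sketch below only imitates that behaviour (ids seeded at 1, incrementing by 1); it is not database code, and the class name is made up. In a real table you would more likely ask the database for the count with SELECT COUNT(*).

```java
import java.util.TreeMap;

/** In-memory stand-in for a table with an auto-incrementing id column. */
class IdentityDemo {
    private final TreeMap<Integer, String> rows = new TreeMap<>();
    private int nextId = 1; // mimics an identity column seeded at 1, step 1

    /** Adds a row and returns the id assigned to it. */
    int insert(String address) {
        int id = nextId++;
        rows.put(id, address);
        return id;
    }

    /** Removes a row; its id is never handed out again. */
    void delete(int id) {
        rows.remove(id);
    }

    /** What SELECT COUNT(*) would report. */
    int count() {
        return rows.size();
    }

    /** Highest id currently present, or 0 if the table is empty. */
    int highestId() {
        return rows.isEmpty() ? 0 : rows.lastKey();
    }
}
```

After inserting three addresses and deleting the second, count() is 2 while highestId() is still 3, which is why storing or deriving the count yourself is usually unnecessary.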

Hope this helps.
0
 
LVL 12

Accepted Solution

by:mwochnick
mwochnick earned 333 total points
ID: 39290576
0
