Solved

Data mining bot DB upload suggestions? anyone?

Posted on 2013-06-29
Last Modified: 2016-02-11
Hi
My data mining bot is finished. It is staggeringly cunning, if I may say.
All I need now is a way to save and retrieve a large amount of data.
I expect to collect thousands of email addresses to sort. Is text not considered large? I have a GoDaddy web-space.
For a simple Java data structure like

class EmailsNode {
    int addressCount;
    String[] addresses; // large String[] array
}

must I use SQL? This is all in .java, not PHP. I doubt a simple .txt file in my web-space root directory is a good idea?
Is there an idiot-proof page on uploading data to and downloading it from SQL?
Is it just "create an SQL database in my GoDaddy panel and paste in the upload code"?
Thoughts?
Thanks
Question by:beavoid
10 Comments
 
LVL 12

Expert Comment

by:mwochnick
ID: 39287641
A simple database and table should be fine. If you have a 1 GB database and you assume 1kB per email address - which seems large - you would be able to store approximately 1 million email addresses.

Plus, with a SQL solution you'd be able to use SQL for querying, sorting, etc.
 

Author Comment

by:beavoid
ID: 39287654
Maybe, since I don't mind how the email addresses are stored, I could have 2 or 3 databases and just pick addresses from them at random when needed? That would mean 3 million addresses?

How do I upload the address data to a database, and then query a field, in Java? Since it is just simple string arrays, isn't it trivial? What is a good walkthrough page?

Should I download the entire string array each time, add a new address to the end of the array, and re-upload the data, or is there an append-type function?
Thanks
 

Author Comment

by:beavoid
ID: 39287656
My Godaddy Panel says I have used 7GB out of 250GB
So, I suspect I don't have to be too concerned about ceilings.
 
LVL 12

Assisted Solution

by:mwochnick
mwochnick earned 333 total points
ID: 39287667
Insert a row for each address: each email address is its own row in the database table.
Do a simple search and update each time an email address is captured, so there is no need to download and re-upload the whole list.

Here are some tutorials on using JDBC to write records to a database:
http://alvinalexander.com/java/edu/pj/jdbc/jdbc0002
http://www.tutorialspoint.com/jdbc/jdbc-insert-records.htm
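
As a rough illustration of the row-per-address approach, here is a minimal JDBC sketch. The table name `emails`, the column name `address`, and the `EmailStore` class are hypothetical and do not come from the tutorials above; lower-casing addresses before comparing them is also an assumption, made so that duplicates match.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Locale;

/** Sketch of a one-row-per-address store; table and column names are hypothetical. */
class EmailStore {
    // Assumed schema: CREATE TABLE emails (address VARCHAR(254) NOT NULL)
    static final String FIND_SQL = "SELECT 1 FROM emails WHERE address = ?";
    static final String INSERT_SQL = "INSERT INTO emails (address) VALUES (?)";

    /** Trims and lower-cases an address so duplicates compare equal (an assumption). */
    static String normalize(String raw) {
        return raw.trim().toLowerCase(Locale.ROOT);
    }

    /** Returns true if the address is already stored. */
    static boolean exists(Connection conn, String address) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(FIND_SQL)) {
            ps.setString(1, normalize(address));
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next();
            }
        }
    }

    /** Adds the address as a new row unless it is present; returns true if a row was added. */
    static boolean addIfAbsent(Connection conn, String address) throws SQLException {
        if (exists(conn, address)) {
            return false;
        }
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            ps.setString(1, normalize(address));
            return ps.executeUpdate() == 1;
        }
    }
}
```

A `Connection` would typically come from `DriverManager.getConnection(url, user, password)`, with the URL and credentials taken from the hosting panel and the matching JDBC driver jar on the classpath. Each captured address then costs one small query and at most one insert, which is the "append" the question asks about.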
 
LVL 86

Expert Comment

by:CEHJ
ID: 39287971
"If you have a 1 GB database and you assume 1kB per email address - which seems large -"
It certainly does seem large ;). I would have thought an average of 32 bytes per address, not 1000.

Of course, you might or might not have additional info along with the address
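
To put numbers on the two estimates in this thread, here is a small sketch of the arithmetic. It assumes a decimal gigabyte (1,000,000,000 bytes) and ignores per-row and index overhead, so real capacity would be somewhat lower; the class and method names are made up for the example.

```java
/** Back-of-the-envelope capacity estimate; ignores row and index overhead. */
class CapacityEstimate {
    /** How many fixed-size records fit in the given number of bytes. */
    static long recordsThatFit(long totalBytes, long bytesPerRecord) {
        return totalBytes / bytesPerRecord;
    }

    public static void main(String[] args) {
        long oneGb = 1_000_000_000L; // decimal gigabyte, an assumption
        System.out.println(recordsThatFit(oneGb, 1_000)); // 1 kB each: prints 1000000
        System.out.println(recordsThatFit(oneGb, 32));    // 32 bytes each: prints 31250000
    }
}
```

Either way, a few thousand addresses is far below what a 1 GB database can hold.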

 
LVL 23

Expert Comment

by:Racim BOUDJAKDJI
ID: 39287984
"If you have a 1 GB database and you assume 1kB per email address"

From BOL (SQL Server Books Online):
http://msdn.microsoft.com/en-us/library/ms176089.aspx

char [ ( n ) ]
Fixed-length, non-Unicode string data. n defines the string length and must be a value from 1 through 8,000. The storage size is n bytes. The ISO synonym for char is character.
varchar [ ( n | max ) ]
Variable-length, non-Unicode string data. n defines the string length and can be a value from 1 through 8,000. max indicates that the maximum storage size is 2^31-1 bytes (2 GB). The storage size is the actual length of the data entered + 2 bytes. The ISO synonyms for varchar are char varying or character varying.
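
The storage rules quoted above can be turned into a quick estimate. This sketch assumes single-byte (non-Unicode) characters, as in the quoted definitions, and counts only the column data, not any row overhead; the sample address and the `VarcharSize` class are hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Column-size estimate based on the quoted char/varchar storage rules. */
class VarcharSize {
    static final int VARCHAR_OVERHEAD_BYTES = 2;

    /** varchar(n): actual length of the data entered + 2 bytes. */
    static int varcharBytes(String value) {
        return value.getBytes(StandardCharsets.US_ASCII).length + VARCHAR_OVERHEAD_BYTES;
    }

    /** char(n): always n bytes, regardless of the data entered. */
    static int charBytes(int n) {
        return n;
    }

    public static void main(String[] args) {
        System.out.println(varcharBytes("alice@example.com")); // 17 characters + 2: prints 19
        System.out.println(charBytes(254));                    // prints 254
    }
}
```

For variable-length values such as email addresses, a varchar column stores roughly the length of each address, which is in line with the 32-byte average suggested earlier.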
 
LVL 23

Assisted Solution

by:Racim BOUDJAKDJI
Racim BOUDJAKDJI earned 167 total points
ID: 39287986
SQL Server already ships with an ETL tool called SSIS (SQL Server Integration Services) for moving data into and out of a database. You should use this tool rather than redevelop an entire set of functionalities that already exist and work very well. SSIS can load up to 2 TB of text-based data within an hour.

More info on SSIS

http://technet.microsoft.com/en-us/library/ms141026.aspx
http://en.wikipedia.org/wiki/SQL_Server_Integration_Services
 

Author Comment

by:beavoid
ID: 39290384
I briefly considered just putting all the addresses in one big public static String array in a Java class,
and reading that class's data member whenever I need an email address.

String email = giantEmailListClass.emails[0];

SQL is probably more reliable?
I have never set up an SQL database.
My data is simple, just a giant array of indexed Strings.
So I go to GoDaddy, add a new DB in my webspace, and bingo? Do I have to specify its data structure there?
My database is just a giant list of (index, String) pairs.
Can I make the first item in the DB the count? Or is that unnecessary, since they are indexed?
Size = Highest Index+1;

Thanks
 
LVL 23

Expert Comment

by:Racim BOUDJAKDJI
ID: 39290411
"SQL is probably more reliable"
You need to define what you mean by reliable as far as your application is concerned. SQL Server and most DBMSs provide not just data retrieval capabilities but also out-of-the-box concurrent user transaction handling, as well as embedded security and 30 years of compiler optimizations. But perhaps you don't need all that.

"I have never set up an SQL database."
Well, if you decide to do that, look at it as an opportunity to complete your toolset with something from the data layer.

"Can I make the first item in the DB the count? or is that unnecessary, since they are indexed.
Size = Highest Index+1;"

Yes. You have a variety of persistent incremental features when using a table. You may use, for instance, the IDENTITY feature to create and store that increment with each new row entered in the table.
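
One caveat on "Size = Highest Index+1": an auto-incremented key only tracks the row count while no rows are ever deleted, because deleted values are not handed out again. The sketch below is an in-memory stand-in for such a table, not real database code, and assumes a counter that starts at 1 and increases by 1 (the IDENTITY default in SQL Server). In an actual database, `SELECT COUNT(*)` on the table returns the row count directly, so storing the count as a first item is unnecessary.

```java
import java.util.TreeMap;

/** In-memory stand-in for a table with an auto-incrementing key; not real database code. */
class IdentityTableSketch {
    private final TreeMap<Integer, String> rows = new TreeMap<>();
    private int nextId = 1; // assumed seed of 1 and increment of 1

    /** Stores the address under the next key and returns that key. */
    int insert(String address) {
        int id = nextId++;
        rows.put(id, address);
        return id;
    }

    /** Removes a row; its key is never handed out again. */
    void delete(int id) {
        rows.remove(id);
    }

    /** Number of rows currently stored, like SELECT COUNT(*). */
    int count() {
        return rows.size();
    }

    /** Largest key currently stored, or 0 when the table is empty. */
    int highestId() {
        return rows.isEmpty() ? 0 : rows.lastKey();
    }
}
```

After inserting three addresses and deleting the second, `count()` is 2 while `highestId()` is 3, so the highest key overstates the size.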

Hope this helps.
 
LVL 12

Accepted Solution

by:
mwochnick earned 333 total points
ID: 39290576