Solved

Data mining bot DB upload suggestions? Anyone?

Posted on 2013-06-29
Last Modified: 2016-02-11
Hi,
My data mining bot is finished. It is staggeringly cunning, if I may say.
All I need now is a way to save and retrieve a large amount of data.
I expect to collect thousands of email addresses to sort. Is text not considered large? I have GoDaddy web space.
For a simple Java data structure like

class EmailsNode {
    int addressCount;
    String[] addresses; // large String[] array
}

must I use SQL? This is all in .java, not PHP. I doubt a simple .txt file in my web-space root directory is a good idea.
Is there an idiot-proof page on uploading data to and downloading it from an SQL database?
Is it just "create an SQL database in my GoDaddy panel and paste in the upload code"?
Thoughts?
Thanks
Question by:beavoid
10 Comments
 
LVL 12

Expert Comment

by:mwochnick
ID: 39287641
A simple database and table should be fine. If you have a 1 GB database and assume 1 kB per email address (which seems large), you would be able to store approximately 1 million email addresses.

Plus, with a SQL solution you'd have the ability to use SQL for querying, sorting, etc.
 

Author Comment

by:beavoid
ID: 39287654
Maybe, since I don't mind how the email addresses are stored, I could have 2 or 3 databases and just pick from them at random when needed? That would mean 3 million addresses?

How do I upload the address data to a database, and then query a field, in Java? Since it is just simple string arrays, isn't it trivial? What is a good walk-through page?

Should I download the entire string array each time, add a new address to the end of the array, and re-upload the data, or is there an append-type function?
Thanks
 

Author Comment

by:beavoid
ID: 39287656
My GoDaddy panel says I have used 7 GB out of 250 GB,
so I suspect I don't have to be too concerned about ceilings.

 
LVL 12

Assisted Solution

by:mwochnick
mwochnick earned 333 total points
ID: 39287667
Insert a row for each address: each email address is its own row in the database table.
Do a simple search and update each time an email address is captured.

Here are some tutorials on using JDBC to write records to a database:
http://alvinalexander.com/java/edu/pj/jdbc/jdbc0002
http://www.tutorialspoint.com/jdbc/jdbc-insert-records.htm
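
As a rough illustration of the row-per-address approach, here is a minimal JDBC sketch. It is not taken from the tutorials above. The table name `emails`, the column `address`, the class `EmailStore`, and its helper methods are all hypothetical, and lower-casing the whole address is a simplifying assumption. It only shows that adding an address is a single INSERT, with no need to download and re-upload the whole list.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Locale;

class EmailStore {
    // Hypothetical table: emails(address VARCHAR(254)).
    static final String FIND_SQL = "SELECT 1 FROM emails WHERE address = ?";
    static final String INSERT_SQL = "INSERT INTO emails (address) VALUES (?)";

    // Trim and lower-case so the same address is not stored twice.
    // Assumption: treating the whole address as case-insensitive is acceptable here.
    static String normalize(String raw) {
        return raw.trim().toLowerCase(Locale.ROOT);
    }

    // Searches for the address first and inserts a new row only if it is missing.
    // Returns true when a row was added.
    static boolean insertIfAbsent(Connection conn, String rawAddress) throws SQLException {
        String address = normalize(rawAddress);
        try (PreparedStatement find = conn.prepareStatement(FIND_SQL)) {
            find.setString(1, address);
            try (ResultSet rs = find.executeQuery()) {
                if (rs.next()) {
                    return false; // already stored
                }
            }
        }
        try (PreparedStatement insert = conn.prepareStatement(INSERT_SQL)) {
            insert.setString(1, address);
            insert.executeUpdate();
            return true;
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 4) {
            System.out.println("usage: EmailStore <jdbc-url> <user> <password> <address>");
            return;
        }
        // Needs the JDBC driver for your database on the classpath.
        try (Connection conn = DriverManager.getConnection(args[0], args[1], args[2])) {
            System.out.println(insertIfAbsent(conn, args[3]) ? "inserted" : "already present");
        }
    }
}
```

The JDBC URL, user, and password come from whatever database you create in your hosting panel. Using a PreparedStatement with a `?` placeholder keeps scraped text from being interpreted as SQL.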
 
LVL 86

Expert Comment

by:CEHJ
ID: 39287971
If you have a 1 GB database and you assume 1kB per email address - which seems large -
It certainly does seem large ;). I would have thought an average of 32 bytes per address, not 1000.

Of course, you might or might not have additional info along with the address
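
To put the two estimates side by side, here is a small back-of-the-envelope sketch. The class and method names are made up for illustration, 1 GB is taken as 10^9 bytes, and per-row overhead and indexes are ignored, so a real table would hold somewhat fewer rows.

```java
class CapacityEstimate {
    // How many fixed-size records fit in the given space (integer division).
    static long addressesThatFit(long databaseBytes, long bytesPerAddress) {
        return databaseBytes / bytesPerAddress;
    }

    public static void main(String[] args) {
        long oneGb = 1_000_000_000L; // assumption: decimal gigabyte

        // 1 kB per address, the deliberately generous estimate
        System.out.println(addressesThatFit(oneGb, 1_000)); // prints 1000000

        // 32 bytes per address, the average suggested above
        System.out.println(addressesThatFit(oneGb, 32));    // prints 31250000
    }
}
```

Either way, a single 1 GB database comfortably holds the thousands of addresses mentioned in the question.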
 
LVL 23

Expert Comment

by:Racim BOUDJAKDJI
ID: 39287984
If you have a 1 GB database and you assume 1kB per email address

From SQL Server Books Online (BOL):
http://msdn.microsoft.com/en-us/library/ms176089.aspx

char [ ( n ) ]
Fixed-length, non-Unicode string data. n defines the string length and must be a value from 1 through 8,000. The storage size is n bytes. The ISO synonym for char is character.
varchar [ ( n | max ) ]
Variable-length, non-Unicode string data. n defines the string length and can be a value from 1 through 8,000. max indicates that the maximum storage size is 2^31-1 bytes (2 GB). The storage size is the actual length of the data entered + 2 bytes. The ISO synonyms for varchar are char varying or character varying.
 
LVL 23

Assisted Solution

by:Racim BOUDJAKDJI
Racim BOUDJAKDJI earned 167 total points
ID: 39287986
SQL Server already ships with an ETL tool called SSIS (SQL Server Integration Services) for moving data into and out of a database. You should use this tool rather than redevelop an entire set of functionality that already exists and works very well. SSIS can load up to 2 TB of text-based data within an hour.

More info on SSIS

http://technet.microsoft.com/en-us/library/ms141026.aspx
http://en.wikipedia.org/wiki/SQL_Server_Integration_Services
 

Author Comment

by:beavoid
ID: 39290384
I briefly considered just putting all the addresses in a big public static String array in a Java class,
and reading that class's data member whenever I need an email address:

String email = giantEmailListClass.emails[0];

SQL is probably more reliable?
I have never set up an SQL database.
The data is simple, just a giant array of indexed Strings.
So I go to GoDaddy, add a new DB in my web space, and bingo? Do I have to specify its data structure there?
My database is just a giant list of (index, String) pairs.
Can I make the first item in the DB the count? Or is that unnecessary, since they are indexed?
Size = Highest Index+1;

Thanks
 
LVL 23

Expert Comment

by:Racim BOUDJAKDJI
ID: 39290411
SQL is probably more reliable
You need to define what you mean by reliable as far as your application is concerned. SQL Server and most DBMSs provide not just data retrieval capabilities but also out-of-the-box concurrent user transaction handling, as well as embedded security and 30 years of compiler optimizations. But perhaps you don't need all that.

I have never set up an SQL database.
Well, if you decide to do that, look at it as an opportunity to complete your toolset with something from the data layer.

Can I make the first item in the DB the count? or is that unnecessary, since they are indexed.
Size = Highest Index+1;

Yes. A table gives you a variety of persistent auto-increment features. For instance, you may use the IDENTITY feature to create and store that increment with each new row entered in the table.
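
As a rough sketch of what that could look like from Java, the class below holds a table definition with an auto-incrementing key and a query for the row count. The table and column names, the `VARCHAR(254)` length, and the class itself are hypothetical. The IDENTITY form is SQL Server syntax; the MySQL equivalent, AUTO_INCREMENT, is included in case the hosted database is MySQL. Note that identity values start at 1 and can have gaps after deletes or failed inserts, so asking the database for `COUNT(*)` is safer than computing "highest index + 1" or storing the count as the first row.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class EmailTable {
    // SQL Server: IDENTITY(1,1) starts at 1 and increments by 1 per new row.
    static final String CREATE_SQL_SERVER =
        "CREATE TABLE emails (id INT IDENTITY(1,1) PRIMARY KEY, address VARCHAR(254) NOT NULL)";

    // MySQL equivalent of the same table.
    static final String CREATE_MYSQL =
        "CREATE TABLE emails (id INT AUTO_INCREMENT PRIMARY KEY, address VARCHAR(254) NOT NULL)";

    static final String COUNT_SQL = "SELECT COUNT(*) FROM emails";

    // Creates the table once; the schema is defined by this statement.
    static void createTable(Connection conn, boolean mySql) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.executeUpdate(mySql ? CREATE_MYSQL : CREATE_SQL_SERVER);
        }
    }

    // Asks the database for the number of stored addresses.
    static long count(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(COUNT_SQL)) {
            rs.next(); // COUNT(*) always returns exactly one row
            return rs.getLong(1);
        }
    }

    public static void main(String[] args) {
        // Print the statements so they can be reviewed before running them.
        System.out.println(CREATE_SQL_SERVER);
        System.out.println(CREATE_MYSQL);
        System.out.println(COUNT_SQL);
    }
}
```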

Hope this helps.
 
LVL 12

Accepted Solution

by:
mwochnick earned 333 total points
ID: 39290576