Solved

What is the best program to store a huge amount of data?

Posted on 2009-07-07
Last Modified: 2012-05-07
I have to store in excess of 10 million lines of data.

Notepad doesn't support files this size, and neither does MS Word.

What could I use that would accept all this data, and allow Perl to read and write to the file and adjust the data as necessary?

(other than shorten the file to chunks)
Question by:MichaelGlancy
11 Comments
 
LVL 7

Expert Comment

by:rcflyr
ID: 24799328
SQL Database?
 
LVL 48

Expert Comment

by:Tintin
ID: 24799363
What sort of data?
What do you want to do with the data?
What type of manipulation/calculations do you want to do on the data?
What type of formatting/reporting do you require from the data?
What type of editing do you need to do to the data?
 
LVL 84

Expert Comment

by:ozo
ID: 24799369
Perl should be able to read and write a line at a time without needing any other program to support the file.
What do you need Notepad or MS Word for?
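
A minimal sketch of the line-at-a-time approach, assuming each line holds one group of whitespace-separated numbers (the thread does not state the exact layout). The name `filter_lines` and the callback are hypothetical, chosen only for illustration:

```perl
use strict;
use warnings;

# Copy $in_path to $out_path one line at a time, keeping only the lines for
# which $keep->(\@numbers) returns true. Only one line is held in memory at
# any moment, so the size of the file does not matter.
sub filter_lines {
    my ($in_path, $out_path, $keep) = @_;

    open my $in,  '<', $in_path  or die "Cannot read $in_path: $!";
    open my $out, '>', $out_path or die "Cannot write $out_path: $!";

    my $kept = 0;
    while (my $line = <$in>) {
        chomp $line;
        my @numbers = split ' ', $line;   # assumed layout: numbers separated by spaces
        next unless $keep->(\@numbers);
        print {$out} "$line\n";
        $kept++;
    }

    close $in;
    close $out or die "Cannot close $out_path: $!";
    return $kept;                          # number of lines written
}
```

For example, `filter_lines('data.txt', 'kept.txt', sub { $_[0][0] > 500 })` would keep only the groups whose first number exceeds 500; the file names here are placeholders.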
 
LVL 40

Assisted Solution

by:mrjoltcola
mrjoltcola earned 100 total points
ID: 24799621
Typically you will want efficient random access to the data, so you will need a storage format that uses indexing, not a plain flat file.

Besides the suggested relational database, you can also use Berkeley DB (Sleepycat) or GNU DBM http://www.gnu.org/software/gdbm

Even though Oracle now owns Sleepycat, it is still free and open source, depending on your uses of it.

http://www.oracle.com/database/berkeley-db/index.html

It will handle 10 million indexed records easily.

 
LVL 7

Expert Comment

by:Fairlight2cx
ID: 24800299
Assuming one were still required to go with a flat file for some reason, my bet would be on gvim being up to the task.  http://www.vim.org

 
LVL 48

Expert Comment

by:Tintin
ID: 24800394
We're all making wild guesses at this stage until we find out more information.
 

Author Comment

by:MichaelGlancy
ID: 24801445
What sort of data?
a group of between seven and ten numbers (value between 1 and 999, randomly generated)

What do you want to do with the data?
I have to set the data out in a readable form, and then examine it for patterns. So writing the data, each group to a line in the text file, has served me well so far.

What type of manipulation/calculations do you want to do on the data?
The calculations are relatively simple. Just examination of the data line by line and how and when it occurs. I will want to delete a lot of it eventually down to about 1 million records, I think.

what do you need Notepad or MS word for?
I have been using Notepad so far as a text file reader (I'm using Windows). They are not coping now with the vast amount of data I have. I have reached the maximum number of pages allowed in Windows.

Is this any clearer? The manipulation of the data is relatively simple - no more than simple statistics really.

You have suggested a couple of database apps.
1. I assume, because you have suggested them on here, that Perl can interact with them easily
2. Remembering I am a complete (but keen) beginner, I need the simplest option

Now, what do you think?

Thank you
 
LVL 7

Expert Comment

by:Fairlight2cx
ID: 24801516
You haven't reached the number of pages allowed in Windows, you've reached the maximum sizes allowed by Notepad and Wordpad.  For viewing (and even editing) purposes, try gvim at http://www.vim.org/ and you should have no problems.  (You'd be surprised just how big a file I've dragged into vim.)

If you go with a database solution, you could define ten fields and just have them NULLable.  I don't know about anyone else here, but I think even MySQL is overkill for this.  Especially if you want the simplest solution.  That's an awful lot of infrastructure to prop up the simple manipulations you indicate.

I'd probably go with GDBM_File and tie() to a hash file.  I'd store each row with a sequential number (i.e., line number) as the key, and store the numbers joined with something like \001 as the value.  It makes accessing your rows quite easy, and you can just split the results into an array at will and manipulate them as necessary.

Even if you want to delete whole rows, it's not a big deal, and you could even renumber entries to cut out missing rows when you re-save the information.  That renumbering can be done any one of several ways.  There are a bunch of ways to skin that cat--although you want to do it row by row to be memory conscious, with that much data in the mix.

My vote is a tied hash.  The downside to this is that -no- text editor will be even remotely useful for you in this scenario unless you write a tiny export routine (and I do mean -tiny-...probably under 5 lines).
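
A rough sketch of the tied-hash idea described above, assuming the GDBM_File module is available in your Perl build (not every build bundles it) and that the input is a text file with one whitespace-separated group per line. The names `import_rows` and `export_rows` are hypothetical:

```perl
use strict;
use warnings;
use GDBM_File;   # assumption: installed alongside your Perl

my $SEP = "\001";   # the separator suggested in the post above

# Load a text file into a GDBM file: key = line number, value = joined numbers.
sub import_rows {
    my ($text_path, $db_path) = @_;
    tie my %rows, 'GDBM_File', $db_path, GDBM_WRCREAT(), 0640
        or die "Cannot open $db_path: $!";
    open my $in, '<', $text_path or die "Cannot read $text_path: $!";
    while (my $line = <$in>) {
        chomp $line;
        $rows{$.} = join $SEP, split ' ', $line;   # $. is the current line number
    }
    close $in;
    untie %rows;
}

# The "tiny export routine": write each row back out as "key<TAB>numbers".
# each() walks the file one record at a time, in GDBM's internal order.
sub export_rows {
    my ($db_path, $text_path) = @_;
    tie my %rows, 'GDBM_File', $db_path, GDBM_READER(), 0640
        or die "Cannot open $db_path: $!";
    open my $out, '>', $text_path or die "Cannot write $text_path: $!";
    while (my ($n, $value) = each %rows) {
        print {$out} "$n\t", join(' ', split /\001/, $value), "\n";
    }
    close $out or die "Cannot close $text_path: $!";
    untie %rows;
}
```

The export uses each() rather than sorting the keys so that only one record is in memory at a time; the price is that rows come out unordered, which is why the key is written at the start of each line.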
 

Author Comment

by:MichaelGlancy
ID: 24801589
Ok I am going to try the alternatives suggested here
 
LVL 39

Accepted Solution

by:Adam314
Adam314 earned 400 total points
ID: 24804630
If you don't want the overhead of MySQL or some other DBMS, you could use SQLite.  The Perl installation is extremely simple - install the DBD::SQLite module (CPAN or ppm).
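
As an illustration only, here is roughly what loading the text file into SQLite might look like through DBI, assuming both DBI and DBD::SQLite are installed. The table name `draws`, its columns, and the function names are hypothetical, not something specified in this thread:

```perl
use strict;
use warnings;
use DBI;   # assumption: DBD::SQLite is installed as the driver

# Open (or create) the database file and make sure the table exists.
sub open_db {
    my ($db_path) = @_;
    my $dbh = DBI->connect("dbi:SQLite:dbname=$db_path", '', '',
        { RaiseError => 1, AutoCommit => 1 });
    $dbh->do('CREATE TABLE IF NOT EXISTS draws (
                  id      INTEGER PRIMARY KEY,
                  numbers TEXT NOT NULL)');
    return $dbh;
}

# Insert every line of a text file, one row per line, in one transaction.
sub load_file {
    my ($dbh, $text_path) = @_;
    open my $in, '<', $text_path or die "Cannot read $text_path: $!";
    my $sth = $dbh->prepare('INSERT INTO draws (numbers) VALUES (?)');
    $dbh->begin_work;                 # one transaction for the whole load
    while (my $line = <$in>) {
        chomp $line;
        $sth->execute($line);
    }
    $dbh->commit;
    close $in;
}

# Count the rows currently stored.
sub count_rows {
    my ($dbh) = @_;
    my ($count) = $dbh->selectrow_array('SELECT COUNT(*) FROM draws');
    return $count;
}
```

Each group is kept as a single text column here because that is the simplest thing for a beginner to start with; one column per number would make per-number queries easier. Wrapping the load in one transaction generally makes bulk inserts much faster than committing each row separately.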

Or, if you just want a flat file, you could use a tied array.  You access each line in the file as though it is an element in the array.  Changing the array changes the file.
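
A short sketch of the tied-array option using the core Tie::File module. The helper names are hypothetical, and line indexes are 0-based because that is how the tied array is addressed:

```perl
use strict;
use warnings;
use Tie::File;   # core module, ships with Perl

# Replace line $index (0-based) of $path with $new_text; return the old text.
sub replace_line {
    my ($path, $index, $new_text) = @_;
    tie my @lines, 'Tie::File', $path or die "Cannot tie $path: $!";
    my $old = $lines[$index];
    $lines[$index] = $new_text;      # this assignment rewrites the file
    untie @lines;
    return $old;
}

# Delete line $index (0-based) and return how many lines remain.
sub delete_line {
    my ($path, $index) = @_;
    tie my @lines, 'Tie::File', $path or die "Cannot tie $path: $!";
    splice @lines, $index, 1;        # removes the line from the file itself
    my $remaining = @lines;
    untie @lines;
    return $remaining;
}
```

Tie::File does not load the whole file into memory, but a change near the start of a very large file forces everything after it to be rewritten, so many scattered deletions across 10 million lines are likely to be slow.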
 

Author Closing Comment

by:MichaelGlancy
ID: 31600880
I'm just going to close this question, as I have chosen the SQLite module and now need to install it and learn how to use it.