Solved

Read a large file line by line into a map

Posted on 2012-04-03
7
863 Views
Last Modified: 2012-06-21
Hi

I have a large file, that cannot be loaded into the memory all at once.
Say the file has the following format

A   1
A   2
B   2
A   4
B   4
A   8
B   8
A   4
I want to read the file line by line, and dynamically create a "map?" in C++ such that

map user, count

where user = A, B ....
Count = a list of sets? (not sure what the best data structure is) to hold
 {{1,2,4} {2,4,8} {4,8,4} }
basically the flow is

1) read file line by line
2) initialize the key of some map by the first column of first row (user) example A
3) take the value of this and add it into a list? so it would be {{1}}
4) look at the next line.The map would now contain

map ={
key:A value{{1,2}}
}


5) look at third line
map={
key=A value :{{1,2}}
key =B value :{{2}}
}

6) 4th line

map={
key=A value :{{1,2,4}}
key =B value :{{2}}
}


Now that there are 3 elements are reached {1,2,4} for A, i want to create a new map with this triplet as the key

map2 = {
key = {1,2,4} and value = 1}

5th line =

map={
key=A value :{{1,2,4}}
key =B value :{{2,9}}
}

and 6th line we find the 4th value for A so

map={
key=A value :{{1,2,4} {2,4,8 }
key =B value :{{2,4}}
}

and also create a new entry in map2 for {2,4,8}

map2 = {
key = {1,2,4} and value = 1
key = {2,4,8} and value = 1
}
for the next line

map={
key=A value :{{1,2,4} {2,4,8 }
key =B value :{{2,4, 8}}
}

so we increment map2

map2 = {
key = {1,2,4} and value = 1
key = {2,4,8} and value = 2
}

After i finish reading the file, i want to be able to read map2 and pullout the triplet key with the maximum value

how do i do this in c++, is reading tab delimited entries simple?
0
Comment
Question by:Vlearns
7 Comments
 

Author Comment

by:Vlearns
Comment Utility
i want to use ifstream
0
 
LVL 53

Accepted Solution

by:
Infinity08 earned 167 total points
Comment Utility
>> how do i do this in c++, is reading tab delimited entries simple?

It's not hard ;)

You can use getline to read one line of input from the file (and doing so in a loop until the end of the file is reached will read all lines).

        http://www.cplusplus.com/reference/string/getline/

That will give you the line in a std::string, which you can then parse to get whatever you need from it.

If a line contains two tab-separated items, you could either use a substr-based approach (look for the tab, and then take two substrings - one left and one right of the tab),

        http://www.cplusplus.com/reference/string/string/find/
        http://www.cplusplus.com/reference/string/string/find_first_of/
        http://www.cplusplus.com/reference/string/string/substr/

or use a stringstream, and simply "stream" the two items out of the string.

        http://www.cplusplus.com/reference/iostream/istream/operator%3E%3E/
0
 
LVL 32

Expert Comment

by:sarabande
Comment Utility
for the container you could use std::map<std::string, std::set<int> >.

if you have parsed key and value from line you would add that pair by

mymap[key].insert(value);

Open in new window


the mymap[key] would return a reference to a set<int> which is either empty when the key was new or was the set already stored to the key. the insert would add the value to the set unless it already contains it.

Sara
0
Highfive + Dolby Voice = No More Audio Complaints!

Poor audio quality is one of the top reasons people don’t use video conferencing. Get the crispest, clearest audio powered by Dolby Voice in every meeting. Highfive and Dolby Voice deliver the best video conferencing and audio experience for every meeting and every room.

 
LVL 32

Expert Comment

by:sarabande
Comment Utility
if the file is too large for loading it into memory, it also could give memory problems with the map unless there are many duplicates for keys or for key-value pairs.

did you try to load the whole text file into memory? if yes, did you use one big char buffer for all or did you use a dynamic container like std::vector<std::string> where you added line by line? for the second container you should be able to allocate much more memory cause it allocates memory in (small) chunks and doesn't need one huge contiguous free memory space.

Sara
0
 
LVL 22

Expert Comment

by:ambience
Comment Utility
>> After i finish reading the file, i want to be able to read map2 and pullout the triplet key with the maximum value

I dont think you need a map for that, that is just a simple search for the maximum and given the nature of your algorithm you should be able to do it easily.

1) I suggest to use std::tuple<> if your STL supports it. That would save you from have to write a lot of basic operators. See the example here: http://msdn.microsoft.com/en-us/library/bb982837.aspx

2) Looking at the data and your approach - the way I understood the problem

A   1
A   2
A   4
A   8
A   4

You want to find the max contiguous triplet 4,8,4 from the sequence. This should be done this way

// pseudocode

typedef std::tuple<int, int, int> triplet_t;

triple_t tmax = make_tuple(1,2,4)
while(!eof)
{
      triplet_t temp = make_tuple(get<1>(tmax), get<2>(tmax), new value for A);
      if( temp > tmax)
         tmax = temp;
}

Of course this is for just one user 'A', and you'd need a map but just one map.
0
 
LVL 22

Assisted Solution

by:ambience
ambience earned 166 total points
Comment Utility
I think I missed one point. You havent defined the comparison rules for triplets, like which is greater 4,8,4 or 8,1,1. But - then all you need is define the comparison operators like

bool operator<(const triplet_t& t1, const triplet_t& t2) {
 ...
}

Just in case, if theres a conflict with std::tuple then just define a replica or even do it manually like

int compare(const triplet_t& t1, const triplet_t& t2) {
 ... return -1 if <, 0 for equal and 1 for t1 > t2
}

triple_t tmax = make_tuple(1,2,4)
while(!eof)
{
      triplet_t temp = make_tuple(get<1>(tmax), get<2>(tmax), new value for A);
      if( compare(temp, tmax) < 0 )
         tmax = temp;
}
0
 
LVL 32

Assisted Solution

by:sarabande
sarabande earned 167 total points
Comment Utility
sorry for my wrong suggestion using std::set. i didn't understand the requirements.

i think now the triplet is the key and the required result is the triplet with the maximum count for different keys.

if so, it could not be made with one map if the same triple sequence should not be counted twice for a key of the first map.

i would suggest std::map<std::string, std::string > for map1 and std::map<std::string, int> for map2 . don't think that we need a vector of triples for first map cause by storing the sequence as comma separated list of integers we could use std::find to check whether the next triple for a key already had a count.

for the sample sequence we would have

1) map1:  "A"   ",1,"
2) map1:  "A"   ",1,2,"
3) map1:  "A"   ",1,2,"
                 "B"   ",2,"
4) map1:  "A"   ",1,2,4,"
                 "B"   ",2,"
    map2:  "1,2,4"    1
5) map1:  "A"   ",1,2,4,"
                 "B"   ",2,4,"
    map2:  "1,2,4"    1
6) map1:  "A"   ",1,2,4,8,"
                 "B"   ",2,4,"
    map2:  "1,2,4"    1  
                 "2,4,8"    1
7) map1:  "A"   ",1,2,4,8,"
                 "B"   ",2,4,8,"
    map2:  "1,2,4"    1  
                 "2,4,8"    2
8) map1:  "A"   ",1,2,4,8,4,"
                 "B"   ",2,4,8,"
    map2:  "1,2,4"    1  
                 "2,4,8"    2
                 "4,8,4"    1

we would parse a line end extract key and value both as strings.

then we could get a reference of a map entry of map1 by

std::string & seq = map1[key];

Open in new window


the 'seq' string is empty for new keys and would contain a comma separated sequence of value(s) if the key already existed in map1.

you now would check whether the seq already contains at least two values. you could do that by calling seq.rfind(',', pos) up to 3 times where pos starts with seq.length()-1 and then is pos-1. of course you need to break whenever you reached the begin of the string before you counted 3 commas. if you found two values you extract a temporary substring from seq and build a a temporary triplet string with 3 values where the last is the new value (for example ",1,2,4,"). then you could check whether the current seq already contains the new triplet by calling seq.find(triplet). if the result is std::string::npos you have a new triplet for the key.
 
in any case you would now add value + comma to a non-empty seq string or comma + value + comma to an empty seq. as seq is a reference to a string of a map1 entry  you would already edit in the map and don't have to store a new value.

map2 only would be filled in case of a new triplet. that would be done by stripping off the leading and trailing commas and do

if ((m = map2[triplet]++) > lastmax)
{
     lastmax = m;
     maxtriplet = triplet;
}

Open in new window


again we would use that std::map::operator[] returns a reference to an already allocated and initialized value, which would be 0 if the triplet didn't exist and is the count so far if the triplet already existed. the operator++ then would increment the map entry value.

finally the maximum count is in lastmax and the maxtriplet would contain the associated triplet.

Sara
0

Featured Post

What Should I Do With This Threat Intelligence?

Are you wondering if you actually need threat intelligence? The answer is yes. We explain the basics for creating useful threat intelligence.

Join & Write a Comment

Displaying an arrayList in a listView using the default adapter is rarely the best solution. To get full control of your display data, and to be able to refresh it after editing, requires the use of a custom adapter.
Whether you've completed a degree in computer sciences or you're a self-taught programmer, writing your first lines of code in the real world is always a challenge. Here are some of the most common pitfalls for new programmers.
Video by: Grant
The goal of this video is to provide viewers with basic examples to understand and use while-loops in the C programming language.
In this fourth video of the Xpdf series, we discuss and demonstrate the PDFinfo utility, which retrieves the contents of a PDF's Info Dictionary, as well as some other information, including the page count. We show how to isolate the page count in a…

763 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

7 Experts available now in Live!

Get 1:1 Help Now