Solved

remove duplicates

Posted on 2007-11-18
Last Modified: 2012-08-13
Hello there,
I have over a million unique ids in (posts_ids.txt), one per line. I have a new file called (newposts_ids.txt) that has over 50 thousand ids, also one per line. If I combine both files and remove dups, I won't be able to tell which ones are the new ids. So is there anything that can scan (newposts_ids.txt) and remove the ids that already appear in the (post_ids.txt) file?
Question by:Gonzales2009
59 Comments
 
LVL 86

Expert Comment

by:jkr
Read them into a std::set and the duplicates will be removed automatically, see http://www.sgi.com/tech/stl/set.html
 
LVL 40

Expert Comment

by:evilrix
If this is a one-off task and you aren't looking to develop this code for any other purpose, you can knock something up in Perl -- it'd be about a 5 line script :) If you'd be happy with that and jkr doesn't mind moving this to a Perl Q, I'll be happy to help you out in that respect; otherwise, this is a C++ Q so I'm not sure it'd be appropriate for me to post Perl code here!

-Rx.
 
LVL 17

Expert Comment

by:rstaveley
sort < posts_id.txt | uniq > output.txt
 
LVL 17

Expert Comment

by:rstaveley
Sorry I should have read the Q properly.
 

Author Comment

by:Gonzales2009
Hello evilrix, how long will it take to scan the file and remove dups? The reason I selected C++ is that it's a really fast language! Anyway, I'm running Windows XP. If you can tell me how to do it, I can compare both and decide which one is better/faster, and if I accept your answer then we'll move it into Perl. Thanks.
 

Author Comment

by:Gonzales2009
This is the code that I am using, but it only removes duplicates from one file and exports the result into another file!
Is it possible to edit this so it can do what I'm looking for?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

//FUNCTIONS
int check_string(char *string2);
void additem(char passed[501]);

//STRUCTS
struct dupe
{
    char string[500];
    struct dupe *next;
} *mylist;

struct dupe *myptr;
struct dupe *tmp, *prev;

int main() {
    FILE *fInput;
    FILE *fOutput;
    char InputText[500];
    char FileInput[256], FileOutput[256];
    unsigned long int RemovedDupes = 0;

    printf("removes duplicated lines from text files\n");
    printf("dupe remover\n------------------------\n\n");
    mylist = NULL;

    printf("enter input file: ");
    fgets(FileInput, sizeof(FileInput), stdin);
    FileInput[strlen(FileInput)-1] = 0;   /* strip the trailing newline */
    fInput = fopen(FileInput,"r");
    if (fInput == NULL) {
        printf(" *) Could not open %s for reading!\n", FileInput);
        system("PAUSE"); exit(0);
    }

    printf("enter output file: ");
    fgets(FileOutput, sizeof(FileOutput), stdin);
    FileOutput[strlen(FileOutput)-1] = 0;
    fOutput = fopen(FileOutput,"w");
    if (fOutput == NULL) {
        printf(" *) Could not open %s for writing!\n", FileOutput);
        system("PAUSE"); exit(0);
    }

    printf(" *) Successfully opened %s\n", FileInput);
    printf(" *) Filtering for duplicates...\n\n");
    additem("_null_");   /* placeholder node, removed again below */

    while (fgets(InputText, sizeof InputText, fInput)) {
        InputText[strlen(InputText)-1] = '\0';
        switch (check_string(InputText)) {
        case 0:
            additem(InputText);
            break;
        case 1:
            ++RemovedDupes;
            break;
        }
    }

    /* walk to the tail of the list and drop the placeholder node */
    tmp = prev = mylist;
    while(tmp && tmp->next) { prev = tmp; tmp = tmp->next; }
    prev->next = NULL;
    free(tmp);

    printf(" *) Finished adding to memory, writing to %s...\n", FileOutput);
    myptr = mylist;
    while (myptr) {
        fputs(myptr->string,fOutput);
        fputs("\n",fOutput);
        myptr = myptr->next;
    }

    /* RemovedDupes is unsigned long, so it needs %lu rather than %d */
    printf(" !) Finished writing! [%lu] duplicates were successfully removed.\n\n",RemovedDupes);
    system("PAUSE");
    return 0;
}

/* prepend a copy of the string to the global list */
void additem(char passed[501]) {
    struct dupe *b;
    b = (struct dupe *)malloc(sizeof(struct dupe));
    if (b == NULL) { printf("Could not allocate any more memory.\n"); exit(0); }
    strcpy(b->string,passed);
    b->next = mylist;
    mylist = b;
}

/* linear search: returns 1 if the string is already in the list, else 0 */
int check_string(char *string2) {
    myptr = mylist;
    while (myptr) {
        if (strcmp(myptr->string,string2) == 0) {
            return 1;
        }
        myptr = myptr->next;
    }
    return 0;
}

 
LVL 86

Expert Comment

by:jkr
Hm, that can be as simple as

#include <fstream>
#include <iostream>
#include <string>
#include <set>
using namespace std;

int main () {

ifstream is("input.txt");
ofstream os("output.txt");
set<string> data;

while(!is.is_eof()) {

  string sLine;

  getline(is, sLine);

  data.insert(sLine);
}

set<string>::iterator i;

for (i = data.begin(); i != data.end(); ++i) {

  os << *i << endl;
}

return 0;
}

 

Author Comment

by:Gonzales2009
there has to be two inputs and one output

input
>posts_ids.txt
>newposts_ids.txt

output
>newposts_nodups_ids.txt
 
LVL 86

Expert Comment

by:jkr
Sorry, just change that to

ifstream is1("post_ids.txt");
ifstream is2("newpost_ids.txt");
ofstream os("newposts_nodups_ids.txt");
set<string> data;

while(!is1.is_eof()) {

  string sLine;

  getline(is1,sLine);

  data.insert(sLine);
}

while(!is2.is_eof()) {

  string sLine;

  getline(is2,sLine);

  data.insert(sLine);
}
 

Author Comment

by:Gonzales2009
Thanks jkr, this is the code that I'm trying to compile, based on what you showed me, but it's displaying some errors with Dev-C++ v4.
#include <fstream.h>
#incude <iostream.h>
#include <string.h>
#include <set.h>
using namespace std;

int main () {

ifstream is1("post_ids.txt");
ifstream is2("newpost_ids.txt");
ofstream os("newposts_nodups_ids.txt");
set<string> data;

while(!is1.is_eof()) {

  string sLine;

  getline(is1,sLine);

  data.insert(sLine);
}

while(!is2.is_eof()) {

  string sLine;

  getline(is2,sLine);

  data.insert(sLine);
}

set<string>::iterator i;

for (i = data.begin(); i != data.end(); ++i) {

  os << *i << endl;
}

return 0;
}

 

Author Comment

by:Gonzales2009
using visual c++ I get this


Deleting intermediate files and output files for project 'a - Win32 Debug'.
--------------------Configuration: a - Win32 Debug--------------------
Compiling...
StdAfx.cpp
Compiling...
a.cpp
c:\program files\microsoft visual studio\myprojects\a\a.cpp(44) : fatal error C1010: unexpected end of file while looking for precompiled header directive
Error executing cl.exe.

a.exe - 1 error(s), 0 warning(s)
 
LVL 86

Expert Comment

by:jkr
Make that

#include "StdAfx.h"
#include <fstream>
#include <iostream>
#include <string>
#include <set>
using namespace std;

int main () {

ifstream is1("post_ids.txt");
ifstream is2("newpost_ids.txt");
ofstream os("newposts_nodups_ids.txt");
set<string> data;

while(!is1.is_eof()) {

  string sLine;

  getline(is1,sLine);

  data.insert(sLine);
}

while(!is2.is_eof()) {

  string sLine;

  getline(is2,sLine);

  data.insert(sLine);
}

set<string>::iterator i;

for (i = data.begin(); i != data.end(); ++i) {

  os << *i << endl;
}

return 0;
}

VC++ needs that file for precompiled headers; if you don't have one, just provide an empty file with that name.
 

Author Comment

by:Gonzales2009
these are the errors that its showing now..
Deleting intermediate files and output files for project 'a - Win32 Debug'.

--------------------Configuration: a - Win32 Debug--------------------

Compiling...

StdAfx.cpp

Compiling...

a.cpp

c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas

ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std

::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' : identifier was truncated to '255' characters in the debug information

        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:

:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc

ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled

        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<

char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled

c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas

ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std

::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::const_iterator' : identifier was truncated to '255' characters in the debug information

        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:

:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc

ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled

        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<

char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled

c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas

ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std

::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::iterator' : identifier was truncated to '255' characters in the debug information

        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:

:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc

ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled

        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<

char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled

c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas

ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std

::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Node' : identifier was truncated to '255' characters in the debug information

        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:

:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc

ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled

        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<

char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled

c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas

ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std

::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::const_iterator' : identifier was truncated to '255' characters in the debug information

        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:

:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc

ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled

        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<

char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled

C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(15) : error C2039: 'is_eof' : is not a member of 'basic_ifstream<char,struct std::char_traits<char> >'

C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(15) : fatal error C1903: unable to recover from previous error(s); stopping compilation

Error executing cl.exe.
 

a.exe - 2 error(s), 5 warning(s)

 
LVL 86

Expert Comment

by:jkr
Ooops, sorry,

#include "StdAfx.h"
#include <fstream>
#include <iostream>
#include <string>
#include <set>
using namespace std;

int main () {

ifstream is1("post_ids.txt");
ifstream is2("newpost_ids.txt");
ofstream os("newposts_nodups_ids.txt");
set<string> data;

while(!is1.eof()) {

  string sLine;

  getline(is1,sLine);

  data.insert(sLine);
}

while(!is2.eof()) {

  string sLine;

  getline(is2,sLine);

  data.insert(sLine);
}

set<string>::iterator i;

for (i = data.begin(); i != data.end(); ++i) {

  os << *i << endl;
}

return 0;
}

compiles fine for me.
 

Author Comment

by:Gonzales2009
OK, it compiles and runs fine, but that's not exactly what I'm looking for!
The file (newposts_nodups_ids.txt) has all the ids, and I need only the new ids from (newpost_ids.txt) that aren't duplicates.

example input
>post_ids.txt
111
222
333

>newpost_ids.txt
111
111abc
222
222abc

example output
>newposts_nodups_ids.txt
111abc
222abc

The software will read (post_ids.txt), then read (newpost_ids.txt), take out the dups, and put the non-duplicates in (newposts_nodups_ids.txt).
 
LVL 86

Expert Comment

by:jkr
In that case (you have taken a look at http://www.sgi.com/tech/stl/set.html, haven't you?) use

//#include "StdAfx.h"
#include <fstream>
#include <iostream>
#include <string>
#include <set>
using namespace std;

int main () {

ifstream is1("post_ids.txt");
ifstream is2("newpost_ids.txt");
ofstream os("newposts_nodups_ids.txt");
set<string> data1;
set<string> data2;
set<string> result;

while(!is1.eof()) {

  string sLine;

  getline(is1,sLine);

  data2.insert(sLine);
}

while(!is2.eof()) {

  string sLine;

  getline(is2,sLine);

  data2.insert(sLine);
}

  set_difference(data1.begin(), data1.end(), data2.begin(), data2.end(),
                 inserter(result, result.begin()));
  copy(result.begin(), result.end(), ostream_iterator<const char*>(os, "\n"));
  cout << endl;

return 0;
}
 
LVL 17

Expert Comment

by:rstaveley
I suspect you need to change...

> ostream_iterator<const char*>(os, "\n")

...to...

ostream_iterator<string>(os, "\n")
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
>>>> while(!is1.eof()) {
>>>>  string sLine;
>>>>  getline(is1,sLine);
>>>>  data.insert(sLine);
>>>> }

You'd better replace that kind of loop with

   string sLine;
   while (getline(is1, sLine))
          data.insert(sLine);

With the first loop you will neither catch read errors nor avoid adding an empty string at the end of the file.

If you only want to store the entries of the second file that were not in the first file, you can do it like this:

   ...
   string sLine;
   while (getline(is1, sLine))
          data.insert(sLine);

   while (getline(is2, sLine))
   {
         if (data.find(sLine) == data.end())
         {
               os << sLine << '\n';
         }
   }
 
Regards, Alex  


 

Author Comment

by:Gonzales2009
I took jkr's source and edited in the changes from rstaveley and itsmeandnobodyelse,
but it's showing one error. The error and the final code are below.

Compiling...
StdAfx.cpp
Compiling...
a.cpp
c:\program files\microsoft visual studio\myprojects\a\a.cpp(35) : fatal error C1010: unexpected end of file while looking for precompiled header directive
Error executing cl.exe.

a.exe - 1 error(s), 0 warning(s)

#include <fstream>
#include <iostream>
#include <string>
#include <set>
using namespace std;

int main () {

ifstream is1("post_ids.txt");
ifstream is2("newpost_ids.txt");
ofstream os("newposts_nodups_ids.txt");
set<string> data1;
set<string> data2;
set<string> result;

string sLine;
while (getline(is1, sLine))
      data.insert(sLine);

while (getline(is2, sLine))
{
     if (data.find(sLine) == data.end())
     {
           os << sLine;
     }
}

  set_difference(data1.begin(), data1.end(), data2.begin(), data2.end(),
                 inserter(result, result.begin()));
  copy(result.begin(), result.end(), ostream_iterator<string>(os, "\n"));
  cout << endl;

return 0;
}

 
LVL 86

Expert Comment

by:jkr
>>fatal error C1010: unexpected end of file

Just add

#include "stdafx.h"

as the 1st line of the code - we went through that already.
 

Author Comment

by:Gonzales2009
I did that but it gave more errors
Deleting intermediate files and output files for project 'a - Win32 Debug'.

--------------------Configuration: a - Win32 Debug--------------------

Compiling...

StdAfx.cpp

Compiling...

a.cpp

c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas

ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std

::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' : identifier was truncated to '255' characters in the debug information

        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:

:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc

ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled

        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<

char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled

c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas

ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std

::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::const_iterator' : identifier was truncated to '255' characters in the debug information

        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:

:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc

ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled

        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<

char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled

c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas

ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std

::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::iterator' : identifier was truncated to '255' characters in the debug information

        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:

:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc

ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled

        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<

char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled

c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas

ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std

::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Node' : identifier was truncated to '255' characters in the debug information

        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:

:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc

ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled

        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<

char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled

c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas

ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std

::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::const_iterator' : identifier was truncated to '255' characters in the debug information

        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:

:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc

ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled

        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<

char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled

C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(19) : error C2065: 'data' : undeclared identifier

C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(19) : error C2228: left of '.insert' must have class/struct/union type

C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(23) : error C2228: left of '.find' must have class/struct/union type

C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(23) : error C2228: left of '.end' must have class/struct/union type

C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(29) : error C2065: 'set_difference' : undeclared identifier

c:\program files\microsoft visual studio\vc98\include\iterator(143) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::

basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<

std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::iterator' : identifier was truncated to '255' characters in the debug information

        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(30) : see reference to class template instantiation 'std::insert_iterator<std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<c

har,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > > >' being compiled

Error executing cl.exe.
 

a.exe - 5 error(s), 6 warning(s)


0
 
LVL 86

Expert Comment

by:jkr
Comment Utility
Not surprising, the code should be

set<string> data1;
set<string> data2;
set<string> result;

while(!is1.eof()) {

  string sLine;

  getline(is1,sLine);

  data2.insert(sLine);
}

while(!is2.eof()) {

  string sLine;

  getline(is2,sLine);

  data2.insert(sLine);
}
0
 

Author Comment

by:Gonzales2009
Comment Utility
sorry man but its still showing some errors


#include "stdafx.h"
#include <fstream>
#include <iostream>
#include <string>
#include <set>
using namespace std;

int main () {

ifstream is1("post_ids.txt");
ifstream is2("newpost_ids.txt");
ofstream os("newposts_nodups_ids.txt");
set<string> data1;
set<string> data2;
set<string> result;

while(!is1.eof()) {

  string sLine;

  getline(is1,sLine);

  data2.insert(sLine);
}

while(!is2.eof()) {

  string sLine;

  getline(is2,sLine);

  data2.insert(sLine);
}

  set_difference(data1.begin(), data1.end(), data2.begin(), data2.end(),
                 inserter(result, result.begin()));
  copy(result.begin(), result.end(), ostream_iterator<string>(os, "\n"));
  cout << endl;

return 0;
}
 

==================================

Deleting intermediate files and output files for project 'a - Win32 Debug'.

--------------------Configuration: a - Win32 Debug--------------------

Compiling...

StdAfx.cpp

Compiling...

a.cpp

c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas

ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std

::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' : identifier was truncated to '255' characters in the debug information

        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:

:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc

ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled

        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<

char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled

c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas

ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std

::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::const_iterator' : identifier was truncated to '255' characters in the debug information

        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:

:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc

ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled

        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<

char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled

c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas

ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std

::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::iterator' : identifier was truncated to '255' characters in the debug information

        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:

:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc

ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled

        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<

char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled

c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas

ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std

::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Node' : identifier was truncated to '255' characters in the debug information

        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:

:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc

ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled

        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<

char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled

c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas

ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std

::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::const_iterator' : identifier was truncated to '255' characters in the debug information

        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:

:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc

ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled

        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<

char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled

c:\program files\microsoft visual studio\vc98\include\utility(25) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::ba

sic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<st

d::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::iterator' : identifier was truncated to '255' characters in the debug information

        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(23) : see reference to class template instantiation 'std::pair<std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_trait

s<char>,std::allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char

>,std::allocator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::iterator,bool>' being compiled

C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(35) : error C2065: 'set_difference' : undeclared identifier

Error executing cl.exe.
 

a.exe - 1 error(s), 6 warning(s)


0
 
LVL 86

Expert Comment

by:jkr
Comment Utility
Well, just one error - add

#include <algorithm>

I.e.

#include "stdafx.h"
#include <fstream>
#include <iostream>
#include <string>
#include <set>
#include <algorithm>
using namespace std;
0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
Comment Utility
The code is

#pragma warning (disable : 4786)

#include <fstream>
#include <iostream>
#include <string>
#include <set>
using namespace std;

int main () {
   
    ifstream is1("post_ids.txt");
    ifstream is2("newpost_ids.txt");
    ofstream os("newposts_nodups_ids.txt");
    set<string> data;
   
    string sLine;
    while (getline(is1, sLine))
        data.insert(sLine);
   
    while (getline(is2, sLine))
    {
        if (data.find(sLine) == data.end())
        {
            os << sLine;
        }
    }
    is1.close();
    is2.close();
    os.close();
   
    return 0;
}


>>>> #pragma warning (disable : 4786)
That disables the warnings which are a bug in VC6

>>>> #include "stdafx.h"
Better switch off 'precompiled headers' in the project settings (C++ - Precompiled Headers). PCH doesn't make sense for non-MFC and non-WINAPI projects.




0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
Comment Utility
correction:
        if (data.find(sLine) == data.end())
        {
            os << sLine << endl;   // add a linefeed for each non-duplicate
        }
0
 

Author Comment

by:Gonzales2009
Comment Utility
jkr code is not working, the text file shows as blank..

itsmeandnobodyelse: your code is working its showing the right ids but in this format 111a222a
can you help me make it line by line instead?
0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
Comment Utility
>>>> while(!is1.eof()) {
>>>>  string sLine;
>>>>  getline(is1, sLine);

As mentioned, the above is bad coding because the result of getline is never checked for errors.
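To make the point about the unchecked getline concrete, here is a minimal sketch that runs both loop styles over an in-memory stream. The function names read_lines_eof_loop and read_lines_checked are made up for this illustration and are not from the thread. With the eof() test, the last getline fails but its empty string is still stored, so the container picks up one extra empty entry.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads lines the way the quoted loop does,
// testing eof() first and calling getline() afterwards.
// The final getline() fails, yet its (empty) string is still stored.
std::vector<std::string> read_lines_eof_loop(std::istream& in) {
    std::vector<std::string> lines;
    while (!in.eof()) {
        std::string line;
        std::getline(in, line);   // result is not checked
        lines.push_back(line);
    }
    return lines;
}

// Hypothetical helper: uses getline() itself as the loop condition,
// so the body only runs when a line was actually extracted.
std::vector<std::string> read_lines_checked(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line))
        lines.push_back(line);
    return lines;
}
```

Feeding both helpers a std::istringstream holding "111\n222\n" gives three entries from the first (the last one empty) and two from the second. Note also that the eof() loop may never terminate if the stream is in a failed state without having reached end of file, for example when the file could not be opened.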

>>>> set<string> data2;
If the second file has no duplicate entries itself, you don't need to store all entries in a std::set but simply check whether the entry exists in the first set and write to file if it is a new entry.

>>>> set_difference(data1.begin(), data1.end(), data2.begin(), data2.end(),
>>>>                 inserter(result, result.begin()));
>>>>  copy(result.begin(), result.end(), ostream_iterator<string>(os, "\n"));

That is some sort of 'overkill' if the second file has no duplicates itself.

Moreover, it is wrong if the second file is not a superset of the first: the call above returns all entries which are in data1 but not in data2.
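For anyone who still wants the set_difference route, a sketch with the arguments in the direction the question asks for (new ids minus known ids) could look like the following. The function name new_ids and its parameter names are assumptions for this example. Note that in the code as posted above both read loops insert into data2, so data1 stays empty and the difference is empty as well, which would explain the blank output file.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>

// Hypothetical helper: returns the ids that occur in `fresh` but not in
// `known` (fresh minus known). std::set_difference needs sorted input
// ranges, which std::set provides by iterating in sorted order.
std::set<std::string> new_ids(const std::set<std::string>& known,
                              const std::set<std::string>& fresh) {
    std::set<std::string> result;
    std::set_difference(fresh.begin(), fresh.end(),   // range to keep from
                        known.begin(), known.end(),   // range to subtract
                        std::inserter(result, result.begin()));
    return result;
}
```

Here `known` would be filled from post_ids.txt and `fresh` from the new file. Swapping the two ranges gives the old ids that are missing from the new file, which is not what was asked.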
0
 

Author Comment

by:Gonzales2009
Comment Utility
wonderful itsmeandnobodyelse
0

 

Author Comment

by:Gonzales2009
Comment Utility
this is the final code, just let me know if everything is right as its compiling fine!!
#pragma warning (disable : 4786)

#include "stdafx.h"
#include <fstream>
#include <iostream>
#include <string>
#include <set>
using namespace std;

int main () {
   
    ifstream is1("post_ids.txt");
    ifstream is2("newpost_ids.txt");
    ofstream os("newposts_nodups_ids.txt");
    set<string> data;
   
    string sLine;
    while (getline(is1, sLine))
        data.insert(sLine);
   
    while (getline(is2, sLine))
    {
        if (data.find(sLine) == data.end())
        {
            os << sLine << endl;   // add a linefeed for each non-duplicate
        }
    }
    is1.close();
    is2.close();
    os.close();
   
    return 0;
}


0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
Comment Utility
>>>> showing the right ids but in this format 111a222a
Yes, make the correction I posted in some previous comment
0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
Comment Utility
The main problem of jkr's code is that the files were not closed.
0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
Comment Utility
>>>> #include "stdafx.h"
As mentioned, your prog doesn't need 'precompiled headers' (PCH). PCH is a concept for big header files like 'windows.h' (WINAPI) or 'afx.h' (MFC): compile time can be improved by compiling these headers separately (once). You will find the include statements for windows.h and afx.h in the stdafx.h the Wizard has generated for you. In the case of your prog above, stdafx.h only causes trouble. You can't include the STL headers in stdafx.h because template classes cannot be precompiled either. So, it is really best to switch off PCH for both the Debug and Release configurations. It will make your life happier.
0
 
LVL 40

Expert Comment

by:evilrix
Comment Utility
As discussed, below is a Perl version.

NB. You will, of course, need to download and install a Perl interpreter: -
http://www.activestate.com/store/download.aspx?prdGUID=81fbce82-6bd5-49bc-a915-08d58c2648ca

I provide this purely for completeness, and since it is off topic you should NOT award me any points for this post, as that would be unfair to all those who have contributed to the C++ solution.

-Rx.
#!/usr/bin/perl
use strict;

open POSTSIDS, "<", "posts_ids.txt" or die;
open NEWPOSTSIDS, "<", "newposts_ids.txt" or die;
open NODUPS, ">", "nodups.txt" or die;

my %newposts = map { $_ => undef } <NEWPOSTSIDS>;
while(<POSTSIDS>) { delete $newposts{$_}; }
print NODUPS keys %newposts;

close POSTSIDS;
close NEWPOSTSIDS;
close NODUPS;


0
 
LVL 17

Expert Comment

by:rstaveley
Comment Utility
I do like Perl. Clever trick with map too.

I think it is completely OK to present alternative tools for a job in any TA. Perl is a developer's Swiss Army Knife.
0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
Comment Utility
>>>> my %newposts = map { $_ => undef } <NEWPOSTSIDS>;
>>>> while(<POSTSIDS>) { delete $newposts{$_}; }

Assuming the second file is a superset of the first file with only a few new entries, the above method is not very efficient because of the many deletions. But I am pretty sure that with PERL you could read the first file into the map and check for the entries of the second file as well.

>>>> I think it is completely OK to present alternative tools for a job in any TA.
I see two problems with that:

1. As long as there is no accepted solution, the alternative language
    code may confuse the asker more than help him/her.

2. Fans of an alternative language are rarely objective. Sometimes the
    'easiness' of a language doing some job with fewer statements than
    another language is combined with less efficiency or simplifying
    assumptions. E. g. I like the 'or die' in the above PERL script but an
    error message stating that file 2 doesn't exist may be a better solution
    even if it costs two more statements.

I personally do not post in other TA's than C/C++ and if I post in C TA I avoid C++ solutions.

Regards, Alex
0
 
LVL 40

Expert Comment

by:evilrix
Comment Utility
>> the above method is not very efficient
It was done this way as newpostids (50 thousand ids) is significantly smaller than postids (million unique ids) and as such I felt this way would be more memory efficient! I'm not convinced the delete is actually as inefficient as you assume; however, since I don't have the OPs data I cannot test either case so I went with what seemed to be the better solution.

>> but a error message stating that file 2 doesn't exist may be a better solution
Die states the line of the error, which should be clear enough! It can always be changed to...
open POSTSIDS, "<", "posts_ids.txt" or die "posts_ids.txt cannot be opened";
0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
Comment Utility
>>>> as newpostids (50 thousand ids) is significantly smaller than postids (million unique ids)
I didn't remember that from the original question.

The question is whether a million of searches in a map of 50k + 50k inserts + (about) 50k deletes is faster than 50k searches on a set of million entries + 1 million of inserts. I will test that with a C++ prog (as unfortunately I never owned a Swiss Army Knife).

>>>> Die states the line of the error, which should be clear enough!
Sorry, but you didn't get my point..
0
 
LVL 40

Expert Comment

by:evilrix
Comment Utility
>> I will test that with a C++ prog
If you really must!

>> Sorry, but you didn't get my point
No, I did I just couldn't be bothered to rise to it!
0
 
LVL 17

Expert Comment

by:rstaveley
Comment Utility
Let's move this thread onto something less controversial like religion or politics, eh?
0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
Comment Utility
>>>> Let's move this thread onto something less controversial like religion or politics, eh?
Why not into Other\Misc\Somewhat TA?

Comparing philosophies of two programming languages is only interesting for someone who likes both. For others it is only annoying. I have never experienced a good discussion when someone posted solutions from another TA. But that may be a subjective impression (or, better said, it may be caused by my comments).
0
 
LVL 40

Expert Comment

by:evilrix
Comment Utility
>> Comparing philosophies of two programming languages is only interesting for someone who likes both
I was not attempting to compare anything. I originally suggested the OP might find a Perl solution easier/quicker if this was a one off admin task -- it is often simpler to just choose the right tool for the right job and this kind of task is something Perl excels at! The OP then asked for a Perl version of the code so I obliged and posted it but made it clear the post was off topic and requested it NOT be included in acceptance of the final answer to ensure that only C++ answers would be accepted -- so everyone who deserves points for contributing to this C++ thread gets them. I do not feel this was unreasonable nor do I see why it is necessary to make such a big deal about it! I was attempting to assist the OP not upset you or anyone else for that matter.
0
 
LVL 39

Accepted Solution

by:
itsmeandnobodyelse earned 125 total points
Comment Utility
>>>> I do not feel this was unreasonable nor do I see why it is
>>>> necessary to make such a big deal about it!

rstaveley liked the PERL solution you posted.

I told that I don't like solutions of another TA before a solution was accepted.

Not more, not less.

The only one who makes it a big deal are you.

>>>> The question is whether a million of searches in a
>>>> map of 50k + 50k inserts + (about) 50k deletes is faster
>>>> than 50k searches on a set of million entries + 1 million
>>>> of inserts. I will test that with a C++ prog
The results were 28 seconds for the solution that makes 1 million searches and erases the duplicate entries found in the small set, and 44 seconds for the solution that searches in the big set and writes only those that were not found. Tested with 1 million entries in file 1 and 52,500 entries in file 2, of which 2,500 were not duplicates.

So the below code is more efficient in C++ (following the approach made in the PERL script).

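#include <fstream>
#include <iostream>
#include <string>
#include <set>
using namespace std;

int main ()
{
    ifstream is1("post_ids.txt");
    ifstream is2("newpost_ids.txt");
    ofstream os("newposts_nodups_ids_2.txt");
    set<string> data;
   
    // load the small file (new ids) into the set
    string sLine;
    while (getline(is2, sLine))
        data.insert(sLine);

    // erase every id that also occurs in the big file
    set<string>::iterator f;
    while (getline(is1, sLine))
    {
        if ((f = data.find(sLine)) != data.end())
        {
            data.erase(f);
        }
    }
    // what is left are the new ids (sorted, one per line)
    for (f = data.begin(); f != data.end(); ++f)
    {
        os << *f << endl;
    }
    is1.close();
    is2.close();
    os.close();
   
    return 0;
}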
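To repeat the comparison on your own data, the two strategies can be isolated from the file handling as in the sketch below. It assumes a C++11 or later compiler (so not VC6), and the names lookup_in_big, erase_from_small and seconds are made up for this example. It is not the test program used for the timings quoted above. Both functions return a sorted set so that their results can be compared directly. The earlier file-based version writes ids in file order instead.

```cpp
#include <chrono>
#include <set>
#include <string>
#include <vector>

// Strategy A: index the big list, then look up every new id in it.
std::set<std::string> lookup_in_big(const std::vector<std::string>& known_ids,
                                    const std::vector<std::string>& new_ids) {
    std::set<std::string> known(known_ids.begin(), known_ids.end());
    std::set<std::string> out;
    for (const std::string& id : new_ids)
        if (known.find(id) == known.end())
            out.insert(id);
    return out;
}

// Strategy B: index the small list, then erase everything that also
// occurs in the big list. What remains are the new ids.
std::set<std::string> erase_from_small(const std::vector<std::string>& known_ids,
                                       const std::vector<std::string>& new_ids) {
    std::set<std::string> candidates(new_ids.begin(), new_ids.end());
    for (const std::string& id : known_ids)
        candidates.erase(id);   // erasing a missing key is a no-op
    return candidates;
}

// Runs f once and returns the elapsed wall-clock time in seconds.
template <class F>
double seconds(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

A call such as seconds([&]{ erase_from_small(known_ids, new_ids); }) times one strategy once the two vectors have been filled from the files. Strategy B only keeps the small list in memory, which is the point evilrix made about memory use.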
 


0
 
LVL 40

Expert Comment

by:evilrix
Comment Utility
Muhahaha, the irony -- the accepted answer is the one that was born from my "not very efficient " Perl code :)
0
 
LVL 53

Expert Comment

by:Infinity08
Comment Utility
>> Muhahaha

Was that an evil laugh ? Now I know where you got your nick from ;)
0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
Comment Utility
>>>> Muhahaha, the irony
You actually don't get the point again.

My remarks regarding efficiency were based on the (wrong) assumption that file 2 is a superset of file 1. They had nothing to do with PERL. On the contrary, after recognizing my wrong assumption I adopted the algorithm (from your PERL script), made the tests ("if you really must") and posted the results.
0
 
LVL 17

Expert Comment

by:rstaveley
Comment Utility
>> Muhahaha
>
> Was that an evil laugh ?

I have *proof* that it is. I accessed http://www.research.att.com/~ttsweb/tts/demo.php to investigate with Firefox (the browser of good guys) and pasted "Muhahaha" into the text to speech box and got a freeze up when I hit the "speak" button, but when I did the same with Internet explorer (the browser of the bad guys), not only did Crystal US English read the file I downloaded, but it sounded not at all like the Dr Evil voice that you'd expect... but instead a benign sounding rendition. Now that really makes me shudder!
0
 

Author Comment

by:Gonzales2009
Comment Utility
i remember back in the day when this site was more user friendly and wasnt all about the money.. too bad it was gone in the wrong way!
0
 
LVL 40

Expert Comment

by:evilrix
Comment Utility
>> You actually don't get the point again.
Actually, I just don't care!

>> Was that an evil laugh  ? Now I know where you got your nick from ;)
By name and by nature! I8 ;-)

>> i remember back in the day when this site was more user friendly and wasnt all about the money
Eh? Money? What money? Can I have some?
0
 

Author Comment

by:Gonzales2009
Comment Utility
would be sick if you didnt have to pay $189.95 to ask questions for two years... anyways is not like they pay the experts bummer
0
 
LVL 17

Expert Comment

by:rstaveley
Comment Utility
Read lots of tongue-in-cheek smilies in all of this, Gonzales2009. None of the commentators here are driven by money. I think the best you can get out of EE is kudos and a tee-shirt and my wife would divorce me, if she saw me wearing the latter. It does get a bit hot under the collar sometimes, but EE always has been like that as long as I've been on it [yikes... perhaps I'm the cause?]. Argument is actually a healthy sign that people care.
0
 
LVL 40

Expert Comment

by:evilrix
Comment Utility
>> None of the commentators here are driven by money
My time is given freely as are (all?) the other experts -- we get paid zilch!

>> my wife would divorce me
Yours too eh ? :)
0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
Comment Utility
>>>> i remember back in the day when this site was more user friendly

Sorry for spoiling your thread with some unfriendly remarks ...

... but as a result you got a better solution.

I think a controversy between experts is not so bad for the asker, but of course they shouldn't take it personally ...
0
 
LVL 40

Expert Comment

by:evilrix
Comment Utility
>> spoiling your thread with some unfriendly remarks
I'm glad you recognized that! :-$
0
 
LVL 53

Expert Comment

by:Infinity08
Comment Utility
>> >> None of the commentators here are driven by money
>> My time is given freely as are (all?) the other experts -- we get paid zilch!

Actually, we all get paid ... It's just you that does it for free ... lol ... j/k of course ;)
0
 
LVL 40

Expert Comment

by:evilrix
Comment Utility
>> Actually, we all get paid
I get paid by just knowing I have helped someone *cough* -- I did just get a free T-Shirt :)
0
 
LVL 53

Expert Comment

by:Infinity08
Comment Utility
>> >> Actually, we all get paid

I wish :)
0
 
LVL 17

Expert Comment

by:rstaveley
Comment Utility
If we did get paid, it would all be out-sourced to much cleverer people in Asia, willing to do it for fewer pennies.
0
 
LVL 40

Expert Comment

by:evilrix
Comment Utility
>> out-sourced to much cleverer people
Speak for yourself :-p
0
