Solved

remove duplicates

Posted on 2007-11-18
Last Modified: 2012-08-13
Hello there,
I have over a million unique IDs in (posts_ids.txt), one per line. I also have a new file called (newposts_ids.txt) with over 50 thousand IDs, also one per line. If I combine both files and remove duplicates, I won't be able to tell which IDs are the new ones. Is there anything that can scan (newposts_ids.txt) and remove every ID that already appears in the first file?
Question by:Gonzales2009
59 Comments
 
LVL 86

Expert Comment

by:jkr
ID: 20308315
Read them into a std::set and the duplicates will be removed automatically, see http://www.sgi.com/tech/stl/set.html
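As a rough illustration of that suggestion, the sketch below reads every line of an input stream into a `std::set<std::string>`. The helper name `read_unique_lines` is made up for this example and does not come from the linked page; it only shows that inserting a value the set already holds changes nothing.

```cpp
#include <istream>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper (name chosen for this sketch): collects every line of
// `in` into a set. A std::set stores each distinct value once, so repeated
// lines collapse into a single entry.
std::set<std::string> read_unique_lines(std::istream& in) {
    std::set<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        lines.insert(line);  // inserting a value already present is a no-op
    }
    return lines;
}
```

Passing it a `std::istringstream` holding `111`, `222`, `111` on separate lines should leave two entries in the set. Note that the set iterates in sorted order, not in file order.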
0
 
LVL 40

Expert Comment

by:evilrix
ID: 20308517
If this is a one-off task and you aren't looking to develop this code for any other purpose, you can knock something up in Perl -- it'd be about a 5-line script :) If you'd be happy with that and jkr doesn't mind moving this to a Perl Q, I'll be happy to help you out in that respect; otherwise, this is a C++ Q, so I'm not sure it'd be appropriate for me to post Perl code here!

-Rx.
0
 
LVL 17

Expert Comment

by:rstaveley
ID: 20309123
sort < posts_id.txt | uniq > output.txt
0

 
LVL 17

Expert Comment

by:rstaveley
ID: 20309153
Sorry I should have read the Q properly.
0
 

Author Comment

by:Gonzales2009
ID: 20309275
Hello evilrix, how long would it take to scan the file and remove the duplicates? I chose C++ because it's a really fast language. Anyway, I'm running Windows XP; if you can tell me how to do it, I can compare both and decide which one is better/faster, and if I accept your answer then we'll move it to Perl. Thanks.
0
 

Author Comment

by:Gonzales2009
ID: 20309293
This is the code that I am using, but it only removes duplicates from one file and writes the result to another file.
Is it possible to edit this so it does what I'm looking for?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// FUNCTIONS
int check_string(char *string2);
void additem(char passed[501]);

// STRUCTS
struct dupe {
    char string[500];
    struct dupe *next;
} *mylist;

struct dupe *myptr;
struct dupe *tmp, *prev;

int main() {
    FILE *fInput;
    FILE *fOutput;
    char InputText[500];
    char FileInput[256], FileOutput[256];
    unsigned long int RemovedDupes = 0;

    printf("removes duplicated lines from text files\n");
    printf("dupe remover\n------------------------\n\n");
    mylist = NULL;

    printf("enter input file: ");
    fgets(FileInput, sizeof(FileInput), stdin);
    FileInput[strlen(FileInput)-1] = 0;   // strip the trailing newline
    fInput = fopen(FileInput, "r");
    if (fInput == NULL) {
        printf(" *) Could not open %s for reading!\n", FileInput);
        system("PAUSE"); exit(0);
    }

    printf("enter output file: ");
    fgets(FileOutput, sizeof(FileOutput), stdin);
    FileOutput[strlen(FileOutput)-1] = 0; // strip the trailing newline
    fOutput = fopen(FileOutput, "w");
    if (fOutput == NULL) {
        printf(" *) Could not open %s for writing!\n", FileOutput);
        system("PAUSE"); exit(0);
    }

    printf(" *) Successfully opened %s\n", FileInput);
    printf(" *) Filtering for duplicates...\n\n");
    additem("_null_");                    // placeholder node, removed again below

    while (fgets(InputText, sizeof InputText, fInput)) {
        InputText[strlen(InputText)-1] = '\0';
        switch (check_string(InputText)) {
        case 0:                           // not seen yet: remember it
            additem(InputText);
            break;
        case 1:                           // already in the list: count it
            ++RemovedDupes;
            break;
        }
    }

    // additem() prepends, so the placeholder is the last node; unlink and free it
    tmp = prev = mylist;
    while (tmp && tmp->next) { prev = tmp; tmp = tmp->next; }
    prev->next = NULL;
    free(tmp);
    printf(" *) Finished adding to memory, writing to %s...\n", FileOutput);

    myptr = mylist;
    while (myptr) {
        fputs(myptr->string, fOutput);
        fputs("\n", fOutput);
        myptr = myptr->next;
    }

    // %lu matches the unsigned long counter (the original used %d)
    printf(" !) Finished writing! [%lu] duplicates were successfully removed.\n\n", RemovedDupes);
    system("PAUSE");
    return 0;
}

// Prepends a copy of the string to the global list
void additem(char passed[501]) {
    struct dupe *b;
    b = (struct dupe *)malloc(sizeof(struct dupe));
    if (b == NULL) { printf("Could not allocate any more memory.\n"); exit(0); }
    strcpy(b->string, passed);
    b->next = mylist;
    mylist = b;
}

// Returns 1 if the string is already in the list, 0 otherwise (linear scan)
int check_string(char *string2) {
    myptr = mylist;
    while (myptr) {
        if (strcmp(myptr->string, string2) == 0) {
            return 1;
        }
        myptr = myptr->next;
    }
    return 0;
}


0
 
LVL 86

Expert Comment

by:jkr
ID: 20309325
Hm, that can be as simple as

#include <fstream>
#include <iostream>
#include <string>
#include <set>
using namespace std;

int main () {

ifstream is("input.txt");
ofstream os("output.txt");
set<string> data;

while(!is.is_eof()) {

  string sLine;

  getline(is, sLine);

  data.insert(sLine);
}

set<string>::iterator i;

for (i = data.begin(); i != data.end(); ++i) {

  os << *i << endl;
}

return 0;
}

0
 

Author Comment

by:Gonzales2009
ID: 20309355
there has to be two inputs and one output

input
>posts_ids.txt
>newposts_ids.txt

output
>newposts_nodups_ids.txt
0
 
LVL 86

Expert Comment

by:jkr
ID: 20309389
Sorry, just change that to

ifstream is1("post_ids.txt");
ifstream is2("newpost_ids.txt");
ofstream os("newposts_nodups_ids.txt");
set<string> data;

while(!is1.is_eof()) {

  string sLine;

  getline(is1,sLine);

  data.insert(sLine);
}

while(!is2.is_eof()) {

  string sLine;

  getline(is2,sLine);

  data.insert(sLine);
}
0
 

Author Comment

by:Gonzales2009
ID: 20309730
Thanks jkr. This is the code I'm trying to compile, based on what you showed me, but it gives some errors with Dev-C++ v4:
#include <fstream.h>
#incude <iostream.h>
#include <string.h>
#include <set.h>
using namespace std;
 
int main () {
 
ifstream is1("post_ids.txt");
ifstream is2("newpost_ids.txt");
ofstream os("newposts_nodups_ids.txt");
set<string> data;
 
while(!is1.is_eof()) {
 
  string sLine;
 
  getline(is1,sLine);
 
  data.insert(sLine);
}
 
while(!is2.is_eof()) {
 
  string sLine;
 
  getline(is2,sLine);
 
  data.insert(sLine);
}
 
set<string>::iterator i;
 
for (i = data.begin(); i != data.end(); ++i) {
 
  os << *i << endl;
}
 
return 0;
}


0
 

Author Comment

by:Gonzales2009
ID: 20309776
Using Visual C++ I get this:


Deleting intermediate files and output files for project 'a - Win32 Debug'.
--------------------Configuration: a - Win32 Debug--------------------
Compiling...
StdAfx.cpp
Compiling...
a.cpp
c:\program files\microsoft visual studio\myprojects\a\a.cpp(44) : fatal error C1010: unexpected end of file while looking for precompiled header directive
Error executing cl.exe.

a.exe - 1 error(s), 0 warning(s)
0
 
LVL 86

Expert Comment

by:jkr
ID: 20310069
Make that

#include "StdAfx.h"
#include <fstream>
#include <iostream>
#include <string>
#include <set>
using namespace std;

int main () {

ifstream is1("post_ids.txt");
ifstream is2("newpost_ids.txt");
ofstream os("newposts_nodups_ids.txt");
set<string> data;

while(!is1.is_eof()) {

  string sLine;

  getline(is1,sLine);

  data.insert(sLine);
}

while(!is2.is_eof()) {

  string sLine;

  getline(is2,sLine);

  data.insert(sLine);
}

set<string>::iterator i;

for (i = data.begin(); i != data.end(); ++i) {

  os << *i << endl;
}

return 0;
}

VC++ needs that file for precompiled headers; if you don't have one, just provide an empty file with that name.
0
 

Author Comment

by:Gonzales2009
ID: 20310118
these are the errors that its showing now..
Deleting intermediate files and output files for project 'a - Win32 Debug'.
--------------------Configuration: a - Win32 Debug--------------------
Compiling...
StdAfx.cpp
Compiling...
a.cpp
c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas
ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std
::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' : identifier was truncated to '255' characters in the debug information
        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:
:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc
ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<
char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas
ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std
::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::const_iterator' : identifier was truncated to '255' characters in the debug information
        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:
:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc
ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<
char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas
ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std
::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::iterator' : identifier was truncated to '255' characters in the debug information
        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:
:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc
ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<
char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas
ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std
::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Node' : identifier was truncated to '255' characters in the debug information
        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:
:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc
ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<
char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas
ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std
::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::const_iterator' : identifier was truncated to '255' characters in the debug information
        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:
:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc
ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<
char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(15) : error C2039: 'is_eof' : is not a member of 'basic_ifstream<char,struct std::char_traits<char> >'
C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(15) : fatal error C1903: unable to recover from previous error(s); stopping compilation
Error executing cl.exe.
 
a.exe - 2 error(s), 5 warning(s)


0
 
LVL 86

Expert Comment

by:jkr
ID: 20310133
Ooops, sorry,

#include "StdAfx.h"
#include <fstream>
#include <iostream>
#include <string>
#include <set>
using namespace std;

int main () {

ifstream is1("post_ids.txt");
ifstream is2("newpost_ids.txt");
ofstream os("newposts_nodups_ids.txt");
set<string> data;

while(!is1.eof()) {

  string sLine;

  getline(is1,sLine);

  data.insert(sLine);
}

while(!is2.eof()) {

  string sLine;

  getline(is2,sLine);

  data.insert(sLine);
}

set<string>::iterator i;

for (i = data.begin(); i != data.end(); ++i) {

  os << *i << endl;
}

return 0;
}

compiles fine for me.
0
 

Author Comment

by:Gonzales2009
ID: 20310171
OK, it compiles and runs fine, but that's not exactly what I'm looking for.
The file (newposts_nodups_ids.txt) contains all the IDs, and I need only the IDs from (newpost_ids.txt) that are not already in (post_ids.txt).

example input
>post_ids.txt
111
222
333

>newpost_ids.txt
111
111abc
222
222abc

example output
>newposts_nodups_ids.txt
111abc
222abc

The software should read (post_ids.txt), then read (newpost_ids.txt), drop every ID that is already in the first file, and write the remaining IDs to (newposts_nodups_ids.txt).
0
 
LVL 86

Expert Comment

by:jkr
ID: 20310196
In that case (you have taken a look at http://www.sgi.com/tech/stl/set.html, haven't you?) use

//#include "StdAfx.h"
#include <fstream>
#include <iostream>
#include <string>
#include <set>
#include <algorithm>   // set_difference, copy
#include <iterator>    // inserter, ostream_iterator
using namespace std;

int main () {

ifstream is1("post_ids.txt");
ifstream is2("newpost_ids.txt");
ofstream os("newposts_nodups_ids.txt");
set<string> data1;
set<string> data2;
set<string> result;

while(!is1.eof()) {

  string sLine;

  getline(is1,sLine);

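  data1.insert(sLine);   // ids already known (post_ids.txt)
}

while(!is2.eof()) {

  string sLine;

  getline(is2,sLine);

  data2.insert(sLine);   // candidate ids (newpost_ids.txt)
}

  // keep what is in data2 (new file) but not in data1 (old file)
  set_difference(data2.begin(), data2.end(), data1.begin(), data1.end(),
                 inserter(result, result.begin()));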
  copy(result.begin(), result.end(), ostream_iterator<const char*>(os, "\n"));
  cout << endl;

return 0;
}
0
 
LVL 17

Expert Comment

by:rstaveley
ID: 20311534
I suspect you need to change...

> ostream_iterator<const char*>(os, "\n")

...to...

ostream_iterator<string>(os, "\n")
0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 20312535
>>>> while(!is1.eof()) {
>>>>  string sLine;
>>>>  getline(is1,sLine);
>>>>  data.insert(sLine);
>>>> }

You had better replace that kind of loop with

   string sLine;
   while (getline(is1, sLine))
          data.insert(sLine);

The first loop tests eof() before reading and never checks whether getline succeeded, so it neither catches read errors nor avoids adding an empty string when the file ends with a newline.

If you only want to store the entries of the second file that are not in the first file, you can do that with

   ...
   string sLine;
   while (getline(is1, sLine))
          data.insert(sLine);

   while (getline(is2, sLine))
   {
         if (data.find(sLine) == data.end())
         {
               os << sLine << '\n';   // one id per line
         }
   }
 
Regards, Alex  


0
 

Author Comment

by:Gonzales2009
ID: 20312660
I combined jkr's source with the changes from rstaveley and itsmeandnobodyelse,
but it's showing one error. The compiler output and the final code are below.

Compiling...
StdAfx.cpp
Compiling...
a.cpp
c:\program files\microsoft visual studio\myprojects\a\a.cpp(35) : fatal error C1010: unexpected end of file while looking for precompiled header directive
Error executing cl.exe.

a.exe - 1 error(s), 0 warning(s)
#include <fstream>
#include <iostream>
#include <string>
#include <set>
using namespace std;
 
int main () {
 
ifstream is1("post_ids.txt");
ifstream is2("newpost_ids.txt");
ofstream os("newposts_nodups_ids.txt");
set<string> data1;
set<string> data2;
set<string> result;
 
string sLine;
while (getline(is1, sLine))
      data.insert(sLine);
 
while (getline(is2, sLine))
{
     if (data.find(sLine) == data.end())
     {
           os << sLine;
     }
}
 
  set_difference(data1.begin(), data1.end(), data2.begin(), data2.end(),
                 inserter(result, result.begin()));
  copy(result.begin(), result.end(), ostream_iterator<string>(os, "\n"));
  cout << endl;
 
return 0;
}


0
 
LVL 86

Expert Comment

by:jkr
ID: 20312669
>>fatal error C1010: unexpected end of file

Just add

#include "stdafx.h"

as the 1st line of the code - we went through that already.
0
 

Author Comment

by:Gonzales2009
ID: 20312704
I did that but it gave more errors
Deleting intermediate files and output files for project 'a - Win32 Debug'.
--------------------Configuration: a - Win32 Debug--------------------
Compiling...
StdAfx.cpp
Compiling...
a.cpp
c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas
ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std
::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' : identifier was truncated to '255' characters in the debug information
        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:
:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc
ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<
char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas
ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std
::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::const_iterator' : identifier was truncated to '255' characters in the debug information
        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:
:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc
ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<
char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas
ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std
::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::iterator' : identifier was truncated to '255' characters in the debug information
        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:
:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc
ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<
char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas
ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std
::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Node' : identifier was truncated to '255' characters in the debug information
        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:
:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc
ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<
char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas
ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std
::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::const_iterator' : identifier was truncated to '255' characters in the debug information
        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:
:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc
ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<
char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(19) : error C2065: 'data' : undeclared identifier
C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(19) : error C2228: left of '.insert' must have class/struct/union type
C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(23) : error C2228: left of '.find' must have class/struct/union type
C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(23) : error C2228: left of '.end' must have class/struct/union type
C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(29) : error C2065: 'set_difference' : undeclared identifier
c:\program files\microsoft visual studio\vc98\include\iterator(143) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::
basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<
std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::iterator' : identifier was truncated to '255' characters in the debug information
        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(30) : see reference to class template instantiation 'std::insert_iterator<std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<c
har,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > > >' being compiled
Error executing cl.exe.
 
a.exe - 5 error(s), 6 warning(s)


0
 
LVL 86

Expert Comment

by:jkr
ID: 20312742
Not surprising, the code should be

set<string> data1;
set<string> data2;
set<string> result;

while(!is1.eof()) {

  string sLine;

  getline(is1,sLine);

  data2.insert(sLine);
}

while(!is2.eof()) {

  string sLine;

  getline(is2,sLine);

  data2.insert(sLine);
}
0
 

Author Comment

by:Gonzales2009
ID: 20312880
Sorry man, but it's still showing some errors:


#include "stdafx.h"
#include <fstream>
#include <iostream>
#include <string>
#include <set>
using namespace std;
 
int main () {
 
ifstream is1("post_ids.txt");
ifstream is2("newpost_ids.txt");
ofstream os("newposts_nodups_ids.txt");
set<string> data1;
set<string> data2;
set<string> result;
 
while(!is1.eof()) {
 
  string sLine;
 
  getline(is1,sLine);
 
  data2.insert(sLine);
}
 
while(!is2.eof()) {
 
  string sLine;
 
  getline(is2,sLine);
 
  data2.insert(sLine);
}
 
  set_difference(data1.begin(), data1.end(), data2.begin(), data2.end(),
                 inserter(result, result.begin()));
  copy(result.begin(), result.end(), ostream_iterator<string>(os, "\n"));
  cout << endl;
 
return 0;
}
 
==================================
Deleting intermediate files and output files for project 'a - Win32 Debug'.
--------------------Configuration: a - Win32 Debug--------------------
Compiling...
StdAfx.cpp
Compiling...
a.cpp
c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas
ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std
::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' : identifier was truncated to '255' characters in the debug information
        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:
:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc
ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<
char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas
ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std
::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::const_iterator' : identifier was truncated to '255' characters in the debug information
        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:
:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc
ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<
char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas
ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std
::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::iterator' : identifier was truncated to '255' characters in the debug information
        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:
:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc
ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<
char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas
ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std
::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Node' : identifier was truncated to '255' characters in the debug information
        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:
:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc
ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<
char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas
ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std
::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::const_iterator' : identifier was truncated to '255' characters in the debug information
        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:
:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc
ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<
char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
c:\program files\microsoft visual studio\vc98\include\utility(25) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::ba
sic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<st
d::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::iterator' : identifier was truncated to '255' characters in the debug information
        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(23) : see reference to class template instantiation 'std::pair<std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_trait
s<char>,std::allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char
>,std::allocator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::iterator,bool>' being compiled
C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(35) : error C2065: 'set_difference' : undeclared identifier
Error executing cl.exe.
 
a.exe - 1 error(s), 6 warning(s)


0
 
LVL 86

Expert Comment

by:jkr
ID: 20312926
Well, just one error - add

#include <algorithm>

I.e.

#include "stdafx.h"
#include <fstream>
#include <iostream>
#include <string>
#include <set>
#include <algorithm>
using namespace std;
0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 20312939
The code is

#pragma warning (disable : 4786)

#include <fstream>
#include <iostream>
#include <string>
#include <set>
using namespace std;

int main () {
   
    ifstream is1("post_ids.txt");
    ifstream is2("newpost_ids.txt");
    ofstream os("newposts_nodups_ids.txt");
    set<string> data;
   
    string sLine;
    while (getline(is1, sLine))
        data.insert(sLine);
   
    while (getline(is2, sLine))
    {
        if (data.find(sLine) == data.end())
        {
            os << sLine;
        }
    }
    is1.close();
    is2.close();
    os.close();
   
    return 0;
}


>>>> #pragma warning (disable : 4786)
That disables the warnings which are a bug in VC6

>>>> #include "stdafx.h"
Better switch off 'precompiled headers' in the project settings (C++ - Precompiled Headers). PCH doesn't make sense for non-MFC and non-WINAPI projects.




0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 20313015
correction:
        if (data.find(sLine) == data.end())
        {
            os << sLine << endl;   // add a linefeed for each non-duplicate
        }
0
 

Author Comment

by:Gonzales2009
ID: 20313017
jkr's code is not working; the output text file comes out blank.

itsmeandnobodyelse: your code is working. It shows the right ids, but run together in this format: 111a222a
Can you help me write them one per line instead?
0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 20313020
>>>> while(!is1.eof()) {
>>>>  string sLine;
>>>>  getline(is1, sLine);

As mentioned, the above is bad coding because the result of getline is never checked for errors.
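
To make the point about the unchecked getline concrete, here is a minimal sketch comparing the two loop styles. The function names readWithEofLoop and readChecked are made up for this illustration, and the code reads from any std::istream so it can be tried on a string instead of the real id files.

```cpp
#include <istream>
#include <set>
#include <sstream>
#include <string>

// Pattern from the quoted code: eof() is tested before reading, so the
// final getline() that fails at end of input still inserts an empty string.
std::set<std::string> readWithEofLoop(std::istream& in) {
    std::set<std::string> ids;
    while (!in.eof()) {
        std::string line;
        std::getline(in, line);   // result is never checked
        ids.insert(line);
    }
    return ids;
}

// Checked pattern: the loop body runs only when getline() succeeded.
std::set<std::string> readChecked(std::istream& in) {
    std::set<std::string> ids;
    std::string line;
    while (std::getline(in, line))
        ids.insert(line);
    return ids;
}
```

For input that ends with a newline, the eof() version stores one extra empty entry. If the file could not be opened at all, eof() never becomes true and that loop does not terminate, whereas the checked version simply reads nothing.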

>>>> set<string> data2;
If the second file has no duplicate entries itself, you don't need to store all entries in a std::set but simply check whether the entry exists in the first set and write to file if it is a new entry.

>>>> set_difference(data1.begin(), data1.end(), data2.begin(), data2.end(),
>>>>                 inserter(result, result.begin()));
>>>>  copy(result.begin(), result.end(), ostream_iterator<string>(os, "\n"));

That is some sort of 'overkill' if the second file has no duplicates itself.

Moreover it is wrong, if the second file is not a superset of the first set. Then, the above method would add all entries which are in data1 but not in data2.
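
For readers who still want the set_difference route, here is a hedged sketch of how it could be arranged. In the full program posted earlier, both read loops insert into data2, so data1 stays empty and the difference "data1 minus data2" is empty as well, which is a likely reason for the blank output file. The helper name newIdsOnly is an assumption for this example, not something from the thread.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>

// Returns the ids that occur in newIds but not in oldIds.
// Note the argument order: set_difference(A, B) yields "A minus B",
// so the NEW ids must come first.
std::set<std::string> newIdsOnly(const std::set<std::string>& oldIds,
                                 const std::set<std::string>& newIds) {
    std::set<std::string> result;
    std::set_difference(newIds.begin(), newIds.end(),
                        oldIds.begin(), oldIds.end(),
                        std::inserter(result, result.begin()));
    return result;
}
```

Both inputs are std::set, so they are already sorted, which set_difference requires. It still keeps both files in memory, so the find-based loop remains the lighter option when the second file has no internal duplicates.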
0
 

Author Comment

by:Gonzales2009
ID: 20313024
wonderful itsmeandnobodyelse
0
 

Author Comment

by:Gonzales2009
ID: 20313035
This is the final code. Just let me know if everything is right, as it's compiling fine!
#pragma warning (disable : 4786)
#include "stdafx.h"
#include <fstream>
#include <iostream>
#include <string>
#include <set>
using namespace std;
 
int main () {
   
    ifstream is1("post_ids.txt");
    ifstream is2("newpost_ids.txt");
    ofstream os("newposts_nodups_ids.txt");
    set<string> data;
   
    string sLine;
    while (getline(is1, sLine))
        data.insert(sLine);
   
    while (getline(is2, sLine))
    {
        if (data.find(sLine) == data.end())
        {
            os << sLine << endl;   // add a linefeed for each non-duplicate
        }
    }
    is1.close();
    is2.close();
    os.close();
   
    return 0;
}


0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 20313041
>>>> showing the right ids but in this format 111a222a
Yes, make the correction I posted in some previous comment
0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 20313051
The main problem of jkr's code is that the files were not closed.
0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 20313146
>>>> #include "stdafx.h"
As mentioned, your program doesn't need 'precompiled headers' (PCH). PCH is a concept for big header files like 'windows.h' (WINAPI) or 'afx.h' (MFC): compile time can be improved by compiling these headers separately (once). You will find the include statements for windows.h and afx.h in the stdafx.h the Wizard has generated for you. In the case of your program above, stdafx.h only makes trouble. You can't include the STL headers in stdafx.h because template classes cannot be precompiled either. So it is really best to switch off PCH for both the Debug and Release configurations. It will make your life happier.
0
 
LVL 40

Expert Comment

by:evilrix
ID: 20316711
As discussed, below is a Perl version.

NB. You will, of course, need to download and install a Perl interpreter: -
http://www.activestate.com/store/download.aspx?prdGUID=81fbce82-6bd5-49bc-a915-08d58c2648ca

I provide this purely for completeness, and since it is off topic you should NOT award me any points for this post, as that would be unfair to all those who have contributed to the C++ solution.

-Rx.
#!/usr/bin/perl
use strict;
 
open POSTSIDS, "<", "posts_ids.txt" or die;
open NEWPOSTSIDS, "<", "newposts_ids.txt" or die;
open NODUPS, ">", "nodups.txt" or die;
 
my %newposts = map { $_ => undef } <NEWPOSTSIDS>;
while(<POSTSIDS>) { delete $newposts{$_}; }
print NODUPS keys %newposts;
 
close POSTSIDS;
close NEWPOSTSIDS;
close NODUPS;


0
 
LVL 17

Expert Comment

by:rstaveley
ID: 20318353
I do like Perl. Clever trick with map too.

I think it is completely OK to present alternative tools for a job in any TA. Perl is a developer's Swiss Army Knife.
0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 20318580
>>>> my %newposts = map { $_ => undef } <NEWPOSTSIDS>;
>>>> while(<POSTSIDS>) { delete $newposts{$_}; }

Assuming the second file is a superset of the first file with only a few new entries, the above method is not very efficient because of the many deletions. But I am pretty sure that with PERL you could read the first file into the map and check for the entries of the second file as well.

>>>> I think it is completely OK to present alternative tools for a job in any TA.
I see two problems with that:

1. As long as there is no accepted solution, the alternative language
    code may confuse the asker more than help him/her.

2. Fans of an alternative language are rarely objective. Sometimes the
    'easiness' of a language doing some job with fewer statements than
    another language comes with less efficiency or simplifying
    assumptions. E.g. I like the 'or die' in the above PERL script, but an
    error message stating that file 2 doesn't exist may be a better solution
    even if it costs two more statements.

I personally do not post in other TA's than C/C++ and if I post in C TA I avoid C++ solutions.

Regards, Alex
0
 
LVL 40

Expert Comment

by:evilrix
ID: 20318682
>> the above method is not very efficient
It was done this way as newpostids (50 thousand ids) is significantly smaller than postids (million unique ids) and as such I felt this way would be more memory efficient! I'm not convinced the delete is actually as inefficient as you assume; however, since I don't have the OPs data I cannot test either case so I went with what seemed to be the better solution.

>> but a error message stating that file 2 doesn't exist may be a better solution
Die states the line of the error, which should be clear enough! It can always be changed to...
open POSTSIDS, "<", "posts_ids.txt" or die "posts_ids.txt cannot be opened";
0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 20318835
>>>> as newpostids (50 thousand ids) is significantly smaller than postids (million unique ids)
I didn't remember that from the original question.

The question is whether a million of searches in a map of 50k + 50k inserts + (about) 50k deletes is faster than 50k searches on a set of million entries + 1 million of inserts. I will test that with a C++ prog (as unfortunately I never owned a Swiss Army Knife).
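
Before measuring, a back-of-envelope estimate can hint at what to expect. The sketch below assumes a balanced tree (as std::set typically is) with about log2(n) comparisons per insert, find or erase, and it treats the tree as full-sized for every insert, so it overestimates the insert cost. It ignores memory allocation and file I/O entirely, so it is only a rough guide and not a substitute for the test. All function names are made up for this example.

```cpp
#include <cmath>

// Approximate comparisons for 'operations' inserts, finds or erases
// on a balanced tree holding 'treeSize' entries.
double treeOps(double operations, double treeSize) {
    return operations * std::log2(treeSize);
}

// Big-set approach: insert every old id, then look up every new id.
double bigSetCost(double oldCount, double newCount) {
    return treeOps(oldCount, oldCount) + treeOps(newCount, oldCount);
}

// Small-set approach: insert every new id, look up every old id,
// and erase at most newCount matches.
double smallSetCost(double oldCount, double newCount) {
    return treeOps(newCount, newCount)      // inserts
         + treeOps(oldCount, newCount)      // lookups
         + treeOps(newCount, newCount);     // erases (upper bound)
}
```

With 1,000,000 old ids and 52,500 new ids this gives on the order of 21 million comparisons for the big set against roughly 17 million for the small set, and the small set also needs far fewer node allocations.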

>>>> Die states the line of the error, which should be clear enough!
Sorry, but you didn't get my point..
0
 
LVL 40

Expert Comment

by:evilrix
ID: 20318916
>> I will test that with a C++ prog
If you really must!

>> Sorry, but you didn't get my point
No, I did I just couldn't be bothered to rise to it!
0
 
LVL 17

Expert Comment

by:rstaveley
ID: 20318926
Let's move this thread onto something less controversial like religion or politics, eh?
0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 20318980
>>>> Let's move this thread onto something less controversial like religion or politics, eh?
Why not into Other\Misc\Somewhat TA?

Comparing philosophies of two programming languages is only interesting for someone who likes both. For others it is only annoying. I never experienced a good discussion when someone posted solutions from another TA. But that may be a subjective impression (or, better said, that may be caused by my comments).
0
 
LVL 40

Expert Comment

by:evilrix
ID: 20319034
>> Comparing philosophies of two programming languages is only interesting for someone who likes both
I was not attempting to compare anything. I originally suggested the OP might find a Perl solution easier/quicker if this was a one off admin task -- it is often simpler to just choose the right tool for the right job and this kind of task is something Perl excels at! The OP then asked for a Perl version of the code so I obliged and posted it but made it clear the post was off topic and requested it NOT be included in acceptance of the final answer to ensure that only C++ answers would be accepted -- so everyone who deserves points for contributing to this C++ thread gets them. I do not feel this was unreasonable nor do I see why it is necessary to make such a big deal about it! I was attempting to assist the OP not upset you or anyone else for that matter.
0
 
LVL 39

Accepted Solution

by:
itsmeandnobodyelse earned 500 total points
ID: 20319139
>>>> I do not feel this was unreasonable nor do I see why it is
>>>> necessary to make such a big deal about it!

rstaveley liked the PERL solution you posted.

I said that I don't like solutions from another TA before a solution has been accepted.

Not more, not less.

The only one who makes it a big deal is you.

>>>> The question is whether a million of searches in a
>>>> map of 50k + 50k inserts + (about) 50k deletes is faster
>>>> than 50k searches on a set of million entries + 1 million
>>>> of inserts. I will test that with a C++ prog
The results were 28 seconds for the solution that makes 1 million searches and erases the duplicate entries found in the small set, and 44 seconds for the solution that searches in the big set and writes only those that were not found. Tested with 1 million entries in file 1 and 52,500 entries in file 2, of which 2,500 were not duplicates.

So the below code is more efficient in C++ (following the approach made in the PERL script).

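#pragma warning (disable : 4786)

#include <fstream>
#include <iostream>
#include <string>
#include <set>
using namespace std;

int main ()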
{
    ifstream is1("post_ids.txt");
    ifstream is2("newpost_ids.txt");
    ofstream os("newposts_nodups_ids_2.txt");
    set<string> data;
   
    string sLine;
    while (getline(is2, sLine))
        data.insert(sLine);

    set<string>::iterator f;
    while (getline(is1, sLine))
    {

        if ((f = data.find(sLine)) != data.end())
        {
            data.erase(f);
        }
    }
    for (f = data.begin(); f != data.end(); ++f)
    {
        os << *f << endl;
    }
    is1.close();
    is2.close();
    os.close();
   
    return 0;
}
 


0
 
LVL 40

Expert Comment

by:evilrix
ID: 20320528
Muhahaha, the irony -- the accepted answer is the one that was born from my "not very efficient " Perl code :)
0
 
LVL 53

Expert Comment

by:Infinity08
ID: 20320613
>> Muhahaha

Was that an evil laugh ? Now I know where you got your nick from ;)
0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 20320624
>>>> Muhahaha, the irony
You actually don't get the point again.

My remarks regarding efficiency were based on the (wrong) assumption that file 2 is a superset of file 1. They had nothing to do with PERL. On the contrary, after recognizing my wrong assumption I adopted the algorithm (from your PERL script), made the tests ("if you really must") and posted the results.
0
 
LVL 17

Expert Comment

by:rstaveley
ID: 20320817
>> Muhahaha
>
> Was that an evil laugh ?

I have *proof* that it is. I accessed http://www.research.att.com/~ttsweb/tts/demo.php to investigate with Firefox (the browser of good guys) and pasted "Muhahaha" into the text to speech box and got a freeze up when I hit the "speak" button, but when I did the same with Internet explorer (the browser of the bad guys), not only did Crystal US English read the file I downloaded, but it sounded not at all like the Dr Evil voice that you'd expect... but instead a benign sounding rendition. Now that really makes me shudder!
0
 

Author Comment

by:Gonzales2009
ID: 20321025
I remember back in the day when this site was more user friendly and wasn't all about the money... too bad it went the wrong way!
0
 
LVL 40

Expert Comment

by:evilrix
ID: 20321199
>> You actually don't get the point again.
Actually, I just don't care!

>> Was that an evil laugh  ? Now I know where you got your nick from ;)
By name and by nature! I8 ;-)

>> i remember back in the day when this site was more user friendly and wasnt all about the money
Eh? Money? What money? Can I have some?
0
 

Author Comment

by:Gonzales2009
ID: 20321220
It would be sick if you didn't have to pay $189.95 to ask questions for two years... anyway, it's not like they pay the experts, bummer
0
 
LVL 17

Expert Comment

by:rstaveley
ID: 20321277
Read lots of tongue-in-cheek smilies in all of this, Gonzales2009. None of the commentators here are driven by money. I think the best you can get out of EE is kudos and a tee-shirt and my wife would divorce me, if she saw me wearing the latter. It does get a bit hot under the collar sometimes, but EE always has been like that as long as I've been on it [yikes... perhaps I'm the cause?]. Argument is actually a healthy sign that people care.
0
 
LVL 40

Expert Comment

by:evilrix
ID: 20321303
>> None of the commentators here are driven by money
My time is given freely as are (all?) the other experts -- we get paid zilch!

>> my wife would divorce me
Yours too eh ? :)
0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
ID: 20321496
>>>> i remember back in the day when this site was more user friendly

Sorry for spoiling your thread with some unfriendly remarks ...

... but as a result you got a better solution.

I think a controversy between experts is not so bad for the asker, but of course they shouldn't take it personally ...
0
 
LVL 40

Expert Comment

by:evilrix
ID: 20321543
>> spoiling your thread with some unfriendly remarks
I'm glad you recognized that! :-$
0
 
LVL 53

Expert Comment

by:Infinity08
ID: 20321771
>> >> None of the commentators here are driven by money
>> My time is given freely as are (all?) the other experts -- we get paid zilch!

Actually, we all get paid ... It's just you that does it for free ... lol ... j/k of course ;)
0
 
LVL 40

Expert Comment

by:evilrix
ID: 20321787
>> Actually, we all get paid
I get paid by just knowing I have helped someone *cough* -- I did just get a free T-Shirt :)
0
 
LVL 53

Expert Comment

by:Infinity08
ID: 20321804
>> >> Actually, we all get paid

I wish :)
0
 
LVL 17

Expert Comment

by:rstaveley
ID: 20322126
If we did get paid, it would all be out-sourced to much cleverer people in Asia, willing to do it for fewer pennies.
0
 
LVL 40

Expert Comment

by:evilrix
ID: 20322286
>> out-sourced to much cleverer people
Speak for yourself :-p
0
