  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 463
  • Last Modified:

remove duplicates

Hello there,
I have over a million unique ids in (posts_ids.txt), one per line. I have a new file called (newposts_ids.txt) that has over 50 thousand ids, also one per line. If I combine both files and remove dups, I won't be able to tell which ones are the new ids. So is there anything that can scan (newposts_ids.txt) and remove the ids that already appear in (post_ids.txt)?
Gonzales2009
Asked:
1 Solution
 
jkrCommented:
Read them into a std::set and the duplicates will be removed automatically, see http://www.sgi.com/tech/stl/set.html
 
evilrixSenior Software Engineer (Avast)Commented:
If this is a one-off task and you aren't looking to develop this code for any other purpose, you can knock something up in Perl -- it'd be about a 5 line script :) If you'd be happy with that and jkr doesn't mind moving this to a Perl Q, I'll be happy to help you out in that respect; otherwise, this is a C++ Q, so I'm not sure it'd be appropriate for me to post Perl code here!

-Rx.
 
rstaveleyCommented:
sort < posts_id.txt | uniq > output.txt
 
rstaveleyCommented:
Sorry I should have read the Q properly.
 
Gonzales2009Author Commented:
Hello evilrix, how long will it take to scan the file and remove dups? The reason I selected C++ is that it's a really fast language! Anyway, I'm running Windows XP, so if you can tell me how to do it, I can compare both and decide which one is better/faster. If I accept your answer then we'll move it into Perl. Thanks.
 
Gonzales2009Author Commented:
This is the code that I am using, but it only removes duplicates from one file and writes the result to another file!
Is it possible to edit this so it can do what I'm looking for?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

//FUNCTIONS
int check_string(const char *string2);
void additem(const char *passed);

//STRUCTS
struct dupe {
    char string[500];
    struct dupe *next;
} *mylist;

struct dupe *myptr;
struct dupe *tmp, *prev;

int main() {
    FILE *fInput;
    FILE *fOutput;
    char InputText[500];
    char FileInput[256], FileOutput[256];
    unsigned long int RemovedDupes = 0;

    printf("removes duplicated lines from text files\n");
    printf("dupe remover\n------------------------\n\n");
    mylist = NULL;

    printf("enter input file: ");
    fgets(FileInput, sizeof(FileInput), stdin);
    FileInput[strcspn(FileInput, "\n")] = 0;   // strip the trailing newline, if any
    fInput = fopen(FileInput, "r");
    if (fInput == NULL) {
        printf(" *) Could not open %s for reading!\n", FileInput);
        system("PAUSE"); exit(0);
    }

    printf("enter output file: ");
    fgets(FileOutput, sizeof(FileOutput), stdin);
    FileOutput[strcspn(FileOutput, "\n")] = 0;
    fOutput = fopen(FileOutput, "w");
    if (fOutput == NULL) {
        printf(" *) Could not open %s for writing!\n", FileOutput);
        system("PAUSE"); exit(0);
    }

    printf(" *) Successfully opened %s\n", FileInput);
    printf(" *) Filtering for duplicates...\n\n");
    additem("_null_");   // sentinel node, removed again after reading

    while (fgets(InputText, sizeof InputText, fInput)) {
        InputText[strcspn(InputText, "\n")] = '\0';
        switch (check_string(InputText)) {
        case 0:          // not seen before: remember it
            additem(InputText);
            break;
        case 1:          // already in the list: count it as a dupe
            ++RemovedDupes;
            break;
        }
    }

    // additem() prepends, so the "_null_" sentinel sits at the tail; drop it.
    tmp = prev = mylist;
    while (tmp && tmp->next) { prev = tmp; tmp = tmp->next; }
    if (tmp == mylist) mylist = NULL;   // nothing was read, list is now empty
    else prev->next = NULL;
    free(tmp);
    printf(" *) Finished adding to memory, writing to %s...\n", FileOutput);

    // Note: the list is written head first, i.e. in reverse input order.
    myptr = mylist;
    while (myptr) {
        fputs(myptr->string, fOutput);
        fputs("\n", fOutput);
        myptr = myptr->next;
    }
    fclose(fInput);
    fclose(fOutput);

    printf(" !) Finished writing! [%lu] duplicates were successfully removed.\n\n", RemovedDupes);
    system("PAUSE");
    return 0;
}

// Prepend a copy of the string to the global list.
void additem(const char *passed) {
    struct dupe *b;
    b = (struct dupe *)malloc(sizeof(struct dupe));
    if (b == NULL) { printf("Could not allocate any more memory.\n"); exit(0); }
    strcpy(b->string, passed);
    b->next = mylist;
    mylist = b;
}

// Linear search: returns 1 if the string is already in the list, else 0.
int check_string(const char *string2) {
    myptr = mylist;
    while (myptr) {
        if (strcmp(myptr->string, string2) == 0) {
            return 1;
        }
        myptr = myptr->next;
    }
    return 0;
}

 
jkrCommented:
Hm, that can be as simple as

#include <fstream>
#include <iostream>
#include <string>
#include <set>
using namespace std;

int main () {

ifstream is("input.txt");
ofstream os("output.txt");
set<string> data;

while(!is.is_eof()) {

  string sLine;

  getline(is, sLine);

  data.insert(sLine);
}

set<string>::iterator i;

for (i = data.begin(); i != data.end(); ++i) {

  os << *i << endl;
}

return 0;
}

 
Gonzales2009Author Commented:
There have to be two inputs and one output:

input
>posts_ids.txt
>newposts_ids.txt

output
>newposts_nodups_ids.txt
 
jkrCommented:
Sorry, just change that to

ifstream is1("post_ids.txt");
ifstream is2("newpost_ids.txt");
ofstream os("newposts_nodups_ids.txt");
set<string> data;

while(!is1.is_eof()) {

  string sLine;

  getline(is1,sLine);

  data.insert(sLine);
}

while(!is2.is_eof()) {

  string sLine;

  getline(is2,sLine);

  data.insert(sLine);
}
 
Gonzales2009Author Commented:
Thanks jkr. This is the code I'm trying to compile, based on what you showed me, but it's displaying some errors with Dev-C++ v4:
#include <fstream.h>
#incude <iostream.h>
#include <string.h>
#include <set.h>
using namespace std;
 
int main () {
 
ifstream is1("post_ids.txt");
ifstream is2("newpost_ids.txt");
ofstream os("newposts_nodups_ids.txt");
set<string> data;
 
while(!is1.is_eof()) {
 
  string sLine;
 
  getline(is1,sLine);
 
  data.insert(sLine);
}
 
while(!is2.is_eof()) {
 
  string sLine;
 
  getline(is2,sLine);
 
  data.insert(sLine);
}
 
set<string>::iterator i;
 
for (i = data.begin(); i != data.end(); ++i) {
 
  os << *i << endl;
}
 
return 0;
}

 
Gonzales2009Author Commented:
Using Visual C++ I get this:


Deleting intermediate files and output files for project 'a - Win32 Debug'.
--------------------Configuration: a - Win32 Debug--------------------
Compiling...
StdAfx.cpp
Compiling...
a.cpp
c:\program files\microsoft visual studio\myprojects\a\a.cpp(44) : fatal error C1010: unexpected end of file while looking for precompiled header directive
Error executing cl.exe.

a.exe - 1 error(s), 0 warning(s)
 
jkrCommented:
Make that

#include "StdAfx.h"
#include <fstream>
#include <iostream>
#include <string>
#include <set>
using namespace std;

int main () {

ifstream is1("post_ids.txt");
ifstream is2("newpost_ids.txt");
ofstream os("newposts_nodups_ids.txt");
set<string> data;

while(!is1.is_eof()) {

  string sLine;

  getline(is1,sLine);

  data.insert(sLine);
}

while(!is2.is_eof()) {

  string sLine;

  getline(is2,sLine);

  data.insert(sLine);
}

set<string>::iterator i;

for (i = data.begin(); i != data.end(); ++i) {

  os << *i << endl;
}

return 0;
}

VC++ needs that file for precompiled headers. If you don't have one, just provide an empty file with that name.
 
Gonzales2009Author Commented:
These are the errors that it's showing now:
Deleting intermediate files and output files for project 'a - Win32 Debug'.
--------------------Configuration: a - Win32 Debug--------------------
Compiling...
StdAfx.cpp
Compiling...
a.cpp
c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas
ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std
::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' : identifier was truncated to '255' characters in the debug information
        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:
:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc
ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<
char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas
ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std
::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::const_iterator' : identifier was truncated to '255' characters in the debug information
        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:
:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc
ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<
char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas
ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std
::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::iterator' : identifier was truncated to '255' characters in the debug information
        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:
:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc
ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<
char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas
ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std
::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Node' : identifier was truncated to '255' characters in the debug information
        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:
:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc
ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<
char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas
ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std
::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::const_iterator' : identifier was truncated to '255' characters in the debug information
        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:
:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc
ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<
char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(15) : error C2039: 'is_eof' : is not a member of 'basic_ifstream<char,struct std::char_traits<char> >'
C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(15) : fatal error C1903: unable to recover from previous error(s); stopping compilation
Error executing cl.exe.
 
a.exe - 2 error(s), 5 warning(s)

 
jkrCommented:
Ooops, sorry,

#include "StdAfx.h"
#include <fstream>
#include <iostream>
#include <string>
#include <set>
using namespace std;

int main () {

ifstream is1("post_ids.txt");
ifstream is2("newpost_ids.txt");
ofstream os("newposts_nodups_ids.txt");
set<string> data;

while(!is1.eof()) {

  string sLine;

  getline(is1,sLine);

  data.insert(sLine);
}

while(!is2.eof()) {

  string sLine;

  getline(is2,sLine);

  data.insert(sLine);
}

set<string>::iterator i;

for (i = data.begin(); i != data.end(); ++i) {

  os << *i << endl;
}

return 0;
}

compiles fine for me.
 
Gonzales2009Author Commented:
OK, it compiles and runs fine, but that's not exactly what I'm looking for!
The file (newposts_nodups_ids.txt) has all the ids, and I need only the new ids from (newpost_ids.txt) that aren't duplicates of ids in (post_ids.txt).

example input
>post_ids.txt
111
222
333

>newpost_ids.txt
111
111abc
222
222abc

example output
>newposts_nodups_ids.txt
111abc
222abc

The software should read (post_ids.txt), then read (newpost_ids.txt), take out the dups, and put the ids that are not dups in (newposts_nodups_ids.txt).
 
jkrCommented:
In that case (you have taken a look at http://www.sgi.com/tech/stl/set.html , haven't you?) use

//#include "StdAfx.h"
#include <fstream>
#include <iostream>
#include <string>
#include <set>
using namespace std;

int main () {

ifstream is1("post_ids.txt");
ifstream is2("newpost_ids.txt");
ofstream os("newposts_nodups_ids.txt");
set<string> data1;
set<string> data2;
set<string> result;

while(!is1.eof()) {

  string sLine;

  getline(is1,sLine);

  data1.insert(sLine);
}

while(!is2.eof()) {

  string sLine;

  getline(is2,sLine);

  data2.insert(sLine);
}

  // keep the ids that are in data2 (new file) but not in data1 (old file)
  set_difference(data2.begin(), data2.end(), data1.begin(), data1.end(),
                 inserter(result, result.begin()));
  copy(result.begin(), result.end(), ostream_iterator<const char*>(os, "\n"));
  cout << endl;

return 0;
}
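For reference, a small sketch of std::set_difference on its own, assuming the ids are already in two sets. It needs <algorithm> and <iterator>, and the first range is the one whose elements are kept. The helper name only_in_new and the parameter names are hypothetical, not from the thread.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>

// Hypothetical helper: ids present in fresh but absent from known.
// std::set_difference requires sorted ranges, which std::set iteration provides.
std::set<std::string> only_in_new(const std::set<std::string>& known,
                                  const std::set<std::string>& fresh) {
    std::set<std::string> result;
    std::set_difference(fresh.begin(), fresh.end(),    // elements to keep...
                        known.begin(), known.end(),    // ...unless found here
                        std::inserter(result, result.begin()));
    return result;
}
```

With the sample ids from the question (111, 222, 333 in the old file; 111, 111abc, 222, 222abc in the new one) the result holds 111abc and 222abc.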
 
rstaveleyCommented:
I suspect you need to change...

> ostream_iterator<const char*>(os, "\n")

...to...

ostream_iterator<string>(os, "\n")
 
itsmeandnobodyelseCommented:
>>>> while(!is1.eof()) {
>>>>  string sLine;
>>>>  getline(is1,sLine);
>>>>  data.insert(sLine);
>>>> }

You'd better replace that kind of loop with

   string sLine;
   while (getline(is1, sLine))
          data.insert(sLine);

With the first loop you will neither catch read errors nor avoid adding an empty string at the end of the file.

If you only want to store the entries of the second file which were not in the first file, you can do it with

   ...
   string sLine;
   while (getline(is1, sLine))
          data.insert(sLine);

   while (getline(is2, sLine))
   {
         if (data.find(sLine) == data.end())
         {
               os << sLine << '\n';
         }
   }
 
Regards, Alex  


 
Gonzales2009Author Commented:
I took jkr's source and edited it with the changes from rstaveley and itsmeandnobodyelse,
but it's showing one error. The error and the final code are below:

Compiling...
StdAfx.cpp
Compiling...
a.cpp
c:\program files\microsoft visual studio\myprojects\a\a.cpp(35) : fatal error C1010: unexpected end of file while looking for precompiled header directive
Error executing cl.exe.

a.exe - 1 error(s), 0 warning(s)
#include <fstream>
#include <iostream>
#include <string>
#include <set>
using namespace std;
 
int main () {
 
ifstream is1("post_ids.txt");
ifstream is2("newpost_ids.txt");
ofstream os("newposts_nodups_ids.txt");
set<string> data1;
set<string> data2;
set<string> result;
 
string sLine;
while (getline(is1, sLine))
      data.insert(sLine);
 
while (getline(is2, sLine))
{
     if (data.find(sLine) == data.end())
     {
           os << sLine;
     }
}
 
  set_difference(data1.begin(), data1.end(), data2.begin(), data2.end(),
                 inserter(result, result.begin()));
  copy(result.begin(), result.end(), ostream_iterator<string>(os, "\n"));
  cout << endl;
 
return 0;
}

 
jkrCommented:
>>fatal error C1010: unexpected end of file

Just add

#include "stdafx.h"

as the 1st line of the code - we went through that already.
 
Gonzales2009Author Commented:
I did that, but it gave more errors:
Deleting intermediate files and output files for project 'a - Win32 Debug'.
--------------------Configuration: a - Win32 Debug--------------------
Compiling...
StdAfx.cpp
Compiling...
a.cpp
c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas
ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std
::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' : identifier was truncated to '255' characters in the debug information
        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:
:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc
ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<
char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas
ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std
::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::const_iterator' : identifier was truncated to '255' characters in the debug information
        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:
:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc
ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<
char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas
ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std
::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::iterator' : identifier was truncated to '255' characters in the debug information
        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:
:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc
ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<
char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas
ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std
::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Node' : identifier was truncated to '255' characters in the debug information
        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:
:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc
ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<
char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas
ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std
::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::const_iterator' : identifier was truncated to '255' characters in the debug information
        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:
:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc
ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<
char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(19) : error C2065: 'data' : undeclared identifier
C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(19) : error C2228: left of '.insert' must have class/struct/union type
C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(23) : error C2228: left of '.find' must have class/struct/union type
C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(23) : error C2228: left of '.end' must have class/struct/union type
C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(29) : error C2065: 'set_difference' : undeclared identifier
c:\program files\microsoft visual studio\vc98\include\iterator(143) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::
basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<
std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::iterator' : identifier was truncated to '255' characters in the debug information
        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(30) : see reference to class template instantiation 'std::insert_iterator<std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<c
har,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > > >' being compiled
Error executing cl.exe.
 
a.exe - 5 error(s), 6 warning(s)


0
 
jkrCommented:
Not surprising, the code should be

set<string> data1;
set<string> data2;
set<string> result;

while(!is1.eof()) {

  string sLine;

  getline(is1,sLine);

  data1.insert(sLine);   // the first file goes into data1
}

while(!is2.eof()) {

  string sLine;

  getline(is2,sLine);

  data2.insert(sLine);
}
0
 
Gonzales2009Author Commented:
Sorry man, but it's still showing some errors:


#include "stdafx.h"
#include <fstream>
#include <iostream>
#include <string>
#include <set>
using namespace std;
 
int main () {
 
ifstream is1("post_ids.txt");
ifstream is2("newpost_ids.txt");
ofstream os("newposts_nodups_ids.txt");
set<string> data1;
set<string> data2;
set<string> result;
 
while(!is1.eof()) {
 
  string sLine;
 
  getline(is1,sLine);
 
  data2.insert(sLine);
}
 
while(!is2.eof()) {
 
  string sLine;
 
  getline(is2,sLine);
 
  data2.insert(sLine);
}
 
  set_difference(data1.begin(), data1.end(), data2.begin(), data2.end(),
                 inserter(result, result.begin()));
  copy(result.begin(), result.end(), ostream_iterator<string>(os, "\n"));
  cout << endl;
 
return 0;
}
 
==================================
Deleting intermediate files and output files for project 'a - Win32 Debug'.
--------------------Configuration: a - Win32 Debug--------------------
Compiling...
StdAfx.cpp
Compiling...
a.cpp
c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas
ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std
::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' : identifier was truncated to '255' characters in the debug information
        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:
:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc
ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<
char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas
ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std
::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::const_iterator' : identifier was truncated to '255' characters in the debug information
        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:
:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc
ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<
char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas
ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std
::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::iterator' : identifier was truncated to '255' characters in the debug information
        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:
:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc
ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<
char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas
ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std
::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Node' : identifier was truncated to '255' characters in the debug information
        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:
:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc
ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<
char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
c:\program files\microsoft visual studio\vc98\include\xtree(120) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::bas
ic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<std
::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::const_iterator' : identifier was truncated to '255' characters in the debug information
        c:\program files\microsoft visual studio\vc98\include\set(33) : see reference to class template instantiation 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std:
:allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::alloc
ator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(13) : see reference to class template instantiation 'std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<
char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >' being compiled
c:\program files\microsoft visual studio\vc98\include\utility(25) : warning C4786: 'std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::set<std::ba
sic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::_Kfn,std::less<st
d::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::iterator' : identifier was truncated to '255' characters in the debug information
        C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(23) : see reference to class template instantiation 'std::pair<std::_Tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_trait
s<char>,std::allocator<char> >,std::set<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char
>,std::allocator<char> > > >::_Kfn,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > >::iterator,bool>' being compiled
C:\Program Files\Microsoft Visual Studio\MyProjects\a\a.cpp(35) : error C2065: 'set_difference' : undeclared identifier
Error executing cl.exe.
 
a.exe - 1 error(s), 6 warning(s)


0
 
jkrCommented:
Well, just one error - add

#include <algorithm>

I.e.

#include "stdafx.h"
#include <fstream>
#include <iostream>
#include <string>
#include <set>
#include <algorithm>
using namespace std;
0
 
itsmeandnobodyelseCommented:
The code is

#pragma warning (disable : 4786)

#include <fstream>
#include <iostream>
#include <string>
#include <set>
using namespace std;

int main () {
   
    ifstream is1("post_ids.txt");
    ifstream is2("newpost_ids.txt");
    ofstream os("newposts_nodups_ids.txt");
    set<string> data;
   
    string sLine;
    while (getline(is1, sLine))
        data.insert(sLine);
   
    while (getline(is2, sLine))
    {
        if (data.find(sLine) == data.end())
        {
            os << sLine;
        }
    }
    is1.close();
    is2.close();
    os.close();
   
    return 0;
}


>>>> #pragma warning (disable : 4786)
That disables the warnings which are a bug in VC6

>>>> #include "stdafx.h"
Better to switch off 'precompiled headers' in the project settings (C++ - Precompiled Headers). PCH doesn't make sense for non-MFC and non-WINAPI projects.




0
 
itsmeandnobodyelseCommented:
correction:
        if (data.find(sLine) == data.end())
        {
            os << sLine << endl;   // add a linefeed for each non-duplicate
        }
0
 
Gonzales2009Author Commented:
jkr's code is not working, the text file shows as blank.

itsmeandnobodyelse: your code is working, it's showing the right ids but in this format 111a222a
Can you help me make it line by line instead?
0
 
itsmeandnobodyelseCommented:
>>>> while(!is1.eof()) {
>>>>  string sLine;
>>>>  getline(is1, sLine);

As mentioned, the above is bad practice because the result of getline is never checked for errors.
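A minimal sketch of the difference, assuming one id per line and an input that ends with a newline. The function names read_ids and read_ids_eof_loop are made up for illustration, and both take any std::istream so they can be tried on a string instead of a file.

```cpp
#include <istream>
#include <set>
#include <sstream>
#include <string>

// Reads one id per line. The loop body runs only when getline succeeded.
std::set<std::string> read_ids(std::istream& in) {
    std::set<std::string> ids;
    std::string line;
    while (std::getline(in, line)) {
        ids.insert(line);
    }
    return ids;
}

// The pattern criticised above, kept for comparison only.
std::set<std::string> read_ids_eof_loop(std::istream& in) {
    std::set<std::string> ids;
    while (!in.eof()) {
        std::string line;
        std::getline(in, line);   // result is never checked
        ids.insert(line);         // inserts "" when the final read fails
    }
    return ids;
}
```

With the input "111a\n222a\n" the first function yields two entries. The second yields three, because eof() is still false after the last line has been read, so one more getline runs, fails, and an empty string gets inserted. If the file could not be opened at all, eof() never becomes true and the second loop does not terminate.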

>>>> set<string> data2;
If the second file has no duplicate entries itself, you don't need to store all of its entries in a std::set; simply check whether each entry exists in the first set and write it to the output file if it is new.

>>>> set_difference(data1.begin(), data1.end(), data2.begin(), data2.end(),
>>>>                 inserter(result, result.begin()));
>>>>  copy(result.begin(), result.end(), ostream_iterator<string>(os, "\n"));

That is some sort of 'overkill' if the second file has no duplicates itself.

Moreover, it is wrong if the second file is not a superset of the first. The call above collects all entries that are in data1 but not in data2, whereas the new ids are the ones that are in data2 but not in data1.
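For anyone who still wants to go the set_difference route, here is a hedged sketch with the ranges in the order that yields the new ids. The helper name only_new and its parameter names are hypothetical; it assumes both files have already been read into sets.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>

// Returns the ids that are in new_ids but not in known_ids.
// std::set_difference(A, B) keeps the elements of A that are missing
// from B, so the new ids have to be passed as the first range.
std::set<std::string> only_new(const std::set<std::string>& known_ids,
                               const std::set<std::string>& new_ids) {
    std::set<std::string> result;
    std::set_difference(new_ids.begin(), new_ids.end(),
                        known_ids.begin(), known_ids.end(),
                        std::inserter(result, result.end()));
    return result;
}
```

This needs #include <algorithm>, and it relies on both ranges being sorted with the same ordering, which two std::set<std::string> objects with the default comparison provide.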
0
 
Gonzales2009Author Commented:
wonderful itsmeandnobodyelse
0
 
Gonzales2009Author Commented:
This is the final code. Just let me know if everything is right, as it's compiling fine!
#pragma warning (disable : 4786)
#include "stdafx.h"
#include <fstream>
#include <iostream>
#include <string>
#include <set>
using namespace std;
 
int main () {
   
    ifstream is1("post_ids.txt");
    ifstream is2("newpost_ids.txt");
    ofstream os("newposts_nodups_ids.txt");
    set<string> data;
   
    string sLine;
    while (getline(is1, sLine))
        data.insert(sLine);
   
    while (getline(is2, sLine))
    {
        if (data.find(sLine) == data.end())
        {
            os << sLine << endl;   // add a linefeed for each non-duplicate
        }
    }
    is1.close();
    is2.close();
    os.close();
   
    return 0;
}


0
 
itsmeandnobodyelseCommented:
>>>> showing the right ids but in this format 111a222a
Yes, apply the correction I posted in an earlier comment.
0
 
itsmeandnobodyelseCommented:
The main problem of jkr's code is that the files were not closed.
0
 
itsmeandnobodyelseCommented:
>>>> #include "stdafx.h"
As mentioned, your program doesn't need 'precompiled headers' (PCH). PCH is meant for projects that use big header files like 'windows.h' (WINAPI) or 'afx.h' (MFC): compile time can be improved by compiling these headers separately, once. You will find the include statements for windows.h and afx.h in the stdafx.h the Wizard has generated for you. In the case of your program above, stdafx.h only causes trouble. You can't include the STL headers in stdafx.h because template classes cannot be precompiled either. So it is really best to switch off PCH for both the Debug and Release configurations. It will make your life happier.
0
 
evilrixSenior Software Engineer (Avast)Commented:
As discussed, below is a Perl version.

NB. You will, of course, need to download and install a Perl interpreter: -
http://www.activestate.com/store/download.aspx?prdGUID=81fbce82-6bd5-49bc-a915-08d58c2648ca

I provide this purely for completeness, and since it is off topic you should NOT award me any points for this post, as that would be unfair to all those who have contributed to the C++ solution.

-Rx.
#!/usr/bin/perl
use strict;
 
open POSTSIDS, "<", "posts_ids.txt" or die;
open NEWPOSTSIDS, "<", "newposts_ids.txt" or die;
open NODUPS, ">", "nodups.txt" or die;
 
my %newposts = map { $_ => undef } <NEWPOSTSIDS>;
while(<POSTSIDS>) { delete $newposts{$_}; }
print NODUPS keys %newposts;
 
close POSTSIDS;
close NEWPOSTSIDS;
close NODUPS;


0
 
rstaveleyCommented:
I do like Perl. Clever trick with map too.

I think it is completely OK to present alternative tools for a job in any TA. Perl is a developer's Swiss Army Knife.
0
 
itsmeandnobodyelseCommented:
>>>> my %newposts = map { $_ => undef } <NEWPOSTSIDS>;
>>>> while(<POSTSIDS>) { delete $newposts{$_}; }

Assuming the second file is a superset of the first file with only a few new entries, the above method is not very efficient because of the many deletions. But I am pretty sure that with PERL you could read the first file into the map and check for the entries of the second file as well.

>>>> I think it is completely OK to present alternative tools for a job in any TA.
I see two problems with that:

1. As long as there is no accepted solution, the alternative language
    code may confuse the asker more than help him/her.

2. Fans of an alternative language are rarely objective. Sometimes the
    'easiness' of a language that does some job with fewer statements than
    another language comes with less efficiency or with simplifying
    assumptions. E.g. I like the 'or die' in the above PERL script, but an
    error message stating that file 2 doesn't exist may be a better solution
    even if it costs two more statements.
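On the C++ side, the point about naming the missing file could be sketched roughly as follows. The helper open_or_report and the wording of the message are assumptions made for illustration, not something taken from the code posted in this thread.

```cpp
#include <fstream>
#include <iostream>
#include <string>

// Opens `name` for reading and, on failure, reports which file is missing.
// Returns true if the stream is usable afterwards.
bool open_or_report(std::ifstream& in, const std::string& name) {
    in.open(name.c_str());
    if (!in) {
        std::cerr << "cannot open " << name << " for reading" << std::endl;
        return false;
    }
    return true;
}
```

A caller would declare the ifstream without a file name, call open_or_report(is2, "newpost_ids.txt"), and return a non-zero exit code from main when it comes back false.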

I personally do not post in other TA's than C/C++ and if I post in C TA I avoid C++ solutions.

Regards, Alex
0
 
evilrixSenior Software Engineer (Avast)Commented:
>> the above method is not very efficient
It was done this way as newpostids (50 thousand ids) is significantly smaller than postids (million unique ids) and as such I felt this way would be more memory efficient! I'm not convinced the delete is actually as inefficient as you assume; however, since I don't have the OPs data I cannot test either case so I went with what seemed to be the better solution.

>> but an error message stating that file 2 doesn't exist may be a better solution
Die states the line of the error, which should be clear enough! It can always be changed to...
open POSTSIDS, "<", "posts_ids.txt" or die "posts_ids.txt cannot be opened";
0
 
itsmeandnobodyelseCommented:
>>>> as newpostids (50 thousand ids) is significantly smaller than postids (million unique ids)
I didn't remember that from the original question.

The question is whether a million of searches in a map of 50k + 50k inserts + (about) 50k deletes is faster than 50k searches on a set of million entries + 1 million of inserts. I will test that with a C++ prog (as unfortunately I never owned a Swiss Army Knife).

>>>> Die states the line of the error, which should be clear enough!
Sorry, but you didn't get my point..
0
 
evilrixSenior Software Engineer (Avast)Commented:
>> I will test that with a C++ prog
If you really must!

>> Sorry, but you didn't get my point
No, I did I just couldn't be bothered to rise to it!
0
 
rstaveleyCommented:
Let's move this thread onto something less controversial like religion or politics, eh?
0
 
itsmeandnobodyelseCommented:
>>>> Let's move this thread onto something less controversial like religion or politics, eh?
Why not into Other\Misc\Somewhat TA?

Comparing the philosophies of two programming languages is only interesting for someone who likes both. For others it is only annoying. I have never experienced a good discussion when someone posted solutions from another TA. But that may be a subjective impression (or, better said, it may be caused by my comments).
0
 
evilrixSenior Software Engineer (Avast)Commented:
>> Comparing philosophies of two programming languages is only interesting for someone who likes both
I was not attempting to compare anything. I originally suggested the OP might find a Perl solution easier/quicker if this was a one off admin task -- it is often simpler to just choose the right tool for the right job and this kind of task is something Perl excels at! The OP then asked for a Perl version of the code so I obliged and posted it but made it clear the post was off topic and requested it NOT be included in acceptance of the final answer to ensure that only C++ answers would be accepted -- so everyone who deserves points for contributing to this C++ thread gets them. I do not feel this was unreasonable nor do I see why it is necessary to make such a big deal about it! I was attempting to assist the OP not upset you or anyone else for that matter.
0
 
itsmeandnobodyelseCommented:
>>>> I do not feel this was unreasonable nor do I see why it is
>>>> necessary to make such a big deal about it!

rstaveley liked the PERL solution you posted.

I said that I don't like solutions from another TA before a solution has been accepted.

No more, no less.

The only one making a big deal of it is you.

>>>> The question is whether a million of searches in a
>>>> map of 50k + 50k inserts + (about) 50k deletes is faster
>>>> than 50k searches on a set of million entries + 1 million
>>>> of inserts. I will test that with a C++ prog
The results were 28 seconds for the solution that makes 1 million searches and erases the duplicate entries found in the small set, and 44 seconds for the solution that searches in the big set and writes only the entries that were not found. Tested with 1 million entries in file 1 and 52,500 entries in file 2, of which 2,500 were not duplicates.

So the code below is more efficient in C++ (following the approach taken in the PERL script).

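#pragma warning (disable : 4786)

#include <fstream>
#include <iostream>
#include <string>
#include <set>
using namespace std;

int main ()
{
    ifstream is1("post_ids.txt");
    ifstream is2("newpost_ids.txt");
    ofstream os("newposts_nodups_ids_2.txt");
    set<string> data;
   
    // load the small file (the new ids) into the set
    string sLine;
    while (getline(is2, sLine))
        data.insert(sLine);

    // erase every id that also occurs in the big file
    set<string>::iterator f;
    while (getline(is1, sLine))
    {
        if ((f = data.find(sLine)) != data.end())
        {
            data.erase(f);
        }
    }
    // whatever is left in the set was not in the big file
    for (f = data.begin(); f != data.end(); ++f)
    {
        os << *f << endl;
    }
    is1.close();
    is2.close();
    os.close();
   
    return 0;
}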
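To check that the two approaches discussed in this thread agree on the result, they can be put side by side on in-memory data. This is only a sketch: the function names new_ids_by_lookup and new_ids_by_erase are invented here, the ids are passed as vectors instead of being read from files, and both functions return a set so the results are directly comparable.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Approach A: index the big list, then look up each new id in it.
std::set<std::string> new_ids_by_lookup(const std::vector<std::string>& old_ids,
                                        const std::vector<std::string>& new_ids) {
    std::set<std::string> seen(old_ids.begin(), old_ids.end());
    std::set<std::string> result;
    for (std::size_t i = 0; i < new_ids.size(); ++i) {
        if (seen.find(new_ids[i]) == seen.end()) {
            result.insert(new_ids[i]);
        }
    }
    return result;
}

// Approach B: index the small list, then erase whatever the big list contains.
std::set<std::string> new_ids_by_erase(const std::vector<std::string>& old_ids,
                                       const std::vector<std::string>& new_ids) {
    std::set<std::string> candidates(new_ids.begin(), new_ids.end());
    for (std::size_t i = 0; i < old_ids.size(); ++i) {
        candidates.erase(old_ids[i]);   // erase by key does nothing if the id is absent
    }
    return candidates;
}
```

Approach B only ever holds the small list in memory, which is the memory argument made earlier in the thread. One difference to keep in mind: the file-based lookup version writes ids in the order they appear in the second file and would repeat an id that occurs twice there, while the erase version writes each remaining id once, in sorted order.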
 


0
 
evilrixSenior Software Engineer (Avast)Commented:
Muhahaha, the irony -- the accepted answer is the one that was born from my "not very efficient " Perl code :)
0
 
Infinity08Commented:
>> Muhahaha

Was that an evil laugh ? Now I know where you got your nick from ;)
0
 
itsmeandnobodyelseCommented:
>>>> Muhahaha, the irony
You actually don't get the point again.

My remarks regarding efficiency were based on the (wrong) assumption that file 2 is a superset of file 1. It had nothing to do with PERL. On the contrary, after recognizing my wrong assumption I adopted the algorithm (from your PERL script), made the tests ("if you really must") and posted the results.
0
 
rstaveleyCommented:
>> Muhahaha
>
> Was that an evil laugh ?

I have *proof* that it is. I accessed http://www.research.att.com/~ttsweb/tts/demo.php to investigate with Firefox (the browser of good guys) and pasted "Muhahaha" into the text to speech box and got a freeze up when I hit the "speak" button, but when I did the same with Internet explorer (the browser of the bad guys), not only did Crystal US English read the file I downloaded, but it sounded not at all like the Dr Evil voice that you'd expect... but instead a benign sounding rendition. Now that really makes me shudder!
0
 
Gonzales2009Author Commented:
i remember back in the day when this site was more user friendly and wasnt all about the money.. too bad it was gone in the wrong way!
0
 
evilrixSenior Software Engineer (Avast)Commented:
>> You actually don't get the point again.
Actually, I just don't care!

>> Was that an evil laugh  ? Now I know where you got your nick from ;)
By name and by nature! I8 ;-)

>> i remember back in the day when this site was more user friendly and wasnt all about the money
Eh? Money? What money? Can I have some?
0
 
Gonzales2009Author Commented:
It would be sick if you didn't have to pay $189.95 to ask questions for two years... anyway, it's not like they pay the experts. Bummer.
0
 
rstaveleyCommented:
Read lots of tongue-in-cheek smilies in all of this, Gonzales2009. None of the commentators here are driven by money. I think the best you can get out of EE is kudos and a tee-shirt and my wife would divorce me, if she saw me wearing the latter. It does get a bit hot under the collar sometimes, but EE always has been like that as long as I've been on it [yikes... perhaps I'm the cause?]. Argument is actually a healthy sign that people care.
0
 
evilrixSenior Software Engineer (Avast)Commented:
>> None of the commentators here are driven by money
My time is given freely as are (all?) the other experts -- we get paid zilch!

>> my wife would divorce me
Yours too eh ? :)
0
 
itsmeandnobodyelseCommented:
>>>> i remember back in the day when this site was more user friendly

Sorry for spoiling your thread with some unfriendly remarks ...

... but as a result you got a better solution.

I think a controversy between experts is not so bad for the asker, but of course they shouldn't take it personally ...
0
 
evilrixSenior Software Engineer (Avast)Commented:
>> spoiling your thread with some unfriendly remarks
I'm glad you recognized that! :-$
0
 
Infinity08Commented:
>> >> None of the commentators here are driven by money
>> My time is given freely as are (all?) the other experts -- we get paid zilch!

Actually, we all get paid ... It's just you that does it for free ... lol ... j/k of course ;)
0
 
evilrixSenior Software Engineer (Avast)Commented:
>> Actually, we all get paid
I get paid by just knowing I have helped someone *cough* -- I did just get a free T-Shirt :)
0
 
Infinity08Commented:
>> >> Actually, we all get paid

I wish :)
0
 
rstaveleyCommented:
If we did get paid, it would all be out-sourced to much cleverer people in Asia, willing to do it for fewer pennies.
0
 
evilrixSenior Software Engineer (Avast)Commented:
>> out-sourced to much cleverer people
Speak for yourself :-p
0
