Solved

String search and Replace from a text file

Posted on 2006-04-09
Medium Priority
Last Modified: 2011-10-03
Hi Experts,

Basically, C just drives me crazy.

I need a complete C program that can open a text file, then search for and replace certain words, for example:

'Me' to 'You'
'There' to 'Their'

These words will be hard-coded. The changed file only needs to be displayed; it does not need to be saved to a new file.

I know in C++ it's basically:
String.Replace("oldval","newval");

But like I said C gets me.

Can any expert tackle this problem?
Question by:Omer_85
14 Comments
 
LVL 16

Expert Comment

by:PaulCaswell
ID: 16411668
Hi Omer_85,

>>Can any expert tackle this problem?
We can help you do it but we can't do it for you.

I would suggest you search for the 'Boyer Moore' algorithm to start with and post the code of an attempt to do what you have been asked.

There are other, simpler methods but they are terribly inefficient compared to Boyer-Moore.
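
For comparison, here is a minimal sketch of one of those simpler methods, using only the standard strstr(). The function name replace_all and its parameters are made up for illustration and are not from any poster's code. It matches substrings rather than whole words and is case-sensitive, so it is a starting point and not a finished answer:

```c
#include <stdio.h>
#include <string.h>

/* Write `line` to `out`, replacing every occurrence of `find` with `repl`.
   Hypothetical helper: matches substrings (not whole words), case-sensitive. */
void replace_all(FILE *out, const char *line, const char *find, const char *repl)
{
    size_t find_len = strlen(find);
    const char *hit;

    if (find_len == 0) {            /* an empty pattern would never advance */
        fputs(line, out);
        return;
    }
    while ((hit = strstr(line, find)) != NULL) {
        fwrite(line, 1, (size_t)(hit - line), out);  /* text before the match */
        fputs(repl, out);                            /* the replacement itself */
        line = hit + find_len;                       /* continue after the match */
    }
    fputs(line, out);               /* tail after the last match */
}
```

Calling replace_all(stdout, "Me and There", "Me", "You") writes "You and There". Because nothing is moved in memory and the output is written piece by piece, replacements of a different length cost nothing extra, but a word such as "Meat" would also be changed.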

Paul
 
LVL 16

Expert Comment

by:PaulCaswell
ID: 16412150
Hi Omer_85,

We have to be very careful here at EE about homework questions. With these we have to make sure you do the work and we help you. This doesn't sound like homework but it's simple enough to be homework. Could you help me by explaining why you need this and why you don't use some commercially available package or download some free source code to do this?

If it is homework, we will be glad to help but you need to do the work. The quickest way to start this process is to put together some code and post it here. We can then hopefully help you get it right.

Paul
 
LVL 23

Expert Comment

by:brettmjohnson
ID: 16412397
Boyer-Moore is fabulous for searching (especially with long search strings), but it is tedious to handle replace.  Boyer-Moore benefits from having large portions of the file in memory (preferably all of it).  Replacing substrings with another of differing length becomes slow and tedious in large memory buffers.  For an explanation, see the discussion of "the gap" in the emacs technical documentation.   Boyer-Moore also does not easily lend itself to case-insensitive, whole-word search.
 
LVL 23

Expert Comment

by:brettmjohnson
ID: 16412452
Here is an idea:

Use fgets() to read lines from the file.
Duplicate the line (because strtok is destructive).
Set previousMatch pointer to point to beginning of duplicate line
Use strtok() to iterate over the line extracting individual words.
Use strcasecmp() to compare the extracted words with the list of "find" words.
If a match is found {
  Output the fragment of the saved line from the previousMatch to the match
  Output the "replace" word
  Set the previousMatch to the match + strlen(match) in the duplicate buffer
}
Don't forget to output the tail of the line once you hit the end of the line buffer
Note that if no "find" words appear in the line, the tail of the line is actually the whole line

This approach requires an understanding of how strtok() parses the line, and how to use address arithmetic to calculate offsets and lengths in the saved line buffer for output.
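
The steps above can be sketched roughly as follows. This is only one possible reading of the idea, under a few assumptions: the word lists, the delimiter set and all function names are invented for illustration, and same_word() is a small portable stand-in for the POSIX strcasecmp() mentioned above, which is not part of Standard C.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical hard-coded word pairs, taken from the question. */
static const char *find_words[]    = { "Me",  "There" };
static const char *replace_words[] = { "You", "Their" };
enum { WORD_COUNT = 2, LINE_MAX_LEN = 512 };

/* Stand-in for strcasecmp(): nonzero if a and b are equal ignoring case. */
static int same_word(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;                /* both strings must end together */
}

/* Copy one line to out, replacing whole words; text between words is kept. */
void replace_words_in_line(const char *line, FILE *out)
{
    char copy[LINE_MAX_LEN];
    const char *delims = " \t\r\n.,;:!?\"()";
    size_t previous = 0;            /* offset in line just past the last match */
    char *tok;
    int i;

    if (strlen(line) >= sizeof copy) {  /* too long for the copy: pass through */
        fputs(line, out);
        return;
    }
    strcpy(copy, line);             /* strtok() writes NULs into its input */

    for (tok = strtok(copy, delims); tok != NULL; tok = strtok(NULL, delims)) {
        for (i = 0; i < WORD_COUNT; i++) {
            if (same_word(tok, find_words[i])) {
                size_t start = (size_t)(tok - copy);  /* same offset in line */
                fwrite(line + previous, 1, start - previous, out);
                fputs(replace_words[i], out);
                previous = start + strlen(tok);
                break;
            }
        }
    }
    fputs(line + previous, out);    /* tail, or the whole line if no match */
}

/* Read in line by line with fgets() and write the changed text to out. */
void replace_words_in_stream(FILE *in, FILE *out)
{
    char line[LINE_MAX_LEN];

    while (fgets(line, sizeof line, in) != NULL)
        replace_words_in_line(line, out);
}
```

Passing an opened input file and stdout to replace_words_in_stream() displays the changed text. Lines longer than the buffer are processed in pieces, so a word that straddles a piece boundary would be missed.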

 

Author Comment

by:Omer_85
ID: 16414655
Hi Paul,

No this is not a homework question. And yes if it was, I believe you should write your own code to learn.

This is why I am not using commercial software: I do wish to learn C a little more in depth. However, at the moment I cannot spend weeks on code for what you call "simple". And going by what brettmjohnson wrote, it's not too easy to manipulate the Boyer-Moore code.

Basically I am doing some 'work experience' in I.T., mainly learning the ropes, more so in networking and PC service. I have been given this programming issue. I don't know why, as I haven't got any real programming experience; however, they don't seem to care, and want it real soon. I am not very knowledgeable in programming, let alone C. As for the free source code on the Internet, if you could tell me where to get this, great...

I am new to this site, looks like a great resource center, and just thought the many experts would be able to help out a newbie..
 
LVL 24

Expert Comment

by:fridom
ID: 16415382
Ah yes, handling char* arrays is really not a strength of C. However, libraries (or self-written libraries) come to the rescue. Here we go with glib:
#include <stdlib.h>
#include <glib.h>
#include <glib/gprintf.h>


int main (void){
  gchar *file_name = "t1.txt";
  enum{BUF_SIZE=512, MAX_TOKENS=200};
  gchar *search_for = "my";
  gchar *replacement =  "Mei";
  gchar * current;

  char buf[BUF_SIZE];

  gchar ** splitted;

  FILE *pin = fopen(file_name, "r");
  if (NULL == pin){
    exit(EXIT_FAILURE);
  }
  while (NULL != fgets(buf, sizeof(buf), pin)){
    splitted = g_strsplit(buf, " ", MAX_TOKENS);
    for (int i = 0; splitted[i] != NULL; ++i){
      current = splitted[i];
      if (0 == g_ascii_strcasecmp(current, search_for)){
        g_printf("%s ", replacement);
      } else {
        g_printf("%s ", current);
      }
    }
    g_strfreev(splitted);
    buf[0] = '\0';
  }
  fclose(pin);
  return 0;
}
     
With this marvellous file:
bla      this or that my or may
End or start free or not
Out or in
away for now
sense this doesn't make
what's fud?
Should I go?

We get:
bla      this or that Mei or may
 End or start free or not
 Out or in
 away for now
 sense this doesn't make
 what's fud?
 Should I go?

C's now fine for string handling also ;-)

Of course I could have written it a bit differently, but this code does what you like ;)

Regards
Friedrich

 

Author Comment

by:Omer_85
ID: 16415567
Hi Friedrich,

Thanks for your help, however what is the #include <glib.h> library?

I am trying to compile this in Microsoft Visual C, however I get the error (cannot open include file)

Is this a library I must download or make myself?

Also, can this code be changed to handle multiple changes to the file. (I.e. my to Mei, and Should to Can)

Sorry for the trouble.

**Point value increased**
 
LVL 24

Expert Comment

by:fridom
ID: 16416504
Of course you have to install it. You can get it from:
ftp://ftp.gtk.org/pub/gtk/v2.8/win32

Of course this can be changed to handle multiple changes. In the simplest case you put
the alternatives in an array and walk the array to find the replacements; if there will be more, you probably want to use a hash table.

This is also part of glib.
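
A minimal sketch of the array variant in plain C, without glib, might look like this. The struct, the table contents and the function name lookup_replacement are assumptions made for illustration; the word pairs are simply the examples mentioned in this thread:

```c
#include <stddef.h>
#include <string.h>

/* One find/replace pair. */
struct replacement {
    const char *find;
    const char *replace_with;
};

/* Hypothetical table; the pairs are the examples from this thread. */
static const struct replacement table[] = {
    { "my",     "Mei"   },
    { "Should", "Can"   },
    { "Me",     "You"   },
    { "There",  "Their" }
};

/* Walk the array; return the replacement for word, or word itself
   if it is not in the table. The comparison is case-sensitive. */
const char *lookup_replacement(const char *word)
{
    size_t i;

    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(word, table[i].find) == 0)
            return table[i].replace_with;
    }
    return word;
}
```

With this, the single comparison in the printing loop becomes one call such as g_printf("%s ", lookup_replacement(current)). A linear walk is fine for a handful of pairs; the hash table only pays off when the list grows large.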

If you do not like any dependency you can help yourself with a few extra functions. But sorry, I'm not in the mood to write them for free.

Regards
Friedrich
 

Author Comment

by:Omer_85
ID: 16416775
There are many glib versions, which one is correct?

I will try this solution, however I think I am meant to have a standalone program using basic C functions and libraries, without having to install any extras, so that it can work on many different machines with no need to install glib again. I doubt they will give me permission to install glib on the UNIX server.

I have been looking into the various open, read, lseek and write calls. They are standard system calls if I am correct, and may be better to use them.

I will try your solution Friedrich, the only problem seems to be that it requires extra installs.
 
LVL 24

Accepted Solution

by:
fridom earned 225 total points
ID: 16418241
Oh, I expected that argument. I do not know of any group so reluctant to add another library to their toolbox as the C people. What's going on with you? You can just go ahead and install this library in your home directory. You then link your application statically and no one will ever know that you used glib. It's one library with tons of useful data structures, but no, people have to invent it all by themselves again. So here we go: here's a Standard C solution I once wrote. If it does not work you can always blame me ;-)

#include <stdio.h>
#include <string.h>
#include <ctype.h>
#include <stdlib.h>

#ifndef TRUE
enum{FALSE,TRUE};
#endif

char* read_next_word (FILE *stream,
                      char *result,
                      int *act_size,
                      int *add_new_line){
  /* Read the next word from stream. Word boundaries are whatever isspace
     accepts (\n, \t, ...).
     result may be reallocated and *act_size is the actual allocated space for
     result; *add_new_line indicates whether the word was ended by \n. Tabs and
     similar are ignored. */
 
  char *new_content = NULL;
  int i = 0;
  int c = 0;
  enum {
    BLANK = ' ',
    TAB='\t',
    NEWLINE='\n',
  };
 
 
  *add_new_line = FALSE;
  /* find beginning of word */
  while ((c = getc(stream))){
    if (c == EOF){
      return NULL;
    }
    if ((c != BLANK) && (c != TAB)){ /* was ||, which is always true */
      ungetc (c, stream);
      break;
    }
  }
   

   
  for (i=0; (c = getc(stream)) != EOF; i++){
    /* does it fit ? */
    if (i >= *act_size -1){
      *act_size *= 2; /* arbitrary but seems to be often used, */
      new_content = realloc(result, *act_size);
      if (!new_content){
        fprintf(stderr, "Run out of memory in read_next_word\n");
        *act_size /= 2; /* adjust size, that one does not write over
                           array bounds on another call */
        exit(EXIT_FAILURE); /* give up */
       
        return NULL; /* should not come  up to here */
      }
      result = new_content;
    }
    if (isspace(c)){
      /* must be at the end of a word,
         ATTENTION: Tabs are silently ignored and changed to one single BLANK */
      if (c == NEWLINE){
        /* one probably wishes having different lines in a file */
        *add_new_line = TRUE;
      }
      result[i] = '\0'; /* make it a proper string */
      return result;
    } else {
      result[i] = c;
    }
  } /* for */
  result[i] = '\0'; /* EOF ended the last word: make it a proper string too */
  return result;
} /* read_next_word */


void write_out(char* word,
               char* word_to_replace,
               char* replacement,
               int new_line,
               FILE* fout){
  if (0 == strcmp(word, word_to_replace)){
    fputs(replacement, fout);
  } else {
    fputs(word, fout);
  }
  if (new_line){
    fputs("\n", fout);
  } else{
    fputs (" ", fout);
  }
}
   
void usage (char *program_name){
  fprintf(stdout, "call with %s <word-to-replace> <replacement> <open-file> <write-to-file>\n",
          program_name);
  exit (EXIT_FAILURE);
}


int main (int argc, char* argv[]){
  /* warning: only basic error handling is done; ugly things will happen when
     file_to_open == file_to_write_to, so be prepared */
  FILE *fin, *fout;
  int size = 100;
  int new_line = FALSE;
  char *word = NULL;
  char *file_to_open, *file_to_write_to, *word_to_replace, *replacement;

  if (argc != 5){
    usage(argv[0]);
  }
  file_to_open = argv[3];
  file_to_write_to = argv[4];
  word_to_replace = argv[1];
  replacement = argv[2];
                   
  word = malloc (size);
  if (!word){
    fprintf(stderr, "Could not allocate memory in main\n");
    exit (EXIT_FAILURE);
  }


  fin = fopen(file_to_open, "r");

  fout = fopen(file_to_write_to, "w");

  if ((!fin) || (!fout)){
    fprintf(stderr, "something went wrong while trying to open files \n");
    exit (EXIT_FAILURE);
  }
 
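    /* read_next_word may move the buffer with realloc, so use its return value */
    {
      char *next;
      while ((next = read_next_word(fin, word, &size, &new_line)) != NULL){
        word = next;
        write_out(word, word_to_replace, replacement, new_line, fout);
      }
    }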
    /* don't forget this */
    fclose(fin);
    fclose(fout);
    free(word);
    return EXIT_SUCCESS;
}

Of course you have to check the arguments and act properly; e.g. a test for stdout as the 4th argument may be a very good idea.

I wonder what drove me to spend time implementing this stuff back then; what a waste of time.

Regards
Friedrich
 

Author Comment

by:Omer_85
ID: 16425010
Hi again Friedrich,

sorry blame this on my stupidity, but...

I changed some of the code to:

  fin = fopen(file_to_open, "test.txt");  /*my input file: hello my name is omer*/

  fout = fopen(file_to_write_to, "test1.txt");  /*my output file: empty at the moment*/

so I would like to change my input file to write "hi my name is omer"


when I run this code it outputs:

<word-to-replace> <replacement> <open-file> <write-to-file>

What am I doing wrong...???

Thanks for your time..
 
LVL 24

Expert Comment

by:fridom
ID: 16425108
Oh, come on, you can't be serious. Open test.txt with whatever you use for editing files
and write "hi my name is omer" into it. The program expects you to give it exactly four command-line arguments, so if you hard-code the file names you would also have to change the test on the number of arguments.

Keep the stuff as is, and call it this way:
./prog_name "my" "Mei" test.txt test1.txt

Friedrich
 

Author Comment

by:Omer_85
ID: 16425811
Hi Friedrich,

Thanks for that, stupid me was using MS Visual Basic C++, and running it through there.

Tried it through unix platform, and it works...

Thanks for all your help, and putting up with a newbie..

I will be back to this site.. its a great help..

Regards,

Omer.
 
LVL 24

Expert Comment

by:fridom
ID: 16425893
You can give command-line arguments to a program in MSVC; see the project properties.
This stuff will then work on Windows as well, without any trouble.

Regards
Friedrich