Omer_85

String search and Replace from a text file

Hi Experts,

Basically, C just drives me crazy.

I need a complete C program that can open a text file, search for certain words, and replace them, for example:

'Me' to 'You'
'There' to 'Their'

These words will be hard-coded. The changed file will be displayed; it does not really need to be saved to a new file.

I know in C++ it's basically:
String.Replace("oldval","newval");

But like I said C gets me.

Can any expert tackle this problem?
PaulCaswell

Hi Omer_85,

>>Can any expert tackle this problem?
We can help you do it, but we can't do it for you.

I would suggest you search for the 'Boyer-Moore' algorithm to start with, and post the code of an attempt to do what you have been asked.

There are other, simpler methods, but they are terribly inefficient compared to Boyer-Moore.

Paul
Hi Omer_85,

We have to be very careful here at EE about homework questions. With these we have to make sure you do the work and we help you. This doesn't sound like homework, but it's simple enough to be homework. Could you help me by explaining why you need this, and why you don't use some commercially available package or download some free source code to do this?

If it is homework, we will be glad to help but you need to do the work. The quickest way to start this process is to put together some code and post it here. We can then hopefully help you get it right.

Paul
Boyer-Moore is fabulous for searching (especially with long search strings), but it is tedious to use for replacing.  Boyer-Moore benefits from having large portions of the file in memory (preferably all of it).  Replacing substrings with others of differing length becomes slow and tedious in large memory buffers.  For an explanation, see the discussion of "the gap" in the Emacs technical documentation.   Boyer-Moore also does not easily lend itself to case-insensitive, whole-word search.
Here is an idea:

Use fgets() to read lines from the file.
Duplicate the line (because strtok is destructive).
Set previousMatch pointer to point to beginning of duplicate line
Use strtok() to iterate over the line extracting individual words.
Use strcasecmp() to compare the extracted words with the list of "find" words.
If a match is found {
  Output the fragment of the saved line from the previousMatch to the match
  Output the "replace" word
  Set the previousMatch to the match + strlen(match) in the duplicate buffer
}
Don't forget to output the tail of the line once you hit the end of the line buffer
Note that if no "find" words appear in the line, the tail of the line is actually the whole line

This approach requires an understanding of how strtok() parses the line, and of how to use address arithmetic to calculate offsets and lengths in the saved line buffer for output.

Omer_85

ASKER

Hi Paul,

No, this is not a homework question. And yes, if it were, I believe you should write your own code to learn.

This is why I am not using commercial software: I do wish to learn C in a little more depth. However, at the moment I cannot spend weeks on code for what you call "simple". And going by what brettmjohnson wrote, it's not too easy to manipulate the Boyer-Moore code.

Basically, I am doing some 'work experience' in I.T., mainly learning the ropes, more so in networking and PC service. I have been given this programming issue. I don't know why; I haven't got any real programming experience. However, they don't seem to care, and want it real soon. I am not very knowledgeable in programming, let alone C. As for the free source code on the Internet, if you could tell me where to get it, that would be great...

I am new to this site, looks like a great resource center, and just thought the many experts would be able to help out a newbie..
F. Dominicus

Ah yes, handling of char* arrays is really not a strength of C. However, libraries (third-party or self-written) come to the rescue. Here we go with glib:
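#include <stdio.h>   /* FILE, fopen, fgets, perror */
#include <stdlib.h>
#include <glib.h>
#include <glib/gprintf.h>

int main (void){
  const gchar *file_name = "t1.txt";
  enum{BUF_SIZE=512, MAX_TOKENS=200};
  const gchar *search_for = "my";
  const gchar *replacement =  "Mei";
  gchar *current;
  char buf[BUF_SIZE];
  gchar **splitted;

  FILE *pin = fopen(file_name, "r");
  if (NULL == pin){
    perror(file_name);
    exit(EXIT_FAILURE);
  }
  /* Read the file one line at a time. */
  while (NULL != fgets(buf, sizeof(buf), pin)){
    /* Split on single spaces. The trailing newline stays attached to the
       last token, so a word at the very end of a line is not matched and
       every following output line starts with a space. */
    splitted = g_strsplit(buf, " ", MAX_TOKENS);
    for (int i = 0; splitted[i] != NULL; ++i){
      current = splitted[i];
      /* Case-insensitive comparison of the whole token. */
      if (0 == g_ascii_strcasecmp(current, search_for)){
        g_printf("%s ", replacement);
      } else {
        g_printf("%s ", current);
      }
    }
    g_strfreev(splitted);   /* frees the token array and every token */
  }
  fclose(pin);
  return 0;
}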
     
With this marvellous file:
bla      this or that my or may
End or start free or not
Out or in
away for now
sense this doesn't make
what's fud?
Should I go?

We get:
bla      this or that Mei or may
 End or start free or not
 Out or in
 away for now
 sense this doesn't make
 what's fud?
 Should I go?

C's now fine for string handling also ;-)

Of course I could have written it a bit differently, but this code does what you want ;)

Regards
Friedrich

Omer_85

ASKER

Hi Friedrich,

Thanks for your help, however what is the #include <glib.h> library?

I am trying to compile this in Microsoft Visual C, however I get the error (cannot open include file)

Is this a library I must download or make myself?

Also, can this code be changed to handle multiple changes to the file (i.e. my to Mei, and Should to Can)?

Sorry for the trouble.

**Point value increased**
Of course you have to install it. You can get it from:
ftp://ftp.gtk.org/pub/gtk/v2.8/win32

Of course this can be changed to handle multiple changes. In the simplest case you put the alternatives in an array and walk the array to find the replacements. If there are going to be more of them, you probably want to use a hash table.

This is also part of glib.

If you do not want any dependency, you can help yourself with a few extra functions. But sorry, I'm not in the mood to write them for free.

Regards
Friedrich
Omer_85

ASKER

There are many glib versions; which one is correct?

I will try this solution; however, I think I am meant to write a stand-alone program using basic C functions and libraries, without having to install any extras, so that it can work on many different machines with no need to install glib again. I doubt they will give me permission to install glib on the UNIX server.

I have been looking into the various open, read, lseek and write calls. They are standard system calls if I am correct, and may be better to use them.

I will try your solution, Friedrich; the only problem seems to be that it requires extra installs.
ASKER CERTIFIED SOLUTION
F. Dominicus
Omer_85

ASKER

Hi again Friedrich,

Sorry, blame this on my stupidity, but...

I changed some of the code to:

  fin = fopen(file_to_open, "test.txt");  /*my input file: hello my name is omer*/

  fout = fopen(file_to_write_to, "test1.txt");  /*my output file: empty at the moment*/

So I would like the program to change my input so that the output file reads "hi my name is omer".


when I run this code it outputs:

<word-to-replace> <replacement> <open-file> <write-to-file>

What am I doing wrong...???

Thanks for your time..
Oh, come on, you can't be serious. Open test.txt with whatever you use for editing files and write "hi my name is omer" into it. The program expects exactly four command-line arguments, so either you pass them or you have to change the test on the number of arguments.

Keep the stuff as is, and call it this way:
./prog_name "my" "Mei" test.txt test1.txt

Friedrich
Omer_85

ASKER

Hi Friedrich,

Thanks for that. Stupid me was using MS Visual C++ and running it from there.

I tried it on a Unix platform, and it works...

Thanks for all your help, and putting up with a newbie..

I will be back to this site.. its a great help..

Regards,

Omer.
You can pass command-line arguments in MSVC as well; see the project properties.
This will also work on Windows without any trouble.

Regards
Friedrich