Solved

TXT file searching with the C programming language

Posted on 2008-11-12
Last Modified: 2013-12-14
I have a programming issue. I am writing a program in C that is supposed to search a file of characters for a certain sequence of characters. The file that I am searching has carriage returns in it, which I am supposed to ignore, meaning the sequence can span lines. Also, the first line of text in the file is supposed to be ignored, so I search everything after the first line.

The sequence I am looking for is brought in from another file.  Attached below is what I have so far which is not working. Any help?

void searchgene (char* filename,char* genefilename)
 
{
 char ch, gch;
 int startsearching = 0;
 int matchstart = 0, index = 0, newline_num=0, gene_index=1;
 FILE *file = fopen(filename, "r");
 FILE *gfile = fopen(genefilename, "r");
  
 if (file && gfile)
 {
    gch = fgetc(gfile);
    while ( (ch = fgetc(file)) != EOF ) 
    {
 
       if(ch == '\n')
       {
             startsearching = 1;
             if (matchstart > 0){
                newline_num ++;
             }
       }
       else if (startsearching == 1)
       { 
            
        index++;  
        if(gch == ch)
        {
          if (matchstart == 0)
              matchstart = index;           
          
          gch = fgetc(gfile);
          gene_index ++;
          if (gch == EOF )
          {
            printf("Match found at: %d\n",matchstart);
            matchstart = 0;
            rewind (gfile);
            gch = fgetc(gfile);
            gene_index = 1;
            newline_num = 0;
                   
          } 
        }
        else
        {
           if (matchstart > 0)
           {
              index = 1+index - gene_index - newline_num;
              matchstart = 0;
              fseek(file, - gene_index - newline_num + 1, SEEK_CUR);
              rewind (gfile);
              gch = fgetc(gfile);
              gene_index = 1;
              newline_num = 0;
           }
           
            
        }     
       }    
    }  
fclose(file);  
fclose(gfile);
 
 }
 
}

Question by:bbcac
 
LVL 11

Expert Comment

by:spoxox
ID: 22947024
I think the approach I would take is this:

1) get searchString

2) read and discard first record in targetFile

3) read tfc = targetFile character by character, comparing tfc with searchString first character.

4) When a match is found in step 3, run a matching subroutine: compare the following characters until all of searchString is matched (success) or the match fails. Note that the next character from targetFile excludes \n. Also remember any second occurrence of searchString's initial character within the targetFile characters already matched, so the search can resume there in a situation like this:

searchString = "hello"
targetFile = "hehello"
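The steps above can be sketched roughly as follows. This is only one way to do it, not necessarily what spoxox had in mind: instead of remembering a second occurrence of the first character, it keeps a sliding window of the last few non-newline characters, which handles the "hehello" case without ever seeking backwards. The function name find_first_match and the 256-character limit are assumptions made for this illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the thread): returns the 1-based position of
   the first match of `pattern` in `in`, counting only non-newline characters
   after the first line, or -1 if there is no match. */
long find_first_match(FILE *in, const char *pattern)
{
    char window[256];            /* assumption: pattern is at most 256 chars */
    size_t m = strlen(pattern);
    size_t filled = 0;           /* how many characters the window holds */
    long index = 0;              /* count of non-newline characters read */
    int c;                       /* int, not char, so EOF stays distinguishable */

    if (m == 0 || m > sizeof window)
        return -1;

    /* Step 2: read and discard the first line. */
    while ((c = fgetc(in)) != EOF && c != '\n')
        ;

    /* Steps 3 and 4: read character by character, ignoring line breaks. */
    while ((c = fgetc(in)) != EOF) {
        if (c == '\n' || c == '\r')
            continue;
        index++;
        if (filled < m) {
            window[filled++] = (char)c;
        } else {
            /* Drop the oldest character and append the new one. */
            memmove(window, window + 1, m - 1);
            window[m - 1] = (char)c;
        }
        if (filled == m && memcmp(window, pattern, m) == 0)
            return index - (long)m + 1;
    }
    return -1;
}
```

Called on a file whose lines after the header are "hehe" and "llo", with "hello" as the pattern, this returns 3. Shifting the window on every character costs some time for long patterns, but it keeps the logic simple and needs only plain fgetc reads.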


Is that helpful?
 

Author Comment

by:bbcac
ID: 22947116
I'm fairly new to C and don't really understand what you are saying. My main issue is traversing the file. It seems that fseek and ftell are not acting as expected (probably my fault). Can you expand a bit?
 
LVL 53

Expert Comment

by:Infinity08
ID: 22948078
>> It seems that fseek and ftell are not acting as expected (probably my fault).

Depending on how you use them, they are not reliable for text files. You don't need them anyway. You can read a line of text using fgets:

        http://www.cplusplus.com/reference/clibrary/cstdio/fgets.html
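As a minimal sketch of that idea, fgets can be used to throw away the first line before the search starts. The helper name skip_line and the 128-byte buffer are assumptions for illustration; the loop is there because a single fgets call returns only part of a line that is longer than the buffer.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the thread): discard the rest of the current
   line. Returns 1 if anything was consumed, 0 if the stream was already at
   end of file. */
int skip_line(FILE *in)
{
    char buf[128];               /* assumption: any small buffer works */
    int got_any = 0;

    while (fgets(buf, sizeof buf, in) != NULL) {
        got_any = 1;
        /* fgets keeps the '\n' when it reaches one, so finding it in the
           buffer means the whole line has now been read. */
        if (strchr(buf, '\n') != NULL)
            break;
    }
    return got_any;
}
```

After one call to skip_line(file), the next fgetc(file) returns the first character of the second line, so no fseek is needed to skip the header.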
 
LVL 11

Accepted Solution

by:
spoxox earned 1000 total points
ID: 22950530
Unless this is an assignment that requires using FSEEK and FTELL, I would avoid them.

Get SearchString:
   see above reference: mystring is the searchString; mystring[0] is the first character of the string you'll search for.

Go through targetText (gene?):
Choice 1:
- loop through character by character using
gch = fgetc
as you are doing.
- when gch == mystring[0], you have a potential match - check it.

Choice 2:
- read the entire text file into a character array
- delete the carriage return characters from the array by shifting the characters that follow down one position (fileText[n] = fileText[n+1])
- use a built-in function (strstr) to search the array
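A rough sketch of Choice 2 is shown below. It differs slightly from the description above: rather than deleting line breaks from the array afterwards, it simply never stores them while reading, which gives the same result. The function names read_without_newlines and print_matches are made up for this example, and it assumes the file is ordinary text with no embedded NUL bytes.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the thread): read everything after the first
   line of `in` into a malloc'd string, leaving out '\n' and '\r'.
   The caller frees the result. Returns NULL if memory runs out. */
char *read_without_newlines(FILE *in)
{
    size_t cap = 1024, len = 0;
    char *text = malloc(cap);
    int c;

    if (text == NULL)
        return NULL;

    /* Ignore the first line. */
    while ((c = fgetc(in)) != EOF && c != '\n')
        ;

    while ((c = fgetc(in)) != EOF) {
        if (c == '\n' || c == '\r')
            continue;
        if (len + 1 >= cap) {    /* keep room for the terminating '\0' */
            char *bigger = realloc(text, cap * 2);
            if (bigger == NULL) {
                free(text);
                return NULL;
            }
            text = bigger;
            cap *= 2;
        }
        text[len++] = (char)c;
    }
    text[len] = '\0';
    return text;
}

/* Print the 1-based position of every match, overlapping ones included,
   and return how many were found. */
int print_matches(const char *text, const char *pattern)
{
    const char *p = text;
    int count = 0;

    if (*pattern == '\0')
        return 0;
    while ((p = strstr(p, pattern)) != NULL) {
        printf("Match found at: %ld\n", (long)(p - text) + 1);
        count++;
        p++;                     /* resume one character past this match */
    }
    return count;
}
```

With this approach a match that spans a line break needs no special handling, because the line breaks are gone before strstr runs. The trade-off is that the whole file has to fit in memory.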

 
 

Author Comment

by:bbcac
ID: 22954169
Your solutions make sense, but I would like to figure out where my program goes wrong. It seems that things get mixed up after the fseek.

I put these lines in the above code at line 25:
       printf("\ngch: %c    ftell=%ld ", ch, ftell(file));
       getch();

This way I can see which character the loop is reading and the corresponding ftell result. When I run the program, the output looks right until the first time fseek is called. Notice below that ftell reports 115 for two different characters; an fseek happens between them.

gch: C    ftell=112
gch: C    ftell=113
gch: G    ftell=114
gch: C    ftell=115
gch: T    ftell=115
gch: A    ftell=116
gch: T    ftell=117
 
LVL 53

Assisted Solution

by:Infinity08
Infinity08 earned 1000 total points
ID: 22954365
For text files, the only portable uses of fseek are an offset of 0, or an offset that was returned earlier by ftell (for the same opened file) together with SEEK_SET. You cannot depend on anything else.

For example :

>>               fseek(file, - gene_index - newline_num + 1, SEEK_CUR);

you cannot do this and hope to get a reliable result.
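For comparison, a minimal sketch of the reliable pattern: remember a position with ftell and later return to exactly that position with SEEK_SET. The helper name peek_chars is hypothetical and only illustrates the save-and-restore idea; it is not a fix for the code in the question.

```c
#include <stdio.h>

/* Hypothetical helper (not from the thread): copy up to `n` upcoming
   characters into `out` (which must hold n + 1 bytes) and then return to
   where the stream was. Returns the number of characters copied, or -1 if
   the position could not be saved or restored. */
int peek_chars(FILE *in, char *out, int n)
{
    long mark = ftell(in);       /* save the current position */
    int i, c;

    if (mark == -1L)
        return -1;

    for (i = 0; i < n && (c = fgetc(in)) != EOF; i++)
        out[i] = (char)c;
    out[i] = '\0';

    /* Seek to a value that ftell returned for this same stream. A successful
       fseek also clears the end-of-file indicator. */
    if (fseek(in, mark, SEEK_SET) != 0)
        return -1;
    return i;
}
```

The difference from the quoted line is that no offset is computed by hand: on a text stream, a character count does not have to equal a byte offset, for example when line endings are translated.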


But as I remarked before : you don't need fseek at all. So, why bother ?
 
LVL 11

Expert Comment

by:spoxox
ID: 22955572
Should have made clear earlier:

From a processing standpoint, the greatest efficiencies are achieved by minimizing file operations. Straightforward reads from the file (fgetc, fgets, fscanf, ...) cost less CPU than other file operations.

For this problem, nothing more complicated than straightforward reads is necessary. Using the other routines adds complications and slows operation. Execution speed might not matter much for this assignment, but it's a good idea to get in the habit of making decisions based on such practical considerations.
 
LVL 11

Expert Comment

by:spoxox
ID: 22960579
Feels like you're not happy yet! Is the use of fseek/ftell required?


 
LVL 53

Expert Comment

by:Infinity08
ID: 25279706
Based on the author's post http:#22954169, I'd say this question has been answered :

The author claims that spoxox's post http:#22950530 makes sense, indicating that it has helped in understanding an alternative approach.
The author also says he'd like to know why his code didn't work, which is what I explained in http:#22954365.

I'd say that those two posts together answer the question.