Searching a text file with the C programming language

I have a programming issue. I am writing a program in C that is supposed to search a file of characters for a certain sequence of characters. The file I am searching contains carriage returns, which I am supposed to ignore, meaning the sequence can span lines. Also, the first line of text in the file is supposed to be ignored, so I search everything below the first line.

The sequence I am looking for is read from another file. Below is what I have so far, which is not working. Any help?

void searchgene (char* filename,char* genefilename)
 
{
 char ch, gch;
 int startsearching = 0;
 int matchstart = 0, index = 0, newline_num=0, gene_index=1;
 FILE *file = fopen(filename, "r");
 FILE *gfile = fopen(genefilename, "r");
  
 if (file && gfile)
 {
    gch = fgetc(gfile);
    while ( (ch = fgetc(file)) != EOF ) 
    {
 
       if(ch == '\n')
       {
             startsearching = 1;
             if (matchstart > 0){
                newline_num ++;
             }
       }
       else if (startsearching == 1)
       { 
            
        index++;  
        if(gch == ch)
        {
          if (matchstart == 0)
              matchstart = index;           
          
          gch = fgetc(gfile);
          gene_index ++;
          if (gch == EOF )
          {
            printf("Match found at: %d\n",matchstart);
            matchstart = 0;
            rewind (gfile);
            gch = fgetc(gfile);
            gene_index = 1;
            newline_num = 0;
                   
          } 
        }
        else
        {
           if (matchstart > 0)
           {
              index = 1+index - gene_index - newline_num;
              matchstart = 0;
              fseek(file, - gene_index - newline_num + 1, SEEK_CUR);
              rewind (gfile);
              gch = fgetc(gfile);
              gene_index = 1;
              newline_num = 0;
           }
           
            
        }     
       }    
    }  
fclose(file);  
fclose(gfile);
 
 }
 
}

Asked by: bbcac

spoxox Commented:
I think the approach I would take is this:

1) get searchString

2) read and discard first record in targetFile

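3) read targetFile character by character (call each character tfc), comparing tfc with the first character of searchString.

4) When a match is found in step 3, execute a matching subroutine: compare the following characters until all of searchString is matched (success) or the match fails. Note: the next character from targetFile excludes \n. Note also: while matching, remember the targetFile characters already matched, and any second occurrence of searchString's first character among them, so that a failed match can resume from there. That matters in a situation like this:

searchString = "hello"
targetFile = "hehello"

Here the first attempt fails at the third character, and the real match starts at the second 'h'.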

Is that helpful?
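
The steps above can be sketched roughly as follows. This is only one possible reading of them, not the asker's code: the function name search_stream is hypothetical, the pattern is assumed to be already in memory as a string, and a small circular buffer of the last few characters is swapped in for the "remember matched characters" note so that no fseek is needed. Unlike the original program, this sketch also reports overlapping matches.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: searches fp for pattern, skipping the first line
   and ignoring '\n' and '\r'. Prints the 1-based position of each match
   (counted in non-newline characters after the first line) and returns
   the number of matches, or -1 on an empty pattern or allocation failure. */
long search_stream(FILE *fp, const char *pattern)
{
    size_t m = strlen(pattern);
    if (m == 0)
        return -1;
    char *window = malloc(m);      /* the last m significant characters */
    if (window == NULL)
        return -1;

    int c;                         /* int, not char, so EOF is distinct */

    /* Step 2: read and discard the first line. */
    while ((c = fgetc(fp)) != EOF && c != '\n')
        ;

    size_t seen = 0;               /* significant characters read so far */
    long matches = 0;

    /* Steps 3 and 4: read character by character, skipping line breaks. */
    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n' || c == '\r')
            continue;
        window[seen % m] = (char)c;
        seen++;
        if (seen < m)
            continue;              /* not enough characters yet */

        /* The oldest character in the window sits at index seen % m. */
        size_t start = seen % m;
        size_t i = 0;
        while (i < m && window[(start + i) % m] == pattern[i])
            i++;
        if (i == m) {
            matches++;
            printf("Match found at: %zu\n", seen - m + 1);
        }
    }
    free(window);
    return matches;
}
```

With a target file whose text after the first line is "hehel" and "lo" on separate lines, searching for "hello" prints "Match found at: 3" and returns 1.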

bbcac (Author) Commented:
I'm fairly new to C and don't really understand what you are saying. My main issue is traversing the file. It seems that fseek and ftell are not acting as expected (probably my fault). Can you expand a bit?

Infinity08 Commented:
>> It seems that fseek and ftell are not acting as expected (probably my fault).

Depending on how you use them, they may not behave reliably on text files. You don't need them anyway. You can read a line of text using fgets:

        http://www.cplusplus.com/reference/clibrary/cstdio/fgets.html
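
As an illustration of the fgets suggestion, here is a minimal sketch of discarding the first line of the target file. The function name skip_first_line and the 256-byte buffer size are assumptions made for the example. Because fgets stops when the buffer is full, the loop keeps reading until a chunk actually contains the newline.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: consumes the first line of fp using fgets.
   Returns 1 if something was consumed, 0 if the file was empty. */
int skip_first_line(FILE *fp)
{
    char buf[256];                 /* arbitrary chunk size */
    int got_any = 0;

    while (fgets(buf, sizeof buf, fp) != NULL) {
        got_any = 1;
        /* fgets keeps the '\n' in the buffer when it reaches one. */
        if (strchr(buf, '\n') != NULL)
            break;                 /* the whole line has been read */
    }
    return got_any;
}
```

After the call, the next fgetc on the same stream returns the first character of the second line.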

spoxox Commented:
Unless this is an assignment that requires using fseek and ftell, I would avoid them.

Get searchString:
   see the above reference: there, mystring holds the line that was read. Here it would be the searchString, and mystring[0] is the first character of the string you'll search for.

Go through targetText (gene?):
Choice 1:
- loop through character by character using
gch = fgetc
as you are doing.
- when gch == mystring[0], you have a potential match - check it.

Choice 2:
- read the entire text file into a character array
- delete the line-break characters from the array by shifting the following characters down (fileText[n] = fileText[n+1])
- use a built-in function (strstr) to search the array
 
bbcac (Author) Commented:
Your solutions make sense, but I would like to figure out where my program goes wrong. It seems that things get mixed up after the fseek.

I added these lines to the above code at line 25:
       printf("\ngch: %c    ftell=%d ", ch, ftell(file));
       getch();

This way I can see which character the loop is reading and the corresponding ftell result. When I run the program, the output looks right until the first time it uses fseek. Notice below that ftell reports 115 for two different characters; an fseek happens between those two lines.

gch: C    ftell=112
gch: C    ftell=113
gch: G    ftell=114
gch: C    ftell=115
gch: T    ftell=115
gch: A    ftell=116
gch: T    ftell=117

Infinity08 Commented:
For text files, the only portable uses of fseek are an offset of zero, or an offset returned earlier by ftell for the same open file and passed with SEEK_SET. You cannot depend on anything else.

For example:

>>               fseek(file, - gene_index - newline_num + 1, SEEK_CUR);

You cannot do this and hope to get a reliable result.


But as I remarked before: you don't need fseek at all. So why bother?
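
For completeness, here is a small sketch of the one repositioning pattern that is reliable on a text stream: save a position with ftell, then return to it with fseek and SEEK_SET. The function name peek_char is hypothetical and only illustrates the pairing; it is not a fix for the code in the question.

```c
#include <stdio.h>

/* Hypothetical helper: returns the next character of fp without
   consuming it, by saving the position with ftell and restoring it
   with fseek(..., SEEK_SET). Returns EOF at end of file or on error. */
int peek_char(FILE *fp)
{
    long pos = ftell(fp);          /* opaque bookmark for a text stream */
    if (pos == -1L)
        return EOF;

    int c = fgetc(fp);

    /* Only a value obtained from ftell is passed back, never an
       offset computed by hand. */
    if (fseek(fp, pos, SEEK_SET) != 0)
        return EOF;
    return c;
}
```

The same pairing could bookmark the start of a candidate match, although reading forward only, as suggested above, avoids the need for it.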

spoxox Commented:
Should have made clear earlier:

From a processing standpoint, the greatest efficiency comes from minimizing file operations. Straightforward sequential reads from the file (fgetc, fgets, fscanf, ...) typically cost less than other file operations such as seeking.

For this problem, nothing more complicated than straightforward reads is necessary. Using the other routines adds complications and slows operation. Execution speed might not matter much for this assignment, but it's a good idea to get in the habit of making decisions based on such practical considerations.

spoxox Commented:
Feels like you're not happy yet! Is the use of fseek/ftell required?


Infinity08 Commented:
Based on the author's post http:#22954169, I'd say this question has been answered:

The author claims that spoxox's post http:#22950530 makes sense, indicating that it has helped in understanding an alternative approach.
The author also says he'd like to know why his code didn't work, which is what I explained in http:#22954365.

I'd say that those two posts together answer the question.