strchr() questions...

I want to use the strchr function to scan a document of text. Problem is, I know how to use that function to scan an input or predefined string, but not a document.

Here's the program..
Right now, it basically looks like WordPad. I need strchr() to scan each character in order to find a couple of "special" characters (defined in an array, that part is taken care of.)
However, I'm not sure how to get strchr to actually SCAN from a document like that...

I'm using MSVC++ 4.0 (or could use 5.0 if there's something easier that I don't know about).

I am *NOT* using MFC.

Please reply with any suggestions or questions about my question, if need be. =)

Thanks.

AlexVirochovsky

What sort of documents do you need to scan? strchr only works
up to the first 0x00. To scan a whole document, you must call it
repeatedly and end the loop once you have covered the full length of the document.

ichor

ASKER

What is 0x00?

The documents I need scanned are going to be just like a simple word processing file (maybe a page or two long). Would strchr() work for this? Is there a better function built in, or would I be better off modifying a couple of functions?

Thanks for the help!

AlexVirochovsky

It depends on the document. For example, QTEXT adds a 0 (= 0x00) to
every line. If your document contains no 0 bytes, the code to find a
character can look like this:
char *szFile = LoadFile();
char *ptr = strchr(szFile, ch);   /* ch: the character to search for */
while (ptr)
{
...
ptr = strchr(ptr+1, ch);
}
But if 0 bytes exist, you must keep testing until the end of the file.
BTW: to find a substring, it is better to use strstr (case-sensitive).
For a case-insensitive search, convert the whole file to lowercase
(or uppercase) and then use strstr.
Alex
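A minimal sketch of the repeated-strchr loop described above, assuming the document is already in memory as a NUL-terminated buffer with no embedded 0x00 bytes. The function name find_all and the hits array are hypothetical, introduced only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Calls strchr repeatedly to visit every occurrence of ch in a
   NUL-terminated buffer. Stores up to max_hits offsets in hits[]
   and returns the total number of occurrences found. */
size_t find_all(const char *text, int ch, size_t *hits, size_t max_hits)
{
    size_t count = 0;
    const char *ptr;

    if (ch == '\0')          /* strchr would match the terminator itself */
        return 0;

    ptr = strchr(text, ch);
    while (ptr != NULL) {
        if (count < max_hits)
            hits[count] = (size_t)(ptr - text);  /* pointer -> offset */
        count++;
        ptr = strchr(ptr + 1, ch);  /* resume just past the last match */
    }
    return count;
}
```

Because each call resumes one character past the previous match, the buffer is walked only once per searched character.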

meessen

strchr is made to search for a char in a string.
Unless you load your file into a buffer and make it look like a string, you need a special function.

If your file is a text file, there should not be any 0x00 in it. 0x00 is the special non-printable NUL char used to mark the end of a string.

You know that with strchr you get a pointer to the found char in the string.
Since a file is not in memory, you can't get a pointer to the found char. At best you could return the offset of the char in the file.

Solution1: If you really must use strchr
--------------------------------------------------------
- get the file size
- allocate a block big enough to hold the full file content plus one byte
- read the file into the block
- append a 0x00 at the end
- use strchr to look for the given char in the block.
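The steps of Solution 1 can be sketched as follows. This is only an illustration under stated assumptions: the names load_file and first_offset are hypothetical, the file is assumed to fit in memory, and it is assumed to contain no embedded 0x00 bytes (strchr would stop at the first one).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads a whole file into a freshly allocated, NUL-terminated buffer.
   Returns NULL on failure; the caller frees the result. */
char *load_file(const char *path)
{
    FILE *f = fopen(path, "rb");
    char *buf = NULL;
    long size = -1;

    if (f == NULL)
        return NULL;

    /* get the file size by seeking to the end, then rewind */
    if (fseek(f, 0, SEEK_END) == 0 && (size = ftell(f)) >= 0 &&
        fseek(f, 0, SEEK_SET) == 0) {
        buf = malloc((size_t)size + 1);        /* +1 for the 0x00 */
        if (buf != NULL) {
            size_t got = fread(buf, 1, (size_t)size, f);
            buf[got] = '\0';                   /* make it a C string */
        }
    }
    fclose(f);
    return buf;
}

/* Returns the offset of the first ch in the file, or -1 if absent. */
long first_offset(const char *path, int ch)
{
    char *buf = load_file(path);
    const char *hit;
    long offset = -1;

    if (buf == NULL)
        return -1;
    hit = (ch != '\0') ? strchr(buf, ch) : NULL;
    if (hit != NULL)
        offset = (long)(hit - buf);
    free(buf);
    return offset;
}
```

Opening in binary mode keeps the reported size equal to the number of bytes actually read.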

Solution2: If you need the offset of the first occurrence of a char in a given file
-----------------------------------------------------------------------------------------------------------
- open file for read
- use getc and compare result with char.

source example:

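#include <stdio.h>

/* Returns the offset of the first occurrence of character c in the file
   named filename, or -1 if c is not found or the file cannot be opened. */
long filechr( const char* filename, int c ){
    FILE* f;
    int ch;
    long p = -1;

    /* binary mode, so that offsets are exact byte positions */
    f = fopen( filename, "rb" );
    if( f == NULL ){
        puts( "Failed opening file" );
        return -1;
    }

    /* test the result of fgetc against EOF instead of looping on feof() */
    while( ( ch = fgetc( f ) ) != EOF )
        if( ch == c ){
            p = ftell( f ) - 1;   /* ftell is already one past the match */
            break;
        }

    fclose( f );
    return p;
}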

nietod

Since (A) the document might not have NUL terminators and (B) it might have embedded NULs, the best idea would be to load it into a memory buffer (an array) and then use memchr() instead of strchr(). memchr does not treat NULs (0s or 0x00s) specially.
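A short sketch of the memchr() approach, assuming the document has already been loaded into a buffer whose length is known. The function name count_byte is hypothetical. Because the length is passed explicitly, embedded NULs do not end the search.

```c
#include <stddef.h>
#include <string.h>

/* Counts how many times the byte ch occurs in buf[0..len-1].
   Unlike strchr, memchr is bounded by len, not by a NUL terminator. */
size_t count_byte(const void *buf, size_t len, int ch)
{
    const unsigned char *p = buf;
    const unsigned char *end = p + len;
    size_t count = 0;

    while (p < end) {
        p = memchr(p, ch, (size_t)(end - p));
        if (p == NULL)
            break;           /* no further occurrence */
        count++;
        p++;                 /* continue just past this match */
    }
    return count;
}
```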

ichor

ASKER

In Solution 1 that you provide, it would probably work. For the time being. In the near future, the "text" file is going to support OLE, and that would, if I understand correctly, make the buffer for the entire file very large.

"open file for read
use getc and compare result with char."

That seems like a better thing to do. However, I run into the same problem. I'm not going to need to "open" any sort of document, as it will be in a RichEdit control already.

Think of it like WordPad. If you had just typed a letter to somebody in WordPad, the file wasn't saved, and the file was just sitting there, how would you use getc() to scan your document for particular characters?

Rejecting the answer seems so harsh, but accepting it doesn't help me...and I'll have the same problem (though somewhat fixed) using getc(). If you show me how to use that function for an untitled RichEdit control box, post it as an answer and I'll give ya the points.

Thanks!

nietod

This depends on what your goal is. For example, if you want, you could just make the richedit control do the search by sending it a search text message (EM_FINDTEXT). Or you could obtain the RTF data and then do the search yourself with strchr() or memchr().

You're not giving us enough information.

ichor

ASKER

Yikes...sorry about that.

Here's what I need to do:

After the completed document is typed (not saved, just there, and open), I need to convert a couple of characters. So, to start this out, I need to scan the document for those characters, and do the conversion. (I think I can do the actual conversion pretty easily.) But actually getting a variable with each character to be sent TO my conversion function is my problem.

It's like this. Suppose you had the following lines:

The quick brown fox jumped over the lazy dog. Now is the time for all good men and women to come to the aid of their country.

Let's say I wanted to convert all the "o's" to "x's". That's what I'm attempting to do, but with an entire document (no larger than 2 pages). I just need some way to compare each character in the document to a set of "special" characters that I'll convert later.

Hope that helps a little. =)
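One way to compare each character against a set of "special" characters is to call strchr() on the set rather than on the document. The sketch below assumes the document text is already available as a writable NUL-terminated buffer; how it gets there (for example, from the edit control) is outside this example. The name convert_chars and the parallel specials/replacements strings are hypothetical, and the two strings are assumed to have the same length.

```c
#include <stddef.h>
#include <string.h>

/* Walks text once. Every character that appears in specials is
   replaced by the character at the same index in replacements.
   Returns the number of characters changed. */
size_t convert_chars(char *text, const char *specials,
                     const char *replacements)
{
    size_t changed = 0;
    char *p;

    for (p = text; *p != '\0'; p++) {
        /* is the current character one of the special ones? */
        const char *hit = strchr(specials, *p);
        if (hit != NULL) {
            *p = replacements[hit - specials];
            changed++;
        }
    }
    return changed;
}
```

With specials "o" and replacements "x", this turns every "o" into "x" in a single pass, and adding more special characters does not require rescanning the document.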

nietod

Since you don't want to concern yourself with formatting, I think the easiest way would be to use the EM_FINDTEXT message to search for occurrences of the character you want to change. Each time one is found, use EM_EXSETSEL to select the character and then use EM_REPLACESEL to replace it with the new one. You will have to do this repeatedly. That is, for each type of character you want to change, you will need a loop that sends EM_FINDTEXT until there are no more occurrences of that character; then that loop ends and you begin a new loop with the next character you want to replace.

I think that will be best for you.

ichor

ASKER

I left out a couple of things...

First, when I'm scanning the document, I need it to scan character by character, like I said...But it's not a Find/Replace thing. When my switch statement finds the "special" characters, it's going to be relayed to another DialogBox with a simple EditControl (plain text.)

And in the future, it will have to work with formatting (bolds, underlines, and colors...)

I already have a find/replace option in the program, but that's not really what I'm looking for. I need it to scan "real time" without having to rescan the code several times for each "special" character that may need to be converted.

Personally, I think the getc() function might be the best way to do it (that I've heard so far)...if only I knew how to have that function read from the main RichEdit window.

nietod

getc() has nothing to do with richedit controls.

In what sense does it have to work with formatting? Do you need to find bold? What does that even mean?

You can use EM_GETSELTEXT to get back characters (minus formatting) from the edit box; that is the easiest way. If you need to get back formatting information as well as characters, then you need to use EM_STREAMOUT (this can also be used to get the text without formatting and would be better than EM_GETSELTEXT, but is a little more complex).

ichor

ASKER

I know getc() has nothing to do with RichEdit. The only reason I stated I was using RichEdit is just in case there's a control message that I'm unaware of that can do this.

Again, I'll try to explain what I'm attempting to do...

If you had just typed up a document in WordPad (not saved, no IO functions needed..)

Think if there was a "convert" button where WordPad would convert certain characters to other characters. Text to HTML is a good example. If you wanted to convert ' ' (space) to '&nbsp;' (html space). That's what I'm trying to do...not like a full HTML converter, because I don't need one...I have 12 characters (and formatting) I need to search in the document for, convert any and all of those 12 characters, and pop open another dialogbox with the entire document in that box, with all the "special" characters formatted.

I don't think EM_GETSELTEXT would work very well because in the MSVC++ documentation, it says that GETSELTEXT is used to "...retrieve the text from the current selection in [a richedit] object." While that does solve one problem (how I'm going to have the text read from the document into a "scan/search" function), it seems like it would be much simpler if I could just use something like...

getc ( RichEdit ) or
getc ( MyMainWindow )

or something along those lines...that's what I was really wondering when I first posted this question.

Thanks for all your help...
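The one-pass, switch-based conversion described in this post might be sketched as below. It assumes the document's plain text has already been copied out of the control into a NUL-terminated string. The names special_replacement and convert_document are hypothetical, and only the space mapping comes from the post; the other three cases are illustrative stand-ins for the remaining "special" characters.

```c
#include <stddef.h>
#include <string.h>

/* Returns the replacement text for c, or NULL if c is not special. */
static const char *special_replacement(char c)
{
    switch (c) {
    case ' ': return "&nbsp;";   /* from the post */
    case '<': return "&lt;";     /* illustrative assumption */
    case '>': return "&gt;";     /* illustrative assumption */
    case '&': return "&amp;";    /* illustrative assumption */
    default:  return NULL;
    }
}

/* Copies src to dst, expanding special characters as it goes.
   Returns the length written (excluding the terminator), or
   (size_t)-1 if dst is too small. */
size_t convert_document(const char *src, char *dst, size_t dst_size)
{
    size_t out = 0;

    if (dst_size == 0)
        return (size_t)-1;

    for (; *src != '\0'; src++) {
        const char *rep = special_replacement(*src);
        size_t n = rep ? strlen(rep) : 1;

        if (out + n >= dst_size)     /* keep room for the terminator */
            return (size_t)-1;
        if (rep)
            memcpy(dst + out, rep, n);
        else
            dst[out] = *src;
        out += n;
    }
    dst[out] = '\0';
    return out;
}
```

Each character is examined exactly once, so the document is not rescanned per special character; the converted string in dst is what would be handed to the second dialog box.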

ASKER CERTIFIED SOLUTION

meessen

This solution is only available to members.

ichor

ASKER

Okay. Since I posted the question, I've encountered several ways to do what I'm trying to do...although none of them do exactly what I want them to do. I consulted the local Senior Programmer, and he proceeded to inform me that I'd have to make an entirely new function to do what I'm actually trying to do.

So, for everyone, thanks for trying to help...I'm going to rethink my strategy and may have more questions later. =)