strchr() questions...

Posted on 1998-12-15
Medium Priority
Last Modified: 2013-12-03
I want to use the strchr function to scan a document of text.  Problem is, I know how to use that function to scan an input or predefined string, but not a document.

Here's the program..
Right now, it basically looks like WordPad.  I need strchr() to scan each character in order to find a couple of "special" characters (defined in an array, that part is taken care of.)
However, I'm not sure how to get strchr to actually SCAN from a document like that...

I'm using MSVC++ 4.0 (or could use 5.0 if there's something easier that I don't know about).

I am *NOT* using MFC.

Please reply with any suggestions or questions about my question, if need be.  =)

Question by:ichor
LVL 14

Expert Comment

ID: 1417331
What sort of documents do you need to scan? strchr only works up to
the first 0x00. To scan a whole document, you must call it repeatedly
and end the loop once you have covered the full length of the document.

Author Comment

ID: 1417332
What is 0x00?

The documents I need scanned are going to be just like a simple word processing file (maybe a page or two long).  Would strchr() work for this?  Is there a better function built in, or would I be better off modifying a couple of functions?

Thanks for the help!
LVL 14

Expert Comment

ID: 1417333
It depends on the document. For example, QTEXT adds a 0 (= 0x00) to
every line. If your document contains no 0 bytes, the program to find
a symbol can look like this:
char *szFile = LoadFile();
char *ptr = strchr(szFile, '.');   /* '.' stands for the character to find */
while (ptr)
 ptr = strchr(ptr+1, '.');
But if 0 bytes exist, you must keep testing until the end of the file.
BTW: to find a substring it is better to use strstr (case-sensitive).
For a case-insensitive search, convert the whole file to lowercase
(or uppercase) first and then use strstr.

Expert Comment

ID: 1417334
strchr is made to search for a char in a string.
Unless you load your file into a buffer and make it look like a string, you need a special function.

If your file is a text file, there should not be any 0x00 in it. 0x00 is the special non-printable NUL char used to mark the end of a string.

You know that with strchr you get a pointer on the found char in the string.
Since a file is not in memory you can't get a pointer on the found char. At best you could return the offset of the char in the file.

Solution1: If you really must use strchr
- get the file size
- allocate a block big enough to hold the full file content
- append a 0x00 at the end
- use strchr to look for the given char in the block.

Solution2: If you need the offset of the first occurrence of a char in a given file
- open file for read
- use getc and compare result with char.

source example:

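/* returns offset to first occurrence of character c in the file named
   filename, or -1 if c is not found in it */
#include <stdio.h>
#include <stdlib.h>

long filechr( const char* filename, int c ){
   FILE* f;
   long p = -1;
   int ch;

   /* binary mode, so that ftell reports plain byte offsets */
   f = fopen( filename, "rb" );
   if( f == NULL ){
      puts( "Failed opening file" );
      exit( 1 );
   }

   /* test the fgetc result for EOF instead of looping on feof */
   while( ( ch = fgetc( f ) ) != EOF ){
      if( ch == c ){
         p = ftell( f ) - 1;   /* fgetc has already moved past the match */
         break;
      }
   }
   fclose( f );
   return p;
}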
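
Solution 1 from the same comment (load the file, append a 0x00, then use strchr) could be sketched roughly as follows. This is only an illustration: load_file and count_char are made-up helper names, and the sketch assumes the file contains no embedded 0x00 bytes, since strchr would stop at the first one.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads the whole file into a malloc'ed block and
   appends a terminating 0x00. Returns NULL on failure; the caller frees. */
char *load_file( const char *filename, long *size_out ){
   FILE *f;
   long size;
   char *buf;

   assert( filename != NULL );
   f = fopen( filename, "rb" );
   if( f == NULL )
      return NULL;

   /* get the file size */
   if( fseek( f, 0L, SEEK_END ) != 0 || ( size = ftell( f ) ) < 0 ){
      fclose( f );
      return NULL;
   }
   rewind( f );

   /* allocate a block big enough for the content plus the 0x00 */
   buf = malloc( (size_t)size + 1 );
   if( buf != NULL ){
      size = (long)fread( buf, 1, (size_t)size, f );
      buf[size] = '\0';
      if( size_out != NULL )
         *size_out = size;
   }
   fclose( f );
   return buf;
}

/* Hypothetical helper: counts occurrences of c by calling strchr
   repeatedly, restarting one char past each match. */
long count_char( const char *text, int c ){
   long n = 0;
   const char *p;

   assert( text != NULL && c != '\0' );
   for( p = strchr( text, c ); p != NULL; p = strchr( p + 1, c ) )
      n++;
   return n;
}
```

A caller would do something like buf = load_file( "letter.txt", NULL ), check for NULL, pass buf to count_char, and free( buf ) afterwards; the file name here is just a placeholder.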

LVL 22

Expert Comment

ID: 1417335
Since (A) the document might not have NUL terminators and (B) it might have embedded NULs, the best idea would be to load it into a memory buffer (an array) and then use memchr() instead of strchr().  memchr does not treat NULs (0s or 0x00s) specially.
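
As a small illustration of that difference, here is a sketch that counts a byte in a buffer of known length with memchr. The name count_byte is made up for this example; the only assumption is that the caller knows how many bytes the buffer holds.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts occurrences of c in the first len bytes
   of buf. memchr takes an explicit length, so 0x00 bytes inside the
   buffer do not stop the scan the way they would with strchr. */
size_t count_byte( const char *buf, size_t len, int c ){
   size_t n = 0;
   const char *p = buf;
   const char *end = buf + len;

   assert( buf != NULL );
   while( p < end && ( p = memchr( p, c, (size_t)( end - p ) ) ) != NULL ){
      n++;
      p++;   /* resume just past the match */
   }
   return n;
}
```

With a buffer such as "a.b\0c.d" (7 bytes), count_byte finds both dots, whereas a strchr loop would stop at the embedded 0x00 and see only the first.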

Author Comment

ID: 1417336
In Solution 1 that you provide, it would probably work.  For the time being.  In the near future, the "text" file is going to support OLE, and that would, if I understand correctly, make the buffer for the entire file very large.

"open file for read
use getc and compare result with char."

That seems like a better thing to do.  However, I run into the same problem.  I'm not going to need to "open" any sort of document, as it will be in a RichEdit control already.  

Think of it like WordPad.  If you had just typed a letter to somebody in WordPad, the file wasn't saved, and the file was just sitting there, how would you use getc() to scan your document for particular characters?

Rejecting the answer seems so harsh, but accepting it doesn't help me...and I'll have the same problem (though somewhat fixed) using getc().  If you show me how to use that function for an untitled RichEdit control box, post it as an answer and I'll give ya the points.


LVL 22

Expert Comment

ID: 1417337
This depends on what your goal is.  For example, if you want, you could just make the richedit control do the search by sending it a search text message (EM_FINDTEXT).   Or you could obtain the RTF data and then do the search yourself with strchr() or memchr().

You're not giving us enough information.

Author Comment

ID: 1417338
Yikes...sorry about that.

Here's what I need to do:  

After the completed document is typed (not saved, just there, and open), I need to convert a couple of characters.  So, to start this out, I need to scan the document for those characters, and do the conversion.  (I think I can do the actual conversion pretty easily.)  But actually getting a variable with each character to be sent TO my conversion function is my problem.

It's like this.  Suppose you had the following lines:

The quick brown fox jumped over the lazy dog.  Now is the time for all good men and women to come to the aid of their country.

Let's say I wanted to convert all the "o's" to "x's".  That's what I'm attempting to do, but with an entire document (no larger than 2 pages.)  I just need someway to compare each character in the document to a set of "special" characters that I'll convert later.
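
For what it's worth, once the text is sitting in an ordinary NUL-terminated buffer, one way to make that comparison is to turn strchr around: walk the document one character at a time and ask strchr whether the character is in the set of "special" characters. A rough sketch, where find_specials is a made-up name and getting the text out of the control is left aside:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: walks text once and records the offset of every
   character that appears in specials. At most max_offsets offsets are
   stored; the return value is the total number of special characters. */
size_t find_specials( const char *text, const char *specials,
                      size_t *offsets, size_t max_offsets ){
   size_t count = 0;
   size_t i;

   assert( text != NULL && specials != NULL );
   for( i = 0; text[i] != '\0'; i++ ){
      /* strchr here searches the small set, not the document */
      if( strchr( specials, text[i] ) != NULL ){
         if( offsets != NULL && count < max_offsets )
            offsets[count] = i;
         count++;
      }
   }
   return count;
}
```

This visits each character of the document exactly once, however many special characters are in the set.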

Hope that helps a little.  =)
LVL 22

Expert Comment

ID: 1417339
Since you don't want to concern yourself with formatting, I think the easiest way would be to use the EM_FINDTEXT message to search for occurrences of the character you want to change.  Each time one is found, use EM_EXSETSEL to select the character and then use EM_REPLACESEL to replace the character with a new one.  You will have to do this repeatedly.  That is, for each type of character you want to change, you will need a loop that repeatedly sends EM_FINDTEXT until there are no more occurrences of that character; then the loop ends and you begin a new loop with the next character you want to replace.

I think that will be best for you.

Author Comment

ID: 1417340
I left out a couple of things...

First, when I'm scanning the document, I need it to scan character by character, like I said...But it's not a Find/Replace thing.  When my switch statement finds the "special" characters, it's going to be relayed to another DialogBox with a simple EditControl (plain text.)  

And in the future, it will have to work with formatting (bolds, underlines, and colors...)  

I already have a find/replace option in the program, but that's not really what I'm looking for.  I need it to scan "real time" without having to rescan the code several times for each "special" character that may need to be converted.

Personally, I think the getc() function might be the best way to do it (that I've heard so far)...if only I knew how to have that function read from the main RichEdit window.
LVL 22

Expert Comment

ID: 1417341
getc() has nothing to do with richedit controls.

In what sense does it have to work with formatting?  Do you need to find bold?  What does that even mean?

You can use EM_GETSELTEXT to get back characters (minus formatting) from the edit box; that is the easiest way.  If you need to get back formatting information as well as characters, then you need to use EM_STREAMOUT (this can also be used to get the text without formatting and would be better than EM_GETSELTEXT, but it is a little more complex).

Author Comment

ID: 1417342
I know getc() has nothing to do with RichEdit.  The only reason I stated I was using RichEdit is just in case there's a control message that I'm unaware of that can do this.

Again, I'll try to explain what I'm attempting to do...

If you had just typed up a document in WordPad (not saved, no IO functions needed..)

Think if there was a "convert" button where WordPad would convert certain characters to other characters.  Text to HTML is a good example.  If you wanted to convert ' ' (space) to '&nbsp;' (html space).  That's what I'm trying to do...not like a full HTML converter, because I don't need one...I have 12 characters (and formatting) I need to search in the document for, convert any and all of those 12 characters, and pop open another dialogbox with the entire document in that box, with all the "special" characters formatted.

I don't think EM_GETSELTEXT would work very well because in the MSVC++ documentation, it says that GETSELTEXT is used to "...retrieve the text from the current selection in [a richedit] object."  While that does solve one problem (how I'm going to have the text read from the document into a "scan/search" function), it seems like it would be much simpler if I could just use something like...

getc ( RichEdit )  or
getc ( MyMainWindow )

or something along those lines...that's what I was really wondering when I first posted this question.

Thanks for all your help...


Accepted Solution

meessen earned 300 total points
ID: 1417343
ichor, you were not telling us everything ;-)

Are you using MFC?

In MFC you have the CRichEditCtrl class with interesting methods; otherwise look for the EM_xxx messages.

You can scan your text line by line using the following methods or messages

GetLine() method fills a buffer with the given line. (WIN32 message = EM_GETLINE )
Lines are numbered from 0 to n-1.

You can then scan the line using memchr, as nietod suggested, since the buffered line is not terminated by a 0x00 char.

To replace this char by a string you have to select the char and replace it with a new text string. See EM_EXSETSEL or SetSel() method in MFC.

This selecting method uses a char based index instead of a line based index.
You have to compute the char index of the char or string you want to replace.
You have to get the char index of the first char of the line to which you add the offset to the found char in the line. This yields the char index of the found char.
Use EM_LINEINDEX or LineIndex() to get the Char index of the line.

Now that you have selected your char/string, you can replace it with the new string.
Use ReplaceSel() (in MFC) or EM_REPLACESEL (in WIN32) to replace the selection.

When looking for the next char/string occurrence you have to keep track of the char index change resulting from the previous replacements in the line. Thus when replacing stringA (which may be one char long) by stringB, compute Delta += strlen(stringB) - strlen(stringA); at the start of each line scan you initialize Delta to zero.
Thus when you find a new char occurrence to replace, add Delta to its index to get its real char index, taking into account any previous changes in the line. Of course you update Delta so the next change will properly update the char index.
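
The Delta bookkeeping can be tried out without a rich edit control. In the sketch below, a plain char buffer stands in for the control's text and memmove/memcpy stand in for EM_EXSETSEL plus EM_REPLACESEL; the function name replace_in_line and the buffer-capacity parameter are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: scans the unmodified copy `line`, and replaces
   each `from` char with the string `to` inside `text` (capacity cap),
   which plays the role of the control. Returns the number of
   replacements, or -1 if text is too small. */
int replace_in_line( const char *line, char from, const char *to,
                     char *text, size_t cap ){
   size_t to_len, text_len, i;
   long delta = 0;   /* reset to zero for each line scanned */
   int count = 0;

   assert( line != NULL && to != NULL && text != NULL );
   to_len = strlen( to );
   text_len = strlen( line );
   if( text_len + 1 > cap )
      return -1;
   memcpy( text, line, text_len + 1 );

   for( i = 0; line[i] != '\0'; i++ ){
      size_t pos;
      if( line[i] != from )
         continue;
      /* offset in the scanned line + Delta = real index in edited text */
      pos = (size_t)( (long)i + delta );
      if( text_len + to_len > cap )   /* new length plus 0x00 must fit */
         return -1;
      /* shift the tail (including the 0x00), then drop in the new string */
      memmove( text + pos + to_len, text + pos + 1, text_len - pos );
      memcpy( text + pos, to, to_len );
      text_len = text_len - 1 + to_len;
      delta += (long)to_len - 1;   /* strlen(stringB) - strlen(stringA) */
      count++;
   }
   return count;
}
```

Replacing ' ' with "&nbsp;" in "a b c" this way gives "a&nbsp;b&nbsp;c": the second space sits at offset 3 in the scanned line but at index 8 in the edited text, which is exactly 3 + Delta.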

For your specific application I would suggest the following char-replacement strategy since memchr is not the most appropriate.

Suppose you have a big set of chars to search for: you could use a table of 256 entries, one for each possible char. For the char whose ASCII code is i, you would find in table[i] all the information relevant for char i, for instance a replacement string, the delta, and even a function pointer for special handling. So for each char you look in the table to see whether there is a replacement string. If not, skip it; otherwise replace, etc...
In this case you examine each char only once and get the appropriate processing in constant time. Faster than a switch. Besides, you could define new replacements dynamically through a nice dialog.

Note that this works well only for ascii strings (one byte wide chars). For unicode strings, you need a smarter strategy. For string searching you also need a smarter strategy.
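
A minimal sketch of the table idea, for one-byte chars only as noted above. The table here holds just a replacement string per char (no delta or function pointer), and the names table, set_replacement and convert are made up for the example; the converted text goes to a separate output buffer, as it would for a second dialog box.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One entry per possible byte value; NULL means "copy the char as is". */
static const char *table[256];

/* Registers (or clears, with NULL) the replacement string for char c. */
void set_replacement( unsigned char c, const char *replacement ){
   table[c] = replacement;
}

/* Converts src into dst in a single pass, one table lookup per char.
   Returns the length written (without the 0x00), or (size_t)-1 if dst,
   of capacity cap, is too small. */
size_t convert( const char *src, char *dst, size_t cap ){
   size_t out = 0;

   assert( src != NULL && dst != NULL );
   for( ; *src != '\0'; src++ ){
      const char *rep = table[(unsigned char)*src];
      size_t n = rep ? strlen( rep ) : 1;

      if( out + n + 1 > cap )   /* keep room for the final 0x00 */
         return (size_t)-1;
      if( rep )
         memcpy( dst + out, rep, n );
      else
         dst[out] = *src;
      out += n;
   }
   if( cap == 0 )
      return (size_t)-1;
   dst[out] = '\0';
   return out;
}
```

After set_replacement( '<', "&lt;" ) and set_replacement( '&', "&amp;" ), converting "a<b&c" yields "a&lt;b&amp;c". New replacements can be registered at run time without touching the scanning loop.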


Author Comment

ID: 1417344
Okay.  Since I posted the question, I've encountered several ways to do what I'm trying to do...although none of them do exactly what I want them to do.  I consulted the local Senior Programmer, and he proceeded to inform me that I'd have to make an entirely new function to do what I'm actually trying to do.

So, for everyone, thanks for trying to help...I'm going to rethink my strategy and may have more questions later.  =)



Question has a verified solution.
