Solved

Rich Edit - extract formatted text

Posted on 2001-07-09
Last Modified: 2013-12-03
I need to get the contents of a rich edit control. Specifically, I need all of the text, and I need to know where any bold and/or underlined text starts and ends. I can use EM_STREAMOUT to get the contents, but how do I parse out just the text, bold, and underline data?
Thank You
Question by:marvinm
11 Comments
 
LVL 5

Expert Comment

by:robpitt
ID: 6265408
I think you'll have to write a simple RTF to plain text parser that ignores all RTF control sequences apart from bold and underline.

 
LVL 1

Author Comment

by:marvinm
ID: 6265433
Do you have any examples of doing this? (non-MFC)
Thank You
 
LVL 14

Accepted Solution

by:
AlexVirochovsky earned 200 total points
ID: 6265454
You need to use the EM_GETCHARFORMAT message
and the CHARFORMAT structure.

The EM_GETCHARFORMAT message determines the current character formatting in a rich edit control.

EM_GETCHARFORMAT
wParam = (WPARAM) (BOOL) fSelection;
lParam = (LPARAM) (CHARFORMAT FAR *) lpFmt;

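Set cbSize in the CHARFORMAT structure before sending the message. On return, the dwMask field tells you which attributes are valid (CFM_BOLD, CFM_UNDERLINE); for a selection, a bit is set only if that attribute is the same across the whole selection. The dwEffects field then tells you whether the attribute is on (CFE_BOLD, CFE_UNDERLINE).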
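
A minimal sketch of reading the bold and underline state of the current selection, assuming a plain Win32 C build with richedit.h. The names decode_style, selection_style, and the STYLE_* flags are made up for illustration. The decoding step is kept separate from the SendMessage call so it can be checked without a window; the stand-in constants in the non-Windows branch are there only for that purpose.

```c
#include <assert.h>

#ifdef _WIN32
#include <windows.h>
#include <richedit.h>
#else
/* Stand-ins so the decoding logic compiles off Windows.
   The values mirror richedit.h. */
typedef unsigned long DWORD;
#define CFM_BOLD      0x00000001
#define CFM_UNDERLINE 0x00000004
#define CFE_BOLD      0x00000001
#define CFE_UNDERLINE 0x00000004
#endif

enum { STYLE_BOLD = 1, STYLE_UNDERLINE = 2 };   /* hypothetical flags */

/* An effect counts only if dwMask marks it valid AND dwEffects has it set. */
int decode_style(DWORD mask, DWORD effects)
{
    int style = 0;
    if ((mask & CFM_BOLD) && (effects & CFE_BOLD))
        style |= STYLE_BOLD;
    if ((mask & CFM_UNDERLINE) && (effects & CFE_UNDERLINE))
        style |= STYLE_UNDERLINE;
    return style;
}

#ifdef _WIN32
/* Returns STYLE_* flags for the current selection of the rich edit hEdit. */
int selection_style(HWND hEdit)
{
    CHARFORMAT cf;
    assert(hEdit != NULL);
    ZeroMemory(&cf, sizeof cf);
    cf.cbSize = sizeof cf;              /* must be set before the call */
    SendMessage(hEdit, EM_GETCHARFORMAT, SCF_SELECTION, (LPARAM)&cf);
    return decode_style(cf.dwMask, cf.dwEffects);
}
#endif
```

If the selection mixes bold and non-bold text, the CFM_BOLD bit comes back clear in dwMask, so this sketch reports the selection as not bold. Narrowing the selection is the way to find where the change happens.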
 
LVL 1

Author Comment

by:marvinm
ID: 6265491
Do I need to set the selection to one character and do this for each character position?
Thank You
 
LVL 5

Expert Comment

by:robpitt
ID: 6266076
Whilst you could call EM_GETCHARFORMAT for each character in turn, this would obviously not be very efficient. Then again, it would work and would save you writing a lot of code.
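
The per-character approach could look roughly like the sketch below. It assumes plain Win32 C and Rich Edit 2.0 or later (for EM_GETTEXTLENGTHEX). The names read_char_styles, coalesce_runs, StyleRun, and the RUN_* flags are invented for this example. The first function is portable and turns one style value per character into start/end runs; the second, Windows-only, fills that array by moving a one-character selection through the control.

```c
#include <assert.h>
#include <stddef.h>

enum { RUN_BOLD = 1, RUN_UNDERLINE = 2 };                 /* hypothetical flags */
typedef struct { long start, end; int style; } StyleRun;  /* end is exclusive */

/* Collapse per-character style flags into runs of identical, non-plain style.
   Returns the number of runs written to out (at most max_out). */
size_t coalesce_runs(const int *styles, size_t n, StyleRun *out, size_t max_out)
{
    size_t count = 0;
    assert(styles != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (styles[i] == 0)
            continue;                                /* plain text: no run */
        if (count > 0 && out[count - 1].end == (long)i
                      && out[count - 1].style == styles[i]) {
            out[count - 1].end = (long)i + 1;        /* extend the current run */
        } else {
            if (count == max_out)
                break;                               /* output array is full */
            out[count].start = (long)i;
            out[count].end = (long)i + 1;
            out[count].style = styles[i];
            count++;
        }
    }
    return count;
}

#ifdef _WIN32
#include <windows.h>
#include <richedit.h>

/* Fill styles[0..n) by selecting one character at a time; returns n. */
long read_char_styles(HWND hEdit, int *styles, long max_chars)
{
    GETTEXTLENGTHEX gtl = { GTL_DEFAULT, 1200 };     /* length in characters */
    long n = (long)SendMessage(hEdit, EM_GETTEXTLENGTHEX, (WPARAM)&gtl, 0);
    CHARRANGE saved, one;
    CHARFORMAT cf;

    if (n < 0) n = 0;
    if (n > max_chars) n = max_chars;
    SendMessage(hEdit, EM_EXGETSEL, 0, (LPARAM)&saved);   /* remember selection */
    for (long i = 0; i < n; i++) {
        one.cpMin = i;
        one.cpMax = i + 1;
        SendMessage(hEdit, EM_EXSETSEL, 0, (LPARAM)&one);
        ZeroMemory(&cf, sizeof cf);
        cf.cbSize = sizeof cf;
        SendMessage(hEdit, EM_GETCHARFORMAT, SCF_SELECTION, (LPARAM)&cf);
        styles[i] = 0;
        if ((cf.dwMask & CFM_BOLD) && (cf.dwEffects & CFE_BOLD))
            styles[i] |= RUN_BOLD;
        if ((cf.dwMask & CFM_UNDERLINE) && (cf.dwEffects & CFE_UNDERLINE))
            styles[i] |= RUN_UNDERLINE;
    }
    SendMessage(hEdit, EM_EXSETSEL, 0, (LPARAM)&saved);   /* restore selection */
    return n;
}
#endif
```

Each character costs two or three messages, which is fine for a short field. For large documents the selection changes may also cause visible flicker unless redrawing is suppressed.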


If you wanted to parse the RTF data, then you should read the section of MSDN entitled "Rich Text Format (RTF) Specification, version 1.6".
If you don't have MSDN, I can mail you the section.


Try opening an RTF file in a text editor; you'll see the format is fairly simple. It's a header followed by text with embedded control codes.

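All control codes start with a "\".
Bold on/off is "\b" and "\b0" respectively.
Underline on/off is "\ul" and "\ulnone" respectively ("\ul0" also turns it off).
A control word ends at the first character that is not a letter or part of its numeric parameter. If that character is a space " ", the space belongs to the control word and is not part of the text.
The \ character is represented by "\\", and literal braces by "\{" and "\}".
Unescaped braces group text: formatting changed inside "{ ... }" reverts at the closing brace.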
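
The rules above can be sketched as a small scanner in portable C. This is an illustration under simplifying assumptions, not a full RTF reader: it tracks only \b, \ul, \ulnone, and \plain, skips a short hard-coded list of header groups (\fonttbl, \colortbl, \stylesheet, \info, and anything marked \*), and ignores everything else, including pictures and Unicode escapes. The names rtf_extract, RtfState, RtfOut, and the RTF_* flags are invented for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

enum { RTF_BOLD = 1, RTF_UNDERLINE = 2 };   /* hypothetical style flags */
#define RTF_MAX_DEPTH 64

typedef struct { int style; int skip; } RtfState;
typedef struct { char *text; unsigned char *styles; long cap, n; } RtfOut;

/* Append one character with the current style unless we are in a skipped group. */
void rtf_emit(RtfOut *o, const RtfState *s, char c)
{
    if (!s->skip && o->n < o->cap) {
        o->text[o->n] = c;
        o->styles[o->n] = (unsigned char)s->style;
        o->n++;
    }
}

/* Apply the few control words this sketch understands; ignore the rest. */
void rtf_control(RtfOut *o, RtfState *s, const char *w, int has_param, long param)
{
    int on = !has_param || param != 0;      /* "\b" turns on, "\b0" turns off */
    if (!strcmp(w, "b"))
        s->style = on ? (s->style | RTF_BOLD) : (s->style & ~RTF_BOLD);
    else if (!strcmp(w, "ul"))
        s->style = on ? (s->style | RTF_UNDERLINE) : (s->style & ~RTF_UNDERLINE);
    else if (!strcmp(w, "ulnone"))
        s->style &= ~RTF_UNDERLINE;
    else if (!strcmp(w, "plain"))
        s->style = 0;
    else if (!strcmp(w, "par") || !strcmp(w, "line"))
        rtf_emit(o, s, '\n');
    else if (!strcmp(w, "tab"))
        rtf_emit(o, s, '\t');
    else if (!strcmp(w, "fonttbl") || !strcmp(w, "colortbl") ||
             !strcmp(w, "stylesheet") || !strcmp(w, "info"))
        s->skip = 1;                        /* header group: not document text */
}

/* Writes plain text to text (needs cap + 1 bytes) and one style value per
   character to styles. Returns the character count, or -1 if the braces are
   unbalanced or nested deeper than RTF_MAX_DEPTH. */
long rtf_extract(const char *rtf, char *text, unsigned char *styles, long cap)
{
    RtfState stack[RTF_MAX_DEPTH], cur = { 0, 0 };
    RtfOut out = { text, styles, cap, 0 };
    int depth = 0;
    const char *p = rtf;

    assert(rtf != NULL && text != NULL && styles != NULL);
    while (*p) {
        if (*p == '{') {                    /* open group: save state */
            if (depth == RTF_MAX_DEPTH) return -1;
            stack[depth++] = cur; p++;
        } else if (*p == '}') {             /* close group: restore state */
            if (depth == 0) return -1;
            cur = stack[--depth]; p++;
        } else if (*p == '\r' || *p == '\n') {
            p++;                            /* raw line breaks are not text */
        } else if (*p != '\\') {
            rtf_emit(&out, &cur, *p++);
        } else if (isalpha((unsigned char)p[1])) {
            char word[32], *end;
            size_t len = 0;
            int has_param;
            long param = 0;
            for (p++; isalpha((unsigned char)*p); p++)
                if (len < sizeof word - 1) word[len++] = *p;
            word[len] = '\0';
            has_param = isdigit((unsigned char)*p) ||
                        (*p == '-' && isdigit((unsigned char)p[1]));
            if (has_param) { param = strtol(p, &end, 10); p = end; }
            if (*p == ' ') p++;             /* delimiter space is not text */
            rtf_control(&out, &cur, word, has_param, param);
        } else if (p[1] == '\\' || p[1] == '{' || p[1] == '}') {
            rtf_emit(&out, &cur, p[1]); p += 2;      /* escaped literal */
        } else if (p[1] == '\'' && isxdigit((unsigned char)p[2])
                                && isxdigit((unsigned char)p[3])) {
            char hex[3] = { p[2], p[3], '\0' };      /* \'hh: one raw byte */
            rtf_emit(&out, &cur, (char)strtol(hex, NULL, 16)); p += 4;
        } else if (p[1] == '*') {
            cur.skip = 1; p += 2;           /* \* marks an ignorable group */
        } else {
            p += p[1] ? 2 : 1;              /* other control symbol: ignore */
        }
    }
    text[out.n] = '\0';
    return depth == 0 ? out.n : -1;
}
```

Feeding this the buffer collected by an EM_STREAMOUT callback gives the text plus a parallel array of style flags, from which the start and end of each bold or underlined stretch can be read off. Real documents use many more control words than this handles, so treat the skip list as a starting point.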

Rob

 
LVL 14

Expert Comment

by:AlexVirochovsky
ID: 6267813
I don't recommend writing an RTF parser. I've written
one, and it is a real headache.
>> Do I need to set the selection to one character and do this for each character position?

I don't see anything wrong with this. It may be a bit slow, but you can write it in 10 minutes. A parser is a MINIMUM of 2 weeks of work,
and after that you find text that uses some additional tags and
you start all over again...
  But here are the RTF docs:
http://www.wotsit.org/wtext/rtf15.zip (Rich Text File Format v1.5)
http://www.wotsit.org/wtext/rtfadd97.zip (Word 97 Addendum)
 
LVL 5

Expert Comment

by:robpitt
ID: 6268324
It really depends on exactly what Marvin wants to do.

If he has text in a control and then needs to establish the formatting of a particular character, then yes, EM_GETCHARFORMAT is the thing for him.

Using EM_GETCHARFORMAT for a more complex situation may work, or it may be prohibitively expensive in terms of performance.

Writing a full RTF parser would indeed be a task and a half, but remember, he only needs to extract bold and underline information.

Rob
 

Expert Comment

by:chandas
ID: 6268961
Rob, would he really have to call EM_GETCHARFORMAT for each letter? I think maybe he could select a word at a time using SetSel(...) (EM_EXSETSEL without MFC). Maybe he could just look for space characters to break at each word and then select a word at a time.

Just my $0.02

Chandas
 
LVL 1

Author Comment

by:marvinm
ID: 6270783
This should be the easiest way to accomplish my goal.  There will not be much text to get, so efficiency is not a major concern.
Thanks to all - mm
 

Expert Comment

by:chandas
ID: 6276763
It would help if you awarded points for the comment that helped you out
 
LVL 1

Author Comment

by:marvinm
ID: 6276834
I DID! AlexVirochovsky suggested using EM_GETCHARFORMAT, which is easier than writing an RTF parser. robpitt's comments were informative, and certainly the more thorough approach, but that is not the direction I am taking. I have not implemented this yet as I have been pulled into a different project, but this will be my approach when I return to it.