rossryan

Text documents (grabbing the first few sentences)

Alright, I'm going to hell for asking this question (wait for it): how can I grab the first xxx characters from a text (I know this one), doc, or rtf file?
C#

dfiala13

You doing this on a server?  Or can you use Word automation?
coltrane2003

Once you get the file contents read into a string, you can use Substring. No offense if you already knew this much. Substring takes two arguments: the index of the first character and the number of characters to return. You could do something like

aStringFormofsomeWorddoc.Substring(0,3);

Hope that helps
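
One caveat with the Substring call above: it throws an ArgumentOutOfRangeException when the string is shorter than the requested length. A minimal sketch of a guarded version follows; the class and method names (TextPreview, FirstChars) are made up for illustration and are not from any library.

```csharp
using System;

public static class TextPreview
{
    // Hypothetical helper: returns at most maxChars characters from the start of text.
    public static string FirstChars(string text, int maxChars)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (maxChars < 0) throw new ArgumentOutOfRangeException(nameof(maxChars));

        // Substring(startIndex, length) throws if startIndex + length runs past
        // the end of the string, so clamp the length to what is available.
        return text.Substring(0, Math.Min(maxChars, text.Length));
    }
}
```

With this, asking for 200 characters from a 50-character string simply returns the whole string.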
rossryan

ASKER

It's getting the file contents (particularly Word documents) into the string that I am interested in ;).

What is Word automation (an API?).

I need to programmatically grab the first few sentences...i.e. I need an API or code that grabs it (for use in a program).
coltrane2003

Word, along with many other MS Office products, has a means of writing scripts. This is called VBA (Visual Basic for Applications). It's under Tools > Macro > Visual Basic Editor. Perhaps this is what is meant by API?

Have you tried reading a .doc file with standard IO?

I've got to run now. If you haven't figured this out by the time I get back, I will try to help. Good luck.
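
For the plain-text case, standard IO is enough. The sketch below reads only the first few characters instead of loading the whole file; FilePreview and ReadFirstChars are hypothetical names chosen for this example. Note that a .doc file is a binary format, so reading it this way will mostly return unreadable bytes rather than the document text.

```csharp
using System;
using System.IO;

public static class FilePreview
{
    // Hypothetical helper: reads up to maxChars characters from the start of a text file.
    public static string ReadFirstChars(string path, int maxChars)
    {
        if (maxChars < 0) throw new ArgumentOutOfRangeException(nameof(maxChars));

        char[] buffer = new char[maxChars];

        // StreamReader decodes as UTF-8 by default and detects a byte order mark.
        using (var reader = new StreamReader(path))
        {
            // ReadBlock keeps reading until the buffer is full or the file ends,
            // and returns the number of characters actually read.
            int read = reader.ReadBlock(buffer, 0, maxChars);
            return new string(buffer, 0, read);
        }
    }
}
```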
dfiala13

>>It's the getting the file contents (particulary word documents) into the string that I am interested in ;).
That's why I asked.  If you are running a WinForms app, you can use the Word automation objects to open and manipulate a document.  Using the Word objects, it is relatively trivial to grab the first X characters.
http://msdn.microsoft.com/vstudio/office/default.aspx?pull=/library/en-us/odc_vsto2003_ta/html/wordobject.asp

If, however, you are ripping through lots of documents being uploaded to a server, this might not be the best option.
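
A rough sketch of the automation approach, assuming Word is installed on the machine running the code. It uses late binding through the "Word.Application" ProgID and C# `dynamic` rather than a referenced interop assembly, so it is not compiled against one particular Office version; that is a choice made for this example, not necessarily what the linked article does. WordPreview, Clip, and ReadFirstChars are hypothetical names.

```csharp
using System;
using System.Runtime.InteropServices;

public static class WordPreview
{
    // Hypothetical helper: truncates text to at most maxChars characters.
    public static string Clip(string text, int maxChars)
    {
        if (text == null) return string.Empty;
        return text.Length <= maxChars ? text : text.Substring(0, maxChars);
    }

    // Opens a document in a hidden Word instance and returns its first maxChars characters.
    // Assumes Word is installed; fullPath should be an absolute path.
    public static string ReadFirstChars(string fullPath, int maxChars)
    {
        Type wordType = Type.GetTypeFromProgID("Word.Application");
        if (wordType == null)
            throw new InvalidOperationException("Word does not appear to be installed.");

        dynamic word = Activator.CreateInstance(wordType);
        dynamic doc = null;
        try
        {
            word.Visible = false;
            // Documents.Open(FileName, ConfirmConversions, ReadOnly)
            doc = word.Documents.Open(fullPath, false, true);
            // Content.Text is the body text; Word marks paragraph ends with '\r'.
            string text = doc.Content.Text;
            return Clip(text, maxChars);
        }
        finally
        {
            if (doc != null) doc.Close(false); // close without saving changes
            word.Quit();
            Marshal.ReleaseComObject(word);
        }
    }
}
```

Starting a Word instance per document is slow and needs an interactive desktop session, which is part of why this is a poor fit for the server scenario mentioned above.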
coltrane2003

Yes, I agree with the above. It depends on what kind of solution you are trying to provide. A couple of years back I worked with a content management system that published web templates from Word. It worked with a VBA script and validated documents to be published against a DTD. I didn't build it, so I'm afraid I don't know much about it.
rossryan

ASKER

Right. Now, does this work with Office XP, or shall I install 2003? (I've been hoping to hold off on that one).
ASKER CERTIFIED SOLUTION
dfiala13
