Solved

Extracting a set number of words

Posted on 2003-03-30
Last Modified: 2010-03-05
I've been looking into creating a news script. On the main page it will display the first 200 words of each article and give a link to the whole article. What would be the best way of doing this?

I've been told that I could extract the first 200 characters but not words. If I extract the first 200 characters, is there some way to prevent it from stopping in mid-word? If I need to post the entire script I will, but at this point I'm only looking for an example.

Basically, when the top story article is posted through a form it will be processed and saved. During the saving process it should clip the first 200 words (or, failing that, characters) and save that along with other information to a file:

Author|Date|Time|UserID|Title|Link|((Extracted200WordsHere))

Thanks in advance.
Question by:KenHeckert
7 Comments
 
LVL 20

Expert Comment

by:jmcg
ID: 8234299
Depending on your definition of "words" you might be satisfied with something like the following:

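my @tokens = split ' ', $content, 201;   # a 201st field, if present, is the unsplit remainder
$words = join ' ', @tokens[0 .. ($#tokens < 199 ? $#tokens : 199)];

This splits out the first 200 whitespace-delimited tokens of $content and joins them back with single spaces. It does not remove HTML coding or anything like that, so you may need to prepare the content of the article to ensure that the first 200 words are something meaningful.

The third argument to 'split' caps the number of chunks it generates, and the last chunk holds whatever is left of the string, unsplit. That is why the limit is 201 and only the first 200 chunks are kept.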
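As a rough sketch, the split approach can be wrapped in a small sub so the word count is a parameter. The name first_words and its arguments are made up for illustration. It asks split for one field more than it needs, because under a limit the last field returned holds the rest of the string.

```perl
use strict;
use warnings;

# Hypothetical helper: return at most $n whitespace-delimited words of $text.
sub first_words {
    my ($text, $n) = @_;
    # Ask for $n + 1 fields; field $n + 1, if present, is the unsplit remainder.
    my @fields = split ' ', $text, $n + 1;
    # Keep only the first $n fields (fewer if the text is short).
    splice @fields, $n if @fields > $n;
    # With a positive limit, text ending in whitespace can leave an empty last field.
    pop @fields if @fields && $fields[-1] eq '';
    return join ' ', @fields;
}

print first_words("one two  three\nfour five", 3), "\n";   # prints "one two three"
```

Runs of spaces and newlines between words collapse to single spaces in the result, which is probably what you want for a one-line record anyway.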
 
LVL 51

Expert Comment

by:ahoffmann
ID: 8234595
$content=~s/([^\s]*\s+){200}/$1/;
 
LVL 51

Accepted Solution

by:
ahoffmann earned 200 total points
ID: 8234607
$content=~s/^(([^\s]*\s+){200}).*/$1/s;
 

Author Comment

by:KenHeckert
ID: 8237123
Thanks, ahoffmann.  Works perfectly fine.  So simple, too.
 
LVL 51

Expert Comment

by:ahoffmann
ID: 8237191
Just keep in mind that it does not match if there are fewer than 200 words; in that case the substitution does nothing and $content is left unchanged.
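To see both cases, here is a small sketch with the word count turned into a parameter so it can be tried on short strings. The sub name clip_words is made up for illustration, and the /s modifier is there so that .* also swallows any later lines of a multi-line article.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the accepted substitution, with the count as $n.
sub clip_words {
    my ($text, $n) = @_;
    # Keep the first $n "word plus trailing whitespace" chunks, drop the rest.
    $text =~ s/^((?:\S*\s+){$n}).*/$1/s;
    return $text;
}

# Five words clipped to three: the kept text still ends with the space after "gamma".
print '[', clip_words("alpha beta gamma delta epsilon", 3), "]\n";   # [alpha beta gamma ]

# Only two words: the pattern cannot match, so the text comes back as it was.
print '[', clip_words("alpha beta", 3), "]\n";                       # [alpha beta]
```

If the trailing whitespace matters for your file format, trimming the result afterwards with s/\s+\z// should take care of it.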
 
LVL 20

Expert Comment

by:jmcg
ID: 8237868
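But that could be handled explicitly by doing:

$content =~ s/^(\s*(?:\S+\s+){0,199}\S+).*/$1/s;

(\S is the built-in equivalent to [^\s]. This keeps up to 200 words, so a shorter article keeps all of its words, and the /s lets .* run across newlines.)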
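The question also asked about clipping by characters without stopping mid-word. A minimal sketch of one way to do that follows; the name clip_chars and its arguments are assumptions for illustration, not something from the original script. It takes one character more than the limit, then cuts back to the last whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper: return at most $max characters of $text, ending on a word boundary.
sub clip_chars {
    my ($text, $max) = @_;
    return $text if length($text) <= $max;
    # Take one extra character so a word that ends exactly at $max is kept whole.
    my $clip = substr($text, 0, $max + 1);
    # Drop the last (possibly partial) word and the whitespace before it;
    # if there is no whitespace at all, fall back to a hard cut at $max.
    $clip =~ s/\s+\S*\z// or $clip = substr($clip, 0, $max);
    return $clip;
}

print clip_chars("The quick brown fox", 12), "\n";   # prints "The quick"
```

The hard cut only happens when the first word alone is longer than the limit, which should be rare for article text.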
 

Author Comment

by:KenHeckert
ID: 8243779
Alright, thanks, jmcg.