Solved

Split Long String of HTML Source Between the Two Words or Tags Closest to the Middle

Posted on 2006-11-20
Last Modified: 2008-02-01
I have string variables containing several hundred characters of HTML source code, and I want to divide them into two lines when they print to a Web page, by inserting a "\n" about halfway through the string -- but only between two words or between two HTML Tags, and only if the string exceeds a certain length.  
   For example, it should test to see if the string is over 300 characters. If so, it should insert a "\n" close to the middle of the string without breaking up a word, and without breaking up the contents of any HTML tag.
   How would this be done?
Question by:Randall-B
19 Comments
 
LVL 2

Expert Comment

by:jingks03
ID: 17981084
Here's a quick attempt; I'm guessing there will be quite a few replies to this.
I think this does what you want, though it's a little bulky.

#------------------------
my $file = shift @ARGV;
open(FILE, '<', $file) or die "Cannot open $file: $!";
my $data = "";
while (<FILE>) { $data .= $_; } # store source in string
close FILE;
my $slen = length($data);             # counts every character, including spaces and newlines
my ($half1,$half2) = ('','');
if ($slen > 300) {
    my @words = split(/(>[^<]*\s|>)/,$data);
    my $i = 0;
    while (length($half1) < $slen/2) {
       $half1 .= shift @words;
       $i++;
    }
    if ($i%2) { $half1 .= shift @words; }
    $half2 = join('',@words);
    open oFILE, ">first_half.html";
    print oFILE $half1;
    close oFILE;
    open oFILE, ">second_half.html";
    print oFILE $half2;
    close oFILE;
}
#--------------------------------------
 
LVL 84

Expert Comment

by:ozo
ID: 17981125
"\n" will not divide lines on HTML unless it is in a <pre> tag
did you mean "<br />"?
 

Author Comment

by:Randall-B
ID: 17981196
ozo,
   I only want to break the line as it appears in the source code file, not in the browser window.  For reasons that I won't go into here, I'm trying to make all of my HTML source code lines reasonably short, like screen-width (without using wrap-text).  So I'm just looking for a way to break long source code lines into shorter source code lines, without affecting the output as seen on the Web page. That's why I want "\n" instead of <br />.  Thanks.
 
LVL 84

Expert Comment

by:ozo
ID: 17981252
Does that mean you also want to avoid splitting inside of <pre>...</pre>?
 
LVL 8

Assisted Solution

by:Perl_Diver
Perl_Diver earned 225 total points
ID: 17981260
my $oldstring = 'your long string here';
my ($newstring,$f,$s) = ('','','');
if (length($oldstring) > 300) {
   $f = substr $oldstring,0,150;
   $s = substr $oldstring,150;
   if ($f !~ /\s$/ && $f =~ s/\s(\S+)$//) {
      # $f ended mid-word: move the partial word to the start of the second line
      $newstring = "$f\n$1$s";
   }
   else {
      $newstring = "$f\n$s";
   }
   print $newstring;      
}
else {
   print $oldstring;
}
 

Author Comment

by:Randall-B
ID: 17981263
jingks03,
   Maybe I should have explained that I'm not looking to perform this operation on a whole file, but rather on a few individual strings that are output from another function in my script.  For example, let's say the script does some operations and assigns a bunch of HTML code as the value of string variable "$String". I just need a function or regex or whatever to process that $String variable before "printing" it to STDOUT (which becomes part of the HTML source code for a Web page that the user will view in the browser). I'm trying to make the lines of source code shorter because it will also be used in an application that will store and compare different versions of the source code.
 

Author Comment

by:Randall-B
ID: 17981287
ozo,
   No, I don't think I would need to avoid splitting between any two tags, not even <pre> . . . </pre>. I just don't want to split inside the tags themselves.  For example, I don't want:  <pr
                                    e>

but this is fine:   <pre> . . .
                        </pre>
 

Author Comment

by:Randall-B
ID: 17981303
Perl Diver,
    Just by eyeballing your code, it looks like it will do what I need. I'm going to actually test it in my script now. I'll let you know the results. Thanks.
 
LVL 8

Expert Comment

by:Perl_Diver
ID: 17981326
I didn't try to restrict the break to positions between HTML tags, because I believe HTML tags can be broken internally on spaces with no problem.
 
LVL 84

Expert Comment

by:ozo
ID: 17981386
It is valid HTML to put a "\n" inside of
<pre
>
or
</pre
>
Do you still want to disallow this?
How about inside of
<!--
comments
 -->
or
<script>if (a<b && a>c)</script>

 
LVL 2

Accepted Solution

by:jingks03
jingks03 earned 275 total points
ID: 17981454
Ah... yeah, i wrote that a little too fast.  Missed some of the reqs. So given the string $String;
# ----------------------------------
my $slen = length($String);
if (length($String) > 300) {
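    my $half = int(length($String)/2);   # int() keeps the {n} quantifier a whole number
    $String =~ s/^(.{$half}.*?[\s\>])([^<]*<?.*)$/$1\n$2/s;   # /s lets . match any "\n" already in the string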
}
print $String;
# ------------------------------------
That is meant to insert a "\n" into the string after the first ">" or whitespace character past the halfway point, so that the break does not land inside a "<...>" tag.
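
A regex like the one above breaks at the first usable spot after the midpoint, which is not always the spot closest to it. The following is only a sketch of a tokenizing alternative and is not from the thread: the name split_near_middle and the 300-character default are assumptions for illustration. It treats each whole tag as one token and considers a break only after whitespace or between two adjacent tags, then picks the candidate nearest the middle.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): insert one "\n" at the
# allowed break position closest to the middle of $html, but only when
# $html is longer than $limit characters.
sub split_near_middle {
    my ($html, $limit) = @_;
    $limit = 300 unless defined $limit;    # assumed default, from the question's example
    return $html if length($html) <= $limit;

    # Tokens: a whole tag, a run of whitespace, a run of text, or a stray "<".
    my @tokens = $html =~ /(<[^>]*>|\s+|[^<\s]+|<)/g;

    my $middle = length($html) / 2;
    my ($offset, $best) = (0, undef);
    for my $i (0 .. $#tokens - 1) {
        $offset += length $tokens[$i];     # position just after token $i

        # A break is allowed after whitespace or between two adjacent tags.
        my $after_space  = $tokens[$i] =~ /^\s/;
        my $between_tags = $tokens[$i] =~ /^<.*>\z/s
                        && $tokens[$i + 1] =~ /^<.*>\z/s;
        next unless $after_space || $between_tags;

        $best = $offset
            if !defined $best || abs($offset - $middle) < abs($best - $middle);
    }

    return $html unless defined $best;     # no safe position: leave the string alone
    substr($html, $best, 0, "\n");
    return $html;
}
```

Calling print split_near_middle($String); would then replace the substitution. The tag pattern is deliberately naive: a ">" inside an attribute value ends the token early, so real-world markup may need a proper HTML parser instead.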
 
LVL 2

Expert Comment

by:jingks03
ID: 17981461
oops... and forget about the "my $slen = length($String);" line...

need more coffee
 

Author Comment

by:Randall-B
ID: 17981491
Uh-oh, I just discovered something that I had forgotten . . .  And because of this, none of the proposed solutions will work.
  Although Perl Diver's code may be just about right if the lines of HTML code were actually composed of single long strings, I now remember that the "lines" of HTML code are actually composed of many separate "print" statements, which print "horizontally" across the page of HTML source until a certain number of print statements have run. (After that number of print statements, it inserts a "\n".)
    The problem is, some lines are short when composed of 20 print statements (each of which is generally made up of only 1 word).  But other lines turn out very very long when composed of 20 print statements, because some of those statements contain a bunch of html formatting tags that take up a lot of space on the source-code page.
    So the situation is a lot more complicated than I originally thought. Probably what I need to do is get rid of my current loop that inserts a "\n" after every 20 print statements, and use some kind of character counter instead.  For example, a counter would keep track of the number of characters printed in a bunch of successive print statements, and the code would insert a "\n" after every so many (e.g. 300) characters (but without breaking individual words or HTML tags).
     To understand what's going on in the current Perl code, I am showing the print statement below (which needs to be modified as described in the previous paragraph):

$count=0;

sub PrintLine($$){
  $count=($count+1)%20;
  $Mode=shift;
  $Word=shift;
  chomp $Word;
  $Word=~s/^\s+//;
  $Word=~s/\s+$//;
  $Word=~s<</(html|body)>>~~ig;

  if($Word eq ''){
    return;
  }
  else{
    if($Mode eq 'New'){
      $Word=~s|<font.*?>||ig;
      if($Word=~/<td>/i){
        $Word=~s|<td>|<TD>$Style{StartNew}|ig;
        print $FH "$Word$Style{EndNew} " if $count<19;
        print $FH "$Word$Style{EndNew} \n" if $count==19;
      }
      else{
        print $FH "$Style{StartNew}$Word$Style{EndNew} " if $count<19;
        print $FH "$Style{StartNew}$Word$Style{EndNew} \n" if $count>=19;
      }
    }
    elsif($Mode eq 'Old'){
      $Word=~s|<[^<]+>||g;
      print $FH "$Style{StartOld}$Word$Style{EndOld} " if $count<19;
      print $FH "$Style{StartOld}$Word$Style{EndOld} \n" if $count>=19;
    }
    elsif($Mode eq 'Equal'){
      if($Word=~ />$/){
        print $FH "$Word" if $count<19;
        print $FH "$Word\n" if $count==19;
      }
      else{
        print $FH "$Word " if $count<19;
        print $FH "$Word \n" if $count==19;
      }
    }
    else{die"Illegal Mode for PrintLine: $Mode"}
#   print $FH $count;
  }
}

This print function is called hundreds of times in the script, and the counter makes it insert a " \n" (space and \n) after every 20 statements (otherwise, it only inserts a space, if less than 20 statements).  But, to make the lines of HTML code really about even length, it needs to be modified to add the "\n" after a set number of *characters* (without breaking inside of a word or inside of an html tag).
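
The character-counter idea described above could be sketched roughly as follows. This is not code from the thread: the names emit, $MAX_COLUMNS and $column are hypothetical, and the sketch assumes every fragment passed in holds only whole words and whole tags and contains no "\n" of its own. Because a newline is only ever written between fragments, no word or tag gets broken.

```perl
use strict;
use warnings;

my $MAX_COLUMNS = 300;   # assumed limit, taken from the example in the question
my $column      = 0;     # characters already printed on the current source line

# Hypothetical stand-in for the bare print calls inside PrintLine:
# start a new source line first if this fragment would overflow the limit.
sub emit {
    my ($fh, $text) = @_;
    if ($column > 0 && $column + length($text) > $MAX_COLUMNS) {
        print {$fh} "\n";
        $column = 0;
    }
    print {$fh} $text;
    $column += length $text;
    return;
}
```

Under that assumption, each pair of print statements in PrintLine would collapse into a single call such as emit($FH, "$Style{StartNew}$Word$Style{EndNew} "); and the $count bookkeeping would no longer be needed. A fragment longer than the limit still goes out whole, on a line of its own.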
 

Author Comment

by:Randall-B
ID: 17981527
ozo,
   Yes, it should avoid breaks within a tag, such as:
<pr
 e>
      or
</p
re>
  (It should maintain it as "<pre>".)  The other examples you gave probably also need to be maintained without breaks or spaces.  However, please see my revised question above. Sorry about the confusion.
 

Author Comment

by:Randall-B
ID: 17981535
jingks03
   That looks like it would have worked for the question as originally stated, but I'm sorry I mis-stated it. See the long revised question above. Thanks.
 
LVL 84

Expert Comment

by:ozo
ID: 17981588
<pr
e>
is bad
but
<pre
>
is valid
(although I suppose it doesn't hurt to wait one more character before inserting the newline.)
between word1 and word2 in
<script>"word1 word2"</script>
is not inside <> but should probably not have a line break inserted,
<!-- comment --> is inside of <> but could safely have a line break inserted
do you need to take those into account?
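
If those cases did need protecting, one option would be to tokenize the string so that comments, script elements and pre elements each come out as a single unbreakable token. The sketch below is only an illustration under that assumption: html_tokens is a hypothetical name, and the patterns are simple regexes rather than a real HTML parser.

```perl
use strict;
use warnings;

# Hypothetical tokenizer: returns the pieces of $html in order, keeping
# comments, <script> elements and <pre> elements as single tokens.
sub html_tokens {
    my ($html) = @_;
    return $html =~ m{
        ( <!--.*?-->                  # a whole comment
        | <script\b.*?</script\s*>    # a script element together with its body
        | <pre\b.*?</pre\s*>          # a pre element together with its body
        | <[^>]*>                     # any other tag
        | \s+                         # a run of whitespace
        | [^<\s]+                     # a run of ordinary text
        | <                           # a stray less-than sign that opens no tag
        )
    }gisx;
}
```

With this, the script example above stays in one piece, so a splitter that only breaks between tokens never puts a newline between word1 and word2 or inside the (a<b && a>c) test.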
 

Author Comment

by:Randall-B
ID: 17981622
ozo,
   I would like to avoid breaking up the <pre> or </pre> tag at all.
But I doubt that it would hurt to insert a "\n" inside of content between the script or comment tags, so it probably does not need to take those into account.  (And please see the revised question above.) Thanks.
 

Author Comment

by:Randall-B
ID: 17982190
Looks like no one is biting the bait for the revised question. Although I haven't tested jingks03's solution, it looks like it would do what I originally asked for, so I'll accept it.  
  The revised question is being moved to a separate listing: http://www.experts-exchange.com/Programming/Programming_Languages/Perl/Q_22067302.html , as it is so different from the original question.  Experts, please go to the new question. Thanks.
 
LVL 2

Expert Comment

by:jingks03
ID: 17982518
Ah, in that case Randall, i think there has to be another slight change to it

$String =~ s/^(.{$half}.*?[\s\>])([^>]+\<.*\>[^>]*)$/$1\n$2/;

The pattern broken up should work as:
^(.{$half}     - match anything up to the halfway mark
.*?[\s\>])     - match up to the first whitespace or > after it
([^>]+         - the text after the split point must not reach a > before the next <
\<.*\>         - a tag must follow the split point: match from that < to the last >
[^>]*)$        - match the rest, to the end of the string

Sorry about not looking into the second, revised question.  It looks a little too involved for a coffee-break answer.