Solved

bash place <br> in front of certain lines in file

Posted on 2006-06-15
Last Modified: 2010-04-20
I am converting an HTML document from one that uses <pre> tags to one that will not have pre tags

here is what I need

<this>
Thursday, 8-11-2005, 5:00PM-9:00PM
-Friday, 8-12-2005, 5:00PM-9:00PM
Saturday, 8-13-2005, 8:00AM-9:00PM
Sunday, 8-14-2005, 8:00AM-5:00PM
</this>

<to look like this>
Thursday, 8-11-2005, 5:00PM-9:00PM
<br>-Friday, 8-12-2005, 5:00PM-9:00PM
<br>Saturday, 8-13-2005, 8:00AM-9:00PM
<br>Sunday, 8-14-2005, 8:00AM-5:00PM
</to look like this>

Basically if a line does start with a space or dash (-) and the line above it doesn't start with a space or dash - then I want to insert a <br>before the current line

any idea?

thanks
jculkincys
Question by:jculkincys
23 Comments
 
LVL 8

Expert Comment

by:Autogard
ID: 16916017
You mentioned that you are trying to convert HTML docs to a wiki format in the other question.  What is your main method for trying to do all of this?  Are you just trying to do it all in bash scripts or are you open to using another scripting language like python/perl/php to do this.  I would recommend something like that for such a large text processing project.  Just a thought.

Well, on to tackle this problem too :)
0
 
LVL 2

Author Comment

by:jculkincys
ID: 16916062
Yes I just have a large bash script.

If I had known it was going to get this big I would have used something different, but I am almost there. Speed is not really an issue since this is only going to run once.

Thanks for the input.
0
 
LVL 22

Expert Comment

by:pjedmond
ID: 16916450
"Basically if a line does start with a space or dash (-) and the line above it doesn't start with a space or dash - then I want to insert a <br>before the current line"

Saturday and Sunday do not start with a space or dash? Why was <br> inserted?
0
 
LVL 2

Author Comment

by:jculkincys
ID: 16916746
pjedmond, sorry about that. I made a bad typo. I meant to say:

Basically if a line does not start with a space or dash (-) and the line above it doesn't start with a space or dash - then I want to insert a <br>before the current line

Sorry for the confusion
0
 
LVL 22

Expert Comment

by:pjedmond
ID: 16916955
I really wanted to do this with sed....but it wasn't to be. This was as close as I got:

cat convert.txt | sed '/^[^\x2D ]/N; s/\n /\n <br>/'

- signs are a pain in the $%^&

Eventually went for the perl option:

------------8X---------------------------
#!/usr/bin/perl -w

use strict;

my $current_line;
my $next_line;

$current_line=<>;

while(<>) {
$next_line=$_;
# Do work here
        if ((substr($current_line,0,1) ne "-") && (substr($current_line,0,1) ne " ")) {
                if ((substr($next_line,0,1) eq "-") || (substr($next_line,0,1) eq " ")) {
                        $next_line="<br>".$next_line;
                }
        }
# Finish work
print $current_line;
$current_line=$next_line;
}
print $current_line;
------------8X---------------------------

Basically if a line does start with a space or dash (-) and the line above it doesn't start with a space or dash - then I want to insert a <br>before the current line

cat filename | myscript.pl > newfile

HTH:)

0
 
LVL 22

Expert Comment

by:pjedmond
ID: 16916974
Revised to :

Basically if a line does not start with a space or dash (-) and the line above it doesn't start with a space or dash - then I want to insert a <br>before the current line

------------8X---------------------------
#!/usr/bin/perl -w

use strict;

my $current_line;
my $next_line;

$current_line=<>;

while(<>) {
$next_line=$_;
# Do work here
        if ((substr($current_line,0,1) ne "-") && (substr($current_line,0,1) ne " ")) {
                if ((substr($next_line,0,1) ne "-") && (substr($next_line,0,1) ne " ")) {
                        $next_line="<br>".$next_line;
                }
        }
# Finish work
print $current_line;
$current_line=$next_line;
}
print $current_line;

------------8X---------------------------

But that doesn't allow for the <this> </this>...which according to your rules need <br> in front? If you want to ignore lines starting with '<' as well then:


------------8X---------------------------
#!/usr/bin/perl -w

use strict;

my $current_line;
my $next_line;

$current_line=<>;

while(<>) {
$next_line=$_;
# Do work here
        if ((substr($current_line,0,1) ne "-") && (substr($current_line,0,1) ne " ")) {
                if ((substr($next_line,0,1) ne "-") && (substr($next_line,0,1) ne " ") && (substr($next_line,0,1) ne "<")) {
                        $next_line="<br>".$next_line;
                }
        }
# Finish work
print $current_line;
$current_line=$next_line;
}
print $current_line;

------------8X---------------------------

Any other rule variations? ;)

HTH:)
0
 
LVL 22

Expert Comment

by:pjedmond
ID: 16916978
In fact, you've probably got enough of a code skeleton to modify to your needs now - let me know if you have any problems:)
0
 
LVL 8

Expert Comment

by:Autogard
ID: 16917269
I think everyone is messed up on the rules of what you want to happen.  These are the 2 variations:

1. Basically if a line does start with a space or dash (-) and the line above it doesn't start with a space or dash - then I want to insert a <br>before the current line
--- this breaks on lines 3 and 4 which both DO NOT start with a space or a dash

2. Basically if a line does not start with a space or dash (-) and the line above it doesn't start with a space or dash - then I want to insert a <br>before the current line
--- this breaks on line 2 which DOES start with a dash

What is it exactly you want?  What is the significance of a "-" or a " " (space)?  Maybe a more comprehensive example will help.

Also, do you want a Perl script that you can call from your bash script, like the one pjedmond gave you, or do you want a pure bash implementation?
0
 
LVL 22

Expert Comment

by:pjedmond
ID: 16917294
Have a go at the pure bash implementation - there is something weird about the '-' char. I know that it has special significance in regexes ([A-Z] etc.), but when you use [^ -] or even just [ -], it starts matching < and other chars. It was driving me nuts - hence the switch to a Perl script:)...and even then, the - causes problems with matching, but the above examples get around it!
0
 
LVL 8

Expert Comment

by:Autogard
ID: 16917304
Basically --

if [[ $line =~ ^- ]]   # regex left unquoted: bash 3.2 and later match a quoted pattern as a literal string
then
    echo "whatever"
else
    echo "whatever else"
fi

checks if the string (line) begins with a "-" in bash.

Once I get the exact rules I can work on a solution -- otherwise feel free to use this to formulate it... it shouldn't be too hard once you have this.
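
For reference, here is a minimal pure-shell sketch of the corrected rule (variation 2 above): a line gets <br> only when neither it nor the line above starts with a space or a dash. The function name add_br is hypothetical, and it uses case pattern matching instead of the =~ test so that it also runs under plain sh. Treating empty lines like space/dash lines is an assumption; the thread never settles that.

```shell
# add_br is a hypothetical name, not something from the thread.
# Reads stdin; prints each line, prefixed with <br> when neither this line
# nor the previous one starts with a space or a dash.
add_br() (
    prev_plain=0    # 1 when the previous line started with an ordinary character
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            ''|' '*|-*)     # empty, or starts with space/dash: never tagged
                printf '%s\n' "$line"
                prev_plain=0
                ;;
            *)
                if [ "$prev_plain" -eq 1 ]; then
                    printf '<br>%s\n' "$line"
                else
                    printf '%s\n' "$line"
                fi
                prev_plain=1
                ;;
        esac
    done
)
```

Usage would be something like: add_br < convert.txt > newfile. On the four-line sample from the question, only the Sunday line ends up with a <br>, which is why this variation does not reproduce the requested output.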
0
 
LVL 51

Expert Comment

by:ahoffmann
ID: 16918107
> Basically if a line does not start with a space or dash (-) and the line above it doesn't start with a space or dash - then I want to insert a <br>before the current line

sed -e '/^[^ -]/s/.*/<br>&/' file

I omitted the "and the line above it doesn't start with a space or dash" 'cause it doesn't make a big difference in the browser
0
 
LVL 22

Expert Comment

by:pjedmond
ID: 16918536
ahoffman,
That's kind of the route I chose as well, but the ^[^ -] bit also seems to match < and some other punctuation chars on my system, as well as the -. I'm curious as to what else the - can represent. Hence my attempt at replacing it with \x2D in my above example.
0
 
LVL 51

Expert Comment

by:ahoffmann
ID: 16918929
In the given example I don't see spaces, just - (well, spaces should be there, someone said:). Adding the missing < to the character class should be a simple 2 second task ;-)
Keep in mind that \x2d is GNU sed only, and I don't see a reason to use it.
0
 
LVL 22

Expert Comment

by:pjedmond
ID: 16919068
Useful insight - I was using that (\x2d) because the '-' by itself inside the []s was matching a number of other chars as well as the '-' (including '<'s and '>'s). I'm guessing that this is related to [A-Z][0-9] type terminology, and I really wanted to get this to work with sed on my system as it is more 'elegant' than the Perl that I ended up using.

Seems to work fine for Cygwin...ah well - you live and learn. I'm now wondering what a true POSIX sed should do in this situation?

Answers on the back of a postcard?;)
0
 
LVL 51

Expert Comment

by:ahoffmann
ID: 16919180
- as the last character in a class is used as itself, not as a range operator (at least with reliable regex engines:)
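
A small sketch of that point, assuming any POSIX sed: with the dash last, [ -] matches only a space or a dash, while [ -<] is a range running from space (0x20) to < (0x3C) that also takes in digits and most punctuation. The sample string is made up for illustration, and LC_ALL=C is set because range ordering outside the C locale is not something to rely on.

```shell
# Made-up sample text containing a dash, spaces, angle brackets and digits.
sample='a-b <c> 1+2'

# Dash is the last character in the class: only space and dash are replaced.
last=$(printf '%s\n' "$sample" | LC_ALL=C sed 's/[ -]/_/g')

# Dash sits between two characters: this is the range space..<, so
# '-', '<', '+', and the digits are replaced too ('>' lies outside the range).
range=$(printf '%s\n' "$sample" | LC_ALL=C sed 's/[ -<]/_/g')

printf '%s\n' "$last"     # a_b_<c>_1+2
printf '%s\n' "$range"    # a_b__c>____
```

Note also that a negated class such as [^ -] matches < by design, since < is neither a space nor a dash.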
0
 
LVL 2

Author Comment

by:jculkincys
ID: 16919898
Thanks for all the responses


Good call guys
I guess it doesn't matter if the line above it doesn't start with a space or dash

I will try some suggestions and get back to you
0
 
LVL 22

Accepted Solution

by:
pjedmond earned 400 total points
ID: 16920609
If the line above doesn't matter, then it's easy:)

cat convert.txt | sed '/^[^ -]/s/^/^<br>/'

or

sed '/^[^ -]/s/^/^<br>/' convert.txt

Not sure I should be bragging about this, but mine's smaller than anyone else's;)....the regex that is;)

Anyone got anything smaller? Answers on the back of the postage stamp this time:)
0
 
LVL 51

Expert Comment

by:ahoffmann
ID: 16925945
good improvement (except the UUOCA:)
0
 
LVL 22

Expert Comment

by:pjedmond
ID: 16926200
UUOCA?....Obviously, I'm being really thick at the moment...Just got up!
0
 
LVL 51

Expert Comment

by:ahoffmann
ID: 16926603
useless use of cat award
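
To illustrate, a quick sketch showing that the cat is not needed: sed can take the file as an argument or read it through a redirect, and all three forms give the same result. The temporary file and its two lines are stand-ins for convert.txt, and the sed expression is the form without the stray ^ in the replacement.

```shell
# Stand-in for convert.txt, created just for this comparison.
tmp=$(mktemp)
printf '%s\n' 'Saturday' '-Friday' > "$tmp"

with_cat=$(cat "$tmp" | sed '/^[^ -]/s/^/<br>/')    # extra process, same result
as_arg=$(sed '/^[^ -]/s/^/<br>/' "$tmp")            # file name as an argument
redirected=$(sed '/^[^ -]/s/^/<br>/' < "$tmp")      # file on standard input

rm -f "$tmp"
```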
;-)
0
 
LVL 2

Author Comment

by:jculkincys
ID: 16936136
yea pj that looks pretty good

sed '/^[^ -]/s/^/^<br>/' convert.txt

can you explain what the /^[^ -]/ at the beginning tells it to do.

Thanks
0
 
LVL 22

Expert Comment

by:pjedmond
ID: 16936271
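Sure - the first ^ matches the beginning of the line ($ matches the end of the line). Inside the [ ], a leading ^ means NOT, so [^ -] matches any one character that is not a <space> or a -. Together, /^[^ -]/ selects the lines whose first character is neither of those, and on those lines s substitutes at the beginning of the line, putting the <br> there.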

I enjoyed this Q:)

HTH
0
 
LVL 8

Assisted Solution

by:Autogard
Autogard earned 100 total points
ID: 16936379
May I make one suggestion:

sed '/^[^ -]/s/^/^<br>/' convert.txt --> you will want to get rid of the last "^", because the replacement part of s is not a regular expression, so that ^ is inserted into the output as a literal character.

So: sed '/^[^ -]/s/^/<br>/' convert.txt
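
Here is a short sketch comparing the variants on a few sample lines. The function names are hypothetical; the third variant adds < to the character class, as ahoffmann suggested earlier, so that lines starting with a tag are left alone.

```shell
# Hypothetical wrappers around the sed one-liners discussed in the thread.
with_caret() { sed '/^[^ -]/s/^/^<br>/'; }    # replacement contains a literal ^
fixed()      { sed '/^[^ -]/s/^/<br>/'; }     # corrected form
skip_tags()  { sed '/^[^ <-]/s/^/<br>/'; }    # also skips lines starting with <

sample() {
    printf '%s\n' '<pre>' 'Thursday, 8-11-2005' '-Friday, 8-12-2005' \
        ' indented' 'Saturday, 8-13-2005'
}

sample | with_caret   # lines come out as ^<br><pre>, ^<br>Thursday, ...
sample | fixed        # <br><pre>, <br>Thursday, ..., dash/space lines untouched
sample | skip_tags    # <pre> untouched, <br>Thursday, ..., <br>Saturday, ...
```

Note that every variant also tags the first text line, and none of them tags the line starting with a dash, so the result differs slightly from the sample output in the question.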

Great solution pjedmond!
0
