jculkincys

bash place <br> in front of certain lines in file

I am converting an HTML document from one that uses <pre> tags to one that will not have <pre> tags.

here is what I need

<this>
Thursday, 8-11-2005, 5:00PM-9:00PM
-Friday, 8-12-2005, 5:00PM-9:00PM
Saturday, 8-13-2005, 8:00AM-9:00PM
Sunday, 8-14-2005, 8:00AM-5:00PM
</this>

<to look like this>
Thursday, 8-11-2005, 5:00PM-9:00PM
<br>-Friday, 8-12-2005, 5:00PM-9:00PM
<br>Saturday, 8-13-2005, 8:00AM-9:00PM
<br>Sunday, 8-14-2005, 8:00AM-5:00PM
</to look like this>

Basically if a line does start with a space or dash (-) and the line above it doesn't start with a space or dash - then I want to insert a <br>before the current line

any idea?

thanks
jculkincys
Autogard

You mentioned that you are trying to convert HTML docs to a wiki format in the other question.  What is your main method for trying to do all of this?  Are you just trying to do it all in bash scripts or are you open to using another scripting language like python/perl/php to do this.  I would recommend something like that for such a large text processing project.  Just a thought.

Well, on to tackle this problem too :)
jculkincys (ASKER)

Yes I just have a large bash script.

If I had known it was going to get this big I would have used something different, but I am almost there. Speed is not really an issue since this is going to run just once.

Thanks for the input.
"Basically if a line does start with a space or dash (-) and the line above it doesn't start with a space or dash - then I want to insert a <br>before the current line"

Saturday and Sunday do not start with a space or dash? Why was <br> inserted?
pjedmond, sorry about that, I made a bad typo. I meant to say:

Basically if a line does not start with a space or dash (-) and the line above it doesn't start with a space or dash - then I want to insert a <br>before the current line

Sorry for the confusion
I really wanted to do this with sed....but it wasn't to be. This was as close as I got:

cat convert.txt | sed '/^[^\x2D ]/N; s/\n /\n <br>/'

- signs are a pain in the $%^&

Eventually went for the perl option:

------------8X---------------------------
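#!/usr/bin/perl -w
# Prefix a line with <br> when it starts with a space or dash
# and the line above it does not.

use strict;

my $current_line;
my $next_line;

# Prime the one-line look-behind with the first input line.
$current_line = <>;

while (<>) {
    $next_line = $_;
    # Only act when the line above starts with neither '-' nor ' '.
    if ((substr($current_line,0,1) ne "-") && (substr($current_line,0,1) ne " ")) {
        if ((substr($next_line,0,1) eq "-") || (substr($next_line,0,1) eq " ")) {
            $next_line = "<br>" . $next_line;
        }
    }
    print $current_line;
    $current_line = $next_line;
}
# Flush the final line, which the loop never prints.
print $current_line;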
------------8X---------------------------

Basically if a line does start with a space or dash (-) and the line above it doesn't start with a space or dash - then I want to insert a <br>before the current line

cat filename | myscript.pl > newfile

HTH:)

Revised to :

Basically if a line does not start with a space or dash (-) and the line above it doesn't start with a space or dash - then I want to insert a <br>before the current line

------------8X---------------------------
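#!/usr/bin/perl -w
# Prefix a line with <br> when neither it nor the line above it
# starts with a space or dash.

use strict;

my $current_line;
my $next_line;

# Prime the one-line look-behind with the first input line.
$current_line = <>;

while (<>) {
    $next_line = $_;
    # Only act when the line above starts with neither '-' nor ' '.
    if ((substr($current_line,0,1) ne "-") && (substr($current_line,0,1) ne " ")) {
        if ((substr($next_line,0,1) ne "-") && (substr($next_line,0,1) ne " ")) {
            $next_line = "<br>" . $next_line;
        }
    }
    print $current_line;
    $current_line = $next_line;
}
# Flush the final line, which the loop never prints.
print $current_line;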

------------8X---------------------------

But that doesn't allow for the <this> and </this> lines, which according to your rules need <br> in front? If you want to ignore lines starting with '<' as well, then:


------------8X---------------------------
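#!/usr/bin/perl -w
# Same as the revised rule, but lines starting with '<' (tags)
# are also left without a <br> prefix.

use strict;

my $current_line;
my $next_line;

# Prime the one-line look-behind with the first input line.
$current_line = <>;

while (<>) {
    $next_line = $_;
    # Only act when the line above starts with neither '-' nor ' '.
    if ((substr($current_line,0,1) ne "-") && (substr($current_line,0,1) ne " ")) {
        if ((substr($next_line,0,1) ne "-") && (substr($next_line,0,1) ne " ") && (substr($next_line,0,1) ne "<")) {
            $next_line = "<br>" . $next_line;
        }
    }
    print $current_line;
    $current_line = $next_line;
}
# Flush the final line, which the loop never prints.
print $current_line;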

------------8X---------------------------

Any other rule variations? ;)

HTH:)
In fact, you've probably got enough of a code skeleton to modify to your needs now - let me know if you have any problems:)
I think everyone is messed up on the rules of what you want to happen.  These are the 2 variations:

1. Basically if a line does start with a space or dash (-) and the line above it doesn't start with a space or dash - then I want to insert a <br>before the current line
--- this breaks on lines 3 and 4 which both DO NOT start with a space or a dash

2. Basically if a line does not start with a space or dash (-) and the line above it doesn't start with a space or dash - then I want to insert a <br>before the current line
--- this breaks on line 2 which DOES start with a dash

What is it exactly you want?  What is the significance of a "-" or a " " (space)?  Maybe a more comprehensive example will help.

Also, do you want a Perl script that you can call from your bash script like pjedmond gave you, or do you want a pure bash implementation?
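To see how the two wordings of the rule behave on the sample from the question, here is a rough sketch. The `apply_rule` and `sample` helpers are made up for illustration (they are not from anyone's script in this thread), and the sketch assumes the first line never gets a <br> because it has no line above it.

```shell
#!/bin/sh
# Sketch only: apply_rule is a hypothetical helper, not code from the thread.
# Usage: apply_rule 1 < file   (rule as first posted)
#        apply_rule 2 < file   (rule as corrected by the asker)
apply_rule() {
  awk -v rule="$1" '
    {
      # marked is 1 when the current line starts with a space or a dash
      marked = ($0 ~ /^[ -]/)
      if (NR > 1 && !prev_marked && ((rule == 1 && marked) || (rule == 2 && !marked)))
        print "<br>" $0
      else
        print
      prev_marked = marked
    }'
}

# The four sample lines from the question.
sample() {
  printf '%s\n' \
    'Thursday, 8-11-2005, 5:00PM-9:00PM' \
    '-Friday, 8-12-2005, 5:00PM-9:00PM' \
    'Saturday, 8-13-2005, 8:00AM-9:00PM' \
    'Sunday, 8-14-2005, 8:00AM-5:00PM'
}

sample | apply_rule 1   # <br> lands only on the -Friday line
sample | apply_rule 2   # <br> lands only on the Sunday line
```

Neither variation reproduces the requested output, where all three lines after the first get a <br>, which is why a clearer statement of the rule is needed.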
Have a go at the pure bash implementation - there is something weird about the '-' char. I know that it has special significance in regexes ([A-Z] etc.), but when you use [^ -] or even just [ -], it starts matching < and other chars. Was driving me nuts - hence the switch to a Perl script:)...and even then, the - causes problems with matching, but the above examples get around it!
Basically --

if [[ $line =~ ^- ]]
then
    echo "whatever"
else
    echo "whatever else"
fi

checks if the string (line) begins with a "-" in bash.

Once I get the exact rules I can work on a solution -- otherwise feel free to use this to formulate it... it shouldn't be too hard once you have this.
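As a rough sketch of where that could go, the loop below applies the asker's corrected wording (neither the current line nor the line above starts with a space or dash). The function name `add_br` is made up, and the sketch assumes the first line is never prefixed. It uses `case` patterns rather than `=~`, so it does not depend on how a given bash version treats quoted regexes, and it also runs under plain sh.

```shell
#!/bin/sh
# Sketch only: add_br is a hypothetical name, not part of the asker's script.
# Reads stdin and writes stdout.
add_br() {
  prev_plain=0                      # no line above the first one
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      ' '*|-*) cur_plain=0 ;;       # starts with a space or a dash
      *)       cur_plain=1 ;;
    esac
    if [ "$cur_plain" -eq 1 ] && [ "$prev_plain" -eq 1 ]; then
      printf '<br>%s\n' "$line"
    else
      printf '%s\n' "$line"
    fi
    prev_plain=$cur_plain
  done
}

# Example: add_br < convert.txt > newfile
```

If the "line above" condition turns out not to matter, the `prev_plain` bookkeeping can simply be dropped.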
> Basically if a line does not start with a space or dash (-) and the line above it doesn't start with a space or dash - then I want to insert a <br>before the current line

sed -e '/^[^ -]/s/.*/<br>&/' file

I omitted the "and the line above it doesn't start with a space or dash" 'cause it doesn't make a big difference in the browser
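A quick way to check what that one-liner does is to feed it a few lines by hand. This is only a sketch: the wrapper name `br_plain_lines` and the shortened sample lines are made up for the demonstration.

```shell
#!/bin/sh
# Sketch only: br_plain_lines is a hypothetical wrapper around the sed command above.
br_plain_lines() {
  sed -e '/^[^ -]/s/.*/<br>&/' "$@"
}

printf '%s\n' 'Thursday' '' '-Friday' 'Saturday' | br_plain_lines
# Output:
# <br>Thursday
#
# -Friday
# <br>Saturday
```

Two things differ from the output asked for in the question: the first line also gets a <br>, and the -Friday line does not. Empty lines are left alone because /^[^ -]/ needs at least one character to match.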
ahoffman,
That's kind of the route I chose as well, but the ^[^ -] bit also seems to match < and some other punctuation chars on my system as well as the -. I'm curious as to what else the - can represent. Hence my attempt at replacing it with \x2D in my above example.
In the given example I don't see spaces, just - (well, spaces should be there, someone said:). Adding the missing < to the character class should be a simple 2 second task ;-)
Keep in mind that \x2d is GNU sed only, and I don't see a reason to use it.
Useful insight - I was using that (\x2d) because the '-' by itself inside the []s was matching a number of other chars as well as the '-' (including '<'s and '>'s). I'm guessing that this is related to [A-Z][0-9] type terminology, and I really wanted to get this to work with sed on my system as it is more 'elegant' than the Perl that I ended up using.

Seems to work fine for Cygwin...ah well - you live and learn. I'm now wondering what a true POSIX sed should do in this situation?

Answers on the back of a postcard?;)
- as last character in a class is used as itself, not as a range operator (at least with reliable regex engines:)
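The difference is easy to see with grep. The sketch below uses made-up function names and test lines, and forces the C locale so that ranges follow byte order. Whether a stray range is what actually caused the odd matches reported earlier in the thread is not known; this only shows how a dash between two characters changes the meaning.

```shell
#!/bin/sh
# Sketch only: function names and sample lines are hypothetical.
lines() { printf '%s\n' '<pre>' '-item' ' indented' 'plain'; }

# Dash last in the class: matches a literal space or a literal dash.
count_dash_last()  { LC_ALL=C grep -c '^[ -]'; }

# Dash between two characters: a range from space (0x20) to '<' (0x3C),
# which also covers '-', digits and most punctuation.
count_dash_range() { LC_ALL=C grep -c '^[ -<]'; }

lines | count_dash_last    # prints 2
lines | count_dash_range   # prints 3
```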
Thanks for all the responses


Good call guys
I guess it doesn't matter if the line above it doesn't start with a space or dash

I will try some suggestions and get back to you
ASKER CERTIFIED SOLUTION
pjedmond
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
good improvement  (except the UUOCA:)
UUOCA?....Obviously, I'm being really thick at the moment...Just got up!
useless use of cat award
;-)
yea pj that looks pretty good

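sed '/^[^ -]/s/^/<br>/' convert.txt

Can you explain what the /^[^ -]/ at the beginning tells it to do?

Thanks
Sure - the leading ^ matches the beginning of the line ($ matches the end of the line). Inside the [ ], a leading ^ means NOT, so [^ -] matches any single character that is not a <space> or a -. Together, /^[^ -]/ selects the lines whose first character is neither. The s command then substitutes the beginning of the line with <br>. (In the replacement text ^ is just a literal character, so it should not appear there.)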
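One way to convince yourself of what each half does is to run the address and the substitution separately. This is a sketch with made-up helper names and shortened sample lines, not something taken from the accepted solution.

```shell
#!/bin/sh
# Sketch only: all function names here are hypothetical.
sample() { printf '%s\n' 'Thursday' '-Friday' ' indented' 'Sunday'; }

# Address alone: print only the lines that /^[^ -]/ selects.
select_plain() { sed -n '/^[^ -]/p'; }

# Substitution alone: with no address, every line gets the prefix.
prefix_all() { sed 's/^/<br>/'; }

# Address plus substitution: only the selected lines get the prefix.
prefix_plain() { sed '/^[^ -]/s/^/<br>/'; }

sample | select_plain   # Thursday and Sunday
sample | prefix_all     # all four lines prefixed
sample | prefix_plain   # only Thursday and Sunday prefixed
```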

I enjoyed this Q:)

HTH