  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 380
  • Last Modified:

bash place <br> in front of certain lines in file

I am converting an HTML document that uses <pre> tags into one that will not have <pre> tags.

Here is what I need:

<this>
Thursday, 8-11-2005, 5:00PM-9:00PM
-Friday, 8-12-2005, 5:00PM-9:00PM
Saturday, 8-13-2005, 8:00AM-9:00PM
Sunday, 8-14-2005, 8:00AM-5:00PM
</this>

<to look like this>
Thursday, 8-11-2005, 5:00PM-9:00PM
<br>-Friday, 8-12-2005, 5:00PM-9:00PM
<br>Saturday, 8-13-2005, 8:00AM-9:00PM
<br>Sunday, 8-14-2005, 8:00AM-5:00PM
</to look like this>

Basically if a line does start with a space or dash (-) and the line above it doesn't start with a space or dash - then I want to insert a <br>before the current line

any idea?

thanks
jculkincys
 
AutogardCommented:
You mentioned in the other question that you are trying to convert HTML docs to a wiki format.  What is your main method for doing all of this?  Are you trying to do it all in bash scripts, or are you open to using another scripting language like Python/Perl/PHP?  I would recommend something like that for such a large text processing project.  Just a thought.

Well, on to tackle this problem too :)
0
 
jculkincysAuthor Commented:
Yes, I just have a large bash script.

If I had known it was going to get this big I would have used something different, but I am almost there. Speed is not really an issue since this is only going to run once.

Thanks for the input.
0
 
pjedmondCommented:
"Basically if a line does start with a space or dash (-) and the line above it doesn't start with a space or dash - then I want to insert a <br>before the current line"

Saturday and Sunday do not start with a space or dash? Why was <br> inserted?
0
 
jculkincysAuthor Commented:
pjedmond, sorry about that, I made a typo. I meant to say:

Basically if a line does not start with a space or dash (-) and the line above it doesn't start with a space or dash - then I want to insert a <br>before the current line

Sorry for the confusion
0
 
pjedmondCommented:
I really wanted to do this with sed....but it wasn't to be. This was as close as I got:

cat convert.txt | sed '/^[^\x2D ]/N; s/\n /\n <br>/'

- signs are a pain in the $%^&

Eventually went for the perl option:

------------8X---------------------------
#!/usr/bin/perl -w

use strict;

my $current_line;
my $next_line;

$current_line=<>;

while(<>) {
$next_line=$_;
# Do work here
        if ((substr($current_line,0,1) ne "-") && (substr($current_line,0,1) ne " ")) {
                if ((substr($next_line,0,1) eq "-") || (substr($next_line,0,1) eq " ")) {
                        $next_line="<br>".$next_line;
                }
        }
# Finish work
print $current_line;
$current_line=$next_line;
}
print $current_line;
------------8X---------------------------

This implements the rule as originally stated: "Basically if a line does start with a space or dash (-) and the line above it doesn't start with a space or dash - then I want to insert a <br>before the current line"

Usage:

cat filename | perl myscript.pl > newfile

HTH:)

0
 
pjedmondCommented:
Revised to:

Basically if a line does not start with a space or dash (-) and the line above it doesn't start with a space or dash - then I want to insert a <br>before the current line

------------8X---------------------------
#!/usr/bin/perl -w

use strict;

my $current_line;
my $next_line;

$current_line=<>;

while(<>) {
$next_line=$_;
# Do work here
        if ((substr($current_line,0,1) ne "-") && (substr($current_line,0,1) ne " ")) {
                if ((substr($next_line,0,1) ne "-") && (substr($next_line,0,1) ne " ")) {
                        $next_line="<br>".$next_line;
                }
        }
# Finish work
print $current_line;
$current_line=$next_line;
}
print $current_line;

------------8X---------------------------

But that doesn't allow for the <this> </this>...which according to your rules need <br> in front? If you want to ignore lines starting with '<' as well then:


------------8X---------------------------
#!/usr/bin/perl -w

use strict;

my $current_line;
my $next_line;

$current_line=<>;

while(<>) {
$next_line=$_;
# Do work here
        if ((substr($current_line,0,1) ne "-") && (substr($current_line,0,1) ne " ")) {
                if ((substr($next_line,0,1) ne "-") && (substr($next_line,0,1) ne " ") && (substr($next_line,0,1) ne "<")) {
                        $next_line="<br>".$next_line;
                }
        }
# Finish work
print $current_line;
$current_line=$next_line;
}
print $current_line;

------------8X---------------------------

Any other rule variations? ;)

HTH:)
0
 
pjedmondCommented:
In fact, you've probably got enough of a code skeleton to modify to your needs now - let me know if you have any problems:)
0
 
AutogardCommented:
I think everyone is messed up on the rules of what you want to happen.  These are the 2 variations:

1. Basically if a line does start with a space or dash (-) and the line above it doesn't start with a space or dash - then I want to insert a <br>before the current line
--- this breaks on lines 3 and 4 which both DO NOT start with a space or a dash

2. Basically if a line does not start with a space or dash (-) and the line above it doesn't start with a space or dash - then I want to insert a <br>before the current line
--- this breaks on line 2 which DOES start with a dash

What is it exactly you want?  What is the significance of a "-" or a " " (space)?  Maybe a more comprehensive example will help.

Also, do you want a Perl script that you can call from your bash script, like the one pjedmond gave you, or do you want a pure bash implementation?
0
 
pjedmondCommented:
Have a go at the pure bash implementation - there is something weird about the '-' char. I know that it has special significance in regexes...[A-Z] etc, but when you start with [^ -] or even just [ -], it starts matching < and other chars. It was driving me nuts - hence the switch to a Perl script:)...and even then, the - causes problems with matching, but the above examples get around it!
0
 
AutogardCommented:
Basically --

if [[ $line =~ ^- ]]
then
    echo "whatever"
else
    echo "whatever else"
fi

checks if the string (line) begins with a "-" in bash.

Once I get the exact rules I can work on a solution -- otherwise feel free to use this to formulate it... it shouldn't be too hard once you have this.
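
For reference, here is a minimal sketch of how the revised rule (variation 2: the current line and the line above both do not start with a space or dash) could look without calling sed or perl. It uses `case` patterns instead of `=~`, which sidesteps differences in how bash versions treat a quoted regex. The function name `add_br` is made up for illustration, and treating the first line as "nothing above it, so no <br>" is an assumption, not something the asker specified.

```shell
# add_br: hypothetical helper, reads lines on stdin and writes them to stdout.
# A line gets "<br>" prepended only if neither it nor the previous line
# starts with a space or a dash.
add_br() {
    # Assumption: the first line has no line above it, so it never gets <br>.
    prev_special=1
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            ' '*|-*)
                # Line starts with a space or dash: print unchanged.
                printf '%s\n' "$line"
                prev_special=1
                ;;
            *)
                if [ "$prev_special" -eq 1 ]; then
                    printf '%s\n' "$line"
                else
                    printf '<br>%s\n' "$line"
                fi
                prev_special=0
                ;;
        esac
    done
}

# Usage with the sample data from the question:
printf '%s\n' \
    'Thursday, 8-11-2005, 5:00PM-9:00PM' \
    '-Friday, 8-12-2005, 5:00PM-9:00PM' \
    'Saturday, 8-13-2005, 8:00AM-9:00PM' \
    'Sunday, 8-14-2005, 8:00AM-5:00PM' | add_br
# Thursday, 8-11-2005, 5:00PM-9:00PM
# -Friday, 8-12-2005, 5:00PM-9:00PM
# Saturday, 8-13-2005, 8:00AM-9:00PM
# <br>Sunday, 8-14-2005, 8:00AM-5:00PM
```

Note that only Sunday gets a <br> here, which is not the output the asker's example shows. That mismatch is exactly why the rules need pinning down first.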
0
 
ahoffmannCommented:
> Basically if a line does not start with a space or dash (-) and the line above it doesn't start with a space or dash - then I want to insert a <br>before the current line

sed -e '/^[^ -]/s/.*/<br>&/' file

I omitted the "and the line above it doesn't start with a space or dash" 'cause it doesn't make a big difference in the browser
0
 
pjedmondCommented:
ahoffmann,
That's kind of the route I chose as well, but on my system the ^[^ -] bit also seems to match < and some other punctuation chars as well as the -. I'm curious as to what else the - can represent. Hence my attempt at replacing it with \x2D in my example above.
0
 
ahoffmannCommented:
In the given example I don't see spaces, just - (well, spaces should be there, someone said:). Adding the missing < to the character class should be a simple 2 second task ;-)
Keep in mind that \x2d works in GNU sed only, and I don't see a reason to use it.
0
 
pjedmondCommented:
Useful insight - I was using that (\x2d) because the '-' by itself inside the []s was matching a number of other chars as well as the '-' (including '<'s and '>'s). I'm guessing that this is related to [A-Z][0-9] type terminology, and I really wanted to get this to work with sed on my system as it is more 'elegant' than the Perl that I ended up using.

Seems to work fine for Cygwin...ah well - you live and learn. I'm now wondering what a true POSIX sed should do in this situation?

Answers on the back of a postcard?;)
0
 
ahoffmannCommented:
- as the last character in a class is used as itself, not as a range operator (at least with reliable regex engines:)
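
A quick way to check this on your own system is the sketch below. It compares a class with the dash last against one where the dash sits between two characters and therefore forms a range. The helper name `sample` and the test strings are made up for illustration, and `LC_ALL=C` is set so that the range is interpreted by byte value. This does not claim to explain what pjedmond saw, it only shows the difference between the two forms.

```shell
# Hypothetical test data: one line each containing '-', '<', a space, or none.
sample() { printf '%s\n' 'a-b' 'a<b' 'a b' 'ab'; }

# Dash is the last character in the class: it is literal.
# Matches lines containing a space or a dash.
sample | LC_ALL=C grep -c '[ -]'     # prints 2

# Dash sits between space and '<': it is a range operator.
# In the C locale this covers every byte from space (0x20) to '<' (0x3C),
# which includes '-' (0x2D) as well as '<' itself.
sample | LC_ALL=C grep -c '[ -<]'    # prints 3
```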
0
 
jculkincysAuthor Commented:
Thanks for all the responses


Good call guys
I guess it doesn't matter if the line above it doesn't start with a space or dash

I will try some suggestions and get back to you
0
 
pjedmondCommented:
If the line above doesn't matter, then it's easy:)

cat convert.txt | sed '/^[^ -]/s/^/^<br>/'

or

sed '/^[^ -]/s/^/^<br>/' convert.txt

Not sure I should be bragging about this, but mine's smaller than anyone else's;)....the regex that is;)

Anyone got anything smaller? Answers on the back of the postage stamp this time:)
0
 
ahoffmannCommented:
good improvement (except the UUOCA:)
0
 
pjedmondCommented:
UUOCA?....Obviously, I'm being really thick at the moment...Just got up!
0
 
ahoffmannCommented:
useless use of cat award
;-)
0
 
jculkincysAuthor Commented:
yea pj that looks pretty good

sed '/^[^ -]/s/^/^<br>/' convert.txt

Can you explain what the /^[^ -]/ at the beginning tells it to do?

Thanks
0
 
pjedmondCommented:
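Sure - the first ^ matches the beginning of the line ($ matches the end of the line). Inside the [ ], a leading ^ means NOT, so [^ -] matches any single character that is not a <space> or a -. Together, /^[^ -]/ selects only the lines whose first character is neither a space nor a dash. On those lines, s/^/.../ substitutes the (empty) beginning of the line with the replacement text, i.e. it inserts that text in front of the line.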

I enjoyed this Q:)

HTH
0
 
AutogardCommented:
May I make one suggestion:

sed '/^[^ -]/s/^/^<br>/' convert.txt --> you will want to get rid of the last "^". The replacement part of s/// is not a regular expression, so that "^" is not an anchor there and gets inserted literally in front of the <br>.

So: sed '/^[^ -]/s/^/<br>/' convert.txt

Great solution pjedmond!
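
To see what the corrected command does, here is a small sketch that runs it on the sample data from the question. The wrapper name `insert_br` is made up for illustration; the sed expression is the one above.

```shell
# insert_br: hypothetical wrapper around the corrected sed command.
# Reads the named files, or stdin if none are given.
insert_br() {
    sed '/^[^ -]/s/^/<br>/' "$@"
}

printf '%s\n' \
    'Thursday, 8-11-2005, 5:00PM-9:00PM' \
    '-Friday, 8-12-2005, 5:00PM-9:00PM' \
    'Saturday, 8-13-2005, 8:00AM-9:00PM' \
    'Sunday, 8-14-2005, 8:00AM-5:00PM' | insert_br
# <br>Thursday, 8-11-2005, 5:00PM-9:00PM
# -Friday, 8-12-2005, 5:00PM-9:00PM
# <br>Saturday, 8-13-2005, 8:00AM-9:00PM
# <br>Sunday, 8-14-2005, 8:00AM-5:00PM
```

Two things to be aware of: the first line also gets a <br>, and the line starting with a dash gets none, so this follows the rule as finally agreed rather than the exact output shown in the question.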
0
