Solved

bash place <br> in front of certain lines in file

Posted on 2006-06-15
Last Modified: 2010-04-20
I am converting an HTML document that uses <pre> tags into one that will not have <pre> tags.

Here is what I need:

<this>
Thursday, 8-11-2005, 5:00PM-9:00PM
-Friday, 8-12-2005, 5:00PM-9:00PM
Saturday, 8-13-2005, 8:00AM-9:00PM
Sunday, 8-14-2005, 8:00AM-5:00PM
</this>

<to look like this>
Thursday, 8-11-2005, 5:00PM-9:00PM
<br>-Friday, 8-12-2005, 5:00PM-9:00PM
<br>Saturday, 8-13-2005, 8:00AM-9:00PM
<br>Sunday, 8-14-2005, 8:00AM-5:00PM
</to look like this>

Basically if a line does start with a space or dash (-) and the line above it doesn't start with a space or dash - then I want to insert a <br>before the current line

any idea?

thanks
jculkincys
Question by:jculkincys
23 Comments
 
LVL 8

Expert Comment

by:Autogard
You mentioned in the other question that you are trying to convert HTML docs to a wiki format.  What is your main method for doing all of this?  Are you trying to do it all in bash scripts, or are you open to using another scripting language such as Python, Perl, or PHP?  I would recommend something like that for such a large text-processing project.  Just a thought.

Well, on to tackle this problem too :)
 
LVL 2

Author Comment

by:jculkincys
Yes I just have a large bash script.

If I had known it was going to get this big I would have used something different, but I am almost there. Speed is not really an issue since this is only going to run once.

Thanks for the input.
 
LVL 22

Expert Comment

by:pjedmond
"Basically if a line does start with a space or dash (-) and the line above it doesn't start with a space or dash - then I want to insert a <br>before the current line"

Saturday and Sunday do not start with a space or dash? Why was <br> inserted?
 
LVL 2

Author Comment

by:jculkincys
pjedmond, sorry about that. I made a bad typo. I meant to say:

Basically if a line does not start with a space or dash (-) and the line above it doesn't start with a space or dash - then I want to insert a <br>before the current line

Sorry for the confusion
 
LVL 22

Expert Comment

by:pjedmond
I really wanted to do this with sed....but it wasn't to be. This was as close as I got:

cat convert.txt | sed '/^[^\x2D ]/N; s/\n /\n <br>/'

- signs are a pain in the $%^&

Eventually went for the perl option:

------------8X---------------------------
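#!/usr/bin/perl -w

use strict;

my $current_line;   # line waiting to be printed (may already carry a <br>)
my $current_first;  # first character of that line as it was read
my $next_line;

$current_line = <>;
$current_first = substr($current_line,0,1);

while (<>) {
    $next_line = $_;
    my $next_first = substr($next_line,0,1);
    # Do work here: the line above is plain and this line starts with "-" or " ".
    # The test uses the characters as read, so an inserted <br> on the line
    # above does not make it look like a plain line.
    if (($current_first ne "-") && ($current_first ne " ")) {
        if (($next_first eq "-") || ($next_first eq " ")) {
            $next_line = "<br>" . $next_line;
        }
    }
    # Finish work
    print $current_line;
    $current_line  = $next_line;
    $current_first = $next_first;
}
print $current_line;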
------------8X---------------------------

Basically if a line does start with a space or dash (-) and the line above it doesn't start with a space or dash - then I want to insert a <br>before the current line

cat filename | myscript.pl > newfile

HTH:)

 
LVL 22

Expert Comment

by:pjedmond
Revised to :

Basically if a line does not start with a space or dash (-) and the line above it doesn't start with a space or dash - then I want to insert a <br>before the current line

------------8X---------------------------
#!/usr/bin/perl -w

use strict;

my $current_line;
my $next_line;

$current_line = <>;

while (<>) {
    $next_line = $_;
    # Do work here: tag the line if neither it nor the line above starts with "-" or " "
    if ((substr($current_line,0,1) ne "-") && (substr($current_line,0,1) ne " ")) {
        if ((substr($next_line,0,1) ne "-") && (substr($next_line,0,1) ne " ")) {
            $next_line = "<br>" . $next_line;
        }
    }
    # Finish work
    print $current_line;
    $current_line = $next_line;
}
print $current_line;

------------8X---------------------------

But that doesn't allow for the <this> </this>...which according to your rules need <br> in front? If you want to ignore lines starting with '<' as well then:


------------8X---------------------------
#!/usr/bin/perl -w

use strict;

my $current_line;
my $next_line;

$current_line = <>;

while (<>) {
    $next_line = $_;
    # Do work here: as before, but lines starting with "<" are also left alone
    if ((substr($current_line,0,1) ne "-") && (substr($current_line,0,1) ne " ")) {
        if ((substr($next_line,0,1) ne "-") && (substr($next_line,0,1) ne " ") && (substr($next_line,0,1) ne "<")) {
            $next_line = "<br>" . $next_line;
        }
    }
    # Finish work
    print $current_line;
    $current_line = $next_line;
}
print $current_line;

------------8X---------------------------

Any other rule variations? ;)

HTH:)
 
LVL 22

Expert Comment

by:pjedmond
In fact, you've probably got enough of a code skeleton to modify to your needs now - let me know if you have any problems :)
 
LVL 8

Expert Comment

by:Autogard
I think everyone is messed up on the rules of what you want to happen.  These are the 2 variations:

1. Basically if a line does start with a space or dash (-) and the line above it doesn't start with a space or dash - then I want to insert a <br>before the current line
--- this breaks on lines 3 and 4 which both DO NOT start with a space or a dash

2. Basically if a line does not start with a space or dash (-) and the line above it doesn't start with a space or dash - then I want to insert a <br>before the current line
--- this breaks on line 2 which DOES start with a dash

What is it exactly you want?  What is the significance of a "-" or a " " (space)?  Maybe a more comprehensive example will help.

Also, do you want a Perl script that you can call from your bash script, like the one pjedmond gave you, or do you want a pure bash implementation?
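
To make the two readings above concrete, here is a rough sketch that applies each variant to the four sample lines from the question. The function name apply_rule and its numeric argument are made up for this illustration, and awk is used only because it makes the "line above" test short; it is not something anyone in the thread proposed.

```shell
#!/usr/bin/env bash
# apply_rule is a hypothetical helper name, not something from the thread.
#   apply_rule 1: tag a line that starts with space/dash when the line above does not
#   apply_rule 2: tag a line when neither it nor the line above starts with space/dash
apply_rule() {
  awk -v rule="$1" '
    # marked(s): true when s starts with a space or a dash
    function marked(s) { return s ~ /^[ -]/ }
    {
      out = $0
      if (NR > 1) {
        if (rule == 1 &&  marked($0) && !marked(prev)) out = "<br>" $0
        if (rule == 2 && !marked($0) && !marked(prev)) out = "<br>" $0
      }
      print out
      prev = $0   # compare against the line as read, not the tagged copy
    }
  '
}

sample='Thursday, 8-11-2005, 5:00PM-9:00PM
-Friday, 8-12-2005, 5:00PM-9:00PM
Saturday, 8-13-2005, 8:00AM-9:00PM
Sunday, 8-14-2005, 8:00AM-5:00PM'

for rule in 1 2; do
  printf 'rule %s:\n' "$rule"
  printf '%s\n' "$sample" | apply_rule "$rule"
done
```

On this sample, variant 1 tags only the -Friday line and variant 2 tags only the Sunday line, so neither reproduces the wanted output, which is why a clearer statement of the rule is needed.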
 
LVL 22

Expert Comment

by:pjedmond
Have a go at the pure bash implementation - there is something weird about the '-' char. I know that it has special significance in regexes ([A-Z] etc.), but when you use [^ -] or even just [ -], it starts matching < and other chars. It was driving me nuts - hence the switch to a Perl script :) ...and even then, the - causes problems with matching, but the above examples get around it!
 
LVL 8

Expert Comment

by:Autogard
Basically --

if [[ $line =~ ^- ]]   # pattern left unquoted: bash 3.2+ matches a quoted pattern as a literal string
then
    echo "whatever"
else
    echo "whatever else"
fi

checks if the string (line) begins with a "-" in bash.

Once I get the exact rules I can work on a solution -- otherwise feel free to use this to formulate it... it shouldn't be too hard once you have this.
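
For what it's worth, a pure bash loop built around that test could look roughly like the sketch below. It assumes the simplest reading of the rule (tag every line that does not start with a space or a dash, ignoring the line above), and add_br is a made-up function name.

```shell
#!/usr/bin/env bash
# add_br: hypothetical helper; reads stdin and writes the converted text to stdout.
# Assumed rule: put <br> in front of every line that does not start with a
# space or a dash. Empty lines are left alone.
add_br() {
  local line
  # Keep the pattern in a variable so the space inside [...] needs no escaping.
  # The expansion below must stay unquoted for bash to treat it as a regex.
  local re='^[^ -]'
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line =~ $re ]]; then
      printf '<br>%s\n' "$line"
    else
      printf '%s\n' "$line"
    fi
  done
}

printf '%s\n' 'Thursday, 8-11-2005' '-Friday, 8-12-2005' ' indented' 'Sunday, 8-14-2005' | add_br
```

IFS= and read -r keep leading spaces and backslashes intact, which matters here because the rule depends on the first character of each line.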
 
LVL 51

Expert Comment

by:ahoffmann
> Basically if a line does not start with a space or dash (-) and the line above it doesn't start with a space or dash - then I want to insert a <br>before the current line

sed -e '/^[^ -]/s/.*/<br>&/' file

I omitted the "and the line above it doesn't start with a space or dash" 'cause it doesn't make a big difference in the browser
 
LVL 22

Expert Comment

by:pjedmond
ahoffmann,
That's kind of the route I chose as well, but the ^[^ -] bit also seems to match < and some other punctuation chars on my system as well as the -. I'm curious as to what else the - can represent. Hence my attempt at replacing it with \x2D in my example above.
 
LVL 51

Expert Comment

by:ahoffmann
In the given example I don't see spaces, just - (well, spaces should be there, someone said :), and adding the missing < to the character class should be a simple 2-second task ;-)
Keep in mind that \x2d is GNU sed only, and I don't see a reason to use it.
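
As a sketch of that 2-second task: adding < to the class while keeping the dash last leaves tag lines such as <this> alone. The function name add_br_skip_tags is made up for the example.

```shell
#!/usr/bin/env bash
# add_br_skip_tags: hypothetical helper; reads stdin.
# Skips lines that start with a space, "<" or "-". The dash stays last
# inside [...] so that it is taken literally rather than as a range.
add_br_skip_tags() {
  sed '/^[^ <-]/s/^/<br>/'
}

printf '%s\n' '<this>' 'Thursday, 8-11-2005' '-Friday, 8-12-2005' '</this>' | add_br_skip_tags
```

A side effect of skipping < is that lines already carrying a <br> are not selected again, so running the command twice over the same file should not stack up tags.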
 
LVL 22

Expert Comment

by:pjedmond
Useful insight - I was using that (\x2d) because the '-' by itself inside the []s was matching a number of other chars as well as the '-' (including '<'s and '>'s). I'm guessing that this is related to [A-Z][0-9] type terminology, and I really wanted to get this to work with sed on my system as it is more 'elegant' than the Perl that I ended up using.

Seems to work fine for Cygwin...ah well - you live and learn. I'm now wondering what a true POSIX sed should do in this situation?

Answers on the back of a postcard?;)
 
LVL 51

Expert Comment

by:ahoffmann
- as the last character in a class is taken as itself, not as a range operator (at least with reliable regex engines :)
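
A small way to check this on your own system is sketched below. It does not explain what pjedmond saw; it only shows the difference between a dash placed last and a dash placed between two characters. The helper names chars and count_matches are invented for the demo, and LC_ALL=C is set so that ranges follow ASCII order.

```shell
#!/usr/bin/env bash
# Five one-character test lines: space, dash, "<", "+", "a".
chars() { printf '%s\n' ' ' '-' '<' '+' 'a'; }

# count_matches PATTERN: how many of the test lines match in the C locale.
# Both function names are hypothetical helpers for this demo.
count_matches() { chars | LC_ALL=C grep -c -- "$1"; }

count_matches '^[ -]$'    # dash last: the class holds only space and dash
count_matches '^[ -<]$'   # dash in the middle: a range from space to "<"
```

With the dash last, only the space and the dash lines match (count 2). With the dash in the middle, the class becomes the ASCII range from space to <, which also takes in + and < (count 4), so a pattern written that way would match punctuation as well.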
 
LVL 2

Author Comment

by:jculkincys
Thanks for all the responses


Good call guys
I guess it doesn't matter if the line above it doesn't start with a space or dash

I will try some suggestions and get back to you
 
LVL 22

Accepted Solution

by:pjedmond
pjedmond earned 400 total points
If the line above doesn't matter, then it's easy:)

cat convert.txt | sed '/^[^ -]/s/^/^<br>/'

or

sed '/^[^ -]/s/^/^<br>/' convert.txt

Not sure I should be bragging about this, but mine's smaller than anyone else's ;) ....the regex, that is ;)

Anyone got anything smaller? Answers on the back of the postage stamp this time:)
 
LVL 51

Expert Comment

by:ahoffmann
good improvement (except the UUOCA :)
 
LVL 22

Expert Comment

by:pjedmond
UUOCA?....Obviously, I'm being really thick at the moment...Just got up!
 
LVL 51

Expert Comment

by:ahoffmann
useless use of cat award
;-)
 
LVL 2

Author Comment

by:jculkincys
yea pj that looks pretty good

sed '/^[^ -]/s/^/^<br>/' convert.txt

Can you explain what the /^[^ -]/ at the beginning tells it to do?

Thanks
 
LVL 22

Expert Comment

by:pjedmond
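Sure - the first ^ matches the beginning of the line ($ matches the end of the line). Inside the [ ], a leading ^ means NOT, so [^ -] matches any single character that is not a <space> or a -. Together, /^[^ -]/ selects the lines whose first character is neither of those. The s then substitutes the beginning of each selected line with <br>, which puts the <br> in front of the line.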

I enjoyed this Q:)

HTH
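
One way to see what the address part does by itself is to print the selected lines without changing them. This is only a sketch; the function names selected and skipped are made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper names; each reads stdin.
selected() { sed -n '/^[^ -]/p'; }    # lines the address matches (these would get <br>)
skipped()  { sed -n '/^[^ -]/!p'; }   # all other lines (these are left untouched)

sample='Thursday, 8-11-2005
-Friday, 8-12-2005
 indented line
Sunday, 8-14-2005'

echo 'selected:'
printf '%s\n' "$sample" | selected
echo 'skipped:'
printf '%s\n' "$sample" | skipped
```

Note that an empty line is never selected, because [^ -] has to match an actual character.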
 
LVL 8

Assisted Solution

by:Autogard
Autogard earned 100 total points
May I make one suggestion:

sed '/^[^ -]/s/^/^<br>/' convert.txt --> you will want to get rid of the last "^": the replacement part of s is not a regular expression, so that "^" is inserted into the output literally.

So: sed '/^[^ -]/s/^/<br>/' convert.txt

Great solution pjedmond!
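
To see the difference between the two commands, here is a quick side-by-side sketch. The function names are invented for the comparison and both read stdin instead of convert.txt.

```shell
#!/usr/bin/env bash
# Hypothetical helper names wrapping the two commands from the thread.
with_stray_caret() { sed '/^[^ -]/s/^/^<br>/'; }   # replacement text is "^<br>"
corrected()        { sed '/^[^ -]/s/^/<br>/'; }    # replacement text is "<br>"

printf '%s\n' 'Saturday, 8-13-2005' '-Friday, 8-12-2005' | with_stray_caret
printf '%s\n' 'Saturday, 8-13-2005' '-Friday, 8-12-2005' | corrected
```

The first command turns the Saturday line into ^<br>Saturday, 8-13-2005 while the second gives <br>Saturday, 8-13-2005; the -Friday line is untouched in both cases.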