Trim files in CMD and concatenate them into one file

Hello everyone,

I have a folder that contains a lot of different files.
I want to trim out the same section from each of XXX specific files.
The names of these files run from News-x to News-xxx.
The content of each file looks like this:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">

<meta http-equiv="content-type" content="text/html;charset=utf-8" />
<head>
<meta content="DYNAMIC" name="DOCUMENT-STATE" />
<meta content="Copyright (c) 'Церковь Вифлеем' - info@vflm.by" name="Copyright" />
<link href="http://xxxxxxxxx.com/App_Themes/main/layout.css" type="text/css" rel="stylesheet" />
<link href="http://xxxxxxxxx.com/App_Themes/main/style.css" type="text/css" rel="stylesheet" />
<link href="http://xxxxxxxxx.com/App_Themes/main/tables.css" type="text/css" rel="stylesheet" />
<meta content="Заметки церкви Вифлеем, Новый епископ Союза ЕХБ Беларуси – В.Н. Крутько" name="keywords" />
<meta content="Новости официального сайта церкви Вифлеем, 20 марта 2010 года в нашей церкви прошел XIV съезд Союза евангельских христиан-баптистов Беларуси." name="description" />
<link rel="icon" href="favicon.ico" type="image/x-icon" />
<link rel="shortcut icon" href="favicon.ico" type="image/x-icon" />
<meta content="info@xxxxxxxxx.com" name="Reply-to" />
<title>
	Новости - Церковь Вифлеем - Новый епископ Союза ЕХБ Беларуси – В.Н. Крутько
</title></head>
<body id="news_page">
<form method="post" action="http://xxxxxxxxx.com/News-xxx.aspx" id="aspnetForm_news">
<div>
</div>
<div id="main_wrapper"><div class="main_wr_l"><div class="main_wr_r"><div id="main"><div class="body-bg">
<!-- HEADER -->
<div id="header">
<div class="row-1">
<div id="logo">
<h1><a href="Default.html">Церковь «Вифлеем»</a></h1>
<h5>Библейская Церковь Евангельских Христиан Баптистов «Вифлеем», г.Минск, Беларусь.</h5>
</div>
<div id="top-menu">
<!-- top menu -->
<ul>
	<li class="resources"><a href="Resources.html">Ресурсы</a></li>
	<li class="beliefs"><a href="Beliefs.html">Вероучение</a></li>
	<li class="cal"><a href="Calendar.html">Календарь</a></li>
	<li class="about"><a href="About.html">О нас</a></li>
</ul>
<!-- end menu -->
</div>
</div><div class="nav-pass"><table cellpadding="0" cellspacing="0" style="border-width:0;" ><tr><td><a href="Default.html">Главная</a> ></td><td  style="white-space:nowrap;">Новости</td></tr></table></div>
</div>
<!-- END HEADER -->
<!-- CONTENT -->
<div id="content"><div class="box"><div class="border-top"><div class="border-right"><div class="border-bot"><div class="border-left"><div class="left-top-corner"><div class="right-top-corner"><div class="right-bot-corner"><div class="left-bot-corner"><div class="wrapper">
<div class="col-caption">
<h3>Новости</h3>
<h4>Новости Церкви «Вифлеем»</h4>
</div>
<div class="col-caption"><h2>Новый епископ Союза ЕХБ Беларуси – В.Н. Крутько</h2></div>
<div class="col-1">
 <div class="img-indent"><div class="img-box1"><img src="img/vflmNews/picture_14_syod.jpg" alt="picture_14_syod.jpg" /></div></div>
 <a href="News.html" class="link">читать все новости</a><br /><br />
</div>
<div class="col-2">
 <h4>20 марта 2010 г.</h4>
<b>20 марта 2010 года в нашей церкви прошел XIV съезд Союза евангельских христиан-баптистов Беларуси. </b><br /><p>В «Вифлееме» собрались представители всех церквей братства, всего 291 делегат.  По итогам съезда старший пресвитер нашей церкви Виктор Никодимович Крутько был избран новым председателем Союза ЕХБ Беларуси. Генеральным секретарем стал Николай Васильевич Синковец, который занимал пост председателя до этого. Заместителем председателя переизбран Иосиф Николаевич Рачковский.</p>                                                     
</div>
</div></div></div></div></div></div></div></div></div></div></div>
<!-- END CONTENT -->
<!-- FOOTER -->
<div id="footer">
<div class="indent">
<div class="wrapper"><div class="col-1">
	&nbsp;</div>
<div class="col-2">
	<ul>
		<li>
			<noindex><img alt="Минская Богословская Семинария" height="31" src="img/stxt/banner_seminary.jpg" width="88" /></noindex></li>
		<li>
			<noindex><img alt="Союз Евангельских Христиан-Баптистов" height="31" src="img/stxt/banner_baptist.jpg" width="88" /></a></noindex></li>
		<li>
			<noindex><img alt="журнал Крынiца Жыцця" height="31" src="img/stxt/banner_krinitsa.jpg" width="88" /></noindex></li>
		<li>
			<noindex><img alt="Евангелие и Реформация" height="31" src="../epbook.by/shop/images/epbook_banner_88x31.jpg" width="88" /></noindex></li>

	</ul>
</div>
<div class="col-3">
	<a href="index.html">Церковь &laquo;Вифлеем&raquo;</a> &copy; 2010 | <a href="SiteMap.html">Карта сайта</a></div>
</div>
</div>
</div>
<!-- END FOOTER -->
</div></div></div></div></div>
</form>
</body>
</html>


I need to extract only the following div boxes: <div class="col-caption">, <div class="col-1"> and <div class="col-2">.
It would be nice to combine all the processed files into one txt file, with some space between them.

I think there is an easy, old-school way to do it.

I appreciate your help.
SSupreme Asked:
 
SStory Commented:
Well, those tools already exist, though I'll admit they can take some time to master. The alternative is to write a simple program, in something like VB.NET or C or whatever, that reads the input one line at a time. If a line starts with
<div and has the class you want (class=whatever), set a flag (indiv=true),
output that line and every following line until you hit a line that starts with
</div>

That's the quick method I'd use. So either you use tools like grep and awk, which can be tricky, or you write your own. For me, by far the simplest would be to write my own in whatever language: read each line, output the appropriate lines until hitting a closing div, and start over at the next match.
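
A minimal Python sketch of the line-by-line approach described above. It is an illustration under assumptions, not a tested solution for every file: the function name `extract_div_blocks` is made up, the scan is limited to the region between the `<!-- CONTENT -->` and `<!-- END CONTENT -->` comments seen in the sample file (so the footer's own `col-1`/`col-2` divs are skipped), and a block is treated as complete when its opening line also ends with `</div>`.

```python
def extract_div_blocks(lines, wanted=("col-caption", "col-1", "col-2")):
    """Keep every wanted <div> block, reading the input one line at a time."""
    # Assumption: each wanted block starts on a line beginning with its <div> tag.
    openers = tuple(f'<div class="{name}"' for name in wanted)
    kept = []
    in_content = False  # True between the CONTENT markers of the sample file
    in_div = False      # True while inside a wanted block (the "indiv" flag)
    for raw in lines:
        line = raw.rstrip("\r\n")
        text = line.strip()
        if text == "<!-- CONTENT -->":
            in_content = True
        elif text == "<!-- END CONTENT -->":
            in_content = False
        elif in_div:
            kept.append(line)
            # A line that starts with </div> ends the current block.
            if text.startswith("</div>"):
                in_div = False
        elif in_content and text.startswith(openers):
            kept.append(line)
            # A block opened and closed on the same line is already complete.
            in_div = not text.endswith("</div>")
    return kept
```

To try it on one file, something like `extract_div_blocks(open("News-1.html", encoding="utf-8"))` should work, where the file name and encoding are placeholders. The closing rule relies on nested divs sitting on a single line, as they do in the sample; a nested `</div>` on its own line would end the block too early.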
 
SStory Commented:
Get a copy of grep:
http://gnuwin32.sourceforge.net/packages/grep.htm

grep col-caption *.html

would return lines having col-caption in them.

grep is a very complex tool that can quickly find search terms in text files. It has a lot of options.

To see them type
grep --help

at the command line and hit enter.

example:
grep -i (case insensitive)

Another thing to note is that you can use regular expressions with grep.
http://www.opensourceforu.com/2012/06/beginners-guide-gnu-grep-basics-regular-expressions/

Once you've built the grep command the way you want, put

> youroutputfilename.txt

at the end of it to write to a file.

Example:
grep 'col-caption\|col-1\|col-2'  > outputfile.txt

or

grep -H 'col-caption\|col-1\|col-2'  > outputfile.txt
 
SSupreme Author Commented:
'col-1\' is not recognized as an internal or external command,
operable program or batch file.


Thanks for your answer and help! I have used grep on Linux before, and I spent some time installing grep here and getting it to work. I cannot see how grep would do what I want, which is to output the multiple lines that belong to the same div.

Output should be like this:
<div class="col-caption">
<h3>Новости</h3>
<h4>Новости Церкви «Вифлеем»</h4>
</div>
<div class="col-caption"><h2>Новый епископ Союза ЕХБ Беларуси – В.Н. Крутько</h2></div>
<div class="col-1">
 <div class="img-indent"><div class="img-box1"><img src="img/vflmNews/picture_14_syod.jpg" alt="picture_14_syod.jpg" /></div></div>
 <a href="News.html" class="link">читать все новости</a><br /><br />
</div>
<div class="col-2">
 <h4>20 марта 2010 г.</h4>
<b>20 марта 2010 года в нашей церкви прошел XIV съезд Союза евангельских христиан-баптистов Беларуси. </b><br /><p>В «Вифлееме» собрались представители всех церквей братства, всего 291 делегат.  По итогам съезда старший пресвитер нашей церкви Виктор Никодимович Крутько был избран новым председателем Союза ЕХБ Беларуси. Генеральным секретарем стал Николай Васильевич Синковец, который занимал пост председателя до этого. Заместителем председателя переизбран Иосиф Николаевич Рачковский.</p>                                                     
</div>



 
SSupreme Author Commented:
Well, I spent some time learning and practicing, but with no luck. I feel like grep is only a tiny part of the solution.
What I want is like Excel, where you can use the FIND function to locate the first and last character, and MID to return the content between those positions.
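
The Excel FIND/MID idea has a close counterpart in most languages. As a rough sketch in Python: `str.find` plays the role of FIND and slicing plays the role of MID. The helper name `between` and both marker strings are assumptions chosen for illustration.

```python
def between(text, start_marker, end_marker):
    """Return text from start_marker up to (not including) end_marker, or None."""
    start = text.find(start_marker)     # like FIND for the first position
    if start == -1:
        return None
    end = text.find(end_marker, start)  # like FIND for the last position
    if end == -1:
        return None
    return text[start:end]              # like MID between the two positions


page = 'head<div class="col-caption">A</div>\n<div class="col-2">B</div>\n<!-- END CONTENT -->tail'
section = between(page, '<div class="col-caption">', "<!-- END CONTENT -->")
```

This is a coarse cut: on the real files, everything between the two markers is kept, including any wrapper closing tags that sit just before the end marker.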
 
SStory Commented:
Worked for me in Linux.  Try double quotes " " instead and see if that does anything.
 
SStory Commented:
Grep can match and output every line containing any of the words in the list. The -A5 option would output the 5 lines after each match too.



Example:
grep 'col-caption\|col-1\|col-2'  News*.* > output.txt

Or grep "col-caption\|col-1\|col-2" News*.* > output.txt

Or grep -A5 'col-caption\|col-1\|col-2'  News*.* > output.txt

Or grep -A5 "col-caption\|col-1\|col-2" News*.* > output.txt

Now if it must go until it finds the ending div tag, that is another story.  Then you could use awk.  Or write a simple parser in VB or something.

The News*.* pattern should match all your News files. I had forgotten to specify what to grep before.
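
For the "until it finds the ending div tag" case, a simple parser could count nested divs so that a block only ends at its own closing tag. The following Python sketch is one possible way to do that, not a definitive implementation. The names `extract_blocks`, `content_section` and `combine`, the `News-*` file pattern, the UTF-8 encoding and the output name `combined.txt` are all assumptions; the `<!-- CONTENT -->` markers come from the sample file.

```python
import re
from pathlib import Path

DIV_TAG = re.compile(r"<div\b[^>]*>|</div\s*>", re.IGNORECASE)
WANTED = re.compile(r'class="(?:col-caption|col-1|col-2)"')


def extract_blocks(html):
    """Return each wanted <div>...</div> block, nested divs included."""
    blocks = []
    depth = 0     # nesting depth inside the block being captured
    start = None  # offset where the current block began
    for tag in DIV_TAG.finditer(html):
        is_open = not tag.group().startswith("</")
        if start is None:
            if is_open and WANTED.search(tag.group()):
                start, depth = tag.start(), 1
        elif is_open:
            depth += 1
        else:
            depth -= 1
            if depth == 0:  # this </div> closes the wanted block itself
                blocks.append(html[start:tag.end()])
                start = None
    return blocks


def content_section(html):
    """Limit the search to the part between the CONTENT markers, if present."""
    begin = html.find("<!-- CONTENT -->")
    end = html.find("<!-- END CONTENT -->")
    return html[begin:end] if begin != -1 and end != -1 else html


def combine(folder, pattern="News-*", out_name="combined.txt"):
    """Write the blocks of every matching file to one text file."""
    parts = []
    for path in sorted(Path(folder).glob(pattern)):
        html = path.read_text(encoding="utf-8")
        parts.append("\n".join(extract_blocks(content_section(html))))
    # One blank line separates the output of consecutive files.
    Path(folder, out_name).write_text("\n\n".join(parts) + "\n", encoding="utf-8")
```

Calling `combine(".")` in the folder with the News files would write `combined.txt` there. Files are processed in plain alphabetical order, so News-10 comes before News-2. Self-closing `<div />` tags are not handled.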
 
SSupreme Author Commented:
"Now if it must go until it finds the ending div tag, that is another story."
Looks like I am looking for that other story.
I thought I could get a solution in a few hours, but as usual there is no solution after a few days.
I know I could learn grep, sed and awk, but that would take a few days, and I only use grep and awk about once a year. Processing the files manually would also take a few days. While I am doing it, I will be thinking of the computer as something that is hard to communicate with, which is why it does not make my life simpler.
 
SSupreme Author Commented:
Thanks, I'll try PHP and will post the solution here.
Question has a verified solution.