Trim files in CMD and concatenate them into one file

Hello everyone,

I have a folder containing a lot of different files.
I want to cut out the same section from each of XXX specific files.
The names of those files run from News-x to News-xxx.
The content of each file looks like this:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">

<meta http-equiv="content-type" content="text/html;charset=utf-8" />
<head>
<meta content="DYNAMIC" name="DOCUMENT-STATE" />
<meta content="Copyright (c) 'Церковь Вифлеем' - info@vflm.by" name="Copyright" />
<link href="http://xxxxxxxxx.com/App_Themes/main/layout.css" type="text/css" rel="stylesheet" />
<link href="http://xxxxxxxxx.com/App_Themes/main/style.css" type="text/css" rel="stylesheet" />
<link href="http://xxxxxxxxx.com/App_Themes/main/tables.css" type="text/css" rel="stylesheet" />
<meta content="Заметки церкви Вифлеем, Новый епископ Союза ЕХБ Беларуси – В.Н. Крутько" name="keywords" />
<meta content="Новости официального сайта церкви Вифлеем, 20 марта 2010 года в нашей церкви прошел XIV съезд Союза евангельских христиан-баптистов Беларуси." name="description" />
<link rel="icon" href="favicon.ico" type="image/x-icon" />
<link rel="shortcut icon" href="favicon.ico" type="image/x-icon" />
<meta content="info@xxxxxxxxx.com" name="Reply-to" />
<title>
	Новости - Церковь Вифлеем - Новый епископ Союза ЕХБ Беларуси – В.Н. Крутько
</title></head>
<body id="news_page">
<form method="post" action="http://xxxxxxxxx.com/News-xxx.aspx" id="aspnetForm_news">
<div>
</div>
<div id="main_wrapper"><div class="main_wr_l"><div class="main_wr_r"><div id="main"><div class="body-bg">
<!-- HEADER -->
<div id="header">
<div class="row-1">
<div id="logo">
<h1><a href="Default.html">Церковь «Вифлеем»</a></h1>
<h5>Библейская Церковь Евангельских Христиан Баптистов «Вифлеем», г.Минск, Беларусь.</h5>
</div>
<div id="top-menu">
<!-- top menu -->
<ul>
	<li class="resources"><a href="Resources.html">Ресурсы</a></li>
	<li class="beliefs"><a href="Beliefs.html">Вероучение</a></li>
	<li class="cal"><a href="Calendar.html">Календарь</a></li>
	<li class="about"><a href="About.html">О нас</a></li>
</ul>
<!-- end menu -->
</div>
</div><div class="nav-pass"><table cellpadding="0" cellspacing="0" style="border-width:0;" ><tr><td><a href="Default.html">Главная</a> ></td><td  style="white-space:nowrap;">Новости</td></tr></table></div>
</div>
<!-- END HEADER -->
<!-- CONTENT -->
<div id="content"><div class="box"><div class="border-top"><div class="border-right"><div class="border-bot"><div class="border-left"><div class="left-top-corner"><div class="right-top-corner"><div class="right-bot-corner"><div class="left-bot-corner"><div class="wrapper">
<div class="col-caption">
<h3>Новости</h3>
<h4>Новости Церкви «Вифлеем»</h4>
</div>
<div class="col-caption"><h2>Новый епископ Союза ЕХБ Беларуси – В.Н. Крутько</h2></div>
<div class="col-1">
 <div class="img-indent"><div class="img-box1"><img src="img/vflmNews/picture_14_syod.jpg" alt="picture_14_syod.jpg" /></div></div>
 <a href="News.html" class="link">читать все новости</a><br /><br />
</div>
<div class="col-2">
 <h4>20 марта 2010 г.</h4>
<b>20 марта 2010 года в нашей церкви прошел XIV съезд Союза евангельских христиан-баптистов Беларуси. </b><br /><p>В «Вифлееме» собрались представители всех церквей братства, всего 291 делегат.  По итогам съезда старший пресвитер нашей церкви Виктор Никодимович Крутько был избран новым председателем Союза ЕХБ Беларуси. Генеральным секретарем стал Николай Васильевич Синковец, который занимал пост председателя до этого. Заместителем председателя переизбран Иосиф Николаевич Рачковский.</p>                                                     
</div>
</div></div></div></div></div></div></div></div></div></div></div>
<!-- END CONTENT -->
<!-- FOOTER -->
<div id="footer">
<div class="indent">
<div class="wrapper"><div class="col-1">
	&nbsp;</div>
<div class="col-2">
	<ul>
		<li>
			<noindex><img alt="Минская Богословская Семинария" height="31" src="img/stxt/banner_seminary.jpg" width="88" /></noindex></li>
		<li>
			<noindex><img alt="Союз Евангельских Христиан-Баптистов" height="31" src="img/stxt/banner_baptist.jpg" width="88" /></a></noindex></li>
		<li>
			<noindex><img alt="журнал Крынiца Жыцця" height="31" src="img/stxt/banner_krinitsa.jpg" width="88" /></noindex></li>
		<li>
			<noindex><img alt="Евангелие и Реформация" height="31" src="../epbook.by/shop/images/epbook_banner_88x31.jpg" width="88" /></noindex></li>

	</ul>
</div>
<div class="col-3">
	<a href="index.html">Церковь &laquo;Вифлеем&raquo;</a> &copy; 2010 | <a href="SiteMap.html">Карта сайта</a></div>
</div>
</div>
</div>
<!-- END FOOTER -->
</div></div></div></div></div>
</form>
</body>
</html>

I only need to extract the following div blocks: <div class="col-caption">, <div class="col-1"> and <div class="col-2">.
It would be nice to combine all the processed files into one txt file, with a blank line between each one.

I think there is an easy, old-school way to do this.

I'd appreciate your help.
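The "combine everything into one txt file" half of the request can be sketched in Python, independently of how each file gets trimmed. This is a minimal sketch, not code from the thread: it assumes the files are saved as News-<number>.html in a single folder and are UTF-8 encoded (the sample page declares charset=utf-8). The names news_files and concatenate are hypothetical.

```python
import re
from pathlib import Path

# Assumption: the files are named News-<number>.html and sit in one folder.
NEWS_NAME = re.compile(r"^News-(\d+)\.html$", re.IGNORECASE)


def news_files(folder):
    """Return the News-<n>.html files in folder, sorted by their number."""
    found = []
    for path in Path(folder).iterdir():
        match = NEWS_NAME.match(path.name)
        if match and path.is_file():
            found.append((int(match.group(1)), path))
    # Numeric sort, so News-10 comes after News-9 rather than after News-1.
    return [path for _, path in sorted(found)]


def concatenate(folder, output, encoding="utf-8"):
    """Write the text of every News file into output, separated by a blank line.

    Returns the number of files combined.
    """
    parts = [p.read_text(encoding=encoding).strip() for p in news_files(folder)]
    Path(output).write_text("\n\n".join(parts) + "\n", encoding=encoding)
    return len(parts)
```

Sorting on the parsed number rather than on the file name keeps News-10 after News-9; a plain alphabetical sort would place it right after News-1.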

Asked by SSupreme

SStory commented:
Get a copy of grep:
http://gnuwin32.sourceforge.net/packages/grep.htm

grep col-caption *.html

returns every line in the .html files that contains col-caption.

grep is a powerful tool that can quickly find search terms in text files, and it has a lot of options.

To see them, type
grep --help

at the command line and press Enter.

For example:
grep -i (case-insensitive matching)

Another thing to note is that you can use regular expressions with grep.
http://www.opensourceforu.com/2012/06/beginners-guide-gnu-grep-basics-regular-expressions/

Once you've built the grep command the way you want, put

> youroutputfilename.txt

at the end of it to write the output to a file.

Example:
grep 'col-caption\|col-1\|col-2'  > outputfile.txt

or

grep -H 'col-caption\|col-1\|col-2'  > outputfile.txt
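For readers without grep installed, the line matching above can be approximated in Python. This is a rough sketch, not a replacement for grep: the regular expression col-caption|col-1|col-2 plays the role of grep's \| alternation, and the with_filename flag mimics -H. The function name grep_lines and the UTF-8 encoding are assumptions.

```python
import re

# Rough equivalent of grep 'col-caption\|col-1\|col-2': in GNU grep's basic
# syntax \| means "or"; Python's re module writes it as a plain |.
PATTERN = re.compile(r"col-caption|col-1|col-2")


def grep_lines(paths, pattern=PATTERN, with_filename=True):
    """Yield each matching line, prefixed with 'file:' the way grep -H does."""
    for path in paths:
        # Assumption: the files are UTF-8 encoded.
        with open(path, encoding="utf-8") as f:
            for line in f:
                if pattern.search(line):
                    line = line.rstrip("\n")
                    yield f"{path}:{line}" if with_filename else line
```

Passing re.compile(r"col-caption|col-1|col-2", re.IGNORECASE) as the pattern corresponds to grep -i. Like grep, this returns only the matching lines themselves (including the footer's col-1 and col-2 divs), not the full contents of each div.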
SSupreme (author) commented:
'col-1\' is not recognized as an internal or external command,
operable program or batch file.

Thanks for your answer and help! I have used grep before on Linux, and I spent some time installing grep and getting it to work here. I can't see how grep can do what I want, though: I need to output all the lines of each div, not just the line that matches.

Output should be like this:
<div class="col-caption">
<h3>Новости</h3>
<h4>Новости Церкви «Вифлеем»</h4>
</div>
<div class="col-caption"><h2>Новый епископ Союза ЕХБ Беларуси – В.Н. Крутько</h2></div>
<div class="col-1">
 <div class="img-indent"><div class="img-box1"><img src="img/vflmNews/picture_14_syod.jpg" alt="picture_14_syod.jpg" /></div></div>
 <a href="News.html" class="link">читать все новости</a><br /><br />
</div>
<div class="col-2">
 <h4>20 марта 2010 г.</h4>
<b>20 марта 2010 года в нашей церкви прошел XIV съезд Союза евангельских христиан-баптистов Беларуси. </b><br /><p>В «Вифлееме» собрались представители всех церквей братства, всего 291 делегат.  По итогам съезда старший пресвитер нашей церкви Виктор Никодимович Крутько был избран новым председателем Союза ЕХБ Беларуси. Генеральным секретарем стал Николай Васильевич Синковец, который занимал пост председателя до этого. Заместителем председателя переизбран Иосиф Николаевич Рачковский.</p>                                                     
</div>

SSupreme (author) commented:
Well, I spent some time learning and practicing, but with no luck. I feel like grep is only a tiny part of the solution.
In Excel, for example, you can use the FIND function to locate the first and last characters, and MID to return the content between those positions.
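The Excel FIND/MID idea translates directly to Python's str.find plus slicing. This sketch assumes a start marker and an end marker that can be found as plain text; the function name between is hypothetical, and the markers in the usage note are illustrative choices based on the sample page's layout.

```python
def between(text, start_marker, end_marker):
    """Return text from start_marker up to (not including) end_marker.

    In Excel, FIND locates the two positions and MID takes the characters
    between them; here str.find does the locating and slicing does the rest.
    Returns None if either marker is missing.
    """
    start = text.find(start_marker)  # like FIND(start_marker, text)
    if start == -1:
        return None
    end = text.find(end_marker, start + len(start_marker))
    if end == -1:
        return None
    return text[start:end]  # roughly MID(text, start, end - start)
```

On the sample page, between(text, '<div class="col-caption">', '</div></div></div>') would return everything from the first col-caption div through the col-2 div, because the run of wrapper closing tags after col-2 is the first place three </div> tags appear together. That is fragile: it depends on this exact markup.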
SStory commented:
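It worked for me on Linux. Try double quotes (" ") instead and see if that helps: CMD does not treat single quotes as quoting characters, so it reads each | as a pipe to another command, which is where the "'col-1\' is not recognized" error comes from.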
SStory commented:
grep can match and output every line that contains any of the words in the list. The -A5 option also outputs the 5 lines after each match.



Example:
grep 'col-caption\|col-1\|col-2'  News*.* > output.txt

Or grep "col-caption\|col-1\|col-2" News*.* > output.txt

Or grep -A5 'col-caption\|col-1\|col-2' News*.* > output.txt

Or grep -A5 "col-caption\|col-1\|col-2" News*.* > output.txt
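To see why -A5 on its own does not solve the problem, here is a rough Python equivalent of grep -A: it keeps each matching line plus a fixed number of following lines, merging overlapping groups as grep does. It is a sketch only; the name grep_after is hypothetical, and it omits the -- separators grep prints between non-adjacent groups.

```python
import re

PATTERN = re.compile(r"col-caption|col-1|col-2")


def grep_after(lines, pattern=PATTERN, after=5):
    """Return each matching line plus the `after` lines that follow it."""
    keep = set()  # a set merges overlapping context groups
    for i, line in enumerate(lines):
        if pattern.search(line):
            keep.update(range(i, min(i + after + 1, len(lines))))
    return [lines[i] for i in sorted(keep)]
```

Because the count is fixed, it either stops before a long div's closing tag or runs past a short one into unrelated markup, which is exactly the case where the output must continue until the closing div tag.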

If it must keep going until it finds the closing div tag, that is another story. Then you could use awk, or write a simple parser in VB or something similar.

The News*.* pattern should make grep search all of your News files. I had forgotten to specify which files to search before.
SSupreme (author) commented:
"Now if it must go until it finds the ending div tag, that is another story."
It looks like I am looking for that other story.
I thought I could find a solution in a few hours, but as usual, after a few days I still have none.
I know I could learn grep, sed and awk, but that would take a few days, and since I only use grep and awk about once a year, processing those files manually would take just as long. While I am doing it, I will be thinking of the computer as something that is hard to communicate with, and that is why I cannot make my life simple.
SStory commented:
Well, those tools already exist, although I'll admit they can take some time to master. The alternative is to write a simple program, in VB.NET, C or whatever you like, that reads the input one line at a time. If a line starts with
<div and has the class you want (for example class="col-1"), set a flag (indiv = true),
then output that line and every following line until you reach a line that starts with
</div>

That's the quick method I'd use. So you can either use tools like grep and awk, which can be tricky, or write your own. For me, by far the simplest option would be to write my own in whatever language: read each line, output the appropriate lines until you hit a closing div, and start over at the next match.
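The line-by-line flag approach described above can be sketched in Python. Two changes from the description are deliberate and rest on assumptions about the sample page: instead of waiting for a line that starts with </div>, the sketch counts <div and </div> tags, so a one-line div such as <div class="col-caption"><h2>...</h2></div> ends on its own line; and it only looks between the <!-- CONTENT --> and <!-- END CONTENT --> comments, because the footer also contains a col-2 div. The function names extract_divs, extract_file and combine, and the UTF-8 encoding, are hypothetical.

```python
import re

# Assumption (from the sample page): the wanted divs sit between the
# <!-- CONTENT --> and <!-- END CONTENT --> comments. The footer has its own
# col-2 div, so lines outside that section are ignored.
START = re.compile(r'<div class="(col-caption|col-1|col-2)"')
OPEN_TAG = re.compile(r"<div\b")
CLOSE_TAG = re.compile(r"</div>")


def extract_divs(lines):
    """Return the lines of every wanted div inside the CONTENT section."""
    out = []
    in_content = False
    depth = 0  # unclosed <div> tags in the current block; 0 = not in a block
    for line in lines:
        if "<!-- END CONTENT -->" in line:
            break
        if "<!-- CONTENT -->" in line:
            in_content = True
            continue
        if not in_content:
            continue
        # Outside a block, only a line that begins with a wanted div starts one.
        if depth == 0 and not START.match(line.lstrip()):
            continue
        out.append(line)
        # The block ends once every <div> opened in it has been closed.
        opened = len(OPEN_TAG.findall(line))
        closed = len(CLOSE_TAG.findall(line))
        depth = max(0, depth + opened - closed)
    return out


def extract_file(path, encoding="utf-8"):
    """Read one News file (assumed UTF-8) and extract its wanted divs."""
    with open(path, encoding=encoding) as f:
        return extract_divs(f.read().splitlines())


def combine(paths, output, encoding="utf-8"):
    """Write each file's extracted divs to output, one blank line between files."""
    chunks = ["\n".join(extract_file(p, encoding)) for p in paths]
    with open(output, "w", encoding=encoding) as f:
        f.write("\n\n".join(chunks) + "\n")
```

Calling combine(paths, "output.txt") with the list of News files should produce the requested single text file. Counting tags with regular expressions only holds up for regularly formatted pages like these; for arbitrary HTML, a real parser (for example Python's html.parser module) is the safer choice.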

SSupreme (author) commented:
Thanks, I'll try PHP and will post the solution here.