SSupreme

Trim files in CMD and concatenate them into one file

Hello everyone,

I have a folder containing a lot of different files.
I want to extract the same section from each of a specific set of files.
The names of these files run from News-x to News-xxx.
The content of each file looks like this:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">

<meta http-equiv="content-type" content="text/html;charset=utf-8" />
<head>
<meta content="DYNAMIC" name="DOCUMENT-STATE" />
<meta content="Copyright (c) 'Церковь Вифлеем' - info@vflm.by" name="Copyright" />
<link href="http://xxxxxxxxx.com/App_Themes/main/layout.css" type="text/css" rel="stylesheet" />
<link href="http://xxxxxxxxx.com/App_Themes/main/style.css" type="text/css" rel="stylesheet" />
<link href="http://xxxxxxxxx.com/App_Themes/main/tables.css" type="text/css" rel="stylesheet" />
<meta content="Заметки церкви Вифлеем, Новый епископ Союза ЕХБ Беларуси – В.Н. Крутько" name="keywords" />
<meta content="Новости официального сайта церкви Вифлеем, 20 марта 2010 года в нашей церкви прошел XIV съезд Союза евангельских христиан-баптистов Беларуси." name="description" />
<link rel="icon" href="favicon.ico" type="image/x-icon" />
<link rel="shortcut icon" href="favicon.ico" type="image/x-icon" />
<meta content="info@xxxxxxxxx.com" name="Reply-to" />
<title>
	Новости - Церковь Вифлеем - Новый епископ Союза ЕХБ Беларуси – В.Н. Крутько
</title></head>
<body id="news_page">
<form method="post" action="http://xxxxxxxxx.com/News-xxx.aspx" id="aspnetForm_news">
<div>
</div>
<div id="main_wrapper"><div class="main_wr_l"><div class="main_wr_r"><div id="main"><div class="body-bg">
<!-- HEADER -->
<div id="header">
<div class="row-1">
<div id="logo">
<h1><a href="Default.html">Церковь «Вифлеем»</a></h1>
<h5>Библейская Церковь Евангельских Христиан Баптистов «Вифлеем», г.Минск, Беларусь.</h5>
</div>
<div id="top-menu">
<!-- top menu -->
<ul>
	<li class="resources"><a href="Resources.html">Ресурсы</a></li>
	<li class="beliefs"><a href="Beliefs.html">Вероучение</a></li>
	<li class="cal"><a href="Calendar.html">Календарь</a></li>
	<li class="about"><a href="About.html">О нас</a></li>
</ul>
<!-- end menu -->
</div>
</div><div class="nav-pass"><table cellpadding="0" cellspacing="0" style="border-width:0;" ><tr><td><a href="Default.html">Главная</a> ></td><td  style="white-space:nowrap;">Новости</td></tr></table></div>
</div>
<!-- END HEADER -->
<!-- CONTENT -->
<div id="content"><div class="box"><div class="border-top"><div class="border-right"><div class="border-bot"><div class="border-left"><div class="left-top-corner"><div class="right-top-corner"><div class="right-bot-corner"><div class="left-bot-corner"><div class="wrapper">
<div class="col-caption">
<h3>Новости</h3>
<h4>Новости Церкви «Вифлеем»</h4>
</div>
<div class="col-caption"><h2>Новый епископ Союза ЕХБ Беларуси – В.Н. Крутько</h2></div>
<div class="col-1">
 <div class="img-indent"><div class="img-box1"><img src="img/vflmNews/picture_14_syod.jpg" alt="picture_14_syod.jpg" /></div></div>
 <a href="News.html" class="link">читать все новости</a><br /><br />
</div>
<div class="col-2">
 <h4>20 марта 2010 г.</h4>
<b>20 марта 2010 года в нашей церкви прошел XIV съезд Союза евангельских христиан-баптистов Беларуси. </b><br /><p>В «Вифлееме» собрались представители всех церквей братства, всего 291 делегат.  По итогам съезда старший пресвитер нашей церкви Виктор Никодимович Крутько был избран новым председателем Союза ЕХБ Беларуси. Генеральным секретарем стал Николай Васильевич Синковец, который занимал пост председателя до этого. Заместителем председателя переизбран Иосиф Николаевич Рачковский.</p>                                                     
</div>
</div></div></div></div></div></div></div></div></div></div></div>
<!-- END CONTENT -->
<!-- FOOTER -->
<div id="footer">
<div class="indent">
<div class="wrapper"><div class="col-1">
	&nbsp;</div>
<div class="col-2">
	<ul>
		<li>
			<noindex><img alt="Минская Богословская Семинария" height="31" src="img/stxt/banner_seminary.jpg" width="88" /></noindex></li>
		<li>
			<noindex><img alt="Союз Евангельских Христиан-Баптистов" height="31" src="img/stxt/banner_baptist.jpg" width="88" /></a></noindex></li>
		<li>
			<noindex><img alt="журнал Крынiца Жыцця" height="31" src="img/stxt/banner_krinitsa.jpg" width="88" /></noindex></li>
		<li>
			<noindex><img alt="Евангелие и Реформация" height="31" src="../epbook.by/shop/images/epbook_banner_88x31.jpg" width="88" /></noindex></li>

	</ul>
</div>
<div class="col-3">
	<a href="index.html">Церковь &laquo;Вифлеем&raquo;</a> &copy; 2010 | <a href="SiteMap.html">Карта сайта</a></div>
</div>
</div>
</div>
<!-- END FOOTER -->
</div></div></div></div></div>
</form>
</body>
</html>


I need to extract only the following div boxes: <div class="col-caption">, <div class="col-1"> and <div class="col-2">.
It would be nice to combine the results from all the files into one txt file, with some space between them.

I think there is an easy old-school way to do it.

I appreciate your help.
Tags: Microsoft DOS, Windows Batch, Windows OS

8/22/2022 - Mon
SStory

Get a copy of grep:
http://gnuwin32.sourceforge.net/packages/grep.htm

grep col-caption *.html

would return the lines that contain col-caption.

grep is a powerful tool that can quickly find search terms in text files. It has a lot of options.

To see them, type
grep --help

at the command line and press Enter.

Example:
grep -i (case-insensitive matching)

Another thing to note is that you can use regular expressions with grep.
http://www.opensourceforu.com/2012/06/beginners-guide-gnu-grep-basics-regular-expressions/

Once you've built the grep command the way you want, put

> youroutputfilename.txt

at the end of it to write the output to a file.

Example:
grep 'col-caption\|col-1\|col-2'  > outputfile.txt

or

grep -H 'col-caption\|col-1\|col-2'  > outputfile.txt
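
For anyone without grep at hand, the same line filter can be sketched in Python. This is only an illustration of what the alternation pattern matches; the function name `matching_lines` and the sample text are made up for the example.

```python
import re

# Same alternation as the grep pattern, written in Python's regex syntax.
PATTERN = re.compile(r"col-caption|col-1|col-2")


def matching_lines(text):
    """Return the lines of text that contain any of the three class names."""
    return [line for line in text.splitlines() if PATTERN.search(line)]


# Hypothetical, shortened sample in the shape of the question's page.
sample = """<div class="col-caption">
<h3>News</h3>
</div>
<div class="col-1">
<a href="News.html" class="link">all news</a>
</div>
<div class="col-2">
<h4>20 March 2010</h4>
</div>"""

for line in matching_lines(sample):
    print(line)
```

On this sample it prints only the three opening div lines. A line filter like this returns the lines that carry the class names, not the content nested inside those divs.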
SSupreme

ASKER
'col-1\' is not recognized as an internal or external command,
operable program or batch file.


Thanks for your answer and help! I have used grep before on Linux, and I spent some time installing grep and making it work here. I cannot see how grep can do what I want, which is to output the multiple rows of the same div.

Output should be like this:
<div class="col-caption">
<h3>Новости</h3>
<h4>Новости Церкви «Вифлеем»</h4>
</div>
<div class="col-caption"><h2>Новый епископ Союза ЕХБ Беларуси – В.Н. Крутько</h2></div>
<div class="col-1">
 <div class="img-indent"><div class="img-box1"><img src="img/vflmNews/picture_14_syod.jpg" alt="picture_14_syod.jpg" /></div></div>
 <a href="News.html" class="link">читать все новости</a><br /><br />
</div>
<div class="col-2">
 <h4>20 марта 2010 г.</h4>
<b>20 марта 2010 года в нашей церкви прошел XIV съезд Союза евангельских христиан-баптистов Беларуси. </b><br /><p>В «Вифлееме» собрались представители всех церквей братства, всего 291 делегат.  По итогам съезда старший пресвитер нашей церкви Виктор Никодимович Крутько был избран новым председателем Союза ЕХБ Беларуси. Генеральным секретарем стал Николай Васильевич Синковец, который занимал пост председателя до этого. Заместителем председателя переизбран Иосиф Николаевич Рачковский.</p>                                                     
</div>


SSupreme

ASKER
Well, I spent some time learning and practicing, but with no luck. I feel like grep is only a tiny part of the solution.
It is like in Excel, where you can use FIND to locate the first and last character, and MID to return the content between those positions.
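
The FIND and MID idea carries over almost directly to a script. A minimal Python sketch, assuming the wanted section starts at the first <div class="col-caption"> and ends just before the <!-- END CONTENT --> comment seen in the sample page; the function name `slice_between` is made up for the example.

```python
# Markers taken from the sample page; other pages are assumed to use the same ones.
START = '<div class="col-caption">'
END = '<!-- END CONTENT -->'


def slice_between(text, start=START, end=END):
    """Return text from the first start marker up to, not including, the end marker."""
    i = text.find(start)       # like FIND for the first position
    if i == -1:
        return ""
    j = text.find(end, i)      # like FIND for the last position
    if j == -1:
        return ""
    return text[i:j]           # like MID between the two positions
```

On the sample page this also keeps the line of closing </div> tags that sits right before the END CONTENT comment, so the result would still need a little trimming.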
SStory

It worked for me on Linux.  Try double quotes " " instead and see if that helps.
SStory

Grep can match and output every line containing any of the words in the list.  The -A5 option also outputs the 5 lines after each match.



Example:
grep 'col-caption\|col-1\|col-2'  News*.* > output.txt

Or grep "col-caption\|col-1\|col-2" News*.* > output.txt

Or grep -A5 'col-caption\|col-1\|col-2'  News*.* > output.txt

Or grep -A5 "col-caption\|col-1\|col-2" News*.* > output.txt

Now if it must go until it finds the ending div tag, that is another story.  Then you could use awk, or write a simple parser in VB or something similar.

The News*.* makes grep search all your News files. I had forgotten to specify which files to grep before.
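
As a rough idea of what such a simple parser could look like, here is a sketch in Python rather than VB or awk. It is not a tested solution for the real site. It assumes the files are UTF-8 (as the page's meta tag says), that the wanted boxes sit between the <!-- CONTENT --> and <!-- END CONTENT --> comments as in the sample, and that div tags are written plainly enough for a regular expression to find them. The names `extract_divs` and `combine` are made up for the example.

```python
import glob
import re

# Class names of the div boxes to keep (taken from the question).
WANTED = {"col-caption", "col-1", "col-2"}

# Matches an opening <div ...> tag or a closing </div> tag.
DIV_TAG = re.compile(r"<div\b[^>]*>|</div\s*>", re.IGNORECASE)
CLASS_ATTR = re.compile(r'class="([^"]*)"', re.IGNORECASE)


def extract_divs(html):
    """Return the wanted div blocks of one page, one after another."""
    # Assumption: the wanted boxes sit between these two comments, as in the
    # sample page. This keeps the footer's col-1/col-2 divs out of the result.
    start = html.find("<!-- CONTENT -->")
    end = html.find("<!-- END CONTENT -->")
    if start != -1 and end > start:
        html = html[start:end]

    blocks = []
    block_start = None  # offset where the current wanted div opened
    depth = 0           # nesting depth inside the current wanted div
    for match in DIV_TAG.finditer(html):
        tag = match.group(0)
        if tag.startswith("</"):
            if block_start is not None:
                depth -= 1
                if depth == 0:  # the wanted div itself just closed
                    blocks.append(html[block_start:match.end()])
                    block_start = None
        elif block_start is not None:
            depth += 1  # a nested div inside a wanted block
        else:
            cls = CLASS_ATTR.search(tag)
            if cls and WANTED.intersection(cls.group(1).split()):
                block_start = match.start()
                depth = 1
    return "\n".join(blocks)


def combine(pattern, out_path):
    """Extract from every file matching pattern and write one text file."""
    parts = []
    for path in sorted(glob.glob(pattern)):  # plain alphabetical order
        with open(path, encoding="utf-8") as page:
            parts.append(extract_divs(page.read()))
    with open(out_path, "w", encoding="utf-8") as out:
        out.write("\n\n".join(parts))  # blank line between files
```

Run from the folder with the pages, a call such as combine("News-*.html", "combined.txt") would write everything into one file; both the file pattern and the output name are assumptions. The depth counter is what lets the parser continue to the matching ending div tag instead of stopping at the first </div>. The files are taken in alphabetical order, so News-10 comes before News-2.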
SSupreme

ASKER
Now if it must go until it finds the ending div tag, that is another story.
Looks like I am looking for that other story.
I thought I could get a solution in a few hours, but as usual there is no solution after a few days.
I know I can learn grep, sed and awk, but that would take a few days, and since I use grep and awk about once a year, it would take about as long to process those files manually. While I am doing that, I will think of the computer as something that is hard to communicate with, which is why I cannot make my life simple.
ASKER CERTIFIED SOLUTION
SStory

SSupreme

ASKER
Thanks, I'll try PHP and will post the solution here.