spreadsheet

I think I'm asking this in the right place; I don't know where else to ask.

I need to load a text file into a spreadsheet so I can pull out only the characters that I want.

I intend to use the bash shell, and I need a script that can do this from a cron job.

Point me in the right direction if you know how to do this.

I think gawk may work for me if I can get everything separated.
Asked by: Ted22
 
Karl Heinz Kremer commented:
I'm sorry, I probably should have tried to access the page and look at the data. I just assumed that "picture" meant that you did a screen capture or something similar.

Here is a very simple XSL script that only lists the contents of the table at the top of the page, field by field:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" />
<xsl:param name="delim" select="':'"/>   <!-- field delimiter (colon); not used by the template below -->
<xsl:param name="nl"    select="'&#xA;'"/>   <!-- newline -->
<xsl:template match="/">
    <xsl:for-each select="html/body/center/table/tr/td/table/tr">
        <xsl:for-each select="td">
               <xsl:value-of select="."/>
               <xsl:value-of select="$nl"/>
        </xsl:for-each>
    </xsl:for-each>
</xsl:template>
</xsl:stylesheet>

Save this to a file (e.g. extract_table.xsl), then run:
/usr/bin/wget -O- http://www.htd.dns2go.com/pic.html > pic.html
/usr/local/bin/tidy -asxml -o pic.xml pic.html
/usr/bin/xsltproc --html extract_table.xsl pic.xml

This will print something like this:

Cur Oper Mode
Iss Capture All
Disabled
Drupla Capture
Disabled
Iss Disk Free Cap:
0

Cur Oper Mode
Iss Capture All
Disabled
Drupla Capture
Disabled
Iss Disk Free Cap:
0

ISS/OCR

Oss Err Monitor
Enabled
Ocr Data Collect
Disabled
Iss Disk Free (MB):
999


ISS/OCR

Oss Err Monitor
Enabled
Ocr Data Collect
Disabled
Iss Disk Free (MB):
999
BC Printer
Enabled
Ocr Engines
Enabled
Zip+4 Dir Ver:
4162004

BC Printer
Enabled
Ocr Engines
Enabled
Zip+4 Dir Ver:
4162004
Address Block
Pre
BC Reader
Enabled
...

You can of course make the XSL more complex and format the data so that you get, for example, one key/value pair per line:

key1: value1
key2: value2
...
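
As a rough sketch of the same key/value idea done in the shell rather than in the XSL: if the field-per-line output strictly alternates label, value, label, value, awk can join each pair. That alternation is an assumption; the real output above contains a few unpaired cells (such as "ISS/OCR") that would need special handling. The function name pair_fields is made up for illustration, and the sample lines are taken from the output above.

```shell
# Join alternating label/value lines into "label: value" lines.
# Assumes the input strictly alternates label, value; blank lines are skipped.
pair_fields() {
    awk 'NF == 0 { next }                   # ignore empty lines
         label == "" { label = $0; next }   # first line of a pair: remember the label
         { sub(/:$/, "", label)             # drop a trailing colon from the label
           print label ": " $0
           label = "" }'
}

# Sample lines taken from the xsltproc output shown above.
pairs=$(printf '%s\n' 'Iss Capture All' 'Disabled' 'Iss Disk Free (MB):' '999' | pair_fields)
printf '%s\n' "$pairs"
```

In practice you would pipe the xsltproc output into pair_fields instead of the printf sample.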

 
jlevie commented:
What's the format of the text file and what part of it are you interested in?
 
Ted22 (author) commented:
The text file is an html page with numbers that change on it.

curl http://server/path/to/foo.html > foo.html
sed -e 's/<[^>]*>//g' foo.html  > file
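
One thing to watch with stripping every tag is that neighbouring table cells run together (a label, a total and a second number become one word). A possible workaround, sketched here under the assumption that each table row sits on one line, is to turn every closing </TD> into a delimiter before removing the remaining tags, so that awk can split the fields. The function name cells_to_fields and the choice of "|" as the delimiter are illustrative only; the sample row is copied from the page source posted later in this thread.

```shell
# Turn each closing </TD> into a "|" delimiter, then strip the remaining tags,
# so that adjacent table cells do not run together.
cells_to_fields() {
    sed -e 's/<\/[Tt][Dd]>/|/g' -e 's/<[^>]*>//g' -e 's/^[ ]*//'
}

# One table row copied from the page source posted in this thread.
row='  <TR><TD align=right><B>Fed</B></TD><TD bgcolor=blue><B><font color=#FFFFFF>6743</font></B></TD><TD bgcolor=blue><B><font color=#FFFFFF>100</font></B></TD></TR>'

fields=$(printf '%s\n' "$row" | cells_to_fields)
printf '%s\n' "$fields"                      # Fed|6743|100|

# awk can now pick a column by number.
total_fed=$(printf '%s\n' "$fields" | awk -F'|' '{ print $2 }')
printf '%s\n' "$total_fed"                   # 6743
```

This only works if "|" never appears in the page text; any character that is absent from the data would do.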

What I would really like to be able to do is put each character on the page in a cell and just take out what I want. If I could do something like this, I could use it in other places too (reports, etc.).

I've run into some problems with control characters in some of the things I've tried.

The end result is to put each number in a file, and then add it to an array to build a dynamic graph with PHP.
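
A minimal sketch of the "each number in a file" step, assuming the data has already been reduced to "label: value" lines (which the raw page does not give you directly). Each run appends one reading to a data file, one number per line, which a PHP script could then load into an array, for example with file(). The function name save_value, the temporary data file and the second pair of readings are made up for illustration; 41.37 and 6743 are values from the posted page.

```shell
# Append the value for one label to a data file, one number per line.
# Assumes "label: value" input lines on standard input.
save_value() {
    label=$1
    datafile=$2
    awk -F': ' -v want="$label" '$1 == want { print $2 }' >> "$datafile"
}

datafile=$(mktemp)   # stand-in for a real path that the PHP script can read

# Two simulated cron runs; the second set of readings is invented sample data.
printf '%s\n' 'OEE: 41.37' 'Fed: 6743' | save_value 'OEE' "$datafile"
printf '%s\n' 'OEE: 42.10' 'Fed: 6801' | save_value 'OEE' "$datafile"

cat "$datafile"
```

Appending rather than overwriting keeps a history of readings, which is what a graph over time would need.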

 
jlevie commented:
I'd guess that the format of the file is constant and it's just the data within that format that changes. If that's the case, it would make more sense to me to parse out the requisite data with a Perl or PHP script and build the array for the graph. That's a more flexible method, and you can deal with any control characters and the HTML tags in a sensible manner.
 
Karl Heinz Kremer commented:
If you have an HTML page, you should really try to parse the HTML code. I use xsltproc with the --html parameter for this. Unfortunately, real-world HTML can be pretty far from the standard, and a web browser will still do the right thing with it. xsltproc, however, will have problems with such files, so I clean up the HTML with tidy first. These lines are from a Perl script that I use to extract data from a web page:

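# system() returns 0 on success, so "system(...) || die" would die on success.
# tidy exits with 0 (clean), 1 (warnings) or 2 (errors); only treat more than 1 as a failure.
my $status = system("/usr/bin/wget -O- http://server/path/to/web_page.html | /usr/local/bin/tidy -asxml -o /tmp/tidy_$$.xml");
die "Cannot retrieve HTML file" if $status == -1 || ($status >> 8) > 1;

open(INPUT, "/usr/bin/xsltproc --html /path/to/xsl_file.xsl /tmp/tidy_$$.xml |") || die "Cannot run xsltproc: $!";

while (<INPUT>)
{
    # do something with the data you've extracted
}
close(INPUT);
unlink("/tmp/tidy_$$.xml");   # remove the temporary file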

The output of xsltproc can be either an XML file or plain text. I get text output by putting the <xsl:output method="text" /> statement in the XSL file.
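
Since the question asks for something that can run from cron under bash, here is a rough shell sketch of the same fetch, tidy, xsltproc pipeline. The function name extract_table, the temporary file names and the crontab line in the comment are made up for illustration. The tool paths are the ones quoted in this thread and may differ on your system, so they can be overridden from the environment.

```shell
# Fetch a page, clean it with tidy and run an XSL stylesheet over it.
WGET=${WGET:-/usr/bin/wget}
TIDY=${TIDY:-/usr/local/bin/tidy}
XSLTPROC=${XSLTPROC:-/usr/bin/xsltproc}

extract_table() {
    url=$1
    xsl=$2
    tmpdir=$(mktemp -d) || return 1

    # Download quietly; stop if the page cannot be fetched.
    "$WGET" -q -O "$tmpdir/page.html" "$url" || { rm -rf "$tmpdir"; return 1; }

    # tidy exits with 1 for warnings and 2 for errors, so only 2 or more is fatal.
    "$TIDY" -q -asxml -o "$tmpdir/page.xml" "$tmpdir/page.html" 2>/dev/null
    if [ $? -gt 1 ]; then rm -rf "$tmpdir"; return 1; fi

    # The extracted text goes to standard output.
    "$XSLTPROC" --html "$xsl" "$tmpdir/page.xml"
    status=$?
    rm -rf "$tmpdir"
    return $status
}

# Usage (URL and stylesheet are placeholders):
#   extract_table http://server/path/to/web_page.html /path/to/xsl_file.xsl
# Hypothetical crontab entry for a script built around this function, every 5 minutes:
#   */5 * * * * /home/user/bin/extract_table.sh >> /home/user/data/table.txt
```

Full paths are used because cron jobs usually run with a very short PATH.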

You will need to learn some XSL for this to work, however. If you have a sample of the HTML code, I can probably give you some pointers.

 
Karl Heinz Kremer commented:
Forgot the links:

xsltproc is part of libxslt (http://xmlsoft.org/XSLT/)
Tidy can be found here: http://tidy.sourceforge.net/
 
Ted22 (author) commented:
A copy of the page can be seen here.

http://www.htd.dns2go.com/pic.html

If you can give me some pointers that would be fine, I'll need them.
 
Karl Heinz Kremer commented:
A picture of a web page and the actual HTML code are two totally different things. Please post a link to the actual page, or post the HTML code here. You can capture the page with wget or lynx, or save the HTML from within your browser.

Have you ever worked with XSL?
 
Ted22 (author) commented:
This is a copy of the web page. Picture was the wrong word to use.
I uploaded it to my home computer because it is not accessible from the internet.
I will post the code in the next comment.

I have never worked with XSL.
I downloaded tidy, and xsltproc is included in Fedora Core 1.

If I understand this at all, and I'm not sure I do, I want to run the page through a template that will pull out what I want from the page.
 
Ted22 (author) commented:
<HTML>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HEAD>
<meta http-equiv='REFRESH' content='30'>
<meta http-equiv='EXPIRE' content='-1'>
</HEAD>
<body bgcolor=#336699 text=#FFFFFF topmargin=0 bottommargin=0 marginheight=0 marginwidth=0 leftmargin=0 rightmargin=0>
<CENTER>
<TABLE width=100% cellspacing=0 cellpading=0 border=1>
  <TR><TD>
<TABLE width=100% cellspacing=0 border=0 bgcolor=#CCCCCC>
  <TR><TD><FONT color=black size=-1>Cur Oper Mode</TD>                  <TD><FONT color=black size=-1>Iss Capture All</TD><TD><FONT color=blue size=-1>Disabled</TD><TD><FONT color=black size=-1>Drupla Capture</TD><TD><FONT color=blue size=-1>Disabled</TD><TD><FONT color=black size=-1>Iss Disk Free Cap:</TD><TD><FONT color=blue size=-1>0</TD></TR>
  <TR><TD rowspan=2><FONT color=blue><H2>ISS/OCR</TD><TD><FONT color=black size=-1>Oss Err Monitor</TD><TD><FONT color=blue size=-1>Enabled</TD><TD><FONT color=black size=-1>Ocr Data Collect</TD><TD><FONT color=blue size=-1>Disabled</TD><TD><FONT color=black size=-1>Iss Disk Free (MB):</TD><TD><FONT color=blue size=-1>999</TD></TR>
  <TR>                                                          <TD><FONT color=black size=-1>BC Printer</TD><TD><FONT color=blue size=-1>Enabled</TD><TD><FONT color=black size=-1>Ocr Engines</TD><TD><FONT color=blue size=-1>Enabled</TD><TD><FONT color=black size=-1>Zip+4 Dir Ver:</TD><TD><FONT color=blue size=-1>4162004</TD></TR>
  <TR><TD><FONT color=black size=-1>Address Block</TD><TD><FONT color=black size=-1><FONT color=black size=-1>Pre BC Reader</TD><TD><FONT color=blue size=-1>Enabled</TD><TD><FONT color=black size=-1>Iss Time Sgmt:</TD><TD><FONT color=blue size=-1>33</TD><TD><FONT color=black size=-1>Dsu Cnnct State:</TD><TD><FONT color=blue size=-1>Disconnected</TD></TR>
  <TR><TD rowspan=4><TABLE width=100% cellspacing=0 border=0><TR><TD><FONT color=black size=-1>Left:</TD><TD><FONT color=blue size=-1>0</TD></TR><TR><TD><FONT color=black size=-1>Right:</TD><TD><FONT color=blue size=-1>0</TD></TR><TR><TD><FONT color=black size=-1>Top:</TD><TD><FONT color=blue size=-1>0</TD></TR><TR><TD><FONT color=black size=-1>Botm:</TD><TD><FONT color=blue size=-1>0</TD></TR></TABLE></TD>
      <TD><FONT color=black size=-1>BC Verifier</TD><TD><FONT color=blue size=-1>Enabled</TD><TD><FONT color=black size=-1>Iss Machine ID:</TD><TD><FONT color=blue size=-1>2603</TD><TD><FONT color=black size=-1>Iss Cnnct State:</TD><TD><FONT color=blue size=-1>Connected</TD></TR>
  <TR><TD><FONT color=black size=-1>IDTag Printer</TD><TD><FONT color=blue size=-1>Enabled</TD><TD><FONT color=black size=-1>Iss Disk Images:</TD><TD><FONT color=blue size=-1>0</TD><TD><FONT color=black size=-1>Pics Cnnct State:</TD><TD><FONT color=blue size=-1>Disconnected</TD></TR>
  <TR><TD><FONT color=black size=-1>IDTag Reader</TD><TD><FONT color=blue size=-1>Enabled</TD><TD><FONT color=black size=-1>Average Img Size:</TD><TD><FONT color=blue size=-1>0</TD><TD><FONT color=black size=-1>Ucp Cnnct State:</TD><TD><FONT color=blue size=-1>Connected</TD></TR>
  <TR><TD><FONT color=black size=-1>IDTag Verifier</TD><TD><FONT color=blue size=-1>Enabled</TD><TD>&nbsp;</TD><TD>&nbsp;</TD><TD><FONT color=black size=-1>Mail Class:</TD><TD><FONT color=blue size=-1>1</TD></TR>
</TABLE>
</TD></TR>
  <TR><TD bgcolor=000066><FONT color=white>Online Mail Processing - ISS/OCR Current User Level:</TD></TR>
  <TR><TD>
<TABLE bgcolor=darkcyan width=100% cellspacing=1 cellpadding=1 border=0 bordercolor=orange>
  <TR>
    <TD colspan=3>
      <TABLE width=100% border=0>
        <TR>
          <TD><B><FONT color=00FFFF>04/24/04</FONT></B></TD>
          <TD><B><FONT color=00FFFF>16:51:03</FONT></B></TD>
          <TD align=right><B>Sortplan:</B></TD>
          <TD><B><FONT size=3 color=00FFFF>281DIOSS.EBF</FONT></B></TD>
        </TR>
        <TR>
          <TD align=right colspan=3><B>Date:</B></TD>
          <TD><B><FONT color=00FFFF>03/29/04</FONT></B></TD>
        </TR>
      </TABLE>
    </TD>
  </TR>
  <TR>
    <TD colspan=2>
      <TABLE border=0>
        <TR>
          <TD><B>GAR:</B></TD>
          <TD>
<TABLE width=400 cellspacing=0 cellpadding=0 border=1 borderColor=black>
  <TR borderColorDark=black borderColorLight=white>
    <TD width=68.1003% align=center bgcolor=green><FONT color=#FFFFFF>68</FONT></TD>
    <TD width=0.074151% align=center bgcolor=CCCCCC></TD>
    <TD width=0.163132% align=center bgcolor=66FFCC></TD>
    <TD width=0.0296604% align=center bgcolor=black></TD>
    <TD width=31.6328% align=center bgcolor=FF00FF><FONT color=#FFFFFF>31</FONT></TD>
  </TR>
</TABLE>
          </TD>
          <TD>%</TD>
        </TR>
        <TR>
          <TD><B>Last 100:</B></TD>
          <TD>
<TABLE width=400 cellspacing=0 cellpadding=0 border=1 borderColor=black>
  <TR borderColorDark=black borderColorLight=white>
    <TD width=65% align=center bgcolor=green><FONT color=#FFFFFF>65</FONT></TD>
    <TD width=35% align=center bgcolor=FF00FF><FONT color=#FFFFFF>35</FONT></TD>
  </TR>
</TABLE>
          </TD>
          <TD>%</TD>
        </TR>
      </TABLE>
    </TD>
    <TD>
<TABLE border=0 width=100%>
  <TR><TD align=right width=60%><B>Current Run:</B></TD><TD><B><FONT SIZE=3 COLOR=00FFFF>1</FONT></B></TD></TR>
  <TR><TD align=right><B>OEE:</B></TD><TD><B><FONT SIZE=3 COLOR=00FFFF>41.37</FONT></B></TD></TR>
</TABLE>
    </TD>
  </TR>
  <TR>
    <TD>
<TABLE border=0 width=100%>
  <TR><TD><BR></TD><TD><B>Total</B></TD><TD><B>Last 100</B></TD></TR>
  <TR><TD align=right><B>Fed</B></TD><TD bgcolor=blue><B><font color=#FFFFFF>6743</font></B></TD><TD bgcolor=blue><B><font color=#FFFFFF>100</font></B></TD></TR>
  <TR><TD align=right><B>Accepted</B></TD><TD bgcolor=green><B><font color=#FFFFFF>4592</font></B></TD><TD bgcolor=green><B><font color=#FFFFFF>65</font></B></TD></TR>
  <TR><TD align=right><B>Non Read</B></TD><TD bgcolor=purple><B><font color=#FFFFFF>0</font></B></TD><TD bgcolor=purple><B><font color=#FFFFFF>0</font></B></TD></TR>
  <TR><TD align=right><B>No Code</B></TD><TD bgcolor=003399><B><font color=#FFFFFF>0</font></B></TD><TD bgcolor=003399><B><font color=#FFFFFF>0</font></B></TD></TR>
  <TR><TD align=right><B>Out of Plan</B></TD><TD bgcolor=yellow><B><font color=#000000>0</font></B></TD><TD bgcolor=yellow><B><font color=#000000>0</font></B></TD></TR>
  <TR><TD align=right><B>BC Print Err</B></TD><TD bgcolor=CCCCCC><B><font color=#FFFFFF>5</font></B></TD><TD bgcolor=CCCCCC><B><font color=#FFFFFF>0</font></B></TD></TR>
  <TR><TD align=right><B>ID Tag Print</B></TD><TD bgcolor=66FFCC><B><font color=#FFFFFF>11</font></B></TD><TD bgcolor=66FFCC><B><font color=#FFFFFF>0</font></B></TD></TR>
  <TR><TD align=right><B>Mech/Tracking</B></TD><TD bgcolor=black><B><font color=#FFFFFF>2</font></B></TD><TD bgcolor=black><B><font color=#FFFFFF>0</font></B></TD></TR>
  <TR><TD align=right><B>OCR Rejects</B></TD><TD bgcolor=FF00FF><B><font color=#FFFFFF>2133</font></B></TD><TD bgcolor=FF00FF><B><font color=#FFFFFF>35</font></B></TD></TR>
</TABLE>
    </TD>
    <TD>
<TABLE border=0 width=100%>
  <TR><TD align=right><B>GAR:</B></TD><TD><B><FONT color=00FFFF>96</FONT></B></TD><TD>%</TD></TR>
  <TR><TD align=right><B>MAR:</B></TD><TD><B><FONT color=00FFFF>100</FONT></B></TD><TD>%</TD></TR>
  <TR><TD align=right><B>9/11:</B></TD><TD><B><FONT color=00FFFF>45</FONT></B></TD><TD>%</TD></TR>
  <TR><TD align=right><B>VER:</B></TD><TD><B><FONT color=00FFFF>0</FONT></B></TD><TD>%</TD></TR>
  <TR><TD align=right><B>TER:</B></TD><TD><B><FONT color=00FFFF>0</FONT></B></TD><TD>%</TD></TR>
  <TR><TD align=right><B>ORR:</B></TD><TD><B><FONT color=00FFFF>37</FONT></B></TD><TD>%</TD></TR>
  <TR><TD align=right><B>GAP:</B></TD><TD><B><FONT color=00FFFF>134</FONT></B></TD><TD>mm</TD></TR>
  <TR><TD colspan=2 align=center><B><U>Throughput:</U></B></TD></TR>
  <TR><TD align=right><B>RUN:</B></TD><TD><B><FONT color=00FFFF>29895</FONT></B></TD></TR>
  <TR><TD align=right><B>OP:</B></TD><TD><B><FONT color=00FFFF>17413</FONT></B></TD></TR>
</TABLE>
    </TD>
    <TD>
<TABLE border=1 width=100%>
<TR><TD>
  <TABLE border=0 width=100%>
    <TR><TD align=right width=60%><B>E-Stops:</B></TD><TD><B><FONT COLOR=00FFFF>0</FONT></B></TD></TR>
    <TR><TD align=right><B>Pocket Full:</B></TD><TD><B><FONT COLOR=00FFFF>2</FONT></B></TD></TR>
    <TR><TD align=right><B>Jams:</B></TD><TD><B><FONT COLOR=00FFFF>1</FONT></B></TD></TR>
    <TR><TD align=right><B>Seq Stops:</B></TD><TD><B><FONT COLOR=00FFFF>0</FONT></B></TD></TR>
  </TABLE>
</TD></TR>
<TR><TD>
  <TABLE border=0 width=100%>
    <TR><TD align=right width=60%><B>RPMS:</B></TD><TD><B><FONT COLOR=00FFFF>Disable</FONT></B></TD></TR>
    <TR><TD align=right><B>DAS:</B></TD><TD><B><FONT COLOR=00FFFF>Disable</FONT></B></TD></TR>
    <TR><TD align=right><B>OOS Pkt Seq:</B></TD><TD><B><FONT COLOR=00FFFF>Enabled</FONT></B></TD></TR>
    <TR><TD align=right><B>OOS Pkt Delta:</B></TD><TD><B><FONT COLOR=00FFFF>Enabled</FONT></B></TD></TR>
   <TR><TD align=right><B>CMD:</B></TD><TD><B><FONT COLOR=00FFFF>Disable</FONT></B></TD></TR>
    <TR><TD align=right><B>Lift All/RTS:</B></TD><TD><B><FONT COLOR=00FFFF>Disable</FONT></B></TD></TR>
  </TABLE>
</TD></TR>
</TABLE>
    </TD>
  </TR>
</TABLE></TD></TR>
</TABLE>
</CENTER>
</BODY>
</HTML>
 
da99rmd commented:
What things do you want from the page ?

/Rob
 
Ted22 (author) commented:

I found that running the page through tidy puts it in order.

This PHP script then removes the tags:
<?php
// Print the contents of pic2.html with all HTML tags stripped.
$page = "pic2.html";
if (file_exists($page)) {
    $fp = fopen($page, "r");
    $contents = fread($fp, filesize($page));
    fclose($fp);
    echo strip_tags($contents);
}
?>

php -q pic.html > FileWithoutTags

Now I can grep what I need and colrm all but the numbers.
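
A rough sketch of that last step using sed instead of colrm, so it does not depend on fixed column positions. It assumes the label and its numbers end up on the same line of the tag-stripped file, as in "OCR Rejects 2133 35"; that layout is an assumption, and the real file may look different. The function name first_number is made up for illustration. Labels that contain digits themselves (such as "9/11:") would need a different pattern.

```shell
# Pick the line for one label out of tag-stripped text and keep only its first number.
first_number() {
    grep -- "$1" | sed -e 's/^[^0-9]*//' -e 's/[^0-9.].*$//'
}

# Assumed layout of the tag-stripped text; the numbers are from the posted page.
text='Fed 6743 100
Accepted 4592 65
OCR Rejects 2133 35'

rejects=$(printf '%s\n' "$text" | first_number 'OCR Rejects')
printf '%s\n' "$rejects"
```

With the real data you would replace the printf sample with the file, for example first_number 'OCR Rejects' < FileWithoutTags.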
