Solved

spread sheet

Posted on 2004-04-23
Last Modified: 2010-04-22
I think I'm asking this in the right place; I don't know where else to ask.

I need to load a text file into a spreadsheet so I can pull out only the characters I want.

I intend to use the bash shell, and I need a script that can do this from a cron job.

Point me in the right direction if you know how to do this.

I think gawk may work for me if I can get everything separated.
Question by:Ted22
12 Comments
 
LVL 40

Expert Comment

by:jlevie
ID: 10906087
What's the format of the text file and what part of it are you interested in?
 
LVL 1

Author Comment

by:Ted22
ID: 10906167
The text file is an html page with numbers that change on it.

curl foo.html > foo.html
sed -e 's/<[^>]*>//g' foo.html  > file

What I would really like to do is put each character on the page in its own cell and then take out just what I want. If I could do something like this, I could use it in other places too (reports, etc.).

I've run into problems with control characters in some of the things I've tried.

The end result is to put each number in a file and then load it into an array to build a dynamic graph with PHP.
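
The "one cell per value" idea can be approximated in plain shell. The following is only a sketch under assumptions: the sample rows are copied from the page posted later in this thread, the file names sample.html and cells.txt are made up, and the control characters are assumed to be carriage returns. It turns each closing </TD> into a "|" separator before stripping the tags, so every table row becomes one delimited line that awk can address by column.

```shell
#!/bin/sh
# Hypothetical sample standing in for the fetched page (two rows of the counters table).
cat > sample.html <<'EOF'
<TR><TD align=right><B>Fed</B></TD><TD bgcolor=blue><B><font color=#FFFFFF>6743</font></B></TD><TD bgcolor=blue><B><font color=#FFFFFF>100</font></B></TD></TR>
<TR><TD align=right><B>Accepted</B></TD><TD bgcolor=green><B><font color=#FFFFFF>4592</font></B></TD><TD bgcolor=green><B><font color=#FFFFFF>65</font></B></TD></TR>
EOF

# 1. Replace every closing </TD> with "|" so the cell boundaries survive.
# 2. Strip all remaining tags.
# 3. Delete carriage returns (assumed to be the troublesome control characters).
# 4. Drop the trailing separator left at the end of each row.
sed -e 's/<\/[Tt][Dd]>/|/g' -e 's/<[^>]*>//g' sample.html \
    | tr -d '\r' \
    | sed -e 's/|$//' > cells.txt

# Each row is now "label|total|last 100"; pick a cell by row label and column number.
awk -F'|' '$1 == "Fed" { print $2 }' cells.txt    # prints 6743
```

This only works while a whole table row sits on one input line, which is true of the rows shown here but not guaranteed for arbitrary HTML.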

 
LVL 40

Assisted Solution

by:jlevie
jlevie earned 50 total points
ID: 10906288
I'd guess that the format of the file is constant and that only the data within that format changes. If that's the case, it would make more sense to me to parse out the requisite data with a Perl or PHP script and build the array for the graph there. That's a more flexible method, and you can deal with any control characters and the HTML tags in a sensible manner.
 
LVL 44

Expert Comment

by:Karl Heinz Kremer
ID: 10906878
If you have an HTML page, you should really parse the HTML code. I use xsltproc with the --html parameter for this. Unfortunately, real-world HTML can be pretty far from the standard, and a web browser will still do the right thing with it. xsltproc, however, has problems with such files, so I clean up the HTML with tidy first. These lines are from a Perl script that I use to extract data from a web page:

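# system() returns 0 on success, so "system(...) || die" would die on success.
# tidy exits with 1 on warnings and 2 on errors, so only a status of 2 or more is fatal.
my $status = system("/usr/bin/wget -O- http://server/path/to/web_page.html | /usr/local/bin/tidy -asxml -o /tmp/tidy_$$.xml");
die "Cannot retrieve HTML file" if $status == -1 || ($status >> 8) >= 2;

open(INPUT, "/usr/bin/xsltproc --html /path/to/xsl_file.xsl /tmp/tidy_$$.xml |") or die "Cannot run xsltproc: $!";

while (<INPUT>)
{
    # do something with the data you've extracted
}
close(INPUT);
unlink("/tmp/tidy_$$.xml");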

The output of xsltproc can be either an XML file or plain text. I get text output by putting the <xsl:output method="text" /> statement in the XSL file.

You will need to learn some XSL for this to work, however. If you have a sample of the HTML code, I can probably give you some pointers.

 
LVL 44

Expert Comment

by:Karl Heinz Kremer
ID: 10906888
Forgot the links:

xsltproc is part of libxslt (http://xmlsoft.org/XSLT/)
Tidy can be found here: http://tidy.sourceforge.net/
 
LVL 1

Author Comment

by:Ted22
ID: 10910593
A copy of the page can be seen here.

http://www.htd.dns2go.com/pic.html

If you can give me some pointers that would be fine, I'll need them.
 
LVL 44

Expert Comment

by:Karl Heinz Kremer
ID: 10911279
A picture of a web page and the actual HTML code are two totally different things. Please post a link to the actual page, or post the HTML code here. You can capture the page with wget or lynx, or save the HTML from within your browser.

Have you ever worked with XSL?
 
LVL 1

Author Comment

by:Ted22
ID: 10912855
This is a copy of the web page. Picture was the wrong word to use.
I uploaded it to my home computer because it is not accessible from the internet.
I will post the code in the next comment.

I have never worked with XSL.
I downloaded tidy and xsltproc is included in Fedora core 1.

If I understand this at all, and I'm not sure I do,
I want to run the page through a template that pulls out what I want from it.
 
LVL 1

Author Comment

by:Ted22
ID: 10912858
<HTML>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HEAD>
<meta http-equiv='REFRESH' content='30'>
<meta http-equiv='EXPIRE' content='-1'>
</HEAD>
<body bgcolor=#336699 text=#FFFFFF topmargin=0 bottommargin=0 marginheight=0 marginwidth=0 leftmargin=0 rightmargin=0>
<CENTER>
<TABLE width=100% cellspacing=0 cellpading=0 border=1>
  <TR><TD>
<TABLE width=100% cellspacing=0 border=0 bgcolor=#CCCCCC>
  <TR><TD><FONT color=black size=-1>Cur Oper Mode</TD>                  <TD><FONT color=black size=-1>Iss Capture All</TD><TD><FONT color=blue size=-1>Disabled</TD><TD><FONT color=black size=-1>Drupla Capture</TD><TD><FONT color=blue size=-1>Disabled</TD><TD><FONT color=black size=-1>Iss Disk Free Cap:</TD><TD><FONT color=blue size=-1>0</TD></TR>
  <TR><TD rowspan=2><FONT color=blue><H2>ISS/OCR</TD><TD><FONT color=black size=-1>Oss Err Monitor</TD><TD><FONT color=blue size=-1>Enabled</TD><TD><FONT color=black size=-1>Ocr Data Collect</TD><TD><FONT color=blue size=-1>Disabled</TD><TD><FONT color=black size=-1>Iss Disk Free (MB):</TD><TD><FONT color=blue size=-1>999</TD></TR>
  <TR>                                                          <TD><FONT color=black size=-1>BC Printer</TD><TD><FONT color=blue size=-1>Enabled</TD><TD><FONT color=black size=-1>Ocr Engines</TD><TD><FONT color=blue size=-1>Enabled</TD><TD><FONT color=black size=-1>Zip+4 Dir Ver:</TD><TD><FONT color=blue size=-1>4162004</TD></TR>
  <TR><TD><FONT color=black size=-1>Address Block</TD><TD><FONT color=black size=-1><FONT color=black size=-1>Pre BC Reader</TD><TD><FONT color=blue size=-1>Enabled</TD><TD><FONT color=black size=-1>Iss Time Sgmt:</TD><TD><FONT color=blue size=-1>33</TD><TD><FONT color=black size=-1>Dsu Cnnct State:</TD><TD><FONT color=blue size=-1>Disconnected</TD></TR>
  <TR><TD rowspan=4><TABLE width=100% cellspacing=0 border=0><TR><TD><FONT color=black size=-1>Left:</TD><TD><FONT color=blue size=-1>0</TD></TR><TR><TD><FONT color=black size=-1>Right:</TD><TD><FONT color=blue size=-1>0</TD></TR><TR><TD><FONT color=black size=-1>Top:</TD><TD><FONT color=blue size=-1>0</TD></TR><TR><TD><FONT color=black size=-1>Botm:</TD><TD><FONT color=blue size=-1>0</TD></TR></TABLE></TD>
      <TD><FONT color=black size=-1>BC Verifier</TD><TD><FONT color=blue size=-1>Enabled</TD><TD><FONT color=black size=-1>Iss Machine ID:</TD><TD><FONT color=blue size=-1>2603</TD><TD><FONT color=black size=-1>Iss Cnnct State:</TD><TD><FONT color=blue size=-1>Connected</TD></TR>
  <TR><TD><FONT color=black size=-1>IDTag Printer</TD><TD><FONT color=blue size=-1>Enabled</TD><TD><FONT color=black size=-1>Iss Disk Images:</TD><TD><FONT color=blue size=-1>0</TD><TD><FONT color=black size=-1>Pics Cnnct State:</TD><TD><FONT color=blue size=-1>Disconnected</TD></TR>
  <TR><TD><FONT color=black size=-1>IDTag Reader</TD><TD><FONT color=blue size=-1>Enabled</TD><TD><FONT color=black size=-1>Average Img Size:</TD><TD><FONT color=blue size=-1>0</TD><TD><FONT color=black size=-1>Ucp Cnnct State:</TD><TD><FONT color=blue size=-1>Connected</TD></TR>
  <TR><TD><FONT color=black size=-1>IDTag Verifier</TD><TD><FONT color=blue size=-1>Enabled</TD><TD>&nbsp;</TD><TD>&nbsp;</TD><TD><FONT color=black size=-1>Mail Class:</TD><TD><FONT color=blue size=-1>1</TD></TR>
</TABLE>
</TD></TR>
  <TR><TD bgcolor=000066><FONT color=white>Online Mail Processing - ISS/OCR Current User Level:</TD></TR>
  <TR><TD>
<TABLE bgcolor=darkcyan width=100% cellspacing=1 cellpadding=1 border=0 bordercolor=orange>
  <TR>
    <TD colspan=3>
      <TABLE width=100% border=0>
        <TR>
          <TD><B><FONT color=00FFFF>04/24/04</FONT></B></TD>
          <TD><B><FONT color=00FFFF>16:51:03</FONT></B></TD>
          <TD align=right><B>Sortplan:</B></TD>
          <TD><B><FONT size=3 color=00FFFF>281DIOSS.EBF</FONT></B></TD>
        </TR>
        <TR>
          <TD align=right colspan=3><B>Date:</B></TD>
          <TD><B><FONT color=00FFFF>03/29/04</FONT></B></TD>
        </TR>
      </TABLE>
    </TD>
  </TR>
  <TR>
    <TD colspan=2>
      <TABLE border=0>
        <TR>
          <TD><B>GAR:</B></TD>
          <TD>
<TABLE width=400 cellspacing=0 cellpadding=0 border=1 borderColor=black>
  <TR borderColorDark=black borderColorLight=white>
    <TD width=68.1003% align=center bgcolor=green><FONT color=#FFFFFF>68</FONT></TD>
    <TD width=0.074151% align=center bgcolor=CCCCCC></TD>
    <TD width=0.163132% align=center bgcolor=66FFCC></TD>
    <TD width=0.0296604% align=center bgcolor=black></TD>
    <TD width=31.6328% align=center bgcolor=FF00FF><FONT color=#FFFFFF>31</FONT></TD>
  </TR>
</TABLE>
          </TD>
          <TD>%</TD>
        </TR>
        <TR>
          <TD><B>Last 100:</B></TD>
          <TD>
<TABLE width=400 cellspacing=0 cellpadding=0 border=1 borderColor=black>
  <TR borderColorDark=black borderColorLight=white>
    <TD width=65% align=center bgcolor=green><FONT color=#FFFFFF>65</FONT></TD>
    <TD width=35% align=center bgcolor=FF00FF><FONT color=#FFFFFF>35</FONT></TD>
  </TR>
</TABLE>
          </TD>
          <TD>%</TD>
        </TR>
      </TABLE>
    </TD>
    <TD>
<TABLE border=0 width=100%>
  <TR><TD align=right width=60%><B>Current Run:</B></TD><TD><B><FONT SIZE=3 COLOR=00FFFF>1</FONT></B></TD></TR>
  <TR><TD align=right><B>OEE:</B></TD><TD><B><FONT SIZE=3 COLOR=00FFFF>41.37</FONT></B></TD></TR>
</TABLE>
    </TD>
  </TR>
  <TR>
    <TD>
<TABLE border=0 width=100%>
  <TR><TD><BR></TD><TD><B>Total</B></TD><TD><B>Last 100</B></TD></TR>
  <TR><TD align=right><B>Fed</B></TD><TD bgcolor=blue><B><font color=#FFFFFF>6743</font></B></TD><TD bgcolor=blue><B><font color=#FFFFFF>100</font></B></TD></TR>
  <TR><TD align=right><B>Accepted</B></TD><TD bgcolor=green><B><font color=#FFFFFF>4592</font></B></TD><TD bgcolor=green><B><font color=#FFFFFF>65</font></B></TD></TR>
  <TR><TD align=right><B>Non Read</B></TD><TD bgcolor=purple><B><font color=#FFFFFF>0</font></B></TD><TD bgcolor=purple><B><font color=#FFFFFF>0</font></B></TD></TR>
  <TR><TD align=right><B>No Code</B></TD><TD bgcolor=003399><B><font color=#FFFFFF>0</font></B></TD><TD bgcolor=003399><B><font color=#FFFFFF>0</font></B></TD></TR>
  <TR><TD align=right><B>Out of Plan</B></TD><TD bgcolor=yellow><B><font color=#000000>0</font></B></TD><TD bgcolor=yellow><B><font color=#000000>0</font></B></TD></TR>
  <TR><TD align=right><B>BC Print Err</B></TD><TD bgcolor=CCCCCC><B><font color=#FFFFFF>5</font></B></TD><TD bgcolor=CCCCCC><B><font color=#FFFFFF>0</font></B></TD></TR>
  <TR><TD align=right><B>ID Tag Print</B></TD><TD bgcolor=66FFCC><B><font color=#FFFFFF>11</font></B></TD><TD bgcolor=66FFCC><B><font color=#FFFFFF>0</font></B></TD></TR>
  <TR><TD align=right><B>Mech/Tracking</B></TD><TD bgcolor=black><B><font color=#FFFFFF>2</font></B></TD><TD bgcolor=black><B><font color=#FFFFFF>0</font></B></TD></TR>
  <TR><TD align=right><B>OCR Rejects</B></TD><TD bgcolor=FF00FF><B><font color=#FFFFFF>2133</font></B></TD><TD bgcolor=FF00FF><B><font color=#FFFFFF>35</font></B></TD></TR>
</TABLE>
    </TD>
    <TD>
<TABLE border=0 width=100%>
  <TR><TD align=right><B>GAR:</B></TD><TD><B><FONT color=00FFFF>96</FONT></B></TD><TD>%</TD></TR>
  <TR><TD align=right><B>MAR:</B></TD><TD><B><FONT color=00FFFF>100</FONT></B></TD><TD>%</TD></TR>
  <TR><TD align=right><B>9/11:</B></TD><TD><B><FONT color=00FFFF>45</FONT></B></TD><TD>%</TD></TR>
  <TR><TD align=right><B>VER:</B></TD><TD><B><FONT color=00FFFF>0</FONT></B></TD><TD>%</TD></TR>
  <TR><TD align=right><B>TER:</B></TD><TD><B><FONT color=00FFFF>0</FONT></B></TD><TD>%</TD></TR>
  <TR><TD align=right><B>ORR:</B></TD><TD><B><FONT color=00FFFF>37</FONT></B></TD><TD>%</TD></TR>
  <TR><TD align=right><B>GAP:</B></TD><TD><B><FONT color=00FFFF>134</FONT></B></TD><TD>mm</TD></TR>
  <TR><TD colspan=2 align=center><B><U>Throughput:</U></B></TD></TR>
  <TR><TD align=right><B>RUN:</B></TD><TD><B><FONT color=00FFFF>29895</FONT></B></TD></TR>
  <TR><TD align=right><B>OP:</B></TD><TD><B><FONT color=00FFFF>17413</FONT></B></TD></TR>
</TABLE>
    </TD>
    <TD>
<TABLE border=1 width=100%>
<TR><TD>
  <TABLE border=0 width=100%>
    <TR><TD align=right width=60%><B>E-Stops:</B></TD><TD><B><FONT COLOR=00FFFF>0</FONT></B></TD></TR>
    <TR><TD align=right><B>Pocket Full:</B></TD><TD><B><FONT COLOR=00FFFF>2</FONT></B></TD></TR>
    <TR><TD align=right><B>Jams:</B></TD><TD><B><FONT COLOR=00FFFF>1</FONT></B></TD></TR>
    <TR><TD align=right><B>Seq Stops:</B></TD><TD><B><FONT COLOR=00FFFF>0</FONT></B></TD></TR>
  </TABLE>
</TD></TR>
<TR><TD>
  <TABLE border=0 width=100%>
    <TR><TD align=right width=60%><B>RPMS:</B></TD><TD><B><FONT COLOR=00FFFF>Disable</FONT></B></TD></TR>
    <TR><TD align=right><B>DAS:</B></TD><TD><B><FONT COLOR=00FFFF>Disable</FONT></B></TD></TR>
    <TR><TD align=right><B>OOS Pkt Seq:</B></TD><TD><B><FONT COLOR=00FFFF>Enabled</FONT></B></TD></TR>
    <TR><TD align=right><B>OOS Pkt Delta:</B></TD><TD><B><FONT COLOR=00FFFF>Enabled</FONT></B></TD></TR>
   <TR><TD align=right><B>CMD:</B></TD><TD><B><FONT COLOR=00FFFF>Disable</FONT></B></TD></TR>
    <TR><TD align=right><B>Lift All/RTS:</B></TD><TD><B><FONT COLOR=00FFFF>Disable</FONT></B></TD></TR>
  </TABLE>
</TD></TR>
</TABLE>
    </TD>
  </TR>
</TABLE></TD></TR>
</TABLE>
</CENTER>
</BODY>
</HTML>
 
LVL 44

Accepted Solution

by:
Karl Heinz Kremer earned 75 total points
ID: 10914371
I'm sorry, I probably should have tried to access the page and look at the data. I just assumed that "picture" meant that you did a screen capture or something similar.

Here is a very simple XSL script that only lists the contents of the table at the top of the page, field by field:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" />
<xsl:param name="delim" select="':'"/>   <!-- field delimiter (colon) -->
<xsl:param name="nl"    select="'&#xA;'"/>   <!-- newline -->
<xsl:template match="/">
    <xsl:for-each select="html/body/center/table/tr/td/table/tr">
        <xsl:for-each select="td">
               <xsl:value-of select="."/>
               <xsl:value-of select="$nl"/>
        </xsl:for-each>
    </xsl:for-each>
</xsl:template>
</xsl:stylesheet>

Save this to a file (e.g. extract_table.xsl), then run:
/usr/bin/wget -O- http://www.htd.dns2go.com/pic.html > pic.html
/usr/local/bin/tidy -asxml -o pic.xml pic.html
/usr/bin/xsltproc --html extract_table.xsl pic.xml

This will print something like this:

Cur Oper Mode
Iss Capture All
Disabled
Drupla Capture
Disabled
Iss Disk Free Cap:
0

Cur Oper Mode
Iss Capture All
Disabled
Drupla Capture
Disabled
Iss Disk Free Cap:
0

ISS/OCR

Oss Err Monitor
Enabled
Ocr Data Collect
Disabled
Iss Disk Free (MB):
999


ISS/OCR

Oss Err Monitor
Enabled
Ocr Data Collect
Disabled
Iss Disk Free (MB):
999
BC Printer
Enabled
Ocr Engines
Enabled
Zip+4 Dir Ver:
4162004

BC Printer
Enabled
Ocr Engines
Enabled
Zip+4 Dir Ver:
4162004
Address Block
Pre
BC Reader
Enabled
...

You can of course make the XSL more complex and format the data so that you have e.g. key value pairs per line:

key1: value1
key2: value2
...
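
If you would rather not extend the XSL, the pairing can also be sketched as a post-processing step in awk. This is a rough illustration only: it assumes that, once blank lines are skipped, labels and values strictly alternate, which does not hold for every cell on this page (for example "Cur Oper Mode" and "ISS/OCR" have no value of their own). The file names fields.txt and pairs.txt are made up for the example.

```shell
#!/bin/sh
# Hypothetical excerpt of the one-field-per-line output shown above.
cat > fields.txt <<'EOF'
Iss Capture All
Disabled

Drupla Capture
Disabled
Iss Disk Free Cap:
0
EOF

# Skip blank lines, then treat the remaining lines as alternating label / value.
awk '
    NF == 0   { next }                                # ignore blank lines
    !have_key { key = $0; sub(/:$/, "", key)          # remember the label, minus any trailing colon
                have_key = 1; next }
              { print key ": " $0; have_key = 0 }     # emit "label: value" and start over
' fields.txt > pairs.txt

cat pairs.txt
```

Doing the pairing inside the stylesheet is more robust, because there you still know which cell belongs to which row.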

 
LVL 8

Expert Comment

by:da99rmd
ID: 10926515
What do you want from the page?

/Rob
 
LVL 1

Author Comment

by:Ted22
ID: 10933045

I found that running the page through tidy puts it in order.

Saved as a script (say, striptags.php), this PHP removes the tags:
<?php
// Print the tidied page with all HTML tags stripped
$page = "pic2.html";
if (file_exists($page)) {
    $fp = fopen($page, "r");
    $contents = fread($fp, filesize($page));
    echo strip_tags($contents);
    fclose($fp);
}
?>

php -q striptags.php > FileWithoutTags

Now I can grep what I need and colrm all but the numbers.
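
A rough sketch of that last step, for anyone following along. It uses awk fields counted from the end of the line instead of colrm, since fixed column positions shift when the numbers change width. The sample content, the file names and the helper total_for are assumptions for illustration; the real FileWithoutTags may be laid out differently.

```shell
#!/bin/sh
# Hypothetical excerpt of the tag-free file: label, total, last-100 count per line.
cat > FileWithoutTags.sample <<'EOF'
  Fed 6743 100
  Accepted 4592 65
  OCR Rejects 2133 35
EOF

# total_for LABEL: print the "Total" column (second field from the end)
# of every line containing LABEL. Hypothetical helper, not from the thread.
total_for() {
    grep -F "$1" FileWithoutTags.sample | awk '{ print $(NF - 1) }'
}

# One number per file, ready to be read into the PHP array for the graph.
total_for 'Accepted' > accepted.txt
cat accepted.txt
```

Counting from the end of the line keeps labels with spaces, such as "OCR Rejects", from throwing off the field numbers.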

