Solved

Parsing another program's log file

Posted on 2005-05-09
Last Modified: 2010-04-22
I've been collecting log files from a program I'm using, and 90% of the time I can correctly parse the log with my current script.  Unfortunately, I can find no rhyme or reason to the header they use for the log file: it never has exactly the same characters or bits, and it never ends in the same character.  The data I want is always in the following form:

11,22,3333,4444,555555,666666,77,88,99,10,11,1212121212121212121212

Now the problem I am having is that sometimes the header has a comma "," in it, and the cut command will keep that line of the header.  I've only seen one line with this problem at the top of the file, but I would like to check all lines for this common format and remove the lines that do not follow it.  I am new to bash scripting and cannot figure out how to do this.

NOTE: the 12th comma-delimited field could itself contain a comma, which is another problem I can't figure out.  If there is a comma in the 12th field, it could give me fields 13, 14, 15, etc. if I use the cut command.  Is there a way to have the last field include everything after the 11th comma, or does it already do that?  The cut command I've been using is:

cut -f "1 2 3 4 5 6 7 8 9 10 11 12" -d , -s < file.log

Thanks for any help.
Question by:cpwems
9 Comments
 
LVL 48

Expert Comment

by:Tintin
ID: 13963627
Can you show us a real sample header and data line?  There must be something else that can be used to distinguish them.

BTW, your current cut command can be reduced to

cut -f1-12 -d, -s <file.log
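
On the 12th-field question, a small sketch may help show what cut does. The sample line below is made up to match the format in the question, with commas added to the last field; it is not taken from a real log.

```shell
line='11,22,3333,4444,555555,666666,77,88,99,10,11,text, with, commas'

# -f1-12 stops after the 12th field, so the tail of the text is lost.
first12=$(printf '%s\n' "$line" | cut -d, -f1-12)

# -f1- means "field 1 through the last field", so nothing is cut off.
all=$(printf '%s\n' "$line" | cut -d, -f1-)

# -s suppresses lines that contain no delimiter at all.
none=$(printf 'no commas here\n' | cut -d, -f1- -s)

printf '%s\n' "$first12" "$all" "[$none]"
```

The first result ends at "text", the second is the whole line, and the third is empty. Note that an open-ended range only solves the truncation problem; a header line that happens to contain a comma still passes through.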

 

Author Comment

by:cpwems
ID: 13963824
Below are two examples of the beginning of the log file.  In the first example the data starts at '0f,00,00,806050a0', and in the second it starts at 'c8,00,00,ff5020ff,'.  Each data line is separated by the NUL character '^@'.  I have many log files, each about 4.5k, so I can send whole files if anyone needs them.

Sample 1:
^@^@M^@¹^@^H^Ah^AÈ^A^W^B<85>^BÖ^BE^C<94>^C^B^DS^Dª^D^A^Eq^EÀ^E.^F^?^Fï^F@^G°^G^C^Hs^Hã^H5       <82>    î       =
<8a>
ö
E^K<96>^K^F^Lq^LÄ^L/^M<9a>^M
^Nz^Nê^N<^O­^O^A^Pq^Pá^PQ^QÁ^Q"^R|^R0f,00,00,806050a0,000001ab,000001c2,0014,00,01,01,00,^^^A^_^OJet examines you.^?1^@79,03,00,80c0c050,000001ac,000001c3,0035,00,01,02,00,^^^AJet begins to browse the merchandise in your bazaar.^@

Sample 2:
^@^@S^@<8c>^@à^@6^A<87>^A÷^AJ^B<97>^B^C^CR^Cº^C^V^Dg^D×^D*^Ew^Eã^E2^F<86>^Fù^FO^G<9d>^G
^HZ^HÉ^H^Y      k       ¹       &
v
Ò
#^K<93>^Kæ^K7^L§^Lú^LJ^M¹^M^K^NZ^NÈ^N^Y^Om^Oà^O6^P<85>^Pó^PD^Qc8,00,00,ff5020ff,00000000,00000000,001c,00,01,00,00,^^^A<<< Welcome to Phoenix! >>>^@c8,00,00,ff5020ff,00000001,00000001,0002,00,01,00,00,^^^A ^@00,00,00,80808080,00000002,00000002,001d,00,01,00,00,^^^A=== Area: Bastok Markets ===^@

This is the script I have written so far:
for i in $log_dir/*; do
  if [ -f $i ]; then
  # if the file is there
    filename=${i#$log_dir/}
    tr '\0' '\n' < $log_dir/$filename > temp.log
    csplit -s temp.log "/\0/"
    if [ -f xx01 ]; then
      cut -f1-12 -d , -s xx01 > $clean_dir/$filename
    fi
    rm -rf temp.log
    rm -rf xx*
  fi
done


Thanks for the short command Tintin.
 
LVL 48

Accepted Solution

by:
Tintin earned 900 total points
ID: 13964956
Are the ^R, ^Q etc, actual control characters?

If so, it would appear (from the sample data) that each header line ends in a control character.  Assuming the data has no control characters, you could strip out all lines that end in a control character, which would leave you with just the data.
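
A minimal sketch of that idea, assuming the NULs have already been turned into newlines with tr and that no data line ends in a control character. The function name is hypothetical, and -a (treat binary input as text) is a GNU/BSD grep extension rather than POSIX.

```shell
# Hypothetical helper: drop every line whose last byte is a control character.
# LC_ALL=C makes grep work byte by byte, so stray high bytes are not an issue.
drop_ctrl_terminated() {
  LC_ALL=C grep -a -v '[[:cntrl:]]$'
}

# Made-up input: one "header" line ending in ^A, then one data line.
printf 'header\001\n79,03,00,data\n' | drop_ctrl_terminated
```

Only the data line is printed. Header lines that happen to end in a printable character would still get through, so this is a first filter rather than a complete one.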

 

Author Comment

by:cpwems
ID: 13964965
The data does have control characters like ^A and ^B, which indicate what color the text is supposed to be from that point on.
 
LVL 48

Expert Comment

by:Tintin
ID: 13965026
Instead of your cut, try:

sed "s/[^a-f0-9]*\([a-f0-9][a-f0-9],.*\)/\1/" xx01 >$clean_dir/$filename
 

Author Comment

by:cpwems
ID: 13965748
OK, that suggestion helped as long as I piped it into the cut, but I've found other problems.  Here are some more raw sample files:

sample3:
^@^@B^@<9b>^@^A^AV^A¯^A^B^BR^B´^Bü^BY^C«^C^\^Dÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ0d,00,00,8020c0a0,0000015d,00000190,000b,00,01,01,00,^^^A(Goduro) k^@00,00,00,80808080,0000015e,00000191,0022,00,01,00,00,^^^ACash's title: Black Dragon Slayer^@

sample4:
^@^@q^@â^@3^A×^A^T^Be^B®^B^F^Cc^C¾^C0^Ds^D¶^Dü^D<8e>^Eä^EU^FØ^F:^GÆ^G^L^Hj^Hµ^H&        t       ¿       K
®
^N^Kj^KÐ^KB^L ^Lø^L{^MÊ^MV^N¼^N^P^Oe^Oº^O,^Po^Pè^Pt^QÔ^Q>^RÁ^R'^S79,03,00,80c0c050,0000008c,00000096,003a,00,01,02,00,^^^ASearch result: Only one person found in the entire world.^@79,03,00,80c0c050,0000008d,00000097,003a,00,01,02,00,^^^ASearch result: Only one person found in the entire world.^@

The problem with sample4 is that the header contains a comma, so I am left with:
,
79,03,00,80c0c050,0000008c,00000096,003a,00,01,02,00,^^^ASearch result: Only one person found in the entire world.
79,03,00,80c0c050,0000008d,00000097,003a,00,01,02,00,^^^ASearch result: Only one person found in the entire world.


Tintin thanks again for all your help.  I've been trying for weeks to do this myself.
 
LVL 48

Expert Comment

by:Tintin
ID: 13974363
You should be able to just run:

grep , xx01 | sed "s/[^a-f0-9]*\([a-f0-9][a-f0-9],.*\)/\1/"

It works for me (note I have changed the control characters to X's and added newlines for the nulls to make it easier to test).

$ cat file
^@^@q^@â^@3^A×^A^T^Be^B®^B^F^Cc^C¾^C0^Ds^D¶^Dü^D<8e>^Eä^EU^FØ^F:^GÆ^G^L^Hj^Hµ^H&        t       ¿       K
®
XX,YY,ZZ79,03,00,80c0c050,0000008c,00000096,003a,00,01,02,00,^^^ASearch result: Only one person found in the entire world.
79,03,00,80c0c050,0000008d,00000097,003a,00,01,02,00,^^^ASearch result: Only one person found in the entire world.

$ grep , file | sed "s/[^a-f0-9]*\([a-f0-9][a-f0-9],.*\)/\1/"

79,03,00,80c0c050,0000008c,00000096,003a,00,01,02,00,^^^ASearch result: Only one person found in the entire world.
79,03,00,80c0c050,0000008d,00000097,003a,00,01,02,00,^^^ASearch result: Only one person found in the entire world.
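
A stricter variant, offered only as a sketch: instead of looking for the first two hex digits followed by a comma, match the whole run of 11 fixed-width hex fields seen in the samples (2,2,2,8,8,8,4,2,2,2,2 digits) and print only lines that contain it. The field widths are inferred from the posted samples, and the names `record` and `clean_records` are made up for this example.

```shell
# BRE for the 11 fixed-width hex fields, each followed by a comma.
record='\([0-9a-f]\{2\},\)\{3\}\([0-9a-f]\{8\},\)\{3\}[0-9a-f]\{4\},\([0-9a-f]\{2\},\)\{4\}'

# Hypothetical helper: strip anything in front of the record and drop
# lines that contain no record at all (sed -n ... p prints matches only).
clean_records() {
  LC_ALL=C sed -n "s/^.*\(${record}.*\)\$/\1/p"
}

# Made-up input: a stray "," line, a record with junk in front of it,
# and a 12th field that itself contains a comma.
printf '%s\n' \
  ',' \
  'XX,YY,ZZ79,03,00,80c0c050,0000008c,00000096,003a,00,01,02,00,one, two' |
  clean_records
```

The stray "," line is dropped, the junk prefix is removed, and the text field is kept whole because nothing after the 11th comma is split.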



 
LVL 7

Expert Comment

by:aib_42
ID: 13974810
This looks like a job for awk, I wonder why it hasn't been suggested...
 

Author Comment

by:cpwems
ID: 13974998
Well, Tintin pointed me in the right direction to solve it myself.  I found a huge Unix book, looked through all the commands, and stumbled on the dd command.  Here is the code that works on my log files.  I never realized that the control characters took up two bytes, which is why I couldn't figure out the size of the header.  I would still love to know what is in the header, but it's not important.

log_dir='/home/tabber/ffxi_logs'
clean_dir='/home/tabber/clean'

for i in "$log_dir"/*; do
  # only process regular files
  if [ -f "$i" ]; then
    filename=${i#"$log_dir"/}
    # skip the 100-byte header, then turn each NUL separator into a newline
    dd bs=1 skip=100 < "$i" | tr '\0' '\n' > "$clean_dir/$filename"
  fi
done
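
As for what is in the header, here is a guess rather than a confirmed answer: the 100 bytes look like 50 two-byte little-endian numbers. In Sample 1 the second pair, M^@, would be 77, which matches the length of the first record including its trailing NUL, so the values may be the starting offsets of the records. The sketch below prints the header that way so the guess can be checked against a real file; both function names are made up, and the 100-byte size is taken from the skip=100 above.

```shell
# Hypothetical helper: print the first 100 bytes of a file as 16-bit
# little-endian unsigned numbers, one per line.
# od -v keeps repeated lines (for example runs of 0xff) from being collapsed.
dump_header() {
  od -An -v -tu1 -N100 "$1" |
    awk '{ for (i = 1; i <= NF; i++) b[n++] = $i }
         END { for (i = 0; i + 1 < n; i += 2) print b[i] + 256 * b[i + 1] }'
}

# Hypothetical helper: same effect as the dd line above, using tail to
# start output at byte 101.
strip_header() {
  tail -c +101 "$1" | tr '\0' '\n'
}
```

For example, `dump_header "$log_dir/somefile"` followed by `strip_header "$log_dir/somefile" | head` lets you compare the printed numbers with where the records actually begin.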


I'm not too sure how to assign points, but Tintin, you will get them all.  Please let me know if I do it wrong.