• Status: Solved
  • Priority: Medium
  • Security: Public

Parsing text files using PHP and extracting values

Hi experts,

Users can upload a txt file as an attachment through a form. I can retrieve the attachment for a particular user, but I have no idea how to read the txt file using PHP and then extract certain data from it.

(All information is attached in the txt file)

I'm really interested in the first line of the txt file. I need the following data. NOTE: the data may vary per user, and the content will not be the same number of characters each time.

Information 1:- 'M a g m a g e d d o n'  
Information 2:- 'w a y 1 2'
Information 3:- 'N a b s i k'
Information 4:- 'k i n g o f k i n g s . e x e'

Does anybody know how to extract this data?

Thanks in advance
attachment-text.txt
Asked by: EzEApostle
1 Solution
 
striker46 Commented:
Use the function file_get_contents():

http://us.php.net/manual/en/function.file-get-contents.php

Then use explode() to split the contents on line breaks into an array.
 
MikeRCWatts Commented:
A couple of related things:

- what do you want to extract from the file exactly?

- the attachment you posted doesn't look like a text file - it looks like some application's file - you'd need to know something about its format to parse it reliably.

- I'm guessing the format isn't published.  In that case you could try parsing a number of them and guessing/deducing the format, but that's not very reliable ..... and this type of reverse engineering is doubtful?

It's easy enough to write some PHP to open the file and parse something out; it just depends on the rules for parsing it...  Mike
 
EzEApostle (Author) Commented:
- what do you want to extract from the file exactly?

I showed the content I need to extract above.

- the attachment you posted doesn't look like a text file - it looks like some application's file - you'd need to know something about its format to parse it reliably.

Correct. It's not a txt file but a game application file. When playing a particular game you can choose to save a replay of the game you played. This txt file holds the contents of that game file, which has the extension .RA3Replay; opened as text, the top line of the contents is readable, i.e. the players' names and the map.

- I'm guessing the format isn't published.  In that case you could try parsing a number of them and guessing/deducing the format, but that's not very reliable ..... and this type of reverse engineering is doubtful?

I'm not sure to be honest. I've only just started learning PHP and MySQL, so I'm somewhat of a novice. A good example of the data working can be found here:-

URL ATTACHED - DO NOT WANT SEARCH ENGINE TO VIEW THIS LINK

It's the exact same replay and you will see the contents are returned to the screen.

Information 1:- 'M a g m a g e d d o n'  - THIS IS THE MAP NAME
Information 2:- 'w a y 1 2' - THIS IS A PLAYER
Information 3:- 'N a b s i k' - THIS IS A PLAYER
Information 4:- 'k i n g o f k i n g s . e x e' - THIS IS A PLAYER

Hope this helps
working-example.txt
 
striker46 Commented:
There you go.

Output should be: M a g m a g e d d o n
<?php

// Path to the file
$file = 'text.txt';

// Read the whole file into a string
$read = file_get_contents($file);
if ($read === false) {
    die("Could not read $file");
}

// Split the content into an array of lines
$line = explode("\n", $read);

// Skip the 17-character prefix "Information 1:- '" on the first line,
// then strip the trailing characters after the value
$result = substr($line[0], 17, -1);
$result = substr($result, 0, -3);

echo $result;

?>
 
striker46 Commented:
Use this to remove spaces from the outputted result:

$result = str_replace(' ','',$result);


So M a g m a g e d d o n shows as Magmageddon
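
A slightly more careful variant, as a minimal sketch only. The helper name collapse_spaced_text() is made up for illustration. It rests on two assumptions that nobody in this thread has confirmed: that a run of three gaps marks a real word break (as in "F r i e n d l y   p e o p l e   o n l y" from the same file), and that the gaps may be NUL bytes rather than literal spaces, which is why both are treated alike.

```php
<?php
// Hypothetical helper: collapse "M a g m a g e d d o n" into "Magmageddon".
// Assumption: three filler characters in a row mark a real word gap,
// a single filler character is only padding between letters.
function collapse_spaced_text(string $text): string
{
    // Treat NUL bytes like spaces so both variants are handled the same way.
    $text = str_replace("\0", ' ', $text);
    // Protect real word gaps (three fillers) with a placeholder byte.
    $text = str_replace('   ', "\x01", trim($text));
    // Drop the padding between letters.
    $text = str_replace(' ', '', $text);
    // Restore the word gaps as single spaces.
    return str_replace("\x01", ' ', $text);
}

echo collapse_spaced_text('M a g m a g e d d o n'), "\n";
// prints: Magmageddon
echo collapse_spaced_text('F r i e n d l y   p e o p l e   o n l y'), "\n";
// prints: Friendly people only
```

A plain str_replace(' ', '', ...) is enough for single-word names; the placeholder step only matters when a value contains real spaces.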
 
striker46 Commented:
This code loops over the first 4 lines, or however many you set in the $lines variable, using a for loop.

Notice it uses rtrim() to remove whitespace at the end of each string (in the file you provided, the first 2 lines had 2 spaces at the end while the last 2 hadn't; this way it is always removed). Then it also strips the closing '.

In the example it echoes the results, but you can make whatever changes you want...

<?php

$file  = 'text.txt';
$start = 0; // index of the first line to read
$lines = 4; // number of lines to read

$read = file_get_contents($file);

$line = explode("\n", $read);

for ($i = $start; $i < $start + $lines; $i++) {
    // Stop early if the file has fewer lines than expected
    if (!isset($line[$i])) {
        break;
    }
    // Skip the 17-character prefix, e.g. "Information 1:- '"
    $result = substr($line[$i], 17);
    // Trim trailing whitespace, then strip the closing quote
    $result = substr(rtrim($result), 0, -1);
    // Remove the spaces between the letters
    $result = str_replace(' ', '', $result);
    echo $result . "<br />";
}

?>
 
Fero45 Commented:
Mike's comment is right: the attached file is not a text file. Any coding seems to be a waste of time.
 
Ray Paseur Commented:
It's always a little dicey to try to parse a binary file - we don't know what so much of the data means, but as I look over this thing, it looks like much of what you want may be found near the top of the file in clear text or semi-clear text.  Have a look at the code snippet - it is an extract.

Looking at your OP, you wanted to catch this:
Information 1:- 'M a g m a g e d d o n'  
Information 2:- 'w a y 1 2'
Information 3:- 'N a b s i k'
Information 4:- 'k i n g o f k i n g s . e x e'

I think we can find Magmageddon in the top of the file, but what about piou54, IgorPILGRIM, etc.?  Do you want to omit those from the results set?
M=283data/maps/official/map_mp_6_feasel4
;MC=E42C2F80;MS=0;SD=-1031081576;GSID=73B1;GT=-1;PC=-1;RU=3 100 10000 0 1 10 0 0 0 -1 0 -1 -1 1 
;S=Hway12,29EBF6B2,0,TT,0,8,0,0,0,1,-1,
:HNabsik,4DC03434,8088,TT,-1,8,-1,1,0,1,-1,
:Hkingofkings.exe,569F9BD8,0,TT,-1,7,-1,0,0,1,-1,
:Hpiou54,5131E7A3,8088,TT,5,2,3,1,0,1,-1,
:HIgorPILGRIM,4E251899,8088,TT,-1,2,2,0,0,1,-1,
:HMrPunisher,5ED2D350,8088,TT,4,8,-1,1,0,1,-1,:;           L a s t   R e p l a y Ù  ";
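
Going only by the extract above, the clear-text part could be picked apart roughly like this. This is a sketch under stated assumptions, not a description of the real file format: parse_replay_header() is a made-up name, the meaning of "M=", ";S=" and the leading "H" is guessed from this one sample, and the map value looks like an internal path rather than the display name Magmageddon.

```php
<?php
// Hypothetical helper, not part of any library: extracts the map value and
// the player names from the clear-text part of the header.
// The field meanings are guesses based on a single sample file.
function parse_replay_header(string $header): array
{
    $result = ['map' => null, 'players' => []];

    // Assumption: the map value runs from "M=" up to the next semicolon.
    if (preg_match('/M=([^;]+)/', $header, $m)) {
        $result['map'] = trim($m[1]);
    }

    // Assumption: the player list runs from ";S=" up to the next semicolon,
    // with one colon-separated slot per player.
    if (preg_match('/;S=([^;]*)/', $header, $m)) {
        foreach (explode(':', $m[1]) as $slot) {
            $slot = trim($slot);
            // Skip empty slots and slots without the leading "H",
            // since their meaning is not known from this sample.
            if ($slot === '' || $slot[0] !== 'H') {
                continue;
            }
            // The name is everything between the "H" and the first comma.
            $result['players'][] = explode(',', substr($slot, 1), 2)[0];
        }
    }

    return $result;
}

// Shortened copy of the extract, with the binary noise left out.
$sample = 'M=283data/maps/official/map_mp_6_feasel4;MC=E42C2F80;MS=0;'
        . 'S=Hway12,29EBF6B2,0,TT,0,8,0,0,0,1,-1,'
        . ':HNabsik,4DC03434,8088,TT,-1,8,-1,1,0,1,-1,:;';

print_r(parse_replay_header($sample));
```

Because it loops over however many slots it finds, the same code should cope with any number of players, as long as the layout really is what the sample suggests. A name containing a colon or semicolon would break it.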

 
MikeRCWatts Commented:
I'd be tempted not to read in the whole file with $read = file_get_contents($file);

..just get the first 2k or whatever.

..as it's rather large, could perhaps be very large, and the data seems to be at the top.
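
Reading only the top of the file might look something like this. It is a minimal sketch: read_file_head() is a made-up helper name, and the 2048-byte default is just the "2k or whatever" from above, not a known header size. It relies on the offset and length arguments of file_get_contents().

```php
<?php
// Hypothetical helper: read only the first $bytes bytes of a file
// instead of loading the whole replay into memory.
function read_file_head(string $path, int $bytes = 2048): string
{
    // Arguments: path, use_include_path, context, offset, length.
    $head = file_get_contents($path, false, null, 0, $bytes);
    if ($head === false) {
        throw new RuntimeException("Could not read $path");
    }
    return $head;
}

// Usage, assuming text.txt exists next to the script:
// $read = read_file_head('text.txt');
```

If the file is shorter than the requested length, the whole file is returned, so the rest of the parsing code does not need to change.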

Ray's extract above makes it clear we might be guessing; it might be OK, but it's dodgy ... try to figure out what the :H is and whether it's invariant, or just chop off the first two characters and parse up to the next comma ... but if you're going for this reverse engineering you'd need to look at a good number of such files and check they're all the same, preferably from different players/versions/times.

Also, you'd remain at risk when the format changes, or if there is some factor you/we haven't guessed.

Sorry to be pessimistic, but ...

 
Ray Paseur Commented:

<?php // RAY_temp_parse_RA3.php
 
// SOME TEST DATA STARTING FROM THE TOP OF THE FILE
$test_data = "REPLAY HEADER      ~  ûD   3 v 3   F r i e n d l y   p e o p l e   o n l y   N o   M a t c h   D e s c r i p t i o n   M a g m a g e d d o n   F a k e M a p I D   ÔWÄ
w a y 1 2   Äã÷	N a b s i k   ysk i n g o f k i n g s . e x e   {OW
p i o u 5 4   yÆ
I g o r P I L G R I M    Í
M r P u n i s h e r         r     CNC3RPL RA3                   CÍI                                 M=283data/maps/official/map_mp_6_feasel4;MC=E42C2F80;MS=0;SD=-1031081576;GSID=73B1;GT=-1;PC=-1;RU=3 100 10000 0 1 10 0 0 0 -1 0 -1 -1 1 ;S=Hway12,29EBF6B2,0,TT,0,8,0,0,0,1,-1,:HNabsik,4DC03434,8088,TT,-1,8,-1,1,0,1,-1,:Hkingofkings.exe,569F9BD8,0,TT,-1,7,-1,0,0,1,-1,:Hpiou54,5131E7A3,8088,TT,5,2,3,1,0,1,-1,:HIgorPILGRIM,4E251899,8088,TT,-1,2,2,0,0,1,-1,:HMrPunisher,5ED2D350,8088,TT,4,8,-1,1,0,1,-1,:;           L a s t   R e p l a y Ù  
 7  Z   1.6.3230.17659ü²2";
 
// AN EXTRACT FROM THE TEST DATA, BROKEN UP FOR READABILITY
/* // *********
M=283data/maps/official/map_mp_6_feasel4
;MC=E42C2F80;MS=0;SD=-1031081576;GSID=73B1;GT=-1;PC=-1;RU=3 100 10000 0 1 10 0 0 0 -1 0 -1 -1 1
;S=Hway12,29EBF6B2,0,TT,0,8,0,0,0,1,-1,
:HNabsik,4DC03434,8088,TT,-1,8,-1,1,0,1,-1,
:Hkingofkings.exe,569F9BD8,0,TT,-1,7,-1,0,0,1,-1,
:Hpiou54,5131E7A3,8088,TT,5,2,3,1,0,1,-1,
:HIgorPILGRIM,4E251899,8088,TT,-1,2,2,0,0,1,-1,
:HMrPunisher,5ED2D350,8088,TT,4,8,-1,1,0,1,-1,:;           L a s t   R e p l a y Ù  
*/ // *********
 
// IF WE READ THIS FROM AN EXTERNAL SOURCE, TRUNCATE IT FOR FASTER PROCESSING
$test_data = substr($test_data,0,8192);
 
// BREAK OFF THE TOP OF THE TEST DATA
$things = explode('CNC3RPL RA3',$test_data);
 
// REMOVE THE UNREADABLE CHARACTERS
$topthing = clean_string($things[0]);
// REPLACE TRIPLE SPACES WITH PIPES
$topthing = str_replace('   ', '|', $topthing);
// REMOVE SPACES TO PACK LETTERS TOGETHER INTO WORDS
$topthing = str_replace(' ', '', $topthing);
// LOCATE 3v3|
$pos = strpos($topthing, '3v3|');
$topthing = substr($topthing, $pos);
// LOCATE THE FIRST UNREADABLE CHARACTER
$pos = strpos($topthing, '?');
$topthing = trim(substr($topthing,0,$pos));
// SHOW WHAT WE FOUND - MIGHT WANT TO EXPLODE THIS ON THE PIPE DELIMITER?
echo "<br/>$topthing \n";
 
// PROCESS SOME OF THE NEXT PART OF THE DATA
$nexthing = clean_string($things[1]);
// LOCATE M=283data/ FOR START OF USEFUL INFORMATION AND DISCARD THE FRONT NOISE
$pos = strpos($nexthing, 'M=283data/');
$nexthing = substr($nexthing, $pos);
// LOCATE ;S= FOR START OF PLAYER NAMES AND DISCARD FRONT NOISE
$pos = strpos($nexthing, ';S=');
$nexthing = substr($nexthing, $pos + strlen(';S='));
// LOCATE ,:; (COMMA COLON SEMICOLON) AND DISCARD TRAILING NOISE
$pos = strpos($nexthing, ',:;');
$nexthing = substr($nexthing,0,$pos);
 
// THE FIELDS ARE SEPARATED BY THE COLON
// THE NAMES ARE PREPENDED BY CAPITAL 'H'
// THE NAMES ARE ENDED BY COMMA
$names = explode(':', $nexthing);
 
// ITERATE OVER THE NAMES
foreach ($names as $name)
{
   // ereg_replace() WAS REMOVED IN PHP 7 - USE preg_replace() INSTEAD
   $name = preg_replace('/^H/', '', $name); // REMOVE LEADING H
   $pos  = strpos($name, ','); // LOCATE COMMA
   $name = substr($name, 0, $pos); // EXTRACT NAME
   echo "<br/>$name \n";
}



// FUNCTION TO REPLACE UNWANTED CHARACTERS WITH '?'
function clean_string($string)
{
   $new = trim(preg_replace('/[^\' a-zA-Z0-9:;=\/,_.\-]/', '?', $string));
   return ( $new );
}

 
EzEApostle (Author) Commented:
Thanks very much for looking into this guys,

Ray:-
'I think we can find Magmageddon in the top of the file, but what about piou54, IgorPILGRIM, etc.?  Do you want to omit those from the results set?'

No, I need those other names included in the results as well if possible. I just figured it would be easy to grab the others if it was similar to grabbing the first names.

There is also the option to play a 1v1, 2v2, 2v3, etc. match rather than a 3v3 match, so would that mess up the results when using any of the code snippets above?

There is also data in the file which records what army/faction the players have chosen, so it could get rather tricky.

As this work would probably take quite a lot of investigation/coding, I would be happy to pay £250 to anyone who manages to parse the file and return result sets that cover the matches - possibly more, as it could be seen as a full day's work for you pros! If you're interested, Ray/others, let me know your msn/email and we'll talk more about arrangements.

Thanks
Tim
 
Ray Paseur Commented:
Hi, Tim. Thanks for the offer, but I have a rather full plate at the moment.  However I can help with the "3v3" issue.  We can use a REGEX to isolate a number, followed by a 'v', followed by another number, followed by a pipe.  Something like this should do it:

[0-9]+v[0-9]+\|
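
One way that pattern might be applied to the cleaned, pipe-delimited header text, as a rough sketch. The helper name split_on_match_type() is made up for illustration, and it assumes the match type always appears in the "NvN|" form seen in the single sample file.

```php
<?php
// Hypothetical helper: find the match type ("3v3", "1v1", "2v3", ...) in the
// cleaned, pipe-delimited header text and return it with the text after it.
function split_on_match_type(string $topthing): ?array
{
    // PREG_OFFSET_CAPTURE makes $m[0] a pair of [matched text, byte offset].
    if (!preg_match('/[0-9]+v[0-9]+\|/', $topthing, $m, PREG_OFFSET_CAPTURE)) {
        return null; // no match type found
    }
    [$matched, $offset] = $m[0];

    return [
        'type' => rtrim($matched, '|'),
        'rest' => substr($topthing, $offset + strlen($matched)),
    ];
}

print_r(split_on_match_type('REPLAYHEADER|2v3|Friendly|people|only'));
```

This would stand in for the hard-coded search for '3v3|', so 1v1 or 2v3 replays go through the same path. It returns null when nothing matches, so the caller can decide what to do.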

As to the other names, the code snippet I posted above will capture those.  You can run it and see what you get.

As a practical matter, you might want to contact the site that creates that file and ask if they have an API (Application Programming Interface) available for the file.  If so, it would likely be the easiest way to find the data and certainly more dependable than trying to parse a binary.

Best regards, ~Ray