Solved

Data sort from txt file

Posted on 2011-03-25
Last Modified: 2012-05-11
I have a data file that is a live document being written to constantly.  The data being written to it looks like this:

http://ad.yieldmanager.com/iframe3?hGIeABw3CQCnyoUAAAAAAIISIgAAAAAAAABcAAYAAAAAAAIAAgAFEbi5KQAAAAAAktgDAAAAAAC1tCwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC7mwQAAAAAAAIAAwAAAAAATDj0Fg.vuT9MOPQWD--5PxHfiVkvhsI.Ed-JWS-Gwj8dyeU.pN.OPx3J5T-k384.AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAB8B4N1OFDUCQlO9WutGM3p4JuLb-ImA1ZlJ.6HAAAAAA==,,http://www.geni.com/ads/google,Z=728x90&cb=1143214204&s=603932&_salt=3974840655&B=10&r=0,7fdfb488-5630-11e0-be23-003048d702ca

I need to extract just the relevant bits and write them to a separate file: the host of each URL and the ad size (the "Z=728x90" parameter). Ideally I would like to put these bits of data into a table of some sort.

Any help/ideas?

thanks
Question by:eezar21
 
LVL 2

Expert Comment

by:thomas_nilsson
ID: 35213612
You are not saying much about the environment. Is that a text file on a common file server, accessible from anywhere in your network? Is the text file part of a web app, so that the parsing/extracting needs to be done in whatever language/environment the app is in?

Also, do all lines in the file look like your example? And do you want to extract from that something like:

Host                    Size
yieldmanager.com        728x90
somehost.org            5000x34


If so you'd probably need to figure out some pattern matching rules and use something like awk to extract the parts.
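
To make the pattern-matching idea concrete, here is a minimal sketch in JavaScript rather than awk. It assumes every line is comma-separated like the example, that URL fields start with http:// or https://, and that the size always appears as a Z=WIDTHxHEIGHT parameter. The function name extractHostsAndSize is made up for illustration.

```javascript
// Hypothetical helper: pull every URL host and the ad size out of one log line.
function extractHostsAndSize(line) {
  const hosts = [];
  for (const field of line.split(",")) {
    // Only fields that look like absolute URLs are considered.
    if (!/^https?:\/\//i.test(field)) continue;
    try {
      hosts.push(new URL(field).hostname);
    } catch (e) {
      // Skip fields that start like a URL but do not parse as one.
    }
  }
  // The size parameter must start a field or follow "&", e.g. ",Z=728x90&cb=...".
  const sizeMatch = /(?:^|[,&])z=(\d+x\d+)/i.exec(line);
  return { hosts: hosts, size: sizeMatch ? sizeMatch[1] : null };
}
```

For the example line in the question this should give the hosts ad.yieldmanager.com and www.geni.com and the size 728x90. Note that it returns the full host name, so trimming it down to yieldmanager.com would need an extra rule.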

The other part of the problem is whether you want to process only the "new" lines (as I understand it, the file is being written to constantly) or whether periodic batch processing of the whole file would do.
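
If only the "new" lines should be processed, one possible approach is to remember how far into the file the previous run got. The sketch below is JavaScript and works on the file content as a string. The name readNewLines and the idea of storing a character offset between runs are assumptions for illustration, not something the thread prescribes.

```javascript
// Hypothetical helper: return the complete lines added since lastOffset,
// plus the offset to store for the next run.
function readNewLines(content, lastOffset) {
  const fresh = content.slice(lastOffset);
  const lastNewline = fresh.lastIndexOf("\n");
  if (lastNewline === -1) {
    // No complete new line yet; a partial line may still be being written.
    return { lines: [], offset: lastOffset };
  }
  const lines = fresh
    .slice(0, lastNewline)
    .split(/\r?\n/)
    .filter(function (l) { return l !== ""; });
  // Advance past the last newline so a trailing partial line is re-read next time.
  return { lines: lines, offset: lastOffset + lastNewline + 1 };
}
```

Leaving an unfinished last line for the next run matters here because the file is written to constantly, so a run can easily catch a line that is only half written.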
 
LVL 34

Expert Comment

by:Beverley Portlock
ID: 35213821
Based on the data given as an example, this code

<?php

$string = "http://ad.yieldmanager.com/iframe3?hGIeABw3CQCnyoUAAAAAAIISIgAAAAAAAABcAAYAAAAAAAIAAgAFEbi5KQAAAAAAktgDAAAAAAC1tCwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC7mwQAAAAAAAIAAwAAAAAATDj0Fg.vuT9MOPQWD--5PxHfiVkvhsI.Ed-JWS-Gwj8dyeU.pN.OPx3J5T-k384.AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAB8B4N1OFDUCQlO9WutGM3p4JuLb-ImA1ZlJ.6HAAAAAA==,,http://www.geni.com/ads/google,Z=728x90&cb=1143214204&s=603932&_salt=3974840655&B=10&r=0,7fdfb488-5630-11e0-be23-003048d702ca";

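// Skip the first URL lazily, then capture everything from the second URL onward.
$pattern = '#https?://.+?(https?://.+)#';

if ( !preg_match( $pattern, $string, $matches ) ) {
    exit( "No second URL found\n" );
}

// There is no "?" in the captured text, so parse_url() puts all the parameters in 'path'.
$bits = parse_url( $matches[1] );
$params = explode( "&", $bits['path'] ); // quote the key; a bare path is an undefined constant

print_r( $bits );
print_r( $params );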


will produce this output

Array
(
    [scheme] => http
    [host] => www.geni.com
    [path] => /ads/google,Z=728x90&cb=1143214204&s=603932&_salt=3974840655&B=10&r=0,7fdfb488-5630-11e0-be23-003048d702ca
)
Array
(
    [0] => /ads/google,Z=728x90
    [1] => cb=1143214204
    [2] => s=603932
    [3] => _salt=3974840655
    [4] => B=10
    [5] => r=0,7fdfb488-5630-11e0-be23-003048d702ca
)

Author Comment

by:eezar21
ID: 35213990
Hi Thomas,

Sorry, you're right, I should have given more info. The data is being extracted from our ad server by a simple JavaScript script that lets us know where our ad campaigns are being served. The text file is a standalone file held on the server, and currently I download it by FTP when I need to view it.

In the example entry, the information that I would like to extract would look something like this:

URL 1                                                       URL 2                                                 Size
http://ad.yieldmanager.com/iframe3         http://www.geni.com/ads/google      728x90

For simplicity, I would probably batch process the file periodically for now.
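
A periodic batch run along those lines could be sketched in JavaScript as follows. It assumes every line has the same comma-separated layout as the example entry (first URL, an empty field, second URL, then the parameters). The name logToTable is hypothetical, and the tab-separated output is only one possible choice of table format.

```javascript
// Hypothetical helper: turn the raw log text into a tab-separated table.
function logToTable(text) {
  const rows = ["URL 1\tURL 2\tSize"];
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue;
    const fields = line.split(",");
    // Assumed layout: 0 = first URL plus query, 2 = second URL, 3 = parameters.
    if (fields.length < 4) continue; // skip lines that do not fit the layout
    const url1 = fields[0].split("?")[0]; // drop the long query string
    const url2 = fields[2];
    const size = /(?:^|&)z=(\d+x\d+)/i.exec(fields[3]);
    rows.push([url1, url2, size ? size[1] : ""].join("\t"));
  }
  return rows.join("\n");
}
```

In Node.js the input could come from fs.readFileSync(path, "utf8") and the result could be saved with fs.writeFileSync. A tab-separated file opens directly as a table in most spreadsheet programs.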

LVL 34

Expert Comment

by:Beverley Portlock
ID: 35214048
Modified code

<?php

$string = "http://ad.yieldmanager.com/iframe3?hGIeABw3CQCnyoUAAAAAAIISIgAAAAAAAABcAAYAAAAAAAIAAgAFEbi5KQAAAAAAktgDAAAAAAC1tCwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC7mwQAAAAAAAIAAwAAAAAATDj0Fg.vuT9MOPQWD--5PxHfiVkvhsI.Ed-JWS-Gwj8dyeU.pN.OPx3J5T-k384.AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAB8B4N1OFDUCQlO9WutGM3p4JuLb-ImA1ZlJ.6HAAAAAA==,,http://www.geni.com/ads/google,Z=728x90&cb=1143214204&s=603932&_salt=3974840655&B=10&r=0,7fdfb488-5630-11e0-be23-003048d702ca";

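// Group 1: the first URL up to (not including) its "?".
// Group 2: everything from the second URL onward.
$pattern = '#(https?://[^\?]+).+?(https?://.+)#';

preg_match( $pattern, $string, $matches );

$bits = parse_url( $matches[2] );
$params = explode( "&", $bits['path'] ); // quote the array key

print_r( $matches[1] );
echo "<br/>";
print_r( $bits );
echo "<br/>";
print_r( $params );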


produces

http://ad.yieldmanager.com/iframe3

Array ( [scheme] => http [host] => www.geni.com [path] => /ads/google,Z=728x90&cb=1143214204&s=603932&_salt=3974840655&B=10&r=0,7fdfb488-5630-11e0-be23-003048d702ca )

Array ( [0] => /ads/google,Z=728x90 [1] => cb=1143214204 [2] => s=603932 [3] => _salt=3974840655 [4] => B=10 [5] => r=0,7fdfb488-5630-11e0-be23-003048d702ca )
 
LVL 2

Accepted Solution

by:
thomas_nilsson earned 500 total points
ID: 35214643
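Here's some JavaScript code that can be used to extract the fields you want:

// Split the line on "," and "?" and keep the pieces on the instance,
// so that several Filter objects do not overwrite each other's data.
function Filter(stringToParse) {
	this.parts = stringToParse.split(/[,\?]/);
}

// parts[0]: the first URL, up to the "?"
Filter.prototype.url1 = function () {
	return this.parts[0];
};

// parts[3]: the second URL (parts[1] is the query blob, parts[2] the empty field)
Filter.prototype.url2 = function () {
	return this.parts[3];
};

// parts[4] starts with "Z=728x90&..."; take the first parameter and drop the "Z="
Filter.prototype.dimensions = function () {
	return this.parts[4].split(/&/)[0].slice(2);
};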


And here are some JsTestDriver test cases:

var FilterTest = TestCase("FilterTest");
var filter = new Filter("http://ad.yieldmanager.com/iframe3?hGIeABw3CQCnyoUAAAAAAIISIgAAAAAAAABcAAYAAAAAAAIAAgAFEbi5KQAAAAAAktgDAAAAAAC1tCwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC7mwQAAAAAAAIAAwAAAAAATDj0Fg.vuT9MOPQWD--5PxHfiVkvhsI.Ed-JWS-Gwj8dyeU.pN.OPx3J5T-k384.AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAB8B4N1OFDUCQlO9WutGM3p4JuLb-ImA1ZlJ.6HAAAAAA==,,http://www.geni.com/ads/google,Z=728x90&cb=1143214204&s=603932&_salt=3974840655&B=10&r=0,7fdfb488-5630-11e0-be23-003048d702ca");

FilterTest.prototype.testFindsURL1 = function() {
	assertEquals("http://ad.yieldmanager.com/iframe3", filter.url1());
};

FilterTest.prototype.testFindsURL2 = function() {
	assertEquals("http://www.geni.com/ads/google", filter.url2());
};

FilterTest.prototype.testFindsSize = function() {
	assertEquals("728x90", filter.dimensions());
};
