Solved

Data sort from txt file

Posted on 2011-03-25
Last Modified: 2012-05-11
I have a data file that is a live document being written to constantly.  The data being written to it looks like this:

http://ad.yieldmanager.com/iframe3?hGIeABw3CQCnyoUAAAAAAIISIgAAAAAAAABcAAYAAAAAAAIAAgAFEbi5KQAAAAAAktgDAAAAAAC1tCwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC7mwQAAAAAAAIAAwAAAAAATDj0Fg.vuT9MOPQWD--5PxHfiVkvhsI.Ed-JWS-Gwj8dyeU.pN.OPx3J5T-k384.AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAB8B4N1OFDUCQlO9WutGM3p4JuLb-ImA1ZlJ.6HAAAAAA==,,http://www.geni.com/ads/google,Z=728x90&cb=1143214204&s=603932&_salt=3974840655&B=10&r=0,7fdfb488-5630-11e0-be23-003048d702ca

I need to extract just the relevant bits and write them to a separate file: the host of each URL and the ad size ("Z=728x90"). Ideally I would like to put this data into a table of some sort.

Any help/ideas?

thanks
Question by:eezar21
LVL 2

Expert Comment

by:thomas_nilsson
ID: 35213612
You are not saying much about the environment. Is that a text file on a common file server, accessible from anywhere in your network? Or is the text file part of a web app, so that the parsing/extracting needs to be done in whatever language/environment the app is in?

Also, do all the lines in the file look like your example? And do you want to extract from them something like:

Host                Size
yieldmanager.com    728x90
somehost.org        5000x34


If so you'd probably need to figure out some pattern matching rules and use something like awk to extract the parts.
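
As a rough illustration of such pattern-matching rules, here is a minimal sketch in JavaScript (rather than awk). It assumes every line is comma-separated like the sample in the question, that URL fields start with http:// or https://, and that the size is carried in a "Z=WIDTHxHEIGHT" parameter. The name extractHostsAndSize and the shortened sample line (with "abc==" standing in for the long encoded blob) are made up for the example.

```javascript
// Sketch only: assumes each log line is comma-separated like the sample
// in the question. extractHostsAndSize is a hypothetical helper name.
function extractHostsAndSize(line) {
  const fields = line.trim().split(",");

  // Keep only fields that look like absolute http(s) URLs and take their hosts.
  // new URL() throws on malformed input, so real data may need a try/catch.
  const hosts = fields
    .filter((field) => /^https?:\/\//i.test(field))
    .map((field) => new URL(field).hostname);

  // The ad size is assumed to be carried in a "Z=WIDTHxHEIGHT" parameter.
  const sizeMatch = line.match(/(?:^|[,&?])z=(\d+x\d+)/i);

  return { hosts, size: sizeMatch ? sizeMatch[1] : null };
}

// Shortened sample: "abc==" stands in for the long encoded blob.
const sample =
  "http://ad.yieldmanager.com/iframe3?abc==,,http://www.geni.com/ads/google," +
  "Z=728x90&cb=1143214204&r=0,7fdfb488-5630-11e0-be23-003048d702ca";

console.log(extractHostsAndSize(sample));
```

Matching the size by name instead of by position means the rule should still work if the parameters are ever written in a different order.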

The other part of the problem is whether you want to process only the "new" lines (as I understand it, the file is being written to constantly) or whether periodic batch processing of the whole file would do.
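
If only the "new" lines should be processed, one possible approach is to remember how far the previous run got. The sketch below is JavaScript and makes several assumptions: the whole file is available as a string, lines end with "\n", and the offset from the last run is stored somewhere between runs. The name readNewLines is hypothetical.

```javascript
// Sketch only: returns the complete lines appended since the last run.
// readNewLines and its offset bookkeeping are hypothetical, not from the thread.
function readNewLines(content, previousOffset) {
  // If the file shrank (e.g. it was rotated), start again from the beginning.
  const start = previousOffset <= content.length ? previousOffset : 0;

  // Only consume up to the last newline: the writer may be mid-line.
  const lastNewline = content.lastIndexOf("\n");
  if (lastNewline < start) {
    return { lines: [], offset: start };
  }

  const lines = content
    .slice(start, lastNewline)
    .split("\n")
    .filter((line) => line.trim() !== "");

  return { lines, offset: lastNewline + 1 };
}

// First run sees two complete lines and one partial line.
const firstRun = readNewLines("line1\nline2\npart", 0);
// Later the partial line has been completed and a new one appended.
const secondRun = readNewLines("line1\nline2\npartial\nline4\n", firstRun.offset);

console.log(firstRun.lines, secondRun.lines);
```

Stopping at the last newline avoids parsing a half-written line, which matters for a file that is being appended to while it is read.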
LVL 34

Expert Comment

by:Beverley Portlock
ID: 35213821
Based on the data given as an example, this code

<?php

$string = "http://ad.yieldmanager.com/iframe3?hGIeABw3CQCnyoUAAAAAAIISIgAAAAAAAABcAAYAAAAAAAIAAgAFEbi5KQAAAAAAktgDAAAAAAC1tCwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC7mwQAAAAAAAIAAwAAAAAATDj0Fg.vuT9MOPQWD--5PxHfiVkvhsI.Ed-JWS-Gwj8dyeU.pN.OPx3J5T-k384.AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAB8B4N1OFDUCQlO9WutGM3p4JuLb-ImA1ZlJ.6HAAAAAA==,,http://www.geni.com/ads/google,Z=728x90&cb=1143214204&s=603932&_salt=3974840655&B=10&r=0,7fdfb488-5630-11e0-be23-003048d702ca";

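// Skip the first URL (lazily) and capture everything from the second URL onwards.
$pattern = '#https?://.+?(https?://.+)#';

preg_match( $pattern, $string, $matches );

// The second URL has no "?", so parse_url() returns everything after the host as 'path'.
$bits = parse_url( $matches[1] );
$params = explode( "&", $bits['path'] ); // quote the key; a bare path is treated as a constant

print_r( $bits );
print_r( $params );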

will produce this output

Array
(
    [scheme] => http
    [host] => www.geni.com
    [path] => /ads/google,Z=728x90&cb=1143214204&s=603932&_salt=3974840655&B=10&r=0,7fdfb488-5630-11e0-be23-003048d702ca
)
Array
(
    [0] => /ads/google,Z=728x90
    [1] => cb=1143214204
    [2] => s=603932
    [3] => _salt=3974840655
    [4] => B=10
    [5] => r=0,7fdfb488-5630-11e0-be23-003048d702ca
)

Author Comment

by:eezar21
ID: 35213990
Hi Thomas,

Sorry, you're right, I should have given more info. The data is being extracted from our ad server by a simple JavaScript script that lets us know where our ad campaigns are being served. The text file is a standalone file held on the server, and I currently download it by FTP when I need to view it.

In the example entry, the information that I would like to extract would look something like this:

URL 1                                 URL 2                             Size
http://ad.yieldmanager.com/iframe3    http://www.geni.com/ads/google    728x90

For simplicity, I would probably batch process the file periodically for now.
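
A periodic batch run over the whole file could look roughly like the JavaScript sketch below, which turns the log text into a tab-separated table with the three columns above. It assumes every line has the same layout as the sample entry (url1?blob,,url2,params,id); toTable and the shortened sample line are hypothetical.

```javascript
// Sketch only: turns the whole log text into a tab-separated table.
// toTable is a hypothetical helper; the columns follow the example above.
function toTable(logText) {
  const rows = [["URL 1", "URL 2", "Size"]];

  for (const line of logText.split(/\r?\n/)) {
    if (line.trim() === "") continue; // skip blank lines

    // Assumed layout: url1?blob,,url2,params,id (as in the sample entry).
    const parts = line.trim().split(/[,?]/);
    if (parts.length < 5) continue; // not enough fields; skip the line

    // Look the size up by name ("Z=...") rather than by position.
    const sizeMatch = parts[4].match(/(?:^|&)z=([^&]+)/i);
    rows.push([parts[0], parts[3], sizeMatch ? sizeMatch[1] : ""]);
  }

  return rows.map((row) => row.join("\t")).join("\n");
}

// Shortened sample: "abc==" stands in for the long encoded blob.
const logText = [
  "http://ad.yieldmanager.com/iframe3?abc==,,http://www.geni.com/ads/google,Z=728x90&cb=1&r=0,7fdfb488",
  "",
].join("\n");

console.log(toTable(logText));
```

Under Node.js the text could be read with fs.readFileSync(path, "utf8") after the FTP download and the result written out with fs.writeFileSync; a tab-separated file opens directly as a table in most spreadsheet programs.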

LVL 34

Expert Comment

by:Beverley Portlock
ID: 35214048
Modified code

<?php

$string = "http://ad.yieldmanager.com/iframe3?hGIeABw3CQCnyoUAAAAAAIISIgAAAAAAAABcAAYAAAAAAAIAAgAFEbi5KQAAAAAAktgDAAAAAAC1tCwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC7mwQAAAAAAAIAAwAAAAAATDj0Fg.vuT9MOPQWD--5PxHfiVkvhsI.Ed-JWS-Gwj8dyeU.pN.OPx3J5T-k384.AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAB8B4N1OFDUCQlO9WutGM3p4JuLb-ImA1ZlJ.6HAAAAAA==,,http://www.geni.com/ads/google,Z=728x90&cb=1143214204&s=603932&_salt=3974840655&B=10&r=0,7fdfb488-5630-11e0-be23-003048d702ca";

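// Group 1: the first URL up to (not including) its "?".
// Group 2: everything from the second URL onwards.
$pattern = '#(https?://[^\?]+).+?(https?://.+)#';

preg_match( $pattern, $string, $matches );

$bits = parse_url( $matches[2] );
$params = explode( "&", $bits['path'] ); // quote the key; a bare path is treated as a constant

print_r( $matches[1] );
echo "<br/>";
print_r( $bits );
echo "<br/>";
print_r( $params );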

produces

http://ad.yieldmanager.com/iframe3

Array ( [scheme] => http [host] => www.geni.com [path] => /ads/google,Z=728x90&cb=1143214204&s=603932&_salt=3974840655&B=10&r=0,7fdfb488-5630-11e0-be23-003048d702ca )

Array ( [0] => /ads/google,Z=728x90 [1] => cb=1143214204 [2] => s=603932 [3] => _salt=3974840655 [4] => B=10 [5] => r=0,7fdfb488-5630-11e0-be23-003048d702ca )
LVL 2

Accepted Solution

by:
thomas_nilsson earned 500 total points
ID: 35214643
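Here's some JavaScript code that can be used to extract the fields you want:
// Splits one log line on commas and question marks so that the
// interesting fields land at fixed positions in this.parts.
function Filter(stringToParse) {
	// Stored per instance; a shared global would be overwritten by each new Filter.
	this.parts = stringToParse.split(/[,?]/);
}

// First URL, without its query string.
Filter.prototype.url1 = function () {
	return this.parts[0];
};

// Second URL (parts[1] is the query blob, parts[2] the empty field between ",,").
Filter.prototype.url2 = function () {
	return this.parts[3];
};

// Ad size: the first parameter "Z=728x90" with the leading "Z=" removed.
Filter.prototype.dimensions = function () {
	return this.parts[4].split(/&/)[0].slice(2);
};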

And here are some JsTestDriver test cases:
var FilterTest = TestCase("FilterTest");
var filter = new Filter("http://ad.yieldmanager.com/iframe3?hGIeABw3CQCnyoUAAAAAAIISIgAAAAAAAABcAAYAAAAAAAIAAgAFEbi5KQAAAAAAktgDAAAAAAC1tCwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC7mwQAAAAAAAIAAwAAAAAATDj0Fg.vuT9MOPQWD--5PxHfiVkvhsI.Ed-JWS-Gwj8dyeU.pN.OPx3J5T-k384.AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAB8B4N1OFDUCQlO9WutGM3p4JuLb-ImA1ZlJ.6HAAAAAA==,,http://www.geni.com/ads/google,Z=728x90&cb=1143214204&s=603932&_salt=3974840655&B=10&r=0,7fdfb488-5630-11e0-be23-003048d702ca");

FilterTest.prototype.testFindsURL1 = function() {
	assertEquals("http://ad.yieldmanager.com/iframe3", filter.url1());
};

FilterTest.prototype.testFindsURL2 = function() {
	assertEquals("http://www.geni.com/ads/google", filter.url2());
};

FilterTest.prototype.testFindsSize = function() {
	assertEquals("728x90", filter.dimensions());
};
