Solved

Data sort from txt file

Posted on 2011-03-25
Last Modified: 2012-05-11
I have a data file that is a live document being written to constantly.  The data being written to it looks like this:

http://ad.yieldmanager.com/iframe3?hGIeABw3CQCnyoUAAAAAAIISIgAAAAAAAABcAAYAAAAAAAIAAgAFEbi5KQAAAAAAktgDAAAAAAC1tCwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC7mwQAAAAAAAIAAwAAAAAATDj0Fg.vuT9MOPQWD--5PxHfiVkvhsI.Ed-JWS-Gwj8dyeU.pN.OPx3J5T-k384.AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAB8B4N1OFDUCQlO9WutGM3p4JuLb-ImA1ZlJ.6HAAAAAA==,,http://www.geni.com/ads/google,Z=728x90&cb=1143214204&s=603932&_salt=3974840655&B=10&r=0,7fdfb488-5630-11e0-be23-003048d702ca

I need to extract just the relevant bits and write them to a separate file: the host of each URL and the ad size (the "Z=728x90" parameter). Ideally I would like to put this data into a table of some sort.

Any help/ideas?

thanks
Question by:eezar21
LVL 2

Expert Comment

by:thomas_nilsson
ID: 35213612
You are not saying much about the environment. Is that a text file on a common file server, accessible from anywhere in your network? Is the text file part of a web app, so that the parsing/extracting needs to be done in whatever language/environment the app uses?

Also, do all the lines in the file look like your example? And do you want to extract from that something like:

Host                  Size
yieldmanager.com      728x90
somehost.org          5000x34


If so you'd probably need to figure out some pattern matching rules and use something like awk to extract the parts.
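
As a rough illustration of the pattern-matching idea (in JavaScript rather than awk), the sketch below pulls the host of every URL and the ad size out of a single line. The function name `extractHostsAndSize` is hypothetical, and it assumes the size always appears as a `Z=<width>x<height>` parameter, which the one sample line suggests but does not guarantee.

```javascript
// Hypothetical helper: extract every URL host and the ad size from one log line.
// Assumes URLs never contain commas or whitespace, as in the sample line.
function extractHostsAndSize(line) {
  // Each URL runs from "http(s)://" up to the next comma or whitespace.
  const urls = line.match(/https?:\/\/[^,\s]+/g) || [];
  // The standard URL class does the host parsing for us.
  const hosts = urls.map((u) => new URL(u).hostname);
  // The size is the value of a "Z=" (or "z=") parameter such as Z=728x90.
  const sizeMatch = line.match(/(?:^|[,&?])z=(\d+x\d+)/i);
  return { hosts, size: sizeMatch ? sizeMatch[1] : null };
}
```

For the sample line in the question this yields the hosts `ad.yieldmanager.com` and `www.geni.com` and the size `728x90`. Lines with a different layout would need their own rules.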

The other part of the problem is whether you want to process only the "new" lines (as I understand it, the file is being written to constantly) or whether periodic batch processing of the whole file would do.
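
If only the "new" lines matter, one possible approach is to remember how far into the file the previous run got and read from there. The following is a minimal Node.js sketch of that idea. The helper name `readNewLines` and the choice to store a byte offset are assumptions for illustration, and it does not handle the file being truncated or rotated.

```javascript
const fs = require("fs");

// Hypothetical helper: return the complete lines appended after `offset` bytes,
// plus the offset to resume from on the next run.
function readNewLines(path, offset) {
  const size = fs.statSync(path).size;
  if (size <= offset) return { lines: [], offset };
  const fd = fs.openSync(path, "r");
  try {
    const buf = Buffer.alloc(size - offset);
    const bytesRead = fs.readSync(fd, buf, 0, buf.length, offset);
    const chunk = buf.subarray(0, bytesRead);
    // Stop at the last newline so a half-written final line is left for next time.
    const lastNewline = chunk.lastIndexOf(0x0a);
    if (lastNewline === -1) return { lines: [], offset };
    const text = chunk.subarray(0, lastNewline).toString("utf8");
    const lines = text.split(/\r?\n/).filter((l) => l !== "");
    return { lines, offset: offset + lastNewline + 1 };
  } finally {
    fs.closeSync(fd);
  }
}
```

The returned offset would have to be saved somewhere between runs (a small state file, for example). Periodic batch processing of the whole file avoids that bookkeeping.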
LVL 34

Expert Comment

by:Beverley Portlock
ID: 35213821
Based on the data given as an example, this code

<?php

$string = "http://ad.yieldmanager.com/iframe3?hGIeABw3CQCnyoUAAAAAAIISIgAAAAAAAABcAAYAAAAAAAIAAgAFEbi5KQAAAAAAktgDAAAAAAC1tCwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC7mwQAAAAAAAIAAwAAAAAATDj0Fg.vuT9MOPQWD--5PxHfiVkvhsI.Ed-JWS-Gwj8dyeU.pN.OPx3J5T-k384.AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAB8B4N1OFDUCQlO9WutGM3p4JuLb-ImA1ZlJ.6HAAAAAA==,,http://www.geni.com/ads/google,Z=728x90&cb=1143214204&s=603932&_salt=3974840655&B=10&r=0,7fdfb488-5630-11e0-be23-003048d702ca";

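// Skip the first URL (lazily) and capture everything from the second URL onwards
$pattern = '#https?://.+?(https?://.+)#';

preg_match( $pattern, $string, $matches );

// The second URL has no "?", so everything after the host ends up in 'path'
$bits = parse_url( $matches[1]);
$params = explode( "&", $bits['path'] );  // key must be quoted: a bare path is an undefined constant

print_r( $bits );
print_r( $params );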


will produce this output

Array
(
    [scheme] => http
    [host] => www.geni.com
    [path] => /ads/google,Z=728x90&cb=1143214204&s=603932&_salt=3974840655&B=10&r=0,7fdfb488-5630-11e0-be23-003048d702ca
)
Array
(
    [0] => /ads/google,Z=728x90
    [1] => cb=1143214204
    [2] => s=603932
    [3] => _salt=3974840655
    [4] => B=10
    [5] => r=0,7fdfb488-5630-11e0-be23-003048d702ca
)

Author Comment

by:eezar21
ID: 35213990
Hi Thomas,

Sorry, you're right, I should have given more info. The data is extracted from our ad server by a simple JavaScript script that lets us know where our ad campaigns are being served. The text file is a standalone file held on the server, and currently I download it by FTP when I need to view it.

In the example entry, the information that I would like to extract would look something like this:

URL 1                                                       URL 2                                                 Size
http://ad.yieldmanager.com/iframe3         http://www.geni.com/ads/google      728x90

For simplicity I would for now probably periodically batch process the file.
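
A batch pass of that kind could look roughly like the sketch below, which turns the whole file content into tab-separated rows with the three columns above. The function name `logToTable` is hypothetical, and it assumes every line has the same comma-separated layout as the example entry (first URL with its query string, an empty field, second URL, parameters, ID). Lines that do not fit are skipped.

```javascript
// Hypothetical helper: convert the full log text into a tab-separated table.
function logToTable(logText) {
  const rows = [["URL 1", "URL 2", "Size"]];
  for (const line of logText.split(/\r?\n/)) {
    if (line.trim() === "") continue;
    const fields = line.split(",");
    // Assumed layout: url1?blob, (empty), url2, params, id
    if (fields.length < 4) continue;
    const url1 = fields[0].split("?")[0]; // drop the long encoded query string
    const url2 = fields[2];
    const params = new URLSearchParams(fields[3]);
    const size = params.get("Z") || params.get("z") || "";
    rows.push([url1, url2, size]);
  }
  return rows.map((r) => r.join("\t")).join("\n");
}
```

The returned string can be written to a `.tsv` file after each FTP download and opened directly in a spreadsheet.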

LVL 34

Expert Comment

by:Beverley Portlock
ID: 35214048
Modified code

<?php

$string = "http://ad.yieldmanager.com/iframe3?hGIeABw3CQCnyoUAAAAAAIISIgAAAAAAAABcAAYAAAAAAAIAAgAFEbi5KQAAAAAAktgDAAAAAAC1tCwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC7mwQAAAAAAAIAAwAAAAAATDj0Fg.vuT9MOPQWD--5PxHfiVkvhsI.Ed-JWS-Gwj8dyeU.pN.OPx3J5T-k384.AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAB8B4N1OFDUCQlO9WutGM3p4JuLb-ImA1ZlJ.6HAAAAAA==,,http://www.geni.com/ads/google,Z=728x90&cb=1143214204&s=603932&_salt=3974840655&B=10&r=0,7fdfb488-5630-11e0-be23-003048d702ca";

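// Group 1: first URL up to (not including) its "?"; group 2: everything from the second URL onwards
$pattern = '#(https?://[^\?]+).+?(https?://.+)#';

preg_match( $pattern, $string, $matches );

$bits = parse_url( $matches[2]);
$params = explode( "&", $bits['path'] );  // key must be quoted: a bare path is an undefined constant

print_r( $matches[1] );
echo "<br/>";
print_r( $bits );
echo "<br/>";
print_r( $params );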


produces

http://ad.yieldmanager.com/iframe3

Array ( [scheme] => http [host] => www.geni.com [path] => /ads/google,Z=728x90&cb=1143214204&s=603932&_salt=3974840655&B=10&r=0,7fdfb488-5630-11e0-be23-003048d702ca )

Array ( [0] => /ads/google,Z=728x90 [1] => cb=1143214204 [2] => s=603932 [3] => _salt=3974840655 [4] => B=10 [5] => r=0,7fdfb488-5630-11e0-be23-003048d702ca )
LVL 2

Accepted Solution

by:thomas_nilsson (earned 500 total points)
ID: 35214643
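Here's some JavaScript code that can be used to extract the fields you want:
function Filter(stringToParse) {
	// Split on commas and "?" so that each field gets its own index.
	// Stored per instance, so several Filter objects do not overwrite each other.
	this.parts = stringToParse.split(/[,\?]/);
}

// First URL, without its query string
Filter.prototype.url1 = function () {
	return this.parts[0];
};

// Second URL (index 2 is the empty field between the two commas)
Filter.prototype.url2 = function () {
	return this.parts[3];
};

// Ad size: first parameter of the fifth field, minus the leading "Z="
Filter.prototype.dimensions = function () {
	return this.parts[4].split(/&/)[0].slice(2);
};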


And here are some JSTestDriver test cases:
FilterTest = TestCase("FilterTest");
var filter = new Filter("http://ad.yieldmanager.com/iframe3?hGIeABw3CQCnyoUAAAAAAIISIgAAAAAAAABcAAYAAAAAAAIAAgAFEbi5KQAAAAAAktgDAAAAAAC1tCwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC7mwQAAAAAAAIAAwAAAAAATDj0Fg.vuT9MOPQWD--5PxHfiVkvhsI.Ed-JWS-Gwj8dyeU.pN.OPx3J5T-k384.AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAB8B4N1OFDUCQlO9WutGM3p4JuLb-ImA1ZlJ.6HAAAAAA==,,http://www.geni.com/ads/google,Z=728x90&cb=1143214204&s=603932&_salt=3974840655&B=10&r=0,7fdfb488-5630-11e0-be23-003048d702ca");

FilterTest.prototype.testFindsURL1 = function() {
	assertEquals("http://ad.yieldmanager.com/iframe3", filter.url1());
};

FilterTest.prototype.testFindsURL2 = function() {
	assertEquals("http://www.geni.com/ads/google", filter.url2());
};

FilterTest.prototype.testFindsSize = function() {
	assertEquals("728x90", filter.dimensions());
};
