Solved

Data sort from txt file

Posted on 2011-03-25
5
251 Views
Last Modified: 2012-05-11
I have a data file that is a live document being written to constantly.  The data being written to it looks like this:

http://ad.yieldmanager.com/iframe3?hGIeABw3CQCnyoUAAAAAAIISIgAAAAAAAABcAAYAAAAAAAIAAgAFEbi5KQAAAAAAktgDAAAAAAC1tCwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC7mwQAAAAAAAIAAwAAAAAATDj0Fg.vuT9MOPQWD--5PxHfiVkvhsI.Ed-JWS-Gwj8dyeU.pN.OPx3J5T-k384.AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAB8B4N1OFDUCQlO9WutGM3p4JuLb-ImA1ZlJ.6HAAAAAA==,,http://www.geni.com/ads/google,Z=728x90&cb=1143214204&s=603932&_salt=3974840655&B=10&r=0,7fdfb488-5630-11e0-be23-003048d702ca

I need to extract and write to a separate file just the relevant bits - those being any url host and the ad size "z=728x90" - ideally I would like to put these bits of data into a table of some sort.

Any help/ideas?

thanks
0
Comment
Question by:eezar21
  • 2
  • 2
5 Comments
 
LVL 2

Expert Comment

by:thomas_nilsson
ID: 35213612
You are not saying much about the environment. Is that a text file on a common file server accessible form anywhere in your network? Is the text file a part of a webapp so that the parsing/extracting need to be done in whatever language/environement the app is in?

Also, is the *content* of the file lines looking like your example? And you want to extract from that something like:

Host                            Size
yeildmanager.com      728x90
somehost.org             5000x34


If so you'd probably need to figure out some pattern matching rules and use something like awk to extract the parts.

The other part of the problem is then if you want to only process the "new" lines (as I understand that it is being written to constantly) or if a periodic batch processing of the whole file would do?
0
 
LVL 34

Expert Comment

by:Beverley Portlock
ID: 35213821
Based on the data given as an example, this code

<?php

$string = "http://ad.yieldmanager.com/iframe3?hGIeABw3CQCnyoUAAAAAAIISIgAAAAAAAABcAAYAAAAAAAIAAgAFEbi5KQAAAAAAktgDAAAAAAC1tCwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC7mwQAAAAAAAIAAwAAAAAATDj0Fg.vuT9MOPQWD--5PxHfiVkvhsI.Ed-JWS-Gwj8dyeU.pN.OPx3J5T-k384.AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAB8B4N1OFDUCQlO9WutGM3p4JuLb-ImA1ZlJ.6HAAAAAA==,,http://www.geni.com/ads/google,Z=728x90&cb=1143214204&s=603932&_salt=3974840655&B=10&r=0,7fdfb488-5630-11e0-be23-003048d702ca";

$pattern = '#https?://.+?(https?://.+)#';

preg_match( $pattern, $string, $matches );

$bits = parse_url( $matches[1]);
$params = explode( "&", $bits[path] );

print_r( $bits );
print_r( $params );

Open in new window



will produce this output

Array
(
    [scheme] => http
    [host] => www.geni.com
    [path] => /ads/google,Z=728x90&cb=1143214204&s=603932&_salt=3974840655&B=10&r=0,7fdfb488-5630-11e0-be23-003048d702ca
)
Array
(
    [0] => /ads/google,Z=728x90
    [1] => cb=1143214204
    [2] => s=603932
    [3] => _salt=3974840655
    [4] => B=10
    [5] => r=0,7fdfb488-5630-11e0-be23-003048d702ca
)
0
 

Author Comment

by:eezar21
ID: 35213990
Hi Thomas,

sorry you're right should have given more info.  The data is being extracted from our adserver and is a simple javascript script to let us know where our ad campaigns are being served.  The text file is a standalone file being held on the server and currently is being downloaded by ftp when I need to view it.

 In the example entry, the information that I would like to extract would look something like this:

URL 1                                                       URL 2                                                 Size
http://ad.yieldmanager.com/iframe3         http://www.geni.com/ads/google      728x90

For simplicity I would for now probably periodically batch process the file.

0
 
LVL 34

Expert Comment

by:Beverley Portlock
ID: 35214048
Modified code

<?php

$string = "http://ad.yieldmanager.com/iframe3?hGIeABw3CQCnyoUAAAAAAIISIgAAAAAAAABcAAYAAAAAAAIAAgAFEbi5KQAAAAAAktgDAAAAAAC1tCwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC7mwQAAAAAAAIAAwAAAAAATDj0Fg.vuT9MOPQWD--5PxHfiVkvhsI.Ed-JWS-Gwj8dyeU.pN.OPx3J5T-k384.AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAB8B4N1OFDUCQlO9WutGM3p4JuLb-ImA1ZlJ.6HAAAAAA==,,http://www.geni.com/ads/google,Z=728x90&cb=1143214204&s=603932&_salt=3974840655&B=10&r=0,7fdfb488-5630-11e0-be23-003048d702ca";

$pattern = '#(https?://[^\?]+).+?(https?://.+)#';

preg_match( $pattern, $string, $matches );

$bits = parse_url( $matches[2]);
$params = explode( "&", $bits[path] );

print_r( $matches[1] );
echo "<br/>";
print_r( $bits );
echo "<br/>";
print_r( $params );

Open in new window


produces

http://ad.yieldmanager.com/iframe3

Array ( [scheme] => http [host] => www.geni.com [path] => /ads/google,Z=728x90&cb=1143214204&s=603932&_salt=3974840655&B=10&r=0,7fdfb488-5630-11e0-be23-003048d702ca )

Array ( [0] => /ads/google,Z=728x90 [1] => cb=1143214204 [2] => s=603932 [3] => _salt=3974840655 [4] => B=10 [5] => r=0,7fdfb488-5630-11e0-be23-003048d702ca )
0
 
LVL 2

Accepted Solution

by:
thomas_nilsson earned 500 total points
ID: 35214643
Here's some Javascript code that can be used to extract the fields you want:
var parts;

function Filter(stringToParse) {
	parts = stringToParse.split(/[,\?]/);
}

Filter.prototype.url1 = function () {
	return parts[0];
};

Filter.prototype.url2 = function () {
	return parts[3];
};

Filter.prototype.dimensions = function () {
	return parts[4].split(/&/)[0].slice(2);
};

Open in new window


And here is some JSTestDriver test cases:
FilterTest = TestCase("FilterTest");
var filter = new Filter("http://ad.yieldmanager.com/iframe3?hGIeABw3CQCnyoUAAAAAAIISIgAAAAAAAABcAAYAAAAAAAIAAgAFEbi5KQAAAAAAktgDAAAAAAC1tCwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC7mwQAAAAAAAIAAwAAAAAATDj0Fg.vuT9MOPQWD--5PxHfiVkvhsI.Ed-JWS-Gwj8dyeU.pN.OPx3J5T-k384.AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAB8B4N1OFDUCQlO9WutGM3p4JuLb-ImA1ZlJ.6HAAAAAA==,,http://www.geni.com/ads/google,Z=728x90&cb=1143214204&s=603932&_salt=3974840655&B=10&r=0,7fdfb488-5630-11e0-be23-003048d702ca");

FilterTest.prototype.testFindsURL1 = function() {
	assertEquals("http://ad.yieldmanager.com/iframe3", filter.url1());
};

FilterTest.prototype.testFindsURL2 = function() {
	assertEquals("http://www.geni.com/ads/google", filter.url2());
};

FilterTest.prototype.testFindsSize = function() {
	assertEquals("728x90", filter.dimensions());
};

Open in new window


0

Featured Post

Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

JavaScript can be used in a browser to change parts of a webpage dynamically. It begins with the following pattern: If condition W is true, do thing X to target Y after event Z. Below are some tips and tricks to help you get started with JavaScript …
This article demonstrates how to create a simple responsive confirmation dialog with Ok and Cancel buttons using HTML, CSS, jQuery and Promises
The viewer will learn the basics of jQuery, including how to invoke it on a web page. Reference your jQuery libraries: (CODE) Include your new external js/jQuery file: (CODE) Write your first lines of code to setup your site for jQuery.: (CODE)
Polish reports in Access so they look terrific. Take yourself to another level. Equations, Back Color, Alternate Back Color. Write easy VBA Code. Tighten space to use less pages. Launch report from a menu, considering criteria only when it is filled…

920 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now