Solved

Data sort from txt file

Posted on 2011-03-25
Last Modified: 2012-05-11
I have a data file that is a live document being written to constantly.  The data being written to it looks like this:

http://ad.yieldmanager.com/iframe3?hGIeABw3CQCnyoUAAAAAAIISIgAAAAAAAABcAAYAAAAAAAIAAgAFEbi5KQAAAAAAktgDAAAAAAC1tCwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC7mwQAAAAAAAIAAwAAAAAATDj0Fg.vuT9MOPQWD--5PxHfiVkvhsI.Ed-JWS-Gwj8dyeU.pN.OPx3J5T-k384.AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAB8B4N1OFDUCQlO9WutGM3p4JuLb-ImA1ZlJ.6HAAAAAA==,,http://www.geni.com/ads/google,Z=728x90&cb=1143214204&s=603932&_salt=3974840655&B=10&r=0,7fdfb488-5630-11e0-be23-003048d702ca

I need to extract just the relevant bits and write them to a separate file: any URL host and the ad size (the "Z=728x90" parameter). Ideally I would like to put these bits of data into a table of some sort.

Any help/ideas?

thanks
Question by:eezar21
LVL 2

Expert Comment

by:thomas_nilsson
ID: 35213612
You are not saying much about the environment. Is that a text file on a common file server, accessible from anywhere in your network? Is the text file part of a web app, so that the parsing/extracting needs to be done in whatever language/environment the app is written in?

Also, do the lines in the file all look like your example? And do you want to extract from them something like:

Host                            Size
yieldmanager.com      728x90
somehost.org             5000x34


If so you'd probably need to figure out some pattern matching rules and use something like awk to extract the parts.
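
As a rough illustration of what such pattern matching rules could look like, here is a minimal JavaScript sketch (awk would do the job just as well). The function name `extractHostsAndSize` is made up for this example, and it assumes, based only on the single sample line, that fields are comma-separated and that the size always appears as a `Z=<width>x<height>` parameter.

```javascript
// Hypothetical helper: pulls every URL host and the ad size out of one log line.
// Assumption: fields are comma-separated and the size looks like "Z=<width>x<height>".
function extractHostsAndSize(line) {
  const hosts = [];
  for (const field of line.split(",")) {
    if (/^https?:\/\//i.test(field)) {
      try {
        // The built-in URL class does the host parsing.
        hosts.push(new URL(field).hostname);
      } catch (e) {
        // Not a parseable URL after all: ignore this field.
      }
    }
  }
  // Match Z= only at the start of a parameter (after , & or ?), ignoring case.
  const sizeMatch = /(?:^|[,&?])Z=(\d+x\d+)/i.exec(line);
  return { hosts: hosts, size: sizeMatch ? sizeMatch[1] : null };
}

// Shortened, made-up sample line with the same layout as the one in the question.
const sample = "http://ad.example.com/iframe3?abc==,,http://www.example.org/ads/google,Z=728x90&cb=1&r=0,7fdfb488";
console.log(extractHostsAndSize(sample));
```

Looking for the `Z=` parameter by pattern rather than by field position means the size is still found if the number of fields changes, at the cost of trusting that no other parameter happens to end in `Z=`.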

The other part of the problem is whether you want to process only the "new" lines (as I understand it, the file is being written to constantly), or whether periodic batch processing of the whole file would do.
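
If only the new lines should be processed, one possible approach is to remember how far the previous run got. The sketch below assumes the whole file is read into a string on every run and that the returned offset is stored somewhere between runs; the name `takeNewLines` and that storage are hypothetical. Offsets here are string indices, not byte positions.

```javascript
// Hypothetical helper: returns only the complete lines added since `offset`,
// plus the offset to remember for the next run.
function takeNewLines(content, offset) {
  // If the file shrank (for example it was rotated), start again from the top.
  const start = offset > content.length ? 0 : offset;
  const lastNewline = content.lastIndexOf("\n");
  // No complete new line yet: keep the old position and try again later.
  if (lastNewline < start) {
    return { lines: [], offset: start };
  }
  const lines = content
    .slice(start, lastNewline)
    .split("\n")
    .map(function (line) { return line.replace(/\r$/, ""); }) // tolerate CRLF
    .filter(function (line) { return line.trim() !== ""; });
  // A half-written last line is left for the next run.
  return { lines: lines, offset: lastNewline + 1 };
}

// First run sees two complete lines and one partial line.
const firstRun = takeNewLines("a\nb\nc", 0);
console.log(firstRun.lines, firstRun.offset);
```

Stopping at the last newline matters for a file that is still being written to: a line caught mid-write would otherwise be parsed in a truncated state.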
LVL 34

Expert Comment

by:Beverley Portlock
ID: 35213821
Based on the data given as an example, this code

<?php

$string = "http://ad.yieldmanager.com/iframe3?hGIeABw3CQCnyoUAAAAAAIISIgAAAAAAAABcAAYAAAAAAAIAAgAFEbi5KQAAAAAAktgDAAAAAAC1tCwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC7mwQAAAAAAAIAAwAAAAAATDj0Fg.vuT9MOPQWD--5PxHfiVkvhsI.Ed-JWS-Gwj8dyeU.pN.OPx3J5T-k384.AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAB8B4N1OFDUCQlO9WutGM3p4JuLb-ImA1ZlJ.6HAAAAAA==,,http://www.geni.com/ads/google,Z=728x90&cb=1143214204&s=603932&_salt=3974840655&B=10&r=0,7fdfb488-5630-11e0-be23-003048d702ca";

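// Skip the first URL, then capture everything from the second URL onwards.
$pattern = '#https?://.+?(https?://.+)#';

preg_match( $pattern, $string, $matches );

$bits = parse_url( $matches[1] );
// The array key must be quoted: a bare path is treated as an undefined constant.
$params = explode( "&", $bits['path'] );

print_r( $bits );
print_r( $params );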

will produce this output

Array
(
    [scheme] => http
    [host] => www.geni.com
    [path] => /ads/google,Z=728x90&cb=1143214204&s=603932&_salt=3974840655&B=10&r=0,7fdfb488-5630-11e0-be23-003048d702ca
)
Array
(
    [0] => /ads/google,Z=728x90
    [1] => cb=1143214204
    [2] => s=603932
    [3] => _salt=3974840655
    [4] => B=10
    [5] => r=0,7fdfb488-5630-11e0-be23-003048d702ca
)

Author Comment

by:eezar21
ID: 35213990
Hi Thomas,

Sorry, you're right, I should have given more info. The data is being extracted from our ad server by a simple JavaScript script that lets us know where our ad campaigns are being served. The text file is a standalone file held on the server, and currently I download it by FTP when I need to view it.

In the example entry, the information that I would like to extract would look something like this:

URL 1                                                       URL 2                                                 Size
http://ad.yieldmanager.com/iframe3         http://www.geni.com/ads/google      728x90

For simplicity I would for now probably periodically batch process the file.
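
A periodic batch run over the downloaded file could be sketched like this in JavaScript. The function name `buildTable` is invented for the example, and it assumes every line has the same comma-separated layout as the sample entry (`url1?query,,url2,params,id`); lines that do not fit are skipped.

```javascript
// Hypothetical helper: turns the whole log text into tab-separated rows of
// URL 1, URL 2 and Size.
// Assumption: each line is laid out as url1?query,,url2,params,id
function buildTable(logText) {
  const rows = ["URL 1\tURL 2\tSize"];
  for (const line of logText.split(/\r?\n/)) {
    const fields = line.split(",");
    if (fields.length < 4) {
      continue; // skip blank or malformed lines
    }
    const url1 = fields[0].split("?")[0]; // drop the query string
    const url2 = fields[2];
    const sizeMatch = /(?:^|&)Z=(\d+x\d+)/i.exec(fields[3]);
    rows.push([url1, url2, sizeMatch ? sizeMatch[1] : ""].join("\t"));
  }
  return rows.join("\n");
}

// Shortened, made-up input with the same layout as the example entry.
console.log(buildTable("http://ad.example.com/iframe3?abc==,,http://www.example.org/ads/google,Z=728x90&cb=1,id-1\n"));
```

Under Node.js the input could come from `fs.readFileSync(path, "utf8")` and the result could be written out with `fs.writeFileSync`; tab-separated output opens directly as a table in most spreadsheet programs.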

LVL 34

Expert Comment

by:Beverley Portlock
ID: 35214048
Modified code

<?php

$string = "http://ad.yieldmanager.com/iframe3?hGIeABw3CQCnyoUAAAAAAIISIgAAAAAAAABcAAYAAAAAAAIAAgAFEbi5KQAAAAAAktgDAAAAAAC1tCwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC7mwQAAAAAAAIAAwAAAAAATDj0Fg.vuT9MOPQWD--5PxHfiVkvhsI.Ed-JWS-Gwj8dyeU.pN.OPx3J5T-k384.AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAB8B4N1OFDUCQlO9WutGM3p4JuLb-ImA1ZlJ.6HAAAAAA==,,http://www.geni.com/ads/google,Z=728x90&cb=1143214204&s=603932&_salt=3974840655&B=10&r=0,7fdfb488-5630-11e0-be23-003048d702ca";

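// Capture the first URL up to its "?", then capture everything from the second URL onwards.
$pattern = '#(https?://[^\?]+).+?(https?://.+)#';

preg_match( $pattern, $string, $matches );

$bits = parse_url( $matches[2] );
// The array key must be quoted: a bare path is treated as an undefined constant.
$params = explode( "&", $bits['path'] );

print_r( $matches[1] );
echo "<br/>";
print_r( $bits );
echo "<br/>";
print_r( $params );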

produces

http://ad.yieldmanager.com/iframe3

Array ( [scheme] => http [host] => www.geni.com [path] => /ads/google,Z=728x90&cb=1143214204&s=603932&_salt=3974840655&B=10&r=0,7fdfb488-5630-11e0-be23-003048d702ca )

Array ( [0] => /ads/google,Z=728x90 [1] => cb=1143214204 [2] => s=603932 [3] => _salt=3974840655 [4] => B=10 [5] => r=0,7fdfb488-5630-11e0-be23-003048d702ca )
LVL 2

Accepted Solution

by:
thomas_nilsson earned 2000 total points
ID: 35214643
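Here's some JavaScript code that can be used to extract the fields you want:
// Split the line on commas and on the "?" that starts the first URL's query string.
// The parts are stored on the instance so that several Filter objects can coexist.
function Filter(stringToParse) {
	this.parts = stringToParse.split(/[,\?]/);
}

// First URL, without its query string.
Filter.prototype.url1 = function () {
	return this.parts[0];
};

// Second URL (parts[1] is the query string, parts[2] the empty field between the two commas).
Filter.prototype.url2 = function () {
	return this.parts[3];
};

// Ad size: take the first "&"-separated parameter ("Z=728x90") and drop the leading "Z=".
Filter.prototype.dimensions = function () {
	return this.parts[4].split(/&/)[0].slice(2);
};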

And here are some JsTestDriver test cases:
FilterTest = TestCase("FilterTest");
var filter = new Filter("http://ad.yieldmanager.com/iframe3?hGIeABw3CQCnyoUAAAAAAIISIgAAAAAAAABcAAYAAAAAAAIAAgAFEbi5KQAAAAAAktgDAAAAAAC1tCwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC7mwQAAAAAAAIAAwAAAAAATDj0Fg.vuT9MOPQWD--5PxHfiVkvhsI.Ed-JWS-Gwj8dyeU.pN.OPx3J5T-k384.AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAB8B4N1OFDUCQlO9WutGM3p4JuLb-ImA1ZlJ.6HAAAAAA==,,http://www.geni.com/ads/google,Z=728x90&cb=1143214204&s=603932&_salt=3974840655&B=10&r=0,7fdfb488-5630-11e0-be23-003048d702ca");

FilterTest.prototype.testFindsURL1 = function() {
	assertEquals("http://ad.yieldmanager.com/iframe3", filter.url1());
};

FilterTest.prototype.testFindsURL2 = function() {
	assertEquals("http://www.geni.com/ads/google", filter.url2());
};

FilterTest.prototype.testFindsSize = function() {
	assertEquals("728x90", filter.dimensions());
};
