Solved

Match 2 text files and output with gaps, line by line

Posted on 2004-10-04
Last Modified: 2006-11-17
Hi there,

I have 2 text files. One is the master URL list with, say, 500 URLs; the second file has some number X of URLs.

Is there a way to compare the two and then output the results to a third file, in line with the master file?

For example:

Master file

http://www.domain1.com/?=1234
http://domain2.com/?=11243
http://domain3.biz/?=123543
http://www.domain4.com/?=1234
http://www.domain5.com/?=11243
http://www.domain6.net/?=123543

Match file

http://www.domain1.com/?=435456
http://www.domain3.biz/?=45656423    <-------- notice the www: this is not an exact match to the master, so ignore it in the output too
http://www.domain4.com/?=3453454
http://www.domain6.net/?=3453453

Output file should look like this

http://www.domain1.com/?=435456
                                                      <------ gap: the match URL is not identical to the master URL up to the = sign, so a blank line is left instead

http://www.domain4.com/?=3453454

http://www.domain6.net/?=3453453

BUT you will notice that the URLs in the match file have different endings, so match only up to the = sign and ignore the rest.

If you can do this you're tip-top :0)

regards
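
For reference, the matching rule described above (compare only the part of each URL up to the = sign, ignore what follows) could be sketched in PHP roughly like this. The function name url_key is hypothetical, and the type hints assume PHP 7 or later; it is only an illustration of the rule, not code from the thread.

```php
<?php
// Hypothetical helper: returns the part of a line up to and including the
// first "=" sign, or the whole trimmed line when there is no "=".
function url_key(string $line): string
{
    $line = trim($line);
    $pos = strpos($line, '=');
    return $pos === false ? $line : substr($line, 0, $pos + 1);
}

// Same prefix, different endings: treated as a match.
var_dump(url_key('http://www.domain1.com/?=1234') === url_key('http://www.domain1.com/?=435456'));

// "domain3.biz" versus "www.domain3.biz": prefixes differ, so no match.
var_dump(url_key('http://domain3.biz/?=123543') === url_key('http://www.domain3.biz/?=45656423'));
```

The first comparison is true and the second is false, which mirrors the domain1 and domain3 rows in the example files.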
Question by:playstat
 
LVL 14

Expert Comment

by:ThG
ID: 12223400
I hope they are in some sort of order. Anyway, the following isn't tested and the output is NOT exactly what you want, but you get the idea of the matching process:

$fd = fopen("master.txt", "r");
$fm = fopen("match.txt", "r"); // open for reading; "w" would truncate match.txt

function fetch($f) {
  $tmp = fgets($f);
  if ($tmp === FALSE) return  FALSE;
  $tmp = rtrim($tmp, "\r\n");
  $tmp = preg_replace('/=.*/', '', $tmp); // remove the "=" and everything after it
  return $tmp;
}

$next = fetch($fm); // next match
while (($line = fetch($fd)) !== FALSE) {
  if ($line == $next) {
    print $line . "\n";
    $next = fetch($fm);
  }
  else
    print "\n"; // not found, so don't advance match.txt
}
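
The walk above relies on both files being in the same order. If that cannot be guaranteed, one alternative is to index the match lines by their prefix first and then walk the master list. The following is only a sketch under assumptions: the function name align_with_gaps and its array parameters are hypothetical, the key is taken as "everything up to and including the first =", and PHP 7 or later is assumed for the ?? operator.

```php
<?php
// Sketch: align match lines with the master list, leaving '' where a
// master line has no counterpart. Names here are hypothetical.
function align_with_gaps(array $masterLines, array $matchLines): array
{
    // Key = text up to and including the first "="; whole line if none.
    $key = function (string $line): string {
        $line = trim($line);
        $pos = strpos($line, '=');
        return $pos === false ? $line : substr($line, 0, $pos + 1);
    };

    // Index the match lines by key so their order does not matter.
    $byKey = [];
    foreach ($matchLines as $line) {
        $line = trim($line);
        if ($line !== '') {
            $byKey[$key($line)] = $line;
        }
    }

    // One output entry per master line: the match line, or '' for a gap.
    $out = [];
    foreach ($masterLines as $line) {
        $out[] = $byKey[$key($line)] ?? '';
    }
    return $out;
}

$master = ['http://www.domain1.com/?=1234', 'http://domain2.com/?=11243', 'http://domain3.biz/?=123543'];
$match  = ['http://www.domain3.biz/?=45656423', 'http://www.domain1.com/?=435456'];
print_r(align_with_gaps($master, $match));
```

With real files, the two arrays could come from file('master.txt', FILE_IGNORE_NEW_LINES) and file('match.txt', FILE_IGNORE_NEW_LINES), and the result could be written with file_put_contents('output.txt', implode("\n", $result)).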
 

Expert Comment

by:belcalan
ID: 12223488
Hi playstat,

Try this code:
<?php

$file1 = 'file1.txt';
$file2 = 'file2.txt';
$output_file = 'output.txt';

if (file_exists($file1) && file_exists($file2)) {
    $handle = fopen($file1, "rb");
    $f1_cont = '';
    while (!feof($handle)) {
      $f1_cont .= fread($handle, 8192);
    }
    fclose($handle);

    $handle = fopen($file2, "rb");
    $f2_cont = '';
    while (!feof($handle)) {
      $f2_cont .= fread($handle, 8192);
    }
    fclose($handle);

    $f1_array = preg_split('/\n/', $f1_cont);
    $f2_array = preg_split('/\n/', $f2_cont);
   
    if (count($f1_array) > count($f2_array)) {
        $max = count($f1_array);
    }
    else {
        $max = count($f2_array);
    }
   
    $fho = fopen($output_file, 'w+');
    $output = ''; // initialise before appending with .=
   
    for ($x = 0;$x < $max; $x++) {
        $length = strpos($f1_array[$x], '?');
        if ($length == 0) {
            $length = strlen($f1_array[$x]);
        }
        $string1 = substr($f1_array[$x], 0, $length);

       
        if (check_array_2($string1)) {
            $output .= $string1."\n";
            print $string1."\n";            
        }
        else {
            $output .= "\n";            
        }
    }
    fwrite($fho, $output);
    fclose($fho);
}

function check_array_2($string) {
    global $f2_array;
   
    foreach ($f2_array as $row) {
        $row = rtrim($row);
        $length = strpos($row, '?');
        if ($length == 0) {
            $length = strlen($row);
        }
        $string2 = substr($row, 0, $length);                
       
        if ($string2 == $string) {
            return true;
        }
    }
    return false;
}
?>
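
A side note on the code above: it cuts each line at the "?" rather than at the "=", and it tests the result of strpos() with == 0. strpos() returns false when the character is not found and 0 when it is found at the very start of the string, and a loose comparison cannot tell those apart. A minimal sketch of the strict check, using a hypothetical helper name cut_before and assuming PHP 7 or later:

```php
<?php
// Hypothetical helper: returns the text before the first occurrence of
// $marker, or the whole line when $marker does not occur at all.
function cut_before(string $line, string $marker): string
{
    $pos = strpos($line, $marker);
    // Strict comparison: false means "not found", 0 means "found at start".
    return $pos === false ? $line : substr($line, 0, $pos);
}

echo cut_before('http://www.domain1.com/?=1234', '?'), "\n"; // http://www.domain1.com/
echo cut_before('no-marker-here', '?'), "\n";                // no-marker-here
echo cut_before('?starts-with-marker', '?'), "\n";           // empty line
```

For the URLs in this thread the loose check happens to give the same result, since "?" never appears at position 0, so this is about robustness rather than the cause of the reported output.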
 

Author Comment

by:playstat
ID: 12243436
belcalan, it does not work; all I get is a blank output file.

regards
 

Author Comment

by:playstat
ID: 12243457
Nope, my bad. All it does is chop off everything on and after the = sign.

Plus, the output file is not line by line.
 

Author Comment

by:playstat
ID: 12243516
Plus, there are no gaps in the output for the matches that are not found.
 

Author Comment

by:playstat
ID: 12243582
Tell you what, scratch the gaps. Just insert the original master URLs in the output file if you can, please.
 
LVL 27

Expert Comment

by:Diablo84
ID: 12243990
Hi playstat,

I thought I would have a quick look at your question now, and I think this renders the output you are looking for; I tested it with the sample data you provided. You may need to configure the first four variables (hopefully they are self-explanatory).


<?php
$path = $_SERVER['DOCUMENT_ROOT']."/";
$master_file = $path."master.txt";
$match_file = $path."match.txt";
$output_file = $path."output.txt";

$master_array = explode("\n",file_get_contents($master_file));
$match_array = explode("\n",file_get_contents($match_file));

foreach($match_array as $match_item) {
 $parse = parse_url($match_item);
 $parse = $parse['scheme']."://".$parse['host']."/";
 foreach ($master_array as $master_item) if (strstr($master_item,$parse)) $output[] = $match_item;
}

$output = implode("\n",$output);

$handle = fopen($output_file,"w") or die("Cannot open file");
fwrite($handle, $output) or die("Cannot write to file");
fclose($handle);

echo "Analysis Complete!";
?>


If there are any problems or the output isn't quite right, post back and I'll check in the morning.

Best Wishes

|)iablo
 
LVL 27

Expert Comment

by:Diablo84
ID: 12244015
Sorry, I forgot about the gaps. A little modification and...

<?php
$path = $_SERVER['DOCUMENT_ROOT']."/Playstat/";
$master_file = $path."master.txt";
$match_file = $path."match.txt";
$output_file = $path."output.txt";

$master_array = explode("\n",file_get_contents($master_file));
$match_array = explode("\n",file_get_contents($match_file));

foreach($match_array as $match_item) {
 $parse = parse_url($match_item);
 $parse = $parse['scheme']."://".$parse['host']."/";
 $check = false;
 foreach ($master_array as $master_item) {
  if (strstr($master_item,$parse)) {
   $output[] = $match_item;
   $check = true;
  }
 }
 if ($check == false) $output[] = "\n";
}

$output = implode("\n",$output);

$handle = fopen($output_file,"w") or die("Cannot open file");
fwrite($handle, $output) or die("Cannot write to file");
fclose($handle);

echo "Analysis Complete!";
?>
 

Author Comment

by:playstat
ID: 12244297
Diablo, can you change it so that, if a URL in the match file is not an exact match (up to the = sign) for one in the master file, the master URL is inserted into the output file instead of a gap?

Master file

http://www.domain1.com/?=1234
http://domain2.com/?=11243
http://domain3.biz/?=123543
http://www.domain4.com/?=1234
http://www.domain5.com/?=11243
http://www.domain6.net/?=123543

Match file

http://www.domain1.com/?=435456
http://www.domain3.biz/?=45656423    <-------- notice the www: this is not an exact match to the master, so ignore it in the output too
http://www.domain4.com/?=3453454
http://www.domain6.net/?=3453453

Output file should look like this

http://www.domain1.com/?=435456       <-- matched to master successfully
http://domain2.com/?=11243            <-- master file URL, as there was no exact match in the match file
http://domain3.biz/?=123543           <-- master file URL
http://www.domain4.com/?=3453454      <-- matched to master successfully
http://www.domain5.com/?=11243        <-- master file URL
http://www.domain6.net/?=3453453      <-- matched to master successfully


regards

 
LVL 27

Accepted Solution

by:
Diablo84 earned 500 total points
ID: 12247007
Try this:


<?php
$path = $_SERVER['DOCUMENT_ROOT']."/Playstat/";
$master_file = $path."master.txt";
$match_file = $path."match.txt";
$output_file = $path."output.txt";

$master_array = explode("\n",file_get_contents($master_file));
$match_array = explode("\n",file_get_contents($match_file));

foreach ($master_array as $master_item) {
 $parse = parse_url($master_item);
 $parse = $parse['scheme']."://".$parse['host']."/";
 $check = false;
 foreach($match_array as $match_item) {
  if ($master_item == $match_item) {
   $output[] = $master_item;
   $check = true;
  }
  elseif (strstr($match_item,$parse)) {
   $output[] = $match_item;
   $check = true;
  }
 }
 if ($check == false) $output[] = $master_item;
}

$output = implode("\n",$output);

$handle = fopen($output_file,"w") or die("Cannot open file");
fwrite($handle, $output) or die("Cannot write to file");
fclose($handle);

echo "Analysis Complete!";
?>
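
The accepted code compares the scheme and host from parse_url() using a substring test, and it scans the whole match list for every master line. A variation that follows the original wording more literally (exact equality of everything up to the = sign) and uses a lookup table might look like the sketch below. It is not Diablo84's method: the function name merge_with_fallback and its array parameters are hypothetical, and PHP 7 or later is assumed.

```php
<?php
// Sketch: one output line per master line. Use the match line when its
// prefix (up to and including "=") equals the master's prefix; otherwise
// keep the master line. Names here are hypothetical.
function merge_with_fallback(array $masterLines, array $matchLines): array
{
    $key = function (string $line): string {
        $pos = strpos($line, '=');
        return $pos === false ? $line : substr($line, 0, $pos + 1);
    };

    // Build the lookup table from the match file.
    $byKey = [];
    foreach ($matchLines as $line) {
        $line = trim($line);
        if ($line !== '') {
            $byKey[$key($line)] = $line;
        }
    }

    $out = [];
    foreach ($masterLines as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // skip blank lines, e.g. from a trailing newline
        }
        $out[] = $byKey[$key($line)] ?? $line;
    }
    return $out;
}

$master = [
    'http://www.domain1.com/?=1234', 'http://domain2.com/?=11243',
    'http://domain3.biz/?=123543', 'http://www.domain4.com/?=1234',
    'http://www.domain5.com/?=11243', 'http://www.domain6.net/?=123543',
];
$match = [
    'http://www.domain1.com/?=435456', 'http://www.domain3.biz/?=45656423',
    'http://www.domain4.com/?=3453454', 'http://www.domain6.net/?=3453453',
];
echo implode("\n", merge_with_fallback($master, $match)), "\n";
```

Run against the sample data from the question, this prints the six lines requested above: the match URLs for domain1, domain4 and domain6, and the master URLs for domain2, domain3 and domain5. Trimming each line also copes with Windows line endings, which explode("\n", ...) would leave attached.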