Solved

Match 2 text files and output with gaps, line by line

Posted on 2004-10-04
Last Modified: 2006-11-17
Hi there,

I have 2 text files: one is the master URL list with, say, 500 URLs, and the second file has some number X of URLs.

Is there a way to compare the 2 and then output the results to a third file, in line with the master file?

For example:

Master file

http://www.domain1.com/?=1234
http://domain2.com/?=11243
http://domain3.biz/?=123543
http://www.domain4.com/?=1234
http://www.domain5.com/?=11243
http://www.domain6.net/?=123543

Match file

http://www.domain1.com/?=435456
http://www.domain3.biz/?=45656423    <-------- notice the www: it is not an exact match to the master, so ignore it on output too
http://www.domain4.com/?=3453454
http://www.domain6.net/?=3453453

The output file should look like this:

http://www.domain1.com/?=435456
                                                      <------ because the match file has no exact match to the master URL up to the =, a gap is left instead

http://www.domain4.com/?=3453454

http://www.domain6.net/?=3453453

But you will notice that the match file has different endings, so match only up to the = sign and ignore the rest.

If you can do this, you're tip top :0)

regards
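A minimal sketch of the behaviour asked for above, written in PHP like the rest of the thread. It assumes both lists are already loaded into arrays (one URL per element) and uses everything before the first = as the comparison key; the function names `url_key` and `align_with_gaps` are hypothetical, not taken from any poster. Building a lookup table from the match list means each master line is checked once, instead of rescanning the whole match list for every master URL.

```php
<?php
// Hypothetical helper: everything before the first "=" (the whole line if there is none).
function url_key($url) {
    $pos = strpos($url, '=');
    return $pos === false ? $url : substr($url, 0, $pos);
}

// One output line per master line: the match-file URL with the same key,
// or an empty string (a gap) when there is none.
function align_with_gaps(array $master, array $match) {
    $lookup = array();
    foreach ($match as $m) {
        $key = url_key($m);
        if (!isset($lookup[$key])) { // keep the first match-file URL per key
            $lookup[$key] = $m;
        }
    }
    $out = array();
    foreach ($master as $line) {
        $key = url_key($line);
        $out[] = isset($lookup[$key]) ? $lookup[$key] : '';
    }
    return $out;
}

// Sample data from the question.
$master = array(
    'http://www.domain1.com/?=1234',
    'http://domain2.com/?=11243',
    'http://domain3.biz/?=123543',
    'http://www.domain4.com/?=1234',
    'http://www.domain5.com/?=11243',
    'http://www.domain6.net/?=123543',
);
$match = array(
    'http://www.domain1.com/?=435456',
    'http://www.domain3.biz/?=45656423',
    'http://www.domain4.com/?=3453454',
    'http://www.domain6.net/?=3453453',
);

// Prints six lines; lines 2, 3 and 5 are blank.
echo implode("\n", align_with_gaps($master, $match)), "\n";
```

To write the result to a file instead of the screen, the same `implode(...)` string can be passed to `file_put_contents()`.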
Question by:playstat

Expert Comment

by:ThG
ID: 12223400
I hope they are in some sort of order. Anyway, the following isn't tested and the output is NOT exactly what you want, but you get the idea of the matching process.

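<?php
$fd = fopen("master.txt", "r");
$fm = fopen("match.txt", "r"); // read mode: "w" would truncate match.txt

// Return the next line with its line ending and everything from "=" on removed,
// or FALSE at end of file.
function fetch($f) {
  $tmp = fgets($f);
  if ($tmp === FALSE) return FALSE;
  $tmp = rtrim($tmp, "\r\n");
  $tmp = preg_replace('/=.*/', '', $tmp); // remove everything from "=" on
  return $tmp;
}

// Assumes both files list their URLs in the same relative order.
// Note: a match.txt line with no counterpart in master.txt is never advanced past,
// so any later matches are missed.
$next = fetch($fm); // next match
while (($line = fetch($fd)) !== FALSE) {
  if ($next !== FALSE && $line == $next) {
    print $line . "\n";
    $next = fetch($fm);
  }
  else
    print "\n"; // not found, so don't advance match.txt
}
fclose($fd);
fclose($fm);
?>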

Expert Comment

by:belcalan
ID: 12223488
Hi playstat,

Try this code:
<?php

$file1 = 'file1.txt';        // master list
$file2 = 'file2.txt';        // list to compare against
$output_file = 'output.txt';

// Return the part of a line before the first '?', or the whole line if there is none.
function url_prefix($line) {
    $length = strpos($line, '?');
    if ($length === false) {
        $length = strlen($line);
    }
    return substr($line, 0, $length);
}

// True if any line of $f2_array has the same prefix as $string.
function check_array_2($string, $f2_array) {
    foreach ($f2_array as $row) {
        if (url_prefix(rtrim($row)) == $string) {
            return true;
        }
    }
    return false;
}

if (file_exists($file1) && file_exists($file2)) {
    $f1_array = preg_split('/\n/', file_get_contents($file1));
    $f2_array = preg_split('/\n/', file_get_contents($file2));

    $output = '';   // initialise before appending

    // Walk the master file only, so each output line lines up with it.
    foreach ($f1_array as $line) {
        $string1 = url_prefix(rtrim($line));   // rtrim drops a Windows \r

        if ($string1 != '' && check_array_2($string1, $f2_array)) {
            $output .= $string1."\n";
            print $string1."\n";
        }
        else {
            $output .= "\n";
        }
    }

    $fho = fopen($output_file, 'w');
    fwrite($fho, $output);
    fclose($fho);
}
?>

Author Comment

by:playstat
ID: 12243436
belcalan, it does not work; all I get is a blank output file.

regards

Author Comment

by:playstat
ID: 12243457
Nope, my bad: all it does is chop off everything on and after the = sign.

Plus, the output file is not line by line.

Author Comment

by:playstat
ID: 12243516
Plus, there are no gaps in the output for matches that are not found.

Author Comment

by:playstat
ID: 12243582
Tell you what: scratch the gaps, and just insert the original master URLs in the output file instead, if you can, please.

Expert Comment

by:Diablo84
ID: 12243990
Hi playstat,

I thought I would have a quick look at your question now, and I think this renders the output you are looking for; I tested it with the sample data you provided. You may need to configure the first four variables (hopefully they are self-explanatory).


<?php
$path = $_SERVER['DOCUMENT_ROOT']."/";
$master_file = $path."master.txt";
$match_file = $path."match.txt";
$output_file = $path."output.txt";

// Split into lines; rtrim removes a trailing \r left by Windows line endings.
$master_array = array_map('rtrim', explode("\n",file_get_contents($master_file)));
$match_array = array_map('rtrim', explode("\n",file_get_contents($match_file)));

$output = array();
foreach($match_array as $match_item) {
 if ($match_item == "") continue; // skip blank lines (e.g. a trailing newline)
 // Compare on "scheme://host/" only, e.g. "http://www.domain1.com/"
 $parse = parse_url($match_item);
 $parse = $parse['scheme']."://".$parse['host']."/";
 foreach ($master_array as $master_item) if (strstr($master_item,$parse)) $output[] = $match_item;
}

$output = implode("\n",$output);

$handle = fopen($output_file,"w") or die("Cannot open file");
fwrite($handle, $output) or die("Cannot write to file");
fclose($handle);

echo "Analysis Complete!";
?>


If there are any problems or the output isn't quite right, post back and I'll check in the morning.

Best Wishes

|)iablo
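Diabloʼs script builds its comparison key from the scheme and host returned by PHP's built-in `parse_url()`, so everything after the host (the path and the `?=...` query) is ignored, while www.domain3.biz and domain3.biz count as different hosts. The sketch below shows that key on the sample URLs; `host_key` is a hypothetical helper name, and returning FALSE for lines without a scheme or host (such as a blank trailing line) is an added assumption, not part of the original script.

```php
<?php
// Hypothetical helper: "scheme://host/" for a URL, or FALSE if either part is missing.
function host_key($url) {
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false; // e.g. an empty line left by a trailing newline
    }
    return $parts['scheme'] . "://" . $parts['host'] . "/";
}

$parts = parse_url('http://www.domain1.com/?=435456');
print_r($parts); // shows the scheme, host, path and query components

echo host_key('http://www.domain3.biz/?=45656423'), "\n"; // http://www.domain3.biz/
echo host_key('http://domain3.biz/?=123543'), "\n";       // http://domain3.biz/
var_dump(host_key(''));                                   // bool(false)
```

Because the two domain3 keys differ, the www entry in the match file is never paired with the master's domain3 line, which is the behaviour the question asks for.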

Expert Comment

by:Diablo84
ID: 12244015
Sorry, I forgot about the gaps. A little modification and....

<?php
$path = $_SERVER['DOCUMENT_ROOT']."/Playstat/";
$master_file = $path."master.txt";
$match_file = $path."match.txt";
$output_file = $path."output.txt";

// Split into lines; rtrim removes a trailing \r left by Windows line endings.
$master_array = array_map('rtrim', explode("\n",file_get_contents($master_file)));
$match_array = array_map('rtrim', explode("\n",file_get_contents($match_file)));

$output = array();
foreach($match_array as $match_item) {
 if ($match_item == "") continue; // skip blank lines (e.g. a trailing newline)
 // Compare on "scheme://host/" only
 $parse = parse_url($match_item);
 $parse = $parse['scheme']."://".$parse['host']."/";
 $check = false;
 foreach ($master_array as $master_item) {
  if (strstr($master_item,$parse)) {
   $output[] = $match_item;
   $check = true;
  }
 }
 if ($check == false) $output[] = ""; // empty element = one blank line after implode
}

$output = implode("\n",$output);

$handle = fopen($output_file,"w") or die("Cannot open file");
fwrite($handle, $output) or die("Cannot write to file");
fclose($handle);

echo "Analysis Complete!";
?>
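When output lines are collected in an array and joined with `implode("\n", ...)`, as in the script above, a gap is best stored as an empty string: the join already supplies the line breaks, so an element that itself contains `"\n"` produces two blank lines instead of one. A quick check in PHP:

```php
<?php
// One empty element between "a" and "b": exactly one blank line.
$with_empty = implode("\n", array('a', '', 'b'));
// An element holding "\n" adds a second line break: two blank lines.
$with_newline = implode("\n", array('a', "\n", 'b'));

var_dump($with_empty === "a\n\nb");     // bool(true)
var_dump($with_newline === "a\n\n\nb"); // bool(true)
```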

Author Comment

by:playstat
ID: 12244297
Diablo, can you change it so that, when the URL in the match file is not an exact match to the master file up to the =, the master URL is inserted into the output file instead of a gap?

Master file

http://www.domain1.com/?=1234
http://domain2.com/?=11243
http://domain3.biz/?=123543
http://www.domain4.com/?=1234
http://www.domain5.com/?=11243
http://www.domain6.net/?=123543

Match file

http://www.domain1.com/?=435456
http://www.domain3.biz/?=45656423    <-------- notice the www: it is not an exact match to the master, so ignore it on output too
http://www.domain4.com/?=3453454
http://www.domain6.net/?=3453453

The output file should look like this:

http://www.domain1.com/?=435456       <-- matched to master successfully
http://domain2.com/?=11243            <-- master file URL, as there was no exact match in the match file
http://domain3.biz/?=123543           <-- master file URL
http://www.domain4.com/?=3453454      <-- matched to master successfully
http://www.domain5.com/?=11243        <-- master file URL
http://www.domain6.net/?=3453453      <-- matched to master successfully


regards
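A minimal sketch of the behaviour requested here, comparing only the text before the first = (as the original question asks) and emitting the master URL wherever the match file has no counterpart. The function names are hypothetical and the sample arrays are the ones from this post; www.domain3.biz and domain3.biz are deliberately treated as different keys.

```php
<?php
// Hypothetical helper: everything before the first "=" (the whole line if there is none).
function prefix_before_equals($url) {
    $pos = strpos($url, '=');
    return $pos === false ? $url : substr($url, 0, $pos);
}

// One output line per master line: the match-file URL with the same prefix,
// otherwise the master URL itself.
function align_with_master_fallback(array $master, array $match) {
    $lookup = array();
    foreach ($match as $m) {
        $key = prefix_before_equals($m);
        if (!isset($lookup[$key])) { // first match-file URL wins
            $lookup[$key] = $m;
        }
    }
    $out = array();
    foreach ($master as $line) {
        $key = prefix_before_equals($line);
        $out[] = isset($lookup[$key]) ? $lookup[$key] : $line;
    }
    return $out;
}

$master = array(
    'http://www.domain1.com/?=1234',
    'http://domain2.com/?=11243',
    'http://domain3.biz/?=123543',
    'http://www.domain4.com/?=1234',
    'http://www.domain5.com/?=11243',
    'http://www.domain6.net/?=123543',
);
$match = array(
    'http://www.domain1.com/?=435456',
    'http://www.domain3.biz/?=45656423',
    'http://www.domain4.com/?=3453454',
    'http://www.domain6.net/?=3453453',
);

// Prints the six URLs in the order listed in this post.
echo implode("\n", align_with_master_fallback($master, $match)), "\n";
```

Keying on the text before = (rather than on the host alone) also keeps two URLs on the same host but with different paths from being paired.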


Accepted Solution

by:Diablo84 (earned 500 total points)
ID: 12247007
Try this:


<?php
$path = $_SERVER['DOCUMENT_ROOT']."/Playstat/";
$master_file = $path."master.txt";
$match_file = $path."match.txt";
$output_file = $path."output.txt";

// Split into lines; rtrim removes a trailing \r left by Windows line endings.
$master_array = array_map('rtrim', explode("\n",file_get_contents($master_file)));
$match_array = array_map('rtrim', explode("\n",file_get_contents($match_file)));

$output = array();
foreach ($master_array as $master_item) {
 if ($master_item == "") continue; // skip blank lines (e.g. a trailing newline)
 // Compare on "scheme://host/" only
 $parse = parse_url($master_item);
 $parse = $parse['scheme']."://".$parse['host']."/";
 $check = false;
 foreach($match_array as $match_item) {
  if ($master_item == $match_item) {
   $output[] = $master_item;
   $check = true;
   break; // one output line per master line
  }
  elseif (strstr($match_item,$parse)) {
   $output[] = $match_item;
   $check = true;
   break;
  }
 }
 if ($check == false) $output[] = $master_item; // no match: keep the master URL
}

$output = implode("\n",$output);

$handle = fopen($output_file,"w") or die("Cannot open file");
fwrite($handle, $output) or die("Cannot write to file");
fclose($handle);

echo "Analysis Complete!";
?>
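Several scripts in this thread split files with `explode("\n", file_get_contents(...))`. If the text files were saved with Windows line endings, each element keeps a trailing `\r`, so otherwise identical URLs can fail to compare equal, and a final newline leaves an empty last element. A hedged sketch of a more tolerant reader follows; `read_url_lines` is a hypothetical name, and dropping blank lines is an assumption that only suits lists with one URL per line.

```php
<?php
// Hypothetical helper: read a file into an array of trimmed lines, accepting
// \r\n, \n or \r line endings and skipping blank lines.
function read_url_lines($path) {
    $text = file_get_contents($path);
    if ($text === false) {
        return false; // file missing or unreadable
    }
    $lines = preg_split('/\r\n|\n|\r/', $text);
    $lines = array_map('trim', $lines);
    // Drop empty lines, then renumber the keys from 0.
    return array_values(array_filter($lines, function ($line) {
        return $line !== '';
    }));
}

// Demo with a temporary file written using Windows line endings.
$tmp = tempnam(sys_get_temp_dir(), 'urls');
file_put_contents($tmp, "http://www.domain1.com/?=1234\r\nhttp://domain2.com/?=11243\r\n");
print_r(read_url_lines($tmp));
unlink($tmp);
```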