Check 2 files and alter URL output

Hi there,

Here is a fast version of the script. Please use it as the base and modify it.

Would it be possible to correct a URL if the second text file has the wrong URL, and insert the corrected URL into column D of the $output_file CSV file?
Then write a second CSV file, mastercorrected.csv, with two columns: master,corrected

master

http://www.domain1.com/master?=345
http://domain2.com/master?=afg
http://domain3.com/correct?=abc
http://domain4.com/?=rwe
http://domain5.com=234
http://www.domain6.com=342

Match file: right domain, wrong URL format, but the right ID at the end.

http://www.domain1.com/wrongtype?=234
http://domain3.com/correct?=365
http://domain4.com/oopps?=856
http://domain2.com/master?=465

CSV output, comma-separated:

Master file (column A with the code above)          Corrected URL (add to column E of $output_file)

http://www.domain1.com/master?=345                  http://www.domain1.com/master?=234
http://domain2.com/master?=afg                      http://domain2.com/master?=465
http://domain3.com/correct?=abc                     http://domain3.com/correct?=365
http://domain4.com/?=rwe                            http://domain4.com/?=856
http://domain5.com=234                              http://domain5.com=234
http://www.domain6.com=342                          http://www.domain6.com=342



<?php

$master_file = "list.txt";
$match_file = "match.txt";
$output_file = "file.csv";
$csvfile = "mastercorrected.csv"; // this file has 2 columns separated by a comma: master text file first, then match file

$master_array = preg_split('/[\\r\\n]+/', trim(file_get_contents($master_file)));
$match_array = preg_split('/[\\r\\n]+/', trim(file_get_contents($match_file)));
$second_array = array();
$non_match = array(); // master URLs with no match in the second file

// apply trim() to remove blank chars
foreach ($master_array as $key=>$master_item) $master_array[$key] = trim($master_item);
foreach ($match_array as $key=>$match_item) $match_array[$key] = trim($match_item);

// analyse
foreach($master_array as $key=>$master_item) {
 $check = false;
 $master_parse = explode('=', $master_item);
 unset($second_item);
 foreach ($match_array as $match_item) {
  $match_parse = explode('=', $match_item);
  if ($master_parse[0] == $match_parse[0]) {
   $second_item = $match_item;
   $check = true;
   break;
  }
 }
 $second_array[$key] = isset($second_item) ? $second_item : $master_item;
 if ($check == false)
      $non_match[$key] = $master_item;
}

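$output = array();
foreach($master_array as $key=>$master_item) {
 // column C stays empty when the master URL had a match
 $unmatched = isset($non_match[$key]) ? $non_match[$key] : '';
 $output[] = "{$master_item},{$second_array[$key]},{$unmatched},corrected after =";
}
// the four fields above are columns A, B, C and D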

$output = implode("\n",$output);

$handle = fopen($output_file,"wb") or die("Cannot open file");
fwrite($handle, $output) or die("Cannot write to file");
fclose($handle);

$non_match = implode("\n",$non_match);
$handle = fopen("nomatches.txt","w+") or die("Cannot open file");
fwrite($handle, $non_match) or die("Cannot write to file");
fclose($handle);



echo "Analysis Complete!";
?>



thanks
Asked by: playstat

Marcus Bointon commented:
It's really not very clear what you're trying to do, but it looks like a job for mod_rewrite anyway?
playstat (author) commented:
I want to compare a master file of URLs, line by line, with another text file of URLs.

First step: match the domain.
Second step: keep the part between / and = from the master URL.
Then take what comes after = in the second file and insert it into column E of the output file.

If there is no match, insert the master URL in its place.

regards

aolXFT commented:
I'm confused too.

You want to synchronise one file with another?

Please try to explain further what you are trying to do.

What exactly are $master_file and $match_file?
playstat (author) commented:
The master file and the second file are compared to match the domain.

If they are identical up to the =, add the URL to column D (it was E, my mistake).

If the domains match each other but the part after the domain and up to the = is different,

use the master as the URL and insert the ID from the second file.

master

www.domain1.com/masterfile/thisformat=234

The second file will have the exact same domain name but a different ID pointer:

www.domain1.com/otherfile/wrongformat=456

then output the corrected url into column D ---

www.domain1.com/masterfile/thisformat=456 <------------

as you can see it is then based on the master url but the id from the second url is inserted in column D
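
The rule described above can be sketched as a small PHP function. This is only a sketch under assumptions: the name correctUrl and its regular expression are made up for illustration and do not come from the script in the question, the ID is taken to be everything after the first =, and domains are compared as exact strings.

```php
<?php
// Hypothetical helper, not from the thread: rebuild a URL from the master's
// domain and path plus the ID that follows "=" in the second file's URL.
function correctUrl($master, $second)
{
    // Capture 1: everything up to and including the first "=".
    // Capture 2: the domain (with the scheme, if there is one).
    // Capture 3: the ID after the "=".
    $pattern = '~^(((?:https?://)?[^/?=]+)[^=]*=)(.+)$~i';

    if (!preg_match($pattern, trim($master), $m) || !preg_match($pattern, trim($second), $s)) {
        return trim($master); // one of the URLs has no "=ID" part
    }
    if ($m[2] !== $s[2]) {
        return trim($master); // different domains: keep the master URL
    }
    return $m[1] . $s[3]; // master URL up to "=", then the second file's ID
}

echo correctUrl(
    'www.domain1.com/masterfile/thisformat=234',
    'www.domain1.com/otherfile/wrongformat=456'
), "\n";
```

With the two example URLs this prints www.domain1.com/masterfile/thisformat=456. When the domains differ, the master URL comes back unchanged.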

regards
playstat (author) commented:
The variables are the 2 files: one is the master file with one URL per line, the other is the second text file, also with one URL per line.

master text file

http://www.domain1.com/master?=345
http://domain2.com/master?=afg
http://domain3.com/correct?=abc
http://domain4.com/?=rwe
http://domain5.com=234
http://www.domain6.com=342

second text file

http://www.domain1.com/wrongtype?=234
http://domain3.com/correct?=365
http://domain4.com/oopps?=856
http://domain2.com/master?=465

With the right code, the following should be inserted into column D:

Master file (column A with the code above)          Corrected URL (add to column D of $output_file)

http://www.domain1.com/master?=345                  http://www.domain1.com/master?=234
http://domain2.com/master?=afg                      http://domain2.com/master?=465
http://domain3.com/correct?=abc                     http://domain3.com/correct?=365
http://domain4.com/?=rwe                            http://domain4.com/?=856
http://domain5.com=234                              http://domain5.com=234
http://www.domain6.com=342                          http://www.domain6.com=342
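
For reference, a minimal sketch that builds the corrected column from the sample data. It assumes that the ID is whatever follows the first =, that domains must match exactly (so www.domain1.com and domain1.com count as different), and that the first matching line in the second file wins. The helper name splitUrl and the $idByDomain lookup array are hypothetical and not part of the original script. Each corrected URL keeps the master's domain exactly, so the domain4 row comes out as http://domain4.com/?=856.

```php
<?php
// Sample data copied from the thread; the real script reads these lines
// from list.txt and match.txt.
$master = [
    'http://www.domain1.com/master?=345',
    'http://domain2.com/master?=afg',
    'http://domain3.com/correct?=abc',
    'http://domain4.com/?=rwe',
    'http://domain5.com=234',
    'http://www.domain6.com=342',
];
$second = [
    'http://www.domain1.com/wrongtype?=234',
    'http://domain3.com/correct?=365',
    'http://domain4.com/oopps?=856',
    'http://domain2.com/master?=465',
];

// Hypothetical helper: split a URL into its domain, everything up to and
// including the first "=", and the ID after it. Returns null if the line
// does not look like "http(s)://domain...=ID".
function splitUrl($url)
{
    if (!preg_match('~^((https?://[^/?=]+)[^=]*=)(.+)$~i', trim($url), $m)) {
        return null;
    }
    return ['domain' => $m[2], 'prefix' => $m[1], 'id' => $m[3]];
}

// Index the second file by domain once, instead of rescanning it for
// every master line. The first line seen for a domain wins (an assumption).
$idByDomain = [];
foreach ($second as $line) {
    $parts = splitUrl($line);
    if ($parts !== null && !isset($idByDomain[$parts['domain']])) {
        $idByDomain[$parts['domain']] = $parts['id'];
    }
}

// Build the corrected column: master prefix plus the ID from the second
// file, or the untouched master URL when no domain matches.
$corrected = [];
foreach ($master as $line) {
    $parts = splitUrl($line);
    if ($parts !== null && isset($idByDomain[$parts['domain']])) {
        $corrected[] = $parts['prefix'] . $idByDomain[$parts['domain']];
    } else {
        $corrected[] = trim($line);
    }
}

print_r($corrected);
```

The lookup array trades a little memory for a single pass over each file, which may matter if the lists are long.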

Marcus Bointon commented:
It's still not clear what you're doing - what are columns? Why are 'wrongtype', 'oopps' etc needed at all? Aside from that, it all looks like straightforward URL rewriting, and as such is definitely a job for mod_rewrite - it will be much faster than using PHP. This would go in an apache config file, or a .htaccess file:

RewriteEngine On
RewriteRule www.domain1.com/master\?=345 www.domain1.com/master\?=234 [PT]
RewriteRule domain2.com/master\?=afg domain2.com/master\?=465 [PT]
RewriteRule domain3.com/correct\?=abc domain3.com/correct\?=365 [PT]
RewriteRule domain4.com/\?=rwe domain4.com/\?=856 [PT]

The [PT] flag at the end of each line keeps the rewrite internal: the browser still sees the original URL but gets the result for the new one. You can force a redirect instead (so the changed URL is visible to the browser) by using [R].
playstat (author) commented:
I need it in a text file.

Can you please do it in PHP?

regards
keteracel commented:
OK, try this. I wasn't entirely sure what you wanted, so this is my best guess:

<?php

function extractDetailsFromURLs($URLarray) {
 $items = array();
 $i = 0;
 
 foreach($URLarray as $item) {
  if (!preg_match("/(https?:\\/\\/.+?)((\\/.*?|)\\??=)(.+)/i", trim($item), $match)) continue;

  $items[$i]["domain"]  = $match[1];
  $items[$i]["rest"]    = $match[2];
  $items[$i]["value"]   = $match[4];
  $items[$i]["matched"] = false;
  $items[$i++]["all"]   = $match[0];
 }
 return $items;
}

$master_file = "list.txt";
$match_file = "match.txt";
$output_file = "file.csv";
$csvfile = "mastercorrected.csv"; // this file has 2 columns separated by a comma: master text file first, then match file

$master_array = file($master_file);
$match_array  = file($match_file);

$master_items = extractDetailsFromURLs($master_array);
$match_items  = extractDetailsFromURLs($match_array);

$output = array();
$i = 0;

foreach($master_items as $item) {
 foreach($match_items as $item2) {
  // same domain: keep the master URL up to "=" and take the ID from the match file
  if ($item2["domain"] == $item["domain"]) {
    $item["matched"] = true;
    $output[$i++] = "{$item["all"]},,,{$item["domain"]}{$item["rest"]}{$item2["value"]}";
  }
 }
 
 // no match: repeat the master URL in column D
 if (!$item["matched"]) $output[$i++] = "{$item["all"]},,,{$item["all"]}";
}

$output = implode("\n",$output);

$handle = fopen($output_file,"wb") or die("Cannot open file");
fwrite($handle, $output) or die("Cannot write to file");
fclose($handle);

header("Status: 302 Found");
header("Location: $output_file");
?>
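
The script above defines $csvfile but never writes to it. A minimal sketch of the second file (master,corrected) using fputcsv could look like this. The function name writeMasterCorrected and the shape of $pairs (a list of [master, corrected] arrays) are assumptions for illustration, not part of the script above.

```php
<?php
// Hypothetical helper: write master/corrected pairs to a two-column CSV.
// $pairs is assumed to be a list of [masterUrl, correctedUrl] arrays.
function writeMasterCorrected($path, array $pairs)
{
    $handle = fopen($path, 'wb');
    if ($handle === false) {
        return false; // could not open the file for writing
    }

    // Header row, then one row per URL pair. fputcsv adds quoting where a
    // field needs it, so URLs containing commas stay in one column.
    fputcsv($handle, ['master', 'corrected'], ',', '"', '\\');
    foreach ($pairs as $pair) {
        fputcsv($handle, [$pair[0], $pair[1]], ',', '"', '\\');
    }

    return fclose($handle);
}
```

In the script above this would be called as writeMasterCorrected($csvfile, $pairs), with $pairs collected inside the main loop next to $output.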
keteracel commented:
hey playstat ,

I've answered two of your questions and you haven't closed them yet...

keteracel
Question has a verified solution.
