Solved

check 2 files and alter url output

Posted on 2004-12-01
Last Modified: 2008-03-17
Hi there,

Here is a fast version of the script; please use it and modify it.

Would it be possible to correct a URL if the second text file has a wrong URL, and insert the corrected URL into column D of the $output_file CSV file?
Then write a second CSV file, mastercorrected.csv, with two columns: master,corrected

master

http://www.domain1.com/master?=345
http://domain2.com/master?=afg
http://domain3.com/correct?=abc
http://domain4.com/?=rwe
http://domain5.com=234
http://www.domain6.com=342

Match file: right domain, wrong URL format, but the right ID at the end.

http://www.domain1.com/wrongtype?=234
http://domain3.com/correct?=365
http://domain4.com/oopps?=856
http://domain2.com/master?=465

CSV output, comma-separated:

Master file (column A with the code above)          Corrected URL (add to column E of $output_file)

http://www.domain1.com/master?=345                  http://www.domain1.com/master?=234
http://domain2.com/master?=afg                      http://domain2.com/master?=465
http://domain3.com/correct?=abc                     http://domain3.com/correct?=365
http://domain4.com/?=rwe                            http://www.domain4.com/?=856
http://domain5.com=234                              http://domain5.com=234
http://www.domain6.com=342                          http://www.domain6.com=342



<?php

$master_file = "list.txt";
$match_file = "match.txt";
$output_file = "file.csv";
$csvfile = "mastercorrected.csv"; // this file has 2 columns separated by a comma: master text file first, then match file

$master_array = preg_split('/[\\r\\n]+/', trim(file_get_contents($master_file)));
$match_array = preg_split('/[\\r\\n]+/', trim(file_get_contents($match_file)));
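$second_array = array();
$non_match = array(); // master URLs that have no match; initialised so implode() below always gets an array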

// apply trim() to remove blank chars
foreach ($master_array as $key=>$master_item) $master_array[$key] = trim($master_item);
foreach ($match_array as $key=>$match_item) $match_array[$key] = trim($match_item);

// analyse
foreach($master_array as $key=>$master_item) {
 $check = false;
 $master_parse = explode('=', $master_item);
 unset($second_item);
 foreach ($match_array as $match_item) {
  $match_parse = explode('=', $match_item);
  if ($master_parse[0] == $match_parse[0]) {
   $second_item = $match_item;
   $check = true;
   break;
  }
 }
 $second_array[$key] = isset($second_item) ? $second_item : $master_item;
 if ($check == false)
      $non_match[$key] = $master_item;
}

$output = array();
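foreach($master_array as $key=>$master_item) {
 // column C stays empty when the master URL had a match
 $unmatched = isset($non_match[$key]) ? $non_match[$key] : '';
 $output[] = "{$master_item},{$second_array[$key]},{$unmatched},corrected after =";
}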
//above are columns A B C ---- D

$output = implode("\n",$output);

$handle = fopen($output_file,"wb") or die("Cannot open file");
fwrite($handle, $output) or die("Cannot write to file");
fclose($handle);

$non_match = implode("\n",$non_match);
$handle = fopen("nomatches.txt","w+") or die("Cannot open file");
fwrite($handle, $non_match) or die("Cannot write to file");
fclose($handle);



echo "Analysis Complete!";
?>



thanks
Question by:playstat
9 Comments
 
LVL 25

Expert Comment

by:Marcus Bointon
ID: 12719978
It's really not very clear what you're trying to do, but it looks like a job for mod_rewrite anyway?
 

Author Comment

by:playstat
ID: 12841192
Compare a master file of URLs, line by line, with another text file of URLs.

First step: match the domain.
Second step: keep the master's part between / and =.
Then take the part after = from the second file and insert the result into column E of the output file.

If there is no match, insert the master URL in its place.
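
For readers following the steps above, here is a minimal sketch of how a URL could be split into the three parts they mention. The function name `splitUrl` is hypothetical and does not appear in the thread, and the sketch assumes PHP 7.1 or later and that the ID is whatever follows the last "=" in the URL.

```php
<?php
// Hypothetical helper (not from the thread): split a URL into
// domain, the part up to and including "=", and the ID after "=".
function splitUrl(string $url): ?array
{
    $url = trim($url);
    $pos = strrpos($url, '=');            // position of the last "="
    if ($pos === false) {
        return null;                      // no ID part, nothing to split
    }
    $prefix = substr($url, 0, $pos + 1);  // everything up to and including "="
    $id     = substr($url, $pos + 1);     // everything after "="

    // Domain: scheme plus host, i.e. the text before the first "/", "?" or "=".
    if (!preg_match('~^(https?://[^/?=]+)~i', $prefix, $m)) {
        return null;                      // not an http(s) URL
    }

    return [
        'domain' => $m[1],
        'path'   => substr($prefix, strlen($m[1])), // e.g. "/master?=" or just "="
        'id'     => $id,
    ];
}
```

Calling `splitUrl('http://www.domain1.com/master?=345')` would give the domain `http://www.domain1.com`, the path `/master?=` and the ID `345`, so two URLs can be compared on `domain` alone.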

regards

 
LVL 6

Expert Comment

by:aolXFT
ID: 12844167
I'm confused too.

You want to synchronise one file with another?

Please try to explain further what you are trying to do.

$master_file, and $match_file are what exactly?

Author Comment

by:playstat
ID: 12847767
The master file and the second file are compared to match the domain.

If they are identical up to the =, add to column D (not E, that was my mistake).

If the domains match but the part after the domain and up to the = differs,

use the master as the URL and insert the ID from the second file.

Master:

www.domain1.com/masterfile/thisformat=234

The second file will have the exact same domain name but a different ID pointer:

www.domain1.com/otherfile/wrongformat=456

Then output the corrected URL into column D:

www.domain1.com/masterfile/thisformat=456

As you can see, it is based on the master URL, but with the ID from the second URL inserted, and it goes into column D.
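
The rule described above can be sketched in a few lines of PHP. This is only an illustration under stated assumptions: the function name `correctUrl` is hypothetical, the two URLs are assumed to have already been matched on domain, and the ID is taken to be everything after the last "=".

```php
<?php
// Hypothetical helper (not from the thread): keep the master URL up to and
// including its last "=", then append the ID found after the last "=" of
// the second URL. Assumes the caller has already matched the domains.
function correctUrl(string $master, string $second): string
{
    $masterPos = strrpos($master, '=');
    $secondPos = strrpos($second, '=');

    if ($masterPos === false || $secondPos === false) {
        return $master; // nothing to correct, fall back to the master URL
    }

    return substr($master, 0, $masterPos + 1) . substr($second, $secondPos + 1);
}
```

With the two example URLs from this comment, `correctUrl('www.domain1.com/masterfile/thisformat=234', 'www.domain1.com/otherfile/wrongformat=456')` returns `www.domain1.com/masterfile/thisformat=456`.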

regards
 

Author Comment

by:playstat
ID: 12847785
The variables are the two files: the master file, with one URL per line, and the second text file, also with one URL per line.

master text file

http://www.domain1.com/master?=345
http://domain2.com/master?=afg
http://domain3.com/correct?=abc
http://domain4.com/?=rwe
http://domain5.com=234
http://www.domain6.com=342

second text file

http://www.domain1.com/wrongtype?=234
http://domain3.com/correct?=365
http://domain4.com/oopps?=856
http://domain2.com/master?=465

With the right code, the corrected URL should be inserted into column D:

Master file (column A with the code above)          Corrected URL (add to column D of $output_file)

http://www.domain1.com/master?=345                  http://www.domain1.com/master?=234
http://domain2.com/master?=afg                      http://domain2.com/master?=465
http://domain3.com/correct?=abc                     http://domain3.com/correct?=365
http://domain4.com/?=rwe                            http://www.domain4.com/?=856
http://domain5.com=234                              http://domain5.com=234
http://www.domain6.com=342                          http://www.domain6.com=342

 
LVL 25

Expert Comment

by:Marcus Bointon
ID: 12849749
It's still not clear what you're doing - what are columns? Why are 'wrongtype', 'oopps' etc needed at all? Aside from that, it all looks like straightforward URL rewriting, and as such is definitely a job for mod_rewrite - it will be much faster than using PHP. This would go in an apache config file, or a .htaccess file:

RewriteEngine On
RewriteRule www.domain1.com/master\?=345 www.domain1.com/master\?=234 [PT]
RewriteRule domain2.com/master\?=afg domain2.com/master\?=465 [PT]
RewriteRule domain3.com/correct\?=abc domain3.com/correct\?=365 [PT]
RewriteRule domain4.com/\?=rwe domain4.com/\?=856 [PT]

the [PT] at the end of each line makes it look like the URLs are correct, but returns the new URL result. You can force them to redirect (so the changed URL is visible to the browser) by using [R] instead.
 

Author Comment

by:playstat
ID: 12851724
I need it in a text file.

Can you please do it in PHP?

regards
 
LVL 9

Accepted Solution

by:
keteracel earned 2000 total points
ID: 12901337
ok... try this.... I wasn't entirely sure as to what you wanted so this is my best guess:

<?php

function extractDetailsFromURLs($URLarray) {
 $items = array();
 $i = 0;
 
 foreach($URLarray as $item) {
  if (!preg_match("/(https?:\\/\\/.+?)((\\/.*?|)\\??=)(.+)/i", trim($item), $match)) continue;

  $items[$i]["domain"]  = $match[1];
  $items[$i]["rest"]    = $match[2];
  $items[$i]["value"]   = $match[4];
  $items[$i]["matched"] = false;
  $items[$i++]["all"]   = $match[0];
 }
 return $items;
}

$master_file = "list.txt";
$match_file = "match.txt";
$output_file = "file.csv";
$csvfile = "mastercorrected.csv"; // this file has 2 columns separated by a comma: master text file first, then match file

$master_array = file($master_file);
$match_array  = file($match_file);

$master_items = extractDetailsFromURLs($master_array);
$match_items  = extractDetailsFromURLs($match_array);

$output = array();
$i = 0;

foreach($master_items as $item) {
 foreach($match_items as $item2) {
  // same domain: keep the master URL up to "=" and append the ID from the match file
  if ($item2["domain"] == $item["domain"]) {
    $item["matched"] = true;
    $output[$i++] = "{$item["all"]},,,{$item["domain"]}{$item["rest"]}{$item2["value"]}";
  }
 }
 
 // no match: repeat the master URL in column D
 if (!$item["matched"]) $output[$i++] = "{$item["all"]},,,{$item["all"]}";
}

$output = implode("\n",$output);

$handle = fopen($output_file,"wb") or die("Cannot open file");
fwrite($handle, $output) or die("Cannot write to file");
fclose($handle);

header("Status: 302 Found");
header("Location: $output_file");
?>
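
The script above assigns `$csvfile` but never writes to it. Below is a minimal sketch of how the two-column master,corrected file asked for in the question could be written. The function name `writePairsCsv` and the shape of `$pairs` (a list of two-element arrays) are assumptions for illustration, not part of the accepted answer, and the sketch assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper (not from the thread): write "master,corrected" rows
// to a CSV file. $pairs is assumed to be a list of [master, corrected] arrays.
function writePairsCsv(string $path, array $pairs): void
{
    $handle = fopen($path, 'wb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    // Header row, then one row per pair. fputcsv() adds quoting only
    // when a field needs it (for example, if it contains a comma).
    fputcsv($handle, ['master', 'corrected'], ',', '"', '\\');
    foreach ($pairs as [$master, $corrected]) {
        fputcsv($handle, [$master, $corrected], ',', '"', '\\');
    }

    fclose($handle);
}
```

One way to use it would be to collect a `[$item["all"], $corrected]` pair at each point where the script appends to `$output`, then call `writePairsCsv($csvfile, $pairs)` after the loop.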
 
LVL 9

Expert Comment

by:keteracel
ID: 13028908
Hey playstat,

I've answered two of your questions and you haven't closed them yet...

keteracel