JP_TechGroup asked:

Illegal Characters in HTML File Path

I have inherited an application which displays images and documents added by users from a PC-based front end.
The developer did not put any error checking into the file add procedure, and I am finding many files added with spaces and # characters in their names. I need a way to clean up these bad file names in real time using PHP so that the files can be displayed in HTML.
Thank you in advance for your help.
Ray Paseur:

Please post some examples of the file names.  Without that, all we can do is guess about what might work well for you.

As a general rule, ASCII letters, numbers and underscores are safe in file names on any file system, so the strategy would be to replace non-compliant characters with something from this set.
Brian Tao:
The PHP function rawurlencode() is what you need. The official documentation is here: http://php.net/manual/en/function.rawurlencode.php
Use this PHP code to rename all the files in a batch!
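<?php

$dir = '/path/to/images';
if ($handle = opendir($dir)) {
    while (false !== ($fileName = readdir($handle))) {
        // Replace anything other than letters, digits, dots and underscores
        $newName = preg_replace('/[^a-z0-9._]/i', '_', $fileName);
        // $dir has no trailing slash, so add the separator explicitly
        $oldPath = $dir . '/' . $fileName;
        $newPath = $dir . '/' . $newName;
        // Rename regular files only; skip unchanged names and never overwrite
        if (is_file($oldPath) && $newName !== $fileName && !file_exists($newPath)) {
            rename($oldPath, $newPath);
        }
    }
    closedir($handle);
}

?>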

JP_TechGroup (ASKER):

Here is a perfect example:
http://somerootdirectory.comdocuments/40049/PO%20# 4126058646.pdf

Apparently some user named hundreds of documents with spaces and hashtags.
Renaming the files themselves, plus the document paths, is not an option, unfortunately.
rawurlencode won't touch the hashtag, unfortunately :/
Sorry, should have mentioned I tried that and htmlentities() as well.
The output from PHP RawUrlEncode() looks pretty ugly here.  We might need some additional information.
<?php // demo/temp_jp_tech.php

/**
 * http://www.experts-exchange.com/questions/28695005/Illegal-Characters-in-HTML-File-Path.html#a40868707
 * http://www.faqs.org/rfcs/rfc3986.html
 * http://php.net/manual/en/function.rawurlencode.php
 */
error_reporting(E_ALL);
echo '<pre>';


// TEST DATA FROM THE POST AT EE
$url = <<<EOD
http://somerootdirectory.comdocuments/40049/PO%20# 4126058646.pdf
EOD;


// SHOW THE ORIGINAL AND THE WORK PRODUCT
echo PHP_EOL . $url;
echo PHP_EOL . rawUrlEncode($url);
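
As a hedged illustration of why that output looks ugly: rawurlencode() applied to a whole URL also encodes the ":" and "/" delimiters, and it re-encodes the "%" of the existing "%20". The sketch below is not from the thread; it only runs rawurlencode() on the thread's test URL and shows the result.

```php
<?php
// The test URL from the post, unchanged.
$url = 'http://somerootdirectory.comdocuments/40049/PO%20# 4126058646.pdf';

// Encoding the whole string touches every reserved character:
// ":" -> %3A, "/" -> %2F, "#" -> %23, " " -> %20, and "%20" -> "%2520".
$whole = rawurlencode($url);
echo $whole, PHP_EOL;
// http%3A%2F%2Fsomerootdirectory.comdocuments%2F40049%2FPO%2520%23%204126058646.pdf
```

The result is no longer usable as an href, which suggests rawurlencode() is meant for individual URL components rather than complete URLs.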

Let's try to understand the desired use case.  You've said we cannot rename the files, so my question would be "why not?"  What prevents the obvious best solution from being done?

What would you want to do with something like this:

http://somerootdirectory.comdocuments/40049/PO%20# 4126058646.pdf

Would you want to use it in a URL, or send it to a client in an email link?  Are you concerned that com and documents are run together?  They look like they might need a DIRECTORY_SEPARATOR character to be useful.

Please tell us a little more about this, and maybe we can help you find some useful way out of the morass!
You said: " I need a way to clean up these bad files in real time using php..."  Please give a detailed explanation of what you want to do in realtime.
I do not have access to the client front end. Moreover, I have been informed that the developer has no intention of putting any file name validation into their code. As such, file names with hashtags and spaces will continue to occur, and it is somehow my problem to make the files with these bad names open from the website. My desired outcome is to take an href as above and get it to a state whereby it can be opened in a web browser.
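
One possible way to reach that state without renaming anything, sketched under assumptions: encode each path segment of the href separately and leave the scheme, host and slashes alone. The helper name encode_href_path() is hypothetical and not from the thread. The sketch assumes every href is an absolute URL pointing at a plain file (no query string, no intentional fragment), and that file names do not contain a literal "%" followed by two hex digits, since each segment is decoded first to avoid double-encoding an existing "%20".

```php
<?php
// Hypothetical helper: percent-encode each path segment of an href while
// leaving the "scheme://host" prefix and the "/" separators untouched.
function encode_href_path(string $href): string
{
    // Split "scheme://host" from the rest. Everything after the host is
    // treated as a file path, so a literal "#" counts as part of the name.
    if (!preg_match('~^([a-z][a-z0-9+.-]*://[^/]*)(/.*)?$~i', $href, $m)) {
        return $href; // not an absolute URL; leave it untouched
    }

    $segments = explode('/', $m[2] ?? '');
    foreach ($segments as $i => $segment) {
        // Decode first so an existing "%20" is not turned into "%2520".
        $segments[$i] = rawurlencode(rawurldecode($segment));
    }

    return $m[1] . implode('/', $segments);
}

$href = 'http://somerootdirectory.comdocuments/40049/PO%20# 4126058646.pdf';
echo encode_href_path($href), PHP_EOL;
// http://somerootdirectory.comdocuments/40049/PO%20%23%204126058646.pdf
```

With "#" written as "%23", the browser requests the full file name instead of stopping at the "#". The output would typically still be passed through htmlspecialchars() before being written into an HTML attribute.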
How do you now get the name of the file?  Show us the PHP code that you use but that is not working to get the files.
"... the developer has no intention of putting any file name validation into their code"
The developer should be fired for cause and the application should be refactored.

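Hashtags have a legitimate place in the URL scheme: the # character introduces the fragment identifier, which browsers use to jump to a marker within an HTML document. Everything after an unencoded # is therefore treated as the fragment, not as part of the file path.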
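
A small sketch of how this affects the href from this thread, using PHP's parse_url() (the variable names are only for illustration): with the "#" left unencoded, the file name is cut off at the "#" and the remainder ends up in the fragment.

```php
<?php
// The thread's example href, with the "#" left unencoded.
$href = 'http://somerootdirectory.comdocuments/40049/PO%20# 4126058646.pdf';

$parts = parse_url($href);
echo 'path:     ', $parts['path'], PHP_EOL;     // the part before the "#"
echo 'fragment: ', $parts['fragment'], PHP_EOL; // the part after the "#"

// With "#" written as "%23", the whole file name stays in the path.
$encoded = parse_url('http://somerootdirectory.comdocuments/40049/PO%20%23%204126058646.pdf');
echo 'path:     ', $encoded['path'], PHP_EOL;
var_dump(isset($encoded['fragment'])); // bool(false)
```

A browser splits the URL the same way, so the server is only asked for the part before the "#".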

The correct answer here is to rename the files.  If you are not allowed to do that, we need to understand why.
skij (ASKER CERTIFIED SOLUTION):

This solution is only available to members of Experts Exchange.

That was what I was looking for. Thank you!