Illegal Characters in HTML File Path

I have inherited an application that displays images and documents added by users from a PC-based front end.
The developer did not put any validation into the file-add procedure, and I am finding many files added with spaces and # characters in their names. I need a way to clean up these bad file names in real time using PHP so that the files can be displayed in HTML.
Thank you in advance for your help.
JP_TechGroup Asked:
Ray Paseur Commented:
Please post some examples of the file names.  Without that, all we can do is guess about what might work well for you.

As a general rule, ASCII letters, numbers and underscores are safe in file names on any file system, so the strategy would be to replace non-compliant characters with something from this set.
Brian Tao (Senior Business Solutions Consultant) Commented:
The PHP function rawurlencode() is what you need. The official documentation is here: http://php.net/manual/en/function.rawurlencode.php
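
As a minimal sketch of that suggestion, the snippet below applies rawurlencode() to a single file name rather than to a whole URL. The file name is a hypothetical one modeled on the kind described in the question, and the output shown is what current PHP versions produce.

```php
<?php
// Hypothetical file name containing spaces and a "#".
$fileName = 'PO # 4126058646.pdf';

// rawurlencode() percent-encodes everything except letters, digits, "-", "_", "." and "~".
$encoded = rawurlencode($fileName);

echo $encoded, PHP_EOL; // PO%20%23%204126058646.pdf
```

Encoding only the file name (or each path segment) leaves the slashes and the scheme of the surrounding URL alone.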
skij Commented:
Use this PHP code to rename all the files in a batch!
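<?php

// Directory to clean up. The trailing slash matters because file names are appended directly.
$dir = '/path/to/images/';
if ($handle = opendir($dir)) {
    while (false !== ($fileName = readdir($handle))) {
        // Replace anything that is not a letter, digit, dot or underscore with "_".
        $newName = preg_replace('/[^a-z0-9._]/i', '_', $fileName);
        // Rename only regular files whose name changes, and never overwrite an existing file.
        if ($newName !== $fileName && is_file($dir.$fileName) && !file_exists($dir.$newName)) {
            rename($dir.$fileName, $dir.$newName);
        }
    }
    closedir($handle);
}

?>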
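
One thing a batch rename like this has to watch for is that two different bad names can collapse to the same clean name (for example "a b.pdf" and "a#b.pdf"). The following is only a sketch of one way to plan the renames first; cleanName() and planRenames() are hypothetical helpers, not part of the original code, and the numeric-suffix scheme is an assumption.

```php
<?php
// Same character rule as the batch loop: anything outside letters, digits, "." and "_" becomes "_".
function cleanName(string $name): string
{
    return preg_replace('/[^a-z0-9._]/i', '_', $name);
}

// Hypothetical helper: build an old-name => new-name map without reusing a target name.
function planRenames(array $fileNames): array
{
    $used = [];
    $plan = [];
    // Names that are already clean keep their name, so reserve them first.
    foreach ($fileNames as $name) {
        if (cleanName($name) === $name) {
            $used[$name] = true;
        }
    }
    foreach ($fileNames as $name) {
        $clean = cleanName($name);
        if ($clean === $name) {
            continue; // nothing to rename
        }
        $info = pathinfo($clean);
        $ext  = isset($info['extension']) ? '.' . $info['extension'] : '';
        $candidate = $clean;
        // Append _1, _2, ... before the extension until the name is free.
        for ($n = 1; isset($used[$candidate]); $n++) {
            $candidate = $info['filename'] . '_' . $n . $ext;
        }
        $used[$candidate] = true;
        $plan[$name] = $candidate;
    }
    return $plan;
}

print_r(planRenames(['a b.pdf', 'a#b.pdf', 'a_b.pdf', 'ok.pdf']));
```

With the sample list, "a b.pdf" maps to "a_b_1.pdf" and "a#b.pdf" to "a_b_2.pdf", while the two names that are already clean are left out of the plan.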

JP_TechGroup (Author) Commented:
Here is a perfect example:
http://somerootdirectory.comdocuments/40049/PO%20# 4126058646.pdf

Apparently some user named hundreds of documents with spaces and hashtags.
Renaming the files themselves, along with the document paths, is not an option, unfortunately.
JP_TechGroup (Author) Commented:
rawurlencode won't touch the hashtag, unfortunately :/
Sorry, should have mentioned I tried that and htmlentities() as well.
Ray Paseur Commented:
The output from PHP RawUrlEncode() looks pretty ugly here.  We might need some additional information.
<?php // demo/temp_jp_tech.php

/**
 * http://www.experts-exchange.com/questions/28695005/Illegal-Characters-in-HTML-File-Path.html#a40868707
 * http://www.faqs.org/rfcs/rfc3986.html
 * http://php.net/manual/en/function.rawurlencode.php
 */
error_reporting(E_ALL);
echo '<pre>';


// TEST DATA FROM THE POST AT EE
$url = <<<EOD
http://somerootdirectory.comdocuments/40049/PO%20# 4126058646.pdf
EOD;


// SHOW THE ORIGINAL AND THE WORK PRODUCT
echo PHP_EOL . $url;
echo PHP_EOL . rawUrlEncode($url);

Ray Paseur Commented:
Let's try to understand the desired use case.  You've said we cannot rename the files, so my question would be "why not?"  What prevents the obvious best solution from being done?

What would you want to do with something like this:

http://somerootdirectory.comdocuments/40049/PO%20# 4126058646.pdf

Would you want to use it in a URL, or send it to a client in an email link?  Are you concerned that com and documents are run together?  They look like they might need a DIRECTORY_SEPARATOR character to be useful.

Please tell us a little more about this, and maybe we can help you find some useful way out of the morass!
skij Commented:
You said: "I need a way to clean up these bad files in real time using php..."  Please give a detailed explanation of what you want to do in real time.
JP_TechGroup (Author) Commented:
I do not have access to the client front end. Moreover, I have been informed that the developer has no intention of putting any file name validation into their code. As such, file names with hashtags and spaces will continue to occur, and it is somehow my problem to make the files with these bad names open from the website. My desired outcome is to take an href like the one above and get it into a state where it can be opened in a web browser.
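
One possible way to reach that outcome without renaming anything is sketched below. It assumes that everything after the host in the href is a file path (no real query string or fragment), and that a "%" in the stored href always starts an existing escape such as %20. encodeHrefPath() is a hypothetical helper name, example.com stands in for the real host, and the type hints assume PHP 7 or later.

```php
<?php
/**
 * Hypothetical helper: re-encode the path part of an href so a browser can open it.
 * Assumes everything after the host is a file path (no real query string or fragment).
 */
function encodeHrefPath(string $href): string
{
    // Split off "scheme://host" if present; the rest is treated as the path.
    if (preg_match('~^([a-z][a-z0-9+.-]*://[^/]*)(/.*)?$~i', $href, $m)) {
        $prefix = $m[1];
        $path   = $m[2] ?? '';
    } else {
        $prefix = '';
        $path   = $href;
    }

    // Decode first so existing escapes such as %20 are not double-encoded,
    // then encode each segment separately and rejoin with "/".
    $segments = array_map(
        function ($segment) {
            return rawurlencode(rawurldecode($segment));
        },
        explode('/', $path)
    );

    return $prefix . implode('/', $segments);
}

echo encodeHrefPath('http://example.com/documents/40049/PO%20# 4126058646.pdf'), PHP_EOL;
// http://example.com/documents/40049/PO%20%23%204126058646.pdf
```

Encoding segment by segment keeps the slashes intact, which is the part that goes wrong when rawurlencode() is applied to the whole URL at once.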
skij Commented:
How do you currently get the name of the file?  Show us the PHP code that you are using to get the files, even though it is not working.
Ray Paseur Commented:
"... the developer has no intention of putting any file name validation into their code"
The developer should be fired for cause and the application should be refactored.

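The # character has a legitimate place in URLs: it introduces the fragment identifier, which is used to point at a marker inside an HTML document. A browser therefore treats everything after an unencoded # as a fragment and does not send it to the server as part of the path.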
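
To illustrate, PHP's own parse_url() splits a URL the same way. The URL below is a hypothetical example with an unencoded "#" in the file name.

```php
<?php
// Hypothetical URL with an unencoded "#" in the file name.
$parts = parse_url('http://example.com/documents/PO#4126058646.pdf');

// Everything after the first "#" is treated as the fragment, not as part of the path.
echo $parts['path'], PHP_EOL;     // /documents/PO
echo $parts['fragment'], PHP_EOL; // 4126058646.pdf
```

The server would be asked for /documents/PO, which is not the file the link was meant to open.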

The correct answer here is to rename the files.  If you are not allowed to do that, we need to understand why.
skij Commented:
You said that rawurlencode won't touch the hashtag.

Actually, newer versions of PHP have fixed this problem, but you are correct that older versions of PHP do not properly encode the hashtag.

Try using this:
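<?php
// Encode a relative path for use in an href while keeping the path separators.
function ModifiedUrlEncode($url){
    // rawurlencode() turns "/" into %2F and "\" into %5C; map both back to "/".
    // The "#" entry only matters if rawurlencode() left a "#" unencoded.
    return str_ireplace(array('#','%5c','%2f'),array('%23','/','/'),rawurlencode($url));
}

echo ModifiedUrlEncode('path/file #2!.png');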

JP_TechGroup (Author) Commented:
That was what I was looking for. Thank you!