  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1583

Prevent internal server error 500 when URL contains some special characters like ##??

Hi,

When testing a site, I encountered an internal server error (500) when adding multiple # or ? characters to the URL.
The URL structure is:
example.com/pagename/pagetitle/pageid

The title does 'nothing'; you can change it to anything you like, except when adding multiple #'s or ?'s. Then an internal server error is encountered.

Also, the customized 500 error page is not shown; a default browser error page is shown instead.

1. Why do I get this error? This part of the URL, the pagetitle, is not used to load page data or the like.
2. How can I prevent this error and just serve a 404 page or something?

Thanks!
Asked by: peps03
 
arober11 Commented:
Hi,

If you're using Apache HTTPD, check the location of the ErrorDocument entries in httpd.conf and any included *.conf files.

You're after an entry along the following lines in your site's virtual host definition, or in the root httpd.conf if you're not using virtual hosts:

ErrorDocument 500 http://error.example.com/server_error.html
 
peps03 (Author) Commented:
Thanks for your reaction!

I don't have access to the httpd.conf file; the hosting company manages that.

I've set it up like this:
ErrorDocument 400 /example.com/errors/400.php
ErrorDocument 404 /example.com/errors/404.php
ErrorDocument 500 /example.com/errors/500.php

This works for the 404 error, but not for 400 and 500 errors when characters like ### or ?? are in the URL.

Do I have to add any other rules to the .htaccess file to prevent these 400 and 500 errors?
 
Tony McCreath (Technical SEO Consultant) Commented:
It may be your website code that is not handling the URLs correctly. Do you use a CMS or a custom made website?
 
peps03 (Author) Commented:
It's a custom-made website.

URL structure:
example.com/pagename/pagetitle/pageid

It should not matter what is entered in the pagetitle, but if I manually enter ## or ?? I get a 500 error. Nothing should happen instead, as the id is the identifier.

Also, the error page says:
Additionally, a 500 Internal Server Error error was encountered while trying to use an ErrorDocument to handle the request.

This is strange, as the 404 error document does work.
 
Tony McCreath (Technical SEO Consultant) Commented:
So your custom code that interprets the URL crashes if ## or ?? are present. You need to debug it.

Note that # and ? have special meaning in URLs and should not appear unencoded in your path. For example, a browser will not send a # or any characters after it.
 
peps03 (Author) Commented:
Thanks for your reply.
I understand.

But this:
http://edition.cnn.com/2012/08/15/opinion/chairez-flag-#####???waving/index.html?hpt=us_c2

just gives a 404. I get a 500 server error, and not the custom 500 error page either.
 
Tony McCreath (Technical SEO Consultant) Commented:
With regard to the page that actually gets requested, that example is just this:

http://edition.cnn.com/2012/08/15/opinion/chairez-flag-

As I said, the # and anything after is removed before the request is made.

So the problem is not the # but the fact it is requesting a missing page, and your website is causing a 500 error instead of a 404 error in those cases.
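
To check this split yourself, here is a small Python sketch (an illustration added for this thread, not something the posters ran). It uses the standard library's urllib.parse.urlsplit to show which part of the CNN example is the fragment; the helper name server_visible_part is made up for the example.

```python
from urllib.parse import urlsplit

def server_visible_part(url: str) -> str:
    """Return the path (plus query string, if any) that ends up in the HTTP request."""
    parts = urlsplit(url)
    # The fragment (everything after the first '#') stays in the browser.
    return parts.path + ("?" + parts.query if parts.query else "")

url = "http://edition.cnn.com/2012/08/15/opinion/chairez-flag-#####???waving/index.html?hpt=us_c2"
print(server_visible_part(url))   # the part the server receives
print(urlsplit(url).fragment)     # the part the browser keeps to itself
```

Everything after the first # lands in the fragment, including the later ? characters, so the server never gets a query string for this URL.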
 
peps03 (Author) Commented:
Aaaah okay, I get it.

"and your website is causing a 500 error instead of a 404 error in those cases"

That is my second problem, because the 404s do work in other scenarios.
 
whosbetterthanme Commented:
Tiggerito is correct. The # symbol in a URL (unencoded) is interpreted as an anchor, while the ? (unencoded) marks the start of the query string: the part before it names the resource (often a script) and the part after it carries the arguments.

So if you have an anchor on your page that's called "here", the code in your page might look like:
<a name="here"></a>

Then a URL that has # will load the page and jump to that anchor:

www.domain.com/page.php#here

So, if you have a # symbol, the browser requests the page without it and then looks for the matching anchor tag itself.

The ? symbol separates the script from its arguments:

www.domain.com/myscript.php?some_other_stuff

The server is going to look for a script called myscript.php and pass it some_other_stuff.

It's possible that the web server is configured to do something other than these standard operations.
 
peps03 (Author) Commented:
Okay, I have been testing a lot lately, and I'm a few steps further.

URL structure:
example.com/pagename/pagetitle/pageid

If a # or ? is inserted into the pagename part of the URL, the behaviour is as expected: a 404 error.

If a # or ? is inserted into the pagetitle part of the URL, the behaviour is still not as expected: a 500 error. Thanks to your explanation above I now understand why the URL gets cut off,
but still not why I receive a 500 instead of a 404, as only the pageid is not found, not the entire page.

My rewrite rule:
RewriteRule ^([^/]*)/([^/]*)/([^/]*)$ ./$1.php?$1=$3 [L,QSA]

How can I make the server ignore everything in the second ([^/]*)/ of the URL?

An inserted # or ? would 'block' the third part of the URL, the last ([^/]*).
 
Tony McCreath (Technical SEO Consultant) Commented:
The # is not sent in a request, so it has no influence on the rule.

The ? and the parameters after it are separated out and are also not seen by the RewriteRule pattern. They need to be matched with a RewriteCond on the query string, using a sequence like this:

RewriteCond %{QUERY_STRING}  ^a=b$ [NC]
RewriteRule ^test$ /new? [R=301,NE,NC,L]

You still want to find out why your PHP code is erroring for those URLs. Debug time.
 
peps03 (Author) Commented:
I believe my PHP code is not erroring; I think Apache is.

And it's not about requests being sent; those can also be prevented.

It's about manually adding ## or ?? somewhere in the URL in the browser.

Normally this would trigger a 404, not a 400 or 500.

So I think it's Apache, right? How should I debug this? I don't use any rules other than:

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([^/]*)/([^/]*)/([^/]*)$ ./$1.php?$1=$3 [L,QSA]

and

RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.php -f
RewriteRule ^(.*)$ $1.php

Thanks.
 
Tony McCreath (Technical SEO Consultant) Commented:
The # never gets sent to the server, so consider that the problem is the part of the URL before it. Test it by removing the # and the text after it, and you should see the same error.

I also think the ? issue is, again, that it breaks up the base part of the URL and causes an invalid path.

Note that the query part of the URL (the ? and everything after it) is not matched by the RewriteRule pattern. The default behaviour is to just place it back onto the resulting URL after the rule is processed.

So the type of URL that is causing your problems is:

example.com/pagename/page#title/pageid

In this case the server will only see this

example.com/pagename/page

Which would match your second rule and cause

pagename/page.php

or

example.com/pagename/page?title/pageid

The Rules use this:

pagename/page

which again matches the last rule. This time the querystring is added back, resulting in this final URL of

pagename/page.php?title/pageid


Note that in both cases the resulting URL used looks bad and refers to a missing file. In a normal case your first rule would be hit and result in:

./pagename.php?pagename=pageid

Which I presume exists and will work.
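
The walk-through above can be sketched in Python. This is only a rough model and not how mod_rewrite is implemented: the function name rewrite is made up for illustration, the RewriteCond file checks are ignored, and each URL goes through the two rules just once.

```python
import re

# Pattern of the first rule in the thread: exactly three path segments.
THREE_SEGMENTS = re.compile(r"^([^/]*)/([^/]*)/([^/]*)$")

def rewrite(browser_url: str) -> str:
    """Model one pass of the two RewriteRules over what the server receives."""
    url = browser_url.split("#", 1)[0]      # the fragment is never sent
    path, _, query = url.partition("?")     # the rule pattern only sees the path
    m = THREE_SEGMENTS.match(path)
    if m:
        # First rule: ./$1.php?$1=$3, with QSA appending any original query string.
        target = f"./{m.group(1)}.php?{m.group(1)}={m.group(3)}"
        return target + ("&" + query if query else "")
    # Second rule: ^(.*)$ matches any path; the original query string is put back.
    return path + ".php" + ("?" + query if query else "")

print(rewrite("pagename/pagetitle/pageid"))
print(rewrite("pagename/page#title/pageid"))
print(rewrite("pagename/page?title/pageid"))
```

The paths are written without a leading slash because that is how a rule in .htaccess sees them.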

A possible cause of the issue is that your second rule is not marked as last (L). This means the resulting URL is reprocessed (I think). This could cause an infinite loop of rewrites and your 500 error. Try changing the last line to this and see if things work better:

RewriteRule ^(.*)$ $1.php [L]

Or just remove the rule to see if things change.
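
To illustrate how such a loop can end in a 500, here is a hedged Python sketch. It assumes the rule's condition keeps succeeding on every pass, which may or may not be what happens on this particular server. The limit of 10 is Apache's documented default for LimitInternalRecursion; the function name and the cond callback are made up for the example. Keep in mind that in a per-directory (.htaccess) context the rule set is run again after a rewrite, so the loop depends mostly on whether the conditions still match.

```python
from typing import Callable, Optional

LIMIT_INTERNAL_RECURSION = 10  # Apache's documented default

def rewrite_until_stable(path: str, cond: Callable[[str], bool]) -> Optional[str]:
    """Re-apply 'RewriteRule ^(.*)$ $1.php' for as long as its condition holds.

    Returns the final path, or None when the redirect limit is exceeded
    (the situation in which Apache answers with a 500).
    """
    redirects = 0
    while cond(path):
        if redirects == LIMIT_INTERNAL_RECURSION:
            return None
        path += ".php"
        redirects += 1
    return path

# Assumption: the condition succeeds on every pass, so the rewriting never settles.
print(rewrite_until_stable("pagename/page", lambda p: True))
# Assumption: the condition fails once the path already ends in .php.
print(rewrite_until_stable("pagename/page", lambda p: not p.endswith(".php")))
```

In the first call the path would grow to pagename/page.php.php.php... until the limit is hit; in the second it settles after one rewrite.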

The rules you are using and the PHP file structure don't look robust to me. As in the examples above, the rewrites could end up invoking random PHP files in random folders.

Most systems invoke a single static PHP file (index.php) and pass all the parameters to it in the query string. Its PHP code then works out how to handle things.
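
The single-entry-point idea can be sketched as follows. It is written in Python only to keep the logic short; on this site it would live in index.php. The PAGES table, the id 42, and the handle function are hypothetical placeholders, not anything from the poster's code.

```python
# Hypothetical page table; a real site would look pages up in its database.
PAGES = {("pagename", "42"): "Contents of page 42"}

def handle(path: str):
    """Route every request through one function, the way a single index.php would."""
    segments = [s for s in path.strip("/").split("/") if s]
    if len(segments) != 3:
        return 404, "Not Found"          # truncated or malformed URL
    name, _title, page_id = segments     # the title is decorative and ignored
    body = PAGES.get((name, page_id))
    if body is None:
        return 404, "Not Found"          # unknown page or id
    return 200, body

print(handle("/pagename/any-title-you-like/42"))
print(handle("/pagename/page"))
```

With this design a URL cut short by a # or ? simply falls through to the 404 branch instead of being rewritten to some other file.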
 
peps03 (Author) Commented:
Thanks, that cleared things up!

I changed the order:

RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.php -f
RewriteRule ^(.*)$ $1.php

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([^/]*)/([^/]*)/([^/]*)$ ./$1.php?$1=$3 [L,QSA]

And now ## and ?? don't cause 500 errors anymore!
Now I get the expected 404! Great.

Only the 400 error remains. I found out this is caused by a % in the URL (manually added).

I believe this should also result in a 404...

Or is this normal? I see even Wikipedia returns a 400:
http://en.wikipedia.org/wiki/C%NN

CNN manages to return a custom 400:
http://edition.cnn.com/2012/08/23/world/europ%e/uk-honor-murder-zara/index.html?hpt=hp_c2
 
Tony McCreath (Technical SEO Consultant) Commented:
% is a special character that introduces a URL-encoded sequence and must be followed by exactly two hexadecimal digits. I suspect you are causing a URL that cannot be decoded and the server itself is throwing the error.

If you change your CNN example so that the encoding is valid, it shows a different error page:

http://edition.cnn.com/2012/08/23/world/europ%ee/uk-honor-murder-zara/index.html?hpt=hp_c2

That little change caused it to return a 404 instead of a 400.
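
The rule behind this is easy to check with a short Python sketch (an illustration only; the function name is made up). It flags any % that is not followed by two hexadecimal digits, which is the syntax RFC 3986 requires for percent-encoding.

```python
import re

# A '%' that is NOT followed by two hexadecimal digits is malformed.
BAD_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")

def has_valid_percent_encoding(path: str) -> bool:
    """True when every '%' in the path starts a well-formed %XX sequence."""
    return BAD_PERCENT.search(path) is None

print(has_valid_percent_encoding("/wiki/C%NN"))
print(has_valid_percent_encoding("/2012/08/23/world/europ%e/uk-honor-murder-zara/index.html"))
print(has_valid_percent_encoding("/2012/08/23/world/europ%ee/uk-honor-murder-zara/index.html"))
```

Only the last path passes the check, which matches the 400 versus 404 behaviour seen in the thread.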
 
peps03 (Author) Commented:
Okay, I see.

"I suspect you are causing a URL that cannot be decoded and the server itself is throwing the error."

Yes, I just added the % somewhere.

Why do I get to see a standard 400 error page, just like in the Wikipedia example above (with invalid encoding)?

Why doesn't it show my custom 400 error page, the way it now shows my custom 404 error page?
 
Tony McCreath (Technical SEO Consultant) Commented:
Not sure there. Maybe that should be another question, so you can attract experts in that area.
 
peps03 (Author) Commented:
Many thanks for all the help, Tiggerito!

I'll open a new question for the 400 error issue!

Thanks.