Spaces and Characters getting escaped in search url

Drupal 7.  

My search was working fine until I went live this morning with my site, which is in a subdirectory.  My gut feeling is something in the htaccess file did this, but I dunno.

If I search for "cessna", everything works fine.  But if I search for "cessna airplane", for example, the search string returned is weird: "cessna%2520airplane".  Where is that '52' coming from?

Anyway, the site is live; you can see what I mean here:

michaelgiaimo Author Commented:
So it's also not just the search module, which tells me it must be the htaccess.  Watch how it rewrites this link:

A couple guesses:

%20 is a url-encoded space. %25 is the url-encoded percent sign (the escape character itself), so the space is being escaped twice for reasons unknown. Maybe try running cron.php a couple of times on your new site if it needs to be indexed. I think there's a setting somewhere to clear out your index. If you clear it and run cron a few times, your site will be re-indexed, and that could help.

Best regards,
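To see where the '52' comes from, here is a minimal Python sketch. Python's urllib.parse is only a stand-in for whatever actually encodes the URL on the site; nothing here is Drupal code. Encoding a string that is already encoded turns its % into %25, so a single space ends up as %2520:

```python
from urllib.parse import quote

# The search keys as typed by the user.
keys = "cessna airplane"

# First encoding: the space becomes %20.
once = quote(keys)
print(once)   # cessna%20airplane

# Encoding the already-encoded string again: the '%' itself becomes %25,
# so '%20' turns into '%2520'. The '25' is the encoded percent sign.
twice = quote(once)
print(twice)  # cessna%2520airplane
```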

Ray Paseur Commented:
I wonder if there is any confusion between urlencode() and rawurlencode() and their corresponding decode functions.

When I searched for a b, it took me to the advanced search page and pre-loaded the search box with a%20b.  With urlencode() you would expect a+b.

Got similar results for a b c -- a%20b%20c
Tried with two blanks a  b and got a%20%20b
Tried cessna airplane and got cessna%20airplane
Tried bombardier commercial aircraft and got bombardier%20commercial%20aircraft.  Also got no search results, which does not make sense given the subject matter.
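Python's urllib.parse has close analogues of the two PHP functions for plain letters and spaces: quote() behaves like rawurlencode() (space becomes %20) and quote_plus() like urlencode() (space becomes +). This sketch assumes that analogy rather than calling PHP, and reproduces the searches above; the box contents match the rawurlencode() style, not urlencode():

```python
from urllib.parse import quote, quote_plus

# Inputs taken from the searches tried above.
samples = ["a b", "a b c", "a  b", "cessna airplane"]

# quote() is used here as a stand-in for PHP's rawurlencode(): spaces -> %20.
raw = {s: quote(s) for s in samples}
# quote_plus() is used here as a stand-in for PHP's urlencode(): spaces -> +.
form = {s: quote_plus(s) for s in samples}

for s in samples:
    print(f"{s!r:18} rawurlencode-style: {raw[s]:22} urlencode-style: {form[s]}")
```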

Is there any JavaScript that acts on the form input, beyond what we see in the view source?
michaelgiaimo Author Commented:
As far as I know, the only JavaScript involved sets the "Search this site" text and removes it when the box is clicked.

Now, take note of the URL; it's different from what appears in the search form input.  When I search for cessna airplane, the input form shows cessna%20airplane, but the search URL has cessna%2520airplane.  From what I can see, that's the killer: if you put cessna%20airplane in the URL, it works fine.
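A minimal sketch of why the double-encoded URL breaks the search, assuming the server decodes the URL exactly once before the keys reach the search module (an assumption; the thread does not show where decoding happens). Python's unquote() stands in for that decoding step:

```python
from urllib.parse import unquote

good_url_keys = "cessna%20airplane"    # what works when typed into the URL by hand
bad_url_keys = "cessna%2520airplane"   # what the search redirect produces

# One decoding pass, as the server would normally do (assumed).
good = unquote(good_url_keys)
bad = unquote(bad_url_keys)

print(good)  # cessna airplane    -> two separate words
print(bad)   # cessna%20airplane  -> the literal text "%20" is still in the keys

# The double-encoded value only becomes usable after a second pass.
print(unquote(bad))  # cessna airplane
```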
Ray Paseur Commented:
Go to this page:

Click the search in the main part of the page (not the header) a few times and watch what is happening to the URL.

The %20 (hex 20) is the blank character, rawurlencoded.  If it were just urlencoded it would be a plus sign.
The %25 character is the percent sign, urlencoded or rawurlencoded.  Both functions convert it the same way.

So it looks like the percent sign character is being converted into its %25 encoding, and this is used to replace the percent sign character each time the search form is submitted.  I am guessing that there is some kind of double encoding going on.  This happens to any string with a blank in it.

The original string is a b.  The first conversion makes this a%20b.  The next conversion begins to propagate the double-converted percent sign.
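The propagation described here can be simulated by re-encoding the value each time the form is submitted. This is a sketch of the suspected behavior, not Drupal's code; the hypothetical submit() simply re-applies Python's quote() to whatever is currently in the search box without decoding it first:

```python
from urllib.parse import quote


def submit(value: str) -> str:
    """Hypothetical stand-in for one form submission that encodes the
    current search-box value without first decoding it."""
    return quote(value)


value = "a b"
history = [value]
for _ in range(3):
    value = submit(value)
    history.append(value)

for step, v in enumerate(history):
    print(step, v)
```

Each pass adds another "25" because only the percent sign is re-encoded; the letters and the "20" are left alone. Encoding the value exactly once (or decoding it before encoding it again) would keep it stable at a%20b.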

This could be a coding error somewhere in the framework or modules.  But you said you suspect the .htaccess file.  Would you please post that here?  Thanks.
michaelgiaimo Author Commented:
Sure.  The only reason I suspect the htaccess is that this just started happening this morning, when we went live with the new site.  I modified the root htaccess to hide the drupal subdirectory; here's what's in the root:

Options -Indexes
Options +FollowSymLinks
RewriteEngine on

# stuff to let through (ignore)
RewriteCond %{REQUEST_URI} "/openx/" [OR]
RewriteCond %{REQUEST_URI} "/typo3/" [OR]
RewriteCond %{REQUEST_URI} "/oa/"
RewriteRule (.*) $1 [L]

# Redirect all users to the site without WWW
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule ^ http://%1%{REQUEST_URI} [L,R=301]

# Serve Drupal from sub directory in web root
RewriteRule ^$ drupal/index.php [L]
RewriteCond %{DOCUMENT_ROOT}/drupal%{REQUEST_URI} -f
RewriteRule .* drupal/$0 [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule .* drupal/index.php?q=$0 [QSA]


Then, here is the htaccess from the Drupal subdirectory:

# Apache/PHP/Drupal settings:

# Protect files and directories from prying eyes.
<FilesMatch "\.(engine|inc|info|install|make|module|profile|test|po|sh|.*sql|theme|tpl(\.php)?|xtmpl)$|^(\..*|Entries.*|Repository|Root|Tag|Template)$">
  Order allow,deny
</FilesMatch>

# Don't show directory listings for URLs which map to a directory.
Options -Indexes

# Follow symbolic links in this directory.
Options +FollowSymLinks

# Make Drupal handle any 404 errors.
ErrorDocument 404 /index.php

# Force simple error message for requests for non-existent favicon.ico.
<Files favicon.ico>
  # There is no end quote below, for compatibility with Apache 1.3.
  ErrorDocument 404 "The requested file favicon.ico was not found.
</Files>

# Set the default handler.
DirectoryIndex index.php index.html index.htm

# Override PHP settings that cannot be changed at runtime. See
# sites/default/default.settings.php and drupal_initialize_variables() in
# includes/ for settings that can be changed at runtime.

# PHP 5, Apache 1 and 2.
<IfModule mod_php5.c>
  php_flag magic_quotes_gpc                 off
  php_flag magic_quotes_sybase              off
  php_flag register_globals                 off
  php_flag session.auto_start               off
  php_value mbstring.http_input             pass
  php_value mbstring.http_output            pass
  php_flag mbstring.encoding_translation    off
</IfModule>

# Requires mod_expires to be enabled.
<IfModule mod_expires.c>
  # Enable expirations.
  ExpiresActive On

  # Cache all files for 2 weeks after access (A).
  ExpiresDefault A1209600

  <FilesMatch \.php$>
    # Do not allow PHP scripts to be cached unless they explicitly send cache
    # headers themselves. Otherwise all scripts would have to overwrite the
    # headers set by mod_expires if they want another caching behavior. This may
    # fail if an error occurs early in the bootstrap process, and it may cause
    # problems if a non-Drupal PHP file is installed in a subdirectory.
    ExpiresActive Off
  </FilesMatch>
</IfModule>

# Various rewrite rules.
<IfModule mod_rewrite.c>
  RewriteEngine on

  # Block access to "hidden" directories whose names begin with a period. This
  # includes directories used by version control systems such as Subversion or
  # Git to store control files. Files whose names begin with a period, as well
  # as the control files used by CVS, are protected by the FilesMatch directive
  # above.
  # NOTE: This only works when mod_rewrite is loaded. Without mod_rewrite, it is
  # not possible to block access to entire directories from .htaccess, because
  # <DirectoryMatch> is not allowed here.
  # If you do not have mod_rewrite installed, you should remove these
  # directories from your webroot or otherwise protect them from being
  # downloaded.
  RewriteRule "(^|/)\." - [F]

  # If your site can be accessed both with and without the 'www.' prefix, you
  # can use one of the following settings to redirect users to your preferred
  # URL, either WITH or WITHOUT the 'www.' prefix. Choose ONLY one option:
  # To redirect all users to access the site WITH the 'www.' prefix,
  # ( will be redirected to
  # uncomment the following:
  # RewriteCond %{HTTP_HOST} !^www\. [NC]
  # RewriteRule ^ http://www.%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
  # To redirect all users to access the site WITHOUT the 'www.' prefix,
  # ( will be redirected to
  # uncomment the following:
  # RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
  # RewriteRule ^ http://%1%{REQUEST_URI} [L,R=301]

  # Modify the RewriteBase if you are using Drupal in a subdirectory or in a
  # VirtualDocumentRoot and the rewrite rules are not working properly.
  # For example if your site is at uncomment and
  # modify the following line:
  RewriteBase /drupal
  # If your site is running in a VirtualDocumentRoot at,
  # uncomment the following line:
  # RewriteBase /

  # Pass all requests not referring directly to files in the filesystem to
  # index.php. Clean URLs are handled in drupal_environment_initialize().
  RewriteCond %{REQUEST_FILENAME} !-f
  RewriteCond %{REQUEST_FILENAME} !-d
  RewriteCond %{REQUEST_URI} !=/favicon.ico
  RewriteRule ^ index.php [L]

  # Rules to correctly serve gzip compressed CSS and JS files.
  # Requires both mod_rewrite and mod_headers to be enabled.
  <IfModule mod_headers.c>
    # Serve gzip compressed CSS files if they exist and the client accepts gzip.
    RewriteCond %{HTTP:Accept-encoding} gzip
    RewriteCond %{REQUEST_FILENAME}\.gz -s
    RewriteRule ^(.*)\.css $1\.css\.gz [QSA]

    # Serve gzip compressed JS files if they exist and the client accepts gzip.
    RewriteCond %{HTTP:Accept-encoding} gzip
    RewriteCond %{REQUEST_FILENAME}\.gz -s
    RewriteRule ^(.*)\.js $1\.js\.gz [QSA]

    # Serve correct content types, and prevent mod_deflate double gzip.
    RewriteRule \.css\.gz$ - [T=text/css,E=no-gzip:1]
    RewriteRule \.js\.gz$ - [T=text/javascript,E=no-gzip:1]

    <FilesMatch "(\.js\.gz|\.css\.gz)$">
      # Serve correct encoding type.
      Header append Content-Encoding gzip
      # Force proxies to cache gzipped & non-gzipped css/js files separately.
      Header append Vary Accept-Encoding
    </FilesMatch>
  </IfModule>
</IfModule>


Ray Paseur Commented:
Yeah, I think your instincts are right about .htaccess.  I have asked the moderators to add this to the Apache Zone, too.
michaelgiaimo Author Commented:
It was more involved than this one issue; the htaccess was borked.