Solved

Spaces and Characters getting escaped in search url

Posted on 2011-10-01
Last Modified: 2012-05-12
Drupal 7.  

My search was working fine until I went live this morning with my site, which is in a subdirectory.  My gut feeling is something in the htaccess file did this, but I dunno.

If I search for "cessna" everything works fine.  But if I search for "cessna airplane", for example, the search string that comes back is weird: "cessna%2520airplane".  Where is that extra '25' coming from?

Anyway the site is live, you can see what I mean here: http://www.ainonline.com
Question by:michaelgiaimo
 
LVL 10

Expert Comment

by:Ultrus
A couple guesses:

%20 is a URL-encoded space. %25 is a URL-encoded percent sign, so the % of the already-encoded space is being escaped a second time, for reasons unknown. Maybe try running cron.php a couple of times on your new site if it needs to be indexed. I think there's a setting somewhere to great out your index. If you clear it and run cron a few times, your site will be re-indexed and that could help.

Best regards,

Chris
 
LVL 10

Expert Comment

by:Ultrus
*to CLEAR out your index

Silly phone autocorrect
 
LVL 108

Expert Comment

by:Ray Paseur
I wonder if there is any confusion between urlencode() and rawurlencode() and their corresponding decode functions.
http://us.php.net/manual/en/function.rawurlencode.php
http://us.php.net/manual/en/function.urlencode.php

When I searched for a b, it took me to the advanced search page and pre-loaded the search box with a%20b.  With urlencode() you would expect a+b.

Got similar results for a b c -- a%20b%20c
Tried with two blanks a  b and got a%20%20b
Tried cessna airplane and got cessna%20airplane
Tried bombardier commercial aircraft and got bombardier%20commercial%20aircraft.  Also got no search results which does not make sense given the subject matter.

Is there any JavaScript that acts on the form input, beyond what we see in the view source?
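
To make the urlencode() versus rawurlencode() distinction concrete, here is a minimal sketch. It uses Python's standard library as a stand-in for the PHP functions, which is an assumption for illustration only: `quote` behaves roughly like rawurlencode() (space becomes %20) and `quote_plus` roughly like urlencode() (space becomes +). The helper names `raw_style` and `form_style` are made up for this example.

```python
from urllib.parse import quote, quote_plus


def raw_style(text: str) -> str:
    """Percent-encode with spaces as %20 (comparable to PHP rawurlencode())."""
    return quote(text, safe="")


def form_style(text: str) -> str:
    """Percent-encode with spaces as + (comparable to PHP urlencode())."""
    return quote_plus(text)


# The search terms tried above, encoded both ways.
for term in ["a b", "a  b", "cessna airplane"]:
    print(repr(term), "->", raw_style(term), "|", form_style(term))
```

The %20 results reported above match the first style, which suggests the form value is being run through something rawurlencode()-like at least once.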
 

Author Comment

by:michaelgiaimo
As far as I know, the only JavaScript involved sets the "Search this site" text and removes it when clicked on.

Now, take note of the URL: it's different from what appears in the search form input.  When I search for cessna airplane, the form input shows cessna%20airplane but the search URL has cessna%2520airplane, and from what I can see that's the killer. If you put cessna%20airplane in the URL, it works fine.
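
Why the %2520 URL fails while the %20 URL works can be sketched as follows. This assumes the server decodes the query string exactly once before handing it to the search code, which is typical but not confirmed for this site. The helper name `decode_once` is hypothetical, and Python's `unquote` is used as a stand-in for the server-side decoding.

```python
from urllib.parse import unquote


def decode_once(value: str) -> str:
    """Undo a single layer of percent-encoding."""
    return unquote(value)


working = "cessna%20airplane"
broken = "cessna%2520airplane"

# One decode of the working URL gives the real search phrase.
print(decode_once(working))

# One decode of the broken URL still leaves a literal "%20" in the phrase,
# so the search would be run on text that no page contains.
print(decode_once(broken))

# Only a second decode would recover the phrase from the broken URL.
print(decode_once(decode_once(broken)))
```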

 
LVL 108

Expert Comment

by:Ray Paseur
Go to this page:
http://ainonline.com/?q=search/node/cessna%2520airplane

Click the search in the main part of the page (not the header) a few times and watch what is happening to the URL.

The %20 (hex 20) is the blank character, rawurlencoded.  If it were just urlencoded it would be a plus sign.
The %25 character is the percent sign, urlencoded or rawurlencoded.  Both functions convert it the same way.

So it looks like the percent sign character is being converted into its %25 encoding, and this is used to replace the percent sign character each time the search form is submitted.  I am guessing that there is some kind of double encoding going on.  This happens to any string with a blank in it.

The original string is a b.  The first conversion makes this a%20b.  The next conversion begins to propagate the double-converted percent sign.
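
The propagation described above can be reproduced with a small sketch. It is written in Python rather than PHP purely for illustration, and `encode_n_times` is a hypothetical helper, not anything from Drupal. Each pass re-encodes the percent signs left by the previous pass, so the string keeps growing.

```python
from urllib.parse import quote


def encode_n_times(text: str, passes: int) -> str:
    """Apply percent-encoding repeatedly, as a buggy pipeline might."""
    result = text
    for _ in range(passes):
        # safe="" so that every reserved character, including %, is encoded
        result = quote(result, safe="")
    return result


for passes in range(4):
    print(passes, encode_n_times("a b", passes))
```

Pass 1 is the correct single encoding. Every pass after that only turns each % into %25, which is the pattern that shows up in the URL each time the form is resubmitted.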

This could be a coding error somewhere in the framework or modules.  But you said you suspect the .htaccess file.  Would you please post that here?  Thanks.
 

Author Comment

by:michaelgiaimo
Sure.  Only reason I suspect the htaccess is because this just started happening this morning, when we went live with the new site.  I modified the root htaccess to hide the drupal subdirectory - here's what's in the root:

Options -Indexes
Options +FollowSymLinks
RewriteEngine on


# stuff to let through (ignore)
RewriteCond %{REQUEST_URI} "/openx/" [OR]
RewriteCond %{REQUEST_URI} "/typo3/" [OR]
RewriteCond %{REQUEST_URI} "/oa/"
RewriteRule (.*) $1 [L]


# Redirect all user to without WWW
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule ^ http://%1%{REQUEST_URI} [L,R=301]

# Serve Drupal from sub directory in web root
RewriteRule ^$ drupal/index.php [L]
RewriteCond %{DOCUMENT_ROOT}/drupal%{REQUEST_URI} -f
RewriteRule .* drupal/$0 [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule .* drupal/index.php?q=$0 [QSA]




Then, here is the htaccess from the Drupal subdirectory:

#
# Apache/PHP/Drupal settings:
#

# Protect files and directories from prying eyes.
<FilesMatch "\.(engine|inc|info|install|make|module|profile|test|po|sh|.*sql|theme|tpl(\.php)?|xtmpl)$|^(\..*|Entries.*|Repository|Root|Tag|Template)$">
  Order allow,deny
</FilesMatch>

# Don't show directory listings for URLs which map to a directory.
Options -Indexes

# Follow symbolic links in this directory.
Options +FollowSymLinks

# Make Drupal handle any 404 errors.
ErrorDocument 404 /index.php

# Force simple error message for requests for non-existent favicon.ico.
<Files favicon.ico>
  # There is no end quote below, for compatibility with Apache 1.3.
  ErrorDocument 404 "The requested file favicon.ico was not found.
</Files>

# Set the default handler.
DirectoryIndex index.php index.html index.htm

# Override PHP settings that cannot be changed at runtime. See
# sites/default/default.settings.php and drupal_initialize_variables() in
# includes/bootstrap.inc for settings that can be changed at runtime.

# PHP 5, Apache 1 and 2.
<IfModule mod_php5.c>
  php_flag magic_quotes_gpc                 off
  php_flag magic_quotes_sybase              off
  php_flag register_globals                 off
  php_flag session.auto_start               off
  php_value mbstring.http_input             pass
  php_value mbstring.http_output            pass
  php_flag mbstring.encoding_translation    off
</IfModule>

# Requires mod_expires to be enabled.
<IfModule mod_expires.c>
  # Enable expirations.
  ExpiresActive On

  # Cache all files for 2 weeks after access (A).
  ExpiresDefault A1209600

  <FilesMatch \.php$>
    # Do not allow PHP scripts to be cached unless they explicitly send cache
    # headers themselves. Otherwise all scripts would have to overwrite the
    # headers set by mod_expires if they want another caching behavior. This may
    # fail if an error occurs early in the bootstrap process, and it may cause
    # problems if a non-Drupal PHP file is installed in a subdirectory.
    ExpiresActive Off
  </FilesMatch>
</IfModule>

# Various rewrite rules.
<IfModule mod_rewrite.c>
  RewriteEngine on

  # Block access to "hidden" directories whose names begin with a period. This
  # includes directories used by version control systems such as Subversion or
  # Git to store control files. Files whose names begin with a period, as well
  # as the control files used by CVS, are protected by the FilesMatch directive
  # above.
  #
  # NOTE: This only works when mod_rewrite is loaded. Without mod_rewrite, it is
  # not possible to block access to entire directories from .htaccess, because
  # <DirectoryMatch> is not allowed here.
  #
  # If you do not have mod_rewrite installed, you should remove these
  # directories from your webroot or otherwise protect them from being
  # downloaded.
  RewriteRule "(^|/)\." - [F]

  # If your site can be accessed both with and without the 'www.' prefix, you
  # can use one of the following settings to redirect users to your preferred
  # URL, either WITH or WITHOUT the 'www.' prefix. Choose ONLY one option:
  #
  # To redirect all users to access the site WITH the 'www.' prefix,
  # (http://example.com/... will be redirected to http://www.example.com/...)
  # uncomment the following:
  # RewriteCond %{HTTP_HOST} !^www\. [NC]
  # RewriteRule ^ http://www.%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
  #
  # To redirect all users to access the site WITHOUT the 'www.' prefix,
  # (http://www.example.com/... will be redirected to http://example.com/...)
  # uncomment the following:
  # RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
  # RewriteRule ^ http://%1%{REQUEST_URI} [L,R=301]

  # Modify the RewriteBase if you are using Drupal in a subdirectory or in a
  # VirtualDocumentRoot and the rewrite rules are not working properly.
  # For example if your site is at http://example.com/drupal uncomment and
  # modify the following line:
RewriteBase /drupal
  #
  # If your site is running in a VirtualDocumentRoot at http://example.com/,
  # uncomment the following line:
  # RewriteBase /


  # Pass all requests not referring directly to files in the filesystem to
  # index.php. Clean URLs are handled in drupal_environment_initialize().
  RewriteCond %{REQUEST_FILENAME} !-f
  RewriteCond %{REQUEST_FILENAME} !-d
  RewriteCond %{REQUEST_URI} !=/favicon.ico
  RewriteRule ^ index.php [L]

  # Rules to correctly serve gzip compressed CSS and JS files.
  # Requires both mod_rewrite and mod_headers to be enabled.
  <IfModule mod_headers.c>
    # Serve gzip compressed CSS files if they exist and the client accepts gzip.
    RewriteCond %{HTTP:Accept-encoding} gzip
    RewriteCond %{REQUEST_FILENAME}\.gz -s
    RewriteRule ^(.*)\.css $1\.css\.gz [QSA]

    # Serve gzip compressed JS files if they exist and the client accepts gzip.
    RewriteCond %{HTTP:Accept-encoding} gzip
    RewriteCond %{REQUEST_FILENAME}\.gz -s
    RewriteRule ^(.*)\.js $1\.js\.gz [QSA]

    # Serve correct content types, and prevent mod_deflate double gzip.
    RewriteRule \.css\.gz$ - [T=text/css,E=no-gzip:1]
    RewriteRule \.js\.gz$ - [T=text/javascript,E=no-gzip:1]

    <FilesMatch "(\.js\.gz|\.css\.gz)$">
      # Serve correct encoding type.
      Header append Content-Encoding gzip
      # Force proxies to cache gzipped & non-gzipped css/js files separately.
      Header append Vary Accept-Encoding
    </FilesMatch>
  </IfModule>
</IfModule>

 

Accepted Solution

by:michaelgiaimo
So it's also not just the search module, which tells me it must be the htaccess.  Watch how it rewrites this link:

http://www.ainonline.com/?q=aviation-news/blogs/ain-blog-industry-struggling-recover-%E2%80%9Cgee-thanks-mr-president%E2%80%9D/30496
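
If the same double encoding is at work here (an assumption, since the rewritten form of this link is not shown in the thread), any percent-encoded byte would be affected, not just spaces. A rough sketch of how to spot and undo one extra layer, using hypothetical helper names and Python for illustration:

```python
import re
from urllib.parse import quote

# "%25" followed by two hex digits is what a re-encoded "%XX" looks like.
DOUBLE_ENCODED = re.compile(r"%25([0-9A-Fa-f]{2})")


def looks_double_encoded(url: str) -> bool:
    """Heuristic check: True if the URL contains an encoded percent escape."""
    return DOUBLE_ENCODED.search(url) is not None


def collapse_one_layer(url: str) -> str:
    """Turn each %25XX back into %XX, removing one layer of encoding."""
    return DOUBLE_ENCODED.sub(r"%\1", url)


# Fragment of the blog path above, containing UTF-8 curly quotes.
path = "%E2%80%9Cgee-thanks-mr-president%E2%80%9D"
doubled = quote(path, safe="")  # simulate the suspected second encoding

print(doubled)
print(looks_double_encoded(path), looks_double_encoded(doubled))
print(collapse_one_layer(doubled) == path)
```

This is only a diagnostic aid: a URL that legitimately contains a literal "%20" as text would also trip the check, and the real fix belongs in whatever is encoding twice.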

 
LVL 108

Expert Comment

by:Ray Paseur
Yeah, I think your instincts are right about .htaccess.  I have asked the moderators to add this to the Apache Zone, too.
 

Author Closing Comment

by:michaelgiaimo
More involved than this one issue, htaccess was borked.