Solved

Spaces and Characters getting escaped in search url

Posted on 2011-10-01
Last Modified: 2012-05-12
Drupal 7.  

My search was working fine until I went live this morning with my site, which is in a subdirectory.  My gut feeling is something in the htaccess file did this, but I dunno.

If I search for "cessna" everything works fine.  But if I search for "cessna airplane", for example, the search string returned is weird: "cessna%2520airplane".  Where is that '52' coming from?

Anyway the site is live, you can see what I mean here: http://www.ainonline.com
Question by:michaelgiaimo
10 Comments
 
LVL 10

Expert Comment

by:Ultrus
ID: 36897143
A couple guesses:

%20 is a URL-encoded space. %25 is a URL-encoded percent sign, so the % in %20 is itself being escaped a second time, for reasons unknown. Maybe try running cron.php a couple of times on your new site if it needs to be indexed. I think there's a setting somewhere to great out your index. If you clear it and run cron a few times, your site will be re-indexed and that could help.

Best regards,

Chris
 
LVL 10

Expert Comment

by:Ultrus
ID: 36897145
*to CLEAR out your index

Silly phone autocorrect
 
LVL 111

Expert Comment

by:Ray Paseur
ID: 36897211
I wonder if there is any confusion between urlencode() and rawurlencode() and their corresponding decode functions.
http://us.php.net/manual/en/function.rawurlencode.php
http://us.php.net/manual/en/function.urlencode.php

When I searched for a b, it took me to the advanced search page and pre-loaded the search box with a%20b.  With urlencode() you would expect a+b.

Got similar results for a b c -- a%20b%20c
Tried with two blanks a  b and got a%20%20b
Tried cessna airplane and got cessna%20airplane
Tried bombardier commercial aircraft and got bombardier%20commercial%20aircraft.  Also got no search results which does not make sense given the subject matter.

Is there any JavaScript that acts on the form input, beyond what we see in the view source?
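
To make the difference between the two encodings concrete, here is a minimal sketch in Python rather than PHP. It assumes that `urllib.parse.quote` with `safe=""` behaves roughly like rawurlencode() and that `quote_plus` behaves roughly like urlencode(); the variable names are only for illustration and nothing here is taken from the site's code.

```python
from urllib.parse import quote, quote_plus

term = "cessna airplane"

# Percent-encoding: the space becomes %20 (roughly what rawurlencode() does).
raw_style = quote(term, safe="")

# Form-style encoding: the space becomes + (roughly what urlencode() does).
form_style = quote_plus(term)

print(raw_style)   # cessna%20airplane
print(form_style)  # cessna+airplane
```

Either form is fine on its own; the trouble starts only when an already encoded string is encoded again.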

 

Author Comment

by:michaelgiaimo
ID: 36897258
As far as I know, the only Javascript involved sets the "Search this site" text and removes it when clicked on.

Now, take note of the URL, it's different from what appears in the search form input.  When I search for cessna airplane, the input form returns cessna%20airplane but the search url has cessna%2520airplane - and from what I see that's the killer - if you put cessna%20airplane in the url it works fine.
 
LVL 111

Expert Comment

by:Ray Paseur
ID: 36897465
Go to this page:
http://ainonline.com/?q=search/node/cessna%2520airplane

Click the search in the main part of the page (not the header) a few times and watch what is happening to the URL.

The %20 (hex 20) is the blank character, rawurlencoded.  If it were just urlencoded it would be a plus sign.
The %25 character is the percent sign, urlencoded or rawurlencoded.  Both functions convert it the same way.

So it looks like the percent sign character is being converted into its %25 encoding, and this is used to replace the percent sign character each time the search form is submitted.  I am guessing that there is some kind of double encoding going on.  This happens to any string with a blank in it.

The original string is a b.  The first conversion makes this a%20b.  The next conversion begins to propagate the double-converted percent sign.

This could be a coding error somewhere in the framework or modules.  But you said you suspect the .htaccess file.  Would you please post that here?  Thanks.
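
The double encoding described above can be reproduced in a few lines. This is a sketch in Python, assuming `urllib.parse.quote` with `safe=""` stands in for whatever component is doing the encoding on the site; it is not the site's actual code path.

```python
from urllib.parse import quote, unquote

original = "a b"

# First pass: the space becomes %20.
once = quote(original, safe="")

# Second pass: the % of %20 is itself encoded as %25, giving %2520.
twice = quote(once, safe="")

# A single decode of the double-encoded value only gets back to %20,
# which is why the search sees a literal "%20" instead of a space.
decoded_once = unquote(twice)

print(once)          # a%20b
print(twice)         # a%2520b
print(decoded_once)  # a%20b
```

Each additional round trip adds another 25, so a third pass would give a%252520b.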
 

Author Comment

by:michaelgiaimo
ID: 36897784
Sure.  Only reason I suspect the htaccess is because this just started happening this morning, when we went live with the new site.  I modified the root htaccess to hide the drupal subdirectory - here's what's in the root:

Options -Indexes
Options +FollowSymLinks
RewriteEngine on


# stuff to let through (ignore)
RewriteCond %{REQUEST_URI} "/openx/" [OR]
RewriteCond %{REQUEST_URI} "/typo3/" [OR]
RewriteCond %{REQUEST_URI} "/oa/"
RewriteRule (.*) $1 [L]


# Redirect all users to the host name without WWW
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule ^ http://%1%{REQUEST_URI} [L,R=301]

# Serve Drupal from sub directory in web root
RewriteRule ^$ drupal/index.php [L]
RewriteCond %{DOCUMENT_ROOT}/drupal%{REQUEST_URI} -f
RewriteRule .* drupal/$0 [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule .* drupal/index.php?q=$0 [QSA]




Then, here is the htaccess from the Drupal subdirectory:

#
# Apache/PHP/Drupal settings:
#

# Protect files and directories from prying eyes.
<FilesMatch "\.(engine|inc|info|install|make|module|profile|test|po|sh|.*sql|theme|tpl(\.php)?|xtmpl)$|^(\..*|Entries.*|Repository|Root|Tag|Template)$">
  Order allow,deny
</FilesMatch>

# Don't show directory listings for URLs which map to a directory.
Options -Indexes

# Follow symbolic links in this directory.
Options +FollowSymLinks

# Make Drupal handle any 404 errors.
ErrorDocument 404 /index.php

# Force simple error message for requests for non-existent favicon.ico.
<Files favicon.ico>
  # There is no end quote below, for compatibility with Apache 1.3.
  ErrorDocument 404 "The requested file favicon.ico was not found.
</Files>

# Set the default handler.
DirectoryIndex index.php index.html index.htm

# Override PHP settings that cannot be changed at runtime. See
# sites/default/default.settings.php and drupal_initialize_variables() in
# includes/bootstrap.inc for settings that can be changed at runtime.

# PHP 5, Apache 1 and 2.
<IfModule mod_php5.c>
  php_flag magic_quotes_gpc                 off
  php_flag magic_quotes_sybase              off
  php_flag register_globals                 off
  php_flag session.auto_start               off
  php_value mbstring.http_input             pass
  php_value mbstring.http_output            pass
  php_flag mbstring.encoding_translation    off
</IfModule>

# Requires mod_expires to be enabled.
<IfModule mod_expires.c>
  # Enable expirations.
  ExpiresActive On

  # Cache all files for 2 weeks after access (A).
  ExpiresDefault A1209600

  <FilesMatch \.php$>
    # Do not allow PHP scripts to be cached unless they explicitly send cache
    # headers themselves. Otherwise all scripts would have to overwrite the
    # headers set by mod_expires if they want another caching behavior. This may
    # fail if an error occurs early in the bootstrap process, and it may cause
    # problems if a non-Drupal PHP file is installed in a subdirectory.
    ExpiresActive Off
  </FilesMatch>
</IfModule>

# Various rewrite rules.
<IfModule mod_rewrite.c>
  RewriteEngine on

  # Block access to "hidden" directories whose names begin with a period. This
  # includes directories used by version control systems such as Subversion or
  # Git to store control files. Files whose names begin with a period, as well
  # as the control files used by CVS, are protected by the FilesMatch directive
  # above.
  #
  # NOTE: This only works when mod_rewrite is loaded. Without mod_rewrite, it is
  # not possible to block access to entire directories from .htaccess, because
  # <DirectoryMatch> is not allowed here.
  #
  # If you do not have mod_rewrite installed, you should remove these
  # directories from your webroot or otherwise protect them from being
  # downloaded.
  RewriteRule "(^|/)\." - [F]

  # If your site can be accessed both with and without the 'www.' prefix, you
  # can use one of the following settings to redirect users to your preferred
  # URL, either WITH or WITHOUT the 'www.' prefix. Choose ONLY one option:
  #
  # To redirect all users to access the site WITH the 'www.' prefix,
  # (http://example.com/... will be redirected to http://www.example.com/...)
  # uncomment the following:
  # RewriteCond %{HTTP_HOST} !^www\. [NC]
  # RewriteRule ^ http://www.%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
  #
  # To redirect all users to access the site WITHOUT the 'www.' prefix,
  # (http://www.example.com/... will be redirected to http://example.com/...)
  # uncomment the following:
  # RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
  # RewriteRule ^ http://%1%{REQUEST_URI} [L,R=301]

  # Modify the RewriteBase if you are using Drupal in a subdirectory or in a
  # VirtualDocumentRoot and the rewrite rules are not working properly.
  # For example if your site is at http://example.com/drupal uncomment and
  # modify the following line:
RewriteBase /drupal
  #
  # If your site is running in a VirtualDocumentRoot at http://example.com/,
  # uncomment the following line:
  # RewriteBase /


  # Pass all requests not referring directly to files in the filesystem to
  # index.php. Clean URLs are handled in drupal_environment_initialize().
  RewriteCond %{REQUEST_FILENAME} !-f
  RewriteCond %{REQUEST_FILENAME} !-d
  RewriteCond %{REQUEST_URI} !=/favicon.ico
  RewriteRule ^ index.php [L]

  # Rules to correctly serve gzip compressed CSS and JS files.
  # Requires both mod_rewrite and mod_headers to be enabled.
  <IfModule mod_headers.c>
    # Serve gzip compressed CSS files if they exist and the client accepts gzip.
    RewriteCond %{HTTP:Accept-encoding} gzip
    RewriteCond %{REQUEST_FILENAME}\.gz -s
    RewriteRule ^(.*)\.css $1\.css\.gz [QSA]

    # Serve gzip compressed JS files if they exist and the client accepts gzip.
    RewriteCond %{HTTP:Accept-encoding} gzip
    RewriteCond %{REQUEST_FILENAME}\.gz -s
    RewriteRule ^(.*)\.js $1\.js\.gz [QSA]

    # Serve correct content types, and prevent mod_deflate double gzip.
    RewriteRule \.css\.gz$ - [T=text/css,E=no-gzip:1]
    RewriteRule \.js\.gz$ - [T=text/javascript,E=no-gzip:1]

    <FilesMatch "(\.js\.gz|\.css\.gz)$">
      # Serve correct encoding type.
      Header append Content-Encoding gzip
      # Force proxies to cache gzipped & non-gzipped css/js files separately.
      Header append Vary Accept-Encoding
    </FilesMatch>
  </IfModule>
</IfModule>

 

Accepted Solution

by:michaelgiaimo
ID: 36898081
So it's also not just the search module, which tells me it must be the htaccess.  Watch how it rewrites this link:

http://www.ainonline.com/?q=aviation-news/blogs/ain-blog-industry-struggling-recover-%E2%80%9Cgee-thanks-mr-president%E2%80%9D/30496
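
For reference, the escapes in that link are UTF-8 curly quotes. The sketch below, in Python with `urllib.parse`, shows what they decode to and what the path segment would look like if it were percent-encoded a second time. The second result is an assumption about the failure mode, not a capture of what the server actually returned.

```python
from urllib.parse import quote, unquote

slug = "%E2%80%9Cgee-thanks-mr-president%E2%80%9D"

# Decoding once yields the curly quotes (U+201C and U+201D).
decoded = unquote(slug)

# Encoding the already encoded slug again turns every % into %25.
re_encoded = quote(slug, safe="")

print(decoded)
print(re_encoded)  # %25E2%2580%259Cgee-thanks-mr-president%25E2%2580%259D
```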

 
LVL 111

Expert Comment

by:Ray Paseur
ID: 36898112
Yeah, I think your instincts are right about .htaccess.  I have asked the moderators to add this to the Apache Zone, too.
 

Author Closing Comment

by:michaelgiaimo
ID: 37319310
More involved than this one issue, htaccess was borked.