Link to home
Create AccountLog in
Avatar of morako
morakoFlag for United States of America

asked on

How can I strip the results in this recordset from unwanted characters.

How can I strip the results in this recordset from unwanted characters.

My XML that I am trying ti spit out is breaking with invalid characters.

Can someone help me with code to escape these characters?
<?php 
ob_start();
header('Content-type: text/xml');
header('Pragma: public');        
header('Cache-control: private');
header('Expires: -1');
if (!function_exists("GetSQLValueString")) {
function GetSQLValueString($theValue, $theType, $theDefinedValue = "", $theNotDefinedValue = "") 
{
  if (PHP_VERSION < 6) {
    $theValue = get_magic_quotes_gpc() ? stripslashes($theValue) : $theValue;
  }

  $theValue = function_exists("mysql_real_escape_string") ? mysql_real_escape_string($theValue) : mysql_escape_string($theValue);

  switch ($theType) {
    case "text":
      $theValue = ($theValue != "") ? "'" . $theValue . "'" : "NULL";
      break;    
    case "long":
    case "int":
      $theValue = ($theValue != "") ? intval($theValue) : "NULL";
      break;
    case "double":
      $theValue = ($theValue != "") ? doubleval($theValue) : "NULL";
      break;
    case "date":
      $theValue = ($theValue != "") ? "'" . $theValue . "'" : "NULL";
      break;
    case "defined":
      $theValue = ($theValue != "") ? $theDefinedValue : $theNotDefinedValue;
      break;
  }
  return $theValue;
}
}

mysql_select_db($database_search, $search);
$query_rsAll = "SELECT id, title, description  FROM links";
$rsAll = mysql_query($query_rsAll, $search) or die(mysql_error());
$row_rsAll = mysql_fetch_assoc($rsAll);
$totalRows_rsAll = mysql_num_rows($rsAll);
?><?php echo('<?xml version="1.0" encoding="utf-8"?>'); ?>
<listings>
  <?php if ($totalRows_rsAll > 0) { // Show if recordset not empty ?>
  <?php do { ?>
	<listing>
		<?php foreach ($row_rsAll as $column=>$value) { ?>
		<<?php echo $column; ?>><![CDATA[<?php echo (($row_rsAll[$column])); ?>]]></<?php echo $column; ?>>
		<?php } ?>
	</listing>
    <?php } while ($row_rsAll = mysql_fetch_assoc($rsAll)); ?>
	<?php } // Show if recordset not empty ?>
</listings>
<?php
mysql_free_result($rsAll);
?>

Open in new window

Avatar of Ray Paseur
Ray Paseur
Flag of United States of America image

Please use the code snippet to post the XML string, thanks.
Avatar of morako

ASKER


The XML page cannot be displayed 
Cannot view XML input using XSL style sheet. Please correct the error and then click the Refresh button, or try again later. 


--------------------------------------------------------------------------------

An invalid character was found in text content. Error processing resource 'http://beta.wwbn.com/xml1.php'. Line 99, Positi...

    <description><![CDATA[You

Open in new window

Here is what I found in the generated XML:

You?re should probably be "Your" (possessive).

Looks like someone did a cut-and-paste from Word.
<?xml version="1.0" encoding="utf-8"?><listings>

    	<listing>
				<id><![CDATA[1230176651]]></id>
				<title><![CDATA[Mcall.com Restaurant Videos]]></title>
				<description><![CDATA[You¿re timing is great. Our new online Restaurant Clips offer you an interactive and convenient way to view and select the ¿faire of your choice¿. 

Whether you¿re looking for a casual meal at a reasonable price or a romantic dinner for that special occasion¿ you¿ll find it here.  

Just click the links to the right to see what¿s available by area and by type. We know you¿ll enjoy! 
]]></description>

			</listing>

ETC ETC ETC

Open in new window

Avatar of morako

ASKER

I am not sure what you mean by this:

Here is what I found in the generated XML:

You?re should probably be "Your" (possessive).

Looks like someone did a cut-and-paste from Word.


Is there a way I can code to prevent this from displaying with errors?
Avatar of morako

ASKER

I am not a php coder, just trying to help a client.  Can you tell me how to implement a fix/
@morako: we can't always be sitting at the computer working on your problems, and when you post a question at EE, you should expect it to take a while to get the answers.

In this case I recommend that you hire a PHP coder, because the answers are going to require some in-depth technical know-how.  Let's try to take this a step at a time.

Do you understand what happens when you highlight text in a Word document, copy it, and paste it into an HTML textarea?
Next question, do you control the content of this site:
http://beta.wwbn.com/xml1.php

And if so, how does the content get put into the site?
You might also want to be aware of this.
wwbn.png
Avatar of morako

ASKER

I am not sure how it is imputed. Forms, I sure..  I am more concerned with the output.  in XML.
Avatar of morako

ASKER

I thought there might be some built-in function in php to clean out the strings.
We might be able to clean the strings, but a FAR better way to deal with this is on the input side of things.  Just do not allow bad characters to get into the data in the first place.

Anyway, please post the PHP script that generates the XML and I'll try to give you a hand with the clean-up.  It has to be cleaned up before it is loaded into an XML string.
Avatar of morako

ASKER

Here it is..

Thanks in advance..
<?php 
ob_start();
header('Content-type: text/xml');
header('Pragma: public');        
header('Cache-control: private');
header('Expires: -1');
if (!function_exists("GetSQLValueString")) {
function GetSQLValueString($theValue, $theType, $theDefinedValue = "", $theNotDefinedValue = "") 
{
  if (PHP_VERSION < 6) {
    $theValue = get_magic_quotes_gpc() ? stripslashes($theValue) : $theValue;
  }

  $theValue = function_exists("mysql_real_escape_string") ? mysql_real_escape_string($theValue) : mysql_escape_string($theValue);

  switch ($theType) {
    case "text":
      $theValue = ($theValue != "") ? "'" . $theValue . "'" : "NULL";
      break;    
    case "long":
    case "int":
      $theValue = ($theValue != "") ? intval($theValue) : "NULL";
      break;
    case "double":
      $theValue = ($theValue != "") ? doubleval($theValue) : "NULL";
      break;
    case "date":
      $theValue = ($theValue != "") ? "'" . $theValue . "'" : "NULL";
      break;
    case "defined":
      $theValue = ($theValue != "") ? $theDefinedValue : $theNotDefinedValue;
      break;
  }
  return $theValue;
}
}

mysql_select_db($database_search, $search);
$query_rsAll = "SELECT id, title, description  FROM links";
$rsAll = mysql_query($query_rsAll, $search) or die(mysql_error());
$row_rsAll = mysql_fetch_assoc($rsAll);
$totalRows_rsAll = mysql_num_rows($rsAll);
?><?php echo('<?xml version="1.0" encoding="utf-8"?>'); ?>
<listings>
  <?php if ($totalRows_rsAll > 0) { // Show if recordset not empty ?>
  <?php do { ?>
	<listing>
		<?php foreach ($row_rsAll as $column=>$value) { ?>
		<<?php echo $column; ?>><![CDATA[<?php echo (($row_rsAll[$column])); ?>]]></<?php echo $column; ?>>
		<?php } ?>
	</listing>
    <?php } while ($row_rsAll = mysql_fetch_assoc($rsAll)); ?>
	<?php } // Show if recordset not empty ?>
</listings>
<?php
mysql_free_result($rsAll);
?>

Open in new window

I don't have a data set to test with, but this should help.  Please test it and let me know how it looks, thanks.
The strategy here (line 49) is to "entitize" the output.  If this works, great.  If not, we will need a little more code to remove the bad characters.
<?php 
ob_start();
header('Content-type: text/xml');
header('Pragma: public');        
header('Cache-control: private');
header('Expires: -1');
if (!function_exists("GetSQLValueString")) {
function GetSQLValueString($theValue, $theType, $theDefinedValue = "", $theNotDefinedValue = "") 
{
  if (PHP_VERSION < 6) {
    $theValue = get_magic_quotes_gpc() ? stripslashes($theValue) : $theValue;
  }

  $theValue = function_exists("mysql_real_escape_string") ? mysql_real_escape_string($theValue) : mysql_escape_string($theValue);

  switch ($theType) {
    case "text":
      $theValue = ($theValue != "") ? "'" . $theValue . "'" : "NULL";
      break;    
    case "long":
    case "int":
      $theValue = ($theValue != "") ? intval($theValue) : "NULL";
      break;
    case "double":
      $theValue = ($theValue != "") ? doubleval($theValue) : "NULL";
      break;
    case "date":
      $theValue = ($theValue != "") ? "'" . $theValue . "'" : "NULL";
      break;
    case "defined":
      $theValue = ($theValue != "") ? $theDefinedValue : $theNotDefinedValue;
      break;
  }
  return $theValue;
}
}

mysql_select_db($database_search, $search);
$query_rsAll = "SELECT id, title, description  FROM links";
$rsAll = mysql_query($query_rsAll, $search) or die(mysql_error());
$row_rsAll = mysql_fetch_assoc($rsAll);
$totalRows_rsAll = mysql_num_rows($rsAll);
?><?php echo('<?xml version="1.0" encoding="utf-8"?>'); ?>
<listings>
  <?php if ($totalRows_rsAll > 0) { // Show if recordset not empty ?>
  <?php do { ?>
	<listing>
		<?php foreach ($row_rsAll as $column=>$value) { ?>
		<<?php echo htmlentities($column); ?>><![CDATA[<?php echo htmlentities($row_rsAll[$column]); ?>]]></<?php echo htmlentities($column); ?>>
		<?php } ?>
	</listing>
    <?php } while ($row_rsAll = mysql_fetch_assoc($rsAll)); ?>
	<?php } // Show if recordset not empty ?>
</listings>
<?php
mysql_free_result($rsAll);
?>

Open in new window

Avatar of morako

ASKER

XML Parsing Error: not well-formed
Location: http://beta.wwbn.com/xml1.php?t=1
Line Number 99, Column 30:                        <description><![CDATA[You¿re timing is great. Our new online Restaurant Clips offer you an interactive and convenient way to view and select the ¿faire of your choice¿.
---------------------------------------------------------^
Looks like it may be a character-set encoding problem or a string pasted from Word.  Unfortunately I cannot get the exact characters through my text editor, so we will have to translate the information to remove the bad characters, accepting only known, good characters.

This script should inject 'ERROR' in place of any unwanted characters.  Give it a try and let's see if we get too many ERROR strings.  We may want to adjust the REGEX before we get to a final version of the script.
<?php // RAY_temp_morako.php

ob_start();
header('Content-type: text/xml');
header('Pragma: public');        
header('Cache-control: private');
header('Expires: -1');
if (!function_exists("GetSQLValueString")) {
function GetSQLValueString($theValue, $theType, $theDefinedValue = "", $theNotDefinedValue = "") 
{
  if (PHP_VERSION < 6) {
    $theValue = get_magic_quotes_gpc() ? stripslashes($theValue) : $theValue;
  }

  $theValue = function_exists("mysql_real_escape_string") ? mysql_real_escape_string($theValue) : mysql_escape_string($theValue);

  switch ($theType) {
    case "text":
      $theValue = ($theValue != "") ? "'" . $theValue . "'" : "NULL";
      break;    
    case "long":
    case "int":
      $theValue = ($theValue != "") ? intval($theValue) : "NULL";
      break;
    case "double":
      $theValue = ($theValue != "") ? doubleval($theValue) : "NULL";
      break;
    case "date":
      $theValue = ($theValue != "") ? "'" . $theValue . "'" : "NULL";
      break;
    case "defined":
      $theValue = ($theValue != "") ? $theDefinedValue : $theNotDefinedValue;
      break;
  }
  return $theValue;
}
}

mysql_select_db($database_search, $search);
$query_rsAll = "SELECT id, title, description  FROM links";
$rsAll = mysql_query($query_rsAll, $search) or die(mysql_error());
$row_rsAll = mysql_fetch_assoc($rsAll);
$totalRows_rsAll = mysql_num_rows($rsAll);
?><?php echo('<?xml version="1.0" encoding="utf-8"?>'); ?>
<listings>
  <?php if ($totalRows_rsAll > 0) { // Show if recordset not empty ?>
  <?php do { ?>
	<listing>
		<?php foreach ($row_rsAll as $column=>$value) { ?>
		<<?php echo $column; ?>><![CDATA[<?php echo clean_string($value);?>]]></<?php echo $column; ?>>
		<?php } ?>
	</listing>
    <?php } while ($row_rsAll = mysql_fetch_assoc($rsAll)); ?>
	<?php } // Show if recordset not empty ?>
</listings>
<?php
mysql_free_result($rsAll);

// A FUNCTION TO RETURN A CLEAN STRING
function clean_string($str)
{
    // LETTERS, NUMBERS AND PUNCTUATION ONLY
    $regex = '/[^ A-Za-z0-9!,'"\?\$\.\-]/';

    $str = trim($str);
    $str = preg_replace($regex, 'ERROR', $str);
    return $str;
}

Open in new window

Avatar of morako

ASKER

$regex = '/[^ A-Za-z0-9!,"\?\$\.\-]/';

I took out a ' after the comma.
Why did you remove the apostrophe?  If it was a parsing problem, we can use HEREDOC notation instead.  Let's try this one and see if we get fewer errors.
<?php // RAY_temp_morako.php
error_reporting(E_ALL);

ob_start();
header('Content-type: text/xml');
header('Pragma: public');        
header('Cache-control: private');
header('Expires: -1');
if (!function_exists("GetSQLValueString")) {
function GetSQLValueString($theValue, $theType, $theDefinedValue = "", $theNotDefinedValue = "") 
{
  if (PHP_VERSION < 6) {
    $theValue = get_magic_quotes_gpc() ? stripslashes($theValue) : $theValue;
  }

  $theValue = function_exists("mysql_real_escape_string") ? mysql_real_escape_string($theValue) : mysql_escape_string($theValue);

  switch ($theType) {
    case "text":
      $theValue = ($theValue != "") ? "'" . $theValue . "'" : "NULL";
      break;    
    case "long":
    case "int":
      $theValue = ($theValue != "") ? intval($theValue) : "NULL";
      break;
    case "double":
      $theValue = ($theValue != "") ? doubleval($theValue) : "NULL";
      break;
    case "date":
      $theValue = ($theValue != "") ? "'" . $theValue . "'" : "NULL";
      break;
    case "defined":
      $theValue = ($theValue != "") ? $theDefinedValue : $theNotDefinedValue;
      break;
  }
  return $theValue;
}
}

mysql_select_db($database_search, $search);
$query_rsAll = "SELECT id, title, description  FROM links";
$rsAll = mysql_query($query_rsAll, $search) or die(mysql_error());
$row_rsAll = mysql_fetch_assoc($rsAll);
$totalRows_rsAll = mysql_num_rows($rsAll);
?><?php echo('<?xml version="1.0" encoding="utf-8"?>'); ?>
<listings>
  <?php if ($totalRows_rsAll > 0) { // Show if recordset not empty ?>
  <?php do { ?>
	<listing>
		<?php foreach ($row_rsAll as $column=>$value) { ?>
		<<?php echo $column; ?>><![CDATA[<?php echo clean_string($value);?>]]></<?php echo $column; ?>>
		<?php } ?>
	</listing>
    <?php } while ($row_rsAll = mysql_fetch_assoc($rsAll)); ?>
	<?php } // Show if recordset not empty ?>
</listings>
<?php
mysql_free_result($rsAll);

// A FUNCTION TO RETURN A CLEAN STRING
function clean_string($str)
{
    // LETTERS, NUMBERS AND PUNCTUATION ONLY
$regex = <<<REGEX
/[^ A-Za-z0-9!,&;'"\?\$\.\-\n\r\t]/
REGEX;

    $str = trim($str);
    $str = preg_replace($regex, 'ERROR', $str);
    return $str;
}

Open in new window

Avatar of morako

ASKER

What do you think?
Think about what?  I'm still interested in getting the number of ERROR strings to a minimum.  Are we ready for another test at that URL?
Avatar of morako

ASKER

yes
Let's try this version of the function.  Please post back when it's time to test again.  Feel free to correct any parse errors!  Thanks, ~Ray


// A FUNCTION TO RETURN A CLEAN STRING
function clean_string($str)
{
    // LETTERS, NUMBERS AND PUNCTUATION ONLY
    $regex
    = '#'         // A REGEX DELIMITER
    . '['         // START CHARACTER CLASS
    . '^'         // MATCH THINGS THAT ARE _NOT_ HERE
    . ' '         // BLANK
    . 'A-Z'       // LETTERS
    . '0-9'       // NUMBERS
    . '!,&;:/'    // PUNCTUATION THAT IS OK IN CLEAR TEXT
    . "'"         // APOSTROPHE
    . '"'         // DOUBLE QUOTE
    . '\?\$\.\-'  // META-CHARACTERS PUNCTUATION THAT MUST BE ESCAPED
    . '\(\)\+\*'  // MORE META-CHARACTERS
    . '\n\r\t'    // NEWLINES AND TABS
    . ']'         // END CHARACTER CLASS
    . '#'         // END REGEX DELIMITER
    . 'is'        // CASE-INSENSITIVE, MULTILINE
    ;
    $str = trim($str);
    $str = preg_replace($regex, 'ERROR', $str);
    return $str;
}

Open in new window

Avatar of morako

ASKER

its up
Good - we are getting rid of the false positives.  Let's try this version...
// A FUNCTION TO RETURN A CLEAN STRING
function clean_string($str)
{
    // LETTERS, NUMBERS AND PUNCTUATION ONLY
    $regex
    = '#'         // A REGEX DELIMITER
    . '['         // START CHARACTER CLASS
    . '^'         // MATCH THINGS THAT ARE _NOT_ HERE
    . ' '         // BLANK
    . 'A-Z'       // LETTERS
    . '0-9'       // NUMBERS
    . '!,&;:/'    // PUNCTUATION THAT IS OK IN CLEAR TEXT
    . '@_='       // SPECIAL CHARACTERS
    . "'"         // APOSTROPHE
    . '"'         // DOUBLE QUOTE
    . '\?\$\.\-'  // META-CHARACTERS PUNCTUATION THAT MUST BE ESCAPED
    . '\(\)\+\*'  // MORE META-CHARACTERS
    . '\n\r\t'    // NEWLINES AND TABS
    . ']'         // END CHARACTER CLASS
    . '#'         // END REGEX DELIMITER
    . 'is'        // CASE-INSENSITIVE, MULTILINE
    ;
    $str = trim($str);
    $str = preg_replace($regex, 'ERROR', $str);
    return $str;
}

Open in new window

Avatar of morako

ASKER

its up
ASKER CERTIFIED SOLUTION
Avatar of Ray Paseur
Ray Paseur
Flag of United States of America image

Link to home
membership
Create an account to see this answer
Signing up is free. No credit card required.
Create Account
Avatar of morako

ASKER

its up
Looks good to me.  Happy New Year, ~Ray
Avatar of morako

ASKER

Thanks
Avatar of morako

ASKER

great work Thanks..