Find Unicode character in webpage?

I am trying to search for the Unicode character &#8652 (double arrow) in a web page.

The following code validates the character and searches using PHP preg_match.
It does not find the required character. How can I fix this?

echo mb_convert_encoding('&#8652', 'UTF-8', 'HTML-ENTITIES');

$var=file_get_contents("http://mywebsite.com")    ;
$var1=utf8_encode($var)  ;

$result = preg_match($arrow, $var1, $matches)     ;

code4 Asked:
 
Ray Paseur Commented:
See if this makes sense.  Look at the bottom of the page to see the location of the string in the rendered document.
http://www.iconoun.com/demo/temp_code4.php

<?php // demo/temp_code4.php
error_reporting(E_ALL);


// SEE http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/Q_28397143.html
// REF http://php.net/manual/en/function.utf8-encode.php
// REF http://www.asciitable.com/
// REF http://en.wikipedia.org/wiki/UTF-8


// THE TEST DATA MAY CONTAIN ISO CHARACTERS THAT NEED TO BE CONVERTED TO UTF-8 CHARACTERS
$url = "http://web.centre.edu/shiba/Chemistry%20Symbols%20in%20Word1.htm";
$htm = file_get_contents($url);

// FIRST CHARSET PREVAILS
echo '<meta charset="utf8" />';        // GARBLES NON-UTF-8

// CONVERT THE DATA SET AND DISPLAY THE PAGE
$new = utf8_encode($htm);
echo $new;

// LOCATE A CHARACTER STRING
$sig = '&#8652';
$pos = strpos($new, $sig);
echo PHP_EOL . htmlentities($sig) . "  LOCATED AT $pos";

 
Dan Craciun, IT Consultant, Commented:
Try
$result = preg_match("\x{21CC}", $var1, $matches);

HTH,
Dan
 
Ray Paseur Commented:
To understand what is happening here, please read this article:
http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/A_11880-Unicode-PHP-and-Character-Collisions.html

You may also want to learn about this function:
http://us1.php.net/manual/en/function.mb-ereg-match.php

If you want to give us a small sample of the data, I can show you how to find and fix the issues.  But I need the actual test data, not a description of the data.  Thanks, ~Ray
 
code4 (Author) Commented:
Thanks.
The code produces the following error on my system:

Warning: preg_match() [<a href='function.preg-match'>function.preg-match</a>]: Delimiter must not be alphanumeric or backslash

Here is data for testing:
$arrow= mb_convert_encoding('&#8652', 'UTF-8', 'HTML-ENTITIES');

$var=file_get_contents("http://web.centre.edu/shiba/Chemistry%20Symbols%20in%20Word1.htm");
$var1=utf8_encode($var)  ;

$result = preg_match($arrow, $var1, $matches)     ;

 
Dan Craciun, IT Consultant, Commented:
You (and me too) forgot the delimiters:

$result = preg_match('/&#8652;/', $var1, $matches);
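The two patterns suggested in this thread look for different things. '/&#8652;/' matches the entity text as it is written in the page source, while \x{21CC} matches the character itself and needs both delimiters and the /u modifier. Here is a minimal sketch of both, assuming a made-up sample string ($html is hypothetical, not the real page) that contains one of each form:

```php
<?php
// Hypothetical sample: one entity reference and one literal U+21CC
// (its UTF-8 byte sequence is E2 87 8C).
$html = "A &#8652; B, C \xE2\x87\x8C D";

// 1) Match the entity text as written in the source; the trailing ';' is optional.
$entityHits = preg_match('/&#8652;?/', $html, $m1, PREG_OFFSET_CAPTURE);

// 2) Match the character itself; \x{...} code points above FF require the /u modifier.
$charHits = preg_match('/\x{21CC}/u', $html, $m2, PREG_OFFSET_CAPTURE);

// Offsets reported by PREG_OFFSET_CAPTURE are byte offsets, even with /u.
echo "entity matches: $entityHits at byte offset {$m1[0][1]}\n";
echo "char matches:   $charHits at byte offset {$m2[0][1]}\n";
```

Which pattern is appropriate depends on whether the page stores the arrow as an entity or as a real UTF-8 character, so it may be worth checking the raw source first.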

 
Ray Paseur Commented:
 
Dan Craciun, IT Consultant, Commented:
I think so, Ray. It's a Word document saved as HTML (ugh), and the character sequence the OP is looking for is in plain text, so there is no need for any encoding.
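To illustrate the point about encoding: the entity &#8652 consists only of ASCII characters, so a plain strpos() finds it whether or not the page is converted first. The sketch below assumes a hypothetical ISO-8859-1 fragment ($raw is invented for the example) and uses mb_convert_encoding() in place of utf8_encode(), which is deprecated as of PHP 8.2:

```php
<?php
// Hypothetical ISO-8859-1 fragment: "\xE9" is a Latin-1 e-acute, the entity is pure ASCII.
$raw = "caf\xE9 A &#8652; B";

// Same conversion utf8_encode() performs (ISO-8859-1 to UTF-8); requires the mbstring extension.
$utf8 = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');

$needle = '&#8652';

// strpos() works on bytes, so the entity is found in both versions.
$posRaw  = strpos($raw, $needle);
$posUtf8 = strpos($utf8, $needle);

// Compare against false strictly, because a match at offset 0 is also "falsy".
var_dump($posRaw !== false, $posUtf8 !== false);
var_dump($posRaw, $posUtf8);
```

The conversion only shifts the byte offset, because the non-ASCII character before the entity grows from one byte to two. It does not affect whether the entity is found.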
 
Ray Paseur Commented:
There may be a little more "odd" here than just a Word-driven HTML page.  My recommendation to the college would be to get an agency that is familiar with web development to help build a new web site!
http://validator.w3.org/check?uri=http%3A%2F%2Fwww.centre.edu%2F&charset=%28detect+automatically%29&doctype=Inline&group=0
Question has a verified solution.