Solved

Find unicode character in webpage?

Posted on 2014-03-25
Last Modified: 2014-03-25
I am trying to search for the Unicode character &#8652 (double arrow) in a web page.

The following code validates the character and searches using PHP preg_match.
It does not find the required character. How can I fix this?

echo mb_convert_encoding('&#8652', 'UTF-8', 'HTML-ENTITIES');

$var=file_get_contents("http://mywebsite.com")    ;
$var1=utf8_encode($var)  ;

$result = preg_match($arrow, $var1, $matches)     ;

Question by:code4
8 Comments
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 39954408
Try
$result = preg_match("\x{21CC}", $var1, $matches);


HTH,
Dan
0
 
LVL 111

Expert Comment

by:Ray Paseur
ID: 39954426
To understand what is happening here, please read this article:
http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/A_11880-Unicode-PHP-and-Character-Collisions.html

You may also want to learn about this function:
http://us1.php.net/manual/en/function.mb-ereg-match.php

If you want to give us a small sample of the data, I can show you how to find and fix the issues.  But I need the actual test data, not a description of the data.  Thanks, ~Ray
0
 

Author Comment

by:code4
ID: 39954453
Thanks.
The code produces the following error on my system:

Warning: preg_match(): Delimiter must not be alphanumeric or backslash

Here is data for testing:
$arrow= mb_convert_encoding('&#8652', 'UTF-8', 'HTML-ENTITIES');

$var=file_get_contents("http://web.centre.edu/shiba/Chemistry%20Symbols%20in%20Word1.htm");
$var1=utf8_encode($var)  ;

$result = preg_match($arrow, $var1, $matches)     ;


0

 
LVL 35

Expert Comment

by:Dan Craciun
ID: 39954465
You (and I too) forgot the delimiters:

$result = preg_match('/&#8652;/', $var1, $matches);
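
A minimal sketch of the delimiter fix, assuming the page source holds the literal entity text. The sample strings $html and $utf8 are made up for illustration and are not taken from the page in question. The second pattern shows how the \x{21CC} idea from the earlier comment could work if the page held the real character: it needs delimiters and the /u modifier.

```php
<?php
// Hypothetical sample standing in for the downloaded page source.
$html = '<p>A + B &#8652; C + D</p>';

// "/" delimiters around the pattern; "&", "#" and ";" need no escaping.
// The trailing ";?" accepts the entity with or without its semicolon.
$foundEntity = preg_match('/&#8652;?/', $html, $matches);

// If the text held the real character (UTF-8 bytes E2 87 8C), the /u
// modifier lets \x{21CC} match the code point U+21CC.
$utf8 = "A + B \xE2\x87\x8C C + D";
$foundChar = preg_match('/\x{21CC}/u', $utf8, $charMatches);

echo $foundEntity, ' ', $matches[0], PHP_EOL;
echo $foundChar, PHP_EOL;
```

Without the /u modifier, PCRE rejects \x{21CC} because the value is larger than one byte.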


0
 
LVL 111

Expert Comment

by:Ray Paseur
ID: 39954473
0
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 39954480
I think so, Ray. It's a Word document saved as HTML (ugh), and the character sequence the OP is looking for is in plain text, so there is no need for any encoding.
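
To illustrate the point about plain text, here is a small sketch under the assumption that the source stores the arrow as the entity &#8652; rather than as the character itself. The $source fragment is hypothetical. It compares a search of the raw source with a search after decoding the entities.

```php
<?php
// Hypothetical fragment of page source; the arrow is stored as entity text.
$source = '<span>N2 + 3H2 &#8652; 2NH3</span>';

// The entity is plain ASCII, so a byte-wise search finds it directly.
$entityPos = strpos($source, '&#8652;');

// The UTF-8 bytes of U+21CC do not occur in the raw source...
$arrow = "\xE2\x87\x8C";
$rawPos = strpos($source, $arrow);

// ...but they do once the entities have been decoded.
$decoded = html_entity_decode($source, ENT_QUOTES | ENT_HTML5, 'UTF-8');
$decodedPos = strpos($decoded, $arrow);

var_dump($entityPos, $rawPos, $decodedPos);
```

So searching for the entity text works on the raw page, while searching for the character only works after a decoding step.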
0
 
LVL 111

Accepted Solution

by:Ray Paseur (earned 2000 total points)
ID: 39954500
See if this makes sense.  Look at the bottom of the page to see the location of the string in the rendered document.
http://www.iconoun.com/demo/temp_code4.php

<?php // demo/temp_code4.php
error_reporting(E_ALL);


// SEE http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/Q_28397143.html
// REF http://php.net/manual/en/function.utf8-encode.php
// REF http://www.asciitable.com/
// REF http://en.wikipedia.org/wiki/UTF-8


// THE TEST DATA MAY CONTAIN ISO CHARACTERS THAT NEED TO BE CONVERTED TO UTF-8 CHARACTERS
$url = "http://web.centre.edu/shiba/Chemistry%20Symbols%20in%20Word1.htm";
$htm = file_get_contents($url);

// FIRST CHARSET PREVAILS
echo '<meta charset="utf-8" />';       // GARBLES NON-UTF-8

// CONVERT THE DATA SET AND DISPLAY THE PAGE
$new = utf8_encode($htm);
echo $new;

// LOCATE A CHARACTER STRING
$sig = '&#8652';
$pos = strpos($new, $sig);
// strpos() RETURNS FALSE WHEN THE STRING IS ABSENT
echo PHP_EOL . htmlentities($sig) . (($pos === false) ? "  NOT FOUND" : "  LOCATED AT $pos");
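
The same idea can be tried without fetching the page. This is only a sketch: the $iso string is invented test data, and mb_convert_encoding() (which needs the mbstring extension) is swapped in for utf8_encode() because both map ISO-8859-1 bytes to UTF-8. It shows that the conversion leaves the ASCII entity text intact and only shifts its byte offset.

```php
<?php
// Hypothetical ISO-8859-1 input: "\xB0" is the degree sign in that charset.
$iso = "25\xB0C: A &#8652; B";

// Convert Latin-1 bytes to UTF-8 (the same mapping utf8_encode() performs).
$utf8 = mb_convert_encoding($iso, 'UTF-8', 'ISO-8859-1');

$sig = '&#8652';
$pos = strpos($utf8, $sig);

// strpos() returns false on a miss and 0 is a valid hit, so compare with ===.
if ($pos === false) {
    echo "$sig NOT FOUND", PHP_EOL;
} else {
    echo htmlentities($sig), " LOCATED AT $pos", PHP_EOL;
}
```

The offset is counted in bytes, so it grows by one for each Latin-1 character above 0x7F that precedes the match.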


0
 
LVL 111

Expert Comment

by:Ray Paseur
ID: 39954529
There may be a little more "odd" here than just a Word-driven HTML page.  My recommendation to the college would be to get an agency that is familiar with web development to help build a new web site!
http://validator.w3.org/check?uri=http%3A%2F%2Fwww.centre.edu%2F&charset=%28detect+automatically%29&doctype=Inline&group=0
0


Question has a verified solution.
