
Solved

Find a Unicode character in a web page?

Posted on 2014-03-25
Medium Priority
Last Modified: 2014-03-25
I am trying to search a web page for the Unicode character &#8652; (U+21CC, the ⇌ double arrow).

The following code converts the entity to the character and then searches for it with PHP's preg_match().
It does not find the character. How can I fix this?

$arrow = mb_convert_encoding('&#8652', 'UTF-8', 'HTML-ENTITIES');
echo $arrow;

$var = file_get_contents("http://mywebsite.com");
$var1 = utf8_encode($var);

$result = preg_match($arrow, $var1, $matches);
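
For reference, &#8652; is decimal code point 8652, which is hexadecimal 0x21CC, so the entity, the code point, and the PHP string escape all name the same character. The sketch below assumes PHP 7.2+ with the mbstring extension (mb_chr() was added in 7.2). It only shows that the three forms produce identical UTF-8 bytes. It does not fetch any page.

```php
<?php
// Three equivalent ways to build U+21CC (⇌) as a UTF-8 string.
// Assumes PHP 7.2+ with mbstring enabled (needed for mb_chr()).
$fromEntity    = html_entity_decode('&#8652;', ENT_QUOTES | ENT_HTML5, 'UTF-8');
$fromCodePoint = mb_chr(8652, 'UTF-8');   // 8652 decimal == 0x21CC
$fromEscape    = "\u{21CC}";              // PHP 7.0+ Unicode escape in double quotes

// All three hold the same three UTF-8 bytes: e2 87 8c
echo bin2hex($fromEntity), ' ', bin2hex($fromCodePoint), ' ', bin2hex($fromEscape), PHP_EOL;
// prints: e2878c e2878c e2878c
```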


Question by:code4
8 Comments
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 39954408
Try
$result = preg_match("\x{21CC}", $var1, $matches);


HTH,
Dan
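
A PCRE pattern in PHP needs delimiters, and \x{21CC} is only accepted as a code point above 255 when the u (UTF-8) modifier is set. The following is a minimal sketch, assuming the subject is already valid UTF-8. With /u, preg_match() returns false on invalid UTF-8 input. The helper name findArrow() is hypothetical.

```php
<?php
// Hypothetical helper: return the byte offset of U+21CC (⇌) in a UTF-8 string,
// or null if it is absent. The /.../ delimiters are mandatory, and the u
// modifier makes PCRE treat \x{21CC} as a Unicode code point.
function findArrow(string $utf8Text): ?int
{
    if (preg_match('/\x{21CC}/u', $utf8Text, $m, PREG_OFFSET_CAPTURE) === 1) {
        return $m[0][1]; // byte offset of the match
    }
    return null;
}

$sample = "N2 + 3H2 \u{21CC} 2NH3";
var_dump(findArrow($sample));          // int(9)
var_dump(findArrow('no arrow here'));  // NULL
```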
 
LVL 111

Expert Comment

by:Ray Paseur
ID: 39954426
To understand what is happening here, please read this article:
http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/A_11880-Unicode-PHP-and-Character-Collisions.html

You may also want to learn about this function:
http://us1.php.net/manual/en/function.mb-ereg-match.php

If you want to give us a small sample of the data, I can show you how to find and fix the issues.  But I need the actual test data, not a description of the data.  Thanks, ~Ray
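
As a rough illustration of the multibyte regex route, here is a sketch assuming mbstring is built with its regex (Oniguruma) support. mb_ereg_match() only tests the pattern starting at the beginning of the string, so the pattern is padded with [\s\S]* on both sides to let the character appear anywhere, including after newlines. The helper name containsArrow() is hypothetical.

```php
<?php
// Assumes mbstring with regex support; the pattern and subject are UTF-8.
mb_regex_encoding('UTF-8');

// Hypothetical helper: true if the text contains U+21CC (⇌) anywhere.
// mb_ereg_match() anchors at the start of the string, so the leading
// [\s\S]* lets the arrow appear at any position (newlines included).
function containsArrow(string $utf8Text): bool
{
    $pattern = '[\s\S]*' . "\u{21CC}" . '[\s\S]*';
    return mb_ereg_match($pattern, $utf8Text);
}

var_dump(containsArrow("line 1\nN2 + 3H2 \u{21CC} 2NH3")); // bool(true)
var_dump(containsArrow('A -> B'));                          // bool(false)
```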
 

Author Comment

by:code4
ID: 39954453
Thanks.
The code produces the following error on my system:

Warning: preg_match(): Delimiter must not be alphanumeric or backslash

Here is data for testing:
$arrow = mb_convert_encoding('&#8652', 'UTF-8', 'HTML-ENTITIES');

$var = file_get_contents("http://web.centre.edu/shiba/Chemistry%20Symbols%20in%20Word1.htm");
$var1 = utf8_encode($var);

$result = preg_match($arrow, $var1, $matches);



 
LVL 35

Expert Comment

by:Dan Craciun
ID: 39954465
You (and I, too) forgot the delimiters:

$result = preg_match('/&#8652;/', $var1, $matches);
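
When the pattern comes from a variable, as with $arrow in the question, it is safer to build it with preg_quote(), which escapes regex metacharacters and the chosen delimiter, and then add the delimiters yourself. The following is a minimal sketch. The helper name literalPattern() and the sample HTML are hypothetical. For a plain substring test, strpos() is simpler and needs no pattern at all.

```php
<?php
// Hypothetical helper: wrap an arbitrary search string in delimiters so it can
// be passed to preg_match() as a literal pattern. preg_quote() escapes regex
// metacharacters plus the delimiter; the u modifier keeps UTF-8 needles intact.
function literalPattern(string $needle, string $delimiter = '/'): string
{
    return $delimiter . preg_quote($needle, $delimiter) . $delimiter . 'u';
}

// Sample markup where the arrow appears as the literal entity text.
$html = '<p>N<sub>2</sub> + 3H<sub>2</sub> &#8652; 2NH<sub>3</sub></p>';

var_dump(preg_match(literalPattern('&#8652;'), $html)); // int(1)
var_dump(preg_match(literalPattern('a.b'), 'axb'));     // int(0): the dot is escaped
```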


 
LVL 111

Expert Comment

by:Ray Paseur
ID: 39954473
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 39954480
I think so, Ray. It's a Word document saved as HTML (ugh), and the character sequence the OP is looking for appears in the page as plain text (the entity text &#8652;), so there is no need for any encoding conversion.
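
A page may contain the arrow either as entity text (&#8652; or the hex form &#x21CC;) or as the raw UTF-8 character. One way to cover both cases is to decode the entities first and then search for the character. The sketch below assumes the HTML is already UTF-8 and that mbstring is available. The helper name arrowPositions() is hypothetical.

```php
<?php
// Hypothetical helper: list the character offsets of every U+21CC (⇌) in a
// UTF-8 HTML string, whether it was written as &#8652;, &#x21CC;, or raw.
function arrowPositions(string $html): array
{
    // Decode numeric and named entities into UTF-8 characters first.
    $decoded = html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    $positions = [];
    $offset = 0;
    while (($pos = mb_strpos($decoded, "\u{21CC}", $offset, 'UTF-8')) !== false) {
        $positions[] = $pos;   // character (not byte) offset in the decoded text
        $offset = $pos + 1;
    }
    return $positions;
}

$source = 'A &#8652; B &#x21CC; C ' . "\u{21CC}";
print_r(arrowPositions($source)); // offsets 2, 6 and 10
```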
 
LVL 111

Accepted Solution

by:
Ray Paseur earned 2000 total points
ID: 39954500
See if this makes sense.  Look at the bottom of the page to see the location of the string in the rendered document.
http://www.iconoun.com/demo/temp_code4.php

<?php // demo/temp_code4.php
error_reporting(E_ALL);


// SEE http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/Q_28397143.html
// REF http://php.net/manual/en/function.utf8-encode.php
// REF http://www.asciitable.com/
// REF http://en.wikipedia.org/wiki/UTF-8


// THE TEST DATA MAY CONTAIN ISO CHARACTERS THAT NEED TO BE CONVERTED TO UTF-8 CHARACTERS
$url = "http://web.centre.edu/shiba/Chemistry%20Symbols%20in%20Word1.htm";
$htm = file_get_contents($url);

// FIRST CHARSET PREVAILS
echo '<meta charset="utf-8" />';       // GARBLES NON-UTF-8

// CONVERT THE DATA SET AND DISPLAY THE PAGE
$new = utf8_encode($htm);
echo $new;

// LOCATE A CHARACTER STRING (strpos() RETURNS FALSE IF IT IS ABSENT)
$sig = '&#8652';
$pos = strpos($new, $sig);
if ($pos === FALSE) {
    echo PHP_EOL . htmlentities($sig) . "  NOT FOUND";
} else {
    echo PHP_EOL . htmlentities($sig) . "  LOCATED AT $pos";
}
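
A note on the conversion step: utf8_encode() only converts from ISO-8859-1, and it is deprecated as of PHP 8.2. mb_convert_encoding() lets you name the source charset explicitly. The following is a minimal sketch, not a drop-in replacement. The default source charset Windows-1252 is an assumption, so check the page's meta charset or HTTP header before relying on it. The helper name toUtf8() and the sample bytes are hypothetical.

```php
<?php
// Hypothetical helper: convert single-byte page content to UTF-8.
// ASSUMPTION: the source charset defaults to Windows-1252; pass the real
// charset declared by the page if it differs.
function toUtf8(string $bytes, string $fromCharset = 'Windows-1252'): string
{
    return mb_convert_encoding($bytes, 'UTF-8', $fromCharset);
}

// Sample Windows-1252 bytes: é (0xE9) and curly quotes (0x93, 0x94).
$raw  = "Caf\xE9 &#8652; \x93quoted\x94";
$utf8 = toUtf8($raw);

// strpos() reports a byte offset; é now takes two bytes, so '&' is at byte 6.
$pos = strpos($utf8, '&#8652;');
echo $pos === false ? 'not found' : "found at byte $pos", PHP_EOL;
```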


 
LVL 111

Expert Comment

by:Ray Paseur
ID: 39954529
There may be a little more "odd" here than just a Word-driven HTML page.  My recommendation to the college would be to get an agency that is familiar with web development to help build a new web site!
http://validator.w3.org/check?uri=http%3A%2F%2Fwww.centre.edu%2F&charset=%28detect+automatically%29&doctype=Inline&group=0
