Solved

php postgresql convert mixed encoding to utf-8

Posted on 2014-12-08
6
359 Views
Last Modified: 2014-12-08
I have user input/EDI that may be in different encoding (ISO, Win, etc.).  When I insert it into the database PostgreSQL complain "invalid byte sequence for encoding "UTF8".

PostgreSQL DB encoding is UTF-8.

I cannot control the user input/EDI.  The SQL query actually has mixed encoding, including UTF-8.

Is there a way to convert the none UTF-8 strings/chars to UTF-8 and ignore the strings/chars that are already UTF-8?

Thanks.
0
Comment
Question by:flowerbloom
  • 3
  • 2
6 Comments
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 40487960
This is a complicated area of work, and you may find that there is a lot to know in order to make this come out right.  I've researched it and run it to ground for both PHP and MySQL.  If PostGreSQL is using UTF8 and the issue is that the client input is incompatible with UTF8, this article will lead you in the right direction.
http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/A_11880-Unicode-PHP-and-Character-Collisions.html
0
 
LVL 1

Author Comment

by:flowerbloom
ID: 40487983
Hi Ray,

Nice article.  It does not help me.  Let me provide an example.

$q = "inset into t1 (f1,f2,f3) values ('v1','v2','v3');";
$pg_exec($q);

Where database is PostgreSQL and has utf-8 encoding, v1 encoding is utf-8, v2 encoding is iso5589-1, and v3 encoding is win1255.

Error message:  "invalid byte sequence for encoding "UTF8".

I need something like:
$q = convert_to_utf8_ignoring_already_utf8($q);
$pg_exec($q);

Update successfully.


I need something like:
function convert_to_utf8_ignoring_already_utf8 ($s) {
  $new_s = do some magic with $s.  break it apart, put it together, ignore utf-8.  Make all of $s utf-8.
  return $new_s;
}


Thanks.
0
 
LVL 83

Expert Comment

by:Dave Baldwin
ID: 40487992
There is this function utf8_encode http://php.net/function.utf8-encode but it does not 'automatically' recognize UTF-8 strings.  You have to know what you are feeding it.
0
Control application downtime with dependency maps

Visualize the interdependencies between application components better with Applications Manager's automated application discovery and dependency mapping feature. Resolve performance issues faster by quickly isolating problematic components.

 
LVL 1

Author Comment

by:flowerbloom
ID: 40488032
Hi Dave.  This does not help. Thanks anyhow.
0
 
LVL 108

Accepted Solution

by:
Ray Paseur earned 500 total points
ID: 40488105
Where database is PostgreSQL and has utf-8 encoding, v1 encoding is utf-8, v2 encoding is iso5589-1, and v3 encoding is win1255.
This is something you must fix -- there is no automated solution.

$q = convert_to_utf8_ignoring_already_utf8($q);
The article explains why this is impossible.  Sorry, there are no unicorns.  Your script must detect the encoding of the existing data and must use the correct PHP functions to handle the process of conversion to UTF8, with a sensitivity to data that is already UTF8.  

If you want to go back to the article and read it carefully for comprehension, I'll be glad to help.  I wrote it as clearly as I could, but this is not a subject with magic bullets -- it requires detailed, step-by-step understanding of the issues.  If you read the article and think I might be able to help, please post your input data and the exact output you want to achieve.  By "input" I mean the data you get from the external sources, such as a client request.  By "output" I mean the data you want to put into the PostGreSQL query strings.
0
 
LVL 1

Author Closing Comment

by:flowerbloom
ID: 40488109
The perfect solution will be a function that breaks down the PostgreSQL query and go over each input fields/values and convert to UTF-8.  Oh well.  Thanks anyhow.
0

Featured Post

Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Generating table dynamically is the most common issue faced by php developers.... So it seems there is a need of an article that explains the basic concept of generating tables dynamically. It just requires a basic knowledge of html and little maths…
Things That Drive Us Nuts Have you noticed the use of the reCaptcha feature at EE and other web sites?  It wants you to read and retype something that looks like this.Insanity!  It's not EE's fault - that's just the way reCaptcha works.  But it is …
Learn how to match and substitute tagged data using PHP regular expressions. Demonstrated on Windows 7, but also applies to other operating systems. Demonstrated technique applies to PHP (all versions) and Firefox, but very similar techniques will w…
The viewer will learn how to count occurrences of each item in an array.

911 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

20 Experts available now in Live!

Get 1:1 Help Now