flowerbloom

php postgresql convert mixed encoding to utf-8

I have user input/EDI that may arrive in different encodings (ISO, Windows, etc.).  When I insert it into the database, PostgreSQL complains: invalid byte sequence for encoding "UTF8".

PostgreSQL DB encoding is UTF-8.

I cannot control the user input/EDI.  The SQL query actually has mixed encoding, including UTF-8.

Is there a way to convert the non-UTF-8 strings/chars to UTF-8 and leave alone the strings/chars that are already UTF-8?

Thanks.
Ray Paseur

This is a complicated area of work, and you may find that there is a lot to know in order to make this come out right.  I've researched it and run it to ground for both PHP and MySQL.  If PostgreSQL is using UTF-8 and the issue is that the client input is incompatible with UTF-8, this article will lead you in the right direction.
https://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/A_11880-Unicode-PHP-and-Character-Collisions.html
flowerbloom

ASKER

Hi Ray,

Nice article.  It does not help me.  Let me provide an example.

$q = "insert into t1 (f1,f2,f3) values ('v1','v2','v3');";
pg_exec($q);

Here the database is PostgreSQL with UTF-8 encoding; v1 is encoded as UTF-8, v2 as ISO-8859-1, and v3 as Windows-1255.

Error message:  invalid byte sequence for encoding "UTF8".

I need something like:
$q = convert_to_utf8_ignoring_already_utf8($q);
pg_exec($q);

The update then succeeds.


I need something like:
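function convert_to_utf8_ignoring_already_utf8($s) {
  // Do some magic with $s: break it apart, convert the non-UTF-8 pieces,
  // leave the UTF-8 pieces alone, and put it back together so all of $s is UTF-8.
  $new_s = $s;  // placeholder for the converted string
  return $new_s;
}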


Thanks.
Dave Baldwin

There is the utf8_encode function (http://php.net/function.utf8-encode), but it does not 'automatically' recognize UTF-8 strings.  You have to know what you are feeding it.
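
To illustrate the point about knowing what you feed it: utf8_encode() assumes its input is ISO-8859-1, so text that is already UTF-8 gets encoded a second time. A validity check therefore has to come first. The following is only a minimal sketch, assuming the mbstring extension is loaded and that anything that is not valid UTF-8 can be treated as ISO-8859-1. The helper name to_utf8_if_needed is hypothetical and does not come from this thread.

```php
<?php
// Hypothetical helper (an assumption, not a built-in): leave well-formed
// UTF-8 untouched, otherwise interpret the bytes as $fallback and convert.
function to_utf8_if_needed(string $value, string $fallback = 'ISO-8859-1'): string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value; // already valid UTF-8, do not re-encode
    }
    // Not valid UTF-8: assume the fallback encoding (a guess by design).
    return mb_convert_encoding($value, 'UTF-8', $fallback);
}

$latin1 = "caf\xE9";      // "café" encoded as ISO-8859-1
$utf8   = "caf\xC3\xA9";  // "café" encoded as UTF-8

var_dump(to_utf8_if_needed($latin1) === $utf8); // bool(true)
var_dump(to_utf8_if_needed($utf8) === $utf8);   // bool(true)
```

The check is a heuristic: a short string in a legacy encoding can happen to be valid UTF-8, and the check cannot tell which legacy encoding the bytes came from. It also only makes sense per value, not on a whole query string that mixes encodings.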
Hi Dave.  This does not help. Thanks anyhow.
ASKER CERTIFIED SOLUTION
Ray Paseur

This solution is only available to members.
The perfect solution would be a function that breaks down the PostgreSQL query, goes over each input field/value, and converts it to UTF-8.  Oh well.  Thanks anyhow.
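
One way to approximate the per-field idea without parsing SQL is to convert each value before the query is built and then pass the values as parameters. This is a rough sketch under stated assumptions, not a definitive implementation: mbstring is loaded, all values are strings, and the likely legacy encoding of each field is known in advance. The function name values_to_utf8 and the $fallbacks map are hypothetical. The demo uses Windows-1252 for f3 because support for Windows-1255 depends on the conversion library (mb_list_encodings() shows what mbstring offers, and iconv() may be an alternative).

```php
<?php
// Hypothetical helper: $values maps column name => raw string,
// $fallbacks maps column name => assumed legacy encoding for that column.
function values_to_utf8(array $values, array $fallbacks, string $default = 'ISO-8859-1'): array
{
    $out = [];
    foreach ($values as $field => $value) {
        if (mb_check_encoding($value, 'UTF-8')) {
            $out[$field] = $value; // already valid UTF-8, keep as-is
            continue;
        }
        // Fall back to the encoding assumed for this field.
        $from = $fallbacks[$field] ?? $default;
        $out[$field] = mb_convert_encoding($value, 'UTF-8', $from);
    }
    return $out;
}

$row = values_to_utf8(
    [
        'f1' => "caf\xC3\xA9", // UTF-8
        'f2' => "caf\xE9",     // ISO-8859-1
        'f3' => "\x80 5",      // Windows-1252 euro sign
    ],
    ['f3' => 'Windows-1252']
);

// With an open connection in $conn (assumed, not shown here):
// pg_query_params($conn,
//     'INSERT INTO t1 (f1, f2, f3) VALUES ($1, $2, $3)',
//     array_values($row));
```

Passing the converted values through pg_query_params() keeps them out of the SQL text, so the query string itself stays plain ASCII and only the parameters need converting.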