Solved

php postgresql convert mixed encoding to utf-8

Posted on 2014-12-08
6
314 Views
Last Modified: 2014-12-08
I have user input/EDI that may be in different encoding (ISO, Win, etc.).  When I insert it into the database PostgreSQL complain "invalid byte sequence for encoding "UTF8".

PostgreSQL DB encoding is UTF-8.

I cannot control the user input/EDI.  The SQL query actually has mixed encoding, including UTF-8.

Is there a way to convert the none UTF-8 strings/chars to UTF-8 and ignore the strings/chars that are already UTF-8?

Thanks.
0
Comment
Question by:flowerbloom
  • 3
  • 2
6 Comments
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 40487960
This is a complicated area of work, and you may find that there is a lot to know in order to make this come out right.  I've researched it and run it to ground for both PHP and MySQL.  If PostGreSQL is using UTF8 and the issue is that the client input is incompatible with UTF8, this article will lead you in the right direction.
http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/A_11880-Unicode-PHP-and-Character-Collisions.html
0
 
LVL 1

Author Comment

by:flowerbloom
ID: 40487983
Hi Ray,

Nice article.  It does not help me.  Let me provide an example.

$q = "inset into t1 (f1,f2,f3) values ('v1','v2','v3');";
$pg_exec($q);

Where database is PostgreSQL and has utf-8 encoding, v1 encoding is utf-8, v2 encoding is iso5589-1, and v3 encoding is win1255.

Error message:  "invalid byte sequence for encoding "UTF8".

I need something like:
$q = convert_to_utf8_ignoring_already_utf8($q);
$pg_exec($q);

Update successfully.


I need something like:
function convert_to_utf8_ignoring_already_utf8 ($s) {
  $new_s = do some magic with $s.  break it apart, put it together, ignore utf-8.  Make all of $s utf-8.
  return $new_s;
}


Thanks.
0
 
LVL 82

Expert Comment

by:Dave Baldwin
ID: 40487992
There is this function utf8_encode http://php.net/function.utf8-encode but it does not 'automatically' recognize UTF-8 strings.  You have to know what you are feeding it.
0
Do You Know the 4 Main Threat Actor Types?

Do you know the main threat actor types? Most attackers fall into one of four categories, each with their own favored tactics, techniques, and procedures.

 
LVL 1

Author Comment

by:flowerbloom
ID: 40488032
Hi Dave.  This does not help. Thanks anyhow.
0
 
LVL 108

Accepted Solution

by:
Ray Paseur earned 500 total points
ID: 40488105
Where database is PostgreSQL and has utf-8 encoding, v1 encoding is utf-8, v2 encoding is iso5589-1, and v3 encoding is win1255.
This is something you must fix -- there is no automated solution.

$q = convert_to_utf8_ignoring_already_utf8($q);
The article explains why this is impossible.  Sorry, there are no unicorns.  Your script must detect the encoding of the existing data and must use the correct PHP functions to handle the process of conversion to UTF8, with a sensitivity to data that is already UTF8.  

If you want to go back to the article and read it carefully for comprehension, I'll be glad to help.  I wrote it as clearly as I could, but this is not a subject with magic bullets -- it requires detailed, step-by-step understanding of the issues.  If you read the article and think I might be able to help, please post your input data and the exact output you want to achieve.  By "input" I mean the data you get from the external sources, such as a client request.  By "output" I mean the data you want to put into the PostGreSQL query strings.
0
 
LVL 1

Author Closing Comment

by:flowerbloom
ID: 40488109
The perfect solution will be a function that breaks down the PostgreSQL query and go over each input fields/values and convert to UTF-8.  Oh well.  Thanks anyhow.
0

Featured Post

How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

Join & Write a Comment

Generating table dynamically is the most common issue faced by php developers.... So it seems there is a need of an article that explains the basic concept of generating tables dynamically. It just requires a basic knowledge of html and little maths…
Many developers have database experience, but are new to PostgreSQL. It has some truly inspiring capabilities. I have several years' experience with Microsoft's SQL Server. When I began working with MySQL, I wanted a quick-reference to MySQL (htt…
Learn how to match and substitute tagged data using PHP regular expressions. Demonstrated on Windows 7, but also applies to other operating systems. Demonstrated technique applies to PHP (all versions) and Firefox, but very similar techniques will w…
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…

757 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

19 Experts available now in Live!

Get 1:1 Help Now