multi-language php application

web5dev7
Hi,

I need advice. I have an existing Ajax/PHP/MySQL web application (working, hosted, live, all good) that is basically a newsletter sign-up form, but with the following added functions:

1. ajax form validations (jquery, etc) and uses math captcha

2. serves a unique coupon code to user (both on confirmation page and via autoresponse HTML email). Static codes are pre-populated in db table - served one per email address.

3. autoresponse HTML email (as mentioned above)

4. password protected client control panel to display db records and export to CSV

All of the above is working fine (although the codebase is somewhat complex and uses about a dozen PHP class includes and various JavaScript files).  It is a U.S.-based English-language microsite on a shared hosting account.  Now I want to "internationalize" it for a couple dozen other countries - each version in its own native language (Spanish, Chinese, Japanese, French, Arabic, etc.) - each in a subdirectory of the existing site.

Towards deciding if I should take on this challenge - here's the question...

In regards to the "backend" functionality itemized above - what complications am I likely to encounter due to foreign language character sets, etc. ?  

I am concerned about...

1. autoresponse MIME html emails: various languages, various platforms and email clients.

2. ajax validations handling and display of foreign character sets

3. server side validations handling and display of foreign character sets

4. mySql handling and display of foreign character sets

5. security issues

6.  might it require sub-contracting native speaking developers, etc, etc.

Any projections about potential problem areas would be much appreciated - thanks!
Most Valuable Expert 2011
Top Expert 2016

Commented:
what complications am I likely to encounter...
Your existing list is pretty comprehensive.  This article shows the design pattern for a multi-lingual site, one that can be built out in steps, one language at a time, with a minimum of interference between the different language "silos."

http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/A_8910-A-Polyglot-Web-Site-in-PHP.html

Probably UTF-8 will be your friend.  Definitely you will need native speakers, especially in the languages that do not use the western alphabets.   The cultural differences are too great to be accommodated in a single translation.  I used to handle Japanese-English translations with two translators - one a native Japanese speaker who spoke English as a second language and the other a native English speaker who was conversationally fluent in Japanese.  The intermediate documents were in J'english and they were usually laughable no matter which author performed the first translation.  But the second translation made for a highly professional work product.

Best of luck, ~Ray
Greetings, web5dev7. The outline of what you want to do is not unusual for trying to internationalize an existing web site. Unfortunately (in my limited view), trying to make a "one size fits all" site in other languages and cultures may take much effort.  Some languages are full of "subtle" references and do not translate "word for word"; other times, suggestions or explanations can be culturally offensive in a mechanical translation.

You say - "might it require sub-contracting native speaking developers". This would be a definite YES; you will need much help with languages and "references".

My suggestion would be to get a paid development consultant for translations. There are all kinds of things you have never thought of that you will encounter in trying to get a functional language translation. If you hire someone with experience and knowledge, you may save money and time in the long run, unless you have very little to translate.  You will probably not even be able to view the translated pages in your English-language browser, much less have any idea if they make any sense or show what was in the original English.

Author

Commented:
Thanks for your advice.  However, I should have been clearer that the translations - the static "front-end" text/copy for each version (including error messages) will be provided to me in plain text, pre-translated by native speakers from each country and that is not going to be my responsibility - mostly a copy/paste on my part.

So although I anticipate some HTML issues with the incorrect display of particular foreign characters in the static content of the web page, I am mainly concerned with the dynamic stuff - the "back-end" client-side and server-side programming (handling, storage, rendering or display) - those dynamic parts, as listed in my post above, being submitted/processed/stored/rendered by php, javascript, mysql, etc.  Of particular concern, the correct rendering and delivery of the autoresponder HTML emails.

So with that clarification, any other thoughts ?

Author

Commented:
p.s.... in short.... an international marketing company is responsible for translations - I'm responsible for building out the translated sites so they function correctly.

Commented:
What is the concern about the autoresponder emails?  Will these be prepared by the marketing translators?

Author

Commented:
Well, even in English there is occasionally some character (apostrophe, etc.) that gets rendered funky in the autoresponse emails - need to escape characters, etc. - so I figured all those odd "hieroglyphics"-looking characters in some languages are not going to fare well, each language with its own tildes, etc.

But am I concerned for nothing ?

For example, also I have read that others have experienced body copy of emails are ok, but subject line is garbled characters - and varies with email client?

What about form validations functions using regular expressions, etc, etc.. does that stuff stay the same when you are validating Japanese ?   Like below...

    if (!ereg("^(([A-Za-z0-9!#$%&'*+/=?^_`{|}~-][A-Za-z0-9!#$%&'*+/=?^_`{|}~\.-]{0,63})|(\"[^(\\|\")]{0,62}\"))$", $local_array[$i])) {
        return false;
    }
            
I've dealt with Latin based languages and they cause minor rendering issues but japanese, chinese, arabic, etc - seems like trouble  ??

The marketing people are thinking that we can take the existing english site, do a "save as", drop in the pre-translated text and we're good. Are they right ?
Commented:
The marketing people are right if you've got consistent storage and rendering in one consistent character set, which is probably UTF-8.  But PHP's core string functions work on bytes, not characters, so multi-byte strings need the mbstring extension.  And ereg(), which is deprecated as of PHP 5.3 (and removed in PHP 7), is especially susceptible to error.  So while they may be right, it is not up to them to implement the system, and the implementation is not simple or straightforward.
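To make the ereg() point concrete, here is a minimal sketch of a Unicode-aware validation, assuming PHP 7 or later with the mbstring extension loaded. The function name is_valid_name() and the 64-character limit are made up for the illustration; they are not taken from the existing application.

```php
<?php
// Hypothetical helper (not from the original application): accepts a
// personal name written in any script, 1 to 64 characters long.
function is_valid_name(string $name): bool
{
    // Reject input that is not well-formed UTF-8 before matching.
    if (!mb_check_encoding($name, 'UTF-8')) {
        return false;
    }
    // \p{L} matches any letter and \p{M} any combining mark. The "u"
    // modifier makes PCRE treat pattern and subject as UTF-8, so the
    // {0,63} quantifier counts characters rather than bytes.
    return preg_match('/^[\p{L}\p{M}][\p{L}\p{M}\' .-]{0,63}\z/u', $name) === 1;
}
```

The same idea (preg_match() with the "u" modifier and Unicode property classes) should carry over to the other field validations, but each pattern would need to be reviewed per field.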

Read this article and then sit back and think about it for a while.  Then read it again.  There is not a short, easy answer, or I would have given it to you already.
http://www.joelonsoftware.com/articles/Unicode.html

Next, please read the PHP.net web site about multi-byte strings.
http://php.net/manual/en/book.mbstring.php
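A small sketch of the byte-versus-character difference that the mbstring manual describes, assuming the mbstring extension is loaded and this source file is itself saved as UTF-8. The variable names are only for illustration.

```php
<?php
$name = '日本語';   // three characters, nine bytes in UTF-8

$bytes = strlen($name);               // counts bytes
$chars = mb_strlen($name, 'UTF-8');   // counts characters

// Byte-based truncation can cut a character in half and leave
// invalid UTF-8 behind; the mb_ variant cuts on character boundaries.
$broken = substr($name, 0, 4);
$clean  = mb_substr($name, 0, 2, 'UTF-8');
```

Any length check or truncation in the existing form handling that uses strlen() or substr() is a candidate for this kind of problem.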

HTH, ~Ray
Software Developer
Commented:
UTF-8 really is your friend, and you need to create both your HTML pages and emails in UTF-8 to avoid wrong display. This is really the solution to 95% of your problems.

The other 5% are to be expected if you process data based on keywords that users enter, or things like that, and of course there is always the chance of errors from anything not thought of.

But in general, you can
1. Set a MySQL database to UTF-8 as its charset/collation sequence, see for example:
http://mysql.rjweb.org/doc.php/charcoll
2. create your HTML pages with UTF-8, eg see EE page source code
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />

Output coming from UTF-8 encoded mysql database then will render correctly.
3. create mails in UTF-8

E.g., taken from the php.net manual page for mail():
function mail_utf8($to, $from_user, $from_email,
                   $subject = '(No subject)', $message = '')
{
    // Encode the display name and subject as Base64 "encoded-words"
    // so that non-ASCII characters survive in the mail headers.
    $from_user = "=?UTF-8?B?" . base64_encode($from_user) . "?=";
    $subject   = "=?UTF-8?B?" . base64_encode($subject) . "?=";

    // Declare the body as HTML in UTF-8.
    $headers = "From: $from_user <$from_email>\r\n" .
               "MIME-Version: 1.0\r\n" .
               "Content-type: text/html; charset=UTF-8\r\n";

    return mail($to, $subject, $message, $headers);
}
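
Two things worth checking around a function like this are that the encoded subject round-trips, and that the sender address cannot smuggle extra header lines into the message. A minimal sketch, assuming PHP 7 or later; encode_header_utf8() and is_safe_email() are hypothetical helper names, not part of the php.net example.

```php
<?php
// Encode a header value as an RFC 2047 "encoded-word" (Base64, UTF-8).
function encode_header_utf8(string $text): string
{
    return '=?UTF-8?B?' . base64_encode($text) . '?=';
}

// Reject addresses that contain line breaks (header injection) or that
// fail PHP's built-in e-mail address filter.
function is_safe_email(string $email): bool
{
    return !preg_match('/[\r\n]/', $email)
        && filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}
```

Note that RFC 2047 limits each encoded-word to 75 characters, so a long subject may need to be split into several encoded-words; mb_encode_mimeheader() from the mbstring extension is one option to look at for that.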
 



The only problematic step you face initially is converting all your HTML pages and MySQL data to UTF-8, if they are not already. E.g., simply changing the MySQL charset/collation after you have stored data in Latin1 or any ANSI code page would not convert your existing data.
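
For the MySQL side, a rough sketch of the two pieces involved: a connection that talks UTF-8, and a statement that converts an existing table. This assumes PDO with the MySQL driver and a server that supports utf8mb4 (MySQL's full UTF-8 character set). The helper names, host, database and table names are placeholders for illustration only.

```php
<?php
// DSN for PDO that asks the MySQL driver to use UTF-8 on the connection.
function utf8_dsn(string $host, string $dbname): string
{
    return "mysql:host={$host};dbname={$dbname};charset=utf8mb4";
}

// Statement that converts a table's text columns and default charset.
function convert_table_sql(string $table): string
{
    // Allow only simple identifiers, because identifiers cannot be
    // passed as bound parameters.
    if (!preg_match('/^[A-Za-z0-9_]+$/', $table)) {
        throw new InvalidArgumentException('Unexpected table name');
    }
    return "ALTER TABLE `{$table}` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci";
}
```

CONVERT TO only gives correct results when the stored bytes really are in the charset the column currently declares. If UTF-8 bytes were previously written into Latin1 columns, a different repair is needed, so test on a copy of the data first.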

But after you have moved and transformed everything to UTF-8 you will have no problem. Even simple mail clients can process UTF-8 mails.
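
Since the translated text will be pasted in from files of unknown origin, it may help to check each piece before it goes into a page or the database. A minimal sketch, assuming the mbstring extension; to_utf8() is a hypothetical helper, and the Windows-1252 fallback is only a guess at what a non-UTF-8 source is likely to be.

```php
<?php
// Returns $text as UTF-8, converting only when it is not already valid.
// $fallback is an assumption about the likely source encoding.
function to_utf8(string $text, string $fallback = 'Windows-1252'): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;   // already valid UTF-8, leave untouched
    }
    return mb_convert_encoding($text, 'UTF-8', $fallback);
}
```

For Asian or Arabic source files the fallback would have to be whatever encoding the translators actually used, so it is safer to ask them to deliver UTF-8 in the first place.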

Don't forget to configure your web server to signal UTF-8 in the automatic standard http headers, eg look at
http://www.utf8.com/

In Apache server config or .htaccess, this will cause the HTTP header to be generated for text/html and text/plain content:
AddDefaultCharset UTF-8
http://www.w3.org/International/O-HTTP-charset.en.php
also explains that for Jigsaw and IIS:
In Internet Services Manager, right-click "Default Web Site" (or the site you want to configure) and go to "Properties" => "HTTP Headers" => "File Types..." => "New Type...". Put in the extension you want to map, separately for each extension; IIS users will probably want to map .htm, .html,... Then, for Content type, add "text/html;charset=utf-8" (without the quotes; substitute your desired charset for utf-8; do not leave any spaces anywhere because IIS ignores all text after spaces).
Or, as an alternative, let every PHP script generate an HTTP header for the content type to override the server default header. The reason is that it isn't sufficient for your HTML output to have a content-type meta tag with the correct encoding: the HTTP header takes precedence over the meta tag, so if the two don't match, the browser may decode the page with the wrong charset.
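
A minimal sketch of the per-script variant, assuming it runs before any output has been sent. The function name send_utf8_header() is made up for the example.

```php
<?php
// Sends the content-type header if that is still possible and returns
// the header line that was used.
function send_utf8_header(): string
{
    $header = 'Content-Type: text/html; charset=UTF-8';
    if (!headers_sent()) {
        header($header);
    }
    return $header;
}
```

Calling this from one shared include at the top of every page keeps the charset declaration in a single place.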

Bye, Olaf.

Author

Commented:
Thanks for all the input.  I've read it all and my head is spinning so I have concluded that this project is probably best accomplished by contracting out the development work.  What would you guesstimate as a cost range (just rough guesstimate) for hiring outside developer/s to complete a project like this ?  To recap the project:

1. existing 3 page English "microsite", with ajax contact form (6 fields) to be replicated in 20 languages. The form entry is captured in a mySql db and sends autoresponse HTML email (about 3 paragraphs of copy) to user .
3. all text translations  provided by foreign marketing offices at no cost (not my responsibility).
4. please see original post above for more details about site functionality.
You ask for a ballpark "just rough guesstimate" for a price. Because of my inability to view or understand multi-byte glyph languages like Korean, Japanese, etc., I refuse to do any multi-language translation work (although the PHP back-end work, which is what you seem to be doing, is something I might be able to do). But you say "replicated in 20 languages"; that is very many, and will be extra effort. You may think about taking this as a "One Step at a Time" development, doing the European (Latin-based) languages first and, when development on that is acceptable, doing the eastern (my term) sets like Japanese next.

Anyway, the price will depend on the people you get to do it and HOW MUCH TEXT you need done, with more charges for every added language.  If the reports I got for the cost of translation and setup for five languages were accurate, it was a little over $2000 US dollars. I thought they were slow, but I have no real take on that, and costs are not my department at all.
But I can tell you that if you want a "bargain" and do not have rigorous requirements, most larger universities have student-run web translation services (companies) that may be worth a look if you are near one.
Also, this is NOT a rare thing to do anymore, so if you look around (web search) and actually contact them, they will send you info and a quote for their costs, if you send them specs for some of what you need.

Author

Commented:
slick, thanks for the input, but the language translations are not needed - that will be provided at NO cost.

Since hourly rates may vary - maybe a better way to put this question  - how about time instead of cost estimate - so how many development HOURS might it take (EXCLUDING language translations).

I'm just hoping for some rough guesstimates.  Ray, Olaf, Slick - pretend that one of you were taking on this project - what would you guesstimate for the hours it would take to complete, including debugging, etc ?

Author

Commented:
I am going to present the development time guesstimation as a new question.  You are invited to contribute - thanks!

Author

Commented:
Thanks - I will open new question related to this..
