PHP encoding problem (how to handle it like Google?)
Posted on 2006-05-22
I am facing a problem with character encoding in PHP. I will describe it with an example.
I have 3 files:
file1.html : its charset is WINDOWS-1256
file2.html : its charset is UTF-8
file3.html : its charset is ISO-8859-1
I use a PHP script to spider these files and index them (a complete search engine: spider, indexer, and search interface).
When the script finishes spidering and indexing the files, I use the interface to search for a specific keyword.
If I set the charset of the search page to WINDOWS-1256, only results from file1.html are found. If I set it to UTF-8, only file2.html is found, and so on.
Is there any way to unify the charset of these pages, exactly like Google does, so that everything is treated as UTF-8 even if the page itself is not UTF-8?
This is not a problem when the keyword I search for is an English word, since plain ASCII letters have the same bytes in all three charsets. The problem only appears when the keyword is in another language.
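
To make it clearer what I mean by "treat everything as UTF-8", here is a rough sketch of the kind of conversion step I imagine the indexer would need. It is only a sketch: `to_utf8()` is a made-up helper name, and I am assuming the charset of each page is already known (for example from its meta tag) and that the PHP iconv extension is available.

```php
<?php
// Hypothetical helper (not part of my current script): convert page text
// to UTF-8 before it is indexed. $declaredCharset is assumed to be the
// charset the page declares, e.g. "WINDOWS-1256" or "ISO-8859-1".
function to_utf8($text, $declaredCharset)
{
    $charset = strtoupper(trim($declaredCharset));

    // Already UTF-8: nothing to convert.
    if ($charset === 'UTF-8' || $charset === 'UTF8') {
        return $text;
    }

    // iconv() returns false when the charset name is unknown or the
    // input contains bytes that are invalid for that charset.
    $converted = @iconv($charset, 'UTF-8', $text);

    // Fall back to the original bytes so the indexer can still go on.
    return $converted === false ? $text : $converted;
}

// Assumed usage inside the spider:
// $indexText = to_utf8($pageContents, 'WINDOWS-1256');
```

With something like this, every page would be stored in the index as UTF-8. The search page would then also have to be served as UTF-8, so that the keyword typed by the user arrives in the same encoding as the indexed text.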