Dear Experts,

I need a simple, scriptable way to convert HTML files that are produced as UTF-8 into ANSI/ASCII files for an old target system.

So far I have found the JDK native2ascii.exe tool and PowerShell's Out-File -Encoding ascii.

The JDK is too complex, and PowerShell produces ??? for äöü.
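The ??? output can be reproduced without any files. The following is only a small sketch of the likely cause: the .NET ASCII encoder substitutes a question mark for every character it cannot represent. The variable names are illustrative, and the characters are built from code points so the result does not depend on how the script file itself is saved.

```powershell
# Build "äöü" from code points (U+00E4, U+00F6, U+00FC)
$text = -join ([char]0xE4, [char]0xF6, [char]0xFC)

# Encode to ASCII bytes; each unmappable character becomes 0x3F ("?")
$bytes = [System.Text.Encoding]::ASCII.GetBytes($text)

# Decode again to see what actually ends up in the output
$roundTrip = [System.Text.Encoding]::ASCII.GetString($bytes)
$roundTrip   # ???
```

The information is lost at encoding time, so the characters have to be replaced with something ASCII-safe before the text is written.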

What can I do?

Zvonko (Systems architect) Asked:
I tend to write functions that parse the HTML file after I create it, replacing the specific non-ASCII characters with the appropriate HTML entities.  See an HTML entity reference for the equivalents.

Here is a function I used to do this.  In this case, I used the full file path as the $report variable.  Simply add further character/HTML entity key/value pairs to the $replacementHashTable object to extend the list of characters to replace.

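	function Format-Report {
		param(
			# Full path of the HTML file to convert
			[string]$report
		)
		# Create hash table mapping non-ASCII characters to HTML entities
		$replacementHashTable = @{
			"á" = "&aacute;";
			"à" = "&agrave;";
			"â" = "&acirc;";
			"ä" = "&auml;";
			"é" = "&eacute;";
			"è" = "&egrave;";
			"ê" = "&ecirc;";
			"ö" = "&ouml;";
			"ü" = "&uuml;"
		}
		# Test for file
		if([System.IO.File]::Exists($report)) {
			# Read the whole file as UTF-8 so every line is kept, not only the matching ones
			$content = [System.IO.File]::ReadAllText($report, [System.Text.Encoding]::UTF8);
			foreach($character in $replacementHashTable.GetEnumerator()) {
				# String.Replace is case-sensitive, unlike the -replace operator
				$content = $content.Replace($character.Key, $character.Value);
			}
			# Emit the converted text; pipe it to Out-File -Encoding ascii to save it
			$content;
		} else {
			Write-Error "$report was not found.  Cannot replace non-ASCII characters.";
		}
	}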

Zvonko (Systems architect) Author Commented:
Thank you very much, Expert wls3, for your proposal.
You are right: there is really no other way to convert UTF-8 to ASCII than to do the HTML entity conversion.

But because my Document Management System handles many different languages, I will never be able to list the complete set of national characters.

Therefore I have written this toEntity() conversion script:
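# Create the output directory if it does not exist yet
$asciiDir = '.\ascii'
if (!(Test-Path $asciiDir)) { New-Item -ItemType Directory -Path $asciiDir | Out-Null }

# Turn each matched non-ASCII character into a numeric character reference (ä becomes &#228;)
$toEntity = [System.Text.RegularExpressions.MatchEvaluator]{
  param($m) "&#" + [int][char]$m.Value + ";"
}

foreach ( $file in (Get-ChildItem .\ | Where-Object {$_.Extension -eq ".htm"} ) ){
  Get-Content $file.FullName -Encoding UTF8 | ForEach-Object {[Regex]::Replace($_ , "[^\x00-\x7f]", $toEntity )} | Out-File (Join-Path $asciiDir $file.Name) -Encoding ascii
}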

Please have a look at it and give me feedback on what can be improved.
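For comparison, the same idea can be sketched as a standalone function that works on a single string. This is a hedged variant, not the script above: the name ConvertTo-NumericEntity and its -Text parameter are hypothetical, and the extra regex alternative is an assumption added so that characters above U+FFFF (stored as surrogate pairs in .NET strings) become one reference instead of two.

```powershell
# Hypothetical helper: replaces every non-ASCII character with a numeric character reference
function ConvertTo-NumericEntity {
    param([string]$Text)

    # Try a surrogate pair first, then fall back to any single non-ASCII character
    $pattern = '[\uD800-\uDBFF][\uDC00-\uDFFF]|[^\x00-\x7F]'

    $evaluator = [System.Text.RegularExpressions.MatchEvaluator]{
        param($m)
        if ($m.Value.Length -eq 2) {
            # Surrogate pair: combine both halves into one code point
            $codePoint = [char]::ConvertToUtf32($m.Value, 0)
        } else {
            # Single UTF-16 code unit: its value is the code point
            $codePoint = [int][char]$m.Value
        }
        "&#$codePoint;"
    }

    [regex]::Replace($Text, $pattern, $evaluator)
}

# Usage: "a", U+00E4 (ä) and U+1F600 (an emoji outside the Basic Multilingual Plane)
$sample = "a" + [char]0xE4 + [char]::ConvertFromUtf32(0x1F600)
ConvertTo-NumericEntity -Text $sample   # a&#228;&#128512;
```

Because numeric references need no lookup table, this covers any language the source files contain, at the cost of less readable markup than named entities such as &auml;.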

Thanks in advance,
Zvonko (Systems architect) Author Commented:
No comments were posted in response to my last request for feedback.
Therefore I will close this question by accepting wls3's approach as the solution.
Zvonko (Systems architect) Author Commented:
Thank you for your help.
