Long question, but that's the deal.
I'm sorting EML files. I have a program that scans directories to pull out EMLs, and another that sorts them back into a proper structure (the one I want). That's working fairly well. Or, it was.
Until I got a ton of EML files from a user, with names created from the subject lines of the emails. This would be fine, if he didn't apparently have a ton of spam with special characters. (No idea what it was, maybe Cyrillic, Chinese, Japanese. Not sure, doesn't matter.) This made Ruby very unhappy.
I'm using Ruby 1.9+, so it will read the file just fine. The problem seems to be that Ruby takes these filenames and represents them differently. I believe it's converting the original name, which was Windows-1252, into UTF-8. That would be fine, but when I go to rename the file, Ruby can't find it, because the conversion changed the filename.
The analogy would be you telling me I just got a new employee named Jorge. I decide the employee is actually named 'George'. But when I go to work with 'George', there isn't anyone there named George, though Jorge still exists. The file is there, Ruby just changed the working name.
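To make the mismatch concrete, here is a minimal diagnostic sketch. It assumes the names come from Dir.entries (the original script may collect them differently), and unreachable_names is a made-up helper name, not part of the script. It lists the names Ruby reports for a directory but then cannot find again:

```ruby
# Hypothetical helper (not from the original script): returns the names that
# Dir.entries lists in `dir` but that File.exist? then fails to find.
def unreachable_names(dir)
  Dir.entries(dir).reject do |name|
    File.exist?(File.join(dir, name))
  end
end

if __FILE__ == $0
  # Print each problem name together with the encoding Ruby tagged it with.
  unreachable_names(Dir.pwd).each do |name|
    puts "#{name.inspect} (#{name.encoding})"
  end
end
```

If the theory above is right, the spam-named files should show up in this list while ordinary names do not.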
My problem is that I can't figure out how to get Ruby to cut it out. There are several tips on how to encode the data within a file, but little about the file name, mainly because I can't reference the file to modify it. And I cannot encode the filename variable to Windows-1252, because it throws an exception regarding characters in UTF-8 that will not convert to Windows-1252. (Which makes me question how it went from Windows-1252 to UTF-8 in the first place; perhaps it's a one-way conversion with substituted letters.)
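The exception can probably be reproduced outside the script. The sketch below uses made-up filenames (the real ones are unknown) and a hypothetical helper, try_cp1252, that wraps String#encode and reports whether the conversion succeeded:

```ruby
# Hypothetical helper: try to re-encode a name to Windows-1252 and report
# the outcome instead of letting the exception propagate.
def try_cp1252(name)
  name.encode("Windows-1252")
  :ok
rescue Encoding::UndefinedConversionError
  :undefined_conversion
end

# Made-up sample names, written with \u escapes so the strings are UTF-8.
latin    = "Jorge caf\u00E9.eml"                        # e-acute exists in Windows-1252
cyrillic = "\u041F\u0440\u0438\u0432\u0435\u0442.eml"   # Cyrillic letters do not

puts try_cp1252(latin)
puts try_cp1252(cyrillic)
```

Names made only of characters that Windows-1252 contains convert cleanly, while Cyrillic or CJK characters have no Windows-1252 equivalent, which would explain why encoding the filename variable fails.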
I can't find a way around this, short of moving the EML files (many gigs) over to a Linux-based machine, running the script there, and then bringing them back to the Windows machine for final sorting. I'm hoping that will work, but I'm wondering if there is a better way around this.
FileUtils.move pwd+"/"+f1, $scriptDir+"/"+f1