fopen problem

Hi, I am migrating a solution from VS2005 to VS2003... please don't ask why :)
It contains this function:
wstring getXml(const wstring& str)
{
       FILE * pf = _wfopen(str.c_str(), L"rt, ccs=UTF-8");
       wstring ws;
       wchar_t wsz[8192];  // big enough for very long lines
       if (pf==NULL) return ws;
       // fgetws takes a count of wchar_t elements, not a size in bytes
       while (fgetws(wsz, sizeof(wsz) / sizeof(wsz[0]), pf) != NULL)
       {
               ws += wsz;
       }
       fclose(pf);
       return ws;
}

In VS2003, _wfopen does not support the ccs=UTF-8 flag (as JKR pointed out) :(, so when I read any XML file I get garbage for the Unicode characters.
Can anyone please help me find a replacement for this function? I would be very grateful.

http://msdn2.microsoft.com/en-us/library/yeby3zcb(VS.80).aspx
http://msdn2.microsoft.com/en-us/library/yeby3zcb(vs.71).aspx

Asked by: Dimkov

Axter Commented:
>>please help me to find replacement of this function? I will be greatly grateful

Either convert the file name to UTF-8 (char* string)
or convert it to UNICODE (wchar_t* string)

Axter Commented:
I am actually surprised that VS2005 supports UTF-8 on a wchar_t* string.

That really doesn't make sense. The main point of using UTF-8 is that you can use fewer bytes than a UNICODE 2/4-byte wstring and still be able to represent local characters.

UTF-8 is normally used with char* type API functions, not wchar_t* types.

Axter Commented:
      FILE * pf = _fopen((const char*)str.c_str(), "rt");  //If you're sure this is a true UTF-8 string, then cast it to (const char*)
       wstring ws;
       wchar_t wsz[8192];  // big enough for very long lines
         if (pf==NULL) return ws;
       while (fgetws(wsz, sizeof(wsz), pf) != NULL)
       {
               ws += wsz;
       }
 
Dimkov (Author) Commented:
Axter, thanks for the input
FILE * pf = _fopen((const char*)str.c_str(), "rt");  //If you're sure this is a true UTF-8 string, then cast it to (const char*)
This is not needed since it just represents the name of the file to be opened. The problem is extracting the characters from  the file.

Axter Commented:
>>This is not needed since it just represents the name of the file to be opened. The problem is extracting the
>>characters from  the file.

Then extract the characters using char* API, and not wchar_t type.

       char sz[8192] = {0};  // big enough for very long lines
         if (pf==NULL) return ws;
       while (fgets(sz, sizeof(wsz), pf) != NULL)
       {
               //Convert sz to a wide character string using mbtowc API
               ws += wsz;
       }

Dimkov (Author) Commented:
does mbtowcs support UTF-8?

Axter Commented:
      char sz[8192] = {0};  // big enough for very long lines
      if (pf==NULL) return ws;
      while (fgets(sz, sizeof(sz), pf) != NULL)
      {
             //Convert sz to a wide character string using the mbstowcs API
             //(mbtowc converts only a single character)
             wchar_t wsz[sizeof(sz)] = {0};
             mbstowcs(wsz, sz, sizeof(sz) - 1);
             ws += wsz;
      }

Axter Commented:
>>does mbtowc support UTF-8?

Yes.  mb stands for multi-byte.
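
Whether mbtowc and mbstowcs can decode UTF-8 depends on the C locale that is active at run time, so the answer above may not hold on every runtime. A locale-independent alternative is to decode the bytes by hand. The sketch below is only an illustration: utf8ToWide is a hypothetical helper, not something from this thread or from any library, and it does not reject overlong encodings or encoded surrogates. Malformed input becomes U+FFFD.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: decodes UTF-8 bytes into a std::wstring without
// consulting the C locale. Malformed or truncated sequences become U+FFFD.
std::wstring utf8ToWide(const std::string& in)
{
    std::wstring out;
    std::size_t i = 0;
    while (i < in.size())
    {
        unsigned char b = static_cast<unsigned char>(in[i]);
        unsigned long cp;
        std::size_t extra;  // number of continuation bytes expected
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else                         { cp = 0xFFFD;   extra = 0; }  // stray continuation or invalid lead byte
        ++i;

        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k)
        {
            if (i >= in.size() ||
                (static_cast<unsigned char>(in[i]) & 0xC0) != 0x80)
            {
                ok = false;  // sequence is cut short
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
            ++i;
        }
        if (!ok || cp > 0x10FFFF) cp = 0xFFFD;

        if (sizeof(wchar_t) == 2 && cp > 0xFFFF)
        {
            // 16-bit wchar_t (as on Windows): emit a UTF-16 surrogate pair
            cp -= 0x10000;
            out += static_cast<wchar_t>(0xD800 + (cp >> 10));
            out += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
        }
        else
        {
            out += static_cast<wchar_t>(cp);
        }
    }
    return out;
}
```

The input is a std::string rather than a char* so that embedded zero bytes and the exact byte count survive the call.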

Axter Commented:
FYI:
There's also an opposite function, called wctomb, which converts it from UNICODE to UTF-8.

Dimkov (Author) Commented:
nope, it doesn't work.
but i found
                  wchar_t wsz[sizeof(sz)] = {0};
                  MultiByteToWideChar(CP_UTF8,0,sz,strlen(sz+1),wsz,sizeof(wsz));
                  ws += wsz;
which works fine.
Please just show me how to remove the BOM, since it is present in the file and breaks the XML structure.
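
A UTF-8 byte order mark is the three bytes EF BB BF at the very start of the file, so it can be dropped from the raw bytes before any conversion. The following is a minimal sketch; stripUtf8Bom is a hypothetical name, and it assumes the file content has already been read into a std::string.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: removes a leading UTF-8 byte order mark (EF BB BF).
// Returns true when a BOM was found and removed.
bool stripUtf8Bom(std::string& bytes)
{
    // Compare as unsigned char so the test does not depend on whether
    // plain char is signed on the compiler in use.
    if (bytes.size() >= 3 &&
        static_cast<unsigned char>(bytes[0]) == 0xEF &&
        static_cast<unsigned char>(bytes[1]) == 0xBB &&
        static_cast<unsigned char>(bytes[2]) == 0xBF)
    {
        bytes.erase(0, 3);
        return true;
    }
    return false;
}
```

If the BOM is only noticed after conversion, it shows up as a single leading U+FEFF character in the wide string and can be erased there instead.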

Dimkov (Author) Commented:
nooo even MultiByteToWideChar is not working :(
it does not return the correct value
now, i am stuck :(

Axter Commented:
>>nooo even MultiByteToWideChar is not working :(

Can you give us a small UTF-8 string that is in your file, and that is not being read correctly?


Dimkov (Author) Commented:
it is not a problem, but the text area here does not support it: I can send it to you by mail or put it on rapidshare

Axter Commented:
Can you point to the specific string in this file that is causing the problem?

I might have to finish trying this out later tonight, because I have to go off line in a few minutes.

Dimkov (Author) Commented:
in
<mes:Name>Ministrstvo za javno upravo</mes:Name>
  <mes:Street>Tr~aaka cesta 21</mes:Street>
  <mes:PostalCode>1000</mes:PostalCode>

in Tr~aaka there are 2 of these characters

I will be here all night.. I have to finish the project by tomorrow :)

Dimkov (Author) Commented:
Axter, by using:
wstring getXml (wstring str)
{
      FILE * pFile;
      long lSize;
      char * buffer1;
      size_t result;
      wstring resultingString;

      pFile = _wfopen ( str.c_str() , L"rb" );


      if (!pFile) return resultingString;
      // obtain file size:
      fseek (pFile , 0 , SEEK_END);
      lSize = ftell (pFile);
      rewind (pFile);
      char bom[3];
      fread(bom, 1, 3, pFile);
      bool dali=true;
      if (bom[0]!=-17 && bom[1]!=-69 && bom[2]!=-65)
      {
            dali=false;
            rewind(pFile);
      }

      // allocate memory to contain the whole file:
      buffer1 = (char*) malloc (sizeof(char)*lSize);
      // copy the file into the buffer:
      result = fread (buffer1,1,lSize,pFile);
      if (!dali)
      {
            buffer1[lSize]='\0';
      }
      else
      {
            buffer1[lSize-3]='\0';
      }

      fclose (pFile);
      wchar_t *dest=(wchar_t*) malloc(lSize*sizeof(wchar_t));
    MultiByteToWideChar(CP_UTF8, 0, buffer1, -1, dest, lSize);
    resultingString=dest;
      return resultingString;
}
I managed to get a valid string in buffer1, but when I pass it to MultiByteToWideChar the result I get is wrong...
I don't think this can be solved :(
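
Independently of what caused the wrong output here, a few details in the function above look fragile: buffer1[lSize] writes one byte past a buffer of lSize bytes, the BOM test joined with && only rejects a prefix when all three bytes differ (|| is presumably intended), and neither malloc'd buffer is freed. Reading the file into a std::string sidesteps the manual sizing and termination. This is a sketch under assumptions: readAllBytes is a hypothetical helper, and it uses plain fopen with a narrow path for portability where the code above would call _wfopen with L"rb".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: reads a whole file as raw bytes into `out`.
// Returns false if the file cannot be opened or a read error occurs.
bool readAllBytes(const char* path, std::string& out)
{
    out.clear();
    std::FILE* f = std::fopen(path, "rb");  // binary mode: no newline translation
    if (!f) return false;

    char chunk[4096];
    std::size_t n;
    while ((n = std::fread(chunk, 1, sizeof(chunk), f)) > 0)
    {
        out.append(chunk, n);  // append exactly the bytes read, no terminator needed
    }

    bool ok = (std::ferror(f) == 0);
    std::fclose(f);
    return ok;
}
```

The resulting string carries its own length, so the byte count passed to a later conversion call can be out.size() rather than a value derived from ftell.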

Axter Commented:
http://rapidshare.com/files/60532191/eVrocanje-bianco.xml.html

Is the above link supposed to contain the string you posted?
<mes:Street>Tr~aaka cesta 21</mes:Street>

I don't see anything that looks like the above string in the link.

Dimkov (Author) Commented:
it is there, line 19

<mes:PhysicalAddress>
  <mes:Name>Ministrstvo za javno upravo</mes:Name>
  <mes:Street>Tr~aaka cesta 21</mes:Street>
  <mes:PostalCode>1000</mes:PostalCode>
  <mes:City>Ljubljana</mes:City>
  </mes:PhysicalAddress>

In the mes:Street node. Unfortunately, when posted, www.experts-exchange.com also distorts the text.

Dimkov (Author) Commented:
Axter, I solved the problem by adding another function:

BSTR UTF8toUTF16(const char * pSrc, int cbSrc = -1)
{
  BSTR ret = NULL;

  DWORD cwch;

  if (cbSrc < 0)
    cbSrc = strlen(pSrc);

  // Get output size
  if (cwch = MultiByteToWideChar(CP_UTF8, 0, pSrc, cbSrc + 1, NULL, 0))
  {
    //cwch--;
    ret = SysAllocStringLen(NULL, cwch);

    if(ret)
    {
      // Convert from UTF8 into WideString
      if(!MultiByteToWideChar(CP_UTF8, 0, pSrc, cbSrc + 1, ret, cwch))
      {
        SysFreeString(ret);//must clean up
        ret = NULL;
      }
    }
  }

  return ret;
}
which converts a char * to a BSTR. After that, converting to wstring is no problem.
I will ask for this question to be closed.

Axter Commented:
>>I will ask this question to be closed

Since this function is taking a char* type, I'm assuming you took my original advice, which is to read it into char* buffer.
And this function is also converting it from char* to wide string, which is also part of my original advice.

Considering your solution incorporates suggestions given, IMHO, an answer should be accepted.

Axter Commented:
I just ran a test, and your modified method does not yield different results from the original method I proposed.

Here's example code:

BSTR UTF8toUTF16(const char * pSrc, int cbSrc = -1)
{
      BSTR ret = NULL;

      DWORD cwch;

      if (cbSrc < 0)
            cbSrc = strlen(pSrc);

      // Get output size
      if (cwch = MultiByteToWideChar(CP_UTF8, 0, pSrc, cbSrc + 1, NULL, 0))
      {
            //cwch--;
            ret = SysAllocStringLen(NULL, cwch);

            if(ret)
            {
                  // Convert from UTF8 into WideString
                  if(!MultiByteToWideChar(CP_UTF8, 0, pSrc, cbSrc + 1, ret, cwch))
                  {
                        SysFreeString(ret);//must clean up
                        ret = NULL;
                  }
            }
      }

      return ret;
}

BSTR ModifiedProposedMethod(const wstring& str)
{
      FILE * pf = _wfopen(str.c_str(), L"rt");
      char sz[32000] = {0};  // big enough for very long lines
      if (pf==NULL) return NULL;
      size_t QtyRead = 0;
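      // Read at most sizeof(sz) - 1 bytes so sz stays null-terminated:
      // UTF8toUTF16 converts cbSrc + 1 bytes, terminator included.
      if ( ( QtyRead = fread(sz, 1, sizeof(sz) - 1, pf)) > 0)
      {
            fclose(pf);
            return UTF8toUTF16(sz, (int)QtyRead);
      }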
      fclose(pf);
      return NULL;
}


wstring OriginalProposedMethod(const wstring& str)
{
      FILE * pf = _wfopen(str.c_str(), L"rt");
      wstring ws;
      char sz[8192] = {0};  // big enough for very long lines
      if (pf==NULL) return ws;
      size_t QtyRead = 0;
      while ( ( QtyRead = fread(sz, 1, sizeof(sz), pf)) > 0)
      {
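            wchar_t wsz[sizeof(sz)] = {0};
            // The last argument is a count of wchar_t elements. Append only the t1
            // characters actually converted: wsz has no terminator when it is full.
            int t1 = MultiByteToWideChar(CP_UTF8, 0, sz, (int)QtyRead, wsz, sizeof(wsz) / sizeof(wsz[0]));
            ws.append(wsz, t1);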
      }
      fclose(pf);
      return ws;
}

int main()
{

      BSTR s1            = ModifiedProposedMethod(L"C:\\TMP\\TeX.txt");
      wstring s2x      = OriginalProposedMethod(L"C:\\TMP\\TeX.txt");
      const wchar_t *s2 = s2x.c_str();


      wcout << s1 << endl;
      wcout << s2 << endl;

      wcout <<  endl;
      SysFreeString(s1);  // release the BSTR returned by ModifiedProposedMethod
      system("pause");
      return 0;
}
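
One caveat with converting a file in fixed-size chunks, as OriginalProposedMethod does: a multi-byte UTF-8 sequence can straddle two reads, and each half then fails to convert on its own. A possible way around this is to convert only the complete sequences in each chunk and carry the leftover bytes into the next read. The helper below is a hypothetical sketch of that boundary check, not part of the tested code above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: returns how many of the first `len` bytes of `buf`
// form complete UTF-8 sequences. The remaining (len - result) bytes are the
// start of a sequence that continues in the next chunk.
std::size_t completeUtf8Length(const char* buf, std::size_t len)
{
    std::size_t i = len;
    std::size_t back = 0;  // bytes examined from the end, lead byte included
    while (i > 0 && back < 4)
    {
        unsigned char b = static_cast<unsigned char>(buf[i - 1]);
        --i;
        ++back;
        if ((b & 0xC0) != 0x80)  // not a continuation byte: ASCII or a lead byte
        {
            std::size_t need = 1;
            if ((b & 0xE0) == 0xC0)      need = 2;
            else if ((b & 0xF0) == 0xE0) need = 3;
            else if ((b & 0xF8) == 0xF0) need = 4;
            // Complete if all bytes the lead byte announces are present
            return (back >= need) ? len : i;
        }
    }
    return len;  // no lead byte within the last 4 bytes: malformed, pass through
}
```

In a read loop, the first completeUtf8Length(sz, n) bytes would be converted and the rest moved to the front of the buffer before the next fread fills in behind them.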

Dimkov (Author) Commented:
are you testing in VS 2003 or vs2005?

Axter Commented:
>>are you testing in VS 2003 or vs2005?

I'm testing this on VS 2003