Link to home
Start Free TrialLog in
Avatar of Mani Pazhana
Mani PazhanaFlag for United States of America

asked on

SQL DTS package ---XML file generation

Hello Experts,

I am generating an xml file from my DTS package. The xml file is of UTF-8 encoding format.

The end users are complaining the file as Hexa value characters at the very begining of the xml file, which i dont see from IE.

Here is my DTS active x vbscript:

----------------------------------------------------
'**********************************************************************
'  Visual Basic ActiveX Script
'************************************************************************
 

Option Explicit

 

Function Main()

      Const FileName = "C:\IESFileDataConversionXML\ProcessInputFile\GenerateXMLFile.xml"
      Dim cn, cmd, stmOut, stmIn
      Set cn = CreateObject("ADODB.Connection")

With cn
          .Open = "Provider=SQLOLEDB.1;Data Source=INFOTECHD99\MANI;Initial Catalog=IESData;user id =sa;password=test"
 End With

      Set stmOut = CreateObject("ADODB.Stream")

With stmOut
          .Open
          .Type = 2               ' adTypeText
      .Charset = "UTF-8"
          .WriteText "<?xml version=""1.0"" encoding=""UTF-8""?>"
          .WriteText "<IPA_ZFI_COPA_FB50>"

End With

      Set cmd = CreateObject("ADODB.Command")
With cmd

          Set .ActiveConnection = cn
          .CommandType = 4            ' adCmdStoredProc
          .CommandText = "sp_GetXmlHeaderForDTS"
          Set stmIn = CreateObject("ADODB.Stream")

   With stmIn

              .Open
              .Type = 2               ' adTypeText

   End With

          .Properties("Output Stream").Value = stmIn
          .Execute , , 1024               ' adExecuteStream
      stmOut.WriteText stmIn.ReadText
With stmOut               
      .WriteText "<Body>"
End With


          .CommandText = "sp_GetXmlBodyForDTS"
          .Execute , , 1024               ' adExecuteStream
          stmOut.WriteText stmIn.ReadText
          stmIn.Close
          Set stmIn = Nothing

End With

      Set cmd = Nothing

With stmOut

               .WriteText "</Body>"
              .WriteText "</IPA_ZFI_COPA_FB50>"
              .SaveToFile FileName, 2         ' adSaveCreateOverWrite
          .Close
End With

Set stmOut = Nothing
cn.Close
Set cn = Nothing

Main = DTSTaskExecResult_Success

End Function

--------------------------------------------------------------------------

any idea what is wrong with my code above, or to fix this issue.


Avatar of sankarbha
sankarbha

just try viewing the generated xml file using notepad.  If you do find any hexa then post few lines here and that would be really help for us.
Avatar of Mani Pazhana

ASKER

I did not see these Hex values(EFBBBF) when i open in IE or when i open in Notepad or XML notepad.

When the users process the file with MQ series tool, they are complaning about this issue.

Here is my XML output sample:

-------------------------------------------


<?xml version="1.0" encoding="UTF-8"?>
<IPA_ZFI_COPA_FB50>
<Header>
<InterfaceID>INTF_0439B</InterfaceID>
<InterfaceType>REQUEST-850</InterfaceType>
<SourceSystem>
<Name>OA-ITM</Name>
<DocumentID>20060915094046</DocumentID>
<Date>2006-09-15</Date>
<Time>09:40:46</Time>
</SourceSystem>
<TargetSystem>
<Name>SAP</Name>
<Date>2006-09-15</Date>
<Time>09:40:46</Time>
</TargetSystem>
</Header>
</IPA_ZFI_COPA_FB50>

-------------------------------------------------

It is a problem when they process the file with MQ series tool.

Any idea?

Thanks
I dont know abt MQ series, lets c what Experts say
Silly check -- make sure they're looking for CRLFs and not just CRs or LFs... if the latter, the other will show up as a hex character on every line. If at the very beginning of the file, it could be a FF, or just a bunch of returns before the first line.
They are seeing like this in the file when they open through MQ sereis tool.:

EFBBBF<?xml version="1.0" encoding="UTF-8"?>


I did not see anything like this Notpad or IE.

--------------------------------------------------------------------------------

Also i tried to validate the XML file using this tool:
http://validator.w3.org/

I got this error below:

------------------------------------

Result: Failed validation  
File: C:\IESFileDataConversionXML\SendOutputFile\INTF_0439B_20060918_113751_00013.xml
Encoding: us-ascii
Doctype:  

Sorry, I am unable to validate this document because on line 1 it contained one or more bytes that I cannot interpret as us-ascii (in other words, the bytes found are not valid values in the specified Character Encoding). Please check both the content of the file and the character encoding indication.
-----------------------------------


any idea?

Thanks


That jogged something in my memory... the EFBBBF is the byte order marker for UTF-8 encoding -- basically it tells whatever application is reading the file that the file is using the UTF-8 character set, which is why it doesn't show up in notepad, IE, etc. because it's suppressed as informational.

Unfortunately I was never able to figure out how to prevent the ADODB.stream object from writing this out, so when I needed a "clean" file I just opened the file a second time and dropped the first three characters (or two characters, depending on your character set).
Can you post some sample code?

I am using VBScript.

thanks.
All i need is to remove the Hex values from the file programatically using VBScript:


Before Removing:
EFBBBF<?xml version="1.0" encoding="UTF-8"?>

After removing:
<?xml version="1.0" encoding="UTF-8"?>


sample code please.


Thanks


SOLUTION
Avatar of netfootprint
netfootprint

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of Anthony Perkins
For the record, you are talking about the UNICODE UTF-16 BOM (Byte Order Mark).  This is the way that Xml processors can recognize the encoding when there is no encoding declaration and is fully covered in the Xml specification.

For example, if it starts with 0xFF 0xFE or 0xFE 0xFF then it is considered UTF-16 encoding.

Also, UTF-8 does not require a BOM.
netfootprint,

>>basically it tells whatever application is reading the file that the file is using the UTF-8 character set, which is why it doesn't show up in notepad, IE, etc. because it's suppressed as informational. <<
Actually,  somehow I don't believe that Notepad would qualify as being Xml aware:)  I suspect that it is there in Notepad, just not displayed.  It can be easily seen in any Hex editor.
Interesting... so notepad will suppress some unprintable characters, while others it will display the "unprintable character" character. I learn something new every day on this site!
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Works great. Thanks a lot for your help.
To netfootprint,
I used the VBscript to remove the Hex value data, and the users can now process the xml file with there MQseries tool.

To acperkins,
Thanks for your clarification.

Thanks a lot for your help guys.