Link to home
Start Free TrialLog in
Avatar of Wm Allen Smith
Wm Allen Smith

asked on

Clearing Non-printing Characters from SQL Server Data from Large File for Import

I have a 9GB csv file that was dumped from a SQL Server environment. While trying to import a small sample of it into an Oracle table, I noticed that in each field, the characters were separated by a non-printing character, a square I saw this only while using the SQL Developer Import wizard during the mapping phase.

Normally, with a smaller file, I would run it through Excel, using the CLEAN function to remove the non-printing characters. But I cannot open the main file in excel or notepad because it is about 50 million rows.

Is there a way to scrub this data either in SQL Server or Oracle to remove the non-printing characters so that it can be successfully imported into Oracle?

Thanks!
Avatar of slightwv (䄆 Netminder)
slightwv (䄆 Netminder)

>> the characters were separated by a non-printing character, a square I saw this

That is called a glyph.  It is what is commonly used when the font/character set you are using cannot be properly displayed.

You probably don't need to scrub anything.  Find out the HEX value of the delimiting character and use SQL Loader to load the data.  I'm not a huge fan of GUIs but even SQL Developer might allow you to specify the delimiting character.

If the delimiter is a Hex 07 then the control file would contain:
fields terminated by X'07'

If you can post a few records and expected results we can probably post a working example.
Avatar of Wm Allen Smith

ASKER

HI and thank you for your response. I am working on getting  you a couple of the records. What I can show you is how the records look during the import process. See the attached file. I think this is the result of a space glyph Hex 0001. Do you concur?
>>See the attached file.

No attachment.

>>I think this is the result of a space glyph Hex 0001. Do you concur?

If the attachment is an image of a glyph, then it probably won't help.  I would need something showing me the hex value of the character.
Here is the attachment of  the screen image.
Mystery-characters.docx
Those images are too small for my aging eyes to read.

I could see enough to think I might have misunderstood the original question.

A had assumed the "fields" were separated not the characters within a field.

Anyway:  I don't use SQL Developer so I'm not sure exactly why it is showing the glyphs.  I would still like to see the raw data.  

It might be a character set issue.  It could still be a data issue in the way the CSV was created.  It might be the way the file was provided to you.
Try  using  Word's Zoom under the View Menu. And yes, the problem is within the field, not between the fields.
ASKER CERTIFIED SOLUTION
Avatar of slightwv (䄆 Netminder)
slightwv (䄆 Netminder)

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Ok great I wil try that
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Great insight, I learned something new as well.