• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 641
  • Last Modified:

Excel URLDownloadToFile Imports HTML as Scientific Notation Instead of Text

I have a VBA module that uses the URLDownloadtoFile function to grab some table data from a company website and paste it into excel.

The function works fine, but it converts some of my text to scientific. As I am using the function instead of copy and pasting the data, I cannot specify the column as text, etc.

Does anyone know of a potential solution?

(My workaround solution is to download the file to a word document and then copy the word table into excel. However, this is a less than optimal fix.)

Thanks!
0
ashleyna
Asked:
ashleyna
  • 3
  • 2
1 Solution
 
pcelbaCommented:
URLDownloadtoFile function takes the HTML page "as is" and saves it to a disk file. So depends on the HTML page encoding it can appear as "Scientific Notation" to you...

The page encoding can be UTF8, Windows CP 1250, 1252, etc. You have to look inside and decode it to your target encoding (CP 1252 does not need any decoding because it is English).

If you download this page the you may see following tag:
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
which says the page is encoded in UTF-8 which is quite readable but e.g. http://www.baiyujia.com/ having charset=gb2312 is almost unreadable...

MS Word can recognize the encoding so it interprets the page correctly.

Possible VBA implementation for CP conversion is described e.g. here: http://www.pcreview.co.uk/forums/convert-string-one-codepage-another-t3082087.html
or here: https://groups.google.com/forum/#!topic/vbahelp/TjfCsyb1P7s
0
 
ashleynaAuthor Commented:
The page downloads without tags when I use .xls extension. However, it converts data like 228E67 to scientific notation. The only output that preserves the text is when I export to a .doc file.

The HTML does not download with tags for any extension except .txt
0
 
pcelbaCommented:
URLDownloadtoFile does not recognize any file type. So if you set "XLS" as the output file extension it simply sets this extension without regard to the actual file contents... Setting the file extension cannot remove HTML tags from the output file.

So if you are saving the file as XLS you should be sure the file is really an Excel sheet.

OTOH Excel or Word is attempting to recognize what is the right file contents and opens the file the best way it can.

If you really want to see what is inside the downloaded file then change its extension to TXT and open it in Notepad. (Suppose extensions are displayed in your Operating system.)

228E67 can be any Unicode or UTF-8 character or just character from the upper half  of ASCII table etc.
0
 
ashleynaAuthor Commented:
If anyone else comes across this question, one way I was able to convert scientific values back into the original text with the following formula:

=LEFT(TEXT(C495,"#"),5)&"E"&TEXT(LEN(TEXT(C495,"#"))-5,"0##")

The formula converts value like: 4.5336E+113 to 45336E109, which was the original text value.
0
 
pcelbaCommented:
Thanks. I know better what was the problem now but I still don't know what was the source file format imported to Excel.

Excel automatically converts "number like" text values into numbers. To suppress this conversion you have to make changes in the input file which is not so easy in some cases...

This discussion can tell more: http://stackoverflow.com/questions/2420931/stop-excel-from-changing-cell-contents-ever

Another hint useful when importing text from CSV file is to format the text as formula:
="123456"
0

Featured Post

Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

  • 3
  • 2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now