Solved

How to save a web page with its URL as the file name

Posted on 2011-02-22
Last Modified: 2012-05-11
Hi,

I want to save a web page using its URL as the file name. However, a URL can contain many characters that the Ubuntu or Windows file systems do not accept in file names. For example:

http://www.bestbuy.com/site/HP+-+Laptop+/+AMD+Phenom%26%23153%3B+II+Processor+/+15.6%22+Display+/+3GB+Memory+/+320GB+Hard+Drive+-+Biscotti/1945374.p?id=1218301987141&skuId=1945374

How can I convert a URL like this into an acceptable file name?
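One way to do this is to percent-encode the whole URL with `java.net.URLEncoder` and UTF-8, which turns `/`, `:`, `?`, `&`, `"` and the other problem characters into `%XX` escapes. The sketch below assumes Java 10 or later (for the `Charset` overloads), and the class and method names are hypothetical, not taken from this thread. `URLEncoder` leaves `*` unchanged, and Windows does not allow `*` in file names, so the sketch escapes it as `%2A` as well.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: maps a URL to one reversible file name (requires Java 10+). */
final class UrlFileName {
    private UrlFileName() {}

    /**
     * URLEncoder keeps a-z, A-Z, 0-9, '.', '-', '*', '_', turns ' ' into '+',
     * and writes every other byte of the UTF-8 form as %XX.
     */
    static String toFileName(String url) {
        String encoded = URLEncoder.encode(url, StandardCharsets.UTF_8);
        // '*' is not allowed in Windows file names, so escape it as well.
        return encoded.replace("*", "%2A");
    }

    /** Reverses toFileName; URLDecoder also turns %2A back into '*'. */
    static String toUrl(String fileName) {
        return URLDecoder.decode(fileName, StandardCharsets.UTF_8);
    }
}
```

For the Best Buy URL above, the file name starts with `http%3A%2F%2Fwww.bestbuy.com%2Fsite%2FHP%2B-%2BLaptop`, and the existing `%26` becomes `%2526`, so `toUrl` restores the original exactly. Many file systems limit a single name to about 255 characters or bytes, so very long URLs may still need to be truncated or hashed.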

Thanks
Question by:wsyy
 
LVL 47

Assisted Solution

by:for_yan
for_yan earned 62 total points
ID: 34958500
 
LVL 40

Accepted Solution

by:gurvinder372
gurvinder372 earned 63 total points
ID: 34958526
 

Author Comment

by:wsyy
ID: 34958570
for_yan,

What if the URL contains Chinese characters? How will encoding with "utf-8" affect the result?

Was "utf-8" picked arbitrarily, or should I detect the encoding of the URL first?
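The charset only decides which bytes a non-ASCII character becomes before each byte is written as `%xy`; the Java documentation quoted later in this thread recommends UTF-8. What matters is that the same charset is used for decoding as for encoding. A minimal sketch, assuming Java 10 or later (the class name is hypothetical):

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;

/** Hypothetical demo: percent-encode with one charset, decode with another. */
final class CharsetMismatch {
    private CharsetMismatch() {}

    static String encodeThenDecode(String text, Charset encodeWith, Charset decodeWith) {
        // The charset picks the bytes; each byte then becomes a %xy escape.
        String escaped = URLEncoder.encode(text, encodeWith);
        // Decoding reads the bytes back with decodeWith; only a matching charset restores the text.
        return URLDecoder.decode(escaped, decodeWith);
    }
}
```

With `"\u4e2d"` (the character 中), UTF-8 in both directions returns the original character, while decoding its UTF-8 escapes `%E4%B8%AD` as ISO-8859-1 yields three unrelated Latin-1 characters.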

 
LVL 47

Expert Comment

by:for_yan
ID: 34958584
You don't need to use UTF-8. This explanation is from the first link that gurvinder posted:


The alphanumeric characters "a" through "z", "A" through "Z" and "0" through "9" remain the same.
The special characters ".", "-", "*", and "_" remain the same.
The space character " " is converted into a plus sign "+".
All other characters are unsafe and are first converted into one or more bytes using some encoding scheme. Then each byte is represented by the 3-character string "%xy", where xy is the two-digit hexadecimal representation of the byte. The recommended encoding scheme to use is UTF-8. However, for compatibility reasons, if an encoding is not specified, then the default encoding of the platform is used.
For example using UTF-8 as the encoding scheme the string "The string ü@foo-bar" would get converted to "The+string+%C3%BC%40foo-bar" because in UTF-8 the character ü is encoded as two bytes C3 (hex) and BC (hex), and the character @ is encoded as one byte 40 (hex).
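To make the four rules concrete, here is a hand-written version fixed to UTF-8. It is only an illustration with a hypothetical class name; in practice `URLEncoder.encode` does this work for you.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical hand-written version of the four URLEncoder rules, fixed to UTF-8. */
final class FormEncodeSketch {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    private FormEncodeSketch() {}

    static String encode(String s) {
        StringBuilder out = new StringBuilder();
        // Work on UTF-8 bytes: ASCII characters are single bytes below 0x80, and every
        // byte of a multi-byte character is 0x80 or above, so it is always escaped.
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
                    || c == '.' || c == '-' || c == '*' || c == '_') {
                out.append((char) c);                                       // rules 1 and 2: unchanged
            } else if (c == ' ') {
                out.append('+');                                            // rule 3: space becomes '+'
            } else {
                out.append('%').append(HEX[c >> 4]).append(HEX[c & 0x0F]); // rule 4: "%xy"
            }
        }
        return out.toString();
    }
}
```

Applied to `"The string \u00fc@foo-bar"` it returns `The+string+%C3%BC%40foo-bar`, the same result as the example above. Note that rule 2 keeps `*`, which Windows does not allow in file names.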



 

Author Comment

by:wsyy
ID: 34958668
for_yan, thanks for the additional input.

If the URL contains Chinese words, I want to keep them in the file name. Do I need to use "gb2312" or "gb18030", or can I just keep using "utf-8"?

I ask because I don't know whether the URL (which comes as input from another application) is encoded in UTF-8.
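If the URL arrives as a Java `String`, it is already Unicode text, and the question only arises for its `%XX` escapes (or for raw bytes read from the other application). A rough heuristic, sketched below with hypothetical names: turn the escapes back into bytes and check whether they form valid UTF-8. Passing the check does not prove the source used UTF-8, since some short GB2312/GBK byte sequences are also valid UTF-8, but failing it rules UTF-8 out.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: guesses whether a URL's %XX escapes are UTF-8. */
final class Utf8Check {
    private Utf8Check() {}

    /** ASCII hex digit value, or -1 (Character.digit alone also accepts non-ASCII digits). */
    private static int hexValue(char c) {
        return c < 0x80 ? Character.digit(c, 16) : -1;
    }

    /** Turns %XX escapes into raw bytes; all other characters are taken as UTF-8. */
    static byte[] percentBytes(String url) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < url.length()) {
            int cp = url.codePointAt(i);
            if (cp == '%' && i + 2 < url.length()) {
                int hi = hexValue(url.charAt(i + 1));
                int lo = hexValue(url.charAt(i + 2));
                if (hi >= 0 && lo >= 0) {
                    out.write((hi << 4) | lo);
                    i += 3;
                    continue;
                }
            }
            byte[] b = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
            out.write(b, 0, b.length);
            i += Character.charCount(cp);
        }
        return out.toByteArray();
    }

    /** True if the bytes decode as UTF-8 without malformed or unmappable sequences. */
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

`%E4%B8%AD` (中 in UTF-8) passes the check, while `%D6%D0` (the same character in GB2312) fails it, because `0xD0` is not a UTF-8 continuation byte.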
 
LVL 47

Expert Comment

by:for_yan
ID: 34958698
I'm not sure; you can give it a try. Are Chinese characters allowed in file names?
 

Author Comment

by:wsyy
ID: 34962294
Yes, Chinese characters are allowed in file names.
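Since Chinese characters are allowed, an alternative to percent-encoding everything is to keep every character the file systems accept and escape only the rest: the characters Windows forbids (`< > : " / \ | ? *`), control characters, and `%` itself so that the mapping stays reversible. This is a hypothetical helper, not something proposed in this thread. It does not handle other Windows restrictions, such as names ending in a dot or space, or reserved names like `CON` and `NUL`.

```java
/** Hypothetical helper: keeps Unicode (e.g. Chinese) characters, escapes only unsafe ones. */
final class KeepUnicodeFileName {
    // Characters Windows forbids in names, plus '%' so that decoding is unambiguous.
    private static final String ESCAPED = "<>:\"/\\|?*%";
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    private KeepUnicodeFileName() {}

    static String encode(String url) {
        StringBuilder sb = new StringBuilder(url.length());
        for (int i = 0; i < url.length(); i++) {
            char c = url.charAt(i);
            if (c < 0x20 || c == 0x7F || ESCAPED.indexOf(c) >= 0) {
                // Every escaped character is ASCII, so two hex digits suffice: '/' -> "%2F".
                sb.append('%').append(HEX[c >> 4]).append(HEX[c & 0x0F]);
            } else {
                sb.append(c); // Chinese and other characters pass through unchanged
            }
        }
        return sb.toString();
    }

    /** Reverses encode; assumes the input was produced by encode. */
    static String decode(String name) {
        StringBuilder sb = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c == '%' && i + 2 < name.length()) {
                sb.append((char) Integer.parseInt(name.substring(i + 1, i + 3), 16));
                i += 2; // skip the two hex digits
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

For example, `http://a.com/中文?x=1` becomes `http%3A%2F%2Fa.com%2F中文%3Fx=1`, so the Chinese part stays readable in the file name.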
 
LVL 47

Expert Comment

by:for_yan
ID: 34962321
Then just try both ways. I can't test it myself because I don't have Chinese characters available.
 
LVL 20

Expert Comment

by:Sathish David Kumar N
ID: 34964899
Use Big5.
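Big5 is a Traditional Chinese encoding and GB2312 a Simplified Chinese one; neither covers all of Unicode the way UTF-8 does. The sketch below (hypothetical names, Java 10 or later) compares the escapes each charset produces for the same text and skips any charset the JVM does not provide. The results differ, so whichever charset is chosen has to be used again when turning a file name back into a URL.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical demo: the same text percent-encoded with several charsets. */
final class CharsetComparison {
    private CharsetComparison() {}

    /** Escaped form of text for each named charset this JVM supports; others are skipped. */
    static Map<String, String> escapesByCharset(String text, List<String> charsetNames) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String name : charsetNames) {
            if (Charset.isSupported(name)) {
                result.put(name, URLEncoder.encode(text, Charset.forName(name)));
            }
        }
        return result;
    }
}
```

For `"\u4e2d"` (中), UTF-8 gives the three escapes `%E4%B8%AD`, while the Chinese-specific charsets give two-byte escapes.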
 


Question has a verified solution.
