Solved

How to save a web page with its url as file name

Posted on 2011-02-22
Last Modified: 2012-05-11
Hi,

I want to save a web page using its URL as the file name. However, URLs contain many characters that Ubuntu or Windows file systems do not accept in file names. Here is an example URL:

http://www.bestbuy.com/site/HP+-+Laptop+/+AMD+Phenom%26%23153%3B+II+Processor+/+15.6%22+Display+/+3GB+Memory+/+320GB+Hard+Drive+-+Biscotti/1945374.p?id=1218301987141&skuId=1945374

How can I convert a URL like this into an acceptable file name?
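The text of the two solution posts was not preserved on this page. However, the later discussion quotes the `java.net.URLEncoder` documentation, which suggests that percent-encoding was the recommended approach. The sketch below takes that approach. It assumes Java 10 or later for the `Charset` overload of `URLEncoder.encode`, and the class name, method name, and `.html` suffix are hypothetical. `URLEncoder` leaves `*` unchanged, but Windows forbids `*` in file names, so the sketch escapes that character by hand.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class UrlToFileName {

    // Percent-encodes the URL as UTF-8, so ':', '/', '?', '"', '&' and other
    // characters that are illegal in Windows file names become %xx escapes.
    // URLEncoder keeps '*' as-is, and '*' is reserved on Windows, so it is
    // replaced with its escape %2A here.
    public static String toFileName(String url) {
        String encoded = URLEncoder.encode(url, StandardCharsets.UTF_8);
        return encoded.replace("*", "%2A");
    }

    public static void main(String[] args) {
        String url = "http://www.bestbuy.com/site/HP+-+Laptop+/+AMD+Phenom%26%23153%3B+II+Processor+/+15.6%22+Display+/+3GB+Memory+/+320GB+Hard+Drive+-+Biscotti/1945374.p?id=1218301987141&skuId=1945374";
        // Hypothetical: append an extension for the saved page.
        System.out.println(toFileName(url) + ".html");
    }
}
```

Because percent-encoding is reversible, `URLDecoder.decode` with the same charset recovers the original URL from the file name. This sketch does not handle very long URLs, which can still exceed the file-name length limit (commonly 255 characters or bytes).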

Thanks
Question by:wsyy
 
LVL 47

Assisted Solution

by:for_yan
for_yan earned 248 total points
ID: 34958500
 
LVL 40

Accepted Solution

by:Gurvinder Pal Singh
Gurvinder Pal Singh earned 252 total points
ID: 34958526
 

Author Comment

by:wsyy
ID: 34958570
for_yan,

What if the URL contains Chinese characters? How will encoding with "UTF-8" affect the result?

Was "UTF-8" picked arbitrarily, or should I detect the encoding of the URL first?
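To see how the charset choice changes the result, the sketch below percent-encodes the same Chinese character with UTF-8 and with GB18030. It assumes Java 10+ and a JDK that includes the GB18030 charset. Full OpenJDK builds include it, but it is not one of the charsets every Java platform is required to support. The class name is hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingComparison {

    // The charset decides which bytes a non-ASCII character turns into,
    // and therefore which %xx escapes end up in the file name.
    public static String encode(String text, Charset charset) {
        return URLEncoder.encode(text, charset);
    }

    public static void main(String[] args) {
        String chinese = "中";
        // UTF-8 encodes U+4E2D as three bytes: E4 B8 AD
        System.out.println(encode(chinese, StandardCharsets.UTF_8));     // %E4%B8%AD
        // GB18030 (a superset of GB2312) encodes it as two bytes: D6 D0
        System.out.println(encode(chinese, Charset.forName("GB18030"))); // %D6%D0
        // Plain ASCII text is encoded the same way by both charsets
        System.out.println(encode("abc-1.html", Charset.forName("GB18030"))); // abc-1.html
    }
}
```

The same text therefore yields different file names depending on the charset. A name written with one charset must be decoded with the same charset to get the original text back.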
 
LVL 47

Expert Comment

by:for_yan
ID: 34958584
You don't strictly have to specify UTF-8, but it is the recommended encoding.
This explanation is from the first link that Gurvinder posted:


The alphanumeric characters "a" through "z", "A" through "Z" and "0" through "9" remain the same.
The special characters ".", "-", "*", and "_" remain the same.
The space character " " is converted into a plus sign "+".
All other characters are unsafe and are first converted into one or more bytes using some encoding scheme. Then each byte is represented by the 3-character string "%xy", where xy is the two-digit hexadecimal representation of the byte. The recommended encoding scheme to use is UTF-8. However, for compatibility reasons, if an encoding is not specified, then the default encoding of the platform is used.
For example using UTF-8 as the encoding scheme the string "The string ü@foo-bar" would get converted to "The+string+%C3%BC%40foo-bar" because in UTF-8 the character ü is encoded as two bytes C3 (hex) and BC (hex), and the character @ is encoded as one byte 40 (hex).
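The documented example can be checked directly. This sketch assumes Java 10+ for the `Charset` overloads, and the class and method names are hypothetical. It encodes the string from the quote and then decodes it again with the same charset.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class UrlEncoderRules {

    public static String encodeUtf8(String text) {
        return URLEncoder.encode(text, StandardCharsets.UTF_8);
    }

    public static String decodeUtf8(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "The string ü@foo-bar";
        String encoded = encodeUtf8(original);
        // Letters and '-' are kept, ' ' becomes '+', and 'ü' (bytes C3 BC)
        // and '@' (byte 40) become %xx escapes.
        System.out.println(encoded);                              // The+string+%C3%BC%40foo-bar
        // Decoding with the same charset restores the original text.
        System.out.println(decodeUtf8(encoded).equals(original)); // true
    }
}
```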



 

Author Comment

by:wsyy
ID: 34958668
for_yan, thanks for the additional input.

If the URL contains Chinese words, I want to keep them in the file name. Do I need to use "gb2312" or "gb18030", or can I just keep using "utf-8"?

I ask because the URL comes as input from another application, and I don't know whether it is encoded in UTF-8.
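When the charset of incoming %xx escapes is unknown, one common heuristic is to decode them as strict UTF-8 first and fall back to a Chinese charset only if the bytes are not valid UTF-8. The sketch below does this with a `CharsetDecoder` configured to report errors instead of replacing bad bytes. The choice of GB18030 (a superset of GB2312) as the fallback, the ASCII-only input assumption, and all names are assumptions. This is only a heuristic: a short GB-encoded byte sequence can occasionally also be valid UTF-8, and it would then be decoded incorrectly.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class GuessUrlCharset {

    // Turns "%E4%B8%AD"-style escapes into raw bytes and copies other
    // characters as single bytes. Assumes the input is ASCII apart from the
    // escapes. '+' is kept literally because it is a plus sign in URL paths.
    static byte[] percentDecodeToBytes(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0x7F) {
                throw new IllegalArgumentException("non-ASCII input at index " + i);
            }
            if (c == '%' && i + 2 < s.length()
                    && Character.digit(s.charAt(i + 1), 16) >= 0
                    && Character.digit(s.charAt(i + 2), 16) >= 0) {
                out.write(Integer.parseInt(s.substring(i + 1, i + 3), 16));
                i += 2; // skip the two hex digits
            } else {
                out.write(c);
            }
        }
        return out.toByteArray();
    }

    // Decodes the bytes, throwing instead of inserting U+FFFD on bad input.
    static String decodeStrict(byte[] bytes, Charset cs) throws CharacterCodingException {
        CharsetDecoder decoder = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    // Tries UTF-8 first; if the bytes are not valid UTF-8, tries GB18030.
    public static String decodeUrl(String url) {
        byte[] bytes = percentDecodeToBytes(url);
        try {
            return decodeStrict(bytes, StandardCharsets.UTF_8);
        } catch (CharacterCodingException notUtf8) {
            try {
                return decodeStrict(bytes, Charset.forName("GB18030"));
            } catch (CharacterCodingException e) {
                throw new IllegalArgumentException("neither UTF-8 nor GB18030", e);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeUrl("%E4%B8%AD%E6%96%87")); // UTF-8 bytes for 中文
        System.out.println(decodeUrl("%D6%D0"));             // GB18030 bytes for 中
    }
}
```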
 
LVL 47

Expert Comment

by:for_yan
ID: 34958698
I'm not sure; you can give it a try. Are Chinese characters allowed in the file names?
 

Author Comment

by:wsyy
ID: 34962294
Yes, Chinese characters are allowed in the file names.
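Since Chinese characters are allowed in the target file names, an alternative to percent-encoding is to keep the text as it is and replace only the characters that the file systems reject. The sketch below assumes the Windows reserved set (`\ / : * ? " < > |` plus control characters). That set also covers `/` and NUL, the only characters Ubuntu forbids. The class name and the `_` replacement are hypothetical, and the sketch does not handle Windows reserved device names such as `CON` or trailing dots and spaces.

```java
import java.util.regex.Pattern;

public class SafeFileName {

    // Characters Windows forbids in file names, plus ASCII control
    // characters (which include NUL, forbidden on Ubuntu as well).
    private static final Pattern RESERVED = Pattern.compile("[\\\\/:*?\"<>|\\x00-\\x1F]");

    // Replaces each reserved character with '_' and leaves everything else,
    // including Chinese characters, untouched.
    public static String toFileName(String text) {
        return RESERVED.matcher(text).replaceAll("_");
    }

    public static void main(String[] args) {
        System.out.println(toFileName("http://example.com/中文/page?id=1"));
        // prints: http___example.com_中文_page_id=1
    }
}
```

Unlike percent-encoding, this mapping cannot be reversed, and two different URLs can produce the same file name. If the incoming URL still contains %xx escapes for Chinese text, decode them first with the charset they were written in, so that the characters appear literally in the name.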
 
LVL 47

Expert Comment

by:for_yan
ID: 34962321
Then just try both ways. I can't try it myself because I don't have Chinese characters available.
 
LVL 20

Expert Comment

by:Sathish David Kumar N
ID: 34964899
Use Big5.
 
LVL 20

Expert Comment

by:Sathish David Kumar N
ID: 34964909
Big5