Luiza1
asked on
How to replace %20 to "-" in Dreamweaver only in the internal links?
I have a problem in Dreamweaver with documents that have blank spaces in their names. Dreamweaver puts %20 in the links to those documents and then no longer recognizes those links: it reports the links as broken and the files as orphaned. This is a problem because as soon as you move the document, Dreamweaver does not update the link, and then the link really stops working on the page. What is the best solution for this? I would like to replace every %20 in the internal links with "-". How can I do this in one go without changing %20 in external links?
example of internal link to be modified:
<a href="_doc/_doc/grants/grants_after_revis07/conv_cadre_partenariat_EN_no%20modif%20LS_TC%20version.DOC">[en]</a>
search and replace across the entire site, and when you come upon a match, change it to what you want. you could use a regular expression like (.)%20(.), or just searching for %20 should work also...
ASKER
Yes there is a find and replace option in Dreamweaver, I know that already. But we have a large website with over 4000 pages, and as I cannot do a replace all because the external links have to stay unchanged, this would mean that I have to go page by page, link by link. I am looking for a way to select only broken internal links and then do a replace all. Can anyone tell me how to do this?
example of external link NOT to be modified by the replace:
<A class=bleulien href="https://intracomm.cec.eu.int/budg/budgacc/en/accounting-modern/implementationDG/transition/guarantees%20Note%20for%20Balance%20Validation.doc"></A>
yes, again, you could use a regular expression that only changes %20 when it's found in a relative address, as opposed to an address that starts with http: or https: (if these signify external sites...). or, if you're using absolute addresses for internal links as well, search for a pattern that includes your site name, e.g. https://mysite/(.)%20(.)
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
ok, but what exactly does a regular expression that matches only relative links look like?
if you don't know how to write regular expressions, I will help you, but you must be able to identify a pattern that differs between internal and external links; otherwise, whatever method you choose will be page by page.
ASKER
Ok, I realise we must find a pattern. I heard of regular expressions before but I just don't see the pattern. Let's try with some examples perhaps.
External (not to be modified)
https://intracomm.cec.eu.int/budg/budgacc/en/accounting-modern/implementationDG/transition/guarantees%20Note%20for%20Balance%20Validation.doc
https://intracomm.cec.eu.int/budg/budgacc/fr/comptabilite/manual%20comptable/version%20html/Fiches%20du%20manuel/sommaire-immob-incorp.htm
Faulty links (%20 not needed at all, to be deleted)
../../leg/ir/leg-030-25_ir2003_en.html#93%20%20
Internal links (%20 to be modified in "-")
_doc/_pdf/D%2051%20du%2012012jan07_controles_comptables.pdf
_doc/_doc/closure2007/DG%20general%20closure%20instructions%202007.doc
_doc/2006/lignes%20directrices_de.doc
the _doc folders exist in many subfolders across the entire site. Normally we have a rule to keep all documents of each section in its own _doc subfolder, but sometimes people forget to do that.
Is this information clear enough, can you create a regular expression from this?
well, you may have to do two search and replaces...
one for each of these patterns:
[.][.]/(.)%20(.)
_doc(.)%20(.)
if i'm correct, (.) searches for any character...may need a + behind it
can you test for one of your internal links using that?
ASKER
Ok, sorry but I got no results. I tried the first one: [.][.]/(.)%20(.)
and I did not get any results. I used the page with a lot of faulty links:
../../leg/ir/leg-030-13_ir2003_fr.html#43%20%20
Then I tried _doc(.)%20(.) and got no results either.
The problem with (.) is that it stands for any character, but for exactly one character and no more. We need something that matches any character repeated a varying number of times. Do you know if something like that exists in regular expressions?
so as my earlier post stated, you may need a + behind the (.)...
[.][.]/(.)+%20(.)+
hopefully we get something this time
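To illustrate the difference the + makes, here is a minimal sketch in Python rather than in Dreamweaver's find dialog. It assumes the two regex engines treat these simple patterns the same way; the sample is the faulty link quoted above.

```python
import re

# Sample from the thread: a faulty relative link ending in two %20.
link = "../../leg/ir/leg-030-13_ir2003_fr.html#43%20%20"

# (.) matches exactly one character, so this pattern needs "../", then ONE
# character, then "%20". No such spot exists in the link, so nothing matches.
one_char = re.search(r"[.][.]/(.)%20(.)", link)

# (.)+ matches one or more characters, so any amount of text may sit
# between "../" and "%20".
one_or_more = re.search(r"[.][.]/(.)+%20(.)+", link)

print(one_char)                 # None
print(one_or_more is not None)  # True
```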
ASKER
Ok, with + it finds them all.
_doc(.)+%20 also works well.
So we are getting the right results, but how do we do a replace all now? The search selects the whole link, from _doc up to %20, and we only need to replace the %20 part.
good question...lol...well now we have to use the regular expression substitution method...this could be tricky...
ASKER
Yes, I agree, this whole operation is tricky. Couldn't Dreamweaver provide a simpler way? I know there is a link checker option; if there was a way to select only the internal broken links there and then do a replace all, that would be ideal. Do you know if something like this could be possible?
ASKER CERTIFIED SOLUTION
SOLUTION
ASKER
Ok, that's very good. I tried it and it works. There is still one little problem. When a link contains multiple %20, only the last one is replaced and the rest stay unmodified.
Example:
_doc/_doc/grants/grants_after_revis07/conv_sub_fonct_EN_with%20modif%20LS_TC%20version.DOC
will become:
_doc/_doc/grants/grants_after_revis07/conv_sub_fonct_EN_with%20modif%20LS_TC-version.DOC
This has to do with the search (_doc(.)+)(%20) that selects the whole link until the last %20
Any ideas how to solve this so that all %20 are immediately replaced?
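The behaviour described here is what a greedy + produces: (.)+ grabs as much as it can, so the trailing (%20) can only ever be the last one in the link. The small Python sketch below reproduces the result and shows one possible single-pass alternative: match each whole link, then replace every %20 inside it. Assumptions of the sketch, not taken from the thread: Dreamweaver's engine backtracks the same way, Python's \1 stands in for Dreamweaver's $1, and the helper name `dash_all` and the `[^"\s]` link boundary are illustrative choices.

```python
import re

link = "_doc/_doc/grants/grants_after_revis07/conv_sub_fonct_EN_with%20modif%20LS_TC%20version.DOC"

# Greedy (.)+ runs to the end of the link and backs up only as far as the
# LAST %20, so a single pass replaces just that one.
greedy = re.sub(r"(_doc(.)+)(%20)", r"\1-", link)

def dash_all(match):
    """Replace every %20 inside one matched link (hypothetical helper)."""
    return match.group(0).replace("%20", "-")

# Alternative sketch: match the whole link first, then fix it in a callback.
one_pass = re.sub(r'_doc[^"\s]*', dash_all, link)

print(greedy)
print(one_pass)
```

Whether something similar can be done inside Dreamweaver's own replace field is a separate question; the callback form needs a scripting language.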
ok...i didn't take multiple %20 into account...back to the drawing board again...
if i'm not mistaken, try this on one line
[(_doc(.)+)(%20)]*
and see if it picks up anything in the search
if it does, try a replace on one page with the $1TheTextWeWant <---obviously you change that to what you were using... and see what it does
[(_doc(.)+)(%20)]+ use plus instead of *
ASKER
No, I get no results with that one.
ASKER
No, with this one [(_doc(.)+)(%20)]+ I get all the characters on the page selected that include d, o, c...
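A likely explanation for this result: in most regex dialects, square brackets create a character class, so [(_doc(.)+)(%20)] does not group anything. It matches any single character from the set ( _ d o c . + ) % 2 0. A minimal Python sketch, assuming Dreamweaver's engine reads brackets the same way, shows the effect on a made-up sample string:

```python
import re

# Made-up sample for illustration only.
sample = 'see <a href="_doc/a%20b.doc">doc</a>'

# Inside [...] the parentheses, dot and plus are literal set members, so this
# pattern matches runs of the characters ( _ d o c . + ) % 2 0 anywhere.
hits = re.findall(r"[(_doc(.)+)(%20)]+", sample)
print(hits)  # ['_doc', '%20', '.doc', 'doc']
```

Note that the plain word "doc" in the link text is matched too, which fits the report that every d, o and c on the page was selected.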
uhhhmmmm....I'll have to review my regular expression...unfortunately i don't have dreamweaver to test the RE, but i shall return with a solution.
ASKER
Ok, thank you for your help so far. I shall check back tomorrow, hopefully to find a solution to this tricky problem.
gotcha...
I have an idea that might work, but first can you confirm the following?
All external links are http links. (Do you have any ftp: mail: etc...)
I need to craft it first, but the idea would be to build a command file that loops through all link nodes and checks whether they are http links: if they are, it ignores them; if not, it replaces all %20 with one space.
ASKER
Ok, but not all external links are http links. Some just link to another intranet site within our company, and those use that site's folder names. Here is what our external links start with:
http:
https:
mailto:
/home/
/../home/
/sg_vista/
/security/
Does this help or is it too much to use it in the find command?
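One way to use this list outside the find dialog is a small script that looks at each href value, skips anything starting with one of these prefixes, and rewrites the rest. The Python sketch below makes several assumptions that are not from the thread: href values are double-quoted, the function names are invented for illustration, and a trailing run of %20 (as in the faulty links quoted earlier) is simply dropped. It works on a string; reading and writing the site's files is left out.

```python
import re

# Prefixes the asker listed as marking external links.
EXTERNAL_PREFIXES = (
    "http:", "https:", "mailto:",
    "/home/", "/../home/", "/sg_vista/", "/security/",
)

# Assumption: href values are written with double quotes.
HREF = re.compile(r'(href=")([^"]*)(")', re.IGNORECASE)

def is_external(url):
    """True if the URL starts with one of the listed external prefixes."""
    return url.lower().startswith(EXTERNAL_PREFIXES)

def fix_url(url):
    """Drop trailing %20 runs, then turn the remaining %20 into '-'."""
    url = re.sub(r"(?:%20)+$", "", url)
    return url.replace("%20", "-")

def rewrite_hrefs(html):
    """Rewrite %20 in internal href values only; external ones stay as-is."""
    def repl(m):
        url = m.group(2)
        if is_external(url):
            return m.group(0)
        return m.group(1) + fix_url(url) + m.group(3)
    return HREF.sub(repl, html)

page = (
    '<a href="_doc/2006/lignes%20directrices_de.doc">de</a> '
    '<A class=bleulien href="https://intracomm.cec.eu.int/budg/budgacc/en/'
    'accounting-modern/implementationDG/transition/'
    'guarantees%20Note%20for%20Balance%20Validation.doc"></A> '
    '<a href="../../leg/ir/leg-030-25_ir2003_en.html#93%20%20">ir</a>'
)
print(rewrite_hrefs(page))
```

The links only keep working if the files themselves are renamed to match, so a script like this would have to run alongside the file renaming.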
I just got in to work...I'm back on the case...i can finish reg exp this morning...
I feared that might be the case Luiza. My idea won't work then. I bet you Silemone will come through with a home run though! Good luck.
trying...
ASKER
Hi, how are you doing? Have you made any progress so far on a way to replace multiple %20 at once?
I also have a small question concerning the current regular expression: (_doc(.)+)(%20)
It works fine as long as the different links are separated by a space or enter. But as soon as there are 2 links right next to each other, they will be both selected together by the find, example:
<p> [<a href="_doc/_doc/grants/grants_after_revis07/specificactions.DOC">en</a>]</p> <p><a href="_doc/_doc/grants/grants_after_revis07/second%20grant%20payment.DOC">[en]</a></p>
I don't know if that can cause problems, so I'm just letting you know. What do you think?
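The two links are picked up together because . matches any character except a line break, including the quote marks and tags between the links. One common way to keep a match inside a single attribute value is to swap . for [^"], meaning any character except a double quote. A Python sketch of the difference, assuming double-quoted href values and an engine that behaves like Dreamweaver's for these patterns:

```python
import re

html = ('<p> [<a href="_doc/_doc/grants/grants_after_revis07/specificactions.DOC">en</a>]</p> '
        '<p><a href="_doc/_doc/grants/grants_after_revis07/second%20grant%20payment.DOC">[en]</a></p>')

# With ".", the match starts in the first link and ends in the second one.
spanning = re.search(r"(_doc(.)+)(%20)", html).group(0)

# With [^"], the match cannot cross the closing quote of an href value,
# so it stays inside the one link that actually contains %20.
confined = re.search(r'(_doc[^"]+)(%20)', html).group(0)

print(spanning.count("href"))  # 1: the match swallowed the second link's href=
print(confined)
```

With the spanning form, a replace that keeps $1 leaves the text in between unchanged, but a confined match is easier to reason about.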
SOLUTION
ASKER
Yes I tried that on a few pages and it seemed to work fine, but I did have to run it 5 times in total. There are a lot of pages in total, so it will take a lot longer if I have to do it 5 times. Are you sure there isn't a better way to replace them all at once, like maybe by using a windows script? That's how I managed to remove all the spaces from the documents names. Do you think using Dreamweaver is the best option here?
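Repeating the same replace works because each pass turns the last remaining %20 of every matched link into a dash, so the number of passes needed is the largest number of %20 found in any one link. A minimal Python sketch of that loop follows; it uses Python's \1 where Dreamweaver writes $1 and a [^"] pattern so that each link is matched on its own. Both are assumptions of this sketch, as is the helper name.

```python
import re

# Each pass replaces only the LAST %20 of every link the pattern matches.
LAST_SPACE = re.compile(r'(_doc[^"]*)%20')

def replace_until_stable(text):
    """Repeat the single replace until a pass changes nothing (hypothetical helper).

    Returns the final text and the number of passes that made a change.
    """
    passes = 0
    while True:
        text, changed = LAST_SPACE.subn(r"\1-", text)
        if changed == 0:
            return text, passes
        passes += 1

html = ('<a href="_doc/_doc/grants/grants_after_revis07/'
        'conv_sub_fonct_EN_with%20modif%20LS_TC%20version.DOC">[en]</a>')
fixed, passes = replace_until_stable(html)
print(passes)  # 3, one pass per %20 in the link
print(fixed)
```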
ASKER
thank you very much for the solution, it's just a shame we could not find a way to immediately replace all the %20 at once.
Agreed...I'm sorry I couldn't find more time to complete. Even my friend who's a regular expression guru was having problems with it. I apologize for not being able to complete the task completely.
And yes, using a Windows script or a programming language would have been awesome. I just thought we were restricted to Dreamweaver's tools. That would have been ten times easier. I didn't know you were a coder. But at least you did learn how to use regular expressions in Dreamweaver. Again, my apologies.
ASKER
Sure no problem. That's a great idea, it's still not too late. I have to perform this at the end of this week so there is still time to find a better solution. I will launch a new question asking for a windows script solution. Thanks.