We help IT Professionals succeed at work.

Extract data from URL

fredericgilbert
on
Hi,

Objective: extract data stored in an URL

a)URL: http://www.site.com/folder/file?variableA=A1A2A3B9B7B6C4C5C6&variableB=Z1Z2Z3X9X7X6Y4Y5Y6&startDate=2011-11-01&endDate=2011-11-30

Data to extract from:
- name after the "?"  
- variableA
- variableB
- startDate
- endDate

Values to return:
- file
- A1A2A3B9B7B6C4C5C6
- Z1Z2Z3X9X7X6Y4Y5Y6
- 2011-11-01
- 2011-11-30

Note: the values except the dates can be of different length

Thanks very much in advance for the time you take to review this question

fg
  20111206-EE-extractFromURL.xlsx
Comment
Watch Question

Most Valuable Expert 2013

Commented:
Hello fredericgilbert, see attached formula solution with this formula in B4

=TRIM(RIGHT(SUBSTITUTE(LEFT(A1,FIND("?",A1)-1),"/",REPT(" ",99)),99))

the following formula in B5 copied to B6

=REPLACE(LEFT(A$1,FIND("&",A$1,FIND(A5,A$1))-1),1,FIND(A5,A$1)+LEN(A5),"")

and this formula in B7 copied to B8

=MID(A$1,FIND(A7,A$1)+LEN(A7)+1,10)

Note: I removed the ":" from A7 and A8.....

regards, barry
27481294.xlsx
Top Expert 2010
Commented:
Another approach, assuming the first URL is in a1.

1) Add the regexpfindsubmatch function from my article http://www.experts-exchange.com/A_1336.html

2) Use a formula like this:

=regexpfindsubmatch(a1,"(variableA=)([^&]+)",1,2)

That approach works for variableA, variableB, startDate, and endDate. For "name before?":

=regexpfindsubmatch(a1,"/([^/?]+)\?",1,1)
Hi

Another UDF..

Function XTRACT(r As Range, Pos As Long) As String
    Dim x
    x = Split(r.Value, "=")
    XTRACT = Split(x(Pos), "&")(0)
End Function

In B2 and copied across,

=XTRACT($A2,COLUMNS($B2:B2))

Kris

Author

Commented:
Hi all,

first, thanks to all for your time.
I have tested your solutions.

Here are my takes:

To barryhoudini: your solution works OK but requires too much steps to fit into a production environment.  I need simplicity when getting things done. Don't get me wrong, you are an expert and I am not. Thanks for your contribution.

To krishnarkc: the ultimately simple solution. However, in this case, simple means less precise.
If I need to extract only one specific element from a very long url, your function requires to extract everything at first.
Nevertheless, I will keep this one handy as it is one of most simple excel function for "quick and dirty" extraction purposes I came across. Kudo!

To matthewspatrick: perfection is always self-reinforcing. Armed with your provided solution and explanations, I will be able to adapt it to many usages in the future. Also, thanks for the time you took to build that in-depth article. You earned my vote and I accept your solution.  

FG

Author

Commented:
Hi MatthewsPatrick,

Well done!
Also, I will study carefully the article in reference.

Thx

FG
Hi,

BTW,

you could try

=XTRACT($A2,2) for second element

=XTRACT($A2,4) for 4th

Most Valuable Expert 2013

Commented:
Hello FG,

Appreciate your honest feedback - I suspected that you would want a more streamlined approach - Patrick is your man for regex solutions as you correctly identified - added mine as a possible option...and I enjoyed the challenge.

Yours is a model approach to question asking, I think. Concise, clear question, with examples (a crucial element that many miss) and a considered evaluation of the suggested solutions. If I could give points for questions rather than answers you'd get the maximum here! Thank you

regards, barry