AlexPonnath asked:

How can I find all input names and the corresponding values in a text file?

I need to parse field names and values from an HTML form so I can add them to my database. I know I can do a find for
"input name='", then a second find for the closing "'", pull the name out with the Mid function, and then do the same
for the value by finding "value='".
Is there an easier way to loop through the document and extract all input names and their associated values?

Below is a sample of the page I need to parse:

<input name='a_glare'
                        value='B'
                        class='inputbox-highlighted-false'
                        size='1'
                        maxlength='1'>  
        </td>



                 <td align="center">


                    <input name='a_testani'
                        value=''
                        class='inputbox-highlighted-false'
                        size='1'
                        maxlength='1'>  

                 </td>

                 <td align="center">

                    <input name='a_tksig'
                        value='EC'
                        class='inputbox-highlighted-false'
                        size='2'
                        maxlength='2'>  


                 </td>

                 <td align="center">

                    <input name='a_sacnon'
                        value=''
                        class='inputbox-highlighted-false'
                        size='1'
                        maxlength='1'>  

                 </td>

                 <td align="center">

                    <input name='a_ot'
                        value=''
                        class='inputbox-highlighted-false'
                        size='1'
                        maxlength='1'>  

                 </td>


                 <td align="center">

                    <input name='a_ovlp'
                        value=''
                        class='inputbox-highlighted-false'
                        size='1'
                        maxlength='1'>  
Ian
Hi there AlexPonnath,

If you can run regular expressions under program control, you just need to iterate over the HTML page, applying the first pattern, name=('|")(\w*)\1\s*value=('|")(\w*)\3.*$, and take sub-match 2 and sub-match 4 from each result. Note that, depending on the routines, a match number 0 is usually returned as well, which is the whole matched text (in addition to matches 1, 2, 3 and 4).
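As a rough illustration of that loop, here is a minimal sketch in Python. Python is my assumption, since the thread never says which language the code is in. The function name extract_inputs and the shortened sample are made up for the example. The re.MULTILINE flag is also an addition: in Python's re module, "$" only matches at the end of each line when that flag is set, and without it the trailing ".*$" would fail on every line but the last.

```python
import re

# The pattern from the post. re.MULTILINE is an assumption needed for Python:
# it lets "$" match at the end of every line, not only at the end of the text.
PATTERN = re.compile(r"""name=('|")(\w*)\1\s*value=('|")(\w*)\3.*$""", re.MULTILINE)


def extract_inputs(html):
    """Return a list of (name, value) pairs, one per pattern match."""
    # Sub-match 2 is the field name, sub-match 4 is the field value.
    return [(m.group(2), m.group(4)) for m in PATTERN.finditer(html)]


# Shortened version of the sample page from the question.
sample = """
<input name='a_glare'
       value='B'
       class='inputbox-highlighted-false'
       size='1'
       maxlength='1'>
<td align="center">
<input name='a_testani'
       value=''
       class='inputbox-highlighted-false'>
<input name='a_tksig'
       value='EC'
       size='2'>
"""

pairs = extract_inputs(sample)
for name, value in pairs:
    print(f"{name!r} -> {value!r}")
```

Keep in mind that \w* only covers letters, digits and underscore, so a value containing spaces or punctuation would not match this pattern as written.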

Ian
AlexPonnath (ASKER)

Thanks. I have UltraEdit, which supports regular expressions, and it works as advertised. Great job: I ended up with
exactly what you described after running the 3 passes over the file. Now I just have to figure out how to do this in my code.

I am a bit confused by your last comment about matches 2 and 4. I assume match 1 is the whole pattern match, name=('|")(\w*)\1\s*value=('|")(\w*)\3.*$, but I'm not sure about 2 and 4 or how I would access them.

Thanks
Sorry, I didn't explain the numbering scheme very well.

Each set of parens, from ( up to its matching ), is a kept match, numbered 1, 2, ...

The numbering follows the order of the left parens, so every kept match gets a unique number even when the parens are nested.

So in
name=  ('|")  (\w*)  \1  \s*  value=  ('|")  (\w*)  \3  .*  $
-----  =====  =====  --  ---  ------  =====  =====  --  --  -
         1      2                       3      4

The bits underlined with === are kept, in numbered sequence; the bits underlined with --- are not kept (except that the whole matched string is available as number 0).
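To see the numbering in action, here is a small sketch in Python (an assumed language, not one named in this thread). The nested pattern and the "size=12" input are invented purely to show how left-paren ordering numbers nested groups. The second part applies the pattern from the post to a single tag and collects matches 0 through 4. The re.MULTILINE flag is an addition that Python needs so "$" matches at line ends.

```python
import re

# Hypothetical nested pattern: numbering follows the left parens, so
# group 1 is the outer pair, group 2 the first inner, group 3 the second inner.
nested = re.compile(r"((\w+)=(\d+))")
m = nested.search("size=12")
groups_nested = [m.group(i) for i in range(4)]  # index 0 is the whole match

# The pattern from the post, applied to one tag written on a single line.
pattern = re.compile(r"""name=('|")(\w*)\1\s*value=('|")(\w*)\3.*$""", re.MULTILINE)
m2 = pattern.search("<input name='a_tksig' value='EC' size='2'>")
groups_post = [m2.group(i) for i in range(5)]   # matches 0, 1, 2, 3, 4

print(groups_nested)
print(groups_post)
```

In Python, group(0) plays the role of "number 0" described above, and group(2) and group(4) are the name and the value.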

For better documentation you will need to search the web; there is plenty written about regular expressions. Just be warned that the complicated bits can vary between implementations. The basic stuff is pretty much the same the world over!
If running under program control, I would just do the match

/name=('|")(\w*)\1\s*value=('|")(\w*)\3.*$/

and retrieve sub-match 2 and sub-match 4, iterating over the whole source HTML document.

[[[ Program functions often want the pattern enclosed in / and /, as I have done here. Read the notes on the PCRE functions you will use to see what they expect. ]]]


The replacement

\n#$2\t$4
and successive matches
^[^#].*$\R
and
\R\R

were only there because an editor gives you no other storage for what you find. Under program control you can just pick the matches off and store them in an array or whatever.
I'm not sure what form the result of the PCRE function you use will take.

Maybe it will return an array of strings.

X[0]  ->  the string which matches the whole name= ... value= ... bit
X[1]  ->  single/double quote
X[2]  ->  <name>
X[3]  ->  single/double quote
X[4]  ->  <value>

So for each iterated match, just get the returned X, save X[2] and X[4], and throw away the rest.
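Here is one way that X layout could look in practice, sketched in Python (an assumption; your own PCRE routines may return results differently). The function name collect_fields and the two-tag sample are invented for the example, and re.MULTILINE is added so "$" matches at each line end. Each match is turned into a list x laid out like the X above, and only x[2] and x[4] are kept, in a dictionary ready to be written to the database.

```python
import re

# Pattern from the post; re.MULTILINE is an assumption needed in Python.
PATTERN = re.compile(r"""name=('|")(\w*)\1\s*value=('|")(\w*)\3.*$""", re.MULTILINE)


def collect_fields(html):
    """Map each input name to its value, keeping only sub-matches 2 and 4."""
    fields = {}
    for m in PATTERN.finditer(html):
        # x[0] = whole match, x[1] = quote, x[2] = name, x[3] = quote, x[4] = value
        x = [m.group(0), *m.groups()]
        fields[x[2]] = x[4]  # a repeated name overwrites the earlier value
    return fields


# Hypothetical input mixing single and double quotes.
html = """<input name='a_glare'
    value='B'
    size='1'>
<input name="a_ot" value="">
"""

fields = collect_fields(html)
print(fields)
```

A dictionary is used here only for convenience; a list of pairs would work just as well if the same input name can appear more than once.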
