Solved

How can I find all input names and the corresponding values in a text file?

Posted on 2014-11-13
Last Modified: 2014-11-29
I need to parse field names and values from an HTML form to add to my db. I know I can do a find for
 "input name='", then another find for the closing "'" and get the data via the Mid function, then do the same
 for the value via a find for "value='".
 I was wondering if there is an easier way to loop through the doc and extract all input names and the associated values?

 Below is a sample of what the page I need to parse looks like:

<input name='a_glare'
                        value='B'
                        class='inputbox-highlighted-false'
                        size='1'
                        maxlength='1'>  
        </td>



                 <td align="center">


                    <input name='a_testani'
                        value=''
                        class='inputbox-highlighted-false'
                        size='1'
                        maxlength='1'>  

                 </td>

                 <td align="center">

                    <input name='a_tksig'
                        value='EC'
                        class='inputbox-highlighted-false'
                        size='2'
                        maxlength='2'>  


                 </td>

                 <td align="center">

                    <input name='a_sacnon'
                        value=''
                        class='inputbox-highlighted-false'
                        size='1'
                        maxlength='1'>  

                 </td>

                 <td align="center">

                    <input name='a_ot'
                        value=''
                        class='inputbox-highlighted-false'
                        size='1'
                        maxlength='1'>  

                 </td>


                 <td align="center">

                    <input name='a_ovlp'
                        value=''
                        class='inputbox-highlighted-false'
                        size='1'
                        maxlength='1'>  
Question by:AlexPonnath
LVL 23

Expert Comment

by:Michael74
ID: 40441739
 
LVL 8

Accepted Solution

by:
ShannonEE earned 500 total points
ID: 40441741
Hi there AlexPonnath,

The easiest way is under program control, using a language which supports Perl Compatible Regular Expressions (PCRE).

I assume you are not familiar with any languages where you use PCRE (or you wouldn't have asked this question).

However, you can use a text editor to get the same results.

My suggestion for Windows is Notepad++, available from http://notepad-plus-plus.org/. You can't use Microsoft Notepad! Many other editors that do PCRE exist for OSX and *nix environments.

If using Notepad++, open the file containing the web page you want to analyse.

(Make sure the cursor is on the first line, before the first character.)

Then press Ctrl+H (or choose Search -> Replace from the menus).

In the "Find what" field enter the following line (EXACTLY as shown, no extra blanks!)
name=('|")(\w*)\1\s*value=('|")(\w*)\3.*$

In the "replace with" field enter the following line
\n#$2\t$4

Click on "Regular expression" at the bottom of the dialog box.

Click on "Replace All"

It will find all the   name -> value  pairs and put them on a line by themselves starting with a # character.

Now you have to get rid of all the remaining rubbish.


In the "Find what" field enter the following line
^[^#].*$\R

Clear the  "replace with"  field.

(Make sure the cursor is on the first line before the first character).
Click on "Replace all"

(removes all the lines that don't start with #)


In the "Find what" field enter the following line
\R\R

In the  "replace with"  field enter the following line
\n

(Make sure the cursor is on the first line before the first character).
Click on "Replace all". (You will need to do this a few times, until it tells you there were no replacements.)

(This removes all the blank lines)

You then have a file with one line with each name value pair as below
# <name> <tab char> <value>

What you do from there is up to your processing requirements.

(for example you can remove the # characters, you can add in text surrounding the name and values, etc.)

This assumes that both the name and the value are "words", that is, composed of A-Z, a-z, 0-9 and underscore. No blanks, no extras like   +-!@$%&   characters. The pattern would need adjusting if you want a different rule for the value and/or the name.

Ian



Explanation of the regular expression:
Note that blanks are significant in the real pattern; I have spaced it out here only to explain it.

name=('|")(\w*)\1\s*value=('|")(\w*)\3.*$
=>
name=    ('|")    (\w*)   \1    \s*   value=   ('|")    (\w*)    \3     .*    $
name=       - find these characters exactly
('|")             - next you MUST find either a single or a double quote. Remember that as match number 1
(\w*)           - next match successive word characters only, and keep matching while there are
                      word characters to match.  Remember that as match 2
\1                - next find exactly the match 1 character (either single or double quote)
\s*             -  next match as many white space chars as possible (blanks, tabs, CR, LF)
value=       - exactly match these characters
('|")            - again match a single or double quote. Remember as match 3
(\w*)          - again match word characters. Remember as match 4.
\3               - match the single/double quote found in match 3
.*               -  match any number (zero or more) of characters on the rest of the line.
$                - stop at the end of the line (but don't gobble it up)

For the replacement
\n#$2\t$4
=>
\n   #   $2   \t   $4
\n             - start a new line
#              - put in a hash character (this could be another marker if you wanted)
$2            - put in the second match (the name part).
\t             -  put in a tab character  (you could have a comma if you wanted)
$4            - put in the 4th match (the value part)

Note that in the replacement part the kept matches are written as $2 and $4, whereas inside the find pattern a back-reference is written as \1 or \3.

=======

Also
^[^#].*$\R
=>
^  [^#]  .*   $   \R
^          - start at the beginning of the line
[^#]     - find one character that is not a hash character
.*         - find as many "any" characters as possible on the same line
$          - match up to the end of the line
\R        - gobble up the end of line (\R matches any line ending: CR+LF, LF or CR)

====
 
LVL 8

Expert Comment

by:ShannonEE
ID: 40441770
Hi there AlexPonnath,

If you can get regular expressions going under program control, then you just need to iterate over the HTML page, applying the first pattern (name=('|")(\w*)\1\s*value=('|")(\w*)\3.*$) and selecting match 2 and match 4 from each result. Note that, depending on the routines, a match number 0 is usually returned as well, which is the text matched by the whole pattern (in addition to match 1, match 2, match 3 and match 4).

Ian
 

Author Comment

by:AlexPonnath
ID: 40441943
Thanks, I have ultraedit which supports the regular expressions and it works as advertised. Great job, I ended up with
exactly what you said after running the 3 passes over the file. Now I just have to figure out how I can do this in my code.

I am a bit confused by your last comment about match 2 and 4. I assume match 1 is the pattern match (name=('|")(\w*)\1\s*value=('|")(\w*)\3.*$), but I am not sure about 2 and 4 and how I would access them.

Thanks
 
LVL 8

Expert Comment

by:ShannonEE
ID: 40442013
Sorry, I didn't explain the numbering scheme very well.

Each set of parens ( up to matching ) is a kept match, numbered 1, 2, ...  

The numbering follows the order of the left parens, so that there is a way of uniquely numbering the matches even when they are nested.

So in
name=    ('|")    (\w*)   \1    \s*   value=   ('|")    (\w*)    \3     .*    $
-----    =====    =====   --    ---   ------   =====    =====    --     --    -
           1        2                            3        4

The bits underlined with === are kept, in numbered sequence; the bits underlined with --- are not kept (except that the whole string that is matched is available as number 0).

For better documentation you will need to search the web.  There is plenty of documentation about regular expressions.  Just be warned that the complicated bits can vary between implementations.  All the basic stuff is pretty much the same the world over!
 
LVL 8

Expert Comment

by:ShannonEE
ID: 40442021
If running under program control, I would just do the match

/name=('|")(\w*)\1\s*value=('|")(\w*)\3.*$/

and retrieve sub-match 2 and sub-match 4, iterating over the whole source HTML document.

[[[ Often the program functions want the string enclosed in  /  and  / as I have done here.  Read the notes on the PCRE functions you will use to see what it wants. ]]]
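As a rough sketch of that loop, here it is in Python (chosen only for illustration; the thread does not say which language will be used). The pattern is the one above; the function name find_inputs and the shortened sample are assumptions. Python takes the pattern as a plain string without the / delimiters, and needs re.MULTILINE so that $ matches at the end of each line.

```python
import re

# The pattern from the answer. re.MULTILINE makes $ match at the end of every
# line; without it, Python's $ only matches at the end of the whole text.
PATTERN = re.compile(r"""name=('|")(\w*)\1\s*value=('|")(\w*)\3.*$""", re.MULTILINE)

def find_inputs(html):
    """Return a dict mapping each input name (sub-match 2) to its value (sub-match 4)."""
    pairs = {}
    for m in PATTERN.finditer(html):
        pairs[m.group(2)] = m.group(4)
    return pairs

# Shortened version of the sample page from the question.
sample = """<input name='a_glare'
                        value='B'
                        class='inputbox-highlighted-false'
                        size='1'
                        maxlength='1'>
        </td>
                 <td align="center">
                    <input name='a_testani'
                        value=''
                        class='inputbox-highlighted-false'>
                    <input name='a_tksig'
                        value='EC'
                        size='2'>"""

for name, value in find_inputs(sample).items():
    print(f"{name}\t{value}")
```

A dict keeps only the last value if the same name appears twice; a list of tuples would preserve duplicates.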


The replacement

\n#$2\t$4
and successive matches
^[^#].*$\R
and
\R\R

were there because with an editor you don't have other storage to put found stuff. Under program control you can just pick the matches off and store in an array or whatever.
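As a side note, a regular expression is not the only option under program control. The sketch below swaps in a different technique, Python's standard html.parser module, which is not what was proposed above. It has the advantage of not caring about attribute order or quoting style. The class name InputCollector and the sample markup are made up for the example.

```python
from html.parser import HTMLParser

class InputCollector(HTMLParser):
    """Collects (name, value) for every <input> tag that has a name attribute."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        # The parser lower-cases tag and attribute names before calling this.
        if tag != "input":
            return
        attributes = dict(attrs)
        if "name" in attributes:
            # Assumption: a missing value attribute is stored as an empty string.
            self.pairs.append((attributes["name"], attributes.get("value") or ""))

# Hypothetical sample; the second tag has its attributes in a different order.
sample = """<td align="center">
    <input name='a_tksig'
        value='EC'
        class='inputbox-highlighted-false'
        size='2'
        maxlength='2'>
</td>
<td><input value='' size='1' name='a_ot'></td>"""

collector = InputCollector()
collector.feed(sample)
collector.close()
print(collector.pairs)
```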
 
LVL 8

Expert Comment

by:ShannonEE
ID: 40442029
I am not sure what form the result will take with the PCRE routines you end up using.

Maybe it will return an array of strings:

X[0]  ->  string which matches the whole name= ..... value= ...'   bit
X[1] ->   single/double quote
X[2]  ->  <name>
X[3]  ->  single/double quote
X[4]  ->  <value>

So for each iterated match, just get the returned X, save X[2] and X[4], and throw away the rest.
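For what it is worth, this is roughly how that array looks in Python, where a match object exposes the same numbering through group(). This is only a sketch: the variable X is built by hand to mirror the list above, and the trailing .*$ is left out of the pattern so that X[0] ends at the closing quote of the value.

```python
import re

# Same pattern as before, minus the trailing .*$ (an assumption made so that
# group 0 stops right after the value's closing quote).
PATTERN = re.compile(r"""name=('|")(\w*)\1\s*value=('|")(\w*)\3""")

tag = "<input name='a_tksig' value='EC' size='2'>"
m = PATTERN.search(tag)

# Build X[0] .. X[4] from the match object: group(0) is the whole match,
# group(1) .. group(4) are the kept sub-matches.
X = [m.group(i) for i in range(5)]
for i, part in enumerate(X):
    print(i, repr(part))

# Keep the name and the value, throw away the rest.
name, value = X[2], X[4]
```

Note that m.groups() would return only sub-matches 1 to 4, without the whole match.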
