Solved

How can I find all input names and the corresponding values in a text file?

Posted on 2014-11-13
Last Modified: 2014-11-29
I need to parse field names and values from an HTML form so I can add them to my DB. I know I can do a find for "input name='", then start another find for the closing "'", get the data via the Mid function, and then do the same for the value by finding "value='".
I was wondering if there is an easier way to loop through the document and extract all input names and their associated values?

Below is a sample of what the page I need to parse looks like:

<input name='a_glare'
                        value='B'
                        class='inputbox-highlighted-false'
                        size='1'
                        maxlength='1'>  
        </td>



                 <td align="center">


                    <input name='a_testani'
                        value=''
                        class='inputbox-highlighted-false'
                        size='1'
                        maxlength='1'>  

                 </td>

                 <td align="center">

                    <input name='a_tksig'
                        value='EC'
                        class='inputbox-highlighted-false'
                        size='2'
                        maxlength='2'>  


                 </td>

                 <td align="center">

                    <input name='a_sacnon'
                        value=''
                        class='inputbox-highlighted-false'
                        size='1'
                        maxlength='1'>  

                 </td>

                 <td align="center">

                    <input name='a_ot'
                        value=''
                        class='inputbox-highlighted-false'
                        size='1'
                        maxlength='1'>  

                 </td>


                 <td align="center">

                    <input name='a_ovlp'
                        value=''
                        class='inputbox-highlighted-false'
                        size='1'
                        maxlength='1'>  
Question by:AlexPonnath
 
LVL 23

Expert Comment

by:Michael74
 
LVL 8

Accepted Solution

by:
ShannonEE earned 500 total points
Hi there AlexPonnath,

The easiest way is under program control, using a language that supports Perl Compatible Regular Expressions (PCRE).

I assume you are not familiar with any language where you can use PCRE (or you wouldn't have asked this question).

However, you can use a text editor to get the same results.

My suggestion for Windows is Notepad++, available from http://notepad-plus-plus.org/. You can't use Microsoft Notepad! Many other editors that do PCRE exist for OS X and *nix environments.

If using Notepad++, open the file containing the web page you want to analyse.

(Make sure the cursor is on the first line, before the first character.)

Then press Ctrl+H (or choose Search -> Replace from the menus).

In the "Find what" field enter the following line (EXACTLY as shown, no extra blanks!)
name=('|")(\w*)\1\s*value=('|")(\w*)\3.*$

In the "replace with" field enter the following line
\n#$2\t$4

Click on "Regular expression" at the bottom of the dialog box.

Click on "Replace All"

It will find all the   name -> value  pairs and put them on a line by themselves starting with a # character.

Now you have to get rid of all the remaining rubbish.


In the "Find what" field enter the following line
^[^#].*$\R

Clear the  "replace with"  field.

(Make sure the cursor is on the first line before the first character).
Click on "Replace all"

(removes all the lines that don't start with #)


In the "Find what" field enter the following line
\R\R

In the  "replace with"  field enter the following line
\n

(Make sure the cursor is on the first line before the first character).
Click on "Replace all". (You will need to do this a few times, until it tells you there were no replacements.)

(This removes all the blank lines)

You then have a file with one line with each name value pair as below
# <name> <tab char> <value>

What you do from there is up to your processing requirements.

(for example you can remove the # characters, you can add in text surrounding the name and values, etc.)

This assumes that both the name and the value are "words", that is, composed only of A-Z, a-z, 0-9 and underscore: no blanks and no extra characters such as +-!@$%&. The pattern would need adjusting if you want a different rule for the value and/or the name.
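
As a rough illustration of such an adjustment, here is a minimal Python sketch. The looser pattern is an assumption of mine, not part of the steps above: it replaces (\w*) with ([^'"]*), which accepts any run of characters that are not quotes. The sample input is hypothetical.

```python
import re

# The original rule: name and value must be "words" (letters, digits, underscore).
STRICT = re.compile(r"""name=('|")(\w*)\1\s*value=('|")(\w*)\3""")

# Assumed looser variant: anything except a quote character is accepted.
LOOSE = re.compile(r"""name=('|")([^'"]*)\1\s*value=('|")([^'"]*)\3""")

# Hypothetical input whose value contains a blank and a "+".
html = "<input name='a_note'\n       value='E C+'\n       size='4'>"

strict_hit = STRICT.search(html)
loose_hit = LOOSE.search(html)

print(strict_hit)           # None: the blank and "+" are not word characters
print(loose_hit.group(2))   # a_note
print(loose_hit.group(4))   # E C+
```

A value that itself contains a quote character would still need a different rule.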

Ian



Explanation of the regular expression:
Note that blanks are significant; I expand the pattern here only to explain it.

name=('|")(\w*)\1\s*value=('|")(\w*)\3.*$
=>
name=    ('|")    (\w*)   \1    \s*   value=   ('|")    (\w*)    \3     .*    $
name=       - find these characters exactly
('|")             - next you MUST find either a single or a double quote. Remember that as match number 1
(\w*)           - next match successive word characters only (possibly none), and keep matching while
                      there are word characters to match.  Remember that as match 2
\1                - next find exactly the match 1 character (either single or double quote)
\s*             -  next match as many whitespace characters as possible, possibly none (blanks, tabs, CR, LF)
value=       - exactly match these characters
('|")            - again match a single or double quote. Remember as match 3
(\w*)          - again match word characters (possibly none). Remember as match 4.
\3               - match the single/double quote found in match 3
.*               -  match any number (zero or more) of characters on the rest of the line.
$                - stop at the end of the line (but don't gobble it up)

For the replacement
\n#$2\t$4
=>
\n   #   $2   \t   $4
\n             - start a new line
#              - put in a hash character (this could be another marker if you wanted)
$2            - put in the second match (the name part).
\t             -  put in a tab character (you could use a comma if you wanted)
$4            - put in the 4th match (the value part)

Note in the replacement part you need to use  $2 unlike  \2 that would be used in the find pattern.

=======

Also
^[^#].*$\R
=>
^  [^#]  .*   $   \R
^          - start at the beginning of the line
[^#]     - find one character that is not a hash character
.*         - find as many "any" characters as possible on the same line
$          - match up to the end of the line
\R        - gobble up the end of line (\R matches a line break such as CR+LF as one unit, or a lone LF or CR)

====
 
LVL 8

Expert Comment

by:ShannonEE
Hi there AlexPonnath,

If you can get regular expressions going under program control, then you would just need to iterate over the HTML page, applying the first pattern (name=('|")(\w*)\1\s*value=('|")(\w*)\3.*$) and selecting match 2 and match 4 from each result. Note that, depending on the routines, a match number 0 is usually returned as well, which is the text matched by the whole pattern (in addition to match 1, match 2, match 3 and match 4).
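
A minimal sketch of that iteration, assuming Python and its standard re module (the thread does not say which language will be used). One implementation detail: in Python, "$" only matches at the end of every line when re.MULTILINE is set. The input is a shortened copy of the sample page from the question.

```python
import re

# The pattern from the answer. re.MULTILINE makes "$" match at the end of
# each line, not only at the end of the whole string.
PATTERN = re.compile(r"""name=('|")(\w*)\1\s*value=('|")(\w*)\3.*$""", re.MULTILINE)

# Shortened copy of the sample page from the question.
html = """<input name='a_glare'
                        value='B'
                        class='inputbox-highlighted-false'>
<td align="center">
                    <input name='a_testani'
                        value=''
                        size='1'>
                    <input name='a_tksig'
                        value='EC'
                        maxlength='2'>
"""

# group(2) is the name and group(4) is the value; group(0) would be the whole match.
pairs = [(m.group(2), m.group(4)) for m in PATTERN.finditer(html)]
print(pairs)  # [('a_glare', 'B'), ('a_testani', ''), ('a_tksig', 'EC')]
```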

Ian
 

Author Comment

by:AlexPonnath
Thanks, I have ultraedit which supports the regular expressions and it works as advertised. Great job, I ended up with
exactly what you said after running the 3 passes over the file. Now I just have to figure out how I can do this in my code.

I am a bit confused on your last comment on match 2 and 4 , I assume match1 is the pattern match (name=('|")(\w*)\1\s*value=('|")(\w*)\3.*$) but not sure about 2 and 4 and how I would access them

Thanks
 
LVL 8

Expert Comment

by:ShannonEE
Sorry, I didn't explain the numbering scheme very well.

Each set of parens, from ( up to its matching ), is a kept match, numbered 1, 2, ...

The numbering follows the order of the left parens, so there is a way of uniquely numbering the matches even when they are nested.

So in
name=    ('|")    (\w*)   \1    \s*   value=   ('|")    (\w*)    \3     .*    $
----------    ===    ====   ---    ----    ---------   ===    ====    ---     ---    --
                   1         2                                       3          4

The bits underlined with === are kept in numbered sequence; the bits underlined with ---- are not kept (except that the whole string that is matched is available as number 0).

For better doco, you will need to search the web.  There is loads of doco about regular expressions.  Just be warned that the complicated bits can vary between implementations.  All the basic stuff is pretty much the same the world over!
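
To see the left-paren rule with nesting, here is a small Python sketch. The pattern is a hypothetical one made up for this illustration; it is not the pattern used for the extraction.

```python
import re

# Hypothetical pattern with nested parentheses, used only to show the numbering.
# Left parens in order: 1 = whole attribute, 2 = attribute name, 3 = quote, 4 = value.
m = re.search(r"""((\w+)=('|")(\w*)\3)""", "<input name='a_tksig'>")

for number in range(5):
    # group(0) is always the text matched by the whole pattern.
    print(number, repr(m.group(number)))
```

Group 1 opens first, so it takes number 1 even though it closes last.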
 
LVL 8

Expert Comment

by:ShannonEE
If running under program control, I would just do the match

/name=('|")(\w*)\1\s*value=('|")(\w*)\3.*$/

and retrieve sub-match 2 and sub-match 4, iterating over the whole source HTML document.

[[[ Often the program functions want the string enclosed in  /  and  / as I have done here.  Read the notes on the PCRE functions you will use to see what it wants. ]]]


The replacement

\n#$2\t$4
and successive matches
^[^#].*$\R
and
\R\R

were there because with an editor you don't have other storage to put found stuff. Under program control you can just pick the matches off and store in an array or whatever.
 
LVL 8

Expert Comment

by:ShannonEE
I am not sure what form the result of the PCRE routines you use would take.

Maybe it will return an array of strings.

X[0]  ->  string which matches the whole name= ..... value= ...'   bit
X[1] ->   single/double quote
X[2]  ->  <name>
X[3]  ->  single/double quote
X[4]  ->  <value>

so for each iterated match, just get the returned X, save X[2] and X[4], and throw away the rest.
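
As a sketch only: in Python the result is a match object rather than an array, but the same X can be built from it. The trailing .*$ is left out here because nothing after the value is needed once the matches are picked off in code. The input string is a hypothetical two-field sample.

```python
import re

PATTERN = re.compile(r"""name=('|")(\w*)\1\s*value=('|")(\w*)\3""")

# Hypothetical sample: one field with single quotes, one with double quotes.
html = "<input name='a_tksig'\n    value='EC'\n    size='2'>\n<input name=\"a_ot\" value=\"\">"

fields = {}
for m in PATTERN.finditer(html):
    # Build the array described above: X[0] is the whole match,
    # X[1]..X[4] are the four parenthesised sub-matches.
    X = [m.group(0), *m.groups()]
    fields[X[2]] = X[4]  # keep the name and the value, discard the rest

print(fields)  # {'a_tksig': 'EC', 'a_ot': ''}
```

A dictionary is just one choice of storage; a list of pairs works equally well if field names can repeat.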

.