regex matching - multiple lines

Posted on 2007-03-28
Last Modified: 2013-12-26
I've been scratching my head a bit on this one (regex isn't my strong suit).  I am assigning output from an executable to a variable in my script like so:

data=`$XM list -l | sed s/\(//g | sed s/\)//g`; #strip parens

I want to pull out lines matching:


from $data, but I can't figure out exactly how to go about that via a regex.  As the input is not coming in via a file, I don't know how to grab one line at a time and I can't save the output temporarily to a file as that is one of my restrictions.  There are multiple occurrences of the below block in the output I need to parse and I don't know beforehand how many there will be.

How can I take this info and parse it into an array holding the tokens?  The tokens being defined as the line beginning with domid and the line beginning with cpu_time.  I'm under the impression that bash does not have multi-dimensional arrays so I have to store these in 2 arrays, right?

    (domid 14)
    (uuid 04500ade-a703-23b7-e6f8-31e41c588c00)
    (vcpus 1)
    (cpu_weight 1.0)
    (memory 160)
    (shadow_memory 0)
    (maxmem 160)
    (features )
    (on_poweroff destroy)
    (on_reboot restart)
    (on_crash destroy)
            (kernel /home/users/
            (root '/dev/xvda1 ro')
            (backend 0)
            (script vif-bridge)
            (bridge xen-br0)
            (mac aa:00:79:64:33:ce)
            (backend 0)
            (dev xvda1:disk)
            (uname file:/home/users/
            (mode w)
            (backend 0)
            (dev xvda9:disk)
            (uname file:/home/users/
            (mode w)
    (state -b----)
    (shutdown_reason poweroff)
    (cpu_time 20205.5002826)
    (online_vcpus 1)
    (up_time 3047639.48897)
    (start_time 1171852894.06)
Question by:lomidien

Expert Comment

ID: 18807671
Your regex would be /(domid|cpu_time)/

Author Comment

ID: 18808017
The line that I'm parsing is the line above, but with 10-50 repetitions.  There are no line endings because I've stripped \n using sed.  How can I apply the above to tokenize entries of:

domid & cpu_time

and stuff them into an array(s) so that I can retrieve them like:

dom[n] & cpu[n]

It may be that I'm going about this wrong, I'm not sure, but I'm trying to put this in a bash script like so:

(irrelevant lines stripped)


XM="/usr/sbin/xm"; #location of xm executable

data=`$XM list -l | sed s/\(//g | sed s/\)//g`;

#i want to iterate over the input (like the above block) and pull
#substrings matching "domid NUMBER" and "cpu_time NUMBER" and 'tokenize'
#those into an array here


I hope that this is clear. I'm a Java programmer by nature and bash scripting isn't exactly my forte. :)
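
One way the iteration described above might look, as a minimal sketch rather than a definitive answer: read the command output line by line through process substitution, so no temporary file is needed and the command runs only once. The `xm_list` function is a hypothetical stand-in for `$XM list -l`, and the second domain in it is made-up sample data. The `+=` array append assumes bash 3.1 or later.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for "$XM list -l" so the sketch runs without Xen;
# the second domain (domid 15) is made-up sample data.
xm_list() {
    printf '%s\n' \
        '(domain' \
        '    (domid 14)' \
        '    (cpu_time 20205.5002826)' \
        ')' \
        '(domain' \
        '    (domid 15)' \
        '    (cpu_time 31.25)' \
        ')'
}

dom=()
cpu=()
# read splits each line into a key and a value; process substitution keeps
# the loop in the current shell, so the arrays survive after "done".
while read -r key value _; do
    case $key in
        domid)    dom+=("$value") ;;
        cpu_time) cpu+=("$value") ;;
    esac
done < <(xm_list | tr -d '()')

for i in "${!dom[@]}"; do
    echo "dom[$i]=${dom[$i]} cpu[$i]=${cpu[$i]}"
done
```

In a real script the last line of the loop would presumably read `done < <($XM list -l | tr -d '()')`. Two parallel arrays indexed by the same `n` play the role of the two-column structure asked about.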

Expert Comment

ID: 18809713
dom=(`$XM list -l | sed s/\(//g | sed s/\)//g | grep domid`)
cpu=(`$XM list -l | sed s/\(//g | sed s/\)//g | grep cpu_time`)

Author Comment

ID: 18814274

That works assuming that there is only 1 block like I posted.  The input consists of this block repeated as many as 50 times and I need to pull out each match.  I want to stuff the matching portions into an array where each element will hold a single matching entry.


Accepted Solution

ozo earned 500 total points
ID: 18817272
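dom=(`$XM list -l | sed s/\(//g | sed s/\)//g | grep domid`) does stuff the matching portions into an array, no matter how many blocks the output contains. Because the shell splits the output on whitespace, each word becomes its own element (domid, 14, domid, ...), so the values sit at the odd indices.
echo ${dom[0]} would show the first element; ${dom[*]} would show all of them.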
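
To end up with exactly one element per domain, a small variation on the commands above could print only the value field before the array is built. This is a sketch under assumptions: `xm_list` is a hypothetical stand-in for `$XM list -l`, the second domain is made-up sample data, and `tr` plus `awk` replace the `sed | sed | grep` chain.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for "$XM list -l" so the sketch runs without Xen;
# the second domain (domid 15) is made-up sample data.
xm_list() {
    printf '%s\n' \
        '(domain' \
        '    (domid 14)' \
        '    (cpu_time 20205.5002826)' \
        ')' \
        '(domain' \
        '    (domid 15)' \
        '    (cpu_time 31.25)' \
        ')'
}

# tr deletes the parentheses; awk prints only the value of matching lines,
# so word splitting yields one array element per domain.
dom=( $(xm_list | tr -d '()' | awk '$1 == "domid" {print $2}') )
cpu=( $(xm_list | tr -d '()' | awk '$1 == "cpu_time" {print $2}') )

for i in "${!dom[@]}"; do
    echo "domid=${dom[$i]} cpu_time=${cpu[$i]}"
done
```

This runs the listing command twice, once per array, which is simple but means the two arrays could disagree if a domain starts or stops between the calls.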

Expert Comment

ID: 18990291
# Assumes well formatted input
# awk writes shell assignments to a scratch file, which is then sourced
$XM list -l  | awk > /tmp/blah$$ '
    /^[ \t]*\(domid/ {
        gsub("\\(",""); gsub("\\)","");
        id="domid[" n++ "]=" $2;
    }
    /^[ \t]*\(cpu_time / {
        gsub("\\(",""); gsub("\\)","");
        # n was already incremented by the domid rule, so use n-1 here
        time="cpu_time[" (n-1) "]=" $2;
    }
    /^[ \t]*\(domain/ && id != "" {
        print id "; " time;
        id=""; time="";
    }
    END {
        if (id != "") {
            print id "; " time;
        }
    }
'
. /tmp/blah$$

echo ${domid[0]} etc.
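
Since the question rules out temporary files, the same awk idea could feed `eval` directly instead of writing to /tmp. The following is only a sketch: `xm_list` is a hypothetical stand-in for `$XM list -l`, the second domain is made-up sample data, and the numeric checks are an added assumption so that only plain numbers ever reach `eval`.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for "$XM list -l" so the sketch runs without Xen;
# the second domain (domid 15) is made-up sample data.
xm_list() {
    printf '%s\n' \
        '(domain' \
        '    (domid 14)' \
        '    (cpu_time 20205.5002826)' \
        ')' \
        '(domain' \
        '    (domid 15)' \
        '    (cpu_time 31.25)' \
        ')'
}

# awk emits one shell assignment per matching line; eval runs them in the
# current shell. Values that are not plain numbers are skipped.
eval "$(xm_list | awk '
    /^[ \t]*\(domid /    { gsub(/[()]/, ""); if ($2 ~ /^[0-9]+$/) printf "domid[%d]=%s\n", n++, $2 }
    /^[ \t]*\(cpu_time / { gsub(/[()]/, ""); if (n > 0 && $2 ~ /^[0-9.]+$/) printf "cpu_time[%d]=%s\n", n - 1, $2 }
')"

for i in "${!domid[@]}"; do
    echo "domid[$i]=${domid[$i]} cpu_time[$i]=${cpu_time[$i]}"
done
```

`eval` executes whatever text it is given, so the value checks matter; a `while read` loop avoids `eval` entirely and may be the safer choice when the input is not fully trusted.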
