Regular Expression for PICS

Posted on 2004-11-21
Last Modified: 2010-04-15

I have been trying (unsuccessfully) for a short time to write a regex to parse a RAT file, as used to load content rating information.

Forgetting the file header, the data is grouped like this:

(category (transmit-as "SS~~000") (name "Age Range")
   (min 1) (max 2) (label-only true) (integer true)
   (label
     (name "All Ages")
     (value 1))
   (label
     (name "Older Children")
     (value 2)))

(category
  (transmit-as "og")
  (name "Other Topics - Material that might be perceived as setting a bad example for young children.")
  (label
   (name "")
   (description "Do not allow access to sites that contain images, portrayals or descriptions that might be perceived as setting a bad example for young children.")
   (value 0) )
  (label
   (name "")
   (description "Allow access to sites that contain images, portrayals or descriptions that might be perceived as setting a bad example for young children.")
   (value 1) ))

(Yes, the categories can end with either ) )) or ))), just to make it fun.)

The above two categories are from different RAT files, but I would like a single expression (if at all possible) to handle either format.

I have a regex that matches Categories only, but I am stuck on one that will match the labels.  If this could include Capture-By-Name (?<name>) it would be good.
I would like to match:

<label> <name> <description> <value>
<label> <name> <description> <value>

I hope this makes sense. I'm being lazy here - I could spend another few hours on this, but it is Zzzz time - so I'm offering 500 points to the first person who supplies the required regex for me :-)

Many thanks
Question by: Si-clone

    Accepted Solution

    You might try something like:

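    using System.Text.RegularExpressions;

    public struct ARec {
        public string CATEGORY;
        public string LABEL;
        public string VALUE;
        public string DESCRIPTION;
    }

    // Inside the method that does the parsing; pic holds the text of the RAT file.
    // The "category" and NAME alternatives are assumed from how the loop below uses them.
    Regex regex = new Regex(
        @"category|name\s\""(?<NAME>[0-9A-Za-z\s\-\.\,]+)"
        + @"|description\s\""(?<DESCRIPTION>[0-9A-Za-z\s\-\.\,]+)|value\s"
        + @"(?<VALUE>[0-9A-Za-z\s\-\.]+)",
        RegexOptions.Multiline
        | RegexOptions.ExplicitCapture
        | RegexOptions.IgnorePatternWhitespace
        | RegexOptions.Compiled
        );

    MatchCollection matches = regex.Matches(pic);
    bool isCategory = false;
    bool isComplete = false;
    ARec[] recs = new ARec[10];   // fixed size; use a bigger array or a List<ARec> for larger files
    int count = 0;
    foreach ( Match match in matches ) {
        // A value closes the current record, so the next match starts a new one
        // (moving on with count++ is an assumption).
        if ( isComplete ) {
            isComplete = false;
            count++;
        }

        // The first name after "category" is the category name, not a label.
        if ( match.Value.IndexOf("category") >= 0 ) {
            isCategory = true;
        }

        if ( match.Groups["NAME"].Success ) {
            if ( isCategory ) {
                recs[count].CATEGORY = match.Groups["NAME"].Value;
                isCategory = false;
            } else {
                recs[count].LABEL = match.Groups["NAME"].Value;
            }
        } else if ( match.Groups["DESCRIPTION"].Success ) {
            recs[count].DESCRIPTION = match.Groups["DESCRIPTION"].Value;
        } else if ( match.Groups["VALUE"].Success ) {
            recs[count].VALUE = match.Groups["VALUE"].Value;
            isComplete = true;
        }
    }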

    This will give you an array of structures with the fields category, label, value and description. Of course, a class may serve your purpose better.
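
    If you would rather get each label back as a single match with named groups, something along these lines might work. It is only a sketch: it assumes every label is written as (label (name "...") (description "...") (value n)) with the description optional and the value an integer, and RatLabels, LabelPattern and Parse are names made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RatLabels
{
    // One match per (label ...) block. Assumes the order name, optional
    // description, value, and quoted strings that contain no quote characters.
    private static readonly Regex LabelPattern = new Regex(@"
        \(\s*label\s*
          \(\s*name\s+""(?<name>[^""]*)""\s*\)\s*
          (?:\(\s*description\s+""(?<description>[^""]*)""\s*\)\s*)?
          \(\s*value\s+(?<value>-?\d+)\s*\)",
        RegexOptions.IgnorePatternWhitespace);

    // Returns one { name, description, value } array per label found.
    public static List<string[]> Parse(string rat)
    {
        var labels = new List<string[]>();
        foreach (Match m in LabelPattern.Matches(rat))
        {
            labels.Add(new[] {
                m.Groups["name"].Value,
                m.Groups["description"].Value,   // empty string when there is no description
                m.Groups["value"].Value
            });
        }
        return labels;
    }
}
```

    The pattern only looks at the label itself, so it should not matter whether the category closes with ) )) or ))). It also skips (label-only true), because the word label has to be followed by an opening parenthesis.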


    Author Comment

    Thanks for your post.  This was exactly what I was looking for, and it also opened my eyes to other avenues.

