Solved

C# regular expression to parse MS Open XML fragment

Posted on 2012-04-04
Last Modified: 2012-06-27
I am looking for an accurate way to parse the following fragments; I am assuming a regex, or maybe an XML parser. This is for a C# application.

Examples:

"w:name w:val=\"CALC_OFFICEADDRFULL\"
"w:enabled w:val=\"true\"
"w:calcOnExit w:val=\"false\"
"w:type w:val=\"regular\"

Expected result (key/value pairs or something similar):

name,CALC_OFFICEADDRFULL
enabled,true
calcOnExit,false
type,regular

Much appreciated,

-Markus
Question by:markusr13

Expert Comment

by:wdosanjos
ID: 37805958
Please try the following:
var rxKey = new Regex(@"(?<=w:)\w+(?=\s)");
var rxValue = new Regex("(?<=w:val=\\\\\").+(?=\\\\\")");
var tests = new string[]
{
	"\"w:name w:val=\\\"CALC_OFFICEADDRFULL\\\"",
	"\"w:enabled w:val=\\\"true\\\"",
	"\"w:calcOnExit w:val=\\\"false\\\"",
	"\"w:type w:val=\\\"regular\\\""
};

foreach (var test in tests)
{
	Console.WriteLine("{0},{1}", rxKey.Match(test).Value, rxValue.Match(test).Value);
}

Output:
name,CALC_OFFICEADDRFULL
enabled,true
calcOnExit,false
type,regular

Expert Comment

by:wdosanjos
ID: 37806007
Here is another (faster) option with simple substrings:
var tests = new string[]
{
	"\"w:name w:val=\\\"CALC_OFFICEADDRFULL\\\"",
	"\"w:enabled w:val=\\\"true\\\"",
	"\"w:calcOnExit w:val=\\\"false\\\"",
	"\"w:type w:val=\\\"regular\\\""
};

foreach (var test in tests)
{
	var key = test.Substring(3, test.IndexOf(" ") - 3);

	int i = test.IndexOf("\\\"") + 2;
	var value = test.Substring(i, test.Length - i - 2);
	
	Console.WriteLine("{0},{1}", key, value);
}

Author Comment

by:markusr13
ID: 37807754
Sorry,

The debugger threw in the \'s (and I had an extra quote).

Try these instead:

w:name w:val="CALC_OFFICEADDRFULL"
w:enabled w:val="true"
w:calcOnExit w:val="false"
w:type w:val="regular"

-Markus

Accepted Solution

by:wdosanjos (earned 500 total points)
ID: 37807787
Not a problem.  There you go.

Option 1: (Regex)
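// Key: the word after "w:" that is followed by whitespace.
var rxKey = new Regex(@"(?<=w:)\w+(?=\s)");
// Value: the text between the quotes after w:val= (greedy, so it assumes one quoted value per line).
var rxValue = new Regex("(?<=w:val=\").+(?=\")");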

var tests = new string[]
{
"w:name w:val=\"CALC_OFFICEADDRFULL\"",
"w:enabled w:val=\"true\"",
"w:calcOnExit w:val=\"false\"",
"w:type w:val=\"regular\""
};

foreach (var test in tests)
{
    Console.WriteLine("{0},{1}", rxKey.Match(test).Value, rxValue.Match(test).Value);
}

Option 2: (Substring)
var tests = new string[]
{
"w:name w:val=\"CALC_OFFICEADDRFULL\"",
"w:enabled w:val=\"true\"",
"w:calcOnExit w:val=\"false\"",
"w:type w:val=\"regular\""
};

foreach (var test in tests)
{
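	// Key: skip the leading "w:" and read up to the first space.
	var key = test.Substring(2, test.IndexOf(" ") - 2);

	// Value: starts after the first quote and drops the trailing quote.
	int i = test.IndexOf("\"") + 1;
	var value = test.Substring(i, test.Length - i - 1);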
	
	Console.WriteLine("{0},{1}", key, value);
}
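
As a further sketch (not part of the options above), the key and the value can also be captured in one pass with a single anchored regex and named groups. The names FragmentParser, TryParse, key and value are made up for this illustration, and the pattern assumes each input holds exactly one fragment of the form w:KEY w:val="VALUE". Unlike the two options above, it reports a failed match instead of returning an empty string or throwing on unexpected input.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FragmentParser
{
    // Matches e.g.  w:name w:val="CALC_OFFICEADDRFULL"
    // [^"]* stops at the first closing quote, so the match cannot run past the value.
    private static readonly Regex Pattern =
        new Regex(@"^\s*w:(?<key>\w+)\s+w:val=""(?<value>[^""]*)""\s*$");

    // Returns false (and a default pair) when the fragment does not have the assumed shape.
    public static bool TryParse(string fragment, out KeyValuePair<string, string> pair)
    {
        Match m = Pattern.Match(fragment ?? string.Empty);
        pair = m.Success
            ? new KeyValuePair<string, string>(m.Groups["key"].Value, m.Groups["value"].Value)
            : default(KeyValuePair<string, string>);
        return m.Success;
    }
}

public static class Program
{
    public static void Main()
    {
        string[] tests =
        {
            "w:name w:val=\"CALC_OFFICEADDRFULL\"",
            "w:enabled w:val=\"true\"",
            "w:calcOnExit w:val=\"false\"",
            "w:type w:val=\"regular\""
        };

        foreach (string test in tests)
        {
            KeyValuePair<string, string> pair;
            if (FragmentParser.TryParse(test, out pair))
                Console.WriteLine("{0},{1}", pair.Key, pair.Value);
            else
                Console.WriteLine("no match: {0}", test);
        }
    }
}
```

For the four sample lines this prints the same key,value pairs as the expected result in the question.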

Author Comment

by:markusr13
ID: 37807789
Points increased due to my data error.
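
The question also mentions an XML parser as a possible route. A minimal sketch of that idea, assuming each fragment is the inside of a single element tag: wrap it in angle brackets, declare the w: prefix, and let LINQ to XML do the parsing. XmlFragmentParser and the namespace URI urn:example:w are placeholders invented for this example; in a real document the prefix is bound to whatever namespace the document declares.

```csharp
using System;
using System.Xml.Linq;

public static class XmlFragmentParser
{
    // Assumption: placeholder namespace, used only so the w: prefix resolves.
    private static readonly XNamespace W = "urn:example:w";

    // Turns  w:name w:val="X"  into ("name", "X").
    // Throws System.Xml.XmlException if the fragment is not well-formed.
    public static Tuple<string, string> Parse(string fragment)
    {
        // Build a self-closing element and declare the prefix on it.
        string xml = "<" + fragment.Trim() + " xmlns:w=\"" + W.NamespaceName + "\" />";
        XElement element = XElement.Parse(xml);

        XAttribute val = element.Attribute(W + "val");
        return Tuple.Create(element.Name.LocalName, val != null ? val.Value : null);
    }
}

public static class Program
{
    public static void Main()
    {
        string[] tests =
        {
            "w:name w:val=\"CALC_OFFICEADDRFULL\"",
            "w:enabled w:val=\"true\"",
            "w:calcOnExit w:val=\"false\"",
            "w:type w:val=\"regular\""
        };

        foreach (string test in tests)
        {
            Tuple<string, string> pair = XmlFragmentParser.Parse(test);
            Console.WriteLine("{0},{1}", pair.Item1, pair.Item2);
        }
    }
}
```

If the full Open XML document is available, parsing it as a whole is usually simpler than handling one tag at a time; the wrapper above is only meant for isolated fragments like the ones in this thread.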