C# Regex removing VB(A) comments wanted.

Let the value of the code variable be a fragment of VB(A) code such as:
const string code = @"
Option explicit 'comment
sub t
dim X 'comment with cont _
inuation _
end of comment
X = ""It's mine!"" ' inside "" is not a comment designation
end sub
";

The following code should remove all comments:
Console.WriteLine(Regex.Replace(code, <pattern I am asking about>, <null or $1 or what?>));

The result shall be as follows:
Option explicit 
sub t
dim X 
X = "It's mine!" 
end sub

I expect something working (a little bit) better than my:
Debug.Print(Regex.Replace(code, @"(?<![""].*)(?s)'.*?[^_](\r\n)", "$1"));
Asked by: midfde
 
midfde (Author) commented:
Excellent observations, various emotions, not enough (for me) evidence. Therefore I had to solve my problem with the following code:
using System;
using System.Text.RegularExpressions;
using System.Diagnostics;

namespace removeVBComments {
    class Program {
        const string VBA = @"
Option explicit 'comment
sub t
dim X 'comment with cont _
inuation _
end of comment
X = ""It's mine!"" ' inside "" is not a comment designation
end sub
";
        static void Main(string[] args) {
            string txt = VBA;
            Debug.Print("11)" + txt + Regex.Replace(txt, @"(?<!^[^""]*""(?:[^""]*""[^""]*"")*[^""]*)(?s)'.*?[^_](\r\n)", "$1"));
        }
    }
}

Its output is this:
11)
Option explicit 'comment
sub t
dim X 'comment with cont _
inuation _
end of comment
X = "It's mine!" ' inside " is not a comment designation
end sub

Option explicit 
sub t
dim X 
X = "It's mine!" 
end sub

Heed line (guess what?) 7 and line 13. The RE distinguishes the context of apostrophes.
I thank participants for a fruitful discussion that made me concentrate.  We all know that the easiest answer to any "Is it possible...?" question is "No!!!". Do we not?
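For readers who want to check the pattern above themselves, here is a minimal harness. The class name AcceptedPatternCheck and the extra `Y = 1` line are illustrative and do not come from the thread. The lookbehind rejects an apostrophe when an odd number of double quotes precedes it, and because `^` is used without RegexOptions.Multiline, the quotes are counted from the start of the whole input. As far as I can tell, this means a comment that itself contains an odd number of quotes (like the one on the `X = ...` line) flips the count for every later line, so a comment that follows it is left in place.

```csharp
using System;
using System.Text.RegularExpressions;

static class AcceptedPatternCheck
{
    // Pattern copied from the accepted answer above.
    const string Pattern =
        @"(?<!^[^""]*""(?:[^""]*""[^""]*"")*[^""]*)(?s)'.*?[^_](\r\n)";

    public static string Strip(string code)
    {
        return Regex.Replace(code, Pattern, "$1");
    }

    static void Main()
    {
        // Explicit CRLF, so the result does not depend on the source file's line endings.
        string sample = string.Join("\r\n", new[] {
            "X = \"It's mine!\" ' inside \" is not a comment designation",
            "Y = 1 ' second comment",   // hypothetical extra line, not in the thread
            ""
        });

        // The first comment is removed; the second one survives because
        // three double quotes (an odd number) precede its apostrophe.
        Console.Write(Strip(sample));
    }
}
```

If that behavior matters for real input, the scanner-style approach discussed later in the thread avoids it, since it resets its state at every line break.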
 
käµfm³d 👽 commented:
This is not a good problem to tackle with regex. You really need a parser to parse a grammar such as a programming language.
 
midfde (Author) commented:
No, I do not. I just want to know whether RE is powerful enough to solve the problem, hence my question.
 
käµfm³d 👽 commented:
It might be powerful enough, but the pattern would be very complicated.
 
Dan Craciun (IT Consultant) commented:
This
Debug.Print(Regex.Replace(code, @"'[\w\s""]*\r\n", "\r\n"))

works well with your sample data.

HTH,
Dan
 
midfde (Author) commented:
>>ID: 39880198 ...works well
No, it does not.

>>ID: 39880116: ...very complicated.
Something more specific than just emotions please?
 
Dan Craciun (IT Consultant) commented:
Tested my regex using RegexBuddy using .NET compatibility.
Where are you testing, so we can use the same software?
 
midfde (Author) commented:
Dan:
The following code
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.IO;
using System.Diagnostics;

namespace removeVBComments {
    class Program {
        const string VBA = @"
Option explicit 'comment
sub t
dim X 'comment with cont _
inuation _
end of comment
X = ""It's mine!"" ' inside "" is not a comment designation
end sub
";
        static void Main(string[] args) {
            string txt = VBA;
            Debug.Print("11)" + txt + Regex.Replace(txt,  @"'[\w\s""]*\r\n", "\r\n") + "-----");
        }
    }
}

results in
11)
Option explicit 'comment
sub t
dim X 'comment with cont _
inuation _
end of comment
X = "It's mine!" ' inside " is not a comment designation
end sub

Option explicit 
dim X 
X = "It's mine!" 
-----

 
aikimark commented:
Your statement
X = ""It's mine!"" ' inside "" is not a comment designation

is not correct syntax.

Did you mean to use
X = """It's mine!""" ' inside "" is not a comment designation


Is this VBA or VBScript code?
 
käµfm³d 👽 commented:
The only one being emotional, unfortunately, is you. Ask around:  Regex is intended to parse a regular language, not a context-free language. Regex does not handle recursive constructs, which is what you would need in order to test for nested quotation marks. There are simply too many variations of code that you need to account for in order to capture every place where a comment can be.

The only reason that I say it "might be powerful enough" is because the regex engine that is built into .NET has additional functionality that is not a part of theoretical regular expressions. But as I said, even with these extra options, the pattern you write would be overly complicated, and damn near incomprehensible.
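To make the "use a parser" suggestion concrete, here is a minimal hand-written scanner sketch. It is not a VB(A) parser: it only tracks whether the current character sits inside a string literal or a comment. The assumptions are mine, not from the thread: comments start with an apostrophe only (the Rem keyword is ignored), a comment line ending in a space plus underscore continues on the next line, and string literals never span lines. The name VbCommentScanner is hypothetical.

```csharp
using System;
using System.Text;

static class VbCommentScanner
{
    // Removes apostrophe comments; string literals and line breaks are kept.
    public static string Strip(string code)
    {
        var output = new StringBuilder(code.Length);
        bool inString = false, inComment = false;

        for (int i = 0; i < code.Length; i++)
        {
            char c = code[i];
            bool lineBreak = c == '\r' || c == '\n';

            if (inComment)
            {
                if (!lineBreak) continue;   // swallow comment text

                // Assumption: " _" right before the line break continues the comment.
                bool continued = i >= 2 && code[i - 1] == '_'
                                 && char.IsWhiteSpace(code[i - 2]);
                if (continued)
                {
                    // Skip the whole CRLF pair when present.
                    if (c == '\r' && i + 1 < code.Length && code[i + 1] == '\n') i++;
                    continue;
                }
                inComment = false;
                output.Append(c);           // keep the line break that ends the comment
            }
            else if (inString)
            {
                output.Append(c);
                // A quote closes the literal ("" inside a literal closes and reopens it);
                // a line break also resets the state.
                if (c == '"' || lineBreak) inString = false;
            }
            else if (c == '\'')
            {
                inComment = true;           // apostrophe outside a string starts a comment
            }
            else
            {
                if (c == '"') inString = true;
                output.Append(c);
            }
        }
        return output.ToString();
    }

    static void Main()
    {
        // The sample from the question, with explicit CRLF line breaks.
        string sample = string.Join("\r\n", new[] {
            "",
            "Option explicit 'comment",
            "sub t",
            "dim X 'comment with cont _",
            "inuation _",
            "end of comment",
            "X = \"It's mine!\" ' inside \" is not a comment designation",
            "end sub",
            ""
        });
        Console.Write(Strip(sample));
    }
}
```

Because the state is reset at each line break, a stray quote inside one comment cannot affect how later lines are read, which is the kind of context a single pattern has to work hard to express.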
 
midfde (Author) commented:
To:ID: 39880684
Please see line 7 in results ID: 39880597

To: ID: 39880693
Dear kaufmed.
With all due respect I sincerely comprehend as (generally inevitable) emotions anything like "too many" and "overly complicated" in such a simple context as: "Is it possible? Prove it please."
 
aikimark commented:
You might cover two of the three cases you posted with the following:
1. Use regexp (or equivalent string replace function) to remove the line continuations.
replace: _\r\n
with: "" -- empty string

2. Then replace with this pattern
replace: '[^"]*?[^_]\r\n
with: \r\n -- carriage return & line feed

==========================
that leaves you with the statement that I've identified as not syntactically correct.
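The two steps above might look like this in C#. This is only a sketch, and the class name TwoStepSketch is illustrative. Note that step 1 joins every continued line, not just continued comments, so it also changes code outside comments, and the line with the quoted string is left untouched, as the post says.

```csharp
using System;
using System.Text.RegularExpressions;

static class TwoStepSketch
{
    public static string Strip(string code)
    {
        // Step 1: join continued lines by deleting "_" followed by CRLF.
        string joined = Regex.Replace(code, @"_\r\n", "");

        // Step 2: drop an apostrophe comment that contains no double quote.
        return Regex.Replace(joined, @"'[^""]*?[^_]\r\n", "\r\n");
    }

    static void Main()
    {
        // The three commented lines from the question, with explicit CRLF.
        string sample = string.Join("\r\n", new[] {
            "Option explicit 'comment",
            "dim X 'comment with cont _",
            "inuation _",
            "end of comment",
            "X = \"It's mine!\" ' inside \" is not a comment designation",
            ""
        });

        // The first two comments are removed; the third line comes back unchanged.
        Console.Write(Strip(sample));
    }
}
```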
 
midfde (Author) commented:
Sorry, aikimark, I do not actually understand if your consideration has much to do with my initial request, which is to remove comments (without changing anything else) from syntactically correct VB(A) with a [one-liner] single C# regular expression. All comments. Nothing but comments please.
 
aikimark commented:
>>which is to remove comments (without changing anything else)
The requirement to not change anything else was not explicitly stated in your question.

As I have stated twice, your line 7 is not syntactically correct.  I have posted a correct version.
 
midfde (Author) commented:
>>The requirement to not change anything else was not explicitly stated in your question.
Sorry, it's my fault. I thought if I ask e.g. "How to remove a file?" it by default means "not to change anything else" (or else DEL *.* or even FORMAT C: might do).

>>...line 7 is not syntactically correct
MS Access "thinks" otherwise -- please see the image and compare line 7 (results!) above with penultimate line in the code on the picture.
MS Access does not find any syntax errors.
 
käµfm³d 👽 commented:
The code you originally posted is not the same code as what is displayed in your screenshot.
 
midfde (Author) commented:
Originally I posted C# const code whose value is VBA code (oh, see line 7 above).
 
aikimark commented:
@midfde

What version of the .Net framework are you using?  There is an updated compiler interface that should allow you to get a compiler's-eye-view of any VB.Net or C# code.
http://www.hanselman.com/blog/AnnouncingTheNewRoslynpoweredNETFrameworkReferenceSource.aspx
http://msdn.microsoft.com/en-us/vstudio/roslyn.aspx
 
midfde (Author) commented:
Thank you, aikimark.

Q: How to... using RegEx?
A: There is an updated compiler...

Looks irrelevant to me, sorry, as well as "What version...?" (BTW, it's a normal one, ".Net 4 Framework Client Profile").
 
aikimark commented:
* I don't think a single regular expression pattern will do what you want.

* I think that you would be able to solve your problem if you used the compiler to parse your code.
 
midfde (Author) commented:
>> I don't think a single regular expression pattern will do what you want.
... because of what???

I thought I might save some effort by consulting an expert who is fluent in the RE language. My expectations have not been met so far, and I'll try to solve my puzzle myself again. Later. I believe it is possible. I'll let this respectable forum know my results.
 
käµfm³d 👽 commented:
You have a fundamental flaw in your thinking of what regex is intended to be used for. You're not alone. Many people have tried to apply regex in scenarios where it does not make sense. As I mentioned above, it *may* be possible, and only because the regex implementations that we work with in today's programming languages have extra features that are not defined by theoretical regular expressions. If you would like an example of how something simple to understand becomes complicated when defining it in regex, take a look at a pattern in my article under the section, "Tokenizer on Steroids." What you are now trying to do is much more complicated than what I was doing in that part of the article.

The simple fact is:  You are not hearing the answer you want to hear, and so you think no one is helping you. You need to come to terms with the idea that sometimes "it won't work" or "it's not a good idea" is the right answer.
 
midfde (Author) commented:
>>You have a fundamental flaw in your thinking
Sorry, my question is not about my (or your for that matter) thinking.

>>...sometimes "it won't work" or "it's not a good idea" is the right answer.
Sometimes it is, but not in computing when it is not corroborated. It is not an answer at all unless it is followed by convincing "because..." consideration. (Remember Fermat's theorem?)

>>... it does not make sense
Some people deemed such a simple "thingy" as Turing machine may not make practical sense (because they just knew it was so).

Your reference is very good though. Thanks kaufmed.
 
aikimark commented:
>>... because of what???
Because I've used Regexp, answered EE questions with regexp solutions, written articles that include regexp components, and given presentations at user groups and developer conferences.  Like kaufmed, I know its limitations and applicable problem contexts.

Have you had any formal training with grammars and lexical processing (or equivalent self-training/experience)?  If not, you can do a little reading on those subjects to confirm what kaufmed and I are asserting -- regexp is not the tool to use for the problem you have presented to us.

Since you are in the .Net environment (C#), I'm suggesting that the compiler interface might provide a solution path to your problem.  There are alternatives, but they are more complicated and would likely require more effort to implement.
 
aikimark commented:
and now...a little humor:
http://xkcd.com/1313/
http://xkcd.com/1171/

And Jeff Atwood's blog post (homage/love letter) on regular expressions.
http://blog.codinghorror.com/regular-expressions-now-you-have-two-problems/
Note #4:
Regular expressions are not Parsers.
 
midfde (Author) commented:
Splendid!

>>Regular expressions are not Parsers.
Excellent observation. It is as good as "C-language is not a parser."
 
käµfm³d 👽 commented:
>>Excellent observation

I would have assumed that to be implied in my very first comment   : \

>>This is not a good problem to tackle with regex. You really need a parser...
 
aikimark commented:
very good.

does your pattern also remove the comments when they are the only thing on one or more lines or follow nothing but space/tab characters?
 
midfde (Author) commented:
You better try it and let me know.
 
aikimark commented:
I see the following:
11)
Option explicit 'comment
sub t
' the purpose of the routine is
dim X 'comment with cont _
inuation _
end of comment
X = "It's mine!" ' inside " is not a comment designation
end sub

Option explicit 'comment
sub t
' the purpose of the routine is
dim X 'comment with cont _
inuation _
end of comment
X = "It's mine!" ' inside " is not a comment designation
end sub

 
midfde (Author) commented:
Unbelievable! (You certainly copied and pasted the code. Didn't you?)
Sysinfo returns
OS Name:                   Microsoft Windows 8.1
OS Version:                6.3.9600 N/A Build 9600
OS Manufacturer:           Microsoft Corporation
OS Configuration:          Standalone Workstation
OS Build Type:             Multiprocessor Free
on the computer I am running MSVS 2010 on.
Please see the attached image.
It works on my computer (W8, MSVS 2010)
 
aikimark commented:
After copying and pasting your solution code snippet, I replaced Debug.Print with Console.WriteLine:
using System;
using System.Text.RegularExpressions;
using System.Diagnostics;

namespace removeVBComments {
    class Program {
        const string VBA = @"
Option explicit 'comment
sub t
' the purpose of the routine is
dim X 'comment with cont _
inuation _
end of comment
X = ""It's mine!"" ' inside "" is not a comment designation
end sub
";
        static void Main(string[] args) {
            string txt = VBA;
            Console.WriteLine("11)" + txt + Regex.Replace(txt, @"(?<!^[^""]*""(?:[^""]*""[^""]*"")*[^""]*)(?s)'.*?[^_](\r\n)", "$1"));
        }
    }
}

I did this run in a virtual environment (Mono 2.10.2.0).  Not sure of the sysinfo in that particular environment, but I'm pretty sure it isn't Win8.
http://www.compileonline.com/compile_csharp_online.php

I see that your environment does delete the single line comment.
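One possible explanation for the difference, which the thread does not confirm: the pattern requires a literal \r\n, and a verbatim string literal takes its line breaks from the source file. If the file was saved or pasted with LF-only line endings, as is common on non-Windows hosts, the pattern can never match and the text comes back unchanged. The sketch below normalizes line breaks to CRLF before applying the thread's pattern; the names LineEndingCheck and NormalizeToCrLf are mine.

```csharp
using System;
using System.Text.RegularExpressions;

static class LineEndingCheck
{
    // Pattern copied from the solution in this thread.
    const string Pattern =
        @"(?<!^[^""]*""(?:[^""]*""[^""]*"")*[^""]*)(?s)'.*?[^_](\r\n)";

    // Converts LF-only or mixed line breaks to CRLF.
    public static string NormalizeToCrLf(string text)
    {
        return text.Replace("\r\n", "\n").Replace("\n", "\r\n");
    }

    public static string Strip(string code)
    {
        return Regex.Replace(NormalizeToCrLf(code), Pattern, "$1");
    }

    static void Main()
    {
        // LF-only input, including a comment-only line like the one tested above.
        string lfOnly = "sub t\n' the purpose of the routine is\ndim X 'comment\nend sub\n";

        // Without normalization nothing matches, so the input is returned as is.
        Console.WriteLine(Regex.Replace(lfOnly, Pattern, "$1") == lfOnly); // True

        // After normalization both comments are removed; the comment-only
        // line is left behind as an empty line.
        Console.Write(Strip(lfOnly));
    }
}
```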
 
midfde (Author) commented:
I think the test of web page fails here.
 
midfde (Author) commented:
Good for discussants, good for me!
Question has a verified solution.
