Regex.Match question C#

Hello,

I need to scrape comments from this page:
http://www.youtube.com/watch_ajax?v=LfLFMf4chBc&action_get_comments=1&p=0&commentthreshold=-5&commentfilter=0&page_size=10

The comments are giving me trouble. Here is what I have:
if (!String.IsNullOrEmpty(input))
{
    MatchCollection collection = Regex.Matches(input, "<div class=\"watch-comment-body\"><div >([^']*?)</div>");
    // > 0 rather than > 1, so that a page with a single comment is not skipped
    if (collection.Count > 0)
    {
        for (int i = 0; i < collection.Count; i++)
        {
            currentComment = Regex.Match(collection[i].Value, "<div >([^']*?)</div>").Groups[1].Value;
            // Skip the "Show"/"Hide" entries; this needs &&, because the || version is always true
            if (currentComment != "Show" && currentComment != "Hide")
            {
                cache = cache + "#" + currentComment;
                counter();
            }
        }
        //...process data
    }
}

But Regex.Matches always comes back empty, because the markup of the page is:
                <div class="watch-comment-body">
                    <div >

                        the comment text I want to get is here
                    </div>

It is split across several lines, and that is my problem. Any ideas on how I could fix this?
Asked by: GVNPublic123

Gary Davis (Dir Internet Svcs) Commented:
There is whitespace before the second div, but your first pattern does not account for it, so the match fails. Remove the inner <div > from the pattern:
 Regex.Matches(input, "<div class=\"watch-comment-body\">([^']*?)</div>");
 
Gary Davis
GVNPublic123 (Author) Commented:
It didn't work. I think it's still wrong, given the markup on the website and how Regex operates.
Gary Davis (Dir Internet Svcs) Commented:
I often use a tool called Expresso to help debug and test regular expressions. It is free and available at http://www.ultrapico.com/.
I ran this regular expression:
<div class=\"watch-comment-body\">([^']*?)</div>
against your sample data and it matched, setting the captured group to everything from the second <div> up to but not including the first (and only) </div>. Maybe not what you wanted, but it did match, and the text being on multiple lines did not matter.
Changing the regex to this:
<div class=\"watch-comment-body\">\s*<div>\s*([^']*?)\s*</div>
will capture just the string "the comment text I want to get is here", and maybe that's what you want.
One thing to point out: the second <div> in your example has a space after the v, so a pattern containing <div> will not match it. Allow for the space in the pattern if you need to.
Gary
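To illustrate why the line breaks themselves are not the problem: a negated character class such as [^'] also matches newline characters, so the looser pattern happily spans several lines. The following is only a minimal sketch, assuming the sample markup from the question; the class and method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class MultilineMatchDemo
{
    // Sample markup copied from the question (note the space in "<div >").
    public const string Sample =
        "<div class=\"watch-comment-body\">\n" +
        "    <div >\n" +
        "\n" +
        "        the comment text I want to get is here\n" +
        "    </div>\n";

    // Returns what group 1 captures with the looser pattern, or null if nothing matches.
    public static string FirstCapture(string html)
    {
        // [^'] matches any character except an apostrophe, newlines included,
        // so the lazy group runs across lines up to the first </div>.
        Match m = Regex.Match(html, "<div class=\"watch-comment-body\">([^']*?)</div>");
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

On the sample, the captured group still contains the inner "<div >" tag and the surrounding whitespace, which is why this pattern matches but needs cleaning up afterwards. Note also that, because of [^'], a comment containing an apostrophe would stop the group from reaching its </div>.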
 

GVNPublic123 (Author) Commented:
<div class=\"watch-comment-body\">([^']*?)</div> = works, but the results contain a lot of whitespace (empty lines).

<div class=\"watch-comment-body\">\s*<div>\s*([^']*?)\s*</div> = fails to compile with this error:
Error      1      Unrecognized escape sequence      (underlines s in \s")
Gary Davis (Dir Internet Svcs) Commented:
When you need backslashes in a string, put @ before the opening quote:
Regex.Matches(input, @"<div class=\"watch-comment-body\">\s*<div>\s*([^']*?)\s*</div>");
Or double up the backslashes.
 
GVNPublic123 (Author) Commented:
I tried @, but then the \ before the " no longer does the trick: \"watch-comment-body\". How do I fix that?
Gary Davis (Dir Internet Svcs) Commented:
OK. Leave the @ off and double up the backslash for the \s. You are then escaping the backslash (\\) just like you are escaping the quote (\").
 
Regex.Matches(input, "<div class=\"watch-comment-body\">\\s*<div>\\s*([^']*?)\\s*</div>");
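For comparison, here is a small sketch of the two ways to write the same pattern as a C# string literal. In a regular literal, \" is a quote and \\ is a single backslash. In a verbatim (@) literal, backslashes are taken literally, and a quote is written by doubling it (""), which is why \" stops working there. The class and constant names are only illustrative.

```csharp
static class LiteralDemo
{
    // Regular string literal: escape the quotes with \" and the backslashes with \\.
    public const string Escaped =
        "<div class=\"watch-comment-body\">\\s*<div>\\s*([^']*?)\\s*</div>";

    // Verbatim string literal: backslashes are literal, quotes are doubled.
    public const string Verbatim =
        @"<div class=""watch-comment-body"">\s*<div>\s*([^']*?)\s*</div>";
}
```

Both constants hold exactly the same characters, so either form can be passed to Regex.Matches.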
GVNPublic123 (Author) Commented:
Yes, but with  MatchCollection collection = Regex.Matches(input, "<div class=\"watch-comment-body\">\\s*<div>\\s*([^']*?)\\s*</div>"); I get no matches.
Gary Davis (Dir Internet Svcs) Commented:
Did you remove the space after the v in the data, in <div >?
If not, add a space or \\s* after div in the pattern.
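A minimal sketch of that suggestion, assuming the markup looks like the sample in the question: writing the inner tag as <div\s*> accepts both <div> and <div >, and the \s* on either side of the group keeps the surrounding whitespace out of the capture. CommentExtractor and Extract are hypothetical names, not part of any library.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class CommentExtractor
{
    // Verbatim literal: quotes are doubled, backslashes are written once.
    static readonly Regex CommentPattern = new Regex(
        @"<div class=""watch-comment-body"">\s*<div\s*>\s*([^']*?)\s*</div>");

    // Collects the text of group 1 for every comment body found in the page.
    public static List<string> Extract(string html)
    {
        var comments = new List<string>();
        foreach (Match m in CommentPattern.Matches(html))
        {
            comments.Add(m.Groups[1].Value);
        }
        return comments;
    }
}
```

Because the group is [^']*?, a comment that contains an apostrophe would still not be captured this way; that limitation comes from the original pattern and is left unchanged here.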
GVNPublic123 (Author) Commented:
Ah, screw that, it doesn't work either.

Now I have:
MatchCollection collection = Regex.Matches(input, "<div class=\"watch-comment-body\">([^']*?)</div>");

With that I do get the matches, but when I write them to a file, the file looks like this:
empty space                                                                             comment
empty space                                                                             comment

I tried match[i].Trim();, but it didn't work. Please help.
GVNPublic123 (Author) Commented:
Ah, assigning the result to a new string and trimming that solved the problem. Thank you very much, guys!
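For anyone reading later, a rough sketch of what that fix probably looks like (the exact code was not posted, so this is an assumption): strings in C# are immutable, so Trim() does not change the string it is called on but returns a new one that has to be stored. The names below are illustrative, and the two patterns are the ones used earlier in the thread.

```csharp
using System.Text.RegularExpressions;

static class TrimDemo
{
    // Returns the trimmed text of the first comment, or null if nothing matches.
    public static string InnerCommentTrimmed(string html)
    {
        Match outer = Regex.Match(html, "<div class=\"watch-comment-body\">([^']*?)</div>");
        if (!outer.Success) return null;

        // Second pass, as in the question: pull the text out of the inner <div >.
        Match inner = Regex.Match(outer.Value, "<div >([^']*?)</div>");
        string raw = inner.Groups[1].Value;

        // Trim() returns a new string; calling it without keeping the result has no effect.
        string trimmed = raw.Trim();
        return trimmed;
    }
}
```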
GVNPublic123 (Author) Commented:
Good answer