raterus
asked on
Parse Search Text
I'm looking for help on something that seems like it would be highly discussed, but I can't seem to find much on it.
I'm interested in turning this mess
blah, A&Z this "some word", another word
into "searchable" tokens, e.g.
blah
A&Z
this
some word
another
word
The main delimiters here are whitespace and commas, with quotes allowing for multiple words in a token.
Certainly I can hack my way through this, but I'm looking for some good advice from someone who's already done this.
--Michael
ASKER
Thank you; however, I'm well versed in the Split methods available to me. My main question is about properly handling delimiters inside quotes, as discussed in my original post.
blah, A&Z this "some word", another word
needs to be split/parsed into
blah
A&Z
this
some word <-- Very important!
another
word
In the For Each s In split loop, you could do something like:
s = s.Replace("""", "")
That will strip out all double quotes...
Jake
ASKER
I don't think you quite understand what I'm doing; please reread my question and first comment. Removing quotes is NOT my only intent here.
Michael,
When you say "whitespace," are you including tabs and multiple spaces, or just single spaces?
ASKER
Tabs and spaces (any number). This is a "search" box I'm parsing.
Ah... so we have to account for the fact that SELECT * FROM Users WHERE Clue>0 returns no rows?
What about the case where the user enters "" or '?
SOLUTION
This solution is only available to members.
ASKER
Yes Jeff, exactly. I definitely think there were a few rows returned.
This would have to account for stupid-user tricks, something like this
"bad joe" says he wants to "break my program
would "recover" and likely ignore the third quote.
Right now I'm not going to worry about single quotes, but I want them to be included in the final token in case they are searching for "o'brien" or something.
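For reference, the rules described so far (split on whitespace and commas, keep a balanced quoted phrase as one token, ignore a stray quote, leave single quotes and & alone) can be handled with a small hand-written scanner. What follows is only a minimal sketch under those assumptions, not the solution accepted in this thread, which is not shown here. The names SearchTokenizer, Tokenize, and Flush are made up for illustration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text

Module SearchTokenizer
    ' Hypothetical helper: splits search text on whitespace and commas,
    ' keeping text inside a balanced pair of double quotes as one token.
    Function Tokenize(ByVal searchText As String) As List(Of String)
        Dim tokens As New List(Of String)()
        Dim current As New StringBuilder()
        Dim i As Integer = 0

        While i < searchText.Length
            Dim c As Char = searchText(i)
            If c = """"c Then
                ' Look ahead for a closing quote.
                Dim closeAt As Integer = searchText.IndexOf(""""c, i + 1)
                If closeAt >= 0 Then
                    Flush(tokens, current)
                    ' Everything between the two quotes becomes one token.
                    current.Append(searchText, i + 1, closeAt - i - 1)
                    Flush(tokens, current)
                    i = closeAt + 1
                Else
                    ' Unmatched quote: ignore it and keep scanning.
                    i += 1
                End If
            ElseIf Char.IsWhiteSpace(c) OrElse c = ","c Then
                ' A delimiter ends the token being built, if any.
                Flush(tokens, current)
                i += 1
            Else
                ' Any other character, including ' and &, belongs to the token.
                current.Append(c)
                i += 1
            End If
        End While

        Flush(tokens, current)
        Return tokens
    End Function

    ' Adds the pending token (if any) to the list and resets the buffer.
    Private Sub Flush(ByVal tokens As List(Of String), ByVal current As StringBuilder)
        If current.Length > 0 Then
            tokens.Add(current.ToString())
            current.Length = 0
        End If
    End Sub

    Sub Main()
        For Each t As String In Tokenize("blah, A&Z this ""some word"", another word")
            Console.WriteLine(t)
        Next
    End Sub
End Module
```

With the sample input this prints blah, A&Z, this, some word, another, and word on separate lines. For the unbalanced input above, the third quote is skipped, so break, my, and program come out as separate tokens.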
Does
"bad joe" says he wants to "break my program
return
"bad joe"
says
he
wants
to
"break my program"
OR
"bad joe"
says
he
wants
to
break
my
program
??
Dim fieldValues As String() = ParseLine(TextBox1.Text)
Private Shared Function ParseLine(ByVal oneLine As String) As String()
    Dim pattern As String = ",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))"
    Dim r As System.Text.RegularExpressions.Regex = _
        New System.Text.RegularExpressions.Regex(pattern)
    Return r.Split(oneLine)
End Function
ASKER
The latter, Jeff.
Ronald, thanks for the regex. Unfortunately, here are the "tokens" it parsed out for me from the input below. It looks more like it is splitting on commas.
blah, A&Z this "some word", another word
--
blah
A&Z this "some word"
another word
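An alternative to splitting on delimiters is to match the tokens themselves: either a balanced quoted phrase or a run of characters that are not whitespace, commas, or double quotes. The sketch below illustrates that idea and is not code from this thread; MatchTokenizer, TokenPattern, and Tokenize are hypothetical names.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module MatchTokenizer
    ' Either a balanced quoted run (captured without its quotes),
    ' or a run of characters that are not whitespace, commas, or double quotes.
    Private ReadOnly TokenPattern As New Regex("""([^""]*)""|[^\s,""]+")

    Function Tokenize(ByVal searchText As String) As List(Of String)
        Dim tokens As New List(Of String)()
        For Each m As Match In TokenPattern.Matches(searchText)
            Dim token As String
            If m.Groups(1).Success Then
                ' The quoted alternative matched: use the text between the quotes.
                token = m.Groups(1).Value
            Else
                token = m.Value
            End If
            ' An empty pair of quotes ("") yields an empty token; skip it.
            If token.Length > 0 Then tokens.Add(token)
        Next
        Return tokens
    End Function

    Sub Main()
        For Each t As String In Tokenize("blah, A&Z this ""some word"", another word")
            Console.WriteLine(t)
        Next
    End Sub
End Module
```

Because a double quote without a partner matches neither alternative, the regex engine simply steps over it, which gives the "ignore the stray quote" behavior asked for earlier in the thread.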
ASKER
@Sancler, oh I'm not ignoring you, it's actually a nice idea I may take a good look at. However, I'm hoping to have someone do it for me, thus achieving the state of true laziness :-)
**Articles posted that go into this discussion, concerning a .Net language will get assist points.**
ASKER CERTIFIED SOLUTION
This solution is only available to members.
SOLUTION
This solution is only available to members.
oh... and add "return output" :)
ASKER
I'm extensively testing Chaosian's and Fernando's approaches
I can say both work very well for what I'm doing....
just a small adjustment and it should work
Dim fieldValues As String() = ParseLine(TextBox1.Text)
Private Shared Function ParseLine(ByVal oneLine As String) As String()
    Dim pattern As String = "[ ,](?=(?:[^""]*""[^""]*"")*(?![^""]*""))"
    Dim r As System.Text.RegularExpressions.Regex = _
        New System.Text.RegularExpressions.Regex(pattern)
    Return r.Split(oneLine)
End Function
Ronald
I think that wants "+" after "[ ,]". Otherwise - for me anyway - it gives an empty string in the places where it encounters both space and comma.
;-)
Roger
Yep, you are right Sancler, it does. I just filtered out the empty string afterwards :-)
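Putting the two remarks above together (add "+" after the character class, then filter out empty strings), the split-based version might look like the sketch below. It makes two changes that were not discussed in the thread: it uses \s instead of a literal space so that tabs count as delimiters, and it trims the surrounding quotes from each piece. SplitTokenizer and Delimiters are made-up names. The lookahead counts the quotes remaining in the string and so assumes they are balanced; a stray quote throws off its inside/outside test.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module SplitTokenizer
    ' Split on runs of whitespace/commas that are followed by an even number
    ' of double quotes, i.e. delimiters that sit outside a quoted phrase.
    Private ReadOnly Delimiters As New Regex("[\s,]+(?=(?:[^""]*""[^""]*"")*(?![^""]*""))")

    Function ParseLine(ByVal oneLine As String) As List(Of String)
        Dim tokens As New List(Of String)()
        For Each piece As String In Delimiters.Split(oneLine)
            ' Drop the surrounding quotes from quoted phrases.
            Dim token As String = piece.Trim(""""c)
            ' Leading or trailing delimiters still produce empty strings; skip them.
            If token.Length > 0 Then tokens.Add(token)
        Next
        Return tokens
    End Function

    Sub Main()
        For Each t As String In ParseLine("blah, A&Z this ""some word"", another word")
            Console.WriteLine(t)
        Next
    End Sub
End Module
```

For the balanced sample input this yields blah, A&Z, this, some word, another, and word.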
SOLUTION
This solution is only available to members.
ASKER
All three solutions worked; I opted to go with Chaosian's. Why, I don't really know :-)
Dim delimStr As String = " ,.:"
Dim delimiter As Char() = delimStr.ToCharArray()
Dim words As String = "one two,three:four."
Dim split As String() = Nothing
Console.WriteLine("The delimiters are -{0}-", delimStr)
Dim x As Integer
For x = 1 To 5
split = words.Split(delimiter, x)
Console.WriteLine(ControlC
Dim s As String
For Each s In split
Console.WriteLine("-{0}-", s)
Next s
Next x