Best way to detect two separate words in a Sorted ArrayList of words in C# .NET 3.5

Hi all,

I'm looking for a bit of guidance on how to detect multi-word (compound) entries in a sorted list in C#.

I've looked into a few options using regular expressions, but I'm not having much success.

Say I have the following sorted ArrayList of words:


Best Words,
Collaboration,
Hello,
Two-words,

I want to be able to detect the entries 'Best Words' and 'Two-words' apart from the rest. Is there any other, more efficient way to do this besides regular expressions? Some guidance on which regular expression to use would also be very helpful. This is what I have so far:

                   
           Regex compoundsWords = new Regex(@"\W+", RegexOptions.IgnoreCase);
           
     
           Match cad = compoundsWords.Match(words.ToString());
           if (cad.Success)
           {
               return true;
           }
           else
           {
               return false;
           }
lp84Asked:
leonov_alexCommented:
You need to ensure that "two words" means "a non-word symbol between word symbols":

Regex compoundsWords = new Regex(@"\w\W+\w", RegexOptions.IgnoreCase);
MathiyazhaganCommented:
Yes, you can achieve this with regular expressions. The pattern to match is: \w+[\s-]+\w
It denotes a word (\w+) followed by at least one (+) character from the set of whitespace or hyphen ([\s-]), which is in turn followed by a word character (\w).

If you want to allow more separators, such as underscore or comma, you can add them to the set. Keep the hyphen last so that it is taken literally rather than as a range operator: \w+[\s_,-]+\w


Some excellent articles on how and when to use regular expressions:
http://www.radsoftware.com.au/articles/regexlearnsyntax.aspx
http://www.exforsys.com/tutorials/csharp/regular-expressions-and-csharp-.net.html
http://geekswithblogs.net/rahul/archive/2005/08/16/50330.aspx
 
Hope this helps.

ArrayList arLst = new ArrayList();
arLst.Add("Best Words");
arLst.Add("Collaboration");
arLst.Add("Two-words");
arLst.Add("Hello");
arLst.Add("Hello World Sample");
arLst.Sort();

foreach (string sWord in arLst)
{
    Match match = Regex.Match(sWord, @"\w+[\s-]+\w");
    if (match.Success)
        Console.WriteLine(sWord);
}

lp84Author Commented:
Excellent.

I just thought of a performance question regarding the foreach: it requires an iterative, linear search through the ArrayList.

Is it possible to somehow conduct a binary search to find a match instead?

MathiyazhaganCommented:
I don't think binary search is possible in this context. BinarySearch locates one specific element by comparing it against elements at sorted positions, but here we are testing every single element against a particular pattern. So linear iteration is the only possible way.
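To illustrate the distinction, here is a minimal sketch of what binary search can do on a sorted list: find one exact term. The class name `BinarySearchDemo` and the helper `FindExact` are hypothetical, and the sketch assumes a typed `List<string>` sorted with the same ordinal comparer that is used for the search.

```csharp
using System;
using System.Collections.Generic;

public static class BinarySearchDemo
{
    // Returns the index of an exact term in a list sorted with StringComparer.Ordinal,
    // or a negative number when the term is not present.
    // It cannot answer "which terms match a pattern?"; that still needs a full scan.
    public static int FindExact(List<string> sortedTerms, string term)
    {
        return sortedTerms.BinarySearch(term, StringComparer.Ordinal);
    }
}
```

Searching such a list for "Hello" returns its index, while searching for "Best" alone returns a negative value even though "Best Words" is present, because the comparison is on whole strings.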
leonov_alexCommented:
Binary search is not applicable to this task.

To increase performance, you need a faster regex, such as @"\W\w".

You could also pre-check the strings before adding them to the ArrayList, and then sort by this property or use a custom sort function.

How many strings will the ArrayList hold in production? How many searches are run after the list is created?
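A minimal sketch of the pre-check idea, assuming the terms arrive as a plain sequence of strings: the regex is built once with RegexOptions.Compiled and each term is routed into one of two lists in a single pass. The class `TermSplitter` and its `Split` method are hypothetical names, not part of the original code. Note that @"\W\w" also matches a term that merely starts with a separator, such as " Hello".

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TermSplitter
{
    // Built once and reused for every term; the pattern is the one suggested above.
    private static readonly Regex Separator = new Regex(@"\W\w", RegexOptions.Compiled);

    // Hypothetical helper: adds each term to 'compound' or 'single' in one pass.
    public static void Split(IEnumerable<string> source, List<string> compound, List<string> single)
    {
        foreach (string term in source)
        {
            if (Separator.IsMatch(term))
                compound.Add(term);   // contains a non-word character followed by a word character
            else
                single.Add(term);
        }
    }
}
```

With two lists, later code never has to re-test a term to learn whether it is compound.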
lp84Author Commented:
The strings are retrieved from a web service (WS) and stored in the ArrayList by this method I created:

        public static ArrayList getTerms(String node)
        {

            ArrayList terms = new ArrayList();
            XmlNodeList elemList = xmlDocOfTerms.GetElementsByTagName(node);
            for (int i = 0; i < elemList.Count; i++)
            {
                terms.Add(elemList[i].InnerXml);
            }
            return terms;
        }

So you're suggesting that once the strings are parsed from the XML, I could create two lists: one for compound strings and one for single-word strings?

Currently I'm just sorting the list with terms.Sort().
With a custom sort function, a linear search would still be required either way, right? Say I created a custom sort method; wouldn't all the terms be sorted something like this:

"Aaa Word"
"bbb Word"
"ccc Word"
"ddd word"
"hello"
"term"
"example"

Either way, wouldn't a linear search have to go through that entire list?
I'm not sure exactly what you mean.

There could be 100 strings in the ArrayList, and it is unknown how many of them are two words and how many are one word.

Thanks for your help
leonov_alexCommented:
A linear search through 100 strings with a compiled regex is many times faster than fetching from the WS and parsing with XmlDocument (by the way, XDocument is faster than XmlDocument).

Yes, creating separate lists while parsing may help you.

Yes, if you store the elements in a List<string> container (the typed version of ArrayList), you can use the Sort(IComparer<string>) method, passing a class that implements IComparer<string> with your custom sort function.

These are two different approaches.
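A minimal sketch of the custom sort approach, assuming non-null terms and assuming that space, comma and hyphen are the separators that make a term compound. The class `CompoundFirstComparer` and its `IsCompound` helper are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer: compound terms sort before single-word terms,
// with case-insensitive ordinal ordering inside each group.
public class CompoundFirstComparer : IComparer<string>
{
    // Assumed separator set; adjust it to whatever defines a compound term.
    private static readonly char[] Separators = { ' ', ',', '-' };

    public static bool IsCompound(string term)
    {
        // "> 0" so that a separator in the first position does not count.
        return term.IndexOfAny(Separators) > 0;
    }

    public int Compare(string x, string y)
    {
        bool xCompound = IsCompound(x);
        bool yCompound = IsCompound(y);
        if (xCompound != yCompound)
            return xCompound ? -1 : 1;   // compound terms come first
        return string.Compare(x, y, StringComparison.OrdinalIgnoreCase);
    }
}
```

After terms.Sort(new CompoundFirstComparer()), a scan from the front can stop at the first term for which IsCompound returns false, since every compound term precedes it.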
lp84Author Commented:
I'm afraid you don't understand what I mean. The list of strings is actually generated from an XML document which is retrieved from a WS call.

I see, so if I sort the list using a custom comparer and then invoke the BinarySearch function with this comparer, that would be a way of finding terms. I think this is much more flexible than creating two separate lists.

However, regarding performance, I am not too sure which way is better.

Does anyone have any suggestions?
leonov_alexCommented:
If you show me how you intend to use BinarySearch, I'll try to show you the approach you need.

I do not understand the particular purpose of searching in the ArrayList. ArrayList is not designed for searching; there are many other types that are.
lp84Author Commented:
I think I understand what you mean now. You mean use a sort comparer method and then use a linear search to process all the compound terms first. This will increase performance since all compound terms will be handled first. Is this correct? I was trying BinarySearch, but as suggested it is not possible. Are there any other search suggestions that may increase performance?
leonov_alexCommented:
Yes. Pre-sorting is a kind of optimization for the linear search.

If you need to speed up searching, I need to know what you are going to do with the search results. This is important for designing the right data preparation before the search.
käµfm³d 👽Commented:
Whether you sort first or just start searching, you are going to have to iterate the entire list. All you are trying to do is detect whether a string has a special symbol in it (e.g. space, hyphen, other punctuation) which marks the word as compound. Why not iterate through the (unsorted) list, searching each string's characters for an element in your list of special symbols?
public static ArrayList getTerms(String node)
{
    ArrayList terms = new ArrayList();
    XmlNodeList elemList = xmlDocOfTerms.GetElementsByTagName(node);
    char[] searchList = new char[3] { ' ', ',', '-' };

    for (int i = 0; i < elemList.Count; i++)
    {
        if (elemList[i].InnerText.IndexOfAny(searchList) > 0)  // Ignore first index because we need at least one char. to form a "word"
        {
            terms.Add(elemList[i].InnerXml);
        }
    }
    return terms;
}

lp84Author Commented:
I will be using the search results in conjunction with Microsoft Visual Studio Tools for Office, calling the Find.Execute method to find each specific string in the Word document. Once a string is found, I will delete it from the list.
leonov_alexCommented:
Then you would be better off using a LinkedList. It provides faster deletion from the collection.
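A minimal sketch of deleting from a LinkedList<string> while walking it, assuming the terms have already been loaded into the list. The class `TermRemoval`, the method `RemoveFound` and the `found` predicate are hypothetical; the predicate stands in for whatever check reports that a term was located in the Word document.

```csharp
using System;
using System.Collections.Generic;

public static class TermRemoval
{
    // Removes every term for which 'found' returns true, in a single pass.
    // Returns the number of terms removed.
    public static int RemoveFound(LinkedList<string> terms, Predicate<string> found)
    {
        int removed = 0;
        LinkedListNode<string> node = terms.First;
        while (node != null)
        {
            LinkedListNode<string> next = node.Next; // capture before the node is detached
            if (found(node.Value))
            {
                terms.Remove(node); // constant time when the node itself is supplied
                removed++;
            }
            node = next;
        }
        return removed;
    }
}
```

Removing by node avoids the element shifting that ArrayList.RemoveAt performs, which is the advantage the suggestion above refers to.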