Daniel Lugo Pino

Removing items in a List<T> that contain one item of an array in any order or position

I have been tasked with adding some very specific search behaviour to an MVC application that uses Entity Framework.

The search should be able to find:

Entries containing all of the typed words, even when the words are not directly next to each other. If they type in "The Revenge":
  • The Revenge
  • The Horrible Revenge
  • Revenge of the Machines

Parts of words. If they type in "Smith":
  • Mr and Mrs Smith
  • Meet the Smithsons
  • The Incredible Blacksmith

I have written the following method in my controller class:

public ActionResult Index(string searchString)
        {

            var films = from s in db.films
                        select s;

            if (!string.IsNullOrEmpty(searchString) && !searchString.Any(x => Char.IsWhiteSpace(x)))
            {
                searchString = searchString.Trim(); // Trim() returns a new string, so keep the result
                films = films.Where(s => s.title.Contains(searchString)
                    || s.title.StartsWith(searchString)
                    || s.title.EndsWith(searchString)
                    || s.genre.Contains(searchString)
                    || s.genre.StartsWith(searchString)
                    || s.genre.EndsWith(searchString)
                    || s.synopsis.Contains(searchString)
                    || s.synopsis.StartsWith(searchString)
                    || s.synopsis.EndsWith(searchString));
            }
            else if (!string.IsNullOrEmpty(searchString) && searchString.Any(x => Char.IsWhiteSpace(x)))
            {
                searchString = searchString.Trim(); // Trim() returns a new string, so keep the result
                string[] strings = searchString.Split(' ');
                var finalFilms = new List<Films>();

                foreach (var splitString in strings)
                {
                    finalFilms.AddRange(films.Where(s => s.synopsis.Contains(splitString)
                    || s.genre.Contains(splitString)
                    || s.title.Contains(splitString)));
                }

                finalFilms.RemoveAll(i =>
                {
                    int count = strings.Count(s => i.synopsis?.Contains(s) ?? false);
                    return count > 0 && count < strings.Length;
                });

                films = finalFilms.ToList().AsQueryable();
            } 
            return View(films);
        }



I am getting the correct results when searching, but I am also getting other entries that are wrong. Using the "The Revenge" example again, these entries should not be returned but still are:

  • A Revenge Movie
  • The Incredible Movie


The Films model looks like this:
[Table("tblFilms")]
    public class Films
    {
        [Key]
        [DatabaseGenerated(DatabaseGeneratedOption.Identity)]
        public int filmID { get; set; }

        [Display(Name = "Films: ")]
        public string title { get; set; }

        [Display(Name = "Genre: ")]
        public string genre { get; set; }

        [Display(Name = "Synopsis: ")]
        public string synopsis { get; set; }
    }



Is there something obvious I am doing wrong when removing the undesired entries in the else if part of my controller? Or am I approaching the search functionality the wrong way?

It may be worth noting that most of the searching will be done on the 'Synopsis' field, which tends to be hundreds of words long. There are also thousands of entries in tblFilms, so performance may be a concern.
Dmitry G

One thing I noticed that might not be quite correct:

                foreach (var splitString in strings)
                {
                    finalFilms.AddRange(items.Where(s => s.synopsis.Contains(splitString)
                    || s.genre.Contains(splitString)
                    || s.title.Contains(splitString)));
                }



What happens here? You split your string, e.g. "The Revenge", into two strings, then you check whether synopsis etc. contains "The". If yes, you add the film to the results. Then you do the same for "Revenge". Every film added this way definitely contains one of the search words, but not necessarily both! That's why you get:
  • A Revenge Movie
  • The Incredible Movie

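There are other, similar inconsistencies in the algorithm. For example, the RemoveAll step only counts matches in synopsis, so a film that was added because of its title or genre can have a count of 0 and is never removed.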
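
The difference between "contains any of the words" and "contains all of the words" can be sketched in memory as follows. This is only an illustration: the class name, the method names and the hard-coded titles are hypothetical (the titles are borrowed from the examples in the question), and it looks at titles only rather than at all three fields.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UnionVersusAll
{
    // Hypothetical sample titles, borrowed from the examples in the question.
    public static readonly string[] Titles =
    {
        "The Revenge", "The Horrible Revenge", "A Revenge Movie", "The Incredible Movie"
    };

    // Mirrors the foreach/AddRange loop: a title is added once for every term
    // it contains, so the result is the union (with duplicates) of the
    // per-term matches.
    public static List<string> MatchAnyTerm(string[] terms)
    {
        var results = new List<string>();
        foreach (var term in terms)
            results.AddRange(Titles.Where(t => t.Contains(term)));
        return results;
    }

    // Keeps a title only if every term occurs somewhere in it.
    public static List<string> MatchAllTerms(string[] terms)
    {
        return Titles.Where(t => terms.All(term => t.Contains(term))).ToList();
    }
}
```

Called with the terms "The" and "Revenge", MatchAnyTerm still returns "A Revenge Movie" and "The Incredible Movie" (and lists the two genuine matches twice), while MatchAllTerms returns only "The Revenge" and "The Horrible Revenge".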
You can really simplify your logic using LINQ:
// Property names follow the Films model from the question (title, genre, synopsis).
return View((from film in db.films
             let terms = !string.IsNullOrEmpty(searchString) ? searchString.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries) : new[] { "" }
             where terms.All(term => film.genre.IndexOf(term, StringComparison.OrdinalIgnoreCase) > -1) ||
             terms.All(term => film.synopsis.IndexOf(term, StringComparison.OrdinalIgnoreCase) > -1) ||
             terms.All(term => film.title.IndexOf(term, StringComparison.OrdinalIgnoreCase) > -1)
             select film).ToList().AsQueryable());

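
One caveat when running a query like this directly against db.films: depending on the Entity Framework version, the provider may not be able to translate IndexOf with a StringComparison argument, or the Split inside the query, into SQL. A possible workaround, sketched below under that assumption, is to split the search string first and chain one Where per term using only ToLower and Contains. FilmSearch and Search are hypothetical names, and the Films class here is a bare stand-in for the model in the question. The sketch keeps the same rule as the query above: a single field has to contain every term.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the entity in the question (same lowercase property names).
public class Films
{
    public int filmID { get; set; }
    public string title { get; set; }
    public string genre { get; set; }
    public string synopsis { get; set; }
}

public static class FilmSearch
{
    // Hypothetical helper: returns the films in which at least one field
    // (title, genre or synopsis) contains every search term, ignoring case.
    public static IQueryable<Films> Search(IQueryable<Films> films, string searchString)
    {
        if (string.IsNullOrWhiteSpace(searchString))
            return films;

        // Split outside the query so only simple string parameters reach the provider.
        var terms = searchString
            .ToLower()
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        IQueryable<Films> byTitle = films, byGenre = films, bySynopsis = films;
        foreach (var t in terms)
        {
            var term = t; // local copy so each lambda captures its own value
            byTitle = byTitle.Where(f => f.title != null && f.title.ToLower().Contains(term));
            byGenre = byGenre.Where(f => f.genre != null && f.genre.ToLower().Contains(term));
            bySynopsis = bySynopsis.Where(f => f.synopsis != null && f.synopsis.ToLower().Contains(term));
        }

        // Union removes films that matched on more than one field.
        return byTitle.Union(byGenre).Union(bySynopsis);
    }
}
```

In the controller this might be used as return View(FilmSearch.Search(db.films, searchString)); so that the filtering stays in the database instead of loading every row first. With thousands of long synopses, substring matching of this kind may still be slow, since it generally cannot use an ordinary index.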


Proof of concept -
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace EE_Q29054800
{
    class Program
    {
        static List<Film> films = new List<Film>()
        {
            new Film { ID = 0, Genre = "Fiction", Synopsis = "The greatest love story ever told", Title = "The Revenge" },
            new Film { ID = 1, Genre = "Non-Fiction", Synopsis = "The coldest story ever told", Title = "A Revenge Movie" },
            new Film { ID = 2, Genre = "Fiction", Synopsis = "The greatest story ever told", Title = "The Incredible Movie" },
            new Film { ID = 3, Genre = "Science Fiction", Synopsis = "The funniest love story ever told", Title = "Revenge of the machines" },
            new Film { ID = 4, Genre = "Non-Fiction", Synopsis = "The saddest love story ever told", Title = "The Horrible Revenge" },
            new Film { ID = 5, Genre = "Fiction", Synopsis = "The greatest spy story ever told", Title = "Mr and Mrs Smith" },
            new Film { ID = 6, Genre = "Non-Fiction", Synopsis = "The best documentary about smithing", Title = "The Dying Art" },
            new Film { ID = 7, Genre = "Science Fiction", Synopsis = "The best android family in town", Title = "Meet The Smithsons" },
            new Film { ID = 8, Genre = "Action", Synopsis = "He has a hammer, an anvil and a bad attitude", Title = "The Incredible Blacksmith" }
        };

        static void Main(string[] args)
        {
            films.FindMatches("The Revenge").ConvertToDataTable("The Revenge").PrintToConsole();
            Console.WriteLine();
            films.FindMatches("smith").ConvertToDataTable("Smith").PrintToConsole();
            Console.WriteLine();
            films.FindMatches(new[] { "story", "told" }).ConvertToDataTable("Story Told").PrintToConsole();
            Console.ReadLine();
        }
    }

    class Film
    {
        public int ID { get; set; }
        public string Title { get; set; }
        public string Genre { get; set; }
        public string Synopsis { get; set; }
    }

    static class Extensions
    {
        public static DataTable ConvertToDataTable<T>(this IEnumerable<T> source, string name = "")
        {
            DataTable table = new DataTable(name);
            var properties = TypeDescriptor.GetProperties(typeof(T));
            foreach (PropertyDescriptor property in properties)
            {
                if (property.PropertyType.IsGenericType && property.PropertyType.GetGenericTypeDefinition().Equals(typeof(Nullable<>)))
                    table.Columns.Add(property.Name, property.PropertyType.GetGenericArguments()[0]);
                else
                    table.Columns.Add(property.Name, property.PropertyType);
            }

            object[] values = new object[properties.Count];
            foreach (var item in source)
            {
                for (int i = 0; i < properties.Count; i++)
                    values[i] = properties[i].GetValue(item);
                table.Rows.Add(values);
            }
            return table;
        }

        public static IEnumerable<Film> FindMatches(this IEnumerable<Film> source, string term, char delimeter = ' ')
        {
            return FindMatches(source, term.Split(new[] { delimeter }, StringSplitOptions.RemoveEmptyEntries));
        }

        public static IEnumerable<Film> FindMatches(this IEnumerable<Film> source, IEnumerable<string> terms)
        {
            return (from film in source
                    where terms.All(term => film.Genre.IndexOf(term, StringComparison.OrdinalIgnoreCase) > -1) ||
                    terms.All(term => film.Synopsis.IndexOf(term, StringComparison.OrdinalIgnoreCase) > -1) ||
                    terms.All(term => film.Title.IndexOf(term, StringComparison.OrdinalIgnoreCase) > -1)
                    select film);
        }

        public static void PrintToConsole(this DataTable table)
        {
            var width = (25 * table.Columns.Count) + table.Columns.Count;
            Console.WriteLine("Table Name: {0}", table.TableName);
            width.DrawHorizontalSeperator('=');
            Console.WriteLine("|{0}|", string.Join("|", table.Columns.Cast<DataColumn>().Select(x => string.Format("  {0}  ", x.ColumnName).PadRight(25))));
            width.DrawHorizontalSeperator('=');
            foreach (var row in table.Rows.Cast<DataRow>())
                Console.WriteLine("|{0}|", string.Join("|", row.ItemArray.Select(x => string.Format("  {0}  ", x.ToString()).PadRight(25))));
            width.DrawHorizontalSeperator('-');
        }

        static void DrawHorizontalSeperator(this int width, char seperator)
        {
            Console.WriteLine(new string(seperator, width));
        }
    }
}


Produces the following output - [screenshot of the console output, not included]
-saige-