Daniel Lugo Pino
asked on
Removing items in a List<T> that contain one item of an array in any order or position
I have been tasked with adding some very specific searching behaviour to a MVC application using Entity Framework.
The searching should be able to get:
Entries with two or more words that are not just exactly next to each other. If they type in "The Revenge":
- The Revenge
- The Horrible Revenge
- Revenge of the Machines
Parts of words. If they type in "Smith":
- Mr and Mrs Smith
- Meet the Smithsons
- The Incredible Blacksmith
I have written the following method in my controller class:
public ActionResult Index(string searchString)
{
    var films = from s in db.films
                select s;

    // Single-word search
    if (!string.IsNullOrEmpty(searchString) && !searchString.Any(x => Char.IsWhiteSpace(x)))
    {
        searchString.Trim();
        films = films.Where(s => s.title.Contains(searchString)
                              || s.title.StartsWith(searchString)
                              || s.title.EndsWith(searchString)
                              || s.genre.Contains(searchString)
                              || s.genre.StartsWith(searchString)
                              || s.genre.EndsWith(searchString)
                              || s.synopsis.Contains(searchString)
                              || s.synopsis.StartsWith(searchString)
                              || s.synopsis.EndsWith(searchString));
    }
    // Multi-word search: split on spaces and collect the matches for each word
    else if (!string.IsNullOrEmpty(searchString) && searchString.Any(x => Char.IsWhiteSpace(x)))
    {
        searchString.Trim();
        string[] strings = searchString.Split(' ');
        var finalFilms = new List<Films>();
        foreach (var splitString in strings)
        {
            finalFilms.AddRange(films.Where(s => s.synopsis.Contains(splitString)
                                              || s.genre.Contains(splitString)
                                              || s.title.Contains(splitString)));
        }
        finalFilms.RemoveAll(i =>
        {
            int count = strings.Count(s => i.synopsis?.Contains(s) ?? false);
            return count > 0 && count < strings.Length;
        });
        films = finalFilms.ToList().AsQueryable();
    }
    return View(films);
}
I am getting the correct results when searching, but also getting too many other entries which are wrong. Using "The Revenge" example again. Entries which should not be returned but still are:
- A Revenge Movie
- The Incredible Movie
The Films model looks like this:
[Table("tblFilms")]
public class Films
{
    [Key]
    [DatabaseGenerated(DatabaseGeneratedOption.Identity)]
    public int filmID { get; set; }

    [Display(Name = "Films: ")]
    public string title { get; set; }

    [Display(Name = "Genre: ")]
    public string genre { get; set; }

    [Display(Name = "Synopsis: ")]
    public string synopsis { get; set; }
}
Is there something obvious I am doing wrong when removing the undesired entries in the else if part of my controller? Or am I approaching the searching functionality wrong?
Something that might be worth noting is that most of the searching will be done on the 'Synopsis' field. Which tends to be hundreds of words long. There are also thousands of entries in tblFilms, if performance is a concern.
You can really simplify your logic using LINQ:
// A film matches when ALL search terms occur (ignoring case) in the same field:
// all in genre, or all in synopsis, or all in title.
return View((from film in db.films
             let terms = !string.IsNullOrEmpty(searchString)
                 ? searchString.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                 : new[] { "" }
             where terms.All(term => film.genre.IndexOf(term, StringComparison.OrdinalIgnoreCase) > -1) ||
                   terms.All(term => film.synopsis.IndexOf(term, StringComparison.OrdinalIgnoreCase) > -1) ||
                   terms.All(term => film.title.IndexOf(term, StringComparison.OrdinalIgnoreCase) > -1)
             select film).ToList().AsQueryable());
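One caveat, offered as a sketch rather than a definitive fix: when the query runs against Entity Framework, an IndexOf overload that takes a StringComparison may not be translatable to SQL by the provider, and the whole table might then have to be filtered in memory. Given the long synopsis fields and thousands of rows mentioned in the question, it may be worth chaining one Where per term with plain Contains calls instead, so the filtering can stay in the database. The FilmSearch class, the Search method and the Title/Genre/Synopsis property names below are assumptions for illustration (they mirror the proof-of-concept Film class, not the tblFilms model). Note also that the semantics differ slightly from the query above: here every term must occur in at least one of the three fields, not necessarily all in the same field.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the entity; property names follow the proof of concept.
public class Film
{
    public string Title { get; set; }
    public string Genre { get; set; }
    public string Synopsis { get; set; }
}

public static class FilmSearch
{
    // Narrows the query once per search term. Each Where is ANDed with the
    // previous ones, so a film survives only if EVERY term is found in at
    // least one of its fields. Assumes the fields are non-null when the
    // query is evaluated in memory.
    public static IQueryable<Film> Search(IQueryable<Film> films, string searchString)
    {
        if (string.IsNullOrWhiteSpace(searchString))
            return films; // nothing to filter on

        var terms = searchString
            .ToLower()
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (var term in terms)
        {
            // Local copy of the loop variable: needed on compilers older than
            // C# 5 so each lambda captures its own term, harmless on newer ones.
            var t = term;
            films = films.Where(f => f.Title.ToLower().Contains(t)
                                  || f.Genre.ToLower().Contains(t)
                                  || f.Synopsis.ToLower().Contains(t));
        }
        return films;
    }
}
```

In the controller this could be called as View(FilmSearch.Search(db.films, searchString)), assuming the property names are adjusted to the real model. The ToLower calls are only there to make the in-memory behaviour case-insensitive; on a database with a case-insensitive collation they are probably unnecessary.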
Proof of concept -
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace EE_Q29054800
{
    class Program
    {
        static List<Film> films = new List<Film>()
        {
            new Film { ID = 0, Genre = "Fiction", Synopsis = "The greatest love story ever told", Title = "The Revenge" },
            new Film { ID = 1, Genre = "Non-Fiction", Synopsis = "The coldest story ever told", Title = "A Revenge Movie" },
            new Film { ID = 2, Genre = "Fiction", Synopsis = "The greatest story ever told", Title = "The Incredible Movie" },
            new Film { ID = 3, Genre = "Science Fiction", Synopsis = "The funniest love story ever told", Title = "Revenge of the machines" },
            new Film { ID = 4, Genre = "Non-Fiction", Synopsis = "The saddest love story ever told", Title = "The Horrible Revenge" },
            new Film { ID = 5, Genre = "Fiction", Synopsis = "The greatest spy story ever told", Title = "Mr and Mrs Smith" },
            new Film { ID = 6, Genre = "Non-Fiction", Synopsis = "The best documentary about smithing", Title = "The Dying Art" },
            new Film { ID = 7, Genre = "Science Fiction", Synopsis = "The best android family in town", Title = "Meet The Smithsons" },
            new Film { ID = 8, Genre = "Action", Synopsis = "He has a hammer, an anvil and a bad attitude", Title = "The Incredible Blacksmith" }
        };

        static void Main(string[] args)
        {
            films.FindMatches("The Revenge").ConvertToDataTable("The Revenge").PrintToConsole();
            Console.WriteLine();
            films.FindMatches("smith").ConvertToDataTable("Smith").PrintToConsole();
            Console.WriteLine();
            films.FindMatches(new[] { "story", "told" }).ConvertToDataTable("Story Told").PrintToConsole();
            Console.ReadLine();
        }
    }

    class Film
    {
        public int ID { get; set; }
        public string Title { get; set; }
        public string Genre { get; set; }
        public string Synopsis { get; set; }
    }

    static class Extensions
    {
        public static DataTable ConvertToDataTable<T>(this IEnumerable<T> source, string name = "")
        {
            DataTable table = new DataTable(name);
            var properties = TypeDescriptor.GetProperties(typeof(T));
            foreach (PropertyDescriptor property in properties)
            {
                if (property.PropertyType.IsGenericType && property.PropertyType.GetGenericTypeDefinition().Equals(typeof(Nullable<>)))
                    table.Columns.Add(property.Name, property.PropertyType.GetGenericArguments()[0]);
                else
                    table.Columns.Add(property.Name, property.PropertyType);
            }

            object[] values = new object[properties.Count];
            foreach (var item in source)
            {
                for (int i = 0; i < properties.Count; i++)
                    values[i] = properties[i].GetValue(item);
                table.Rows.Add(values);
            }
            return table;
        }

        public static IEnumerable<Film> FindMatches(this IEnumerable<Film> source, string term, char delimeter = ' ')
        {
            return FindMatches(source, term.Split(new[] { delimeter }, StringSplitOptions.RemoveEmptyEntries));
        }

        public static IEnumerable<Film> FindMatches(this IEnumerable<Film> source, IEnumerable<string> terms)
        {
            return (from film in source
                    where terms.All(term => film.Genre.IndexOf(term, StringComparison.OrdinalIgnoreCase) > -1) ||
                          terms.All(term => film.Synopsis.IndexOf(term, StringComparison.OrdinalIgnoreCase) > -1) ||
                          terms.All(term => film.Title.IndexOf(term, StringComparison.OrdinalIgnoreCase) > -1)
                    select film);
        }

        public static void PrintToConsole(this DataTable table)
        {
            var width = (25 * table.Columns.Count) + table.Columns.Count;
            Console.WriteLine("Table Name: {0}", table.TableName);
            width.DrawHorizontalSeperator('=');
            Console.WriteLine("|{0}|", string.Join("|", table.Columns.Cast<DataColumn>().Select(x => string.Format(" {0} ", x.ColumnName).PadRight(25))));
            width.DrawHorizontalSeperator('=');
            foreach (var row in table.Rows.Cast<DataRow>())
                Console.WriteLine("|{0}|", string.Join("|", row.ItemArray.Select(x => string.Format(" {0} ", x.ToString()).PadRight(25))));
            width.DrawHorizontalSeperator('-');
        }

        static void DrawHorizontalSeperator(this int width, char seperator)
        {
            Console.WriteLine(new string(seperator, width));
        }
    }
}
Produces the following output -
-saige-
What happens here? You split your string, e.g. "The Revenge", into two strings, then you check whether the synopsis etc. contains "The". If it does, the entry is added to the results. Then you do the same for "Revenge". Every entry collected this way definitely contains one of the search words, but not necessarily both! That's why you have
There are other similar inconsistencies in the algorithm as well.
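To make the point concrete, here is a small in-memory sketch, not the asker's actual code. It contrasts "contains any of the words" (what adding the matches word by word amounts to) with "contains all of the words" (what the question asks for). The SearchDemo class, its method names and the Film class are made up for illustration. One further inconsistency visible in the question's code: the RemoveAll filter counts matches in the synopsis only and keeps entries whose count is 0, so a film that matched a single word through its title alone is never removed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Films entity.
public class Film
{
    public string Title { get; set; }
    public string Genre { get; set; }
    public string Synopsis { get; set; }
}

public static class SearchDemo
{
    // True when the term occurs, ignoring case, in any of the three fields.
    public static bool ContainsTerm(Film film, string term)
    {
        return new[] { film.Title, film.Genre, film.Synopsis }
            .Any(field => field != null
                       && field.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0);
    }

    // Union: a film qualifies as soon as ONE term is found.
    // This is what collecting the results word by word produces.
    public static List<Film> MatchAny(IEnumerable<Film> films, string[] terms)
    {
        return films.Where(f => terms.Any(t => ContainsTerm(f, t))).ToList();
    }

    // Intersection: a film qualifies only if EVERY term is found somewhere.
    public static List<Film> MatchAll(IEnumerable<Film> films, string[] terms)
    {
        return films.Where(f => terms.All(t => ContainsTerm(f, t))).ToList();
    }
}
```

With the titles "The Revenge", "A Revenge Movie", "The Incredible Movie" and "Revenge of the Machines" (and genre and synopsis text containing neither word), MatchAny with the terms "The" and "Revenge" returns all four films, while MatchAll returns only "The Revenge" and "Revenge of the Machines".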