Solved

Remove items from one list that exist in another list

Posted on 2014-03-27
Last Modified: 2014-03-28
I have two lists of email addresses.

List A contains a comprehensive list of all the email addresses.
List B contains a smaller subset of List A's email addresses.

I need to remove all of the items from List A that appear in List B.

So basically, I need to obtain List C which is all of the items in List A that do NOT appear in List B.

What's a simple way to do this? I only need to do it once, and the lists are small (2000 items each), so I'm open to pretty much anything.

I can do it in pretty much whatever tools you think would be easiest to use - Notepad, Excel, Notepad++, Bash script, Linux commands, PHP script, regular expressions, VB... whatever you like.
Question by:Frosty555
6 Comments
 
LVL 39

Accepted Solution

by:
nutsch earned 200 total points
ID: 39960689
Put list A in column A of a worksheet and list B in column D of the same worksheet.

In cell B1, put the following formula and copy it down:
=COUNTIF(D:D,A1)>0

This will give you a TRUE/FALSE for matches in list B.

You can sort and copy, or use Data \ AutoFilter, to either delete the TRUEs or copy the FALSEs to a new list C.

Thomas
 
LVL 13

Assisted Solution

by:Carl Bohman
Carl Bohman earned 100 total points
ID: 39960688
Assuming your big list is called "a" and your small list is called "b", this set of commands should do it:

sort a > a.sorted
sort b > b.sorted
diff a.sorted b.sorted | grep "^<" | sed 's/^..//' > outputfile

 
LVL 48

Assisted Solution

by:Tintin
Tintin earned 100 total points
ID: 39960731
With a bash script, it's trivial.

#!/bin/bash
grep -vf listb.txt lista.txt >listc.txt


 
LVL 35

Assisted Solution

by:Dan Craciun
Dan Craciun earned 100 total points
ID: 39960790
Easy in PowerShell too:
Compare-Object (Get-Content "X:\your\path\listA.txt") (Get-Content "X:\your\path\listB.txt") | Where-Object { $_.SideIndicator -eq '<=' } | Select-Object -ExpandProperty InputObject | Out-File "X:\your\path\listC.txt"

HTH,
Dan
 
LVL 8

Expert Comment

by:Naresh Patel
ID: 39960801
 
LVL 31

Author Comment

by:Frosty555
ID: 39962590
Tried out each of your answers and they all worked.

nutsch's answer using Excel gives you the most visual cues that you really did do it right, which was nice for a one-time operation, and ultimately it was the way I ended up doing it. It is N^2 complexity, though, so beyond a few thousand rows you'll quickly run into performance issues. Worked nicely in this case, though.
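For larger lists, one way around the N^2 cost is a single pass with a lookup table. The following is only a sketch, assuming one address per line; the file names lista.txt, listb.txt and listc.txt and the sample addresses are placeholders, not anything from the answers above. awk remembers every line of list B as an array key, then prints only the lines of list A that were not remembered.

```shell
#!/bin/bash
# Hypothetical sample data standing in for the real lists.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' alice@example.com bob@example.com carol@example.com > lista.txt
printf '%s\n' bob@example.com > listb.txt

# While reading the first file (listb.txt), store each line as an array key.
# While reading the second file (lista.txt), print only lines not stored.
awk 'FILENAME == ARGV[1] { seen[$0]; next } !($0 in seen)' listb.txt lista.txt > listc.txt

cat listc.txt
```

Each line is looked up once in the array, so the work grows roughly with the combined size of the two lists rather than with their product. Unlike the sort-based approach, the original order of list A is preserved.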

Tintin's answer was definitely the simplest. However, you have to be careful because listb.txt is now a collection of grep patterns, not literal strings. I would have to escape all the "." characters in listb.txt for it to be completely correct. In this case, though, it appears to work.
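A possible way to avoid the escaping is to tell grep to treat the patterns as literal strings. This is a sketch, not part of Tintin's answer: -F makes each line of listb.txt a fixed string rather than a regular expression, and -x requires the whole line to match so that one address cannot match as a substring of another. The sample addresses are made up to show the difference.

```shell
#!/bin/bash
# Hypothetical sample data: the "." in list B would act as a wildcard without -F.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' a.b@example.com aXb@example.com carol@example.com > lista.txt
printf '%s\n' a.b@example.com > listb.txt

# Patterns as regular expressions: "a.b" also matches "aXb", so that line is lost.
grep -vf listb.txt lista.txt > listc_regex.txt

# Patterns as literal whole lines: only the exact address is removed.
grep -vxFf listb.txt lista.txt > listc.txt

cat listc.txt
```

With this sample data, listc_regex.txt ends up with only carol@example.com, while listc.txt keeps both aXb@example.com and carol@example.com.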

The "sort" and the PowerShell solutions appear to work too, but admittedly I don't fully understand how they work, because I don't do much work in PowerShell and the diff and sed commands are some of the few Linux commands I still haven't wrapped my head around.
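For anyone else puzzling over the sort/diff/sed pipeline, here is a commented walk-through on made-up data. The file names a, b and outputfile follow Carl Bohman's answer; the sample addresses and the comm variant at the end are additions for illustration only.

```shell
#!/bin/bash
# Hypothetical sample data standing in for the real lists.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' carol@example.com bob@example.com alice@example.com > a
printf '%s\n' bob@example.com > b

# Sorting both files lets diff line the two lists up against each other.
sort a > a.sorted
sort b > b.sorted

# diff prefixes lines found only in the first file with "< "
# and lines found only in the second file with "> ".
# grep "^<" keeps the first kind; sed 's/^..//' strips the two-character prefix.
diff a.sorted b.sorted | grep "^<" | sed 's/^..//' > outputfile

# comm compares two sorted files directly; -23 hides the lines unique to the
# second file (column 2) and the lines common to both (column 3).
comm -23 a.sorted b.sorted > outputfile.comm

cat outputfile
```

Both outputfile and outputfile.comm should contain the same lines here. The comm form skips the prefix stripping, at the cost of requiring both inputs to be sorted the same way.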