Solved

Remove items from one list that exist in another list

Posted on 2014-03-27
Last Modified: 2014-03-28
I have two lists of email addresses.

List A contains a comprehensive list of all the email addresses.
List B contains a smaller subset of List A's email addresses.

I need to remove all of the items from List A that appear in List B.

So basically, I need to obtain List C which is all of the items in List A that do NOT appear in List B.

What's a simple way to do this? I only need to do it once, and the lists are small (2000 items each), so I'm open to pretty much anything.

I can do it in pretty much whatever tools you think would be easiest to use - Notepad, Excel, Notepad++, Bash script, Linux commands, PHP script, regular expressions, VB... whatever you like.
Question by:Frosty555
6 Comments
 
LVL 39

Accepted Solution

by:
nutsch earned 200 total points
ID: 39960689
Put list A in column A of a worksheet and list B in column D of the same worksheet.

In cell B1, enter the following formula and copy it down:
=COUNTIF(D:D,A1)>0

This gives you TRUE / FALSE for whether each item in list A also appears in list B.

You can sort and copy, or use Data \ AutoFilter, to either delete the TRUE rows or copy the FALSE rows to a new list C.

Thomas
 
LVL 13

Assisted Solution

by:Carl Bohman
Carl Bohman earned 100 total points
ID: 39960688
Assuming your big list is called "a" and your small list is called "b", this set of commands should do it:

sort a > a.sorted
sort b > b.sorted
diff a.sorted b.sorted | grep "^<" | sed 's/^..//' > outputfile

 
LVL 48

Assisted Solution

by:Tintin
Tintin earned 100 total points
ID: 39960731
With a bash script, it's trivial.

#!/bin/bash
grep -vf listb.txt lista.txt >listc.txt

 
LVL 35

Assisted Solution

by:Dan Craciun
Dan Craciun earned 100 total points
ID: 39960790
Easy in Powershell too:
Compare-Object (Get-Content "X:\your\path\listA.txt") (Get-Content "X:\your\path\listB.txt") | Where-Object { $_.SideIndicator -eq '<=' } | Select-Object -ExpandProperty InputObject | Out-File "X:\your\path\listC.txt"

HTH,
Dan
 
LVL 8

Expert Comment

by:itjockey
ID: 39960801
 
LVL 31

Author Comment

by:Frosty555
ID: 39962590
Tried out each of your answers and they all worked.

nutsch's answer using Excel gives you the most visual cues that you really did do it right, which was nice for a one-time operation, and it is ultimately the way I ended up doing it. It is N^2 complexity, though, so beyond a few thousand rows you'll quickly run into performance issues. It worked nicely in this case.
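
For lists too large for the COUNTIF approach, a hash-based lookup avoids the quadratic scan. The following is a minimal sketch rather than anything posted in the thread. It assumes both lists are plain text files with one address per line, and the file names and sample addresses are made up for illustration.

```shell
#!/bin/bash
# Hypothetical sample data in a scratch directory (names are illustrative only).
workdir=$(mktemp -d)
printf '%s\n' 'ann@example.com' 'bob@example.com' 'cat@example.com' > "$workdir/lista.txt"
printf '%s\n' 'bob@example.com' > "$workdir/listb.txt"

# First file (NR==FNR): remember every line of list B as an array key.
# Second file: print only the lines of list A that were not remembered.
awk 'NR==FNR { seen[$0]; next } !($0 in seen)' \
    "$workdir/listb.txt" "$workdir/lista.txt" > "$workdir/listc.txt"

cat "$workdir/listc.txt"
```

Each line is read once and looked up in an associative array, and the original order of list A is preserved. One known limitation of the NR==FNR idiom: if list B is empty, the condition is true for every line of list A and nothing is printed.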

Tintin's answer was definitely the simplest. However, you have to be careful because each line of listb.txt is treated as a grep pattern, not a literal string. I would have to escape all the "." characters in listb.txt for it to be completely correct. In this case, though, it appears to work.
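
The pattern caveat can usually be avoided without escaping anything. As a hedged sketch (the sample files below are invented for illustration): grep's -F option treats each line of list B as a fixed string, and -x requires the whole line to match, so bob@example.com does not also remove look-alike or longer addresses.

```shell
#!/bin/bash
# Hypothetical sample data: one exact match and two near misses.
workdir=$(mktemp -d)
printf '%s\n' 'bob@example.com' 'bob@exampleXcom' 'xbob@example.com.au' > "$workdir/lista.txt"
printf '%s\n' 'bob@example.com' > "$workdir/listb.txt"

# Plain -vf: "." matches any character and substring matches count,
# so all three lines are dropped. grep exits non-zero when it prints
# nothing, hence the "|| true".
grep -vf "$workdir/listb.txt" "$workdir/lista.txt" > "$workdir/listc_regex.txt" || true

# -F: fixed strings, -x: whole-line matches only. Only the exact address is removed.
grep -Fxvf "$workdir/listb.txt" "$workdir/lista.txt" > "$workdir/listc_fixed.txt"

cat "$workdir/listc_fixed.txt"
```

With the sample data, the first output file ends up empty while the second keeps the two near-miss addresses.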

The "sort" and PowerShell solutions appear to work too, but admittedly I don't fully understand how they work: I don't do much in PowerShell, and diff and sed are among the few Linux commands I still haven't wrapped my head around.
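
For readers in the same position, here is an annotated sketch of what Carl Bohman's pipeline does at each stage, run on made-up sample files. The final comm -23 line is not from the thread. It is a commonly used alternative that should give the same result for sorted input.

```shell
#!/bin/bash
# Hypothetical sample data (file names "a" and "b" follow the answer above).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'cat@example.com' 'ann@example.com' 'bob@example.com' > a
printf '%s\n' 'bob@example.com' > b

# diff compares files line by line, so both lists are sorted first
# to make matching entries line up.
sort a > a.sorted
sort b > b.sorted

# diff prefixes lines found only in the first file with "< " and lines
# found only in the second with "> ". grep keeps the "<" lines, and sed
# strips the two-character "< " prefix, leaving the bare addresses.
diff a.sorted b.sorted | grep "^<" | sed 's/^..//' > outputfile

# Alternative: comm -23 suppresses column 2 (lines only in b) and
# column 3 (lines in both), leaving the lines unique to a.
comm -23 a.sorted b.sorted > outputfile.comm

cat outputfile
```

Both output files should contain the same addresses. comm requires its inputs to be sorted in the same collation order that sort used, which holds here because everything runs in one shell session.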