Solved

Remove items from one list that exist in another list

Posted on 2014-03-27
Last Modified: 2014-03-28
I have two lists of email addresses.

List A contains a comprehensive list of all the email addresses.
List B contains a smaller subset of List A's email addresses.

I need to remove all of the items from List A that appear in List B.

So basically, I need to obtain List C which is all of the items in List A that do NOT appear in List B.

What's a simple way to do this? I only need to do it once, and the lists are small (2000 items each), so I'm open to pretty much anything.

I can do it in pretty much whatever tools you think would be easiest to use - Notepad, Excel, Notepad++, Bash script, Linux commands, PHP script, regular expressions, VB... whatever you like.
Question by:Frosty555
 
LVL 39

Accepted Solution

by:
nutsch earned 800 total points
ID: 39960689
Put list A in column A of a worksheet and list B in column D of the same worksheet.

In cell B1, enter the following formula and copy it down:
=COUNTIF(D:D,A1)>0

This returns TRUE for each address in list A that also appears in list B, and FALSE otherwise.

You can then sort and copy, or use Data \ AutoFilter, to either delete the TRUE rows or copy the FALSE rows to a new list C.

Thomas
 
LVL 13

Assisted Solution

by:Carl Bohman
Carl Bohman earned 400 total points
ID: 39960688
Assuming your big list is called "a" and your small list is called "b", this set of commands should do it:

sort a > a.sorted
sort b > b.sorted
diff a.sorted b.sorted | grep "^<" | sed 's/^..//' > outputfile

 
LVL 48

Assisted Solution

by:Tintin
Tintin earned 400 total points
ID: 39960731
With a bash script, it's trivial.

#!/bin/bash
grep -vf listb.txt lista.txt >listc.txt

 
LVL 35

Assisted Solution

by:Dan Craciun
Dan Craciun earned 400 total points
ID: 39960790
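Easy in PowerShell too:
# Keep only the lines unique to listA ('<=') and write them as bare strings, not as a formatted table
Compare-Object (Get-Content "X:\your\path\listA.txt") (Get-Content "X:\your\path\listB.txt") | Where-Object { $_.SideIndicator -eq '<=' } | Select-Object -ExpandProperty InputObject | Out-File "X:\your\path\listC.txt"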


HTH,
Dan
 
LVL 8

Expert Comment

by:Naresh Patel
ID: 39960801
 
LVL 31

Author Comment

by:Frosty555
ID: 39962590
Tried out each of your answers and they all worked.

nutsch's answer using Excel gives you the most visual cues that you really did do it right, which was nice for a one-time operation, and it is ultimately the way I ended up doing it. It is N^2 complexity, though, so beyond a few thousand rows you'll quickly run into performance issues. It worked nicely in this case.
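
For larger lists, one way to avoid the N^2 scan is a hash lookup. The following is only a sketch: the demo_awk directory and the sample addresses are made up for illustration, and the file names follow the ones used elsewhere in this thread. It loads list B into an awk array once, then checks each line of list A against it.

```shell
#!/bin/sh
# Hypothetical sample data in a scratch directory (names are illustrative).
mkdir -p demo_awk
printf '%s\n' alice@example.com bob@example.com carol@example.com > demo_awk/lista.txt
printf '%s\n' bob@example.com > demo_awk/listb.txt

# While reading the first file (listb.txt), NR == FNR is true:
#   store each line as a key in the array "seen" and skip to the next line.
# While reading the second file (lista.txt):
#   print only the lines that are not a key in "seen".
awk 'NR == FNR { seen[$0]; next } !($0 in seen)' \
    demo_awk/listb.txt demo_awk/lista.txt > demo_awk/listc.txt

cat demo_awk/listc.txt
```

Each address is looked up in the array instead of being compared against every row, so the work grows roughly linearly with the list sizes. One caveat: if listb.txt is empty, the NR == FNR test also holds for lista.txt and nothing is printed, so that case needs a separate check.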

Tintin's answer was definitely the simplest. However, you have to be careful because the lines of listb.txt are treated as grep patterns, not literal strings. I would have to escape all the "." characters in listb.txt for it to be completely correct. In this case, though, it appears to work.
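
Rather than escaping the dots by hand, grep can be told to treat the list as literal strings. A minimal sketch, assuming a POSIX grep; the demo_grep directory and the sample addresses are hypothetical and chosen so that a regex "." would otherwise match the "X":

```shell
#!/bin/sh
# Hypothetical sample data: the two "bob" addresses differ only where a
# regex "." in the first one would match the "X" in the second.
mkdir -p demo_grep
printf '%s\n' bob.smith@example.com bobXsmith@example.com ann@example.com > demo_grep/lista.txt
printf '%s\n' bob.smith@example.com > demo_grep/listb.txt

# -F  treat each line of listb.txt as a literal string, not a regular expression
# -x  match whole lines only, so an address that merely contains another is kept
# -v  print the lines that do NOT match
# -f  read the strings from a file
grep -vxFf demo_grep/listb.txt demo_grep/lista.txt > demo_grep/listc.txt

cat demo_grep/listc.txt
```

With plain `grep -vf`, bobXsmith@example.com would also be dropped from this sample, because the "." in the pattern matches any character. With -F and -x only the exact address is removed.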

The "sort" and the PowerShell solutions appear to work too, but admittedly I don't fully understand how they work, because I don't do much work in PowerShell, and the diff and sed commands are among the few Linux commands I still haven't wrapped my head around.
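
For anyone else puzzling over the sort/diff pipeline: diff prefixes lines found only in the first file with "< ", grep "^<" keeps just those lines, and sed 's/^..//' strips the two-character prefix. A more direct route to the same result is comm, sketched below with hypothetical sample data in a demo_comm directory. It is offered as an alternative, not as what Carl Bohman posted.

```shell
#!/bin/sh
# Hypothetical sample data; "a" is the big list and "b" the small one,
# matching the names used in the sort/diff answer.
mkdir -p demo_comm
printf '%s\n' carol@example.com alice@example.com bob@example.com > demo_comm/a
printf '%s\n' bob@example.com > demo_comm/b

# comm requires both inputs to be sorted in the same collation order.
sort demo_comm/a > demo_comm/a.sorted
sort demo_comm/b > demo_comm/b.sorted

# comm prints three columns: lines only in file 1, lines only in file 2,
# and lines in both. -23 suppresses columns 2 and 3, leaving the lines
# that appear only in a.sorted.
comm -23 demo_comm/a.sorted demo_comm/b.sorted > demo_comm/outputfile

cat demo_comm/outputfile
```

Like the diff version, this compares whole lines literally, so there is no pattern escaping to worry about.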