Solved

Identifying difference between two text files (linux)

Posted on 2009-05-20
Medium Priority
Last Modified: 2013-12-16
Hi,

I have two text files, A and B. What I'm after is a nice little command line utility that can output:

- What is in B that is not in A.
- What is in A that is not in B.
- What is common to both.

... as three separate, simple text file reports. I'm not after standard diff output with its "<" and ">" markers, just nice, simple, clear lists.

Can anyone help?

Thanks,
B
Question by:bcops
 
LVL 85

Expert Comment

by:ozo
ID: 24429507
If the files are sorted, you can use comm.
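A minimal sketch of what comm does with two already-sorted files. The file names a.txt and b.txt and their contents are made up for illustration. With no options, comm prints three columns: lines only in the first file, lines only in the second file (indented by one tab), and lines in both files (indented by two tabs).

```shell
# Hypothetical sample data; comm expects both inputs to be sorted.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' apple banana cherry > a.txt
printf '%s\n' banana cherry date > b.txt

# Column 1: only in a.txt; column 2 (one tab): only in b.txt;
# column 3 (two tabs): in both files.
comm a.txt b.txt
```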
0
 

Author Comment

by:bcops
ID: 24429634
Thanks, but I think it needs to be more sophisticated than that.

The data is a collection of URLs, so the files have this format:

ColumnA: source URL
ColumnB: destination URL
Comma separated

So sorting doesn't necessarily help.
Thanks though.
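One point worth checking against the "sorting doesn't help" concern: sort reorders whole lines and never splits a line at the comma, so each source,destination pair stays together, and comm then compares complete pairings. A minimal sketch, assuming hypothetical files A and B in the comma-separated format described above, with made-up example.com URLs:

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
# Hypothetical source,destination pairs, deliberately unsorted.
printf '%s\n' \
  'http://example.com/p3.html,http://example.com/p1.html' \
  'http://example.com/p1.html,http://example.com/p2.html' > A
printf '%s\n' \
  'http://example.com/p1.html,http://example.com/p2.html' \
  'http://example.com/p1.html,http://example.com/p4.html' > B

# Sort each file as whole lines; the comma-separated pair is never broken up.
sort A > A.sorted
sort B > B.sorted

# Same source but a different destination gives two distinct lines,
# so this pairing is reported as present in B only.
comm -13 A.sorted B.sorted
```

Because a pairing is just one line, "which pairings differ" and "which lines differ" are the same question for comm.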


 
LVL 85

Expert Comment

by:ozo
ID: 24429680
So what would "in B that is not in A", "in A that is not in B" and "common to both" mean?

Would sorting help after replacing commas with newlines?
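For comparison, a sketch of what splitting at the commas would do: each pair becomes two separate URLs, so comm then reports which individual URLs (from either column) appear in one file but not the other, rather than which pairings differ. The files A and B and their URLs are hypothetical.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
# Hypothetical one-line files sharing a source URL.
printf '%s\n' 'http://x/a,http://x/b' > A
printf '%s\n' 'http://x/a,http://x/c' > B

# Split every line at the comma, then sort and drop duplicate URLs.
tr ',' '\n' < A | sort -u > A.urls
tr ',' '\n' < B | sort -u > B.urls

comm -13 A.urls B.urls   # URLs that occur only in B
comm -12 A.urls B.urls   # URLs that occur in both
```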

 
LVL 48

Expert Comment

by:Tintin
ID: 24429700
Just to clarify, files A and B have entries like the following?

http://example.com/page1.html,http://example.com/page2.html

 

Author Comment

by:bcops
ID: 24429706

1) ozo:
So what would "in B that is not in A", "in A that is not in B" and "common to both" mean?
>> It would mean which lines, i.e. which source/destination URL pairings, were in B and not in A, etc.

would sorting help after replacing commas with newlines?
>> Not really. It's the identification of different pairings I'm after.

2) Tintin: yup - your example is correct.

Thanks,
B

 
LVL 85

Expert Comment

by:ozo
ID: 24429765
I'm still not clear on why you say that a comm of sorted files doesn't help.
Could you give examples of file A, file B, and the three separate simple text file reports that you would want to produce from them?
 
LVL 48

Accepted Solution

by:
Tintin earned 1400 total points
ID: 24429852
comm -32 a b >in-a-not-in-b
comm -31 a b >in-b-not-in-a
comm -21 a b >common-to-both
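comm only gives correct results when both inputs are sorted in the same collating order, and the commands above assume a and b already are. A minimal sketch that sorts copies first (the .sorted names and the sample data are hypothetical) and sets LC_ALL=C so that sort and comm agree on plain byte order regardless of the user's locale:

```shell
# Hypothetical inputs a and b: one source,destination pair per line.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' 'http://x/1,http://x/2' 'http://x/1,http://x/3' > a
printf '%s\n' 'http://x/1,http://x/3' 'http://x/4,http://x/5' > b

# Use byte order for both sort and comm so they agree.
export LC_ALL=C
sort a > a.sorted
sort b > b.sorted

comm -23 a.sorted b.sorted > in-a-not-in-b    # suppress columns 2 and 3
comm -13 a.sorted b.sorted > in-b-not-in-a    # suppress columns 1 and 3
comm -12 a.sorted b.sorted > common-to-both   # suppress columns 1 and 2
```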
 
LVL 85

Expert Comment

by:ozo
ID: 24429895
That was my first suggestion, but then it was said that it needs to be more sophisticated, and that comm would not help even if the files were sorted.
 

Author Comment

by:bcops
ID: 24430196
Hi ozo and Tintin,

Tintin's latest suggestion does broadly seem to work, though one or two items slip through, so thanks.
I can't find any documentation explaining what -32, -31 and -21 do. What do they do?

ozo, sorry if this is what you meant; you weren't explicit enough for me, though.

B
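The thread doesn't say which items slipped through. One common cause, offered here purely as an assumption, is lines that look identical but differ invisibly, such as Windows CRLF line endings or trailing spaces, which comm treats as different lines. A sketch that strips carriage returns and trailing whitespace, then sorts and removes duplicate lines before comparing; the files A and B and the helper name normalise are hypothetical:

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
# Same pair in both files, but B's copy has a trailing space and a CRLF ending.
printf 'http://x/1,http://x/2\n' > A
printf 'http://x/1,http://x/2 \r\n' > B

export LC_ALL=C
# Hypothetical helper: drop carriage returns, trim trailing whitespace,
# then sort and de-duplicate.
normalise() {
  tr -d '\r' < "$1" | sed 's/[[:space:]]*$//' | sort -u
}
normalise A > A.norm
normalise B > B.norm

comm -12 A B             # raw files: no common line is found
comm -12 A.norm B.norm   # normalised: the pair is common to both
```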
 
LVL 85

Assisted Solution

by:ozo
ozo earned 600 total points
ID: 24430239
I assumed you had access to the manual page (man comm):

NAME
     comm -- select or reject lines common to two files

SYNOPSIS
     comm [-123] file1 file2

DESCRIPTION
     The comm utility reads file1 and file2, which should be sorted lexically,
     and produces three text columns as output: lines only in file1; lines
     only in file2; and lines in both files.

     The filename ``-'' means the standard input.

     The following options are available:

     -1      Suppress printing of column 1.

     -2      Suppress printing of column 2.

     -3      Suppress printing of column 3.
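Putting these options together: the digits can be combined in a single option, and each digit suppresses one column. So -32 (the same as -23) keeps only column 1, -31 (the same as -13) keeps only column 2, and -21 (the same as -12) keeps only column 3. A small check with made-up sorted files f1 and f2, which also uses "-" to read one input from standard input:

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' x y > f1   # hypothetical sorted files
printf '%s\n' y z > f2

comm -32 f1 f2   # only column 1, lines only in f1:  x
comm -31 f1 f2   # only column 2, lines only in f2:  z
comm -21 f1 f2   # only column 3, lines in both:     y

# "-" means standard input, so one side can be sorted on the fly.
printf '%s\n' z y | sort | comm -12 f1 -
```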

But you were not explicit about why that wouldn't work, what you needed that was more sophisticated, or which one or two items slip through.
Question has a verified solution.
