Identifying differences between two text files (Linux)

Hi,

I have two text files, A and B. What I'm after is a nice little command line utility that can output:

- What is in B that is not in A.
- What is in A that is not in B.
- What is common to both.

... as three separate simple text file reports. I'm not after a standard 'diff' output which has the "<" or the ">" type comments, just nice simple clear lists.

Anyone help?

Thanks,
B
Asked by: bcops

Tintin Commented:
comm -32 a b >in-a-not-in-b
comm -31 a b >in-b-not-in-a
comm -21 a b >common-to-both
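
For anyone trying this on unsorted data, here is a minimal sketch of the same three reports with the sorting step made explicit, since comm expects sorted input. The sample URLs, the scratch directory and the names a.sorted and b.sorted are assumptions for illustration, not part of the original answer.

```shell
# Work in a scratch directory so no existing files are overwritten.
cd "$(mktemp -d)"

# Hypothetical sample data: one "source,destination" pair per line, unsorted.
printf '%s\n' \
  'http://example.com/p2.html,http://example.com/p3.html' \
  'http://example.com/p1.html,http://example.com/p2.html' > a
printf '%s\n' \
  'http://example.com/p1.html,http://example.com/p2.html' \
  'http://example.com/p4.html,http://example.com/p5.html' > b

# sort and comm must agree on collation order; the C locale keeps them consistent.
export LC_ALL=C
sort a > a.sorted
sort b > b.sorted

comm -23 a.sorted b.sorted > in-a-not-in-b    # keep column 1: lines only in a
comm -13 a.sorted b.sorted > in-b-not-in-a    # keep column 2: lines only in b
comm -12 a.sorted b.sorted > common-to-both   # keep column 3: lines in both
```

Each whole line is compared as a single string, so a source/destination pair counts as common only when both URLs match exactly.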

ozo Commented:
if the files are sorted, you can use
comm

bcops (Author) Commented:
Thanks, but I think it needs to be more sophisticated than that.

The data is a collection of URLs. So, the files are of the format:

ColumnA: source URL
ColumnB: destination URL
Comma separated

So sorting doesn't necessarily help.
Thanks though.
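
If sorting the files really is undesirable, one alternative is whole-line matching with grep, which is a different technique from comm and is shown here only as a sketch. The file names A and B follow the question; the sample URLs, the scratch directory and the report names are assumptions.

```shell
# Work in a scratch directory so no existing files are overwritten.
cd "$(mktemp -d)"

# Hypothetical sample data in the "source,destination" format, left unsorted.
printf '%s\n' \
  'http://example.com/p2.html,http://example.com/p3.html' \
  'http://example.com/p1.html,http://example.com/p2.html' > A
printf '%s\n' \
  'http://example.com/p1.html,http://example.com/p2.html' \
  'http://example.com/p4.html,http://example.com/p5.html' > B

# -F: fixed strings (no regex), -x: match whole lines only,
# -v: invert the match, -f FILE: read the patterns from FILE.
grep -Fxvf A B > in-B-not-in-A   # lines of B that match no line of A
grep -Fxvf B A > in-A-not-in-B   # lines of A that match no line of B
grep -Fxf  A B > common          # lines of B that also appear in A
```

Each report keeps the order of the file being filtered. Note that grep exits with status 1 when a report turns out empty, which matters if this runs in a script under set -e.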


ozo Commented:
So what would "in B that is not in A", "in A that is not in B", and "common to both" mean?

Would sorting help after replacing commas with newlines?

Tintin Commented:
Just to clarify, files A and B have entries like the following?

http://example.com/page1.html,http://example.com/page2.html


bcops (Author) Commented:

1) ozo:
So what would "in B that is not in A", "in A that is not in B", and "common to both" mean?
>> It would mean which lines, i.e. which pairings of source/destination URL, were in B and not in A, etc.

Would sorting help after replacing commas with newlines?
>> Not really. It's the identification of different pairings I'm after.

2) Tintin: yup - your example is correct.

Thanks,
B


ozo Commented:
I'm still not clear on why you say that a comm of sorted files doesn't help.
Could you give examples of file A, file B, and the three separate simple text file reports that you would want to produce from them?

ozo Commented:
That was my first suggestion, but then it was said that it needs to be more sophisticated, and that comm would not help even if the files were sorted.

bcops (Author) Commented:
Hi ozo and Tintin,

Tintin's latest suggestion does broadly seem to work, although one or two items slip through, so thanks.
I can't find any documentation that explains what -32, -31, and -21 do. What do they do?

ozo - sorry if this is what you meant. You weren't explicit enough for me though.

B

ozo Commented:
I assumed you had access to
man comm

NAME
     comm -- select or reject lines common to two files

SYNOPSIS
     comm [-123] file1 file2

DESCRIPTION
     The comm utility reads file1 and file2, which should be sorted lexically,
     and produces three text columns as output: lines only in file1; lines
     only in file2; and lines in both files.

     The filename ``-'' means the standard input.

     The following options are available:

     -1      Suppress printing of column 1.

     -2      Suppress printing of column 2.

     -3      Suppress printing of column 3.

But you were not explicit about why that wouldn't work, what you would need that was more sophisticated, or which one or two items slip through.
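
To make the excerpt concrete: with no options comm prints all three columns, and each digit removes one of them, so -32 (the same as -23, since the digits can be given in either order) leaves only column 1. A small sketch, using made-up files f1 and f2 in a scratch directory:

```shell
# Work in a scratch directory so no existing files are overwritten.
cd "$(mktemp -d)"

# Hypothetical, already-sorted sample files.
printf '%s\n' apple banana cherry > f1
printf '%s\n' banana cherry date > f2

# No options: all three columns, separated by leading tabs.
comm f1 f2

only_f1=$(comm -23 f1 f2)   # suppress columns 2 and 3: lines only in f1
only_f2=$(comm -13 f1 f2)   # suppress columns 1 and 3: lines only in f2
both=$(comm -12 f1 f2)      # suppress columns 1 and 2: lines in both files

printf 'only in f1:\n%s\n' "$only_f1"
printf 'only in f2:\n%s\n' "$only_f2"
printf 'in both:\n%s\n' "$both"
```

Here apple is reported as only in f1, date as only in f2, and banana and cherry as common to both.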
Question has a verified solution.
