bcops

Identifying difference between two text files (linux)

Hi,

I have two text files, A and B. What I'm after is a nice little command line utility that can output:

- What is in B that is not in A.
- What is in A that is not in B.
- What is common to both.

... as three separate simple text file reports. I'm not after a standard 'diff' output which has the "<" or the ">" type comments, just nice simple clear lists.

Anyone help?

Thanks,
B
ozo

If the files are sorted, you can use comm.
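
As a minimal sketch of that suggestion (the file names and sample URLs below are made up for illustration): comm expects both inputs to be sorted the same way, and with no options it prints three tab-indented columns: lines only in the first file, lines only in the second file, and lines in both.

```shell
# Hypothetical sample data: one "source,destination" pair per line.
printf '%s\n' \
  'http://example.com/a.html,http://example.com/b.html' \
  'http://example.com/c.html,http://example.com/d.html' > A.txt
printf '%s\n' \
  'http://example.com/c.html,http://example.com/d.html' \
  'http://example.com/e.html,http://example.com/f.html' > B.txt

# comm needs sorted input; LC_ALL=C keeps sort and comm on the same byte-wise order.
LC_ALL=C sort A.txt > A.sorted
LC_ALL=C sort B.txt > B.sorted

# Column 1 (no indent): only in A.
# Column 2 (one tab):   only in B.
# Column 3 (two tabs):  in both files.
LC_ALL=C comm A.sorted B.sorted > columns.txt
cat columns.txt
```

Because the comparison is line by line, each whole "source,destination" pairing is treated as a single item.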
bcops

ASKER

Thanks, but I think it needs to be more sophisticated than that.

The data is a collection of URLs. So, the files are of the format:

ColumnA: source URL
ColumnB: destination URL
Comma separated

So sorting doesn't necessarily help.
Thanks though.


ozo

So what would "in B that is not in A", "in A that is not in B", and "common to both" mean?

Would sorting help after replacing commas with newlines?

Tintin

Just to clarify, files A and B have entries like the following?

http://example.com/page1.html,http://example.com/page2.html

bcops

ASKER


1) ozo:
So what would "in B that is not in A", "in A that is not in B", and "common to both" mean?
>> It would mean which lines, i.e. which pairings of source/destination URL, were in B and not in A, etc.

Would sorting help after replacing commas with newlines?
>> Not really. It's the identification of different pairings I'm after.

2) Tintin: yup - your example is correct.

Thanks,
B
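
Since each pairing is one whole line, a whole-line comparison is enough to find the differing pairings. One alternative that does not need sorted input is grep with fixed-string, whole-line matching. This is only a sketch with made-up file names and URLs, not something proposed in this thread:

```shell
# Hypothetical sample data; note the two lines that share a source URL
# but differ in destination, which count as different pairings.
printf '%s\n' \
  'http://example.com/1,http://example.com/2' \
  'http://example.com/3,http://example.com/4' > fileA.txt
printf '%s\n' \
  'http://example.com/3,http://example.com/4' \
  'http://example.com/1,http://example.com/9' > fileB.txt

# -F: treat patterns as literal strings (URLs contain regex characters like ".")
# -x: a pattern must match the whole line
# -f: read the patterns, one per line, from a file
# -v: print the lines that do NOT match
grep -Fxv -f fileA.txt fileB.txt > b_not_a.txt   # in B, not in A
grep -Fxv -f fileB.txt fileA.txt > a_not_b.txt   # in A, not in B
grep -Fx  -f fileA.txt fileB.txt > common.txt    # in both
```

Keep in mind that grep returns a non-zero exit status when a report comes out empty, which matters if this runs inside a script that stops on errors.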

I'm still not clear on why you say that a comm of sorted files doesn't help.
Could you give examples of file A, file B, and the three separate simple text file reports that you would want to produce from them?
ASKER CERTIFIED SOLUTION
Tintin

(This solution is only available to members of Experts Exchange.)
ozo

That was my first suggestion, but then it was said that it needs to be more sophisticated, and that comm would not help even if the files were sorted.
bcops

ASKER

Hi ozo and Tintin,

Tintin's latest suggestion does broadly seem to work, with one or two items that slip through, so thanks.
Can't find any documentation to explain what -32, -31, and -21 do. What do they do?

ozo - sorry if this is what you meant. You weren't explicit enough for me though ....

B
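
On the options question: each digit passed to comm suppresses that numbered output column (1 = lines only in the first file, 2 = lines only in the second, 3 = lines in both), and the digits can be given in any order, so -32 behaves the same as -23. A minimal sketch, assuming a POSIX or GNU comm and made-up, already sorted input files:

```shell
# Hypothetical, already sorted sample files.
printf '%s\n' 'p,q' 'r,s' 't,u' > first.sorted
printf '%s\n' 'r,s' 't,u' 'v,w' > second.sorted

comm -23 first.sorted second.sorted > only_in_first.txt    # hide columns 2 and 3
comm -13 first.sorted second.sorted > only_in_second.txt   # hide columns 1 and 3
comm -12 first.sorted second.sorted > in_both.txt          # hide columns 1 and 2
```

If a few lines still end up in the wrong report, possible causes (guesses, not confirmed anywhere in this thread) are trailing whitespace or Windows line endings that make otherwise identical lines differ, or files sorted under a different locale than the one comm runs in.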
SOLUTION
(This solution is only available to members of Experts Exchange.)