Comparing two pipe-delimited files

Posted on 2012-09-18
Last Modified: 2012-10-04

  I am looking for a quick but reliable way to compare two flat files, A and B. Both are pipe-delimited and have the same columns, so the data format and column ordering are the same.
I want to see whether file A differs from file B in any of columns 1-199. Both files have 200 columns in total, and the values in the last column are known to differ.

Is there a way to do this in Unix?
Question by: LuckyLucks

    Accepted Solution

    This should do it...
    cut -d \| -f 1-199 < file1 > /tmp/cut1
    cut -d \| -f 1-199 < file2 > /tmp/cut2
    diff -ub /tmp/cut1 /tmp/cut2


    Alternatively, it would be simple to write a Perl script (or probably an awk script) to do this.

    Author Comment

    >head -10 results.txt
    < |||||||||||||||||||||||||||||||||||
    < |-12.055556|0.090909|-0.818182|||||228.000000|0.000000|15.000000|1205.000000||0.655589|-2.217391|-1.565217|-6.141079||||||||||||||||||-51.000000|
    > |||||||||||||||||||||||||||||||||||1
    > 778|-12.055556|0.090909|-0.818182||||1|228.000000|0.000000|15.000000|1205.000000||0.655589|-2.217391|-1.565217|-6.141079||||||||||||||||||-51.000000|2
    < |||||||||||||||||||||||||||||||||||
    < |||||||||||||||||||||||||||||||||||

    Would you have an idea how to read this difference? I compared columns 98-133.

    Expert Comment

    The top section (above ---) is from file 1 (indicated by the < at the start of the line) and the bottom section is from file 2 (indicated by the >).
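To make the `<` / `>` convention concrete, here is a small sketch using two made-up two-line files (the directory, file names, and contents are invented for illustration and are not from the question). It runs plain `diff` without `-u`, which is the format that produces the `<` and `>` markers.

```shell
# Illustrative files only; names and contents are invented for this demo.
demo_dir=$(mktemp -d)
printf 'a|b\nc|d\n' > "$demo_dir/one"
printf 'a|b\nc|X\n' > "$demo_dir/two"

# diff exits with status 1 when the files differ, so "|| true" keeps a
# script running under "set -e" from stopping here.
demo_out=$(diff "$demo_dir/one" "$demo_dir/two") || true
printf '%s\n' "$demo_out"
# Prints:
#   2c2
#   < c|d
#   ---
#   > c|X
rm -rf "$demo_dir"
```

Here `2c2` says that line 2 of the first file was changed into line 2 of the second; the `<` line is the text from the first file and the `>` line is the text from the second.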

    You could try "diff -cb" instead for a different output format you might like better.

    If you want to know specifically which fields differ, you'll have to use Perl, awk, or some other programming language (awk only loosely counts as one).
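As a rough sketch of the awk route, the helper below walks both files line by line and prints every field among the first 199 that differs. The function name `field_diff`, its argument order, and the output format are all hypothetical choices made for this example; it also assumes the two files line up row for row, as described in the question.

```shell
# Hypothetical helper (not from the thread): list the fields that differ
# between two pipe-delimited files, comparing only the first `limit` fields.
# Usage: field_diff file1 file2 [limit]    (limit defaults to 199)
field_diff() {
    awk -F'|' -v limit="${3:-199}" -v file_b="$2" '
    {
        # Read the matching line from the second file.
        if ((getline line_b < file_b) <= 0) {
            printf "line %d: no matching line in second file\n", NR
            next
        }
        split(line_b, b, FS)
        for (i = 1; i <= limit; i++) {
            # Appending "" forces a string comparison, so 1.0 and 1.00
            # are reported as different rather than numerically equal.
            if (($i "") != (b[i] ""))
                printf "line %d field %d: [%s] vs [%s]\n", NR, i, $i, b[i]
        }
    }
    END {
        # Lines left over in the second file have no partner in the first.
        n = NR
        while ((getline line_b < file_b) > 0)
            printf "line %d: no matching line in first file\n", ++n
    }' "$1"
}
```

Calling `field_diff file1 file2` checks fields 1-199; passing a third argument such as `133` narrows the check to the first 133 fields. Because it reads one line from each file at a time, it does not need the temporary files used by the `cut` approach.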
