Compare strings in Python

I'm trying to compare two strings in Python and output the index and character(s) that differ (the strings are actually two paths on the server).

I have a workaround that prints the letters as list items, but that is still difficult to proofread. I tried using the zip function to compare the two lists but got an error (see the code below).
path1 = 'RichMP113-AIMIS_HourlyDiag.dat'
pathA = []
path2 = 'RichMP133-AIMIS_HourlyDiag.dat'
pathB = []

if path1 == path2:
    print 'paths are the same'
else:
    for element in path1:
        pathA.append(element)

    for item in path2:
        pathB.append(item)

# this is the painful part...

    print 'path A:', pathA[0:10]
    print 'path B:', pathB[0:10]
    print
    print 'path A:', pathA[10:20]
    print 'path B:', pathB[10:20]
    print
    print 'path A:', pathA[20:30]
    print 'path B:', pathB[20:30]

# I tried this:
    for a, b in zip(pathA, pathB):
        print 'compare A {0} to B {1}.'.format(a, b)

# but it gives the error 'string object has no attribute format'

sara_bellum Asked:

Superdave Commented:
Change the print to
        print 'compare A %s to B %s' % (a, b)

Superdave Commented:
I think that "format" is a Python 3 thing.
sara_bellum Author Commented:
Thanks a bunch!!

Is there a way to index the values of the 2 lists and selectively print out the index/values that don't match?
Superdave Commented:
Do you mean comparing corresponding elements?  It looks like you are on the right track; keep a counter for the index and compare the characters:

    i = 0
    for a, b in zip(pathA, pathB):
        if a <> b:
            print 'Difference at position %d: %s vs %s' % (i, a, b)
        i += 1

You could also use the strings as arguments for the zip:

    for a, b in zip(path1, path2):

That should give exactly the same results because strings are sequences too.  Then you don't need to bother making the lists.
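
As a quick check of that claim, here is a minimal sketch showing that zipping the two strings directly yields the same pairs as zipping the intermediate lists. It is written in Python 3 syntax, which is an assumption; the code in this thread is Python 2.

```python
path1 = 'RichMP113-AIMIS_HourlyDiag.dat'
path2 = 'RichMP133-AIMIS_HourlyDiag.dat'

# Build the character lists the way the original question did.
pathA = list(path1)
pathB = list(path2)

# zip() accepts any iterable, so strings work as well as lists.
pairs_from_lists = list(zip(pathA, pathB))
pairs_from_strings = list(zip(path1, path2))

print(pairs_from_lists == pairs_from_strings)
print(pairs_from_strings[7])   # the pair at the position where the paths differ
```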
pepr Commented:
The .format() method is also available in Python 2.6. I recommend installing that version unless you have a specific reason not to. Version 2.6 is a transition release between Python 2 and Python 3. It is a production release, and you should not see any problems with it. Version 2.7 is in beta now and will be the last Python 2 release.

Or you can use the older approach to string formatting: %s placeholders in the template, the % character after the template, and a tuple of values after that, as Superdave has correctly shown.
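
To make the two formatting styles concrete, here is a small side-by-side sketch. It uses Python 3 syntax (an assumption; the thread's code is Python 2), and the sample values are just the two characters that differ in the example paths. Both expressions should produce the same text.

```python
a, b = '1', '3'

# Older printf-style formatting: template % tuple_of_values
old_style = 'compare A %s to B %s.' % (a, b)

# str.format(), available from Python 2.6 onward
new_style = 'compare A {0} to B {1}.'.format(a, b)

print(old_style)
print(new_style)
```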

You should not use the <> operator, as it is deprecated (and it was removed in Python 3).  Get used to the != operator instead.

Try the snippet below:
path1 = 'RichMP113-AIMIS_HourlyDiag'
path2 = 'RichMP133-AIMIS_HourlyDiag.dat'

lst = []     # for remembering positions with difference
if path1 == path2:
    print 'paths are the same'
else:
    for n, (a, b) in enumerate(zip(path1, path2)):
        if a != b:
            print 'Pos %3d  %s -- %s' % (n, a, b)
            lst.append(n)
            
if lst:
    L = max(len(path1), len(path2))  # the length of the longer one
    x = [' '] * L                    # a list of that many spaces
    for pos in lst:                  # replace the space where the paths differ
        x[pos] = '^'    # you can use a "more visible" char like | or @
    s = ''.join(x)                   # join into a single string
    
    print repr(path1)
    print repr(path2)
    print ' ' + s       # marker line; the space because repr adds single quotes

pepr Commented:
There is no need to convert a string into a list of characters if you only need to access the characters.  Using zip() is appropriate here.  You can also wrap it in enumerate() to get the position.  However, notice how you have to unpack the items into variables.  The parentheses are necessary because enumerate produces tuples holding the index and another tuple with the two characters, like (0, ('R', 'R')).

One level of unpacking looks like this:

n, t =  (0, ('R', 'R'))    # then n == 0 and t contains ('R', 'R')

To be precise, you should follow the structure of the item:

(n, (a, b)) = (0, ('R', 'R'))

However, the outer parentheses on the left can be removed.

n, (a, b) = (0, ('R', 'R'))
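
The unpacking forms above can be tried out with a short runnable sketch (Python 3 syntax, which is an assumption; the shortened strings 'MP113' and 'MP133' are only illustrative):

```python
item = (0, ('R', 'R'))   # the shape produced by enumerate(zip(...))

n, t = item              # one level: t is still the inner tuple
index, (a, b) = item     # two levels: the inner tuple is unpacked too

# The same nested pattern works directly in a for statement.
for index, (a, b) in enumerate(zip('MP113', 'MP133')):
    print(index, a, b)
```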


pepr Commented:
Notice also that I intentionally made path1 shorter.  Keep in mind that zip() stops at the end of the shorter string and ignores the rest of the longer one, so you should also test whether the lengths are the same.
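
One way to add that length test is sketched below. The helper name unmatched_tail is hypothetical (it does not come from the thread), and the code uses Python 3 syntax as an assumption.

```python
def unmatched_tail(s1, s2):
    """Return (start_index, leftover_text) if the lengths differ, else None."""
    if len(s1) == len(s2):
        return None
    common = min(len(s1), len(s2))
    longer = s1 if len(s1) > len(s2) else s2
    return common, longer[common:]

path1 = 'RichMP113-AIMIS_HourlyDiag'
path2 = 'RichMP133-AIMIS_HourlyDiag.dat'

print(len(list(zip(path1, path2))))   # zip() stops at the shorter string
print(unmatched_tail(path1, path2))   # what zip() never looked at
```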

With the above snippet you get output like this:

C:\tmp\___python\sara_bellum\Q_25880253>a.py
Pos   7  1 -- 3
'RichMP113-AIMIS_HourlyDiag'
'RichMP133-AIMIS_HourlyDiag.dat'
        ^

(The pointer may appear shifted, because EE uses a proportional font in which the space is narrower.)

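Another question is whether you would rather compare the parts of the paths instead of the full path strings.  Note that os.path.split(your_path_here) returns only a (head, tail) pair, so to get a list of all the parts you have to apply it repeatedly or split the string on the path separator.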
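
A possible sketch of that part-by-part comparison follows. The helper names split_all and differing_parts and the directory names in the usage lines are hypothetical. The code uses posixpath explicitly, assuming forward-slash server paths, and Python 3 syntax.

```python
import posixpath

def split_all(path):
    """Split a POSIX-style path into all of its components."""
    parts = []
    while True:
        head, tail = posixpath.split(path)
        if tail:
            parts.append(tail)
        if head == path:          # reached the root ('/') or an empty string
            if head:
                parts.append(head)
            break
        if not head:              # relative path fully consumed
            break
        path = head
    parts.reverse()
    return parts

def differing_parts(p1, p2):
    """Return (index, part1, part2) for each pair of components that differ."""
    pairs = zip(split_all(p1), split_all(p2))
    return [(i, x, y) for i, (x, y) in enumerate(pairs) if x != y]

# Hypothetical directory layout, for illustration only.
print(differing_parts('/srv/data/RichMP113/AIMIS_HourlyDiag.dat',
                      '/srv/data/RichMP133/AIMIS_HourlyDiag.dat'))
```

Like the character-level loop, this stops at the shorter component list, so a length check is still worth adding.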
Superdave Commented:
You can also do:

import difflib
for x in difflib.ndiff('RichMP113-AIMIS_HourlyDiag.dat',
                       'RichMP133-AIMIS_HourlyDiag.dat'):
    print x

which gives:
  R
  i
  c
  h
  M
  P
  1
- 1
+ 3
  3
  -
  A
  I
  M
  I
  S
  _
  H
  o
  u
  r
  l
  y
  D
  i
  a
  g
  .
  d
  a
  t
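
If only the differing positions are wanted rather than the full character listing, difflib.SequenceMatcher can report them with indices. This is a sketch under the assumption of Python 3 syntax; the function name changed_spans is made up for illustration.

```python
import difflib

def changed_spans(s1, s2):
    """Return (tag, start_in_s1, text_from_s1, text_from_s2) for each non-equal span."""
    matcher = difflib.SequenceMatcher(None, s1, s2)
    spans = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != 'equal':        # keep 'replace', 'delete' and 'insert' spans
            spans.append((tag, i1, s1[i1:i2], s2[j1:j2]))
    return spans

print(changed_spans('RichMP113-AIMIS_HourlyDiag.dat',
                    'RichMP133-AIMIS_HourlyDiag.dat'))
```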
sara_bellum Author Commented:
This is brilliant, you people made my day yesterday!! Today I finally have a chance to work on it, sorry I'm slow :(  My only question is regarding pepr's script, this line:
> print 'Pos %3d  %s -- %s' % (n, a, b)
I don't see 3d defined anywhere in the script, so I figured it must be part of the enumerate function. However the python docs page I normally refer to shows no reference to 3d (http://docs.python.org/tutorial/datastructures.html)
Superdave Commented:
It's part of the format string: %3d means a decimal integer printed in a field at least 3 characters wide (the value comes from n).
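
A tiny sketch of what the width does (Python 3 syntax, an assumption). The brackets are only there to make the padding visible. The width is a minimum, so longer numbers are not truncated.

```python
# Compare a 3-wide decimal field with a plain %d for a few values.
rows = ['[%3d] [%d]' % (n, n) for n in (7, 42, 1234)]

for row in rows:
    print(row)
```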
sara_bellum Author Commented:
ok I was just getting ready to post something like d = digit, wasn't sure what the 3 was so thanks for that...the output is the same whether I use 3 or not but no matter - I'm able to include the full path which was my original goal :)
sara_bellum Author Commented:
Terrific response, thanks very much!!
pepr Commented:
If you already use Python 2.6, you can easily handle paths of different lengths with the new itertools.izip_longest() function.  It pads the shorter sequence with None or with a given fill value; see http://docs.python.org/library/itertools.html?highlight=izip_longest#itertools.izip_longest

After a minor update to the above code, you get output like this:

pepr Commented:

c:\tmp\___python\sara_bellum\Q_25880253>c.py
Pos   7  1 -- 3
Pos  26  None -- .
Pos  27  None -- d
Pos  28  None -- a
Pos  29  None -- t
'RichMP113-AIMIS_HourlyDiag'
'RichMP133-AIMIS_HourlyDiag.dat'
        ^                  ^^^^

pepr Commented:
The updated code looks like this...

Notice the first line with import itertools, and the use of itertools.izip_longest() in the for loop.  Otherwise, the code is unchanged.
import itertools

path1 = 'RichMP113-AIMIS_HourlyDiag'
path2 = 'RichMP133-AIMIS_HourlyDiag.dat'

lst = []     # for remembering positions with difference
if path1 == path2:
    print 'paths are the same'
else:
    for n, (a, b) in enumerate(itertools.izip_longest(path1, path2)):
        if a != b:
            print 'Pos %3d  %s -- %s' % (n, a, b)
            lst.append(n)
            
if lst:
    L = max(len(path1), len(path2))  # the length of the longer one
    x = [' '] * L                    # a list of that many spaces
    for pos in lst:                  # replace the space where the paths differ
        x[pos] = '^'    # you can use a "more visible" char like | or @
    s = ''.join(x)                   # join into a single string
    
    print repr(path1)
    print repr(path2)
    print ' ' + s       # marker line; the space because repr adds single quotes
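
For readers on Python 3, where izip_longest was renamed to itertools.zip_longest, the marker line can be built in one pass. This is only a sketch of the same idea, not a port of the full script; the function name marker_line and its mark parameter are invented for the example.

```python
from itertools import zip_longest

def marker_line(s1, s2, mark='^'):
    """Return a string with `mark` under every position where s1 and s2 differ."""
    # zip_longest pads the shorter string with None, which never equals a character.
    return ''.join(mark if a != b else ' ' for a, b in zip_longest(s1, s2))

path1 = 'RichMP113-AIMIS_HourlyDiag'
path2 = 'RichMP133-AIMIS_HourlyDiag.dat'

print(repr(path1))
print(repr(path2))
print(' ' + marker_line(path1, path2))   # leading space offsets the quote added by repr
```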

sara_bellum Author Commented:
Wow, terrific feedback, thanks.  I don't have this option at work - our local server has Python 2.4 and 2.5 in /usr/lib, but we recently moved to offsite hosting to save money and that host only has Python 2.4. My work PC is shared and designed to replicate what we have on the server, so no help there. At home I have more flexibility, but that's on my own time. Will keep this on file for reference of course :-) Thanks much!
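
For an interpreter without itertools.izip_longest() (such as the Python 2.4 host mentioned above), one possible fallback is to walk the index range of the longer string and substitute a fill value by hand. The function name diff_padded and its fill parameter are made up for this illustration. The body avoids print and conditional expressions so that it should be valid on both old Python 2 releases and Python 3, though it has not been tested on 2.4.

```python
def diff_padded(s1, s2, fill=None):
    """Return (position, char1, char2) for every position where the strings differ.

    Positions past the end of the shorter string are reported with `fill`.
    """
    result = []
    for n in range(max(len(s1), len(s2))):
        if n < len(s1):
            a = s1[n]
        else:
            a = fill
        if n < len(s2):
            b = s2[n]
        else:
            b = fill
        if a != b:
            result.append((n, a, b))
    return result

differences = diff_padded('RichMP113-AIMIS_HourlyDiag',
                          'RichMP133-AIMIS_HourlyDiag.dat')
```

The resulting list can then be printed with the same 'Pos %3d  %s -- %s' template used earlier in the thread.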