• Status: Solved

Compare strings in python

I'm trying to compare two strings in Python and output the index and character(s) that differ (the strings are actually two paths on the server).

I have a workaround that prints the letters as list items, which is still difficult to proofread. I tried using the zip function to compare two lists but got an error (see code below):
path1 = 'RichMP113-AIMIS_HourlyDiag.dat'
pathA = []
path2 = 'RichMP133-AIMIS_HourlyDiag.dat'
pathB = []

if path1 == path2:
    print 'paths are the same'
else:
    for element in path1:
        pathA.append(element)

    for item in path2:
        pathB.append(item)

# this is the painful part...

    print 'path A:', pathA[0:10]
    print 'path B:', pathB[0:10]
    print
    print 'path A:', pathA[10:20]
    print 'path B:', pathB[10:20]
    print
    print 'path A:', pathA[20:30]
    print 'path B:', pathB[20:30]

# I tried this:
    for a, b in zip(pathA, pathB):
        print 'compare A {0} to B {1}.'.format(a, b)

# but it gives the error 'string object has no attribute format'

Asked by: sara_bellum

Superdave Commented:
Change the print to
        print 'compare A %s to B %s' % (a, b)
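
As a minimal sketch of the suggestion above: the % operator and str.format() produce the same text. The values of a and b are made-up samples standing in for one pair from zip(pathA, pathB). The parenthesized print is used only so the snippet runs on both Python 2 and Python 3, and the .format() line assumes Python 2.6 or later.

```python
# Sample characters standing in for one pair produced by zip(pathA, pathB).
a, b = '1', '3'

# Old-style % formatting: available on every Python 2.x and 3.x release.
old_style = 'compare A %s to B %s.' % (a, b)

# str.format() needs Python 2.6 or later; older interpreters raise
# AttributeError here, which is the error reported in the question.
new_style = 'compare A {0} to B {1}.'.format(a, b)

print(old_style)
print(new_style)
```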

Superdave Commented:
I think that "format" is a Python 3 thing.

sara_bellum (Author) Commented:
Thanks a bunch!!

Is there a way to index the values of the 2 lists and selectively print out the index/values that don't match?

Superdave Commented:
Do you mean comparing corresponding elements?  It looks like you are on the right track; keep a counter for the index and compare the characters:

    i=0
    for a, b in zip(pathA, pathB):
        if a<>b:
            print 'Difference at position %d: %s vs %s' % (i, a, b)
        i += 1

You could also use the strings as arguments for the zip:

    for a, b in zip(path1, path2):

That should give exactly the same results because strings are sequences too.  Then you don't need to bother making the lists.
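
The same idea can be wrapped in a small helper that returns the mismatches instead of printing them. This is only a sketch: find_differences is a hypothetical name, not something from the thread, and it uses the built-in enumerate() in place of the manual counter. Like zip(), it stops at the end of the shorter string.

```python
def find_differences(s1, s2):
    """Return (index, char1, char2) for each position where the strings differ.

    Hypothetical helper; comparison stops at the end of the shorter string,
    because zip() does.
    """
    return [(i, a, b) for i, (a, b) in enumerate(zip(s1, s2)) if a != b]


diffs = find_differences('RichMP113-AIMIS_HourlyDiag.dat',
                         'RichMP133-AIMIS_HourlyDiag.dat')
for i, a, b in diffs:
    print('Difference at position %d: %s vs %s' % (i, a, b))
```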

pepr Commented:
The .format() method is also available in Python 2.6. I recommend installing it unless you have a special reason not to. Version 2.6 is a transition version between Python 2 and Python 3. It is a production version, and you should not observe any problems with it. Version 2.7 is in beta now, and it will be the last Python 2 release.

Or you can use the older approach to string formatting: %s placeholders in the template, the % operator after the template, and a tuple of values after that, as Superdave has correctly shown.

You should not use the <> operator, as it is deprecated. Get used to the != operator instead.

Try the snippet below:
path1 = 'RichMP113-AIMIS_HourlyDiag'
path2 = 'RichMP133-AIMIS_HourlyDiag.dat'

lst = []     # for remembering positions with difference
if path1 == path2:
    print 'paths are the same'
else:
    for n, (a, b) in enumerate(zip(path1, path2)):
        if a != b:
            print 'Pos %3d  %s -- %s' % (n, a, b)
            lst.append(n)
            
if lst:
    L = max(len(path1), len(path2))  # the length of the longer one
    x = [' '] * L                    # a list of that many spaces
    for pos in lst:                  # replace the space if different here
        x[pos] = '^'    # you can use "more visible" char like | or @
    s = ''.join(x)                   # join to a single string
    
    print repr(path1)
    print repr(path2)
    print ' ' + s       # marker line; the space because repr adds single quotes

pepr Commented:
There is no need to convert a string into a list of characters if you only need to access the characters. Using zip() is appropriate here. You can also wrap it in enumerate() to get the position. However, notice how you have to unpack the items into variables. The parentheses are necessary because enumerate() produces tuples containing the index and another tuple with the two chars, like (0, ('R', 'R')).

One level of unpacking looks like this:

n, t =  (0, ('R', 'R'))    # then n == 0 and t contains ('R', 'R')

To be precise, you should follow the structure of the item:

(n, (a, b)) = (0, ('R', 'R'))

However, the outer parentheses on the left can be removed.

n, (a, b) = (0, ('R', 'R'))
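
To see the nested structure concretely, here is a small sketch using two made-up two-character strings. The list() call is there only to make the items visible, and the parenthesized print keeps the snippet runnable on both Python 2 and Python 3.

```python
# Each item is (index, (char_from_first, char_from_second)).
pairs = list(enumerate(zip('Ri', 'Ra')))
print(pairs)

# The loop target mirrors that structure, so the inner tuple is unpacked too.
for n, (a, b) in pairs:
    print('%d: %s %s' % (n, a, b))
```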


pepr Commented:
Notice also that I intentionally made path1 shorter. Keep in mind that zip() ignores the rest of the longer string, so you should also test whether the lengths are the same.

You get output like this from the above snippet:

C:\tmp\___python\sara_bellum\Q_25880253>a.py
Pos   7  1 -- 3
'RichMP113-AIMIS_HourlyDiag'
'RichMP133-AIMIS_HourlyDiag.dat'
        ^

(The pointer may look shifted, because EE uses a proportional font in which the space is narrower.)

Another question is whether you would rather compare the parts of the paths instead of the full path strings. os.path.split(your_path_here) splits a path into a (head, tail) pair, and you can apply it repeatedly to get all the parts.
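
A rough sketch of the component-wise comparison follows. The directory names are invented for illustration, since the real server paths are not given in the thread. posixpath is the '/' flavor of os.path, used here so the result does not depend on the platform, and splitting on '/' is an assumption about the path style.

```python
import posixpath

# Hypothetical server paths, for illustration only.
path1 = '/data/logs/RichMP113-AIMIS_HourlyDiag.dat'
path2 = '/data/logs/RichMP133-AIMIS_HourlyDiag.dat'

# split() returns a (head, tail) pair: the directory and the last component.
head1, tail1 = posixpath.split(path1)
head2, tail2 = posixpath.split(path2)

# To get every component at once, split on the separator instead.
parts1 = path1.strip('/').split('/')
parts2 = path2.strip('/').split('/')

# Report only the components that differ, with their position in the path.
differing = [(i, a, b)
             for i, (a, b) in enumerate(zip(parts1, parts2))
             if a != b]
same_depth = len(parts1) == len(parts2)

print(differing)
```

Comparing components first narrows the character-level comparison to the one part that actually differs.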

Superdave Commented:
You can also do:

import difflib
for x in difflib.ndiff('RichMP113-AIMIS_HourlyDiag.dat',
                       'RichMP133-AIMIS_HourlyDiag.dat'):
    print x

which gives:
  R
  i
  c
  h
  M
  P
  1
- 1
+ 3
  3
  -
  A
  I
  M
  I
  S
  _
  H
  o
  u
  r
  l
  y
  D
  i
  a
  g
  .
  d
  a
  t
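
If only the changed characters are of interest, the ndiff() output can be filtered by its two-character prefix, and difflib.SequenceMatcher can report the same edit as index ranges. This is a sketch rather than the approach used above; the variable names are illustrative.

```python
import difflib

s1 = 'RichMP113-AIMIS_HourlyDiag.dat'
s2 = 'RichMP133-AIMIS_HourlyDiag.dat'

# ndiff() prefixes each item with '  ' (same), '- ' (only in s1),
# '+ ' (only in s2) or '? ' (hint line); keep only the changes.
changes = [d for d in difflib.ndiff(s1, s2) if d[:2] in ('- ', '+ ')]
print(changes)

# get_opcodes() yields (tag, i1, i2, j1, j2): s1[i1:i2] maps to s2[j1:j2].
matcher = difflib.SequenceMatcher(None, s1, s2)
opcodes = [op for op in matcher.get_opcodes() if op[0] != 'equal']
print(opcodes)
```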

sara_bellum (Author) Commented:
This is brilliant, you people made my day yesterday!! Today I finally have a chance to work on it, sorry I'm slow :(  My only question is regarding pepr's script, this line:
> print 'Pos %3d  %s -- %s' % (n, a, b)
I don't see 3d defined anywhere in the script, so I figured it must be part of the enumerate function. However the python docs page I normally refer to shows no reference to 3d (http://docs.python.org/tutorial/datastructures.html)

Superdave Commented:
It's part of the format string: %3d means a decimal integer in a field at least 3 characters wide (the value comes from n).
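
A quick sketch of what the width does, using made-up values. repr() is used so the padding spaces are visible in the output.

```python
n = 7

padded = 'Pos %3d' % n       # right-aligned in a field 3 characters wide
plain = 'Pos %d' % n         # no minimum width
wide = 'Pos %3d' % 1234      # the width is a minimum, so nothing is cut off

print(repr(padded))
print(repr(plain))
print(repr(wide))
```

The width only matters when index values have different numbers of digits, which is when it keeps the columns aligned.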

sara_bellum (Author) Commented:
ok I was just getting ready to post something like d = digit, wasn't sure what the 3 was so thanks for that...the output is the same whether I use 3 or not but no matter - I'm able to include the full path which was my original goal :)

sara_bellum (Author) Commented:
Terrific response, thanks very much!!

pepr Commented:
If you already use Python 2.6, you can easily handle paths of different lengths with the new itertools.izip_longest() function. It pads the shorter sequence with None or with a given fill value; see http://docs.python.org/library/itertools.html?highlight=izip_longest#itertools.izip_longest

After a minor update of the above code, you get output like this:

pepr Commented:

c:\tmp\___python\sara_bellum\Q_25880253>c.py
Pos   7  1 -- 3
Pos  26  None -- .
Pos  27  None -- d
Pos  28  None -- a
Pos  29  None -- t
'RichMP113-AIMIS_HourlyDiag'
'RichMP133-AIMIS_HourlyDiag.dat'
        ^                  ^^^^

pepr Commented:
The updated code looks like this...

Notice the import itertools on the first line and the use of itertools.izip_longest() in the for loop. Otherwise, the code is unchanged.
import itertools

path1 = 'RichMP113-AIMIS_HourlyDiag'
path2 = 'RichMP133-AIMIS_HourlyDiag.dat'

lst = []     # for remembering positions with difference
if path1 == path2:
    print 'paths are the same'
else:
    for n, (a, b) in enumerate(itertools.izip_longest(path1, path2)):
        if a != b:
            print 'Pos %3d  %s -- %s' % (n, a, b)
            lst.append(n)
            
if lst:
    L = max(len(path1), len(path2))  # the length of the longer one
    x = [' '] * L                    # a list of that many spaces
    for pos in lst:                  # replace the space if different here
        x[pos] = '^'    # you can use "more visible" char like | or @
    s = ''.join(x)                   # join to a single string
    
    print repr(path1)
    print repr(path2)
    print ' ' + s       # marker line; the space because repr adds single quotes
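
For interpreters older than 2.6, where itertools.izip_longest() is not available, a similar result can be sketched with plain slicing. diff_positions is a hypothetical helper name. It relies on the fact that a slice past the end of a string yields '' rather than raising IndexError, so missing characters show up as empty strings instead of None.

```python
def diff_positions(s1, s2):
    """Return (index, char1, char2) for every differing position.

    Hypothetical helper: positions past the end of the shorter string
    are reported with '' as the missing character.
    """
    longest = max(len(s1), len(s2))
    result = []
    for i in range(longest):
        a = s1[i:i + 1]   # '' when i is past the end of s1
        b = s2[i:i + 1]   # '' when i is past the end of s2
        if a != b:
            result.append((i, a, b))
    return result


diffs = diff_positions('RichMP113-AIMIS_HourlyDiag',
                       'RichMP133-AIMIS_HourlyDiag.dat')
for n, a, b in diffs:
    print('Pos %3d  %r -- %r' % (n, a, b))
```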

sara_bellum (Author) Commented:
Wow, terrific feedback, thanks.  I don't have this option at work - our local server has Python 2.4 and 2.5 in /usr/lib, but we recently moved to offsite hosting to save money and that host only has Python 2.4. My work PC is shared and designed to replicate what we have on the server, so no help there. At home I have more flexibility, but that's on my own time. Will keep this on file for reference of course :-) Thanks much!
Question has a verified solution.
