Count repeated pairs of consecutive words in a text file

How do I count repeated pairs of consecutive words in a text file?

The input.txt file contains:

backend error oracle error insufficient space
oracle error
insufficient space insufficient space
complete order etc

The output should be:
backend error count 1
oracle error count 2
insufficient space count 3
complete order count 1
etc count 1
Thirupathi Lagishetti Asked:

gelonida Commented:
Is this example representative enough?

Are you looking for predefined word pairs that you know before parsing the text,
or do you always group the first/second, third/fourth, and fifth/sixth words of a line as pairs?

What should happen with punctuation characters? Can they occur, and should they be stripped off?

This all might have an impact on the best implementation for a robust solution in your context.
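
To make the second reading concrete, here is a minimal sketch (not from the thread) that groups the words of each line into non-overlapping pairs and counts a leftover odd word on its own. The function name count_line_pairs is made up for illustration, and punctuation is left untouched. On the sample input it should give the counts listed in the question.

```python
import collections

def count_line_pairs(lines):
    """Count non-overlapping word pairs per line; a leftover odd word is counted alone."""
    counts = collections.Counter()
    for line in lines:
        words = line.split()
        # Step through the line two words at a time: (1st, 2nd), (3rd, 4th), ...
        for i in range(0, len(words), 2):
            counts[' '.join(words[i:i + 2])] += 1
    return counts

sample = """backend error oracle error insufficient space
oracle error
insufficient space insufficient space
complete order etc"""

for phrase, n in count_line_pairs(sample.splitlines()).items():
    print(phrase, 'count', n)
```

Pairs never cross a line break in this variant, which is one of the choices the questions above are asking about.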
aikimark Commented:
What about the overlapping pairs, such as
error oracle
error insufficient
and so on?
aikimark Commented:
import re
import collections

text = """backend error oracle error insufficient space
oracle error
insufficient space insufficient space
complete order"""

print(collections.Counter(re.findall(r'\b(\w+\s+\w+)\b', text)))

produces the following output:
Counter({'insufficient space': 3, 'oracle error': 2, 'complete order': 1, 'backend error': 1})

If this is sufficient, all you need to do is replace the string literal with a file read.
Hint: use a with ... as statement.
aikimark Commented:
...like this
with open('c:\users\mark\downloads\Q_29099534.txt') as f:
    text = f.read()
    print(collections.Counter(re.findall(r'\b(\w+\s+\w+)\b', text)))

Thirupathi Lagishetti (Author) Commented:
Hi @aikimark,
Thank you for your reply. The pairs below also need to be counted. Could you please update the script or give me a hint on how to achieve this?

error oracle
error insufficient
gelonida Commented:
Just a small comment:

Instead of
with open('c:\users\mark\downloads\Q_29099534.txt')

it's better to write one of these:
with open('c:\\users\\mark\\downloads\\Q_29099534.txt')

or
with open(r'c:\users\mark\downloads\Q_29099534.txt')

or
with open('c:/users/mark/downloads/Q_29099534.txt')

In your specific case there's no issue under Python 2 (Python 3 would reject \u in a normal string literal), but if you had

with open('c:\users\tom\new_downloads\Q_29099534.txt')

then \t and \n would have caused problems, because they would have been interpreted as a tab character and a newline character.
So out of habit it's best to escape all backslashes.
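
A small sketch of the difference, using a made-up path fragment (c:\tom\new is hypothetical and chosen only because it contains \t and \n):

```python
# A normal string literal: \t and \n are turned into a tab and a newline.
plain = 'c:\tom\new'
# A raw string literal keeps every backslash exactly as written.
raw = r'c:\tom\new'
# Doubling the backslashes gives the same string as the raw literal.
escaped = 'c:\\tom\\new'

print(repr(plain))           # shows the tab and newline escapes
print(len(plain), len(raw))  # the plain literal is shorter
print(raw == escaped)
```

The plain literal ends up two characters shorter than the raw one, because each backslash-letter pair collapsed into a single control character.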
aikimark Commented:
import re
import collections

with open(r'c:\users\mark\downloads\Q_29099534.txt') as f:
    text = f.read()
    # Append the text minus its first word so the regex also finds the shifted pairs
    text += ' ' + ' '.join(text.split(' ')[1:])
    print(collections.Counter(re.findall(r'\b(\w+\s+\w+)\b', text)))

produces
Counter({'insufficient space': 3, 'oracle error': 2, 'error insufficient': 1, 'complete order': 1, 'space\ncomplete': 1, 'space\noracle': 1, 'space insufficient': 1, 'backend error': 1, 'error\ninsufficient': 1, 'error oracle': 1})

aikimark Commented:
Since the text contains multiple lines, this tweak replaces the line breaks with spaces, so that every pair is purely space-separated:
import re
import collections

with open(r'c:\users\mark\downloads\Q_29099534.txt') as f:
    text = f.read()
    text += ' ' + ' '.join(text.split(' ')[1:])
    text = re.sub(r'\r\n', ' ', text)
    text = re.sub(r'\n', ' ', text)
    print(collections.Counter(re.findall(r'\b(\w+\s+\w+)\b', text)))

aikimark Commented:
Maybe using the split() method with a maxsplit of 1 is more Pythonic:
import re
import collections

with open(r'c:\users\mark\downloads\Q_29099534.txt') as f:
    text = f.read()
    text += ' ' + text.split(' ',1)[1]
    text = re.sub(r'\r\n', ' ', text)
    text = re.sub(r'\n', ' ', text)
    print(collections.Counter(re.findall(r'\b(\w+\s+\w+)\b', text)))

Thirupathi Lagishetti (Author) Commented:
Thank you so much for your help, @aikimark. It really solved my problem. Cheers!
aikimark Commented:
Someone pointed out that I should use something other than a space character to join these two versions of the text string. With '%%' as the separator, the regex cannot match a pair across the join:
import re
import collections

with open(r'c:\users\mark\downloads\Q_29099534.txt') as f:
    text = f.read()
    text += '%%' + text.split(' ',1)[1]
    text = re.sub(r'\r\n', ' ', text)
    text = re.sub(r'\n', ' ', text)
    print(collections.Counter(re.findall(r'\b(\w+\s+\w+)\b', text)))
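
As an alternative to doubling the text, the same overlapping pairs can be produced by pairing each word with its successor. This is a different technique from the regex approach above, sketched under the assumption that words are separated only by whitespace; count_adjacent_pairs is a hypothetical name, and the sample string stands in for the file contents.

```python
import collections

def count_adjacent_pairs(text):
    """Count every pair of adjacent words, including pairs that span a line break."""
    words = text.split()  # split() with no argument splits on any run of whitespace
    # zip(words, words[1:]) pairs each word with the word that follows it.
    return collections.Counter(' '.join(pair) for pair in zip(words, words[1:]))

sample = """backend error oracle error insufficient space
oracle error
insufficient space insufficient space
complete order"""

for phrase, n in count_adjacent_pairs(sample).most_common():
    print(phrase, n)
```

For the sample text this yields 13 pairs in total, with 'insufficient space' counted 3 times. Because split() already treats line breaks as whitespace, no separate newline replacement is needed.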
