Count degrees in the file - Python

Iryna253
I have a file with a list of jobs, one job per line. Each job is a comma-separated string, and the field at index 5 is the degree. I need to count how many times each type of degree appears across the jobs in the file. There are 5 different degrees in the file.
I wrote the code below, but it is not keeping a total across jobs, even though the same degrees repeat in several jobs. Can you help?

with open("100 Jobs - MedDeviceManuf.txt", 'r') as f:
    for job in f:
        degree = job.rstrip().split(',')[5]    
        types_degree = {}
        if degree in types_degree:
            types_degree[degree] += 1
        else:
            types_degree[degree] = 1
       
        print str(types_degree)

Can you show a few lines of the text file?

Author

Commented:
IT Business Analyst,Siemens,Hutchinson KS USA,NA,Full time,bachelors degree,2,SAP,Word,Excel,PowerPoint,Outlook,excellent oral & written communication skills,leadership,team player
Senior IT Business Analyst,Siemens,Tarrytown NY USA,NA,Full time,bachelors degree,5,excellent oral & written communication skills,presentation skills,Business process mapping,Business Requirements Analysis
Business Analyst,Fresenius Medical,Austin TX USA,NA,Full time,bachelors degree,3,analytical skills,organizational skills,excellent oral & written communication skills,SQL,access,business Intelligence software,goal oriented,independent
Most Valuable Expert 2012

Commented:
Put types_degree = {} above your "with" line.
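The reason this helps: with types_degree = {} inside the for loop, the dictionary is recreated for every line, so every count starts over at 1. Below is a minimal sketch of the corrected loop, assuming the degree sits at index 5 as in the question. The count_degrees name, the field_index parameter, and the third sample line are made up for illustration, and collections.Counter is swapped in for the manual if/else.

```python
from collections import Counter

def count_degrees(lines, field_index=5):
    """Count how often each degree appears across all job lines."""
    counts = Counter()  # created once, before the loop starts
    for job in lines:
        fields = job.rstrip().split(',')
        if len(fields) > field_index:  # skip blank or short lines
            counts[fields[field_index]] += 1
    return counts

# Lines shaped like the sample data in this thread (shortened).
# The third line is hypothetical, added only to show a second degree.
sample = [
    "IT Business Analyst,Siemens,Hutchinson KS USA,NA,Full time,bachelors degree,2,SAP,Word\n",
    "Senior IT Business Analyst,Siemens,Tarrytown NY USA,NA,Full time,bachelors degree,5,presentation skills\n",
    "Research Scientist,Example Corp,Austin TX USA,NA,Full time,Ph.D.,3,python\n",
]
degree_counts = count_degrees(sample)
print(dict(degree_counts))
```

With the real file, the open file object can be passed in directly, for example count_degrees(f) inside the "with open(...) as f:" block.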

Author

Commented:
Thank you, I did that and it counts correctly now. Next, I want to show these numbers on a pie chart. I did it the manual way; is there a way to feed the output numbers into the pie chart so that I am not typing them into the code?

Output was {'Ph.D.': 10, 'bachelors degree': 73, 'masters degree': 11, 'associates degree': 1, 'NA': 5}

import matplotlib.pyplot as plt

types_degree = {}
with open("100 Jobs - MedDeviceManuf.txt", 'r') as f:
    for job in f:
        degree = job.rstrip().split(',')[5]
        if degree in types_degree:
            types_degree[degree] += 1
        else:
            types_degree[degree] = 1
print            
print str(types_degree)
       

labels = 'Ph.D', 'Bachelors Degree', 'Masters Degree', 'Associates Degree', 'N/A'
sizes = [10, 73, 11, 1, 5]
colors = ['yellowgreen', 'gold', 'lightskyblue', 'lightcoral', 'red']
explode = (0, 0.1, 0, 0, 0)

plt.pie(sizes, explode=explode, labels=labels, colors=colors,
        autopct='%1.1f%%', shadow=True, startangle=10)
plt.axis('equal')
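One possible way to avoid retyping the numbers, sketched under the assumption that types_degree has already been filled by the counting loop: build the labels and sizes lists from the dictionary itself. The helper names pie_inputs and show_degree_pie are hypothetical, and the colors and explode arguments are left out because their length would have to match the number of degrees found.

```python
def pie_inputs(types_degree):
    """Return parallel label and size lists, largest slice first."""
    items = sorted(types_degree.items(), key=lambda kv: kv[1], reverse=True)
    labels = [name for name, _ in items]
    sizes = [count for _, count in items]
    return labels, sizes

def show_degree_pie(types_degree):
    """Draw the pie chart straight from the counted dictionary."""
    import matplotlib.pyplot as plt  # imported here so pie_inputs works without it
    labels, sizes = pie_inputs(types_degree)
    plt.pie(sizes, labels=labels, autopct='%1.1f%%', shadow=True, startangle=10)
    plt.axis('equal')
    plt.show()

# Counts copied from the output posted above.
types_degree = {'Ph.D.': 10, 'bachelors degree': 73, 'masters degree': 11,
                'associates degree': 1, 'NA': 5}
labels, sizes = pie_inputs(types_degree)
```

Calling show_degree_pie(types_degree) after the counting loop would then draw the chart with whatever counts the file produced.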
Most Valuable Expert 2012

Commented:
You should put that in another question to get more eyeballs on it. I generally do server stuff with python, never touched pie charts.

:-)

Author

Commented:
OK, I will do that. Could you please help with one more question about this .txt file? I am trying to count the skill categories that I identified across the jobs, using a dictionary called skillsets. I would like to count how many of the jobs in the .txt file mention programming skills and how many mention database skills.

number = {}
with open("100 Jobs - MedDeviceManuf.txt", 'r') as f:
    for job in f:
        skillsets = { 'programming' : ['scripting language', 'r', 'python', 'C'] , 'database' : ['SQL', 'relational database']}
       
        for category in skillsets:
            category = skillsets.keys()
            if category in job.rstrip().split(',')[7:]:
                number[category] += 1
            else:
                number[category] = 1
print            
print str(number)
Most Valuable Expert 2012

Commented:
I don't understand your last question, can you please restate it or give me sample output of what you're getting now and tell me what's wrong with that?

Author

Commented:
Now I am getting an error:
  number[category] = 1

TypeError: unhashable type: 'list'

My output should be something like this:

programming : 3 out of 100 jobs
database: 5 out of 100 jobs

The code should do the following: go to the file, take each job (each line), look at the fields from [7] onward in the line, identify words (like 'scripting language' or 'C') in those fields, match them to their keys/categories (programming, database), and finally give me the number of times each key/category was found across all jobs.
Most Valuable Expert 2012

Commented:
First, if the data you presented above is a real sample, then "field 7" (the eighth spot) may or may not even be correct.

Assuming this is a CSV, you have comma separated values in that field.

Regardless, I just wouldn't try it this way. I would likely do this in two passes:

1. First pass: extract / index all the keywords that are in the file (Word, SAP, C, etc...)
2. Second pass: loop through each one to build the counts

This is really a job for a database. But, if you insist on doing it this way, try it with the two passes I described above. The first pass is required because we don't know what all the unique terms are (or how they are categorized, really). You'll have to get all the uniques and then manually categorize them into "database" or "Office work" or "programming".

Also, to be lazy, I would categorize each of these terms in separate files on the disk so that when the script loads, I can just load the dictionary from that file.

Then, the second pass will simply read each line in the CSV, compare that field to the pre-defined dictionaries that you have already created by analyzing the uniques, and increment an integer counter.
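The first pass could look something like the sketch below. It assumes the skills start at index 7 of each comma-separated line, which matches the sample lines posted earlier but may not hold for every line in the real file. The unique_skills name and the first_skill_index parameter are hypothetical.

```python
def unique_skills(lines, first_skill_index=7):
    """First pass: collect every distinct skill term found in the job lines."""
    skills = set()
    for job in lines:
        fields = job.rstrip().split(',')
        for term in fields[first_skill_index:]:
            term = term.strip()
            if term:  # ignore empty fields
                skills.add(term)
    return skills

# Lines shaped like the sample data in this thread (shortened).
sample = [
    "IT Business Analyst,Siemens,Hutchinson KS USA,NA,Full time,bachelors degree,2,SAP,Word,Excel\n",
    "Business Analyst,Fresenius Medical,Austin TX USA,NA,Full time,bachelors degree,3,analytical skills,SQL,access\n",
]
skills = unique_skills(sample)
print(sorted(skills))
```

The printed list is what would then be sorted by hand into categories such as "database" or "programming".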

Author

Commented:
I have actually already sorted all the unique skills into the categories that I called "programming" and "database", and I added them to the dictionary called "skillsets". Now I need to match the skills in each job to those categories, which is where I am stuck. Can you recommend a link where I can read about it, please?
Most Valuable Expert 2012
Commented:
No link required.

Don't use a single dictionary with keys. Use multiple dictionaries (one for each) and then just use integer counters.
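For reference, the earlier TypeError came from the line category = skillsets.keys(), which replaces the loop variable with a list of all the keys, and a list cannot be used as a dictionary key. A rough sketch of the separate-collections idea follows, using the skills listed earlier in this thread. It uses sets rather than dictionaries, since only membership is checked, and it assumes skills start at index 7. The count_categories name and the third sample line are hypothetical.

```python
# One collection per category, with plain integer counters below.
programming = {'scripting language', 'r', 'python', 'C'}
database = {'SQL', 'relational database'}

def count_categories(lines, first_skill_index=7):
    """Count the jobs that mention at least one skill from each category."""
    programming_jobs = 0
    database_jobs = 0
    total_jobs = 0
    for job in lines:
        if not job.strip():
            continue  # skip blank lines
        total_jobs += 1
        fields = job.rstrip().split(',')
        skills = set(term.strip() for term in fields[first_skill_index:])
        if skills & programming:  # any overlap with the category
            programming_jobs += 1
        if skills & database:
            database_jobs += 1
    return programming_jobs, database_jobs, total_jobs

# Lines shaped like the sample data in this thread (shortened).
# The third line is hypothetical, added to show a programming match.
sample = [
    "IT Business Analyst,Siemens,Hutchinson KS USA,NA,Full time,bachelors degree,2,SAP,Word,Excel\n",
    "Business Analyst,Fresenius Medical,Austin TX USA,NA,Full time,bachelors degree,3,analytical skills,SQL,access\n",
    "Data Engineer,Example Corp,Austin TX USA,NA,Full time,masters degree,4,python,SQL\n",
]
programming_jobs, database_jobs, total_jobs = count_categories(sample)
print("programming: %d out of %d jobs" % (programming_jobs, total_jobs))
print("database: %d out of %d jobs" % (database_jobs, total_jobs))
```

The matching here is exact and case-sensitive, so 'sql' or 'Python' in the file would not be counted unless the terms are normalized first.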
Suhas, Senior QA Manager

Commented:
No comment has been added to this question in more than 21 days, so it is now classified as abandoned.

I have recommended this question be closed as follows:

Accept: DrDamnit (https:#a41327206)

If you feel this question should be closed differently, post an objection and the moderators will review all objections and close it as they feel fit. If no one objects, this question will be closed automatically the way described above.

suhasbharadwaj
Experts-Exchange Cleanup Volunteer
