Iryna253

count degrees in the file - python

I have a file with a list of jobs, one per line. Each job is a comma-separated string, and the element at index 5 is the degree. I need to count how many jobs list each type of degree. There are 5 different degrees in the file.
I wrote the code below, but it is not counting the total for each degree. Degrees repeat across jobs. Can you help?

with open("100 Jobs - MedDeviceManuf.txt", 'r') as f:
    for job in f:
        degree = job.rstrip().split(',')[5]    
        types_degree = {}
        if degree in types_degree:
            types_degree[degree] += 1
        else:
            types_degree[degree] = 1
       
        print str(types_degree)
pepr

Can you show a few lines of the text file?
Iryna253

ASKER

IT Business Analyst,Siemens,Hutchinson KS USA,NA,Full time,bachelors degree,2,SAP,Word,Excel,PowerPoint,Outlook,excellent oral & written communication skills,leadership,team player
Senior IT Business Analyst,Siemens,Tarrytown NY USA,NA,Full time,bachelors degree,5,excellent oral & written communication skills,presentation skills,Business process mapping,Business Requirements Analysis
Business Analyst,Fresenius Medical,Austin TX USA,NA,Full time,bachelors degree,3,analytical skills,organizational skills,excellent oral & written communication skills,SQL,access,business Intelligence software,goal oriented,independent
Put types_degree = {} above your "with" line.
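
A minimal sketch of why moving the dictionary helps: when types_degree = {} sits inside the loop, it is emptied on every line, so no total can build up. The helper name count_degrees and the sample rows below are made up for illustration; only the column layout (degree at index 5) comes from the thread. It uses print() as a function so it runs on Python 3, whereas the code posted in the thread uses the Python 2 print statement.

```python
def count_degrees(lines, degree_index=5):
    """Count how many lines carry each degree value (hypothetical helper)."""
    types_degree = {}  # created once, before the loop, so totals accumulate
    for job in lines:
        job = job.rstrip()
        if not job:
            continue  # skip blank lines
        degree = job.split(',')[degree_index]
        if degree in types_degree:
            types_degree[degree] += 1
        else:
            types_degree[degree] = 1
    return types_degree


# Made-up rows that follow the same column layout as the posted sample.
sample = [
    "Analyst,Acme,Austin TX USA,NA,Full time,bachelors degree,3,SQL",
    "Engineer,Acme,Boston MA USA,NA,Full time,masters degree,5,C",
    "Clerk,Acme,Reno NV USA,NA,Part time,bachelors degree,1,Word",
]
result = count_degrees(sample)
print(result)
```

An open file object can be passed in place of the list, since iterating over a file also yields one line at a time.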
Thank you, I did it, and it counts correctly now. Now I want to show these numbers on a pie chart. I did it the manual way. Is there a way to feed the output numbers into the pie chart so that I am not typing them into the code?

Output was {'Ph.D.': 10, 'bachelors degree': 73, 'masters degree': 11, 'associates degree': 1, 'NA': 5}

import matplotlib.pyplot as plt

types_degree = {}
with open("100 Jobs - MedDeviceManuf.txt", 'r') as f:
    for job in f:
        degree = job.rstrip().split(',')[5]
        if degree in types_degree:
            types_degree[degree] += 1
        else:
            types_degree[degree] = 1
print            
print str(types_degree)
       

labels = 'Ph.D', 'Bachelors Degree', 'Masters Degree', 'Associates Degree', 'N/A'
sizes = [10, 73, 11, 1, 5]
colors = ['yellowgreen', 'gold', 'lightskyblue', 'lightcoral', 'red']
explode = (0, 0.1, 0, 0, 0)

plt.pie(sizes, explode=explode, labels=labels, colors=colors,
        autopct='%1.1f%%', shadow=True, startangle=10)
plt.axis('equal')
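
One possible way to avoid retyping the numbers, offered as a sketch and not as an answer given in the thread: build the labels and sizes lists from the dictionary itself. The dictionary literal below copies the output quoted above; sorting by count and exploding the largest slice are assumptions made for illustration. The plotting call is left as a comment so the snippet runs without matplotlib installed.

```python
# Counts copied from the output quoted in the post.
types_degree = {'Ph.D.': 10, 'bachelors degree': 73, 'masters degree': 11,
                'associates degree': 1, 'NA': 5}

# Sort by count (largest first) so the slice order is predictable,
# then split the pairs into two parallel lists.
items = sorted(types_degree.items(), key=lambda pair: pair[1], reverse=True)
labels = [name for name, count in items]
sizes = [count for name, count in items]

# Pull out the largest slice, as the hand-written explode tuple did.
largest = max(sizes)
explode = [0.1 if count == largest else 0 for count in sizes]

print(labels)
print(sizes)

# These lists could then replace the hard-coded ones, for example:
# plt.pie(sizes, explode=explode, labels=labels,
#         autopct='%1.1f%%', shadow=True, startangle=10)
```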
You should put that in another question to get more eyeballs on it. I generally do server stuff with Python and have never touched pie charts.

:-)
Oh OK, I will. Can you please help with one more question about this .txt file? I am trying to count the categories that I identified in each job, using a dictionary called skillsets. I would like to count how many of the jobs in the .txt file mention programming and how many mention database skills.

number = {}
with open("100 Jobs - MedDeviceManuf.txt", 'r') as f:
    for job in f:
        skillsets = { 'programming' : ['scripting language', 'r', 'python', 'C'] , 'database' : ['SQL', 'relational database']}
       
        for category in skillsets:
            category = skillsets.keys()
            if category in job.rstrip().split(',')[7:]:
                number[category] += 1
            else:
                number[category] = 1
print            
print str(number)
I don't understand your last question, can you please restate it or give me sample output of what you're getting now and tell me what's wrong with that?
Now I am getting an error:
  number[category] = 1

TypeError: unhashable type: 'list'

My output should be something like this:

programming : 3 out of 100 jobs
database: 5 out of 100 jobs

The code should do the following: go to the file, find each job (each line), find field [7] of the line, identify words (like 'scripting language' or 'C') in that field, assign them to their keys/categories (programming, database), and finally give me the count of each key/category found across all jobs.
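
A short sketch of where the TypeError quoted above comes from, using the skillsets dictionary from the post. The line category = skillsets.keys() replaces the loop variable with the collection of all keys, and a list cannot be used as a dictionary key. The list() call is an addition here to mirror Python 2, where keys() already returns a list; the variable error_message is introduced only for illustration.

```python
skillsets = {'programming': ['scripting language', 'r', 'python', 'C'],
             'database': ['SQL', 'relational database']}
number = {}
error_message = None

for category in skillsets:
    # As in the posted code: the loop variable is rebound to ALL keys at once.
    category = list(skillsets.keys())
    try:
        number[category] = 1  # a list is unhashable, so this fails
    except TypeError as exc:
        error_message = str(exc)
    break  # one iteration is enough to show the problem

print(error_message)
```

Separately, number starts out empty, so number[category] += 1 would raise a KeyError the first time a category matched, even with a valid string key.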
First, if the data you presented above is a real sample, then "field 7" (the eighth spot) may or may not even be correct.

Assuming this is a CSV, you have comma separated values in that field.

Regardless, I just wouldn't try it this way. I would likely do this in two passes:

1. First pass: extract / index all the keywords that are in the file (Word, SAP, C, etc...)
2. Second pass: loop through each one to build the counts

This is really a job for a database. But, if you insist on doing it this way, try it with the two passes I described above. The first pass is required because we don't know what all the unique terms are (or how they are categorized, really). You'll have to get all the uniques and then manually categorize them into "database" or "Office work" or "programming".
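
The first pass could be sketched roughly as follows. The function name unique_skills and the skills_start=7 default are assumptions: the posted sample rows appear to list skills from index 7 onward, but that has not been confirmed for the whole file. The two rows are taken from the sample posted earlier in the thread.

```python
def unique_skills(lines, skills_start=7):
    """Collect every distinct skill term found from skills_start onward."""
    seen = set()
    for line in lines:
        fields = line.strip().split(',')
        for skill in fields[skills_start:]:
            skill = skill.strip()
            if skill:
                seen.add(skill)  # a set keeps each term only once
    return sorted(seen)


# Two rows from the sample data posted in the thread.
sample = [
    "IT Business Analyst,Siemens,Hutchinson KS USA,NA,Full time,bachelors degree,2,SAP,Word,Excel,PowerPoint,Outlook,excellent oral & written communication skills,leadership,team player",
    "Senior IT Business Analyst,Siemens,Tarrytown NY USA,NA,Full time,bachelors degree,5,excellent oral & written communication skills,presentation skills,Business process mapping,Business Requirements Analysis",
]
terms = unique_skills(sample)
for term in terms:
    print(term)
```

The printed list is what would then be sorted by hand into categories.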

Also, to be lazy, I would categorize each of these terms in separate files on the disk so that when the script loads, I can just load the dictionary from that file.

Then, the second pass will simply read each line in the CSV, compare that field to the pre-defined dictionaries that you have already created by analyzing the uniques, and simply increment an integer counter.
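
A minimal sketch of that second pass, under several assumptions: skills start at index 7, a job counts once per category no matter how many of its keywords it lists, and matching is an exact, case-insensitive comparison of whole fields (substring matching would make 'r' or 'C' match almost everything). The function name count_categories and the sample rows are made up for illustration; the category keywords are the ones from the skillsets dictionary posted in the thread.

```python
# Category keywords as posted in the thread.
SKILLSETS = {
    'programming': ['scripting language', 'r', 'python', 'C'],
    'database': ['SQL', 'relational database'],
}


def count_categories(lines, skillsets, skills_start=7):
    """Return (counts, total): jobs per category and the number of jobs read."""
    # Lower-case the keywords once so the comparison ignores case.
    lookup = {cat: set(word.lower() for word in words)
              for cat, words in skillsets.items()}
    counts = dict.fromkeys(skillsets, 0)
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        total += 1
        skills = set(field.strip().lower()
                     for field in line.split(',')[skills_start:])
        for cat, words in lookup.items():
            if skills & words:  # at least one keyword of this category
                counts[cat] += 1
    return counts, total


# Made-up rows that follow the same column layout as the posted sample.
sample = [
    "Data Analyst,Acme,Austin TX USA,NA,Full time,bachelors degree,3,SQL,python",
    "Developer,Acme,Boston MA USA,NA,Full time,masters degree,5,C,scripting language",
    "Clerk,Acme,Reno NV USA,NA,Part time,NA,1,Word,Excel",
]
counts, total = count_categories(sample, SKILLSETS)
for cat in SKILLSETS:
    print('%s: %d out of %d jobs' % (cat, counts[cat], total))
```

Because each category name stays a plain string key, the unhashable-type problem does not arise, and initializing every count to 0 up front avoids a missing-key error on the first match.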
I have actually already sorted all the unique skills into the categories that I called "programming" and "data base", and I added them to the dictionary called "skillsets". Now I need to match those skills to the categories, which is where I am stuck. Can you recommend a link where I can read about it, please?
ASKER CERTIFIED SOLUTION
DrDamnit
No comment has been added to this question in more than 21 days, so it is now classified as abandoned.

I have recommended this question be closed as follows:

Accept: DrDamnit (https:#a41327206)

If you feel this question should be closed differently, post an objection and the moderators will review all objections and close it as they feel fit. If no one objects, this question will be closed automatically the way described above.

suhasbharadwaj
Experts-Exchange Cleanup Volunteer