Iryna253


count degrees in the file - python

I have a file with one job per line. Each job is a comma-separated string, and the field at index 5 is the degree. I need to count how many jobs require each type of degree. There are 5 different degrees in the file.
I wrote the code below, but it does not give the total for each degree, even though the same degrees repeat across many jobs. Can you help?

with open("100 Jobs - MedDeviceManuf.txt", 'r') as f:
    for job in f:
        degree = job.rstrip().split(',')[5]    
        types_degree = {}
        if degree in types_degree:
            types_degree[degree] += 1
        else:
            types_degree[degree] = 1
       
        print str(types_degree)
Python

pepr

Can you show a few lines of the text file?
Iryna253

ASKER

IT Business Analyst,Siemens,Hutchinson KS USA,NA,Full time,bachelors degree,2,SAP,Word,Excel,PowerPoint,Outlook,excellent oral & written communication skills,leadership,team player
Senior IT Business Analyst,Siemens,Tarrytown NY USA,NA,Full time,bachelors degree,5,excellent oral & written communication skills,presentation skills,Business process mapping,Business Requirements Analysis
Business Analyst,Fresenius Medical,Austin TX USA,NA,Full time,bachelors degree,3,analytical skills,organizational skills,excellent oral & written communication skills,SQL,access,business Intelligence software,goal oriented,independent
DrDamnit

Put types_degree = {} above your "with" line.
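
A minimal sketch of that fix, assuming the degree sits at index 5 of each comma-separated line as in the sample above. It uses `collections.Counter` in place of the manual if/else (a swapped-in convenience, not part of the original answer), and the helper name `count_degrees` is hypothetical. The point is that the counter is created once, before the loop, so it is not reset for every job.

```python
from collections import Counter


def count_degrees(lines, degree_index=5):
    """Count how often each degree appears across all job lines."""
    types_degree = Counter()  # created once, outside the loop
    for job in lines:
        fields = job.rstrip().split(',')
        if len(fields) > degree_index:  # skip blank or short lines
            types_degree[fields[degree_index]] += 1
    return types_degree


# The first two lines are shortened from the sample in the thread;
# the third is a made-up line, added only to show a second degree type.
sample = [
    "IT Business Analyst,Siemens,Hutchinson KS USA,NA,Full time,bachelors degree,2,SAP",
    "Senior IT Business Analyst,Siemens,Tarrytown NY USA,NA,Full time,bachelors degree,5,presentation skills",
    "Research Scientist,ExampleCo,Austin TX USA,NA,Full time,Ph.D.,3,python",
]
print(dict(count_degrees(sample)))
```

With the real file, the open file object can be passed in directly, for example `with open("100 Jobs - MedDeviceManuf.txt") as f: counts = count_degrees(f)`.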
Iryna253

ASKER

Thank you, I did that and it counts correctly now. Next, I want to show these numbers on a pie chart. I did it manually; is there a way to feed the output numbers into the pie chart so that I am not typing them into the code?

Output was {'Ph.D.': 10, 'bachelors degree': 73, 'masters degree': 11, 'associates degree': 1, 'NA': 5}

import matplotlib.pyplot as plt

types_degree = {}
with open("100 Jobs - MedDeviceManuf.txt", 'r') as f:
    for job in f:
        degree = job.rstrip().split(',')[5]
        if degree in types_degree:
            types_degree[degree] += 1
        else:
            types_degree[degree] = 1
print            
print str(types_degree)
       

labels = 'Ph.D', 'Bachelors Degree', 'Masters Degree', 'Associates Degree', 'N/A'
sizes = [10, 73, 11, 1, 5]
colors = ['yellowgreen', 'gold', 'lightskyblue', 'lightcoral', 'red']
explode = (0, 0.1, 0, 0, 0)

plt.pie(sizes, explode=explode, labels=labels, colors=colors,
        autopct='%1.1f%%', shadow=True, startangle=10)
plt.axis('equal')
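
One way to avoid retyping the numbers, sketched under the assumption that `types_degree` is the dictionary built above: take the labels from the dictionary's keys and the sizes from its values. The helper names `pie_inputs` and `plot_degrees` are hypothetical; `plt.pie`, `plt.axis` and `plt.show` are regular matplotlib calls. The colors are left to matplotlib's defaults here so the list does not have to match the number of degrees.

```python
def pie_inputs(counts):
    """Split a {label: count} dict into parallel label and size lists."""
    labels = sorted(counts, key=counts.get, reverse=True)  # largest slice first
    sizes = [counts[label] for label in labels]
    return labels, sizes


def plot_degrees(counts):
    """Draw the pie chart straight from the dictionary."""
    import matplotlib.pyplot as plt  # imported here so pie_inputs works without matplotlib

    labels, sizes = pie_inputs(counts)
    # Pull out the largest slice (first after sorting); leave the others in place.
    explode = [0.1 if i == 0 else 0 for i in range(len(sizes))]
    plt.pie(sizes, explode=explode, labels=labels,
            autopct='%1.1f%%', shadow=True, startangle=10)
    plt.axis('equal')
    plt.show()


# The counts reported in the post above.
types_degree = {'Ph.D.': 10, 'bachelors degree': 73, 'masters degree': 11,
                'associates degree': 1, 'NA': 5}
labels, sizes = pie_inputs(types_degree)
print(labels)
print(sizes)
```

Calling `plot_degrees(types_degree)` after the counting loop then draws the chart from whatever the file produced.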
DrDamnit

You should put that in another question to get more eyeballs on it. I generally do server stuff with python, never touched pie charts.

:-)
Iryna253

ASKER

Oh OK, I will do that. Can you please help with one more question about this .txt file? I am trying to count the categories that I identified for each job, using a dictionary called skillsets. I would like to count how many of the jobs in the .txt file fall under programming and how many under database.

number = {}
with open("100 Jobs - MedDeviceManuf.txt", 'r') as f:
    for job in f:
        skillsets = { 'programming' : ['scripting language', 'r', 'python', 'C'] , 'database' : ['SQL', 'relational database']}
       
        for category in skillsets:
            category = skillsets.keys()
            if category in job.rstrip().split(',')[7:]:
                number[category] += 1
            else:
                number[category] = 1
print            
print str(number)
DrDamnit

I don't understand your last question, can you please restate it or give me sample output of what you're getting now and tell me what's wrong with that?
Iryna253

ASKER

Now I am getting an error:
  number[category] = 1

TypeError: unhashable type: 'list'

My output should be something like this:

programming : 3 out of 100 jobs
database: 5 out of 100 jobs

The code should do the following: go to the file, take each job (each line), find field [7] of the line, identify words (like 'scripting language' or 'C') in that field, map them to their keys/categories (programming, database), and finally give me the number of jobs in which each key/category was found.
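
The `TypeError` comes from `category = skillsets.keys()`: in Python 2 that replaces the loop variable with the whole list of keys, and a list cannot be used as a dictionary key. Below is a minimal sketch of one way to get the counts, assuming the skills start at index 7 as in the posted code and that a job counts once per category no matter how many matching skills it lists. The function name `count_categories` is hypothetical, and matching is exact and case-sensitive.

```python
skillsets = {
    'programming': ['scripting language', 'r', 'python', 'C'],
    'database': ['SQL', 'relational database'],
}


def count_categories(lines, skillsets, first_skill_index=7):
    """Count, for each category, the jobs that list at least one of its skills."""
    number = {category: 0 for category in skillsets}  # start every count at zero
    total_jobs = 0
    for job in lines:
        fields = job.rstrip().split(',')
        if len(fields) <= 1:  # skip blank lines
            continue
        total_jobs += 1
        job_skills = set(fields[first_skill_index:])
        for category, skills in skillsets.items():
            # any() makes a job count once even if several skills match
            if any(skill in job_skills for skill in skills):
                number[category] += 1
    return number, total_jobs


# Shortened versions of two sample lines from the thread.
sample = [
    "IT Business Analyst,Siemens,Hutchinson KS USA,NA,Full time,bachelors degree,2,SAP,Word,Excel",
    "Business Analyst,Fresenius Medical,Austin TX USA,NA,Full time,bachelors degree,3,analytical skills,SQL,access",
]
number, total_jobs = count_categories(sample, skillsets)
for category in sorted(number):
    print("%s: %d out of %d jobs" % (category, number[category], total_jobs))
```

Starting every category at zero also avoids the `KeyError` that `number[category] += 1` would raise the first time a category is seen.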
DrDamnit

First, if the data you presented above is a real sample, then "field 7" (the eighth spot) may or may not even be correct.

Assuming this is a CSV, you have comma separated values in that field.

Regardless, I just wouldn't try it this way. I would likely do this in two passes:

1. First pass: extract / index all the keywords that are in the file (Word, SAP, C, etc...)
2. Second pass: loop through each one to build the counts

This is really a job for a database. But if you insist on doing it this way, try it with the two passes I described above. The first pass is required because we don't know what all the unique terms are (or how they are categorized, really). You'll have to get all the uniques and then manually categorize them into "database" or "Office work" or "programming".

Also, to be lazy, I would categorize each of these terms in separate files on the disk so that when the script loads, I can just load the dictionary from that file.

Then, the second pass will simply read each line in the CSV, compare that field to the pre-defined dictionaries that you have already created by analyzing the uniques, and increment an integer counter.
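
The first pass described above might look like the sketch below: collect every distinct term from the skill fields so they can be reviewed and categorized by hand. It assumes the skills start at index 7 (which, as noted, may not hold for every line), and the names `unique_skills` and `invert_categories` are hypothetical. The second helper turns the hand-made category lists into a term-to-category lookup, which reduces the second pass to a dictionary check per term.

```python
from collections import Counter


def unique_skills(lines, first_skill_index=7):
    """First pass: tally every term found in the skill fields."""
    terms = Counter()
    for job in lines:
        fields = job.rstrip().split(',')
        terms.update(field.strip() for field in fields[first_skill_index:]
                     if field.strip())
    return terms


def invert_categories(skillsets):
    """Map each term to its category, e.g. 'SQL' -> 'database'."""
    return {term: category
            for category, term_list in skillsets.items()
            for term in term_list}


# Shortened versions of two sample lines from the thread.
sample = [
    "IT Business Analyst,Siemens,Hutchinson KS USA,NA,Full time,bachelors degree,2,SAP,Word,Excel,PowerPoint,Outlook",
    "Business Analyst,Fresenius Medical,Austin TX USA,NA,Full time,bachelors degree,3,analytical skills,organizational skills,SQL,access",
]
terms = unique_skills(sample)
for term in sorted(terms):
    print("%s: %d" % (term, terms[term]))

lookup = invert_categories({'database': ['SQL', 'relational database']})
print(lookup['SQL'])
```

The printed term list is what would be sorted by hand into categories; the lookup built from those categories can then be saved to disk and reloaded, as suggested above.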
Iryna253

ASKER

I have actually sorted all the unique skills into the categories I called "programming" and "database" and added them to the dictionary called "skillsets". Now I need to match those skills to the categories, which is where I am stuck. Can you recommend a link where I can read about this, please?
ASKER CERTIFIED SOLUTION
DrDamnit
Suhas .

No comment has been added to this question in more than 21 days, so it is now classified as abandoned.

I have recommended this question be closed as follows:

Accept: DrDamnit (https:#a41327206)

If you feel this question should be closed differently, post an objection and the moderators will review all objections and close it as they feel fit. If no one objects, this question will be closed automatically the way described above.

suhasbharadwaj
Experts-Exchange Cleanup Volunteer