We help IT Professionals succeed at work.

Python script to read input from a Excel spreadsheet and output random results

wayy2be
wayy2be asked
on
I am trying to learn Python and I have a project at work where we need to have a script read numbers from a spread sheet and then print out the numbers with the highest occuring frequency and then use those numbers in a random output of 10 number combinations (using a mix of the highest occuring numbers and other numbers in the spreadsheet). How can I accomplish this in Python? Preferable the output should be in html format. Any help would be great. Thanks.
Comment
Watch Question

CERTIFIED EXPERT

Commented:
The following links describe how to read from Excel spreadsheet using Python.
http://www.simplistix.co.uk/presentations/python-excel.pdf
http://scienceoss.com/read-excel-files-from-python/


If you need to learn Python quickly, please see Bucky's tutorials on youtube

Commented:
About the excel parser, another option, I think, will be ADO, if you are using windows.
http://sourceforge.net/projects/adodbapi/
http://starship.python.net/crew/mhammond/win32/Downloads.html

About the html, you need to study how to generate the format for your own needs.
http://en.wikipedia.org/wiki/HTML
http://www.w3schools.com/html/

Author

Commented:
I know html, what I am not sure about is python generating the report in html. What about the randomizing?  I assume there is a function for this? Please provide samples.

Commented:
1. if you are generating html for cgi usage,
you can use python cgi:
http://docs.python.org/library/cgi.html
2. about the randomization, use random:
random.random() will give a float 0~1.

#!python
import random
random.seed(None)
ll = [1,2,3,4,5,6,7,8,9,0]
for i in range(100):
    r = random.random()
    print ll[int(10*r)]

Open in new window


randtest.py

Author

Commented:
Thanks all. Can someone please post a working script that will accomplish the task? I am a newbie and this way I can refer back. Thanks

Commented:
HI wayy2be,
You'd better do it by yourself.

Author

Commented:
And I will but I would like to see a working example.
CERTIFIED EXPERT

Commented:
I read your question already many days ago and just ignored it as it was way to fuzzy.
Helping with fuzzy question is frustration for the person who asks and frustrationg for the experts who try to help, just to find out, that the help they provided was not what was really needed.

With what are you still stuck with?

- reading numbers from an excel file?
- information analysis of the retrieved results>
- randomization of the numbers?
- generating html? (can be as simple as print statements)?

The problem is, that you want us to write all the code for you.
If this is really your approach, that you prefer, then I propose following:

Split the question into smaller questions and be sure, that each question is well defined.
If you put too much into one question, then the risk is, that nobody wants to reply.


The questions could be.

 1.) How can I extract data from an Excel file. then please provide us with an example excel file and tell us what kind of cells you want to extract.

2.) how to generate html? show is the python variables lists / dicts / whatever and the HTML file, that you would like to have as resuilt.

3.) how to do sorting / randomsation / etc. of numbers. Show us the data that you have at the beginning and an example result, that you'd like to achieve.
.
Are you running on windows or on Linux.?Do you have either Excel or Openoffice running on the machine, where you want to run the script.

Do you want to run the script from the command line or do you want to run it behind a web server?


Wouldn't it be possible to just export the excel sheet into a CSV file a (you could even do it with an Excel macro) and then you could parse the csv file with the python csv module, which is very simple to use.

In my opinion the code would be easier to understand for a newbie.
I am rather sure, that if you clarify your question you will have a good and helpful answer rather quickly.


Author

Commented:
Thank you for your reply. I am not sure why you interpreted it as fuzzy but I will clarify. I am looking for sample code that will read numbers from a spreadsheet, list the numbers in order of the most frequently apearing numbers and then sort those numbers randomly in groups of 10. As for the html part I was looking for it to print this information out neatly, but it can print to screen or file as well. I am running Linux and Open Office, there is no web server involved.

I am not looking for someone to write the script for me, I would like to see sample code so that I can learn and put this together. I have never been told from anyone on EE to "do it myself", rather surprising.
CERTIFIED EXPERT

Commented:
Thanks for the clarification.

Is it important, that you parse the Excel file or could you just parse an exported CSV file.

As I wrote  in my previous message
- parsing CSV is simpler and the code will also be easier to understand and to manipulate.

If it is important to parse the Excel file directly, then there are multiple options (faster / slower , more / less generic)
There's quite some libraries, which can read excel files (depending on the version of the excel document, which would therefore be important to know. Ideally you would post a small sample file in the format)

Another option would be to write a python script, which starts openoffice, imports the Excel file and fetches the required contents.
This is slow, but you'd be sure, that you can read all Excel formats, that Openoffice can read)

Now the number processing part:

You want to great a list / dict, which tells you which (and how often) numbers occur in a spreadheet, right?

I'll send you a small example in a minute

CERTIFIED EXPERT
Commented:

This is the answer to part 3 of your question:
- how get a list of numbers sorted by the frequency of their appearance
- how to shuffle a list in groups of 10
#!/usr/bin/env python

import random

def read_numbers_from_ascii_file(fname):
    """ generator reading numbers from an ascii file.
        any words or nun number strings will be ignored
    """
    with open(fname) as fin:
        for line in fin:
            numbers = line.strip().split()
            for number in numbers:
                try:
                    value = float(number)
                    print 'read value %f' % value
                    yield value
                except ValueError:
                    pass # here you cold report errors or abort
        
def mk_num_stat(number_it):
    """ fetches number from an iterable
        The iterable can be a list, a typle,
        or any other iterable object, like for example
        the result of read_numbers_from_ascii_file()
        
        The return value is a dict with the numbers as  
        keys and the count as value
    """
    stat = {}
    for num in number_it:
        stat[num] = stat.get(num, 0) + 1
    return stat


def mk_sorted_number_list(num_stat):
    """  just returns a list of numbers ordered by their frequency """
    # as typles will be a list of tuples
    # the first item will be the number, the scond its frequency
    as_tuples = num_stat.items()
    
    # sort tuple by frequency in reverse order
    # Do you have any criterea how you want to sort if the frequency
    # is identical (by size of the number?, you don't care??
    as_tuples.sort(key=lambda val: val[1], reverse=True)
    # return a list with only the values 
    return [ v[0] for v in as_tuples ]

def shuffle_in_groups_of(lst, grp_len):
    """ will traverse a list in groups of grp_len
        and shuffle each sub group
    """
    shuffled = [] # will store the result
    for pos in xrange(0, len(lst), grp_len):
        sublist = lst[pos:pos+grp_len] 
        random.shuffle(sublist)
        shuffled.extend(sublist)
    return shuffled

def main():
    num_gen = read_numbers_from_ascii_file('test.txt')
    num_stat = mk_num_stat(num_gen)
    print 'statistic',num_stat

    num_freq_sorted = mk_sorted_number_list(num_stat)
    print "sorted"
    print num_freq_sorted

    shuffled = shuffle_in_groups_of(num_freq_sorted, 3)
    print 'shuffled'
    print shuffled


if __name__ == '__main__':
    main()

Open in new window

CERTIFIED EXPERT

Commented:
I still think, you would have gotten more replies if you had split up your question into small modular questions, which aren't really that related. (providing some general context however is not a bad idea)

Let me try to explain why:

- Different experts have different knowledge. Any experts which knew about generating html or
about shuffling data would have ignored the question due to the fact, that Excel was part of the question.

- Most experts don't earn any money answering questions (though they earn points). Many of the experts do have real work and limited time. So the shorter the question the higher the probability, that you get an answer.

- Shorter questions are easier to find in the expert data base

- Shorter questions get shorter answers and the answers are therefore mostly easier to understand.

- With shorter question it's easier to focus and you might therefore get more
feedback about multiple ways/suggestions to attack the problem.

I understand, that you were surprised by the answer 'You'd better do it by yourself.'

I guess some experts (at least I) were surprised similiarly by your request
 'Can someone please post a working script that will accomplish the task'
for a question of this complexity.

Please understand as well, that the likeliness of a good answer is much higher if you provide
example input and expected result for sub tasks

- some users (like me for example) might not even have a valid Excel file available and could therefore not even verify whether the code is working. (Especially as Excel version might play a role)

Without checking whether the result is the expected one they don't know either whether the script (or the sub script) is working.



Author

Commented:
I understand what you are saying. I have been a member of EE for a long time and I have always had a great experience. I think it is a valuable resource. I am fully aware that the experts do not receive payment for their answers and only points. I think my question was coherent and short, limited to one paragraph. Many questions on EE are far longer and complex than mine. If you look at my question I never asked for a working script by any means, all I wanted is to know how to do this, what resources to use, code examples if possible.

There are many high quality professional experts on EE, like yourself, that make EE what it is, therefore enabling EE to exist. However receiving a rude answer that "I better do it myself" is in poor taste and has no place in any post on EE. In any event thank you for your assistance and for taking the time to provide assistance and an example for this question.