Solved

filter out unwanted rows

Posted on 2012-03-20
Last Modified: 2012-03-21
Hello,
I am trying to remove unwanted rows from a text file, e.g.:

import csv

data = open("X:/r_dhwm.csv")

ma=[]
r = csv.DictReader(data,['Date','TEST','TEST2','TEST3'])
for a in r:
   
    if "{TEST}" == 'TEST':
        break
    else:
        ma.append("{TEST}".format(**a))


print(ma)
data.close()



RESULT =

['TEST', '', '', '', '', '', '', '', '-0.000412682', '-0.000393387']

What I want is

['-0.000412682', '-0.000393387']
Question by:philsivyer
 
LVL 28

Expert Comment

by:pepr
ID: 37743050
Can you post a fragment of your csv file?  Can you comment on what data must be ignored?

The csv.DictReader() may be overkill for the purpose.  Anyway, in Python 2 the file must be opened for reading in binary mode ('rb'); in Python 3 it should be opened in text mode with newline=''.  Also, data is not a good identifier for the file object.

What version of Python do you use?  If it is new enough, you should prefer the with construct (the file object is then closed automatically).

The following code
    if "{TEST}" == 'TEST':
        break
    else:
        ma.append("{TEST}".format(**a))


... is equivalent to

    ma.append("{TEST}".format(**a))


because "{TEST}" == 'TEST' compares two different string literals, so the condition never holds, whatever the row contains.
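To make the point concrete, here is a small sketch in Python 3 syntax. The sample rows are invented for illustration and are not taken from the real file. It shows that the literal comparison is always False, while comparing the field value from the row dictionary is probably what was intended:

```python
import csv
import io

# Hypothetical sample data standing in for the real CSV file.
sample = io.StringIO("Date,TEST\n2012-03-01,\n2012-03-02,-0.000412682\n")

# Two different string literals never compare equal, whatever the row holds.
literal_check = ("{TEST}" == 'TEST')

values = []
for row in csv.DictReader(sample, ['Date', 'TEST']):
    # Field names were passed explicitly, so the header line arrives as data.
    if row['TEST'] == 'TEST':
        continue
    # Skip empty cells; an empty string is falsy.
    if row['TEST']:
        values.append(row['TEST'])

print(literal_check)
print(values)
```

Note that continue is used rather than break, so the loop skips the header row instead of stopping at it.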
 

Author Comment

by:philsivyer
ID: 37743119
OK
I have attached a csv file. I want to be able to pick any column and return its data as an array, but without the headers (as in row 1 of the csv file) and without any cells that are null or blank (no values).
So, if I want to return the data from column "TEST1" in the attached file, my first value would be: -0.011027536
Regards
 
LVL 28

Expert Comment

by:pepr
ID: 37743234
I can see no attachment.
 

Author Comment

by:philsivyer
ID: 37743288
Sorry
ee.csv
 
LVL 28

Expert Comment

by:pepr
ID: 37744866
Try the following script:

a.py
import csv

def columnNonEmptyValues(csvFileName, columnName):
    with open(csvFileName, 'rb') as f:
        for d in csv.DictReader(f):
            value = d.get(columnName, '')  # default if not present
            if value:
                yield value


if __name__ == '__main__':

    # Now you can iterate through the values of the chosen column.
    for value in columnNonEmptyValues('ee.csv', 'TEST1'):
        print value

    # Or you can pass the iterator to the list constructor.
    lst = list(columnNonEmptyValues('ee.csv', 'TEST1'))
    print '-' * 70
    print lst


It prints on my console:

c:\tmp\_Python\philsivyer\Q_27640149>python a.py
-0.011027536
-0.004086121
-0.012063901
-0.010020955
-0.002050665
-0.004273241
-0.009383166
-0.013938726
...
-0.055930797
-0.066374625
-0.070332921
-0.05798936
----------------------------------------------------------------------
['-0.011027536', '-0.004086121', '-0.012063901', '-0.010020955', '-0.002050665',
 '-0.004273241', '-0.009383166', '-0.013938726', '-0.03455181', '-0.045187455',
 '-0.045987727', '-0.043084829', '-0.040607066', '-0.041589412', '-0.031441661',
 '-0.032485216', '-0.015742733', '-0.01418973', '-0.012148488', '-0.010371504',
 '-0.008392192', '-0.009299116', '-0.009473792', '-0.00851022', '-0.009960434',
 '-0.009133718', '-0.005845811', '-0.005267109', '-0.0102875', '-0.014345424',
 '-0.010280687', '-0.010643898', '-0.008002167', '-0.007472288', '-0.006297341',
 '-0.017303705', '-0.021429973', '-0.013058342', '-0.002337027', '0.002957021',
 '-0.004385036', '-0.010007584', '-0.019457466', '-0.048419451', '-0.073353606',
 '-0.055930797', '-0.066374625', '-0.070332921', '-0.05798936']


If your Python does not support the with construct, try the older way:

...
def columnNonEmptyValues(csvFileName, columnName):
    f = open(csvFileName, 'rb')
    for d in csv.DictReader(f):
        value = d.get(columnName, '')  # default if not present
        if value:
            yield value
    f.close()
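For readers on Python 3, a rough equivalent might look like the sketch below. It assumes Python 3 (text mode with newline='' instead of 'rb'). The snake_case function name and the temporary sample file are made up for illustration, since the real ee.csv is not reproduced here:

```python
import csv
import os
import tempfile

def column_non_empty_values(csv_file_name, column_name):
    """Yield the non-empty values of one column (Python 3 variant)."""
    # In Python 3 the csv module expects a text-mode file opened with newline=''.
    with open(csv_file_name, newline='') as f:
        for row in csv.DictReader(f):
            value = row.get(column_name, '')  # default if the column is missing
            if value:
                yield value

# Hypothetical sample file standing in for ee.csv.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False,
                                 newline='') as tmp:
    tmp.write("Date,TEST1\n"
              "2012-03-01,\n"
              "2012-03-02,-0.011027536\n"
              "2012-03-03,-0.004086121\n")
    sample_path = tmp.name

result = list(column_non_empty_values(sample_path, 'TEST1'))
os.remove(sample_path)
print(result)
```

The values come back as strings; wrapping them in float() would be a separate step if numbers are needed.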

 

Author Comment

by:philsivyer
ID: 37746196
Thanks - works a treat.
Question - excuse my ignorance but as a newbie to Python ...

can you explain this bit .. if __name__ == '__main__':

and ...
how does it know how to ignore null values or empty strings and not include the JARF header?

Regards

Author Comment

by:philsivyer
ID: 37746211
Sorry - one more question
How can I write results to txt file

outfile = open("myresults.txt", "wb")

etc ??

Regards
 
LVL 28

Accepted Solution

by:
pepr earned 500 total points
ID: 37746437
> excuse my ignorance but as a newbie to Python ...
> can you explain this bit .. if __name__ == '__main__':


No need to apologise!  The purpose of EE is to find answers, to learn ;)  There is no such thing as a stupid or ignorant question.

When you write Python code and save it in a file like a.py, you can use a.py either as a script (i.e. a stand-alone program) or as a module (via import a inside another Python program).  The if __name__ ... construct is the way to get the best of both usages.  __name__ is the attribute that takes the string value '__main__' if the file was run as a script, or the name of the module (the source filename without .py) if it was imported as a module.

When a Python file is processed, its code is compiled and executed.  Executing a def ... creates an object that internally represents a function.  Lines like print ... are executed immediately.  This way, the if __name__ test prevents the block of code from being executed when a.py is used as a module.
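The two modes can be simulated in a single file. The following is only a sketch, assuming Python 3. The tiny stand-in module is hypothetical (it is not the a.py from this thread), and runpy.run_path from the standard library is used here merely to execute the same file with two different values of __name__:

```python
import os
import runpy
import tempfile

# A tiny stand-in module (hypothetical; not the a.py from this thread).
source = (
    "def greet():\n"
    "    return 'hello'\n"
    "\n"
    "ran_as_script = False\n"
    "if __name__ == '__main__':\n"
    "    ran_as_script = True\n"
)

with tempfile.NamedTemporaryFile('w', suffix='.py', delete=False) as tmp:
    tmp.write(source)
    module_path = tmp.name

# run_name sets the value of __name__ seen by the executed file.
as_script = runpy.run_path(module_path, run_name='__main__')
as_module = runpy.run_path(module_path, run_name='a')
os.remove(module_path)

print(as_script['ran_as_script'])
print(as_module['ran_as_script'])
```

In both runs the def is executed and greet exists; only the guarded block differs.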

If my previous example is stored as a.py, try the following b.py that uses a.py as a module:

b.py
import a

# Now you can iterate through the values of the chosen column.
f = open('output.txt', 'w')
for value in a.columnNonEmptyValues('ee.csv', 'TEST1'):
    f.write(value + '\n')
f.close()

# Or the more modern way...
with open('output2.txt', 'w') as f:
    for value in a.columnNonEmptyValues('ee.csv', 'TEST1'):
        f.write(value + '\n')


# Let's demonstrate the values of some module attributes.
print '__name__ is', __name__
print 'a.__name__ is', a.__name__
print
print '__file__ is', __file__
print 'a.__file__ is', a.__file__
print
print "Value of __name__ == '__main__' is", __name__ == '__main__'
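On the original 'wb' question: in Python 3, plain text results would normally be written in text mode ('w'), not binary mode. A minimal sketch, assuming Python 3, with made-up values standing in for the generator output and a temporary directory so that nothing is left behind:

```python
import os
import tempfile

# Hypothetical values standing in for the generator output.
values = ['-0.011027536', '-0.004086121']

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, 'myresults.txt')

    # Text mode ('w') is the usual choice for plain text in Python 3.
    with open(out_path, 'w') as out:
        for value in values:
            out.write(value + '\n')

    # Read the file back to check what was written.
    with open(out_path) as check:
        written = check.read().splitlines()

print(written)
```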

 
LVL 28

Expert Comment

by:pepr
ID: 37746465
If csv.DictReader(f) is not given the list of column names, it interprets the first line as the record with the column names.

If d is a dictionary, d['TEST1'] returns the value for the key 'TEST1', but it fails (raises KeyError) if the key is not in the dictionary.  d.get('TEST1', default) is the alternative that returns the same value if the key exists, or the given default if it does not.

Any object behaves as a boolean value in a so-called boolean context.  In other words, a boolean expression is expected after the if keyword; if the expression is not boolean, the object is interpreted as a boolean value.  For strings, lists, and other sequences or containers, an empty value is interpreted as False and a non-empty value as True.  This way, if the value was read as an empty string or set to an empty string by default, it is interpreted as False and is not yielded by the generator (i.e. it is ignored).
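Both behaviours are easy to check interactively. A short sketch with an invented row dictionary (the key 'TEST9' is deliberately absent):

```python
# Invented row, shaped like what csv.DictReader would produce.
row = {'Date': '2012-03-01', 'TEST1': ''}

# d[key] raises KeyError for a missing key; d.get(key, default) does not.
try:
    row['TEST9']
    missing_raises = False
except KeyError:
    missing_raises = True

fallback = row.get('TEST9', '')

# Empty containers are falsy, non-empty ones are truthy.
truthiness = [bool(''), bool('-0.011'), bool([]), bool([0]), bool({})]

print(missing_raises, repr(fallback), truthiness)
```

Note that a non-empty string such as '0' still counts as True, so a cell containing 0 would not be filtered out by the if value test.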
 

Author Comment

by:philsivyer
ID: 37746532
Thanks
Great response - helps a lot
 

Author Closing Comment

by:philsivyer
ID: 37746545
Many thanks
 
LVL 28

Expert Comment

by:pepr
ID: 37746775
You are welcome.  Have a good day. ;)

Question has a verified solution.