Solved

Python - how to delete an item from a nested list, file input

Posted on 2008-06-20
Last Modified: 2011-10-03
Python's readlines() method gives me a list, output, read from a file, and I want to delete any line from output that has fewer than the required number of comma-separated fields (14). I can print the index values of the lines concerned, but I can't delete the lines themselves: the objects are either unsubscriptable or don't support item deletion. Help!

output = src.readlines()
 
index = 0
for index, lines in enumerate(output):
   lines = lines.split(',', 13)
   if (len(lines) < 14):
       print index  # prints correct index value 
       del index     # or output2[index] etc, fails
   else:
       index +=1

Question by:sara_bellum
13 Comments
 
LVL 9

Expert Comment

by:ghostdog74
ID: 21836357
Assign the lines you want to keep to a new list. Here's a list comprehension that does it:
a = [ l for n,l in enumerate(open("file")) if len(l.split(",")) == 14 ]
print a   

 

Author Comment

by:sara_bellum
ID: 21836670
I have added your code to my script but still can't delete the lines that are missing list elements.  I tried several options but my attempts may confuse you, so I simply added some comments to demonstrate what I am trying to do.
file = open('badger_start.dat')
list_all = file.readlines()
 
# check to see if I'm picking up the right errors
list_errors = [ l for n, l in enumerate(file) if len(l.split(",")) < 14 ]
print list_errors # this is hard to read but I know how to fix that
 
for index, line in enumerate(list_all):
#   if the index value matches a line with an error 
#   delete the line
   else:
       index += 1 #the index should only increment for valid lines
 
# check your result
print list_all # hard to read but I know how to fix that
file.close()
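
One detail worth checking in the snippet above: readlines() reads the file to the end, so iterating over the same file object a second time yields no lines, and list_errors would most likely come out empty. Below is a minimal sketch of the effect, written in Python 3 syntax. It uses an in-memory io.StringIO object as a stand-in for badger_start.dat, and the sample rows are made up for illustration.

```python
import io

# Hypothetical stand-in for the data file: one complete row, one short row.
sample = ",".join(["x"] * 14) + "\n" + "a,b,c\n"
src = io.StringIO(sample)

list_all = src.readlines()          # consumes the whole file
leftover = [line for line in src]   # nothing is left to iterate over

# Build the error list from the lines already read instead.
list_errors = [line for line in list_all if len(line.split(",")) < 14]

print(len(list_all), leftover, list_errors)
```

Building list_errors from list_all rather than from the file object sidesteps the problem without reopening the file.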

 
LVL 9

Expert Comment

by:ghostdog74
ID: 21836790
Can you post a sample of your badger_start.dat file, and then describe what you actually want to see as output? It's much easier this way.

 
LVL 15

Expert Comment

by:efn
ID: 21836925
It's not going to work well to delete from a list while you are in the middle of iterating through it.

It is possible to fix your example to work with the technique ghostdog74 suggested.  The idea is that instead of trying to delete the elements you don't want from the list, you construct a new list with only the elements you do want.  In your code, you have all the input lines in the list_all list.  In this case, you don't care about the indexes, so there is no need to use the enumerate function.  You can just write an expression to select the lines you want from the list_all list:

wanted = [ line for line in list_all if len(line.split(",")) >= 14 ]

If you want to construct a list of the error lines as in your example, you can just change the condition tested.

Another way to do it is to iterate over a copy of the list, removing elements from the original list.  list_all[:] makes a copy of the list and you can delete from the original list by value.  This is longer, but perhaps easier to read.

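for line in list_all[:]:               # iterate over a copy of the list
    if len(line.split(",")) < 14:      # line has fewer than 14 fields
        list_all.remove(line)          # remove it from the original, by value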
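
The two approaches above can be compared side by side. This is only a sketch in Python 3 syntax: list_all is filled with made-up rows, and REQUIRED_FIELDS is a name introduced here for the 14-field requirement from the question.

```python
REQUIRED_FIELDS = 14  # assumed from the question: a valid line has 14 fields

# Made-up sample data: two complete rows and one short row.
good_1 = ",".join(str(i) for i in range(14)) + "\n"
good_2 = ",".join("f%d" % i for i in range(14)) + "\n"
short = "1,2,3\n"
list_all = [good_1, short, good_2]

# Approach 1: build new lists and leave the original untouched.
wanted = [line for line in list_all if len(line.split(",")) >= REQUIRED_FIELDS]
errors = [line for line in list_all if len(line.split(",")) < REQUIRED_FIELDS]

# Approach 2: iterate over a copy and remove from the working list by value.
in_place = list(list_all)
for line in in_place[:]:
    if len(line.split(",")) < REQUIRED_FIELDS:
        in_place.remove(line)

print(wanted == in_place)
```

Note that remove deletes the first element equal to the given value and has to search the list on every call, so for large files the list comprehension is probably the cheaper option.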
 
LVL 15

Expert Comment

by:mish33
ID: 21837742
A) As was said, do NOT modify a list while you are iterating over it.
B) enumerate does the += 1 for you.
C) Be careful with variable names.
output = src.readlines()
valid = []
for index, line in enumerate(output):
   fields = line.split(',')
   if len(fields) < 14:
       print index  # prints correct index value 
   else:
       valid.append(line)
# use valid lines

 
LVL 15

Expert Comment

by:efn
ID: 21838051
> B) enumerate does +=1 for you

Nitpick:  actually, "for ... in ..." does the +=1 for you.  But mish33 did show another way that will work.
 
LVL 15

Expert Comment

by:mish33
ID: 21838404
efn: I was referring to index += 1 in the OP code
 
LVL 15

Expert Comment

by:efn
ID: 21838542
So was I.  The idea of the "index += 1" statement was apparently to iterate through the list;  I think we agree that this statement is unnecessary.  The "for ... in ..." statement is what iterates through the various lists in all of the code on this page.  The calls to enumerate are really not needed at all, except for debugging displays of index values.  The enumerate function just pairs each item with its index as the loop asks for it; it doesn't add anything to anything.  But this is not really an important point.
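
To make the roles of "for ... in ..." and enumerate concrete, here is a small sketch in Python 3 syntax with made-up sample lines. It shows that enumerate only supplies (index, item) pairs, while the for statement does the stepping.

```python
lines = ["alpha\n", "beta\n", "gamma\n"]  # made-up sample lines

# enumerate() pairs each item with a running index; it does not modify the list.
pairs = list(enumerate(lines))

# The for statement advances through the pairs; no manual "index += 1" is needed.
seen = []
for index, line in enumerate(lines):
    seen.append(index)

print(pairs)
print(seen)
```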
 
LVL 15

Expert Comment

by:mish33
ID: 21838615
enumerate is needed to print the indexes of the non-compliant lines,
but having both enumerate and manual index counting is, um... unnecessary.
 

Author Comment

by:sara_bellum
ID: 21838933
Thanks very much!  efn and mish33's solutions do work for me, but I'm trying to start at an index value of 4 to exclude the header information from the sample data (the header has a different format).  I tried inserting index = 4 before the for loop(s) but that fails. I had assumed, apparently incorrectly, that I could start a for loop at any index value in the list.   Let me know if there's a simple answer to this, or if I need to start a new question, thanks.
 
LVL 15

Accepted Solution

by:
efn earned 500 total points
ID: 21838999
If you use the second approach I suggested, where you make a copy of the list, you can make a copy of everything in the list starting from index 4 if you use list_all[4:] instead of list_all[:].  In this approach, you are removing list elements by value, so it won't matter that the indexes in the list being checked and the list being changed are not the same.

There are, of course, other solutions.
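
A sketch of the slicing idea, in Python 3 syntax with made-up data; HEADER_LINES and REQUIRED_FIELDS are names introduced here for illustration. One caveat, which depends on what the data looks like: remove deletes the first element equal to the given value, so if a short data line happened to be identical to a header line, the header copy would be the one removed. The slice-assignment variant shown second avoids that.

```python
HEADER_LINES = 4      # assumed: the first four lines are header information
REQUIRED_FIELDS = 14  # assumed from the question: a valid line has 14 fields

# Made-up sample: four header lines followed by three data lines.
header = ["station\n", "units\n", "notes\n", "\n"]
good = ",".join(["1"] * 14) + "\n"
short = "1,2,3\n"
list_all = header + [good, short, good]

# Variant 1: check a copy of the data portion, remove by value from the list.
by_value = list(list_all)
for line in by_value[HEADER_LINES:]:
    if len(line.split(",")) < REQUIRED_FIELDS:
        by_value.remove(line)

# Variant 2: replace only the data portion, using slice assignment.
by_slice = list(list_all)
by_slice[HEADER_LINES:] = [
    line for line in by_slice[HEADER_LINES:]
    if len(line.split(",")) >= REQUIRED_FIELDS
]

print(by_value == by_slice)
```

Both variants leave the header lines in place and drop only the short data line.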
 
LVL 15

Expert Comment

by:mish33
ID: 21839166
This approach keeps the index (line number) of the printed lines correct:
output = src.readlines()
valid = []
for index, line in enumerate(output):
   if index < 4: continue  # skip first 4 lines
   fields = line.split(',')
   if len(fields) < 14:
       print index  # prints correct index value 
   else:
       valid.append(line)
# use valid lines
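
An alternative sketch of the same idea, in Python 3 syntax with made-up data: slice off the header and tell enumerate to start counting at 4, so the reported indexes still match positions in the full list. The second argument to enumerate is available from Python 2.6 on; HEADER_LINES, REQUIRED_FIELDS and bad_indexes are names introduced here.

```python
HEADER_LINES = 4      # assumed: the first four lines are header information
REQUIRED_FIELDS = 14  # assumed from the question: a valid line has 14 fields

# Made-up sample: four header lines, then good, short, good data lines.
output = ["h1\n", "h2\n", "h3\n", "h4\n",
          ",".join(["1"] * 14) + "\n",
          "1,2,3\n",
          ",".join(["2"] * 14) + "\n"]

valid = []
bad_indexes = []
# Start the counter at HEADER_LINES so index matches the position in output.
for index, line in enumerate(output[HEADER_LINES:], HEADER_LINES):
    if len(line.split(",")) < REQUIRED_FIELDS:
        bad_indexes.append(index)
    else:
        valid.append(line)

print(bad_indexes)
```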

 

Author Closing Comment

by:sara_bellum
ID: 31469370
For reasons which pass my understanding, mish33's solution didn't work the second and later times I tried it, so I gave all the points to efn. Thanks very much!
Question has a verified solution.