Solved

Sequence splitter in BioPython

Posted on 2009-12-29
Last Modified: 2012-05-08
Hello everybody!

I'm a beginner with Biopython and I have to write my first program for my Master's in Biology.

The goal is to parse a big FASTA file (containing more than 10,000 SeqRecords) and slice each sequence into overlapping pieces of 200 base pairs: the first piece from 0 to 200, the next from 50 to 250, and so on until the end of the sequence.

Here is my current template:

It reads the sequences from one file and copies them to another (devoir_out). The problems:
- The first sequence is missing from the output
- I now want to replace each parent sequence with a list of subsequences split as described above (using, I guess, the built-in slice function)

From what I learned in the tutorials, a SeqRecord object in Biopython has three main elements: the sequence (seq), the id and the description.

Thanks in advance!

# -*- coding: utf-8 -*-

from Bio.SeqRecord import SeqRecord 
from Bio import SeqIO
# import what we need

def seq_splitter(iterator, size):
    entry= True
    while entry:
        batch = []
        entry=iterator.next()
        batch.append(entry)
        if batch:  # if batch is not empty, yield it and wait for the next call
            yield batch

        # takes the next sequence, adds it to batch and returns batch

handle = open("/Users/nikedon/Documents/python/CDS_Danio.txt") 

records = SeqIO.parse(handle, "fasta") 


out_handle = open("devoir_out.faa", "w") 

for i, item in enumerate(seq_splitter(records, 123)):
    #print "Found %i" %(i)
    #print "There is %i characters in the sequence" %(len(item[0].seq))
    SeqIO.write(records, out_handle, "fasta") 


out_handle.close() 

handle.close()
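
One plausible reading of why the first sequence goes missing in the template above: seq_splitter pulls one record out of the parser, and the loop body then passes the parser itself (records) to SeqIO.write, which writes everything that is left. The sketch below reproduces that pattern with a plain Python iterator standing in for SeqIO.parse; the names one_at_a_time, written and fixed are hypothetical and only serve the illustration.

```python
def one_at_a_time(iterator):
    """Mimic seq_splitter: pull one item per step and yield it in a list."""
    for entry in iterator:
        yield [entry]


# Stand-in for SeqIO.parse(handle, "fasta"): an iterator that can only be read once.
records = iter(["seq1", "seq2", "seq3"])
written = []
for batch in one_at_a_time(records):
    # Same pattern as the template: the source iterator is written, not the batch.
    # "seq1" has already been pulled into batch, so only the rest is left.
    written.extend(records)

# Writing the batch instead keeps every item.
records = iter(["seq1", "seq2", "seq3"])
fixed = []
for batch in one_at_a_time(records):
    fixed.extend(batch)

print(written)  # ['seq2', 'seq3']
print(fixed)    # ['seq1', 'seq2', 'seq3']
```

Under that reading, writing the batch (or the record itself) rather than records would keep the first sequence.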

Question by:Nikedon

Accepted Solution

by: Nikedon
ID: 26169340
Well... I found a solution (not without help), but I'm sure there are other ways to do it!

Here it is:
# -*- coding: utf-8 -*-

#############################################################################
# CREATE OVERLAPPING CONTIGS FROM A FASTA FILE
#############################################################################

# import the modules we need
from Bio.SeqRecord import SeqRecord 
from Bio import SeqIO
    

# a file object lets us work with a file, here opened for reading
handle = open("/Users/Nikedon/Documents/python/CDS_Danio.txt")

# create an iterator over the SeqRecord objects in the file
records = SeqIO.parse(handle, "fasta") 

out_handle = open("devoir_out.faa", "w") 

# the records object can be iterated over because it provides next(),
# which keeps returning records until there are none left...
# (it does not know in advance how many there are in total!)

for i, item in enumerate(records):
    #enumerate returns a number AND the object in the list (a sequence or an iterator)

    pos=0
    # always-true loop! It ends with the break below.
    while True:
        tranche=item[pos:pos+200]
        tranche.id= tranche.id + ' %i - %i' %(pos,pos+len(tranche.seq))
        tranche.description='CDS de Danio Rerio, contigs de 200 pb de 50 en 50'
        tranche.name=''
        # the id uses pos+len(tranche.seq) rather than pos+200 because the slice is not necessarily 200 bp long
        SeqIO.write([tranche], out_handle, 'fasta')
        if len(item.seq)-pos <= 200:
            break
        pos=pos+50

    
out_handle.close() 

handle.close()
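
To check which windows the loop above produces, the start and end positions can be computed on their own, without Biopython. This is a minimal sketch: window_bounds is a hypothetical helper, not part of the posted solution, and it follows the same stopping rule (stop once the remaining sequence fits in one window).

```python
def window_bounds(length, size=200, step=50):
    """Yield (start, end) pairs for overlapping windows over a sequence.

    Mirrors the while loop of the solution: the last window is cut short
    at the end of the sequence instead of being padded.
    """
    pos = 0
    while True:
        yield pos, min(pos + size, length)
        if length - pos <= size:
            break
        pos += step


print(list(window_bounds(320)))
# [(0, 200), (50, 250), (100, 300), (150, 320)]
print(list(window_bounds(120)))
# [(0, 120)]
```

Each pair could then be used as item[start:end] on a SeqRecord, as the solution does with item[pos:pos+200].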


Expert Comment

by:harfang
ID: 26174005
Well done! -- (^v°)
