Hello everybody!
I'm a beginner in BioPython and I must write my first program for my Master's in Biology.
The goal is to parse a big FASTA file (containing more than 10,000 SeqRecords) and slice each sequence into overlapping pieces of 200 base pairs: the first piece from 0 to 200, then 50 to 250, and so on until the end of the sequence.
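To make the windowing concrete, here is a minimal sketch of the coordinates I have in mind. It assumes Python-style half-open positions (0 to 200 means letters 0 through 199) and that a trailing piece shorter than 200 is simply dropped; the name `window_bounds` and its parameters are made up for illustration.

```python
def window_bounds(length, size=200, step=50):
    """Yield (start, end) pairs for every full window of `size` letters.

    Assumption: a leftover piece shorter than `size` at the end is skipped.
    """
    for start in range(0, length - size + 1, step):
        yield start, start + size


# Works on anything sliceable, including a plain string.
sequence = "ACGT" * 75  # 300 letters
pieces = [sequence[start:end] for start, end in window_bounds(len(sequence))]
```

A sequence shorter than 200 letters yields no pieces at all under this assumption, so those would need separate handling if they matter.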
My current template is at the end of this post. It reads the sequences from one file and copies them to another (devoir_out.faa). The problems:
- The first sequence is missing
- I now want to replace each parent sequence with a list of subsequences split as described above (using, I guess, the built-in function slice)
From what I learned in the tutorials, a SeqRecord object in BioPython has three main elements: the sequence (seq), the id and the description.
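A rough sketch of what I imagine for the splitting step. As far as I understand, a SeqRecord can be cut with ordinary slice notation (`record[start:end]`), which returns a new SeqRecord that keeps the parent's id and description, so calling `slice()` explicitly should not be needed. The function names, the `id_start-end` naming scheme, and the choice to blank the description are my own assumptions, not something from the tutorial.

```python
def split_record(record, size=200, step=50):
    """Yield overlapping sub-records of `size` letters, one every `step` letters.

    `record` is assumed to behave like a Bio.SeqRecord.SeqRecord: it has a
    length, an `id`, a `description`, and slicing it returns a new record.
    """
    for start in range(0, len(record) - size + 1, step):
        end = start + size
        sub = record[start:end]  # plain slice notation on the record
        sub.id = "%s_%d-%d" % (record.id, start, end)  # hypothetical naming scheme
        sub.description = ""  # avoid repeating the parent header in the output
        yield sub


def split_fasta(in_path, out_path, size=200, step=50):
    """Write every window of every record to `out_path`; return how many were written."""
    from Bio import SeqIO  # imported here so split_record works without Biopython

    windows = (sub
               for record in SeqIO.parse(in_path, "fasta")
               for sub in split_record(record, size, step))
    return SeqIO.write(windows, out_path, "fasta")
```

Because both functions are lazy, only one parent record is held in memory at a time, which should matter for a file with more than 10,000 entries. A call would look like `split_fasta("CDS_Danio.txt", "devoir_out.faa")`.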
Thanks in advance!
# -*- coding: utf-8 -*-
# Import what we need
from Bio import SeqIO


def seq_splitter(iterator, size):
    # Body reconstructed from the surviving comments (assumption):
    # collect records into lists of at most `size` entries.
    batch = []
    for record in iterator:
        # Take the next sequence and add it to the batch
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # If the batch is not empty, yield it and wait for the next call
        yield batch


with open("/Users/nikedon/Documents/python/CDS_Danio.txt") as handle, \
        open("devoir_out.faa", "w") as out_handle:
    records = SeqIO.parse(handle, "fasta")
    for i, batch in enumerate(seq_splitter(records, 123)):
        # print("Found batch %i with %i records" % (i, len(batch)))
        # Write the batch, not `records`: the splitter has already pulled
        # records out of the parser, which is probably why one went missing.
        SeqIO.write(batch, out_handle, "fasta")