Bioinfo Data BloQ codeQuest

A. Treatment of Sequences

No. 1

#Extract FASTA Sequences from another file using sequence IDs in another file

#Write two separate scripts with file names FastaToTbl and TblToFasta with the following content:

FastaToTbl:

#!/usr/bin/awk -f
{
if (substr($1,1,1)==">")
if (NR>1)
printf "\n%s\t", substr($0,2,length($0)-1)
else
printf "%s\t", substr($0,2,length($0)-1)
else
printf "%s", $0
}END{printf "\n"}

TblToFasta:

#! /usr/bin/awk -f

{
sequence=$NF
ls = length(sequence)
is = 1
fld = 1
while (fld < NF)
{
if (fld == 1){printf ">"}
printf "%s " , $fld
if (fld == NF-1)
{
printf "\n"
}
fld = fld+1
}
while (is <= ls)
{
printf "%s\n", substr(sequence,is,60)
is=is+60
}
}

#Further explainations check the reference below.

Ref: https://askubuntu.com/questions/832478/how-to-compare-2-files-and-extract-the-particular-sequence-using-id-in-first-fil

No. 2.

#Convert Multi FASTA line to a single FASTA line

$ cat input_file.fasta | awk '{if (substr($0,1,1)==">"){if (p){print "\n";} print $0} else printf("%s",$0);p++;}END{print "\n"}' > singleline.fasta

OR used awk program:

% awk '!/^>/ { printf "%s", $0; n = "\n" }

/^>/ { print n $0; n = "" }

END { printf "%s", n }

' input_file.fasta > singleline.fasta

Ref: https://stackoverflow.com/questions/15857088/remove-line-breaks-in-a-fasta-file

No. 3

# Randommly select sets of FASTA sequences Using python script:

from Bio import SeqIO

from random import sample

v = open(r"/Users/User pc/Desktop/pythonEx/viralRandom.txt", "w")
with open(r"/Users/User pc/Desktop/pythonEx/viralNinput.txt") as f:
seqs = SeqIO.parse(f, "fasta")
samps = ((seq.name, seq.seq) for seq in sample(list(seqs),10000))
for samp in samps:
v.write(''.join(">{}\n{}".format(*samp)+'\n'))
v.write("\n")
v.write("\n\n")

Bioinfo Data BloQ codeQuest

Comments

Post a Comment

Popular posts from this blog

WELCOME TO BIOINFO DATA BLOQ CODEQUEST