Questions tagged [fasta]

FASTA is a software package for sequence alignment of proteins and nucleic acids. FASTA is also the name of the file format used by these programs to represent sequences of peptides or nucleotides. The format is a de facto standard in bioinformatics.
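As a rough illustration of the file format described above, here is a minimal sketch of a reader in plain Python. It assumes the common convention that each record starts with a `>` header line followed by one or more sequence lines. `read_fasta` is a hypothetical helper name, not part of the FASTA package or of any library, and the sample data is made up.

```python
import io

def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    # Emit the final record once the input is exhausted.
    if header is not None:
        yield header, "".join(chunks)

# Made-up two-record example held in memory instead of a file.
example = io.StringIO(">seq1 demo\nACGT\nTTAA\n>seq2\nGGCC\n")
records = list(read_fasta(example))
```

The same function should work on a real file by passing an open text handle instead of the `io.StringIO` object.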

0
votes
1answer
9 views

Why is my regex not working to remove a section of a fasta header

I want to remove everything between the ">" and "Un_" in a header such as >NW_017859640.1 Esox lucius isolate CL-BC-CA-002 unplaced genomic scaffold, Eluc_V3 Un_scaffold1210. I've tried multiple ...
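One possible approach to the header trimming described above, sketched with Python's `re` module. It assumes the goal is to keep the leading `>` and everything from the first `Un_` onwards; `trim_header` is a hypothetical helper name, and the attempts cut off in the excerpt are not reproduced here.

```python
import re

header = (">NW_017859640.1 Esox lucius isolate CL-BC-CA-002 "
          "unplaced genomic scaffold, Eluc_V3 Un_scaffold1210")

def trim_header(line):
    # Lazily match from the leading ">" up to (not including) the first "Un_",
    # then put the ">" back. Lines without "Un_" are returned unchanged.
    return re.sub(r"^>.*?(?=Un_)", ">", line)

trimmed = trim_header(header)
```

The lookahead `(?=Un_)` keeps `Un_` in the output, and the lazy `.*?` stops at the first occurrence rather than the last.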
-1
votes
2answers
41 views

What am I doing wrong while running this code?

First off, I am in no way a programming expert, and am not well versed in Python, so forgive me if this is a stupid question. I am trying to run the code below to filter a fasta file down to only ...
0
votes
1answer
35 views

How to get the sequence counts (in fasta) with conditions using python?

I have a fasta file (fasta is a file in which header line starts with > followed by a sequence line corresponding to that header). I want to get the counts for sequences matching TRINITY and total ...
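A minimal sketch of the counting described above, in plain Python. The exact conditions are cut off in the excerpt, so this assumes the aim is to count header lines containing `TRINITY` alongside the total number of headers. `count_headers` and the sample records are hypothetical.

```python
import io

def count_headers(handle, keyword="TRINITY"):
    """Return (matching, total) counts of FASTA header lines."""
    total = matching = 0
    for line in handle:
        if line.startswith(">"):
            total += 1
            if keyword in line:
                matching += 1
    return matching, total

# Made-up records for illustration only.
demo = io.StringIO(
    ">TRINITY_DN1_c0_g1_i1\nACGT\n"
    ">other_1\nGGCC\n"
    ">TRINITY_DN2_c0_g1_i1\nTTAA\n"
)
matching, total = count_headers(demo)
```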
2
votes
7answers
72 views

Remove multiple sequences from fasta file

I have a text file of character sequences that consist of two lines: a header, and the sequence itself in the following line. The structure of the file is as follows: >header1 aaaaaaaaa >header2 ...
1
vote
0answers
49 views

SeqIO.parse Biopython - which file format should I specify?

I am trying to extract information from a multi-fasta file (e.g. C/G/A/T count, CG%) using biopython. I keep running into trouble when I try to iterate over the file for each fasta sequence - I can ...
0
votes
0answers
12 views

Random sampling 1/3 of genome .fasta

I have a genome of about 2 GB composed of scaffolds, and I would like to randomly sample it. I used reformat.sh, but the output was only one scaffold. I need 1/3 of the total genome... >LGKD01000001.1 ...
0
votes
3answers
44 views

Rename Parts of fasta-header according to .csv

I want to change parts of my fasta headers using a list built from parts of a .tsv file. I'm not a bioinformatician, just a microbiologist with beginner skills in bash and Python. Thanks for the help. Example: ...
2
votes
1answer
34 views

Dataframe creation from filename, contig identifier, and length of sequence

I am attempting to create a dataframe from fasta files which contain a header (the name of the contig) and a DNA sequence. In the first column of my dataframe I would like to have the name of the file,...
1
vote
1answer
31 views

Sort the order of fasta sequence using python

I have a fasta file (consists of >header and sequence lines) as below: myfasta >S.sclerotiorum_Ch16_153_209 AACCCTAACCCTAACCCTTGATTGATTGATTGATTGATTGAT TGATTGATGAAATTATAGTCTCCGTAAAGCAAATAAAGCATT ...
1
vote
2answers
40 views

Extract multiple columns and add null character in between

I have a file with the following format : TRINITY_DN119001_c0_g1_i1 4 * 0 0 * * 0 0 GAGCCTCCCTCATGAATGTACCAGCATTTACCTCATAAAGAGCT * XO:Z:NM TRINITY_DN119037_c0_g1_i1 4 * ...
0
votes
2answers
45 views

How to work with nested loop, looping through array elements?

Coming from an R background, I wanted to try a nested for-loop in Python. I am having trouble looping through each iteration of types in my code below. My code works for types[0], but not for successive ...
0
votes
1answer
20 views

From FASTA file, extract only entries with specified taxonomy

I would like to extract all the entries of a fasta file that are from human taxonomy and make those entries into a new smaller fasta file. I'm trying to use R, but I'm not sure how to do it. Two ...
4
votes
3answers
102 views

Using conditions to match multiple patterns within a line

I have a fasta file like this: myfasta.fasta >1_CDS AAAAATTTCTGGGCCCCGGGGG AAATTATTA >2_CDS TTAAAAATTTCTGGGCCCCGGGAAAAAA >3_CDS TTTGGGAATTAAACCCT >4_CDS TTTGGGAATTAAACCCT >5_rRNA ...
2
votes
3answers
83 views

Is there a way to replace all occurrences of certain characters but only on every nth line?

I am trying to replace all characters that are not C, T, A or G with an N in the sequence part of a fasta file, i.e. every 2nd line. I think some combination of awk and tr is what I would need... To ...
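The question asks about awk and tr; as an alternative, here is a hedged Python sketch of the same masking. Instead of counting every second line, it treats any line that does not start with `>` as sequence, which is an assumption about the file layout. It also assumes upper-case sequences, so lower-case bases would be masked too. `mask_non_acgt` and the sample data are hypothetical.

```python
import io
import re

def mask_non_acgt(handle):
    """Yield lines with non-ACGT characters replaced by N on sequence lines."""
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Header lines are passed through untouched.
            yield line
        else:
            # Anything other than upper-case A, C, G or T becomes N.
            yield re.sub(r"[^ACGT]", "N", line)

# Made-up records for illustration only.
demo = io.StringIO(">seq1\nACGTRYKM\n>seq2\nAC-GT\n")
masked = list(mask_non_acgt(demo))
```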
0
votes
0answers
21 views

What kinds of errors should a validator check for when validating biological file formats like GFF and FASTA?

I'm working on a project to create a library (in Java) that can validate various biological file formats like GFF, FASTA, OBO etc. But as I'm not from this field, I'm a little confused about what ...
1
vote
1answer
16 views

How to get the count of duplicated sequences in fasta file using python

I have a fasta file like this: test_fasta.fasta >XXKHH_1 AAAAATTTCTGGGCCCC >YYYXXKHH_1 TTAAAAATTTCTGGGCCCCGGGAAAAAA >TTDTT_11 TTTGGGAATTAAACCCT >ID_2SS TTTGGGAATTAAACCCT >YKHH_1 ...
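A small sketch of how duplicate counting could look with `collections.Counter`. It uses only the four complete records visible in the excerpt (the fifth is cut off and left out), and assumes the records have already been read into `(header, sequence)` pairs.

```python
from collections import Counter

# The four complete records from the excerpt above.
records = [
    ("XXKHH_1", "AAAAATTTCTGGGCCCC"),
    ("YYYXXKHH_1", "TTAAAAATTTCTGGGCCCCGGGAAAAAA"),
    ("TTDTT_11", "TTTGGGAATTAAACCCT"),
    ("ID_2SS", "TTTGGGAATTAAACCCT"),
]

# Count how often each sequence string occurs, ignoring the headers.
counts = Counter(seq for _, seq in records)

# Keep only sequences that appear more than once.
duplicates = {seq: n for seq, n in counts.items() if n > 1}
```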
0
votes
0answers
39 views

How to search for matching fasta sequences in multifasta files and append output in another file?

I have three fasta files. File1 org_seqs.fasta >OAJ152.7_org_name ...
0
votes
2answers
30 views

Substring multifasta file using python

I am trying to extract sequences from a multifasta file from position 2 to 8 (seeds of microRNAs). To do this I have written a small python script. The script works but I couldn't write an output file....
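A hedged sketch of the slicing and output step mentioned above. It assumes positions are 1-based and inclusive, so positions 2 to 8 correspond to the Python slice `seq[1:8]`. `write_seeds` and the record name are hypothetical, and an in-memory buffer stands in for the output file.

```python
import io

def write_seeds(records, out_handle, start=2, end=8):
    """Write the start..end (1-based, inclusive) slice of each sequence as FASTA."""
    for header, seq in records:
        seed = seq[start - 1:end]
        out_handle.write(f">{header}\n{seed}\n")

# Made-up record; the sequence is only an example input.
records = [("mir_demo_1", "UGAGGUAGUAGGUUGUAUAGUU")]
out = io.StringIO()
write_seeds(records, out)
result = out.getvalue()
```

To write to disk instead, the same function could be given a handle from `open(..., "w")` inside a `with` block, so that the file is closed and flushed when the block ends.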
0
votes
1answer
55 views

Concatenate two fasta files in Python

I have two data files (FASTA) and each file represents one gene and the sequences are identified by species and local. I would like to concatenate these files into one as the example: psbki.fas: >...
1
vote
3answers
84 views

Convert sequence list to fasta for multiple files

I have thousands of files, which are a list of sequence names followed by their sequence, one individual per line, something like this: L.abdalai.LJAMM.14363.SanMartindeLosAndes ...
1
vote
1answer
34 views

create and save fasta file from stringset [closed]

I have this DNA stringset, but I want to create a new file.fa containing this information. What is an efficient way to save these? I've tried to use write.fasta but it crashed. genes_seq <- A ...
1
vote
1answer
62 views

How to print the first few records using SeqIO from Biopython

I have a fasta file that has several hundred records but I'm trying to return a table with just the first 20 records (record description, AA length, and name). My code is not working and I would ...
0
votes
0answers
21 views

How to read a FASTA file and insert the sequence in another function calling a class?

I have an assignment where I have to write a python class to represent and manipulate biological sequences. I have almost finished the class, however I need to import sequences from FASTA files input ...
0
votes
1answer
34 views

grep: invalid repetition count(s) when using while loop [duplicate]

So I'm using a MacOS commandline and have two files File A.txt A B F File B.txt >A abcde >B efghi >C jklmn >D opqrs >E tuvwx >F yz123 I want it to go through a while loop ...
1
vote
1answer
38 views

Parse a fasta file using PHP

I have a fasta file input.fa which looks like this: >KJH325_Org_name_strain ANNTTHWQLPMCVREEDFSC >IJA254.1_Org_name HITYYPQLKSSCMART >ASDL658_Org_name_str TTILPQWYERSAASMNCFGHDKLCC and so on....
3
votes
1answer
66 views

Directly calling SeqIO.parse() in for loop works, but using it separately beforehand doesn't? Why?

In python this code, where I directly call the function SeqIO.parse() , runs fine: from Bio import SeqIO a = SeqIO.parse("a.fasta", "fasta") records = list(a) for asq in SeqIO.parse("a.fasta", "...
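One common cause of behaviour like this is that an iterator can only be consumed once. Whether that applies here depends on the code cut off above. The sketch below shows the effect with a plain Python generator; `fake_parse` is a hypothetical stand-in for a parser that returns an iterator, not Biopython's API.

```python
def fake_parse(names):
    """Stand-in for a parser that yields records one at a time."""
    for name in names:
        yield name

a = fake_parse(["rec1", "rec2"])
first_pass = list(a)    # consumes every item from the iterator
second_pass = list(a)   # the same iterator is now exhausted

# Calling the function again creates a fresh iterator.
fresh = list(fake_parse(["rec1", "rec2"]))
```

If the records are needed more than once, looping over the stored list rather than the original iterator avoids the problem.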
1
vote
6answers
105 views

Script using sed and grep gives unintended output

I have a "source.fasta" file that contains information in the following format: >TRINITY_DN80_c0_g1_i1 len=723 path=[700:0-350 1417:351-368 1045:369-722] [-1, 700, 1417, 1045, -2] ...
1
vote
4answers
131 views

How can I count the frequency of letters?

I have data like this >sp|Q96A73|P33MX_HUMAN Putative monooxygenase p33MONOX OS=Homo sapiens OX=9606 GN=KIAA1191 PE=1 SV=1 ...
1
vote
1answer
28 views

Splitting header and content using regex

I have the following sequence text in a file. Every header starts with ">", and the number of content lines is random. >XM_024446048.1 PREDICTED: Homo sapiens mannosidase alpha class 2A member 1 (MAN2A1), ...
-2
votes
2answers
20 views

Multifasta header trimming

I have a multifasta file and I need to delete some part of the header for every fasta file. For example: >Viridibacillus_arenosi_FSL_R5_0213-BK137_RS04360-22-CBS_domain-containing_protein <...
2
votes
2answers
50 views

Extracting gene sequences from FASTA File?

I have the following code that reads a FASTA file with 10 gene sequences and returns each sequence as a matrix. However, the code seems to be missing the very last sequence, and I wonder why. file=...
2
votes
1answer
69 views

Replacing all of instances of a letter in a column of a FASTA alignment file

I am writing a script which can replace all of the instances of an amino acid residue in a column of a FASTA alignment file. Using AlignIO, I can only read an alignment file and extract information ...
0
votes
1answer
25 views

How to select genes from FASTA file based on names list in CSV format?

I am looking for an R solution to extract multiple sequences from a FASTA file based on a match to a list of header IDs in a separate file (.csv). I am new to R and am trying to find a way to: Take ...
0
votes
0answers
36 views

How to extract certain lines from a fasta file into a vector

I have a fasta file from which I'm supposed to generate kmers. I'm trying to put the characters of every second string after a ">" symbol into a vector. For example, if the fasta file said >...
-1
votes
2answers
34 views

Perl with FASTA sequence extraction has problems (only) with first sequence

I am using a function/subroutine extract_seq available on the internet to extract sequences in FASTA files. Briefly: A sequence begins with a first line identified by '>', followed by ID and other ...
0
votes
2answers
38 views

delete a pattern within a file

I have a fasta file containing thousands of sequences. It appears in this format >3276_2258569 M05025:154:000000000-BVP4M:1:1101:17272:1161 1:N:0:TGGTGG orig_bc=TGCGA new_bc=TGCGA ...
0
votes
1answer
91 views

Rename file using fasta header

I have multiple fasta files downloaded from NCBI and want to rename them with some part of the header: Example of the header: >KY705281.1 Streptococcus phage P7955, complete genome Example of ...
0
votes
1answer
72 views

Automatically rename fasta files with the ID of the first sequence in each file

I have multiple fasta files, each with a single sequence, in the same directory. I want to rename each fasta file with the header of the single sequence present in it. When I run my code, I obtain ...
0
votes
3answers
50 views

How can I use awk (first field of each file) on multiple files and get the result for each input file

I've tried ls *.fasta | parallel --gnu "awk '{print $1}' > {/.}.outputfile.txt" and it's not producing the result I need. I have 48 files where I need to extract these fields and output them to 48 ...
1
vote
1answer
26 views

Replace ids from file1 with that of file2

I have two text files and I want to replace id from file1 with that of file2. All the ids are in the same order in both the files. File1 >12_abc ghfghfjgfhjgfjf hgfjfgjgfjfgjgfjf >13_def ...
0
votes
3answers
79 views

How can I edit my Python script so it can select the whole text of a fasta sequence?

I have 2 files: one is a text file that contains a series of IDs, and the other is a multifasta file that contains fasta sequences corresponding to the IDs in the first file. I have a Python script ...
-1
votes
1answer
41 views

How to edit a header in a fasta sequence by cutting some parts of it and keeping the main text of the sequence using a linux command line?

I have a multi fasta file named fasta1.fasta that contains the sequences and their IDs. What I want is to cut the header of each sequence that has the ID and reduce it to contain the ID accession ...
1
vote
2answers
45 views

How can I get this output from FASTA file without using Biopython?

I need to obtain the output shown below from a FASTA file, but without using BioPython. Does anyone have an idea? This is the code using BioPython: from Bio import SeqIO records = SeqIO.parse("data/...
0
votes
3answers
76 views

Read nucleotides in FASTA without using BioPython

I need to obtain the same output obtained with the following code, but without using BioPython. I'm stuck... Could anyone help me? Thanks!!! from Bio import SeqIO records = SeqIO.parse("data/...
0
votes
3answers
58 views

How to concatenate fasta files with identical names into one file with different headers?

My problem is more on how to rename the header line for each fasta sequence, as I know how to concatenate a bunch of fasta files into one file. The problem is, after generating my files each file has ...
0
votes
2answers
69 views

motif finder in a text file using python

I have a big text file like this example: >chr9:128683-128744 GGATTTCTTCTTAGTTTGGATCCATTGCTGGTGAGCTAGTGGGATTTTTTGGGGGGTGTTA >chr16:134222-134283 ...
-1
votes
1answer
52 views

How to fix "'generator' object is not subscriptable" error when reading fasta file with BioPython

I am trying to open and read a fasta file and use only the first line from the input. Currently, I'm calling the first line and appending it to a list to use in a later function. However, I'm ...
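The error named in the title arises whenever a generator is indexed with `[0]`. The sketch below reproduces it with a plain Python generator and shows two standard ways to take the first item or items. `fake_parse` is a hypothetical stand-in, not Biopython's API, and the asker's own code is cut off above.

```python
from itertools import islice

def fake_parse(names):
    """Stand-in for a parser that returns a generator of records."""
    yield from names

# Indexing a generator raises TypeError.
try:
    fake_parse(["rec1", "rec2"])[0]
    error = None
except TypeError as exc:
    error = str(exc)

# next() pulls just the first item without reading the rest.
first = next(fake_parse(["rec1", "rec2"]))

# islice() takes the first n items; list() makes them indexable.
first_two = list(islice(fake_parse(["rec1", "rec2", "rec3"]), 2))
```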
0
votes
0answers
47 views

Translating multiple RNA sequences from a FASTA file into proteins in BioPython

I need to translate multiple unambiguous RNA sequences present in one FASTA file in Biopython. How can I put the data from all RNA sequences in a single piece of code to translate them to proteins?
0
votes
1answer
38 views

Drawing multiple sequences from 1 file, based on shared fields in another file

I'm trying to run a Python script to draw sequences from a separate file (merged.fas), with respect to a list (gene_fams_eggnog.txt) I have as output from another program. The code is as follows: from ...
2
votes
1answer
158 views

How to rename headers in many multi-fasta files with the name of a file?

I have a directory with several hundred multi-FASTA files. These files are named after the species or genus, such as: Bubo_bubo.fasta Poa_CC7849.fasta Homo_sapiens.fasta ... Inside each ...