[apollo] Names of genes/transcripts changed after bulk upload

Discussion:

Robin A. Ohm

2016-02-12 16:36:36 UTC

Hello,

I'm trying to bulk upload genes from a GFF file into the user annotation
track using the add_transcripts_from_gff3_to_annotations.pl script.
I notice that the names of the genes and transcripts are changed after
uploading. For example, after I upload the GFF3 file below, the
resulting gene name is "SA|3829a" and the transcript name is
"SA|3829a-00001". I would prefer to use the original names in the GFF3
file. Is that possible?

Thanks, best regards, Robin

##gff-version 3
##sequence-region scaffold1 1 232163
scaffold1 FGDB gene 752 2086 . - . ID=SA|3829|gene
scaffold1 FGDB mRNA 752 2086 . - . ID=SA|3829;Parent=SA|3829|gene;proteinId=SA|3829;Name=SA|3829
scaffold1 FGDB exon 752 2086 . - . ID=SA|3829|exon1;Parent=SA|3829
scaffold1 FGDB CDS 752 2086 . - 0 ID=SA|3829|CDS;Parent=SA|3829

Monica Munoz-Torres

2016-02-12 21:10:50 UTC

Hi Robin,

The short answer is that this is configurable.

The reason for the current behavior is that Apollo assumes that each
genomic element in the User-created Annotations area ('annotation
track') is a transcript of a given gene model (in the case of coding
genes). Starting in v2.0.x, Apollo automatically assigns a [configurable]
number to the first transcript; additional transcripts of the same gene
keep the root name and receive an incremented number as you create
more.

In this case, the gene model in the evidence track would be "SA|3829a" and
the first transcript in the User-created Annotations area is "SA|3829a-00001".
If there is more than one splice form for this gene SA|3829a, the next
one will be labeled SA|3829a-00002, etc. You can customize the suffix to be
-RA, -RB (per FlyBase naming conventions, also followed by NCBI).
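
The naming scheme described above can be sketched roughly as follows. This is only an illustration and not Apollo's actual implementation: the function names `numbered_isoform_name` and `lettered_isoform_name` are hypothetical, and the five-digit zero padding is an assumption inferred from the example names in this thread.

```python
import string


def numbered_isoform_name(gene_name: str, index: int, width: int = 5) -> str:
    """Return a name such as 'SA|3829a-00001' (index starts at 1).

    The width of 5 is an assumption taken from the names quoted in this thread.
    """
    if index < 1:
        raise ValueError("index must be >= 1")
    return f"{gene_name}-{index:0{width}d}"


def lettered_isoform_name(gene_name: str, index: int) -> str:
    """Return a FlyBase-style name such as 'GENE-RA' or 'GENE-RB' (index starts at 1).

    This sketch stops at 26 isoforms; how a real system continues past 'Z'
    is not covered here.
    """
    if not 1 <= index <= 26:
        raise ValueError("this sketch only handles 1..26 isoforms")
    return f"{gene_name}-R{string.ascii_uppercase[index - 1]}"


if __name__ == "__main__":
    for i in (1, 2):
        print(numbered_isoform_name("SA|3829a", i))
    print(lettered_isoform_name("SA|3829a", 1))
```

Both helpers keep the gene's root name and vary only the suffix, which is what lets isoforms of the same gene be told apart.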

The goal of this feature is to assist curators in appropriately naming and
keeping track of isoforms. Otherwise, they could end up, for example, with
three identically labeled isoforms, despite the fact that they represent
different transcripts of the same gene (in the evidence track), as was the
case in v1.0.x:

[image: Inline image 2]

In the case you describe, if you are sure that there will be no conflicts
because your users will not encounter more than one splice form of their
gene of interest, I am quite sure you can customize your configuration to
omit this count.

- I'll let Nathan share with you where this lives in the code.

I would also like to learn more about your use case. How are you
loading those gene models into your User-created Annotations area
directly from the GFF3 file? Are you doing this for all scaffolds, or
only for a few gene models at a time?

cheers,
~moni.

Post by Robin A. Ohm
--
Robin A. Ohm, PhD | Assistant Professor | Microbiology | Utrecht University
Kruyt Building | Room W402 | Padualaan 8 | 3584 CH | Utrecht | The
Netherlands | +31 (0) 30 2533016
This list is for the Apollo Annotation Editing Tool. Info at
http://genomearchitect.org/
If you wish to unsubscribe from the Apollo List: 1. From the address with
2. In the subject line of your email type: unsubscribe apollo | 3. Leave
the message body blank.

--
Mentorship Matters!
--
Monica Munoz-Torres, PhD.
Berkeley Bioinformatics Open-source Projects (BBOP)
Environmental Genomics and Systems Biology Division
Lawrence Berkeley National Laboratory

Mailing Address:
Lawrence Berkeley National Laboratory
1 Cyclotron Road Mailstop 977
Berkeley, CA 94720

Nathan Dunn

2016-02-15 19:48:36 UTC

Robin,

The short answer is that it **should** work by default, just as you described, but I am getting the same results you are.

I think I know what's wrong and I'll try to get a fix soon.

tools/data/add_transcripts_from_gff3_to_annotations.pl \
-U localhost:8080/apollo -u "***@lbl.gov" -p "password" -o "Honeybee" \
-i Annotations56.gff3 -t mRNA -d CDS -g gene -e exon

This command should properly import a GFF3 file of the type Apollo already exports, e.g.:

Group1.10 . gene 1290368 1293149 . + . Name=GB40862-RA;date_creation=2016-02-04;owner=***@me.com;ID=70cfdae4-1950-41f1-b56c-bc2f105624b4;date_last_modified=2016-02-04
Group1.10 . mRNA 1290368 1293149 . + . Name=GB40862-RA-00001;date_creation=2016-02-04;Parent=70cfdae4-1950-41f1-b56c-bc2f105624b4;owner=***@me.com;ID=90b0c2fe-45dd-47c8-b27d-b9929e098895;date_last_modified=2016-02-04
Group1.10 . non_canonical_three_prime_splice_site 1291824 1291824 . + . Name=90b0c2fe-45dd-47c8-b27d-b9929e098895-non_canonical_three_prive_splice_site-1291823;Parent=90b0c2fe-45dd-47c8-b27d-b9929e098895;ID=90b0c2fe-45dd-47c8-b27d-b9929e098895-non_canonical_three_prive_splice_site-1291823
Group1.10 . non_canonical_three_prime_splice_site 1292399 1292399 . + . Name=90b0c2fe-45dd-47c8-b27d-b9929e098895-non_canonical_three_prive_splice_site-1292398;Parent=90b0c2fe-45dd-47c8-b27d-b9929e098895;ID=90b0c2fe-45dd-47c8-b27d-b9929e098895-non_canonical_three_prive_splice_site-1292398
Group1.10 . non_canonical_five_prime_splice_site 1292317 1292317 . + . Name=90b0c2fe-45dd-47c8-b27d-b9929e098895-non_canonical_five_prime_splice_site-1292316;Parent=90b0c2fe-45dd-47c8-b27d-b9929e098895;ID=90b0c2fe-45dd-47c8-b27d-b9929e098895-non_canonical_five_prime_splice_site-1292316
Group1.10 . exon 1291824 1292314 . + . Name=e5ce94a7-36a3-4cfa-bc2e-53652cbf1953-exon;Parent=90b0c2fe-45dd-47c8-b27d-b9929e098895;ID=e5ce94a7-36a3-4cfa-bc2e-53652cbf1953
Group1.10 . exon 1290368 1290636 . + . Name=256d4f9a-31dc-45ab-970a-596777c90f17-exon;Parent=90b0c2fe-45dd-47c8-b27d-b9929e098895;ID=256d4f9a-31dc-45ab-970a-596777c90f17
Group1.10 . exon 1290765 1290929 . + . Name=b4fb60f7-e382-458b-a990-ae749640b1f3-exon;Parent=90b0c2fe-45dd-47c8-b27d-b9929e098895;ID=b4fb60f7-e382-458b-a990-ae749640b1f3
Group1.10 . exon 1293140 1293149 . + . Name=abd2c3a9-5f65-4fb1-8bd6-14adca5c57f9-exon;Parent=90b0c2fe-45dd-47c8-b27d-b9929e098895;ID=abd2c3a9-5f65-4fb1-8bd6-14adca5c57f9
Group1.10 . exon 1292399 1292764 . + . Name=6100efd1-ca74-438f-92a0-88fb2b6bfdf8-exon;Parent=90b0c2fe-45dd-47c8-b27d-b9929e098895;ID=6100efd1-ca74-438f-92a0-88fb2b6bfdf8
Group1.10 . CDS 1290577 1290636 . + 0 Name=8ddaccd7-a3f6-4bd9-954a-576deca1a9e9-CDS;Parent=90b0c2fe-45dd-47c8-b27d-b9929e098895;ID=8ddaccd7-a3f6-4bd9-954a-576deca1a9e9
Group1.10 . CDS 1290765 1290929 . + 0 Name=8ddaccd7-a3f6-4bd9-954a-576deca1a9e9-CDS;Parent=90b0c2fe-45dd-47c8-b27d-b9929e098895;ID=8ddaccd7-a3f6-4bd9-954a-576deca1a9e9
Group1.10 . CDS 1291824 1291841 . + 0 Name=8ddaccd7-a3f6-4bd9-954a-576deca1a9e9-CDS;Parent=90b0c2fe-45dd-47c8-b27d-b9929e098895;ID=8ddaccd7-a3f6-4bd9-954a-576deca1a9e9
###
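
To check which names a GFF3 file actually carries before uploading it, something like the following sketch may help. It is not part of Apollo or of the loader script: the helper names `parse_attributes` and `feature_names` are hypothetical, the sample rows are shortened copies of the export above, and the code assumes plain `key=value` attributes without handling GFF3 percent-encoding.

```python
# Two rows shortened from the export shown in this thread (assumed sample data).
SAMPLE = """\
Group1.10 . gene 1290368 1293149 . + . Name=GB40862-RA;ID=70cfdae4-1950-41f1-b56c-bc2f105624b4
Group1.10 . mRNA 1290368 1293149 . + . Name=GB40862-RA-00001;Parent=70cfdae4-1950-41f1-b56c-bc2f105624b4;ID=90b0c2fe-45dd-47c8-b27d-b9929e098895
###
"""


def parse_attributes(column9: str) -> dict:
    """Split a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs


def feature_names(gff3_text: str, types=("gene", "mRNA")) -> list:
    """Return (feature type, Name) tuples, falling back to ID if Name is absent."""
    names = []
    for line in gff3_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        # Real GFF3 is tab-delimited; splitting on any whitespace also copes
        # with the space-separated rows as they appear in this archive.
        cols = line.split(None, 8)
        if len(cols) == 9 and cols[2] in types:
            attrs = parse_attributes(cols[8])
            names.append((cols[2], attrs.get("Name", attrs.get("ID"))))
    return names


if __name__ == "__main__":
    for feature_type, name in feature_names(SAMPLE):
        print(feature_type, name)
```

Comparing this output with the names shown in the annotation track after import makes it easy to see whether a suffix was added during upload.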

Nathan Dunn, PhD
Berkeley Bioinformatics Open-source Projects (BBOP)
Genomics Division, Lawrence Berkeley National Laboratory
