Copyright © 2006 by the Genetics Society of America
DOI: 10.1534/genetics.105.050336
Recently Evolved Genes Identified From Drosophila yakuba and D. erecta
David J. Begun,1 Heather A. Lindfors, Melissa E. Thompson and Alisha K. Holloway
Section of Evolution and Ecology, University of California, Davis, California 95616
Accepted for publication November 22, 2005
The fraction of the genome associated with male reproduction in Drosophila may be unusually dynamic. For example, male reproduction-related genes show higher-than-average rates of protein divergence and gene expression evolution compared to most Drosophila genes. Drosophila male reproduction may also be enriched for novel genetic functions. Our earlier work, based on accessory gland protein genes (Acp’s) in D. simulans and D. melanogaster, suggested that the melanogaster subgroup Acp’s may be lost and/or gained on a relatively rapid timescale. Here we investigate this possibility more thoroughly through description of the accessory gland transcriptome in two melanogaster subgroup species, D. yakuba and D. erecta. A genomic analysis of previously unknown genes isolated from cDNA libraries of these species revealed several cases of genes present in one or both species, yet absent from ingroup and outgroup species. We found no evidence that these novel genes are attributable primarily to duplication and divergence, which suggests the possibility that Acp’s or other genes coding for small proteins may originate from ancestrally noncoding DNA.
An extensive literature documenting the unusually rapid evolution of reproductive traits in many taxa suggests that sexual selection may be a primary agent of evolution in natural animal populations (e.g., Eberhard 1985; Andersson 1994; Birkhead and Moller 1998). Although most data bearing on evolution of reproductive traits are morphological or behavioral in nature, directional selection on reproductive function should be manifest in patterns of genome evolution. For example, a genomic approach for identifying biological functions that may be under directional selection is to use sequence divergence in concert with gene annotation to identify functions enriched for rapidly evolving proteins (e.g., Nielsen et al. 2005; Richards et al. 2005). Such analyses support the idea that proteins functioning in male reproduction in Drosophila, mice, and primates evolve unusually quickly (Zhang et al. 2004; Good and Nachman 2005; Nielsen et al. 2005; Richards et al. 2005). Such data do not prove that rapid evolution results from directional selection. However, the repeatability across taxa of the pattern of rapid protein evolution is certainly consistent with this idea.

Drosophila ACPs (seminal fluid proteins) have been the subject of several evolutionary and functional investigations. These proteins elicit manifold physiological and behavioral changes in females (reviewed in Chapman and Davies 2004) and play an important role in sperm storage (Neubaum and Wolfner 1999; Tram and Wolfner 1999). They evolve quite rapidly compared to most proteins (Begun et al. 2000; Swanson et al. 2001; Holloway and Begun 2004; Kern et al. 2004; Mueller et al. 2005; Wagstaff and Begun 2005a,b). Population genetic evidence for directional selection on Acp’s has been found in the melanogaster subgroup, the repleta group, and the obscura group of Drosophila (Tsaur and Wu 1997; Aguadé 1999; Begun et al. 2000; Holloway and Begun 2004; Kern et al. 2004; Wagstaff and Begun 2005a,b; Begun and Lindfors 2005), perhaps due to male–male, male–female, or fly–pathogen interactions.

As noted previously, genomic surveys of divergence of male reproduction-related genes have demonstrated that they evolve rapidly compared to most other protein classes. Indeed, many testis-expressed Drosophila melanogaster genes have no obvious homolog in D. pseudoobscura (Richards et al. 2005), which is consistent with either very rapid evolution or gene presence/absence variation (i.e., lineage-restricted genes). The notion that genes coding for male reproductive functions may be enriched for lineage-restricted genes in Drosophila is supported by reports of recently evolved, novel genes that are expressed in Drosophila testes (Long and Langley 1993; Nurminsky et al. 1998; Betran and Long 2003). Although there has been little systematic investigation of whether reproductive functions are characteristic of lineage-restricted genes, we previously reported that in Drosophila, an Acp in a given species is sometimes absent from a related species (Begun and Lindfors 2005; Wagstaff and Begun 2005a). For example, 6 of 13 D. melanogaster Acp’s investigated were absent from D. pseudoobscura (Wagstaff
and Begun 2005a). A subsequent analysis of additional D. melanogaster Acp’s vs. D. pseudoobscura yielded comparable results (Mueller et al. 2005). Some of the D. melanogaster Acp’s that are absent from D. pseudoobscura have loss-of-function phenotypes or show evidence of directional selection in D. melanogaster/D. simulans, which suggests that invoking “functional redundancy” and gene loss is overly simplistic. Moreover, these comparisons of D. melanogaster with D. pseudoobscura could not address whether the lineage distribution of Acp’s in these two species is explained by gene loss in D. pseudoobscura, gene gain in D. melanogaster, or some combination. We also found putative cases of recent loss of […] 2005). For example, D. melanogaster is missing an Acp that was present in the common ancestor of D. melanogaster and D. simulans and that is present as a single-copy gene in D. simulans, indicating that this gene was lost within […] did not find unambiguous evidence for gains of Acp’s in the melanogaster subgroup. Nevertheless, loss of Acp’s implies either that compensatory gains maintain melanogaster subgroup seminal fluid protein-coding capacity or that the melanogaster subgroup is evolving toward a lower equilibrium number of Acp’s per genome.

The gain and/or loss of Acp’s over time will cause seminal fluid function to diverge gradually between Drosophila lineages, presumably under the influence of natural selection. One possible mechanism for gene gain is duplication followed by functional divergence (Ohno 1970). However, computational analysis of the D. melanogaster genome suggested that most duplicated Acp’s are ancient (Holloway and Begun 2004; Mueller et al. 2005), which does not support the idea that recent losses of melanogaster subgroup Acp’s are entirely compensated for by recent duplication and divergence. The purpose of the work presented here was to systematically investigate potential gains of Acp’s in the melanogaster subgroup of Drosophila. We did so by describing the accessory gland transcriptome in D. yakuba and D. erecta and then computationally analyzing melanogaster group species genome assemblies. We have assumed that D. yakuba and D. erecta are sister species (Ko et al. 2003; Parsch 2003); D. ananassae served as the outgroup.

Corresponding author: Section of Evolution and Ecology, University of […]

D. yakuba and D. erecta accessory gland cDNA libraries and ESTs: Accessory glands from 100 D. yakuba males (line Tai18E2) and 45 D. erecta males (line 14021-0224.0) were dissected in RNA-Later (Ambion, Austin, TX). Total accessory gland RNA was isolated using the Ambion mirVana miRNA kit and DNase treated (Ambion DNA-Free kit). RACE-ready cDNA was synthesized from 2 μg of each prep [Invitrogen (San Diego) GeneRacer kit; the SSIII module and oligo(dT) primer were used for the RT step]. The resulting cDNA was amplified (eight cycles for D. erecta; five cycles for D. yakuba) using the Roche Expand High Fidelity PCR System. Amplified libraries were purified [QIAGEN (Chatsworth, CA) QIAquick PCR purification kit], incubated with Promega (Madison, WI) Taq polymerase, and ligated into PCR4 TOPO vector (Invitrogen). Ligations were transformed and plated, and the resulting colonies were subjected to PCR using vector primers. Colony PCR products were sequenced at the University of California at Davis College of Agricultural and Environmental Sciences Genomics Facility. For D. yakuba, 415 clones were sequenced. They yielded 360 high-quality sequences, which assembled (Lasergene) into 119 unique contigs. For D. erecta, 333 clones were sequenced. They yielded 252 high-quality sequences and 114 unique contigs. Unique D. yakuba and D. erecta accessory gland ESTs can be found under GenBank accession nos. […]. The complexity of these libraries appears to be considerably greater than that estimated from random sequencing of a D. mojavensis accessory gland cDNA library (Wagstaff and Begun 2005b; 26 transcripts from 139 random clones). This suggests that Drosophila species vary in the complexity of the accessory gland transcriptome, but more quantitative data would be required to address this issue.

Analysis of ESTs: Each unique EST was compared by BLAST to predicted D. melanogaster genes and proteins. ESTs returning E-values <1e-15 were considered to be candidate unannotated homologous Acp’s or candidate Acp’s absent from the D. melanogaster genome. Each candidate was then compared (BLASTn) to D. melanogaster chromosome arms to determine whether there was evidence for an unannotated D. melanogaster gene corresponding to the D. yakuba or D. erecta EST. ESTs that failed to show convincing BLAST hits to D. melanogaster were candidate lineage-restricted genes (although they could also be highly diverged orthologs). RACE was used to isolate the entire transcript associated with each putative lineage-restricted gene. These genes were investigated in terms of splicing, predicted protein sequence, and whether they were present as putative single-copy genes in D. yakuba or D. erecta on the basis of BLAST or BLAT analyses to genome assemblies. Finally, given that most ACPs have strongly predicted signal sequences (Swanson et al. 2001), which are required for secretion, the predicted proteins were analyzed by SignalP to determine the likely presence/absence of a signal peptide (Bendtsen et al. 2004). Candidate lineage-restricted genes were subjected to additional investigation, as described in the next section.

TABLE 1. Summary of inferred phylogenetic distributions of genes

Search for orthologs based on syntenic alignments: Syntenic regions of variable size (generally several kilobases) encompassing each candidate gene were isolated from the D. yakuba or D. erecta genome assemblies [BLAT via the UCSC genome browser (Kent et al. 2002; http://genome.ucsc.edu) to D. yakuba (Release 1.0; Washington University Medical Genome Sequencing Center) or BLAST to D. erecta contigs (October 2004 assembly; sequencing by Agencourt) at http://rana.lbl.gov/drosophila/]. These regions were then analyzed by BLAT to identify putative orthologous regions of the D. melanogaster genome. This resulted in a putative orthologous region from D. melanogaster, D. yakuba, and D. erecta for each candidate, along with the gene annotation derived from our EST/RACE data and computational analysis for either D. yakuba or D. erecta. Finally, we attempted to isolate a syntenic region from D. ananassae (July 2004 assembly; sequencing by Agencourt) for each candidate. Generally, this was more difficult (and not always successful), probably because of greater sequence divergence, and often required investigation of larger genomic regions, occasionally up to 10–15 kb. Each gene region identified from a D. yakuba or a D. erecta accessory gland EST was investigated in detail in the corresponding region of the other species. This entailed pairwise alignments using the Martinez/Needleman-Wunsch algorithm as implemented in DNASTAR and/or multispecies alignments using ClustalW v. 1.82. In many cases, there was no DNA in other species corresponding to the gene of interest. In other cases, there was apparently a homologous sequence, but no obvious conserved open reading frame (ORF). For the latter, we computationally investigated the genomic sequence in the homologous region to determine protein-coding capacity and whether any putative proteins showed sequence similarity or similar protein lengths relative to the candidate, or whether a predicted protein had a predicted signal sequence. In a few cases, these investigations revealed evidence for highly diverged orthologous genes, likely Acp’s, which would have gone undetected on the basis of the alignment of DNA sequences.
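The coding-capacity checks described above, like the genome-wide ORF census used for the signal-sequence analysis in this study, come down to scanning DNA for open reading frames. The Python below is a minimal sketch of such a scan, not the authors’ Perl script. It applies this study’s ORF definition: an ATG-initiated, single-exon ORF of at least 40 codons that ends at the first in-frame stop codon, searched on both strands and in all three frames. The function names are hypothetical, and three details are our assumptions: the ATG counts toward the 40 codons but the stop does not, only the most upstream ATG before each stop is reported, and a masked base (N) ends an open ORF without reporting it.

```python
# Hypothetical sketch of a single-exon ORF scan on both strands and all three frames.
# Assumptions (not from the paper): the ATG counts toward min_codons, the stop does not;
# only the most upstream in-frame ATG before each stop is reported; any codon containing
# a masked base (N) discards the currently open ORF.

STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T/N sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 40) -> list[tuple[str, int, int, int]]:
    """Return (strand, start, end, n_codons) for each qualifying ORF.

    start/end are 0-based and half-open on the scanned strand (the reverse
    complement for strand "-"); end includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the open ORF's ATG in this frame, if any
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if "N" in codon:
                    start = None  # masked sequence: drop the open ORF
                elif start is None:
                    if codon == "ATG":
                        start = i
                elif codon in STOP_CODONS:
                    n_codons = (i - start) // 3  # ATG through the last sense codon
                    if n_codons >= min_codons:
                        orfs.append((strand, start, i + 3, n_codons))
                    start = None
    return orfs
```

For example, `find_orfs("ATG" + "GCT" * 39 + "TAA")` returns `[('+', 0, 123, 40)]`, while the same construct with one fewer `GCT` codon returns an empty list. As a rough calibration that assumes equal base frequencies (real intergenic DNA is AT-rich), a random codon is a stop with probability 3/64, so an ATG is followed by at least 39 further sense codons with probability (61/64)^39 ≈ 0.15, one reason short ORFs are plentiful in noncoding sequence.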
Population genetic analysis: Molecular population genetic data were collected for several D. yakuba- and/or D. erecta-specific genes. High-fidelity PCR was used to amplify Acp’s from multiple D. yakuba isofemale lines and a single D. teissieri isofemale line (provided by P. Andolfatto and M. Long, respectively). These PCR products were cloned and subjected to colony PCR. A single allele was isolated and sequenced from each line. Summary statistics and tests of neutral evolution were computed with DnaSP (Rozas et al. 2003). Sequence data for the population genetics analysis can be found under GenBank accession nos. DQ318145–319181.
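DnaSP performed the population genetic calculations in this study. As a minimal, standard-library-only Python illustration (not DnaSP’s code), the sketch below implements two of them: the Jukes–Cantor correction applied to the divergence estimates in Table 3, d = −(3/4) ln(1 − 4p/3), where p is the proportion of differing sites, and a McDonald–Kreitman test, which asks whether the ratio of nonsynonymous to synonymous changes differs between fixed differences and polymorphisms. The function names are hypothetical; scoring the 2 × 2 table with a two-sided Fisher’s exact test and reporting a neutrality index (NI) are our choices, and DnaSP’s implementation may differ in detail.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor (JC69) distance from the observed proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def _table_prob(a: int, row1: int, col1: int, n: int) -> float:
    """Hypergeometric probability of a 2x2 table with top-left cell a and fixed margins."""
    return math.comb(col1, a) * math.comb(n - col1, row1 - a) / math.comb(n, row1)


def mk_test(dn: int, ds: int, pn: int, ps: int) -> tuple[float, float]:
    """McDonald-Kreitman test on counts of fixed (D) and polymorphic (P)
    nonsynonymous (n) and synonymous (s) changes.

    Table layout:        fixed   polymorphic
        nonsynonymous     dn         pn
        synonymous        ds         ps
    Returns (neutrality_index, two-sided Fisher's exact P).
    """
    n = dn + ds + pn + ps
    row1 = dn + pn  # nonsynonymous total
    col1 = dn + ds  # fixed total
    p_obs = _table_prob(dn, row1, col1, n)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum over all tables with the same margins that are no more probable than the observed one.
    probs = (_table_prob(a, row1, col1, n) for a in range(lo, hi + 1))
    p_value = sum(q for q in probs if q <= p_obs * (1 + 1e-7))
    # NI = (Pn/Ps) / (Dn/Ds); undefined if Ps, Dn, or Ds is zero.
    ni = (pn * ds) / (ps * dn) if ps and dn and ds else float("nan")
    return ni, min(1.0, p_value)
```

With the Adh counts reported by McDonald and Kreitman (1991), `mk_test(7, 17, 2, 42)` gives NI ≈ 0.12 and P ≈ 0.007. An NI below 1 means more nonsynonymous fixations than the polymorphism data predict, the pattern expected under recurrent adaptive protein evolution.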
Signal sequence potential of D. melanogaster intergenic and intronic sequences: Intergenic sequences (defined as sequences between two adjacent genes, regardless of strand) and introns were obtained from release 4.1 of the D. melanogaster genome. Introns were parsed to mask known exons embedded within them. RepeatMasker (S[…] used to mask repetitive elements of intergenic and intronic sequences. A Perl script was used to identify single-exon ORFs in the remaining DNA. An ORF was defined as a continuous sequence starting with an ATG that extends at least 40 codons and ends with the first termination codon. ORFs from both strands and all reading frames were included in the data set. SignalP version 3.0 was used to predict the presence or absence of signal peptides, which are characteristic of secreted proteins (Bendtsen et al. 2004). SignalP employs two methods, a neural network method and a hidden Markov model, for detecting signal sequences. We accepted that an ORF had a signal sequence if both the neural network and the hidden Markov model (posterior probability ≥ 0.95) predicted one.

TABLE 1 note: SignalP probabilities and lengths are from D. yakuba for D. yakuba-derived genes and from D. erecta for D. erecta-derived genes.

Many of our D. yakuba/D. erecta accessory gland ESTs returned highly significant BLAST hits to annotated D. melanogaster genes or proteins. These were not considered further. Several ESTs had highly significant BLAST hits to unannotated D. melanogaster sequence (as well as to D. yakuba and D. erecta genomic sequence). On the basis of the conserved location and organization of an open reading frame and the presence of a strongly predicted signal sequence in either D. yakuba or D. erecta and D. melanogaster, we consider 20 genes to be candidates for previously unknown Acp’s that are shared among melanogaster subgroup species [supplemental Data A at http://www.genetics.org/supplemental/ presents the putative D. melanogaster protein-coding sequence (CDS) for each gene]. However, additional empirical work would be required to confirm their status as such.

Accessory gland ESTs for which we failed to find putative orthologs in other species are presented in more detail below. None are associated with repetitive sequences; all show male-specific expression as determined by RT–PCR on templates generated from RNA isolated from whole adult males or females. Syntenic alignments of these putative lineage-restricted genes and orthologous regions can be found in Supplemental Data B at http://www.genetics.org/supplemental/; putative CDS regions are in boldface type with the exception of Gene144, for which the transcript is in boldface type; introns are underlined. Table 1 summarizes inferred phylogenetic distributions of putative lineage-restricted genes and some physical properties of the gene/protein, including the probability that the predicted protein has a signal sequence, which is frequently found in Acp’s (Swanson et al. 2001). Table 2 presents the results of BLAST analysis of several D. yakuba accessory gland ESTs corresponding to putative novel genes compared to the genomes of D. yakuba (April 2004 assembly), D. melanogaster (release 4.2.1), D. erecta (August 2005 assembly), and D. ananassae (August 2005 assembly). Table 3 provides summary statistics of D. yakuba polymorphism and divergence to D. teissieri for five genes.

Putative lineage-restricted genes identified from D. yakuba accessory gland ESTs: Acp134 codes for a predicted protein of 35 residues. This gene is represented in the D. yakuba testis EST collection (CV785591, CV785729, CV786139), probably as a result of low-level contamination of the testis dissection with accessory gland tissue. Acp134 returns no significant BLAST results vs. D. melanogaster, D. erecta, or D. ananassae. The putative syntenic alignments for the D. yakuba Acp134 region with D. melanogaster, D. erecta, and D. ananassae suggest that there are no plausible orthologous protein-coding regions in D. melanogaster, D. erecta, or D. ananassae that correspond
to D. yakuba Acp134. Moreover, a computational analysis of these orthologous regions revealed no potential genes that were plausible orthologs. These data strongly suggest that Acp134 is present only in D. yakuba.

Acp225 codes for a predicted protein of 121 residues. The syntenic alignment strongly suggests that there is no ortholog of Acp225 in D. melanogaster or D. erecta. A small ORF (36 bp) in D. erecta in the region near the first exon of D. yakuba Acp225 is clearly not orthologous. A putative syntenic alignment between D. yakuba and D. ananassae is presented in supplemental data at http://www.genetics.org/supplemental/. However, the quality of this alignment leads us to consider the status of the […]

TABLE 2. BLASTn results (default parameters) of D. yakuba ESTs from putative orphans to the D. yakuba genome (two best hits), to other melanogaster subgroup species genomes (best hit), and annotation of the corresponding microsyntenic […]

Acp223 codes for a predicted protein of 116 residues. It is located between the D. yakuba orthologs of Obp56f and Obp56e. The three genes have similar organization, which, together with their physical location, suggests that they are paralogous. D. erecta also has a copy of Acp223. D. yakuba Acp223 is more highly diverged from the D. yakuba Obp56e and Obp56f genes than these genes are from one another. A partial, homologous D. melanogaster ORF appears to be present; however, it codes for a predicted protein of only 44 residues, so its status in D. melanogaster is questionable (Supplemental Data B at http://www.genetics.org/supplemental/). A syntenic alignment of the putative D. ananassae orthologous region with D. yakuba provides no evidence for a D. ananassae copy of Acp223.

Acp224 codes for a predicted protein of 231 residues in D. yakuba and is located within an intron of CG31757. An alignment of the orthologous region from D. erecta reveals that the reading frame starting with the D. yakuba initiation codon codes for a predicted protein of 75 residues. However, the fact that the D. yakuba gene and the putative D. erecta ortholog are extremely divergent in length and sequence casts some doubt on the status of the D. erecta gene. To address this uncertainty, we used RACE on accessory gland cDNA to isolate the ends of the D. erecta gene. The RACE results revealed an apparently orthologous D. erecta transcript, which codes for two potential ORFs (89 codons
and the aforementioned 75 codons) that share the same reading frame (but different initiation codons). The shorter ORF has a more strongly predicted signal sequence, which suggests that it is the more likely candidate. Acp224 is the only putative Acp from our study that has a recognizable functional domain based on an NCBI conserved domain search (Marchler-Bauer et al. 2003). The D. yakuba copy has three predicted Kazal-type serpin domains, while the D. erecta copy has one such predicted domain. Serpin domains have previously been observed in Drosophila Acp’s (Swanson et al. 2001; Mueller et al. 2004). Syntenic alignments of the D. yakuba Acp224 region vs. D. melanogaster and D. ananassae (Supplemental Data B at http://www.genetics.org/supplemental/) strongly suggest that the gene is absent from these species. Thus, Acp224 is likely a very rapidly evolving D. yakuba/D. erecta-specific gene.

Acp158 codes for a predicted protein of 71 residues. Syntenic alignments of orthologous regions in D. melanogaster and D. erecta provide no evidence of an orthologous gene in these species. This gene is located within an intron of Pkc53E. Another putative Acp, Acp133, which is likely shared by D. melanogaster, D. yakuba, and D. erecta, is located 1.2 kb 5′ of Acp158 in D. yakuba, also in a Pkc53E intron. Acp133 and Acp158 code for proteins of roughly equal length (62 and 71 residues, respectively), and both are composed of two small exons and one small intron. These similarities, along with their physical proximity, suggest that the two genes may be related by duplication. However, their predicted protein sequences are too highly diverged to provide strong evidence of homology. The data are consistent with the idea that Acp158 is a highly diverged duplicate of Acp133 that is present only in D. yakuba. This implies either that Acp158 is a recent duplication that has diverged extremely rapidly or that Acp158 is an old duplication that has been lost multiple times in the melanogaster subgroup. Alternatively, these two genes may not be paralogous. The alignment of the D. yakuba Acp158 region with the putative orthologous region of D. ananassae suggests that neither Acp158 nor Acp133 is present in this species, although some uncertainty regarding the alignment means that this conclusion should be […]

Gene144 has a single exon. The protein-coding potential of this gene is unclear. Transcript data from our original cDNA clone and RACE experiments suggest the possibility of three open reading frames: two start with methionine and code for predicted proteins of 14 residues, and one starts with isoleucine and codes for a predicted protein of 39 residues (which is not predicted to have a signal sequence). None of the three open reading frames is conserved in D. melanogaster, although there is apparently orthologous genomic sequence. This gene is likely not an Acp and may not be a protein-coding gene (e.g., Tupy et al. 2005). However, the fact that we isolated this putative transcript twice (cDNA clone and RACE), along with the absence of a genomic poly(A) sequence downstream of the transcript, suggests that it is not the result of genomic contamination. We unsuccessfully attempted to amplify the homologous region by RT–PCR using RNA isolated from whole D. melanogaster males. This failure is consistent with the idea that this gene is not present in all melanogaster subgroup species.

Acp157a codes for a 112-residue-long predicted protein. An alignment of the D. yakuba Acp157a region to orthologous regions of the D. erecta and D. melanogaster genomes shows that D. erecta contains an ortholog, while D. melanogaster does not. A similar alignment to the putative orthologous region of the D. ananassae assembly strongly suggests that the gene is not in this species. Thus, Acp157a is likely a D. yakuba/D. erecta-specific gene. D. yakuba, but not other species, harbors a nearby, recent duplication (4 kb 5′) of Acp157a. However, this duplication has no long open reading frame, suggesting that it is a D. yakuba-specific pseudogene.

Putative lineage-restricted genes identified from D. erecta accessory gland ESTs: Acp100 codes for a predicted protein of 190 residues. A potential highly diverged D. yakuba ortholog is present. This D. yakuba gene shares the putative D. erecta initiation codon but, with a predicted length of 263 residues, is significantly longer than the predicted D. erecta protein. Both species share a canonical polyadenylation signal downstream of their putative stop codons. A syntenic alignment between D. erecta and D. melanogaster suggests that the gene is absent from the latter. We were unable to generate a convincing syntenic alignment with D. ananassae.

Gene 37 codes for a predicted protein of 80 residues. This protein does not have a predicted signal sequence, casting some doubt on its status as an Acp. Syntenic alignments to D. yakuba, D. melanogaster, and D. ananassae suggest that this gene is D. erecta specific. We computationally discovered a second putative open reading frame (single exon, 210 residues) that is 3′ of gene 37 and coded on the opposite strand (the putative CDS is annotated by left-facing arrows in the supplemental data alignment at http://www.genetics.org/supplemental/). This second putative gene, which contains a strongly predicted signal sequence and a predicted fibrinogen domain, overlaps gene 37 (their putative 3′-ends overlap). The best hit in a BLASTp analysis of this second gene to D. melanogaster proteins is to CG30281 (E = 6e-36, 40% identity). CG30281 is associated with the gene ontology terms “receptor binding” and “defense response.” This second gene also appears to be D. erecta specific. However, we were unable to generate a D. erecta RT–PCR product, which casts doubt on its status.

TABLE 3. D. yakuba/D. teissieri population genetics data for putative orphans. n is the number of D. yakuba alleles sampled. For D. teissieri, n = 1 for all loci. Genes are on chromosome arm 2R with the exception of Acp225, which is on 3R. Divergence estimates are Jukes–Cantor corrected.

Population genetics of lineage-restricted Acp’s: We collected polymorphism and divergence data from several D. yakuba/D. erecta-specific putative Acp’s to investigate mechanisms of protein evolution between D. yakuba and D. teissieri (Table 3). The data, pooled across genes, reject the null (neutral) model (Kimura 1983) in the direction of adaptive protein divergence (McDonald and Kreitman 1991); however, only one gene, Acp158, is
sequence. Acp’s have several features that make this sug-
individually significant. Removing the data from Acp158
gestion worth considering. First, they tend to have short
yields a nonsignificant test on data from the remaining
open reading frames, of which there are huge numbers
genes (P ¼ 0.17). Thus, although the rates of protein
in noncoding genomic sequence. Second, as secreted
divergence reported here are high compared to most
proteins, a signal sequence is the primary functional
Drosophila genes (e.g., Begun 2002; Richards et al.
element. Although signal sequences tend to be hydro-
2005), there is no strong support for recent, recurrent
phobic and a-helical (Doudna and Batey 2004), the
directional selection on these genes overall.
amino acid sequences are not always highly conserved(Nielsen et al. 1997). Third, Acp’s frequently have noknown functional domains apart from their signal
sequences (Swanson et al. 2001; Mueller et al. 2005;
We discovered several genes, many of which are likely
Wagstaff and Begun 2005b), which is consistent with
Acp’s, that have a lineage-restricted distribution in the
the potential for a large degree of functional and evo-
melanogaster subgroup. Each lineage-restricted gene de-
lutionary lability. Finally, seminal fluid function may be
scribed here could be explained in two ways: (i) as a novel
under stronger or more frequent directional selection
gene gained in D. yakuba, D. erecta, or their common an-
than many other biological functions, which may make
cestor or (ii) as multiple losses of a gene. One’s intuition is
it more likely for novel Acp’s to invade populations.
that gains of novel genetic functions are much less likely
Unannotated portions of eukaryotic genomes (and,
than losses. The problem with this formulation is that it
indeed, random DNA sequences) contain many short
raises the question, How many losses must one invoke
(e.g., 30–100 residues) open reading frames. A fraction
before entertaining the hypothesis of gene gain as equally
of new mutations, most of which are likely deleterious
(or more) parsimonious? Regardless of the conclusion
(Hahn et al. 2003), may create promoters near such
for any particular Acp, it seems unreasonable to repeat-
ORFs, thereby driving their expression, even if at a low
edly invoke multiple losses and disallow occasional gains,
level. Moreover, the consensus, highly conserved animal
as this would imply that ancestral seminal fluid function
polyadenylation signal AATAAA (Zhao et al. 1999) is
is being lost from Drosophila, which seems unlikely.
short, simple, and, therefore, common. Thus, at muta-
Thus, we favor the interpretation that some of the
tion-selection balance there is likely a large pool of small
orphan genes described here are newly evolved.
open reading frames (many of which possess signal
What are plausible mechanisms for the origin of novel
sequences) that are a short mutational distance from del-
Acp’s? One possibility is duplication and divergence
eterious expression and translation. Occasionally, how-
(Holloway and Begun 2004; Mueller et al. 2005). For
ever, a ‘‘spuriously’’ expressed ORF coding for a small,
example, Acp158, which appears to be present only in D.
secreted peptide could be recruited into a novel function
yakuba, may be a highly diverged duplicate of Acp133,
which is present in D. melanogaster, D. yakuba, and D.
To investigate the plausibility of this scenario, we car-
erecta. However, most of our orphans cannot be explained
ried out an analysis of the signal peptide-coding potential
this way (Table 2), as BLASTresults support the idea that
of the intergenic and intronic portions of the D. mela-
they are unique. This is consistent with previous analyses of the D. melanogaster genome suggesting the presence of few recent Acp duplications (Holloway and Begun 2004; Mueller et al. 2005). An alternative possibility is that novel genetic functions can be co-opted from previously noncoding sequence. Such phenomena have been observed before. For example, the recently evolved D. melanogaster gene, Sdic, is partially derived from an intron of a cytoplasmic dynein gene (Nurminsky et al. 1998). In notothenioid fishes, intronic sequence from an ancestral trypsinogen gene has been co-opted into protein-coding function in a descendant antifreeze protein (Chen et al. 1997). Such examples support the plausibility of the recruitment of ancestral noncoding sequence into coding function. For the genes described here, however, there is neither evidence for partial derivation from ancestral protein-coding sequence nor evidence of association with transposable elements or other repetitive sequences.

These observations raise the question of the plausibility of the birth of novel Acp's entirely from small open reading frames present in ancestrally noncoding […] melanogaster reference sequence. We found that RepeatMasked D. melanogaster intergenic sequence harbors 174,779 open reading frames of ≥40 residues. Of these, we conservatively estimate that 6071 (3.5%) have a strongly predicted signal sequence (SignalP, hidden Markov model P > 0.95 and positive neural network prediction). The corresponding numbers for introns are 53,003 ORFs and 1963 strongly predicted signal sequences (3.7%). Although a small fraction of these ORFs may be previously undescribed genes or exons, it seems more reasonable to conclude that the coding potential for novel, small, secreted peptides in Drosophila noncoding DNA is impressively large. Recent reports that a surprisingly high fraction of eukaryotic genomes is transcribed (Bertone et al. 2004; Stolc et al. 2004, 2005) would favor the mutation-selection-recruitment model for the origin of small peptides. Direct support for this model would best be obtained through the discovery of small, novel, polymorphic proteins in populations. It seems clear that Acp's are much more likely than most other genes to have lineage-restricted distributions.
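The ORF census described above can be illustrated with a short Python sketch. This is a hypothetical reconstruction under stated assumptions, not the authors' pipeline. It makes three assumptions:

- An ORF is an ATG-initiated frame closed by a stop codon, with at least 40 codons before the stop.
- Both strands are scanned in all three frames.
- The input is hard-masked (repeats replaced by N), and any codon containing N breaks the frame.

The names find_orfs and percent are illustrative only. SignalP prediction is not reimplemented. The final lines only recompute the reported percentages from the counts given in the text.

```python
# Hypothetical sketch, not the authors' pipeline. Assumptions: an ORF is an
# ATG-initiated frame closed by a stop codon with >= 40 codons before the stop;
# input is hard-masked (repeats replaced by N), and an N-containing codon
# breaks the frame. Soft-masked (lowercase) repeats would NOT be excluded.

STOPS = {"TAA", "TAG", "TGA"}
MIN_RESIDUES = 40  # length threshold quoted in the text
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T/N sequence (case-insensitive)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_orfs(seq: str, min_residues: int = MIN_RESIDUES):
    """Return (strand, frame, start, end) tuples; end is exclusive and
    coordinates refer to the scanned strand. Only the first ATG after a stop
    (or a masked break) is used, so each stop closes at most one ORF."""
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if "N" in codon:
                    start = None  # masked or ambiguous sequence interrupts the frame
                elif start is None:
                    if codon == "ATG":
                        start = i
                elif codon in STOPS:
                    if (i - start) // 3 >= min_residues:  # residues before the stop
                        orfs.append((strand, frame, start, i + 3))
                    start = None
    return orfs


def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Recompute the reported signal-peptide fractions from the counts in the text.
print(percent(6071, 174779))  # intergenic: 3.5
print(percent(1963, 53003))   # intronic: 3.7
```

Because only the first ATG after each stop is used, this sketch reports one (the longest) ORF per stop codon. Counting nested ATGs as separate ORFs would give larger totals, so that choice matters when comparing with counts like those reported here.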
The proximate and ultimate explanations for the lineage-restricted distribution of Acp's are unclear, although, in principle, the small size of Acp's and the possibility that they are under unusually strong directional selection may contribute to a rapid gain of seminal fluid proteins. Comparative functional analysis of Acp's, including the lineage-restricted genes described here, could greatly illuminate their evolution[…]

[…] M. Levine, S. Schaeffer, and two anonymous reviewers provided useful comments. This work was supported by National Science Foundation grant DEB-0327049 and National Institutes of Health […]

[…] Positive selection drives the evolution of the Acp29AB accessory gland protein in Drosophila. Genetics 152: […]
[…] Sexual Selection. Princeton University Press, […]
[…] Protein variation in Drosophila simulans and comparison of genes from centromeric versus non-centromeric regions of chromosome 3. Mol. Biol. Evol. 19: 201–203.
[…] Acp complement in the melanogaster subgroup of Drosophila. […]
Begun, D. J., P. Whitley, B. Todd, H. Waldrip-Dail and A. G. […] Molecular population genetics of male accessory gland proteins in Drosophila. Genetics 156: 1879–1888.
Bendtsen, J. D., H. Nielsen, G. von Heijne and S. Brunak, […] Improved prediction of signal peptides: SignalP 3.0. […]
Bertone, P., V. Stolc, T. E. Royce, J. S. Rozowsky, A. E. Urban et al., […] Global identification of human transcribed sequences with genome tiling arrays. Science 306: 2242–2246.
[…]posed gene with specific male expression under positive Darwinian selection. Genetics 164: 977–988.
Birkhead, T. R., and A. P. Moller (Editors), 1998 […] and Sexual Selection. Academic Press, San Diego.
[…] seminal fluid proteins of male Drosophila melanogaster fruit […]
Chen, L., A. L. DeVries and C-H. C. Cheng, 1997 […] antifreeze glycoprotein from a trypsinogen gene in Antarctic notothenioid fish. Proc. Natl. Acad. Sci. USA 94: 3811–3816.
[…] recognition particle. Annu. Rev. Biochem. 73: 539–557.
[…] Sexual Selection and Animal Genitalia. Harvard […]
[…] are positively correlated with developmental timing of expression during mouse spermatogenesis. Mol. Biol. Evol. 22: 1044–[…]
Hahn, M. W., J. E. Stajich and G. A. Wray, 2003 […] against spurious transcription factor binding sites. Mol. Biol. […]
[…] population genetics of duplicated accessory gland protein genes in Drosophila. Mol. Biol. Evol. 21: 1625–1628.
Kent, W. J., C. W. Sugnet, T. S. Furey, K. M. Roskin, T. H. Pringle […] The human genome browser at UCSC. Genome Res. […]
Kern, A. D., C. D. Jones and D. J. Begun, 2004 […] population genetics of male accessory gland proteins in the Drosophila simulans complex. Genetics 167: 725–735.
[…] The Neutral Theory of Molecular Evolution. Cambridge […]
Ko, W. Y., R. M. David and H. Akashi, 2003 […] of the Drosophila melanogaster species subgroup. J. Mol. Evol. 57: 562–573.
[…] of jingwei, a chimeric processed functional gene in Drosophila. Science 260: 91–95.
Marchler-Bauer, A., J. B. Anderson, C. DeWeese-Scott, N. D. […] base of conserved domain alignments. Nucleic Acids Res. 31: 383–387.
[…]tion at the Adh locus in Drosophila. Nature 351: 652–654.
Mueller, J. L., D. R. Ripoll, C. F. Aquadro and M. F. Wolfner, […] Comparative structural modeling and inference of conserved protein classes in Drosophila seminal fluid. Proc. Natl. […]
Mueller, J. L., K. RaviRam, L. A. McGraw, M. C. Bloch Qazi, E. D. […] Cross-species comparison of Drosophila male accessory gland protein genes. Genetics 171: 131–143.
[…] melanogaster females require a seminal fluid protein, Acp36DE, to store sperm efficiently. Genetics 153: 845–857.
Nielsen, H., J. Engelbrecht, S. Brunak and G. von Heijne, […] Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng. 10: 1–6.
Nielsen, R., C. Bustamante, A. G. Clark, S. Glanowski, T. B. Sackton […] A scan for positively selected genes in the genomes of humans and chimpanzees. PLoS Biol. 3(6): e170.
Nurminsky, D. I., M. V. Nurminskaya, D. DeAguiar and D. L. […] Selective sweep of a newly evolved sperm-specific gene in Drosophila. Nature 396: 572–575.
[…] Evolution by Gene Duplication. Springer-Verlag, Berlin.
[…] Selective constraints on intron evolution in Drosophila […]
Richards, S., Y. Liu, B. B. Bettencourt, P. Hradecky, S. Letovsky […] Comparative genome sequencing of Drosophila pseudoobscura: chromosomal, gene, and cis-element evolution. Genome […]
Rozas, J., J. C. Sanchez-DelBarrio, X. Messegyer and R. Rozas, […] DnaSP, DNA polymorphism analysis by the coalescent and other methods. Bioinformatics 19: 2496–2497.
Smit, A. F. A., R. Hubley and P. Green, 1996–2004 RepeatMasker Open-3.0 (http://www.repeatmasker.org).
Stolc, V., Z. Gauhar, C. Mason, G. Halasz, M. F. van Batenburg […] A gene expression map for the euchromatic genome of Drosophila melanogaster. Science 306: 655–660.
Stolc, V., M. J. Samanta, W. Tongpsait, H. Sethi, S. Liang et al., […] Identification of transcribed sequences in Arabidopsis thaliana by using high-resolution genome tiling arrays. Proc. Natl. […]
Swanson, W. J., A. G. Clark, H. Waldrip-Dail, M. F. Wolfner and […] Evolutionary EST analysis identifies rapidly evolving male reproductive proteins in Drosophila. Proc. Natl. […]
[…] essential for sperm storage in Drosophila melanogaster. Genetics […]
[…] evolution of a gene of male reproduction, Acp26Aa of Drosophila. Mol. Biol. Evol. 14: 544–549.
Tupy, J. L., A. M. Bailey, G. Dailey, M. Evans-Holm, C. W. Siebel […] Identification of putative noncoding polyadenylated transcripts in Drosophila melanogaster. Proc. Natl. Acad. Sci. […]
[…] accessory gland protein genes in Drosophila melanogaster and D. pseudoobscura. Mol. Biol. Evol. 22: 818–832.
[…] genetics of accessory gland protein genes and testis-expressed genes in Drosophila mojavensis and D. arizonae. Genetics 171: 1083–1101.
Zhang, Z., T. M. Hambuch and J. Parsch, 2004 […] of sex-biased genes in Drosophila. Mol. Biol. Evol. 21: 2130–2139.
[…] ends in eukaryotes: mechanism, regulation, and interrelationships with other steps in mRNA synthesis. Microbiol. Mol. Biol. […]