Simons Simplex Collection: Revealing the Genetic Underpinnings of Autism
The power of the Simons Simplex Collection (SSC) to clarify the genetic basis of autism spectrum disorder (ASD) has been made abundantly clear over the last two years. Several published studies, and one large study that will likely be published in the second half of 2014, show conclusively that analysis of the DNA samples in the SSC will identify dozens, if not hundreds, of individual genetic mutations (genetic errors) that are strong risk factors for ASD. These landmark findings are a testament to the creativity of the researchers, as well as to the inspiring commitment of the more than 2,600 families who took part in the SSC.
The most recent SSC@IAN report on the genetics of ASD outlined findings published in 2011, which involved the identification of de novo, or spontaneous, copy number variants (CNVs): chunks of DNA that are either missing or duplicated in the DNA of the child with ASD but not in his or her unaffected parents and siblings.1,2 Although this work has since been extended3, and an analysis of the genes affected by these CNVs highlighted biological pathways that may be disrupted in ASD4, the CNV approach fell short of the ultimate goal. Because large CNVs tend to delete or duplicate more than one gene, it usually cannot definitively implicate individual genes in ASD risk. To accomplish that, another strategy was needed.
The DNA sequencing revolution
When researchers announced the sequencing of a complete set of human genetic material (a human genome) in 2001, it was the culmination of a multi-year, multi-billion dollar effort that has since transformed the study of biology. But it soon became clear that the full benefits of this accomplishment could only be applied to clinical practice if many thousands of individual genomes were sequenced and compared. And that would only be feasible if the cost of sequencing fell dramatically.
Thirteen years later, thanks to rapid advances in technology, the cost of sequencing an entire human genome is now less than $5,000. The cost of sequencing just the protein-coding portion of the genome, or the ‘exome’ — the 20,000 genes that together make up just 1.5 percent of the genome — is now less than $500, and 100 exomes can be sequenced in just a week. This more focused effort of exome sequencing is particularly attractive to biologists because it is generally much easier to understand the effects of mutations that change the sequence of a gene than those mutations found in the rest of the genome. And most important to the effort to understand the genetic basis of ASD, if such mutations are found, they implicate individual genes directly.
With these advantages in mind, three groups took the first step in 2011 toward sequencing the exomes of all of the individuals in the SSC. They were led by Michael Wigler of Cold Spring Harbor Laboratory on Long Island, New York; Matthew State, now at the University of California, San Francisco; and Evan Eichler of the University of Washington, all of whom were involved in the previous work on CNVs. Their initial results on approximately 730 SSC families were published in three papers5-7 in April 2012, and were accompanied by a complementary paper8 from Mark Daly and his colleagues at Massachusetts General Hospital, who reported exome-sequencing results from families outside of the SSC.
The most important finding was that de novo mutations likely to disrupt a gene’s function occur approximately twice as often in children with ASD as in their unaffected siblings. Moreover, because such mutations were so rare in the unaffected children, finding them in the same gene in only two or three affected individuals was sufficient to establish, with high confidence, that the gene contributes to ASD risk. This is a crucial point, and it highlights the real value of a simplex collection: for many of the genes identified to date as being involved in ASD, the supporting evidence is substantially weaker than that compiled for genes implicated by exome sequencing of the SSC.
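The reasoning behind this point can be made concrete with a rough back-of-the-envelope calculation. The sketch below is only an illustration, not the statistical method used in the published studies. It assumes that chance mutations are spread evenly across the roughly 20,000 genes (real genes differ in size and mutability) and uses a made-up figure, CHANCE_MUTATIONS, for the number of disruptive de novo mutations expected by chance in a cohort. Under those assumptions, it asks how often a single gene would be hit two or three times by chance alone.

```python
import math


def prob_at_least(k, lam):
    """Return P(X >= k) for a Poisson-distributed count X with mean lam."""
    below = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - below


GENES = 20_000          # approximate number of protein-coding genes
CHANCE_MUTATIONS = 100  # hypothetical value, chosen only for illustration

# Expected number of chance hits per gene, assuming every gene is equally
# likely to be hit (a deliberate simplification).
lam = CHANCE_MUTATIONS / GENES

for hits in (2, 3):
    p = prob_at_least(hits, lam)
    print(f"P(one gene has >= {hits} chance hits): {p:.2e}")
    print(f"Expected genes with >= {hits} chance hits: {GENES * p:.4f}")
```

With a background rate this low, the expected number of genes hit repeatedly by chance is well below one, which is why a handful of recurrent hits can carry so much weight.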
All told, the four groups found five genes to be mutated in more than one affected child in the initial round of sequencing. Spurred on by these results, the Wigler, State and Eichler labs completed exome sequencing of the entire SSC. When the paper describing the results is published later this year, it is expected to identify at least 30 such ‘recurrently’ mutated genes, which will offer an unprecedented glimpse at the genetic underpinnings of ASD.
In addition to these initial 30 ‘recurrent’ genes, the researchers have estimated that the complete collection will reveal roughly 300 genes as harboring a de novo disruptive mutation in one affected child. They predict about half of these will turn out to be bona fide risk genes, eventually to be confirmed by searching for ‘second hits’ in additional families. Fortunately, there is a clear path forward for doing this, established by Eichler and his University of Washington colleague Jay Shendure. Their method allows for the rapid and inexpensive sequencing of candidate genes in large numbers of samples, and proof of its effectiveness was established in a 2012 paper in Science.9 They sequenced 44 top candidate genes from the SSC in 2,400 non-SSC individuals affected by ASD, and reported enough de novo mutations in six of the genes to put them on the list of confirmed risk genes.
Biological themes in autism
Identifying relevant genetic risk factors is of course only a means to an end, the ultimate goal being a better understanding of the biological origins of ASD in the developing brain. The initial results from CNVs and exome sequencing have already enabled scientists to develop new ideas. For example, the Wigler lab has made the intriguing observation that there is a close biochemical connection between many ASD risk genes and a protein called FMRP.7,10 Mutations in the gene encoding FMRP cause fragile X syndrome, which is the most common inherited form of intellectual disability, and is frequently accompanied by ASD. The role of FMRP is to regulate the production of protein from 800-900 genes in the human genome, and Wigler predicts that roughly half of ASD risk genes are likely to be targeted by FMRP. This suggests that a better understanding of FMRP will shed light not only on fragile X syndrome, but on other forms of ASD as well.
The Eichler lab has noted other common functions of these newly identified ASD risk genes. All cells in the body communicate with each other through specific molecules that bind to receptors on their surface and result in a change in cell behavior. One of the best-studied families of such molecules goes by the name ‘Wnt,’ and Eichler and his colleagues have argued that many ASD risk genes are associated with the Wnt pathway of cell-to-cell communication.11
Wnt-related molecules have clear roles in the function of synapses, the key components of neuron-to-neuron communication, which studies of CNVs have implicated in ASD.4 Neurons themselves are nerve cells found in the brain and elsewhere in the nervous system that allow for the transmission of signals. Interestingly, Wnts also have a role in regulating the number of neurons that are produced in the developing brain, which may help to explain recent observations of differences in neuron number and overall head size in at least some individuals with ASD.
Finally, Matthew State’s group, in collaboration with several other labs that have participated in the development of the SSC, has addressed one of the most pressing questions in ASD research: When and where do mutations in ASD risk genes cause brain development to be diverted from its typical course? They selected nine of the high-confidence risk genes that emerged from the SSC and, using a recently established database of gene expression in the developing human brain, asked when and where the expression of these genes changes in the same direction. A gene is said to be ‘expressed’ when it is actively producing its protein product rather than sitting silent, and its expression pattern describes the times and places at which this happens. The assumption is that genes whose expression is going up or down at the same time and place are probably carrying out common functions.
State and his colleagues found that the co-expression of these ASD risk genes is strongest in a particular class of neurons in the deepest layers of the developing prefrontal cortex during the middle period of fetal development.12 While this is almost certainly not the only time and place in the brain that ASD genes act in concert, this finding gives researchers a place to start testing hypotheses by disrupting these genes in specific cell types in experimental animals. Such experiments should also get us closer to the development of new therapies that are rationally designed based on this new knowledge.13
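The idea of co-expression described above can be illustrated with a small sketch. This is not the analysis pipeline used by State and his colleagues. It simply computes a Pearson correlation between pairs of expression profiles, one common way to ask whether two genes rise and fall together. The gene names and expression values are invented for illustration, with each list standing for one gene's expression level at five hypothetical developmental time points in a single brain region.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical expression levels at five time points (made-up data).
expression = {
    "GENE_A": [1.0, 2.0, 4.0, 3.0, 1.5],
    "GENE_B": [2.0, 4.0, 8.0, 6.0, 3.0],  # rises and falls with GENE_A
    "GENE_C": [5.0, 4.0, 1.0, 2.0, 4.5],  # moves opposite to GENE_A
}

# Compare every pair of genes; values near +1 suggest co-expression.
for gene_1, gene_2 in combinations(expression, 2):
    r = pearson(expression[gene_1], expression[gene_2])
    print(f"{gene_1} vs {gene_2}: r = {r:+.2f}")
```

In this toy example, GENE_A and GENE_B would be counted as co-expressed, while GENE_C follows the opposite pattern. A real analysis would repeat such comparisons across many brain regions and developmental stages.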
The contribution of SSC families
At the end of just about every scientific paper, there is a brief section called “Acknowledgments,” in which the researchers typically thank funding agencies and other parties who contributed to the work but are not listed as authors. The families who made the effort to contribute to this collection head the Acknowledgments list in every SSC-related paper. From conversations with the researchers themselves, we know that these few words at the end of their manuscripts understate how grateful they are for the contributions of the families. This ongoing collaboration promises to be a boon to autism research for many years to come.
1. Sanders, S. J., Ercan-Sencicek, A. G., Hus, V., Luo, R., Murtha, M. T., Moreno De Luca, D. et al. (2011). Multiple recurrent de novo CNVs, including duplications of the 7q11.23 Williams syndrome region, are strongly associated with autism. Neuron 70, 863-885.
2. Levy, D., Ronemus, M., Yamrom, B., Lee, Y. H., Leotta, A., Kendall, J. et al. (2011). Rare de novo and transmitted copy-number variation in autistic spectrum disorders. Neuron 70, 886-897.
3. Girirajan, S., Dennis, M. Y., Baker, C., Malig, M., Coe, B. P., Campbell, C. D. et al. (2013). Refinement and discovery of new hotspots of copy-number variation associated with autism spectrum disorder. Am. J. Hum. Genet. 92, 221-237.
4. Gilman, S. R., Iossifov, I., Levy, D., Ronemus, M., Wigler, M. & Vitkup, D. (2011). Rare de novo variants associated with autism implicate a large functional network of genes involved in formation and function of synapses. Neuron 70, 898-907.
5. Sanders, S. J., Murtha, M. T., Gupta, A. R., Murdoch, J. D., Raubeson, M. J., Willsey, A. J. et al. (2012). De novo mutations revealed by whole-exome sequencing are strongly associated with autism. Nature 485, 237-241.
6. O’Roak, B. J., Vives, L., Girirajan, S., Karakoc, E., Krumm, N., Coe, B. P. et al. (2012). Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations. Nature 485, 246-250.
7. Iossifov, I., Ronemus, M., Levy, D., Wang, Z., Hakker, I., Rosenbaum, J. et al. (2012). De novo gene disruptions in children on the autistic spectrum. Neuron 74, 1-15.
8. Neale, B. M., Kou, Y., Liu, L., Ma’ayan, A., Samocha, K. E., Sabo, A. et al. (2012). Patterns and rates of exonic de novo mutations in autism spectrum disorders. Nature 485, 242-245.
9. O’Roak, B. J., Vives, L., Fu, W., Egertson, J. D., Stanaway, I. B., Phelps, I. G. et al. (2012). Multiplex targeted sequencing identifies recurrently mutated genes in autism spectrum disorders. Science 338, 1619-1622.
10. Ronemus, M., Iossifov, I., Levy, D. & Wigler, M. (2014). The role of de novo mutations in the genetics of autism spectrum disorders. Nat. Rev. Genet. 15, 133-141.
11. Krumm, N., O’Roak, B. J., Shendure, J. & Eichler, E. E. (2013). A de novo convergence of autism genetics and molecular neuroscience. Trends Neurosci. 37, 95-105.
12. Willsey, A. J., Sanders, S. J., Li, M., Dong, S., Tebbenkamp, A. T., Muhle, R. A. et al. (2013). Coexpression networks implicate human midfetal deep cortical projection neurons in the pathogenesis of autism. Cell 155, 997-1007.
13. Krystal, J. H. & State, M. W. (2014). Psychiatric disorders: diagnosis to therapy. Cell 157, 201-214.
DNA sequence data image from the Talking Glossary of Genetic Terms.