Data & Databases – Genes to Genomes https://genestogenomes.org A blog from the Genetics Society of America

Finding fresh mutations https://genestogenomes.org/finding-fresh-mutations/ Thu, 06 Jun 2019 12:00:33 +0000

Improved duplex sequencing identifies spontaneous mutations in bacteria without long-term culturing.


Spontaneous mutations are the driving force of evolution, yet our ability to detect and study them can be limited to mutations that accumulate clonally. Sequencing technology often cannot identify very rare variants or discriminate between bona fide mutations and errors introduced during sample preparation. In GENETICS, Zhang et al. created an improved sequencing method to study low-abundance spontaneous mutations in the bacterium Escherichia coli.

To develop their method, the authors began with duplex sequencing, in which fragmented DNA molecules are tagged with an adaptor sequence for sequencing. This method is powerful, but at high read depths, it can erroneously call true mutations as PCR duplicates, making it ill-suited for finding rare mutations.

The authors first determined the error rate of the PCR step of duplex sequencing, where most experimental artifacts would be expected to occur. Because duplex sequencing can identify reads that came from the same parental DNA molecules (based on the adaptor sequences), the authors assumed that any such reads that had mismatches must have come from base changes during the PCR. By identifying these discrepancies, they determined the rates of different kinds of errors in the sequencing process.

The authors then sequenced E. coli genomes using a new method, which they termed improved duplex sequencing (IDS). IDS is similar to duplex sequencing, but it uses adaptor sequences of multiple different lengths. The use of more and different adaptor sequences minimizes the chance that two different DNA molecules that happen to break at the same place will be erroneously called as PCR duplicates. By employing this method and accounting for the PCR error rate, which they had already determined, the authors were able to confidently identify rare, random mutations in E. coli.
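The idea of using adaptor tags to separate PCR errors from real variants can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not the authors' pipeline: reads are given as (tag, sequence) pairs, reads sharing a tag are treated as copies of one parental molecule, all reads in a family have the same length, and the function names and toy reads are made up for the example.

```python
from collections import Counter, defaultdict


def group_by_tag(reads):
    """Group read sequences by adaptor tag (one family per parental molecule)."""
    families = defaultdict(list)
    for tag, seq in reads:
        families[tag].append(seq)
    return dict(families)


def consensus(seqs):
    """Majority base at each position; assumes equal-length reads."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))


def estimate_error_rate(reads):
    """Fraction of bases that disagree with their family consensus."""
    mismatches = 0
    total_bases = 0
    for seqs in group_by_tag(reads).values():
        if len(seqs) < 2:
            continue  # a single read cannot be checked against its siblings
        cons = consensus(seqs)
        for seq in seqs:
            mismatches += sum(a != b for a, b in zip(seq, cons))
            total_bases += len(seq)
    return mismatches / total_bases if total_bases else 0.0


# Toy data: tags of different lengths stand in for the varied adaptors.
reads = [
    ("AAT", "ACGTACGT"),
    ("AAT", "ACGTACGT"),
    ("AAT", "ACGTACGA"),  # one base differs from its siblings
    ("GGC", "TTGACCAA"),
    ("GGC", "TTGACCAA"),
    ("C", "ACGTACGT"),    # singleton family, skipped
]
rate = estimate_error_rate(reads)
print(rate)
```

In this toy set, one base out of the 40 bases in multi-read families disagrees with its family consensus, so the estimated error rate is 1 in 40. A real analysis would also need to handle reads of unequal length, ties in the consensus, and sequencing quality scores.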

Having identified such mutations, the authors looked for patterns. They found that clusters of mutations occurred in regions of the genome that are known to be replication fork stopping regions. This is suggestive of replication errors, as would be expected for spontaneous mutations. Interestingly, mutations in these hotspots were almost entirely in relatively unimportant regions of the genome, for instance in the non-functional parts of tRNA genes. These vulnerable areas of the genome hint at mechanisms in E. coli that may protect more critical regions from damage.

CITATION:

Spatial Vulnerabilities of the Escherichia coli Genome to Spontaneous Mutations Revealed with Improved Duplex Sequencing

Xiaolong Zhang, Xuehong Zhang, Xia Zhang, Yuwei Liao, Luyao Song, Qingzheng Zhang, Peiying Li, Jichao Tian, Yanyan Shao, Aisha Mohammed AI-Dherasi, Yulong Li, Ruimei Liu, Tao Chen, Xiaodi Deng, Yu Zhang, Dekang Lv, Jie Zhao, Jun Chen, Zhiguang Li

Genetics October 2018 210: 547-558; https://doi.org/10.1534/genetics.118.301345

https://www.genetics.org/content/210/2/547

From sequence to centimeters: predicting height from genomes https://genestogenomes.org/from-sequence-to-centimeters-predicting-height-from-genomes/ Thu, 08 Nov 2018 14:51:09 +0000

Machine learning and access to ever-expanding databases improve genomic prediction of human traits.


In theory, a scientist could predict your height using just your genome sequence. In practice, though, this is still the stuff of science fiction. It’s not only your genes that affect height—environment also plays a role—but the larger problem is that height is affected by tens of thousands of individual genetic variations. This is also true of other complex traits, such as susceptibility to particular diseases. To get closer to accurate genomic prediction of human traits, geneticists are using new approaches to harness the vast amounts of sequence data becoming available. In GENETICS, Lello et al. describe a machine learning approach to the problem that allowed them to make predictions within a few centimeters of reality.

“To me, genomic prediction is the actual decoding of the genome,” says senior author Stephen Hsu from Michigan State University. A theoretical physicist by training, Hsu explains that his lab became interested in the problem of genomic prediction several years ago as the cost of genotyping continued to drop and more datasets became available. They had previously argued that they could predict complex traits, like height, if they only had enough data. The release of nearly 500,000 UK Biobank genotypes gave them an opportunity to test this hypothesis.

A genomic prediction approach is quite different from the more familiar genome-wide association study (GWAS). GWAS methods test each SNP one at a time, looking for statistically significant contributions to the phenotype. In contrast, genomic prediction makes use of all SNPs at once in trying to build the best possible predictors.

The authors took the Biobank genotype and phenotype data and used a type of regression to identify the combination of SNPs that, taken together, best correlate with the trait of interest. Since only a subset of SNPs influence each trait (even the thousands of loci that control height are only a tiny fraction of the total number of SNPs identified), they also introduced a penalization factor that prevents the model from including unneeded SNPs. They were essentially trying to solve an optimization problem: identify the smallest number of variables (i.e., SNPs) that will allow for the best prediction of the outcome (i.e., the trait).
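To make the role of the penalty concrete, here is a minimal Python sketch of penalized regression. It assumes an L1 (LASSO-style) penalty solved by coordinate descent, which the post itself does not name, and it uses a tiny made-up genotype matrix with centered codes of +1 and -1. The function names, data, and penalty value are all illustrative rather than taken from the study.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; small values become exactly zero."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso(X, y, penalty, n_iter=200):
    """Coordinate descent for (1/2n) * ||y - X b||^2 + penalty * sum(|b_j|)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # average covariance of SNP j with the partial residual
            z = 0.0    # average squared value of SNP j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            beta[j] = soft_threshold(rho, penalty) / z if z > 0 else 0.0
    return beta


# Toy data: 4 individuals, 3 SNPs. The trait depends strongly on SNP 0,
# very weakly on SNP 2, and not at all on SNP 1.
X = [
    [1, 1, 1],
    [1, -1, -1],
    [-1, 1, -1],
    [-1, -1, 1],
]
y = [2.05, 1.95, -2.05, -1.95]

unpenalized = lasso(X, y, penalty=0.0)
penalized = lasso(X, y, penalty=0.1)
print(unpenalized)
print(penalized)
```

Without a penalty, the fit assigns a small effect to the weak SNP. With the penalty, that coefficient is set exactly to zero and only the strong SNP remains in the model, which is the behavior the paragraph above describes: the penalty trades a little accuracy for a much shorter list of SNPs.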

Having generated their algorithm, the authors then put it to the test. They constructed models for height, heel bone density, and educational attainment, and they found that their algorithm worked well, particularly for height. For example, it produced a nearly 0.65 correlation with actual height, and predicted heights were usually within a few centimeters of actual heights. “Our predictor actually captures almost all the heritability that we could expect to find,” says Hsu.

With enough data, Hsu believes, accurate genomic prediction for complex traits will no longer be sci-fi. As more and more genotypes are obtained, Hsu predicts that this kind of prediction could be applied for most traits in as little as five years.

CITATION:

Accurate Genomic Prediction of Human Height

Louis Lello, Steven G. Avery, Laurent Tellier, Ana I. Vazquez, Gustavo de los Campos, Stephen D. H. Hsu

Genetics October 2018 210: 477-497; https://doi.org/10.1534/genetics.118.301267

http://www.genetics.org/content/210/2/477

What’s the cost of a slip in translation? https://genestogenomes.org/whats-the-cost-of-a-slip-in-translation/ Tue, 09 Oct 2018 12:00:03 +0000

Programmed ribosomal frameshifting has translational costs that may influence codon usage bias.


The genetic code has some redundancy—the same amino acid is often encoded by several codons. However, these codons are not necessarily equal in their effect, as evidenced by the codon usage bias observed in many organisms. The translation efficiency hypothesis posits that some codons are more easily translated than others, and these are the ones more commonly used. Based on this hypothesis, the codon usage bias index of a given mRNA should correlate closely with its translation efficiency—but in a report in G3: Genes|Genomes|Genetics, Garcia et al. explain why this might not always be the case.

Programmed ribosomal frameshifting (PRF) occurs when a ribosome stalls at specific sequences, appropriately termed “slippery sites,” which shifts translation to a new reading frame. The authors were specifically interested in “-1 PRF” cases, where the ribosome moves a single nucleotide backward during translation. This phenomenon is ubiquitous across the tree of life, and it is generally used by eukaryotes as a way to regulate gene expression, but the authors wondered if it could also induce translational costs.
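A short Python sketch shows what a -1 shift does to the codons a ribosome reads. The mRNA sequence and the slip position below are invented for illustration and do not come from PRFdb; the helper names are likewise hypothetical.

```python
def codons(seq, start=0):
    """Split seq into complete three-base codons, beginning at index start."""
    return [seq[i:i + 3] for i in range(start, len(seq) - 2, 3)]


def read_with_minus1_shift(seq, slip_index):
    """Read in frame up to slip_index, then step back one base and continue.

    slip_index is assumed to fall on a codon boundary of the original frame.
    """
    before = codons(seq[:slip_index])
    after = codons(seq, start=slip_index - 1)
    return before, after


mrna = "AUGAAAUUUAAACGGGCC"  # toy sequence

in_frame = codons(mrna)
before, after = read_with_minus1_shift(mrna, slip_index=12)

print(in_frame)
print(before, after)
```

Read in frame, the sequence ends with the codons CGG and GCC. After a slip of one base backward at position 12, the ribosome instead reads ACG and GGC, so every downstream codon changes even though no nucleotide in the mRNA has been altered.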

To test this, they used the database PRFdb to examine associations between -1 PRF signals and gene expression in yeast, determining that -1 PRF signals are less common in highly expressed genes. They also found that these signals tend to be present towards the start of open reading frames, which makes sense if the -1 PRF signals are causing translational costs—if the signals occur towards the start of the mRNA, translation is disrupted more quickly, so less energy is wasted. These lines of evidence support the idea that PRF signals do incur translational costs.

The authors then retested the association between codon usage bias and translational efficiency while accounting for the cost of -1 PRF signals. Using a set of mathematical models, they found that incorporating this data generally strengthened support of the translational efficiency hypothesis—that is, more costly transcripts were less often translated, whether the cost was incurred due to codon usage bias or -1 PRF signals. Better understanding these phenomena may help elucidate how translational efficiency is controlled.

CITATION:

Accounting for Programmed Ribosomal Frameshifting in the Computation of Codon Usage Bias Indices

Victor Garcia, Stefan Zoller, Maria Anisimova

https://doi.org/10.1534/g3.118.200185

http://www.g3journal.org/content/8/10/3173


Nanopore sequencing of 15 Drosophila genomes https://genestogenomes.org/nanopore-sequencing-of-15-drosophila-genomes/ Wed, 03 Oct 2018 14:30:12 +0000

Low-cost sequencing closes gaps in fly genomes.


Genetic sequencing technologies have revolutionized biological science, and regular advances in these tools continue to deliver better genomic data—more accurate and more useful—at a lower cost. In G3: Genes|Genomes|Genetics, Miller et al. report the genomes of 15 Drosophila species sequenced using Oxford Nanopore technology. Their work improves on prior assemblies and describes how this technology can be feasibly applied in other labs.

Nanopore sequencing is an example of “third-generation” or “long-read” sequencing technology. In contrast to “next-generation” sequencing, which typically generates reads of a few hundred base pairs in length, the Nanopore approach can produce reads of several kilobases. This allows for better coverage and deeper sequencing, but it can also make the sequencing process more error-prone.

The authors used the Oxford Nanopore MinION to sequence 15 Drosophila species, all but one of which had been previously sequenced. They also resequenced the genome of Drosophila melanogaster and published their results in a separate report. When compared against reference sequences, their sequences captured a respectable amount of the published genomes (about 83% of the total sequences, on average). To correct for sequencing errors, they employed the polishing algorithms Racon and Pilon, which correct the genome sequences using reference Nanopore reads or Illumina reads, respectively. The polishing algorithms significantly increased assembly quality without altering other assembly statistics.

Because Nanopore sequencing produces longer reads, the authors wondered whether their data might be able to close gaps in existing reference sequences that were generated by short-read technology. By aligning short contigs from the reference genomes to their assemblies, they were able to fill ~61% of gaps in the reference genomes, demonstrating how the combination of newer and older technologies can increase the accuracy of genome builds.

The authors also describe how Nanopore technology can be readily applied in a variety of labs. They offer advice for sequencing and bioinformatics protocols. They found that using 1-10 µg of input DNA yielded better results than the factory-recommended 400 ng and that the de novo assembler miniasm used fewer computational resources than alternatives but produced comparable products. Excitingly, the material cost of sequencing the reported Drosophila genomes was about $1000 USD, meaning that genome sequencing via Oxford Nanopore is likely feasible for labs of all sizes.

CITATION:

Highly Contiguous Genome Assemblies of 15 Drosophila Species Generated Using Nanopore Sequencing

Danny E. Miller, Cynthia Staber, Julia Zeitlinger, R. Scott Hawley

https://doi.org/10.1534/g3.118.200160

http://www.g3journal.org/content/8/10/3131


Enhancing our view of enhancers https://genestogenomes.org/enhancing-our-view-of-enhancers/ Wed, 01 Aug 2018 12:00:28 +0000

GC content alone is associated with distinct functional classes of human enhancers.


Because enhancers can be located hundreds of kilobases away from their target genes, it can be challenging to accurately predict their functions. A new report in GENETICS uses sequence composition to distinguish two enhancer classes that have distinct functions and spatial organization in humans.

Enhancers are regulatory DNA sequences that aid in transcription initiation. In some ways, enhancers are like promoters, since both are bound by transcription factors as part of transcription initiation. Unlike promoters, which are located near the transcriptional start site of the genes they regulate, enhancers can lie far from their targets along the DNA sequence, typically coming into long-distance contact with gene promoters via 3D DNA looping. Since it is difficult to identify enhancers through sequence information alone, our understanding of them is somewhat primitive compared with other DNA regulatory elements.

Lecellier, Wasserman, and Mathelier were interested in classifying enhancers based on their sequences. The percentage of a given sequence that is guanine and cytosine (the GC content or %GC) can be used to classify promoters, so they investigated whether a similar approach could be useful for enhancer classification. To perform this analysis, they took advantage of the FANTOM5 project, which recently cataloged tens of thousands of enhancers across the human genome.

The enhancers were divided into two simple groups: those with higher %GC and those with lower %GC than the median overall. The authors compared the properties of the two groups, finding that different transcription factors were predicted to be associated with each group. Each group was also associated with different DNA shapes (e.g. bending) and distinct localization in chromatin loops, suggesting that the enhancer sequence composition is linked to the 3D architecture of the chromatin.
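The median split described above is simple enough to sketch in Python. The enhancer names and sequences here are invented, and the handling of an enhancer that falls exactly on the median (placed in the lower group) is an assumption, since the post does not say how ties were treated.

```python
from statistics import median


def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def split_by_median_gc(enhancers):
    """Return (cutoff, high_names, low_names) using the median %GC as cutoff."""
    gc = {name: gc_content(seq) for name, seq in enhancers.items()}
    cutoff = median(gc.values())
    high = sorted(name for name, value in gc.items() if value > cutoff)
    low = sorted(name for name, value in gc.items() if value <= cutoff)
    return cutoff, high, low


# Toy enhancers (hypothetical names and sequences).
enhancers = {
    "enh_a": "GCGCGCAT",  # 6 of 8 bases are G or C
    "enh_b": "ATATATGC",  # 2 of 8
    "enh_c": "GGCCATAT",  # 4 of 8
    "enh_d": "GCGCGCGC",  # 8 of 8
}

cutoff, high_gc, low_gc = split_by_median_gc(enhancers)
print(cutoff, high_gc, low_gc)
```

With these four toy sequences the median falls between 0.5 and 0.75, so enh_a and enh_d land in the higher %GC group and enh_b and enh_c in the lower one. Downstream comparisons, such as predicted transcription factor binding, would then be run separately on each group.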

The authors then examined whether the two groups of enhancers had distinct biological functions. By consolidating previous reports, they compiled lists of thousands of genes predicted to be targets of each class of enhancer, and they analyzed these genes as proxies for the biological functions of the enhancers across different cell and tissue types. They found that enhancers with a higher %GC were associated with ubiquitous gene expression, whereas enhancers with a lower %GC were associated with specific patterns of expression in particular subsets of cells.

In particular, lower %GC enhancers were linked to immune response genes. To test this association against experimental data, the authors used data obtained from dendritic cells infected with Mycobacterium tuberculosis. This data tracked changes in chromatin accessibility, which can be mediated by enhancer activity. They found that lower %GC enhancers were significantly more activated in infected cells, providing experimental support for their observations.

CITATION:

Human enhancers harboring specific sequence composition, activity, and genome organization are linked to the immune response

Charles-Henri Lecellier, Wyeth W. Wasserman, Anthony Mathelier

http://www.genetics.org/content/209/4/1055

Remapping lab rats https://genestogenomes.org/remapping-lab-rats/ Thu, 05 Jul 2018 12:00:22 +0000

For the first time in nearly 15 years, the rat genetic map has been updated.


Genetic maps help us navigate uncharted data, but to successfully use them to link genes to complex traits, their resolution must be high enough to yield a manageable list of candidate variants. That’s why genetic maps for mice and humans have been routinely updated in recent years as mapping technologies have improved.

However, one important map has lagged: the genetic map for rats had not been updated since 2004. As such, the resolution of that map was 100 times lower than the mouse genetic map. Since rats are such an important experimental organism for understanding disease, Littrell et al. set out to construct a new, high-resolution genetic map for lab rats, which they published in G3: Genes|Genomes|Genetics.

With a nearly 50-fold improvement, the new map has a much higher resolution than the previous one. Additionally, the authors created sex-specific genetic maps, which had not previously been available for rats. They also examined some particular features of these new maps, finding that rates of recombination were higher on average in females than in males, a phenomenon that occurs in many mammal species.

To make it even more useful, the authors also added other data to the map, including the locations of tens of thousands of SNPs. The hope is that this new view of the rat genome will allow geneticists to more effectively explore genetic modifiers of common diseases.

CITATION:

A High-Resolution Genetic Map for the Laboratory Rat

John Littrell, Shirng-Wern Tsaih, Amelie Baud, Pasi Rastas, Leah Solberg-Woods, Michael J. Flister
http://www.g3journal.org/content/8/7/2241

ModERN treasure: hundreds of worm and fly transcription factor binding profiles cataloged https://genestogenomes.org/modern-treasure-hundreds-of-worm-and-fly-transcription-factor-binding-profiles-cataloged/ Mon, 21 May 2018 15:53:16 +0000

Offshoot of the modENCODE project provides crucial data and strains for understanding gene regulation.


Following a multidisciplinary effort spanning six institutions, researchers working on the modERN (model organism Encyclopedia of Regulatory Networks) project have released a powerful resource for biologists studying the fruit fly Drosophila melanogaster and the nematode worm Caenorhabditis elegans. So far, report Kudron, Victorsen, et al., the project has yielded information about the interactions of 262 transcription factors (TFs) with 1.23 million binding sites in flies, along with 219 TFs with 670,000 binding sites in worms—all of which can be found in a searchable database organized by gene and developmental stage.

Along with announcing the availability of this resource, the group shared findings made during its construction. One such observation is that genomic regions with a large number of TF binding sites are often associated with broadly expressed genes, whereas regions with fewer TF binding sites are more often found near genes that are expressed mainly in specific tissues.

The collection includes 403 worm strains and 427 fly strains, each of which has a different TF tagged with green fluorescent protein. Researchers can obtain stocks through existing resources, the Caenorhabditis Genetics Center and the Bloomington Drosophila Stock Center. The strains have a variety of possible uses—for example, determining expression patterns of TF genes of interest.

Flies and worms were a logical choice for the modERN project for multiple reasons, not least that so much is known about these important model organisms. The authors also note that a major advantage of working with flies and worms for this project is that they can be studied as whole, living organisms at all developmental stages, which is not possible with human subjects. And since many fly and worm TFs are homologous to human TFs, it’s likely that research fueled by modERN data will provide a treasure trove of useful leads for biologists studying humans as well.

CITATION:

The ModERN Resource: Genome-Wide Binding Profiles for Hundreds of Drosophila and Caenorhabditis elegans Transcription Factors
Michelle M. Kudron, Alec Victorsen, Louis Gevirtzman, LaDeana W. Hillier, William W. Fisher, Dionne Vafeados, Matt Kirkey, Ann S. Hammonds, Jeffery Gersch, Haneen Ammouri, Martha L. Wall, Jennifer Moran, David Steffen, Matt Szynkarek, Samantha Seabrook-Sturgis, Nader Jameel, Madhura Kadaba, Jaeda Patton, Robert Terrell, Mitch Corson, Timothy J. Durham, Soo Park, Swapna Samanta, Mei Han, Jinrui Xu, Koon-Kiu Yan, Susan E. Celniker, Kevin P. White, Lijia Ma, Mark Gerstein, Valerie Reinke, Robert H. Waterston
Genetics 2018 208: 937-949; https://doi.org/10.1534/genetics.117.300657
http://www.genetics.org/content/208/3/937

Mixed up: Insights into artificial sequencing chimeras https://genestogenomes.org/mixed-up-insights-into-artificial-sequencing-chimeras/ Thu, 29 Mar 2018 12:00:58 +0000

Sequencing a genome is not as simple as reading a book. All those neatly lined up letters are the final product of a complex process made up of many intricate steps that can, and do, go wrong. In a report published in G3: Genes|Genomes|Genetics, Peccoud et al. put their painful sequencing experiences to good use, providing new insights into a common sequencing problem: artificial chimeras.

Sequencing typically requires cutting up genetic material into fragments. These fragments are then amplified by PCR, and these amplified fragments are then sequenced. The end result is millions of short sequences, called reads. These reads can then be aligned to a reference sequence to identify changes like recombination and mutations.

The authors of the G3 study originally set out to identify recombination events between dengue virus and its host mosquito. They sequenced RNA from virus-infected mosquito cells, and they added pillbug RNA to a separate batch to serve as a control. Unexpectedly, the authors found virus-mosquito and virus-pillbug recombinant reads at similar frequencies. Since the virus RNA had never been in contact with the pillbug RNA before the sequencing procedure, they concluded that most, if not all, of these recombination events must have happened during the amplification or sequencing steps.

False-positives are always disappointing, but instead of giving up, the authors used their data and data from previous studies to better understand how the artificial reads occurred, as well as to learn how to better filter them.

This investigation revealed certain characteristics that are shared by both real and artificial recombinant reads, including microhomology around the recombination junction. Crucially, they found that biologically generated recombination almost always joins sequences in the same orientation, whereas artificial recombinant reads are often joined in opposite directions. The authors explain that this is likely due to template switching during the PCR step of sequencing.
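The orientation rule lends itself to a simple filter, sketched here in Python. This is a heuristic illustration only, not the authors' software: it assumes each candidate chimeric read has already been aligned as two segments with a known strand ("+" or "-"), and the record layout and read names are made up. Because some artificial chimeras also join in the same orientation, passing this check does not prove that a junction is real.

```python
from typing import NamedTuple


class Junction(NamedTuple):
    """A candidate recombinant read split into two aligned segments."""
    read_id: str
    left_strand: str   # "+" or "-"
    right_strand: str  # "+" or "-"


def same_orientation(junction):
    """True when both segments align to the same strand."""
    return junction.left_strand == junction.right_strand


def partition_junctions(junctions):
    """Split read IDs into (kept, flagged); flagged reads join opposite strands."""
    kept = [j.read_id for j in junctions if same_orientation(j)]
    flagged = [j.read_id for j in junctions if not same_orientation(j)]
    return kept, flagged


junctions = [
    Junction("r1", "+", "+"),
    Junction("r2", "+", "-"),
    Junction("r3", "-", "-"),
    Junction("r4", "-", "+"),
]

kept, flagged = partition_junctions(junctions)
print(kept, flagged)
```

Here reads r2 and r4 join segments from opposite strands and are flagged as likely artifacts, while r1 and r3 are kept for closer inspection.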

Knowing the traits of false-positive reads may allow researchers to more carefully filter their data in future studies, ensuring they get the most accurate information possible—and knowing that what appears to be a dead end can still yield useful insights may help graduate students sleep better at night.

CITATION:

A Survey of Virus Recombination Uncovers Canonical Features of Artificial Chimeras Generated During Deep Sequencing Library Preparation

Jean Peccoud, Sébastian Lequime, Isabelle Moltini-Conclois, Isabelle Giraud, Louis Lambrechts, Clément Gilbert

Genetics Society of America honors Philip Hieter with 2018 George W. Beadle Award https://genestogenomes.org/genetics-society-of-america-honors-philip-hieter-with-2018-george-w-beadle-award/ Mon, 05 Feb 2018 13:00:24 +0000

The Genetics Society of America (GSA) is pleased to announce that Philip Hieter is the recipient of the 2018 George W. Beadle Award, bestowed in honor of his outstanding contributions to the genetics research community. Hieter is Professor of Medical Genetics in the Michael Smith Laboratories at the University of British Columbia.

Philip Hieter

Geneticists across the model organism and human genetics communities recognize Hieter for his dedication to uniting human biologists with those who work on model organisms such as mice, fruit flies, worms, and yeast. The resulting collaborations are crucial to advancing our knowledge of biology, including human health and disease; connecting model organism researchers and human biologists with one another speeds progress for both groups, facilitates mechanistic understanding of disease gene functions, and helps uncover novel disease mechanisms and candidate therapeutic targets.

In 1997, when few genome sequences were available, Hieter helped create XREFdb, a public database that linked the functional annotations of genes studied in model organisms with the phenotypic annotations on the human and mouse genetic maps. This resource provided cross-species candidate genes for mammalian phenotypes, including human diseases, and stimulated interactions between basic scientists working on various organisms and the medical genetics community. He has also founded and co-led several multidisciplinary meetings that bridged the gap between biologists working on humans and those working on model organisms. Hieter and Jeannie Lee, a professor at Harvard Medical School and the Massachusetts General Hospital (and 2018 GSA President), were co-chairs of 2016’s Allied Genetics Conference, which brought together over 3,000 attendees from seven different genetic research communities to exchange ideas and findings.

As the 2012 GSA President, Hieter continued to foster closer relationships among different groups of life scientists. “As president of the GSA, Phil had a strong focus on bridging the many separate communities of the Society as well as increasing the interactions of the GSA community with members of the human genetics community,” says Stanley Fields, professor at the University of Washington and 2016 GSA President.

To help biological insights reach patients, Hieter co-founded, in 2014, the Canadian Rare Diseases: Models and Mechanisms National Network, a consortium that connects clinician scientists identifying gene mutations in patients that cause rare diseases to basic scientists analyzing the corresponding genes in model organisms. This network funds pilot studies to expedite collaboration between the two groups, conduct model organism-based functional studies of disease gene variants, and develop new therapeutic strategies using model organisms.

In addition to having connected research communities, Hieter and his lab have made many significant contributions to our understanding of chromosome biology, including the dissection of yeast centromeres and the identification of genes involved in genome stability. Their contributions to the yeast community include physical mapping methods, synthetic lethality screen approaches for identifying cross-species candidate genes as potential cancer drug targets, and a widely used set of vectors and yeast host strains that have been instrumental in work that has led to countless discoveries in recent decades.

The George W. Beadle Award was created by GSA to honor the memory of George W. Beadle (1903–1989), the 1946 GSA President. Beadle and his colleague Edward L. Tatum were awarded the Nobel Prize in Physiology or Medicine in 1958 for work that linked genetics to biochemistry, providing a major part of the foundation for the field of molecular biology. In addition to being a GSA President, Beadle served society in several leadership roles (for instance, as chairman of the National Academy of Sciences Committee on the Biological Effects of Atomic Radiation) and demonstrated a strong commitment to science outreach and education.

The award will be presented to Hieter at the 2018 Yeast Genetics Meeting, a GSA Conference to be held August 20–26 at Stanford University.

Is a statistical test letting significance slip through the cracks? https://genestogenomes.org/is-a-statistical-test-letting-significance-slip-through-the-cracks/ Mon, 22 Jan 2018 13:00:59 +0000

Every scientist is familiar with the p-value: it’s one of the most commonly used metrics in statistics for evaluating the likelihood that an observed relationship is due to chance. Typically, a cutoff is set at p=0.05, such that any p-value greater than 0.05 means the result is deemed “not statistically significant,” a heartbreaking outcome for so many researchers.

The p-value has a massive impact on how data is interpreted, from whether others think following up on a result is worthwhile to whether it’s published at all—so it’s vital to get it right. In some areas of genetics, the Sequence Kernel Association Test (SKAT) is commonly used to calculate p-values. The test’s low computational cost feeds into its popularity, but it can’t be used in all situations: SKAT is unreliable for small datasets, and correcting for this issue is computationally demanding. In work reported in GENETICS, researchers developed a new, faster method to solve that problem and, along the way, identified previously unknown conditions under which the test breaks down.

The group found that SKAT can even fail when applied to some large datasets, resulting in extremely skewed p-values that could lead researchers to falsely conclude that their findings are not statistically significant. Using a large dataset from previous research on relationships between blood lipids and chemical modifications to DNA, they found that their new test, RL-SKAT, identified almost 40 times more statistically significant relationships than the traditional SKAT analysis did.

These potential issues with typical SKAT analysis could mean important associations are being ignored. The researchers have made the code for RL-SKAT freely available online so others can investigate this issue further—and maybe even find out whether they have some interesting results collecting dust.

CITATION:

Schweiger, R.; Weissbrod, O.; Rahmani, E.; Müller-Nurasyid, M.; Kunze, S.; Gieger, C.; Waldenberger, M.; Rosset, S.; Halperin, E. RL-SKAT: An Exact and Efficient Score Test for Heritability and Set Tests.
GENETICS, 207(4), 1275-1283.
DOI: 10.1534/genetics.117.300395
http://www.genetics.org/content/207/4/1275
