Genomics – Genes to Genomes https://genestogenomes.org A blog from the Genetics Society of America Mon, 14 Oct 2024 17:42:20 +0000
New Senior Editor joins G3 https://genestogenomes.org/new-senior-editor-joins-g3/ Thu, 17 Oct 2024 14:57:09 +0000 https://genestogenomes.org/?p=87500 A new senior editor is joining G3: Genes|Genomes|Genetics. We’re excited to welcome Alexander Edward Lipka to the editorial team.]]>

Alexander Edward Lipka
Senior Editor

Alexander Edward Lipka leads a research team at the University of Illinois that applies cutting-edge statistical approaches to quantitative genetics analyses, resulting in more accurate quantification of genomic signals underlying phenotypic variation and prediction of breeding values of agronomically important traits. His lab also develops freely available software that enables the broader research community to apply these approaches to their own work. Here are some examples of publications from his lab:

References

  • Olatoye MO, Clark LV, Labonte NR, Dong H, Dwiyanti MS, Anzoua KG, Brummer JE, Ghimire BK, Dzyubenko E, Dzyubenko N, Bagmet L, Sabitov A, Chebukin P, Głowacka K, Heo K, Jin X, Nagano H, Peng J, Yu CY, Yoo JH, Zhao H, Long SP, Yamada T, Sacks EJ, Lipka AE (2020). “Training Population Optimization for Genomic Selection in Miscanthus.” G3: Genes|Genomes|Genetics 10(7): 2465-2476.

  • Murphy MD, Fernandes SB, Morota G, Lipka AE (2022). “Assessment of two statistical approaches for variance genome-wide association studies in plants.” Heredity 129(2): 93-102. DOI: 10.1038/s41437-022-00541-1

  • Fernandes SB, Lipka AE (2020). “simplePHENOTYPES: simulation of pleiotropic, linked and epistatic phenotypes.” BMC Bioinformatics 21(1): 491.

     


]]>
Four new pipelines to streamline and improve genomic analyses https://genestogenomes.org/four-new-pipelines-to-streamline-and-improve-genomic-analyses/ Tue, 17 Sep 2024 13:00:00 +0000 https://genestogenomes.org/?p=87443 G3 reports exciting methods designed to make specific genomic analyses easier.]]>

As part of its scope, G3: Genes|Genomes|Genetics is dedicated to reporting new methods and technologies of significant benefit to the genetics community. Here, we highlight a selection of new analysis pipelines and software developments from the August 2024 issue that promise to improve research and practical applications in their respective subfields. These advances include ready-to-use genomics tools that improve data management and analysis and overcome long-standing challenges, underscoring the ongoing progress and innovation in genomics.

An easy-to-use phylogenetic analysis pipeline

A new turn-key pipeline called OrthoPhyl has answered the call to improve the phylogenetic analysis of bacterial genomes. Developed by Middlebrook et al., OrthoPhyl can analyze up to 1,200 input genomes and reconstruct high-resolution phylogenetic trees based on whole genome codon alignments from diverse bacterial clades.

The beauty of OrthoPhyl is that it streamlines a usually complex, multi-step process requiring extensive bioinformatics expertise and computing resources into a multi-threaded tool that runs from a single command.

With more than 2 million publicly available bacterial genomes in NCBI’s GenBank database, OrthoPhyl can help research groups in the fields of bacterial phylogenetics and taxonomy take advantage of existing datasets to inform their ongoing analyses amid the ever-expanding sea of bacterial diversity.

Accurate genotype phasing and inference of grandparental haplotypes

To improve the analysis of complex plant genomes, Montero-Tena et al. have developed a new computational pipeline called haploMAGIC, which lets researchers identify locations of recombination known as genome-wide crossovers (COs) in multi-parent populations. haploMAGIC uses single-nucleotide polymorphism (SNP) data and known pedigree information to accurately phase genotypes, i.e., determine which alleles were inherited from each parent, and to reconstruct grandparental haplotypes, i.e., determine which alleles were inherited from each grandparent.

When tested on real-world data, haploMAGIC improved upon existing methods by using different levels of haploblock filtering to prevent false-positive COs—a common limitation—even as rates of genotyping errors increased. haploMAGIC can also distinguish between COs and gene conversions. By learning more about the position and frequency of genetic recombination events in complex plant genomes, breeders can better manage and expand genetic variation in their breeding programs.

A complete HiC/HiFi assembly pipeline

The USDA-ARS AgPest100 Initiative aims to create high-quality genome assemblies of pest insects that threaten agricultural production. However, the high cost and time currently needed to produce and manage these assemblies often hinder progress.

Molik et al. set out to address this challenge by developing a new Hi-C/high-fidelity (HiFi) sequencing genome assembly pipeline called “only the best” (otb), written in the Nextflow workflow language. They then used otb to create a HiC/HiFi genome assembly of the two-lined spittlebug, a significant agricultural pest that is not well understood. Overall, otb streamlined the process and reduced manual input and analysis time, including time spent organizing data and installing and calibrating bioinformatic tools.

By saving time, otb can significantly reduce costs for large genomic projects like AgPest100 and pave the way for new discoveries. Indeed, the HiC/HiFi assembly of the spittlebug genome represents a first step toward better understanding this plant-eating pest, which may lead to new, sustainable ways to manage it.

Assigning triploids to their diploid parents

Roche et al. have developed the first publicly available, ready-to-use software for assigning triploid fish to their diploid parents. Triploidy means that an organism has three sets of chromosomes instead of two, and sterile triploids are commonly used in aquaculture breeding programs for their better yield and growth and to prevent genetic contamination of wild fish populations. The authors improve upon existing frameworks by updating the parentage assignment R package APIS to support triploids with diploid parentage.

When assessed with simulated and real datasets, APIS accurately assigned triploid offspring to their diploid parents using both likelihood and exclusion methods. The new software represents a key tool for establishing pedigrees in fish farming.


]]>
Some assembly required: how accurate are genome assembly lengths? https://genestogenomes.org/some-assembly-required-how-accurate-are-genome-assembly-lengths/ Thu, 22 Aug 2024 14:11:19 +0000 https://genestogenomes.org/?p=87361 Sequencing quality and read length have improved greatly, but new research in GENETICS asks whether assemblies match the estimated genome size for their species.]]>

Advances in technology have allowed geneticists to sequence a slew of unique animal, plant, and fungal species over the past thirty years. Public databases currently house tens of thousands of eukaryotic genome assemblies, but relatively few include an estimate of the total genome size for their respective species. Genome size (or C-value) varies widely, even at the species level, largely due to noncoding DNA, which is often dismissed as “junk” DNA. The standard metrics used to characterize assemblies don’t get at size and chromosome number, the fundamental structure of genomes. Without this foundational information, a new study in GENETICS asks: “Are our genome assemblies good enough?”

To determine whether existing assemblies match the estimated genome size for their corresponding species, author Carl Hjelmen designed an R script to pull information from four NCBI databases: Assembly, BioSample, Sequence Read Archive (SRA), and Taxonomy. Starting from the >40,000 available eukaryotic genome assemblies, he analyzed the ~15,000 animal, plant, and fungal genomes that had existing size estimates. He also used karyotype databases to determine the haploid chromosome number for mammals, dipterans, coleopterans, amphibians, and polyneopterans.

Taking into account Kingdom, the sequencing platform used, and common assembly statistics, Hjelmen devised a metric called “Proportional difference from genome size” to determine how closely a given assembly length came to matching the estimated genome size. If the assembly was within 10% of the estimate, he considered it “good.” 
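As a rough illustration of the idea, the comparison can be sketched as a signed proportional difference between assembly length and the estimated genome size. The exact formula used in the study is not given here, so the definition below, along with the function names and the example numbers, is an assumption for illustration only.

```python
def proportional_difference(assembly_length_bp: float, genome_size_bp: float) -> float:
    """Signed difference between assembly length and the genome size
    estimate, as a fraction of the estimate (assumed definition)."""
    if genome_size_bp <= 0:
        raise ValueError("genome size estimate must be positive")
    return (assembly_length_bp - genome_size_bp) / genome_size_bp


def is_within_cutoff(assembly_length_bp: float, genome_size_bp: float,
                     cutoff: float = 0.10) -> bool:
    """True when the assembly is within `cutoff` (10% by default) of the estimate."""
    return abs(proportional_difference(assembly_length_bp, genome_size_bp)) <= cutoff


# Hypothetical numbers: a 1.0 Gb genome size estimate and two assemblies.
estimate = 1_000_000_000
print(is_within_cutoff(950_000_000, estimate))  # assembly 5% shorter than the estimate
print(is_within_cutoff(800_000_000, estimate))  # assembly 20% shorter than the estimate
```

A negative value means the assembly is shorter than the estimate, which is the direction most of the deviating assemblies fell in.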

He found that almost half of the assemblies analyzed were outside of 10% of the genome size estimate for their species. Most were smaller than the estimates, suggesting that some assemblies are missing information. The larger the genome size, the more dramatic the deviation tended to be—which wasn’t surprising considering that larger eukaryotic genomes often carry more of that so-called “junk” DNA. (Nongenic DNA—a friendlier way to describe the regions of the genome that don’t code for proteins—might turn out to be more informative than its reputation would suggest, points out Hjelmen.)

Hjelmen also discovered a positive relationship between late-replicating heterochromatin and assembly/genome size deviation. When genomes contained more heterochromatin, the assembly was more likely to be missing DNA; he argues that this “lost information” should be highly sought after when studying populations and their health. And though the results were modest, long-read technologies appeared more likely to assemble genomes near that 10% cutoff.

This study points out the limitations of widely used genome metrics like N50 (which narrowly measures contiguity) and BUSCO value (which describes completeness of core sets of genes). To shrink this analytic gap, Hjelmen proposes a new metric: “PN50,” or proportional N50 value, which contextualizes N50 values by relating them to estimated genome size and haploid chromosome number. Adding PN50 to the current mix of metrics could increase the rigor of genome research, offering insight into the less-studied structural components of assemblies and supporting universal assembly comparison.
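For readers unfamiliar with N50, the sketch below computes it from a list of contig lengths: it is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. The second function is only a hypothetical illustration of how an N50 might be put in proportion to an expected average chromosome length (genome size divided by haploid chromosome number); it is an assumption for illustration, not the PN50 definition from the study.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig (or scaffold) lengths."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def n50_relative_to_chromosome(n50_bp, genome_size_bp, haploid_chromosomes):
    """Hypothetical normalization (NOT the study's formula): N50 divided by
    the expected average chromosome length."""
    expected_chromosome_bp = genome_size_bp / haploid_chromosomes
    return n50_bp / expected_chromosome_bp


# Hypothetical toy assembly: contig lengths in base pairs.
contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(contigs))
print(n50_relative_to_chromosome(n50(contigs), genome_size_bp=400, haploid_chromosomes=2))
```

Because N50 depends only on the contigs that were assembled, two assemblies with the same N50 can differ greatly in how much of the genome they capture, which is the gap a size-aware metric is meant to address.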


]]>
Justin Borevitz joins G3 as an associate editor https://genestogenomes.org/justin-borevitz-joins-g3-as-an-associate-editor/ Mon, 12 Aug 2024 16:58:23 +0000 https://genestogenomes.org/?p=87312 A new associate editor is joining G3: Genes|Genomes|Genetics. We’re excited to welcome Justin Borevitz to the editorial team.]]>


Justin Borevitz
Associate Editor

Justin Borevitz is a researcher and professor at The Australian National University. The Borevitz lab works on evolutionary plant genomics, moving from model organisms to foundation species of agriculture and ecosystems. They are interested in the identification and prediction of climate adaptation alleles, the traits they control, and the environments they are filtered in. Borevitz’s lab takes a landscape genomic approach, using long-read sequencing, assembly, and (structural) variant calling of individuals across large, diverse populations. They also take a phenomic approach to dissect adaptive traits among offspring in selected families, grown across common gardens (seedlings to saplings to satellite).


]]>
University of Minnesota researchers map genome of the last living wild horse species https://genestogenomes.org/university-of-minnesota-researchers-map-genome-of-the-last-living-wild-horse-species/ Fri, 09 Aug 2024 15:13:00 +0000 https://genestogenomes.org/?p=87313 The study, published in G3: Genes|Genomes|Genetics, is part of larger conservation efforts to save Przewalski’s horse.]]>

University of Minnesota researchers have successfully mapped the complete genome of the endangered Przewalski’s horse. Once extinct in the wild, the species now has a population of around 2,000 animals thanks to conservation efforts.

The study, published in the journal G3, was led by Nicole Flack and Lauren Hughes, researchers at the College of Veterinary Medicine, along with Christopher Faulk, a professor in the College of Food, Agricultural and Natural Resource Sciences. University of Minnesota students contributed to the genome sequencing through Faulk’s animal science course. 

“The genome is the basic blueprint for an animal and tells us what makes a species unique and also tells us about the health of a population,” said Faulk. “My students worked together to produce the highest quality Przewalski’s horse genome in the world.”

Researchers can now use this as a tool to make accurate predictions about what gene mutations mean for Przewalski’s horse health and conservation.  

“Studying genes without a good reference is like doing a 3 billion-piece puzzle without the picture on the box,” said Flack. “Przewalski’s horse researchers studying mutations in an important gene need a good reference picture to compare their puzzle with.” 

Researchers used a blood sample from Varuschka, a 10-year-old Przewalski’s mare at the Minnesota Zoo, to construct a representative map of genes for the species. The zoo has long been active in Przewalski’s horse breeding and management, with over 50 foals born since the 1970s. 

“We were excited to partner with the University of Minnesota to preserve the genetic health of the species as their populations continue to recover, both in zoos and in the wild,” said Anne Rivas, doctor of veterinary medicine at the Minnesota Zoo. “We are thrilled to offer our community the opportunity to see the horse as the results of our conservation efforts.” 

The cutting-edge sequencing technology used to construct the genome relies on a small machine about the size of a soda can. Its portability means this method could be adapted for further study of wild Przewalski’s horses in remote locations.

Future applications of the reference genome may include studying genes that help the horse adapt to environmental changes, identifying mutations associated with specific traits or diseases, and informing future breeding decisions to help improve upon genetic diversity. Given the extreme population bottleneck that occurred during the near-extinction of Przewalski’s horse, such understanding is crucial for continued breeding efforts.

]]>
Christelle Fraïsse joins GENETICS as an associate editor https://genestogenomes.org/christelle-fraisse-joins-genetics-as-an-associate-editor/ Tue, 14 May 2024 17:26:27 +0000 https://genestogenomes.org/?p=86990 A new associate editor is joining GENETICS in the Theoretical Population & Evolutionary Genetics section. We’re excited to welcome Christelle Fraïsse to the editorial team.]]>

Christelle Fraïsse
Associate Editor

Christelle Fraïsse is a Centre National de la Recherche Scientifique researcher at Lille University working at the interface between theoretical and empirical evolutionary genetics. She is interested in understanding the evolutionary processes underlying speciation and adaptation, the determinants of selection efficacy, and the evolution of sex chromosomes. Her lab combines theoretical modelling, computational methods, and genomic data analyses. She received her PhD in Evolutionary Biology from the University of Montpellier in France and pursued a postdoctoral fellowship at the Institute of Science and Technology, Austria in the laboratories of Nick Barton and Beatriz Vicoso. She received a European Research Council Starting Grant to study the evolution of haplodiplontic plants.


]]>
Yao-Wu Yuan joins GENETICS as an associate editor https://genestogenomes.org/yao-wu-yuan-joins-genetics-as-an-associate-editor/ Tue, 07 May 2024 20:05:28 +0000 https://genestogenomes.org/?p=86988 A new associate editor is joining GENETICS in the Genetics of Complex Traits section. We’re excited to welcome Yao-Wu Yuan to the editorial team.]]>

Yao-Wu Yuan
Associate Editor, Complex Traits

Yao-Wu Yuan is an Associate Professor at the University of Connecticut, Storrs. He is interested in understanding how and why organisms evolve so many beautiful forms in nature. His lab primarily studies floral trait diversification in the wildflower genus Mimulus (monkeyflowers) and aims to uncover the genes, pathways, and principles that explain the tremendous diversity of flowers by integrating genetics, genomics, development, mathematical modeling, and pollination ecology.


]]>
Hongyu Zhao joins GENETICS as new Senior Editor https://genestogenomes.org/hongyu-zhao-joins-genetics-as-new-senior-editor/ Tue, 16 Apr 2024 16:09:20 +0000 https://genestogenomes.org/?p=86974 A new senior editor is joining GENETICS in the Statistical Genetics and Genomics section. We’re excited to welcome Hongyu Zhao to the editorial team.]]>

Hongyu Zhao
Senior Editor, Statistical Genetics and Genomics

Hongyu Zhao is the Ira V. Hiscock Professor of Biostatistics, Professor of Genetics, and Professor of Statistics and Data Science at Yale University. He received his BS in Probability and Statistics from Peking University in 1990 and his PhD in Statistics from the University of California, Berkeley in 1995. His research interests are the development and application of statistical methods in molecular biology, genetics, therapeutics, and precision medicine with a focus on genome-wide association studies, biobank analysis, and single cell analysis. He is an elected fellow of the American Association for the Advancement of Science, the American Statistical Association, the Institute of Mathematical Statistics, and the Connecticut Academy of Science and Engineering. He received the Mortimer Spiegelman Award for a top statistician in health statistics from the American Public Health Association and the Pao-Lu Hsu Prize from the International Chinese Statistical Association.


]]>
New members of the GSA Board of Directors: 2024–2026 https://genestogenomes.org/new-members-of-the-gsa-board-of-directors-2024-2026/ Thu, 14 Dec 2023 18:14:12 +0000 https://genestogenomes.org/?p=86404 We are pleased to announce the election of four new leaders to the GSA Board of Directors: 2024 Vice President/2025 President Brenda Andrews Professor, University of Toronto It’s an honor to continue my association with the Society by serving as Vice President of the Board of Directors. I have broad knowledge of the ongoing activities…]]>

We are pleased to announce the election of four new leaders to the GSA Board of Directors:

2024 Vice President/2025 President

Brenda Andrews

Professor, University of Toronto

It’s an honor to continue my association with the Society by serving as Vice President of the Board of Directors. I have broad knowledge of the ongoing activities of the Society and see more opportunities for expanding the GSA profile internationally, including outreach to scientists in geographic regions underserved by major societies. The current International Seminar Series and this year’s International C. elegans Conference in Glasgow are great examples of international outreach, and these types of activities should be expanded.

I will prioritize support for early- and mid-career researchers, in recognition of the challenges they face. GSA can help scientists by providing mentorship, training, and increased advocacy efforts whether for funding or communicating the value of basic research. It is important that the next generation of scientists see value in the activities supported by the Society, including our journals, which face challenges in light of the rapidly evolving landscape of academic publishing. Here, we must continue to foster relationships with authors, improving the visibility of their work, and helping to raise the profiles of our journals. All of our work must be considered in the context of GSA’s ongoing commitment to inclusivity. Here, the Society may wish to work with other groups to enable access to genetics and genomics research by young people from under-represented groups. I found that a program I started at the Donnelly Centre that supported visits to labs by local high school classes from less privileged parts of Toronto was very impactful.

Times have changed and so must GSA. I hope to learn from and listen to you as we shape GSA together.

Director

Arun Sethuraman

Associate Professor, San Diego State University

I am honored to be elected to the GSA Board of Directors. I have served as an Associate Editor at G3: Genes|Genomes|Genetics since 2017 and on GSA’s Conference Committee since 2021 as a representative of the population, evolutionary, and quantitative genetics group, and my work includes contributions to a recent training grant submitted to fund early-career and historically excluded geneticists attending TAGC 2024. I look forward to serving the GSA membership in an active Directorial role. As an early-career researcher at a Minority Serving Institution, I see this as an invaluable opportunity for me to be the voice of a largely underrepresented group of researchers in the Society. I am thrilled to have this opportunity to join a dedicated and diverse team of geneticists, editorial board members, and Society staff who are actively working to change the face and representation of our field.

My commitment to serving on GSA’s Board comes with a push to address five key issues that are close to my heart: (1) developing important training resources to actively involve undergraduates in genetics and genomics research as part of GSA’s catalog of activities and conferences; (2) changing how we teach fundamentals of genetics with exclusionary language by organizing a GSA community-wide effort to crowdsource and develop a new teaching paradigm for topics such as transmission, sex determination, polygenic selection, and genome-wide association studies; (3) interfacing with the equity and inclusion and conference committees in continuing to assess GSA’s membership demographic to build actionable items to increase participation of a diverse audience at all GSA conferences and to recruit and train a diverse group of editors, reviewers, and members; (4) actively featuring methods tutorials and blurbs of published work on the Genes to Genomes blog, specifically highlighting the work of early-career researchers, graduate and undergraduate students; and (5) increasing GSA’s representation at undergraduate and minority-focused conferences (e.g. SACNAS meetings, ABRCMS, Beckman Symposia).

Director

Eyleen O’Rourke

Associate Professor, University of Virginia

I bring to this role a strong background in molecular genetics research, having published in reputable journals, and presented my work at national and international conferences. Additionally, my experience as a teacher and mentor has enriched my understanding of the educational needs within our community. I pledge to collaborate with fellow board members and the broader GSA membership to advance our shared goals. I will listen to your feedback, actively seek your input, and work hard to represent your interests. I humbly request your support in this endeavor.

My work will be grounded in three core principles:

  1. Advancing Genetics Research: I believe that supporting and promoting cutting-edge genetics research is core to our society’s mission. I will actively foster collaboration and knowledge sharing among GSA members. I propose initiatives such as promoting the selection of unpublished work for oral presentation at GSA-organized conferences. Additionally, I will advocate for increased research funding and opportunities, catering to the needs of both early-career and established researchers.
  2. Education and Outreach: Genetics should transcend the confines of the laboratory. In an era where the public does not trust lifesaving vaccines, I am committed to enhancing the society’s educational initiatives. I will work on programs that promote genetics literacy and support science education at all levels. By bridging the gap between scientific discoveries and public understanding, we can strengthen our society’s impact.
  3. Diversity and Inclusion: Science works at its best when it reflects the diversity of our broader community. As a first-generation high-school graduate and Latina, I have dedicated the past decade to learning, teaching, and championing inclusive research and teaching practices. I have promoted minorities both locally and internationally. I pledge to carry this dedication into GSA, advocating for programs that support underrepresented groups and nations in genetics. I will work diligently to foster an inclusive environment where every voice is not only heard but valued.

Together, we can advance genetics research, education, and inclusivity. Thank you for your consideration, and I look forward to the opportunity to serve you.

Director

Jason Stajich

Professor, University of California, Riverside

I am honored to have the opportunity to serve on the Board of Directors of GSA. The Society has enabled many opportunities in my career, and I am eager to contribute back. I first became a GSA member in graduate school and was completely hooked on the community and research after attending my first Fungal Genetics conference. I have served as an Associate Editor at GENETICS since 2018, and previously contributed to conferences by sitting on the Neurospora and Fungal Genetics Policy Committees. I am currently a Professor in the Department of Microbiology and Plant Pathology where I have taught in the fields of Genomics, Microbiology, and Bioinformatics for the past 14 years. I currently serve as Vice Chair of my department and previously have served the campus faculty as Chair of the Academic Senate and as chair of the Graduate Council. I am excited to contribute to the Society’s efforts in building training and mentorship for early career scientists, helping shape the advocacy for science and genetics in funding and policy decisions, and providing perspectives on the community’s needs to advance new research systems and questions.

As a member of the Board, I will continue to champion the value and importance of diverse research systems and diverse research communities to address fundamental understandings of genetics and biology. I am an omnivore of biological research systems and believe there are strengths in a collection of computational and experimental approaches across a variety of organisms. My own draw to science was found in the satisfaction of problem solving, and I will contribute my efforts to the Society as we consider different problems such as the public perception of science, retaining and recruiting a broad representation of individuals to work in our field, or the creativity needed in how societies navigate changes in journal publication strategies. The GSA Journals have been a home for my publications and the conferences and members have been a strong and supportive community for my research and development. If elected, I would dedicate the time and energy to help sustain and grow our society.

]]>
The silver lining of bioinformatics https://genestogenomes.org/the-silver-lining-of-bioinformatics/ Thu, 08 Sep 2022 15:29:00 +0000 https://genestogenomes.org/?p=81488 Bioinformatics—a scientific discipline that aims to curate, analyze, and distribute biological data—is facing a crisis: a deluge of data is overwhelming laboratories and existing infrastructure.  Biologists, especially those working in genome sciences, have recognized the importance of big data: in just two decades, the number of genome sequences has increased 10,000-fold (from 180,000 to 1.8…]]>

Bioinformatics—a scientific discipline that aims to curate, analyze, and distribute biological data—is facing a crisis: a deluge of data is overwhelming laboratories and existing infrastructure. 

Biologists, especially those working in genome sciences, have recognized the importance of big data: in just two decades, the number of genome sequences has increased 10,000-fold (from 180,000 to 1.8 billion genomes) and the number of sequenced bases has increased 25,000-fold (from 640 million to 16 trillion bases). Such a rich collection of genome sequences rivals the esteemed Library of Alexandria, a prestigious collection of roughly half a million scrolls established in approximately 250 BCE.

Similar to the ancient Library of Alexandria, mystery shrouds the genomic library of today. Specifically, unraveling how the 1.8 billion genomes encode organismal complexity and their components—even in “simple” organisms like bacteria—remains a grand challenge. So, what stops us from understanding the link between the data we generate and their biological meaning? One major hurdle is both a challenge and an opportunity. 

The necessary infrastructure of supercomputers and widely distributed analytical pipelines for processing ever-increasing datasets is lacking. As the number of genomes available continues to increase, even as this article is being read, scalable solutions are needed. Cloud-based platforms promise a solution to overcome this hurdle and usher in a new era of understanding in biosciences. We provide an overview of major hurdles the field faces and describe how cloud-based infrastructure may be the silver lining for a rapidly growing field.

The data deluge

Biology generates massive amounts of data every year: almost 40 petabytes, which is roughly equivalent to the entire written works of humankind from the beginning of recorded history in all languages. Instead of simple text files, the types of data generated in biological studies are diverse. There are genome sequences, transcript and protein abundances, growth curves, species presence and abundance in specific environments, and imaging, just to name a few.

One major challenge is that heterogeneous data types are often stored in different formats, require different suites of software for processing and analysis, generate different output file formats, and may require additional software for creating human-interpretable representations of the data. The number of data types (and amount of data) will continue to rise with the advent of new technologies. Curating, storing, and distributing colossal datasets in diverse formats will require innovative solutions.

One solution is a collaboration between academic institutions and bioindustries. Specifically, the latter may have established a computational infrastructure that exceeds what is available to some academic groups; for example, the Broad Institute of MIT and Harvard uses cloud-based platforms to distribute data generated by diverse research consortia.

Cloud analytics

In the future, all analysis and interpretation of biological data will be done using cloud analytics. With cloud resources that vastly exceed those of a personal computer, desktops and laptops are shifting from analysis hubs to portals linking researchers to cloud architectures. For academic labs, this will drive down hardware costs because a personal computer will only need enough memory to maintain a stable connection to the cloud. That means inexpensive laptops, tablets, and even Raspberry Pis can act as portals to the cloud. Academic labs will no longer face other costs and headaches, such as the maintenance and management of computing infrastructure.

Major research institutions have already migrated to cloud-based architectures. For example, the European Bioinformatics Institute uses Amazon Web Services’ Elastic Compute Cloud. Following increased demand, there are now numerous providers of cloud-based platforms: Rackspace, VMware, IBM, and Microsoft, among others. With the threat of slashed budgets for scientific research, these services are likely to become even more prominent in academia.

Overcoming (bioinformatics) supply chain issues

Despite advantages in data storage and analytic capacity, a major complexity remains: the development of toolkits and analytical workflows to carry out analyses. Let’s say a cancer biologist wants to investigate the genomic and transcriptomic signatures associated with pancreatic cancer. The researcher likely wants to automate a complete analysis, creating end-to-end bioinformatic processing and analysis to obtain meaningful results from raw data. Doing so requires multiple steps and the handling of diverse data formats. Suppose the researcher completed this herculean task by developing in-house software and a data management system. It would be an amazing feat, but how would it help a biologist studying, for example, colon cancer using a similar analysis for their experiment? This raises an issue of scale. Emailing codebases and describing workflows is a solution that works for a few people but not for many. However, platforms like GitHub offer developers a cloud-based distribution platform. Other distribution hubs like PyPI, Bioconda, and Bioconductor further help to disseminate software packages across the globe. User-friendly platforms like Galaxy, the CLC Workbench from Qiagen, and the console from LatchBio help researchers seamlessly stitch together software and more easily share workflows. Taken together, these advances make it easier for scientists to share their cloud-based work, leading to lower lab costs and a more accessible field of bioinformatics.

A bright future or dark days?

In the future, bioinformatics workflows will be available to academic and citizen scientists alike. With intuitively designed platforms, students in high school, or even elementary school, could conduct bioinformatic research. Imagine that: middle-grade science fairs could feature analysis of terabytes of data. That is amazing! For the readers skeptical of these claims, I urge you to consider the history of the microscope. The early days of microscopy required niche skillsets in lens manufacturing and engineering, making microscopes a rare commodity. Since then, microscope manufacturing has improved, lowering costs and allowing the masses to become microscopists. Case in point: a Stanford research group invented the Foldscope, a paper microscope that has a magnification of 140x and costs less than a dollar. Bioinformatics is in the midst of the same revolution. With the appropriate distribution of tools and access portals to cloud-based infrastructures, everyone in the world can become a bioinformatician. Widely accessible resources, however, will pose new challenges.

In summary, as bioinformatics transitions to cloud-based infrastructures, researchers will find themselves empowered and enabled to conduct experiments all across the globe. Without careful consideration of the major problems, bioinformatics will stagnate or fail to uphold the tenets of scientific rigor and integrity. However, careful consideration of these issues in bioinformatics will steer the ongoing revolution toward an exciting and productive era of cloud-based computing systems, broadening the accessibility of bioinformatics research. The future of bioinformatics research is in the cloud. And behind the clouds, the sun is shining.


Jacob L. Steenwyk is a post-doctoral fellow in the laboratory of Howard Hughes Medical Institute Investigator Dr. Nicole King at the University of California, Berkeley. He studies genome function and evolution in animals and fungi and develops software for the life sciences.

Kyle Giffin is the co-founder and COO of LatchBio, a cloud infrastructure platform used by biotech companies and labs across the world. Previously at Berkeley, he studied computational & cognitive neuroscience, data science, and entrepreneurship, before leaving school to start Latch.

]]>