Bioinformatics – Genes to Genomes
https://genestogenomes.org
A blog from the Genetics Society of America

Early Career Leadership Spotlight: Olufemi Osonowo
https://genestogenomes.org/early-career-leadership-spotlight-olufemi-osonowo/
Thu, 07 Nov 2024 18:00:00 +0000

We’re taking time to get to know the members of the GSA’s Early Career Scientist Committees. Join us to learn more about our early career scientist advocates.

Olufemi Adekunle Osonowo
Career Development Subcommittee
Dalhousie University

Research Interest

Metabolomics and genomics are two distinct but complementary approaches that offer valuable insights into the mechanisms underlying complex traits, such as feed efficiency in sheep. My current research, which involves sustainable livestock production and the application of bioinformatics and machine learning to livestock production, seeks to unlock those insights.

In addition, I seek to develop a standardized operating procedure for optimizing the feed intake test period, so that limited test station facilities are used more efficiently and more animals can be tested, which accelerates selection in sheep production. Through genomic signature selection, metabolomics and genomics together will make it possible to measure the metabolites linked with feed efficiency in sheep and to identify specific genetic biomarkers associated with the trait.

As a PhD-trained scientist, you have many career options. What interests you the most?

As an MSc student, I have multifaceted interests, encompassing both academic and applied aspects of science. My primary focus is sustainable livestock production, where I aim to improve efficiency and productivity while minimizing environmental impact. This interest aligns with the growing global demand for sustainable agricultural practices and the necessity to feed an increasing population.

One of the most intriguing areas for me is the application of bioinformatics and machine learning to livestock production. These cutting-edge technologies offer immense potential to revolutionize traditional agricultural practices. By analyzing large datasets, we can uncover patterns and insights that were previously inaccessible, leading to significant advancements in animal breeding, disease management, and overall farm management. For instance, genomics and metabolomics data can be used to identify biomarkers for disease resistance or superior production traits, enabling more precise and efficient breeding programs.

Machine-learning algorithms can predict and optimize various aspects of livestock management, from feed efficiency to animal health monitoring. The integration of sensor data, environmental factors, and historical performance records into predictive models can help farmers make informed decisions, ultimately leading to more sustainable and profitable operations.

In addition to the technical aspects, I am also passionate about the translational impact of my research. I believe that bridging the gap between scientific discoveries and practical applications is crucial for advancing the field. This connection involves collaborating with industry partners, policymakers, and other stakeholders to ensure that innovative solutions are effectively implemented and adopted.

Moreover, I am interested in the educational and mentorship aspects of my career. As a scientist, I feel a strong responsibility to contribute to the development of the next generation of researchers through activities such as teaching, supervising undergraduate students, and participating in outreach activities to promote scientific literacy and enthusiasm among young people.

I am driven by the potential to impact both the scientific community and the agricultural industry. My goal is to contribute to a future where agricultural practices are more efficient, sustainable, and capable of meeting global food demands while fostering scientific curiosity and innovation in others.

In addition to your research, how do you want to advance the scientific enterprise?

By bridging gaps between different fields, we can develop innovative solutions to complex problems. In my work, I actively seek collaborations with experts in bioinformatics, machine learning, veterinary medicine, and environmental science. This interdisciplinary approach not only enriches my research but also opens new avenues for discovery and application. I aim to foster a culture of collaboration in the scientific community, encouraging researchers to look beyond their disciplines and work together to tackle global challenges.

Furthermore, researchers must be able to convey their findings to diverse audiences, including policymakers, industry stakeholders, and the public. I am committed to improving my own communication skills and helping others do the same. This outreach involves not only publishing in scientific journals but also engaging industry partners, writing for popular science platforms, and participating in science communication workshops. By making scientific knowledge more accessible, we can inspire public interest in science and inform evidence-based decision-making.

In addition, different perspectives and experiences can lead to unique insights and innovative approaches. I am dedicated to promoting diversity in all its forms within the scientific enterprise—e.g., mentoring underrepresented students, advocating for inclusive policies, and participating in initiatives that support diversity in STEM fields. By creating an environment where everyone feels valued and supported, we can ensure that the best ideas and talents are brought to the forefront.

Advancing the scientific enterprise requires a multifaceted approach that goes beyond individual research endeavors. These initiatives not only enhance the quality and reach of scientific research but also ensure that science continues to serve society effectively.

As a leader within the Genetics Society of America, what do you hope to accomplish?

As a leader within GSA, I aim to foster innovation, promote inclusivity, enhance professional development, and advocate for science policy. By addressing these areas, I seek to strengthen the GSA community and make a meaningful impact on the field of genetics.

Innovation is at the heart of scientific progress. As a leader, I want to create an environment that encourages creative thinking and novel approaches to genetic research. By organizing symposiums, workshops, and conferences for collaborative brainstorming and interdisciplinary exchange, we can push forward the frontiers of genetic science.

Additionally, a diverse and inclusive community is essential for the health and vibrancy of any scientific organization. I am committed to promoting inclusivity within GSA by championing programs and initiatives that support underrepresented groups in genetics, such as mentorship programs, scholarships, and networking opportunities. By fostering a culture of inclusivity, we can ensure that all voices are heard and valued, leading to a richer and more dynamic scientific community.

Supporting the professional growth of GSA members is also a key priority. Planning professional development resources (including career workshops, training sessions, and mentorship programs) will help members at all career stages develop essential skills, navigate career transitions, and achieve their professional goals. Investing in the professional development of our members helps us cultivate the next generation of leaders in genetics.

I aim to advocate for policies that support funding for genetic research, promote science education, and ensure the ethical use of genetic information. Doing so involves engaging with policymakers, contributing to public discussions, and collaborating with other scientific organizations to amplify our voice. By advocating for supportive policies, we can create a favorable environment for genetic research and its beneficial impacts on society.

Overall, a strong and connected community is fundamental to GSA’s success. I will work to enhance member engagement and communication through regular updates, interactive platforms, and community-building events. By fostering a sense of belonging and shared purpose, we can strengthen the bonds within our society and create a supportive network for all members. We can make significant strides in advancing the field of genetics and addressing the complex challenges of our time.

Previous leadership experience

  • Communication Officer, Dalhousie Agricultural Association of Graduate Students, May 2024-Present
  • Globalink Mentor, Mitacs, April 2024-Present
  • President, National Youth Service Corps; Sustainable Development Goals (SDGs) Community Development Service, August 2019-July 2020
  • Intern (Team Lead), Community-Based Farming Scheme, September 2016-July 2017
  • Editor-in-Chief, The Source Magazine of Nigeria Association of Agricultural Students, Federal University of Agriculture, Abeokuta, November 2015-September 2016

GENETICS welcomes new associate editor Lei Sun
https://genestogenomes.org/genetics-welcomes-new-associate-editor-lei-sun/
Tue, 19 Sep 2023 17:00:39 +0000

A new associate editor is joining GENETICS in statistical genetics and genomics. We’re excited to welcome Lei Sun to the editorial team.

Lei Sun
Associate Editor
Lei Sun is a Professor in Statistics and Biostatistics at the University of Toronto. She studied mathematics at Fudan University and obtained her PhD in statistics from the University of Chicago in 2001. Her research is in statistical genetics and genomics, with a focus on robust association methods, multiple hypothesis testing, selective inference, and, more recently, methods for the X chromosome. In 2017, she received the prestigious Centre de recherches mathématiques-Statistical Society of Canada Prize in Statistics, and in 2020, she served as the President of the Biostatistics Section of the Statistical Society of Canada.

Early Career Leadership Spotlight: Jadson C. Santos (Jall)
https://genestogenomes.org/early-career-leadership-spotlight-jadson-c-santos-jall/
Tue, 27 Sep 2022 17:11:00 +0000

Jadson C. Santos (Jall)

Career Development Subcommittee

University of São Paulo

Research Interest

I have carried out research in various scientific areas—among them, human genetics, bioinformatics, structural biology of proteins, and molecular immunology. I’ve always been passionate about science, but the molecular world sparked my imagination and attracted me more than any other area.

Currently, as a third-year PhD student in genetics, I integrate computational and experimental methodologies to understand the impact of pathogenic mutations on the 3D structure of proteins important to the immune system. In parallel, as part of my MBA in project management, I conducted research on leadership and working in scientific teams to understand the main interpersonal challenges that those teams face in scientific projects.

As a PhD-trained scientist, you have many career options. What interests you the most?

As a scientist, my main interests are in transdisciplinary research, which integrates different areas of knowledge in the search for innovations and discoveries that can solve complex world challenges, such as biodiversity loss, species extinction, the climate crisis, education, water scarcity, and global health.

To this end, I find myself applying the transferable skills I’ve learned during my scientific journey—combined with the management and leadership skills I’ve gained over the past four years—to connect knowledge and people with a common purpose. More specifically, I’m interested in working in management positions of international scientific societies to increase the visibility of science and its social impact, as well as catalyze scientists’ potential to innovate and discover “new worlds” through well-designed and well-executed projects.

Additionally, I am deeply interested in work that involves the career development of scientists and early career professionals. Therefore, since 2020, I have been mentoring undergraduate and graduate students on skills and career development in my country. This activity is a service of great social value and brings me immense satisfaction in knowing that I am directly contributing to the lives and careers of other scientists along my journey.

As a project consultant and trainer in project management, leadership, and communication, I aim to develop professional activities for scientists and research groups around the world. I am deeply fascinated by the academic/scientific environment. In my career vision, I will have the opportunity to visit different research groups and universities around the world, witnessing firsthand the places where knowledge arises while contributing to this process throughout my career. In short, I see myself as a scientist working to create the project, management, and leadership structures that can catalyze the results of scientists and generate impact beyond universities and research institutes. Science plays a central role in the development of the world and being involved in this development inspires me to do my best daily.

In addition to your research, how do you want to advance the scientific enterprise?

The collaborative nature of my PhD research made it clear to me that we need to continuously improve our interpersonal and intercultural skills. In most scientific and technical fields, more than 90 percent of research studies and publications are collaborative, which makes collaboration skills a prerequisite for scientists. The increasing internationalization of scientific research makes such skills even more crucial.

In recent years, I’ve focused on training that can enhance my management and leadership skills to make a solid contribution to science by helping scientists strengthen their collaborations. This investment in learning outside academia was crucial to my understanding of the complexity of the challenges we face not only as scientists but also as individuals with different cultures, values, and life/career goals.

My broader career goal is to contribute to the creation of a more collaborative and productive scientific culture. Such a challenge requires a broad integration between science and other areas of knowledge. Likewise, it is essential to understand the dynamics of research teams and groups—an understanding that is facilitated when we live in this scientific environment. For this reason, my scientific journey forms the basis of my career, as it allows me to deeply understand the day-to-day challenges that scientists face in their research. I am also developing my collaborative knowledge and skills by writing a newsletter on leadership and collaboration in the research environment (with 8,000 subscribers, mostly graduate students and postdocs) and managing a community of more than 900 scientists and professionals interested in collaboration in life sciences. Being part of GSA’s Early Career Leadership Program is therefore a great opportunity for fostering a collaborative environment and improving my skills in this area.

As a leader within the Genetics Society of America, what do you hope to accomplish?

Before officially joining the program, I was already collaborating with GSA. In 2021, I was an organizer and moderator of the Portuguese Multilingual Seminar Series, along with two other Brazilian partners. At another scientific event, I hosted a virtual room for Portuguese-speaking scientists to integrate them into the event via their native language, thereby strengthening networking.

As co-chair of the Career Development Subcommittee, I look forward to continuing to learn from my partners inside and outside the subcommittee. Additionally, I intend to bring to our projects a vision from beyond academia that improves existing processes to better support the professional development of the scientific community.

The events that I have already organized together with the subcommittee members have proven relevant to the scientific community, especially early career scientists. I often receive positive feedback from my professional connections, informing me how crucial our content was to their lives and careers. This positive impact on the community motivates me to continue improving my ability to create value through my activities at GSA.

In the long term, I intend to broaden my experience in management and leadership in a multicultural environment and establish long-lasting collaborations with my Early Career Leadership Program partners. These long-term collaborations will be essential, allowing me to continue learning, engaging with the GSA community, and generating value for early career scientists and society.

Previous leadership experience

  • Founder and Mentor for Career Development, SSK Mentoring, 2020 – Present
  • Community Manager, Leadership and Collaboration in Science (Virtual Community), 2021 – Present
  • Advisor, Mendeley Community, 2020 – 2021
  • Tutor, the Virtual University of São Paulo, 2019 – 2020
  • Expert Volunteer, Science Buddies Ask an Expert Program, 2018 – 2019

You can contact Jadson C. Santos (Jall) on LinkedIn, Instagram, or Twitter. His newsletter is also available on LinkedIn.

How bioinformatics can help fill the therapeutic drug pipeline
https://genestogenomes.org/how-bioinformatics-can-help-fill-the-therapeutic-drug-pipeline/
Thu, 18 Jun 2020 17:24:20 +0000

Written by members of the GSA Early Career Scientist Communication and Outreach Subcommittee: Angel F. Cisneros Caballero, Université Laval; Adelita Mendoza, PhD, Washington University; Narjes Alfuraiji, University of Manchester; Anna Bajur, Max Planck Institute of Molecular Cell Biology and Genetics


During the current global pandemic, public attention is increasingly falling on the process of drug discovery and development. How exactly do we find new treatments? And what does it take to bring them to the clinic? One powerful tool in this process that often escapes notice is bioinformatics—the use of computational resources to answer biological questions.

Exponential increases in computational power have revolutionized the way we do science. Over time, this has created entirely new fields of research, since we can now analyze more data efficiently and explore more complex algorithms and models [1]. Bioinformatics is one of the fields made possible by this technological achievement, and it has been critical for many recent scientific advances [2].

Bioinformatics comprises two interdisciplinary sub-fields that interface with computer science, mathematics, and biology. One is the research and development that scientists need to build the models modern biology requires. The other is computational biology, which is dedicated to answering basic biological questions.

Bioinformatics is not just an academic field; it has many clinical applications. For example, we now have the technology to sequence genomes and identify genes involved in diseases, such as cancers. However, we can only do it accurately by looking at short segments at a time. Sequencing an organism’s genome becomes like a giant puzzle with thousands of pieces, and only bioinformatic methods allow us to assemble the pieces. 
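The puzzle analogy can be made concrete with a minimal sketch. The Python below assumes a toy set of error-free reads (made-up sequences) and uses a simple greedy strategy that repeatedly joins the two pieces with the longest overlap. The function names `overlap` and `greedy_assemble` are illustrative only; real assemblers use far more sophisticated methods and must cope with sequencing errors and repeats.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads that share the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # no overlaps left, so the remaining pieces cannot be joined
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return max(reads, key=len)


# Hypothetical short reads cut from one longer sequence.
reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
contig = greedy_assemble(reads)
print(contig)  # ATGGCGTACCTGA
```

Each read overlaps its neighbor by four bases, so the three pieces merge into a single 13-base sequence.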

Bioinformatics can also be used to guide drug design experiments and maximize the chances of finding active molecules. This new knowledge can eventually be used to develop therapies and vaccines to save human lives. Here, we will look at some examples of how we can use bioinformatics to discover molecular signposts for particular biological processes. These signs are known as biomarkers, and they are important in all types of clinical research. We will then take a closer look at how bioinformatics can use this information to come up with an application, such as a drug.  

Biomarkers of regeneration

Humans do not have the ability to regenerate limbs after amputation, but certain animals have this extraordinary ability, including planarian flatworms and axolotls. To understand these strong regenerative capabilities, scientists study fruit flies, flatworms, axolotls, and zebrafish. These species are powerful model systems to study tissue regeneration after amputation or damage. As in most biological fields, modern-day bioinformatics techniques are playing a key role in understanding how the genome responds to injury. 

Regeneration requires a real-time genomic response, which can be studied by looking at which genes are activated or repressed in individual cells with single-cell RNA sequencing. A recent study from Fincher et al. identified flatworm genes that were active after injury by analyzing all messenger RNA (the transcriptome) of individual lineage precursor cells with Drop-seq. This technique isolates single cells in droplets so that they can be separately analyzed and compared. This method is so powerful that researchers were able to detect the transcriptome from cell types with frequencies as low as ~10 cells per animal [3].

Bioinformatic analyses allowed the cells to be clustered by gene expression groups in different tissue types, which then allowed researchers to build an atlas of genes expressed in the transcriptome after injury. 
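As a rough illustration of what clustering cells by gene expression means, the sketch below runs a bare-bones k-means on a made-up expression table. The gene names, the counts, and the choice of k-means are all assumptions for illustration; this is not the analysis pipeline used in the study.

```python
def kmeans(points, k, iterations=10):
    """Tiny k-means: returns a cluster label for each point."""
    # Deterministic start: use the first k points as initial centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every cell to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Move each centroid to the mean of the cells assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical expression values per cell for two marker genes: (geneA, geneB).
cells = [(9, 1), (1, 8), (8, 2), (2, 9), (10, 0)]
labels = kmeans(cells, k=2)
print(labels)  # [0, 1, 0, 1, 0]
```

Cells with high geneA expression end up in one group and cells with high geneB expression in the other, which is the basic idea behind grouping cells into putative types.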

In another example, Vizcaya-Molina et al. identified novel enhancers that regulate gene activation during different phases of recovery from injury in developing fruit flies. The researchers looked for accessible regions in the DNA (which are associated with higher gene activation) using a technique called ATAC sequencing. They confirmed that some regions of the transcriptome changed in response to injury, and they then wanted to know if those genes had common functions. With the help of bioinformatic databases, they found that many of those genes belonged to signaling pathways involved in cell growth and differentiation [4].
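One common way to ask whether a gene list shares a function is an over-representation test. The sketch below uses a hypergeometric tail probability with invented numbers; it is a generic illustration and is not claimed to be the statistic the authors used.

```python
from math import comb


def enrichment_p_value(genome_size, pathway_size, hits, hits_in_pathway):
    """Probability of seeing at least `hits_in_pathway` pathway genes
    when `hits` genes are drawn at random from the genome."""
    total = comb(genome_size, hits)
    upper = min(pathway_size, hits)
    tail = sum(
        comb(pathway_size, k) * comb(genome_size - pathway_size, hits - k)
        for k in range(hits_in_pathway, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20 genes in total, 5 belong to a signaling pathway,
# and 3 of the 4 injury-responsive genes fall in that pathway.
p = enrichment_p_value(genome_size=20, pathway_size=5, hits=4, hits_in_pathway=3)
print(round(p, 4))  # 0.032
```

A small probability suggests the overlap with the pathway is unlikely to be due to chance alone. In practice, many pathways are tested at once, so the results also need a multiple-testing correction.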

A study by Goldman et al. uncovered the genetic regulatory program that responds to injured cardiomyocytes in zebrafish. Inaccessible regions of DNA are tightly wrapped around proteins called histones. The researchers profiled a replacement histone known as H3.3, which indicates transcriptional accessibility, to uncover gene regulatory elements involved in heart regeneration. This method allowed them to identify genes that were upregulated in response to injury. Later, during cardiomyocyte regeneration, they found an enrichment of enhancer elements that were “open” for transcription and then identified the specific sequence involved during regeneration [5].

These examples show how bioinformatics helps to unlock the mysteries of the genes that regulate regeneration after injury. Bioinformatics techniques can monitor the real-time genomic response in individual cells and probe accessible regions of the DNA in several organisms that are capable of regeneration. The greater computational power now available to bioinformatics will allow scientists to ask new questions that are important to the field of regeneration.

Biomarkers of virulence factors

Bioinformatic tools are also important in finding biomarkers of infectious disease virulence, which can be appealing candidates for drugs. For instance, we can look for specific genes that drive the pathogenicity of a given microorganism, such as yeast. To do this, we can design strains that lack particular genes and evaluate if this makes them less pathogenic. Testing a large number of yeast strains is typically performed using competitive growth methodologies [6]. For example, Han et al. evaluated growth of each mutant strain under controlled conditions of direct competition with other mutants, thus reducing the time and cost associated with screening each one individually. This enabled screening of a large number of strains to identify a drug target.

Functional genomics has been used to identify drug targets in the pathogenic fungus Candida albicans with the C. albicans fitness test (CaFT). In this test, each isolate is assigned a unique identifier (barcode) that can be tracked computationally to reveal differences in fitness among heterozygote isolates. This enabled the researchers to screen for loss of gene function in the presence of antifungal agents, from which they identified the mechanism of action of novel compounds [7].

Competitive fitness profiling was also used to evaluate the relative fitness of large pools of Aspergillus fumigatus mutants, using a non-genetically barcoded library of mutants, to identify those that are involved in virulence [8]. As a result, the researchers reduced the total number of animals that are usually required to perform virulence screening. Tn-Seq is another technique, used to assess the contribution of genes to fitness in Streptococcus pneumoniae. However, instead of deleting the gene, Tn-Seq inserts additional DNA within the gene [9].

Similarly, changes in mutant frequency can be used to compare the fitness of the different mutants. By looking at which mutants grow most poorly, scientists can identify which genes are the most essential and consider them as potential drug targets. This is of particular interest in drug discovery programs, since identifying the genes that are responsible for or involved in pathogenicity is crucial to designing a novel therapy.
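The idea of comparing mutant frequencies can be sketched in a few lines. The example below assumes each sequencing read has already been assigned to a strain (by barcode or insertion site) and scores each strain by the log2 change in its share of the pool. The strain names, the counts, and the pseudocount are illustrative assumptions rather than any published pipeline.

```python
from collections import Counter
from math import log2


def relative_fitness(before, after, pseudocount=1):
    """log2 change in each strain's share of the pool between two time points."""
    c0, c1 = Counter(before), Counter(after)
    strains = set(c0) | set(c1)
    # The pseudocount avoids dividing by zero when a strain disappears.
    n0 = len(before) + pseudocount * len(strains)
    n1 = len(after) + pseudocount * len(strains)
    return {
        s: log2(((c1[s] + pseudocount) / n1) / ((c0[s] + pseudocount) / n0))
        for s in strains
    }


# Hypothetical read assignments: three mutants start at equal abundance.
before = ["mutA"] * 10 + ["mutB"] * 10 + ["mutC"] * 10
after = ["mutA"] * 19 + ["mutB"] * 9 + ["mutC"] * 2

scores = relative_fitness(before, after)
least_fit = min(scores, key=scores.get)
print(least_fit)  # mutC
```

Strains with strongly negative scores dropped out of the pool during competition, which marks the disrupted gene as a candidate for follow-up.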

Drug design

Once we have found the optimal drug target, we can turn to bioinformatics again to help us find a drug for it. A classic approach is to generate millions of molecules experimentally, test them, and register the ones that have an effect. However, this method is very time-consuming and resource-intensive, while the number of effective molecules can be low. Instead, we can use our models of molecular interactions to test molecules computationally and only test experimentally the ones that are predicted to be effective. This allows us to narrow down the set of molecules to test in an experiment while maximizing the chance of success. Indeed, Doman et al. showed that computational tests increase the efficiency of these experiments. When they screened a large library of molecules, only 0.02% of their tests were positive. However, when they used a computational analysis to evaluate only the ones predicted to be effective, 35% of their tests were positive [10]. Thus, virtual screening saves a considerable amount of time and money by reducing the number of assays while yielding a higher hit rate. In fact, there are several examples of drugs found through computational screening that have been approved by the FDA. These include dorzolamide to treat glaucoma, captopril to treat hypertension, and saquinavir to treat HIV [11]. Moreover, these approaches are being used in the context of the current COVID-19 pandemic to find potential new treatments.
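To put the two hit rates side by side, the short calculation below uses only the percentages quoted above. The variable names are ours, and "assays per hit" is simply the reciprocal of the hit rate.

```python
# Hit rates quoted in the text (Doman et al.).
hts_hit_rate = 0.02 / 100     # untargeted screen: 0.02% of tests positive
docking_hit_rate = 35 / 100   # computationally pre-selected molecules: 35% positive

# How many times more likely a pre-selected molecule is to be a hit.
fold_enrichment = docking_hit_rate / hts_hit_rate

# Average number of assays needed to find one active molecule.
assays_per_hit_hts = 1 / hts_hit_rate
assays_per_hit_docking = 1 / docking_hit_rate

print(f"fold enrichment: about {fold_enrichment:.0f}x")
print(f"assays per hit, untargeted screen: about {assays_per_hit_hts:.0f}")
print(f"assays per hit, after virtual screening: about {assays_per_hit_docking:.1f}")
```

With these figures, pre-selection raises the hit rate roughly 1,750-fold: about one hit in every three assays instead of one in every 5,000.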

All potential drugs should be subjected to multiple stages of evaluation to assess their safety, first in preclinical tests with model organisms and then in clinical studies in humans. Despite the promise of computational methods to help identify active molecules, most candidates fail to pass these clinical studies because of unwanted side-effects. Thus, one of the newest endeavors in the field is the use of machine learning to add predictions on how likely a given molecule is to be toxic. Machine learning is a series of tools that find trends in known data to predict the results of future observations [12].

Currently, these methods look at databases of molecules to extract their physical properties and health concerns associated with them. Then, they build models that link those properties to health concerns to derive general rules. These approaches have been very successful, with some models being able to identify toxic compounds with up to 95% accuracy.
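The general recipe (learn from molecules with known outcomes, then predict for new ones) can be shown with a deliberately simple model. The sketch below is a toy nearest-centroid classifier on two invented, pre-scaled molecular descriptors. The data, labels, and model choice are all assumptions for illustration and do not reproduce the methods or accuracy figures cited here.

```python
def centroid(rows):
    """Mean of each descriptor across a group of molecules."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def train(examples):
    """examples: list of (descriptors, label). Returns one centroid per label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(rows) for label, rows in by_label.items()}


def predict(model, features):
    """Assign the label whose centroid is closest to the molecule."""
    def sq_dist(c):
        return sum((x - y) ** 2 for x, y in zip(features, c))
    return min(model, key=lambda label: sq_dist(model[label]))


def accuracy(model, examples):
    return sum(predict(model, f) == y for f, y in examples) / len(examples)


# Hypothetical molecules described by two scaled properties (values in 0..1).
known = [
    ((0.9, 0.8), "toxic"), ((0.8, 0.9), "toxic"), ((0.7, 0.7), "toxic"),
    ((0.1, 0.2), "safe"), ((0.2, 0.1), "safe"), ((0.3, 0.3), "safe"),
]
held_out = [
    ((0.85, 0.75), "toxic"), ((0.15, 0.25), "safe"),
    ((0.6, 0.6), "toxic"), ((0.45, 0.4), "toxic"),
]

model = train(known)
held_out_accuracy = accuracy(model, held_out)
print(held_out_accuracy)  # 0.75
```

Accuracy is measured on molecules the model has not seen. Here the borderline molecule is misclassified, a reminder that reported accuracies depend heavily on the data used for evaluation.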

Gaining access to greater computational power has allowed us to pursue new questions and develop further techniques to address them. This has had a notable impact on diverse fields, from basic science to applications in the clinic. The future of bioinformatics will certainly be exciting, as it will likely produce more and more results that have an impact on our daily lives.

 

References:

  1. Edgar, T. W. & Manz, D. O. Research Methods for Cyber Security. (Syngress, 2017).
  2. Gauthier, J., Vincent, A. T., Charette, S. J. & Derome, N. A brief history of bioinformatics. Brief. Bioinform. (2018). doi:10.1093/bib/bby063
  3. Fincher, C. T., Wurtzel, O., de Hoog, T., Kravarik, K. M. & Reddien, P. W. Cell type transcriptome atlas for the planarian Schmidtea mediterranea. Science 360, (2018).
  4. Vizcaya-Molina, E. et al. Damage-responsive elements in Drosophila regeneration. Genome Research 28, 1852–1866 (2018).
  5. Goldman, J. A. et al. Resolving Heart Regeneration by Replacement Histone Profiling. Dev. Cell 40, 392–404.e5 (2017).
  6. Han, T. X., Xu, X.-Y., Zhang, M.-J., Peng, X. & Du, L.-L. Global fitness profiling of fission yeast deletion strains by barcode sequencing. Genome Biol. 11, R60 (2010).
  7. Xu, D. et al. Genome-wide fitness test and mechanism-of-action studies of inhibitory compounds in Candida albicans. PLoS Pathog. 3, e92 (2007).
  8. Macdonald, D. et al. Inducible Cell Fusion Permits Use of Competitive Fitness Profiling in the Human Pathogenic Fungus Aspergillus fumigatus. Antimicrob. Agents Chemother. 63, (2019).
  9. Solaimanpour, S., Sarmiento, F. & Mrázek, J. Tn-seq explorer: a tool for analysis of high-throughput sequencing data of transposon mutant libraries. PLoS One 10, e0126070 (2015).
  10. Doman, T. N. et al. Molecular docking and high-throughput screening for novel inhibitors of protein tyrosine phosphatase-1B. J. Med. Chem. 45, 2213–2221 (2002).
  11. Sliwoski, G., Kothiwale, S., Meiler, J. & Lowe, E. W. Computational Methods in Drug Discovery. Pharmacol. Rev. 66, 334–395 (2014).
  12. Yang, H., Sun, L., Li, W., Liu, G. & Tang, Y. In Silico Prediction of Chemical Toxicity for Drug Design Using Machine Learning Methods and Structural Alerts. Front Chem 6, 30 (2018).

 


Finding fresh mutations
https://genestogenomes.org/finding-fresh-mutations/
Thu, 06 Jun 2019 12:00:33 +0000

Improved duplex sequencing identifies spontaneous mutations in bacteria without long-term culturing.


Spontaneous mutations are the driving force of evolution, yet our ability to detect and study them can be limited to mutations that accumulate clonally. Sequencing technology often cannot identify very rare variants or discriminate between bona fide mutations and errors introduced during sample preparation. In GENETICS, Zhang et al. created an improved sequencing method to study low-abundance spontaneous mutations in the bacterium Escherichia coli.

To develop their method, the authors began with duplex sequencing, in which fragmented DNA molecules are tagged with adaptor sequences before sequencing. This method is powerful, but at high read depths it can mistake reads from distinct DNA molecules for PCR duplicates, so true mutations can be missed, making it ill-suited for finding rare mutations.

The authors first determined the error rate of the PCR step of duplex sequencing, where most experimental artifacts would be expected to occur. Because duplex sequencing can identify reads that came from the same parental DNA molecules (based on the adaptor sequences), the authors assumed that any such reads that had mismatches must have come from base changes during the PCR. By identifying these discrepancies, they determined the rates of different kinds of errors in the sequencing process.

The authors then sequenced E. coli genomes using a new method, which they termed improved duplex sequencing (IDS). IDS is similar to duplex sequencing, but it uses adaptor sequences of multiple different lengths. The use of more and different adaptor sequences minimizes the chance that two different DNA molecules that happen to break at the same place will be erroneously called as PCR duplicates. By employing this method and accounting for the error rate of the PCR step, which they had already determined, the authors were able to confidently identify rare, random mutations in E. coli.
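
The family-based error estimate described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the reads are toy data, the `(tag, start, sequence)` layout, the function name `estimate_pcr_error_rate`, and the majority-vote consensus are all hypothetical choices, not the authors' pipeline.

```python
from collections import Counter, defaultdict

def estimate_pcr_error_rate(reads):
    """Estimate a per-base error rate from reads grouped into families.

    reads: iterable of (tag, start, sequence) tuples (hypothetical layout).
    Reads sharing a tag and a start position are treated as copies of the
    same parental molecule, so any base that disagrees with the family
    consensus is counted as an error introduced after tagging.
    """
    families = defaultdict(list)
    for tag, start, seq in reads:
        families[(tag, start)].append(seq)

    mismatches = 0
    bases = 0
    for seqs in families.values():
        if len(seqs) < 2:
            continue  # a lone read cannot reveal within-family disagreement
        # zip(*seqs) walks the family column by column (equal-length reads assumed)
        for column in zip(*seqs):
            _, consensus_count = Counter(column).most_common(1)[0]
            mismatches += len(column) - consensus_count
            bases += len(column)
    return mismatches / bases if bases else 0.0

# Toy reads, invented for illustration only.
toy_reads = [
    ("AAC", 100, "ACGTACGT"),
    ("AAC", 100, "ACGTACGT"),
    ("AAC", 100, "ACGTTCGT"),  # one base differs from the rest of its family
    ("GGT", 100, "ACGTACGA"),  # same start, different tag: a separate family
    ("GGT", 100, "ACGTACGA"),
]
rate = estimate_pcr_error_rate(toy_reads)
print(f"estimated error rate: {rate:.4f}")
```

In the toy data, the two families start at the same position but carry different tags, so they are kept apart instead of being collapsed into one set of duplicates. That is the situation a larger and more varied set of adaptors is meant to handle.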

Having identified such mutations, the authors looked for patterns. They found that clusters of mutations occurred in regions of the genome that are known to be replication fork stopping regions. This is suggestive of replication errors, as would be expected for spontaneous mutations. Interestingly, mutations in these hotspots were almost entirely in relatively unimportant regions of the genome, for instance in the non-functional parts of tRNA genes. These vulnerable areas of the genome hint at mechanisms in E. coli that may protect more critical regions from damage.

CITATION:

Spatial Vulnerabilities of the Escherichia coli Genome to Spontaneous Mutations Revealed with Improved Duplex Sequencing

Xiaolong Zhang, Xuehong Zhang, Xia Zhang, Yuwei Liao, Luyao Song, Qingzheng Zhang, Peiying Li, Jichao Tian, Yanyan Shao, Aisha Mohammed AI-Dherasi, Yulong Li, Ruimei Liu, Tao Chen, Xiaodi Deng, Yu Zhang, Dekang Lv, Jie Zhao, Jun Chen, Zhiguang Li

Genetics October 2018 210: 547-558; https://doi.org/10.1534/genetics.118.301345

https://www.genetics.org/content/210/2/547

]]>
Does Candida grow on trees? https://genestogenomes.org/does-candida-grow-on-trees/ Mon, 04 Feb 2019 13:05:54 +0000 https://genestogenomes.org/?p=33000 An opportunistic human pathogen makes itself at home on old oaks. At one point or another, most people have played host to the common yeast Candida albicans. Around 40-60% of healthy adults carry around it in their mouth or guts; in immunocompromised people, however, this normally harmless cohabitant becomes a deadly pathogen. Generally thought to…]]>

An opportunistic human pathogen makes itself at home on old oaks.


At one point or another, most people have played host to the common yeast Candida albicans. Around 40-60% of healthy adults carry it in their mouths or guts; in immunocompromised people, however, this normally harmless cohabitant can become a deadly pathogen. Generally thought to grow only in warm-blooded animals, C. albicans has occasionally been isolated from plants: from blades of grass in a New Zealand pasture to gorse and myrtle plants on a sheep-grazed hill in Portugal to an African tulip tree in the Cook Islands. Are these just cases of misplaced yeast, or can C. albicans really thrive outside a warm body? In a report in GENETICS, Bensasson et al. describe the genomes of three C. albicans strains isolated from the bark of oak trees in an ancient wood pasture, providing genetic evidence that this yeast can live on plants for extended periods of time.

A survey of budding yeast on oaks in Europe turned up three new strains of C. albicans; they were found only on some of the oldest trees. After ensuring that the new strains were indeed new tree-based isolates (and not merely laboratory contaminants), the authors conducted a phenotypic investigation. All three strains showed most of the standard traits of C. albicans, including the ability to grow at the elevated temperatures expected in a mammal; however, they were not identical. One of the strains was not as salt tolerant as the others, would not grow on soluble starch, and switched to a different growth form under particular nutritional conditions.

The authors next sequenced the genomes of the new strains; this was the first time C. albicans from a non-animal source had been sequenced. Genomic analysis showed that the three strains had diverged considerably from each other. The authors then compared the new sequences with those of over 200 strains previously isolated from humans and other animals to create a phylogenetic tree. Interestingly, each of the three tree strains showed more similarity to clinical strains than to the other tree strains.

The authors also analyzed the levels of heterozygosity—a measure of genetic variation—within the tree strains and found that these strains are more heterozygous than typical clinically isolated strains, which suggests that life on trees subjects the yeast to different selection or mutation pressures than life in humans. Higher heterozygosity could be a result of yeast evolving in conditions where they have to reproduce asexually, which would make mutations more likely to accumulate, thus increasing allelic variation. This difference also supports the idea that these yeast grow in the wild, rather than being recent emigrants from a warm-blooded host.
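
As a rough illustration of what a heterozygosity comparison involves, the sketch below computes the fraction of diploid sites whose two alleles differ. The genotype lists are invented toy data and the function name is hypothetical; the authors' actual analysis works on whole-genome variant calls and is not reproduced here.

```python
def heterozygosity(genotypes):
    """Return the fraction of called diploid sites whose two alleles differ.

    genotypes: list of (allele_1, allele_2) pairs; a None allele marks a
    site without a confident call, which is skipped.
    """
    called = [(a, b) for a, b in genotypes if a is not None and b is not None]
    if not called:
        return 0.0
    return sum(a != b for a, b in called) / len(called)

# Toy genotypes, invented for illustration only.
tree_like = [("A", "G"), ("C", "C"), ("T", "C"), ("G", "G"), ("A", "T")]
clinical_like = [("A", "A"), ("C", "C"), ("T", "C"), ("G", "G"), ("A", "A")]

print("tree-like strain:    ", heterozygosity(tree_like))
print("clinical-like strain:", heterozygosity(clinical_like))
```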

These findings may have implications beyond the shady groves of the New Forest; understanding the wild life of C. albicans could shed light on the evolution and lifestyle of the yeast found in humans and help us better understand how virulent strains emerge and damage human health.

CITATION:

Diverse Lineages of Candida albicans Live on Old Oaks

Douda Bensasson, Jo Dicks, John M. Ludwig, Christopher J. Bond, Adam Elliston, Ian N. Roberts, Stephen A. James

Genetics January 2019 211: 277-288; https://doi.org/10.1534/genetics.118.301482

http://www.genetics.org/content/211/1/277

 

]]>
From sequence to centimeters: predicting height from genomes https://genestogenomes.org/from-sequence-to-centimeters-predicting-height-from-genomes/ Thu, 08 Nov 2018 14:51:09 +0000 https://genestogenomes.org/?p=27780 Machine learning and access to ever-expanding databases improves genomic prediction of human traits. In theory, a scientist could predict your height using just your genome sequence. In practice, though, this is still the stuff of science fiction. It’s not only your genes that affect height—environment also plays a role—but the larger problem is that height…]]>

Machine learning and access to ever-expanding databases improve genomic prediction of human traits.


In theory, a scientist could predict your height using just your genome sequence. In practice, though, this is still the stuff of science fiction. It’s not only your genes that affect height—environment also plays a role—but the larger problem is that height is affected by tens of thousands of individual genetic variations. This is also true of other complex traits, such as susceptibility to particular diseases. To get closer to accurate genomic prediction of human traits, geneticists are using new approaches to harness the vast amounts of sequence data becoming available. In GENETICS, Lello et al. describe a machine learning approach to the problem that allowed them to make predictions within a few centimeters of reality.

“To me, genomic prediction is the actual decoding of the genome,” says senior author Stephen Hsu from Michigan State University. A theoretical physicist by training, Hsu explains that his lab became interested in the problem of genomic prediction several years ago as the cost of genotyping continued to drop and more datasets became available. They had previously argued that they could predict complex traits, like height, if they only had enough data. The release of nearly 500,000 UK Biobank genotypes gave them an opportunity to test this hypothesis.

A genomic prediction approach is quite different from the more familiar genome-wide association study (GWAS). GWAS methods test each SNP one at a time, looking for statistically significant contributions to the phenotype. In contrast, genomic prediction makes use of all SNPs at once in trying to build the best possible predictors.

The authors took the Biobank genotype and phenotype data and used a type of regression to identify the combination of SNPs that, taken together, best correlates with the trait of interest. Since only a subset of SNPs influences each trait (even the thousands of loci that control height are only a tiny fraction of the total number of SNPs identified), they also introduced a penalization factor that prevents the model from including unneeded SNPs. They were essentially trying to solve an optimization problem: identify the smallest number of variables (i.e., SNPs) that allows for the best prediction of the outcome (i.e., the trait).
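
The optimization problem described above can be sketched with a penalized regression on simulated data. The sketch assumes an L1 (lasso-style) penalty, which is one common way to keep unneeded variables out of a model; the genotypes, effect sizes, penalty value, and function names are all made up for illustration and do not reproduce the authors' data or settings.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink value toward zero; anything within the threshold becomes 0."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def lasso_coordinate_descent(X, y, penalty, n_iter=100):
    """Minimise (1/2n)*||y - X b||^2 + penalty*||b||_1 by coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    residual = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Add back SNP j's current contribution, then refit it alone.
            residual += X[:, j] * beta[j]
            rho = X[:, j] @ residual / n
            beta[j] = soft_threshold(rho, penalty) / col_sq[j]
            residual -= X[:, j] * beta[j]
    return beta

# Simulated data (an assumption): 400 people, 200 SNPs, 5 of them causal.
rng = np.random.default_rng(0)
n_people, n_snps = 400, 200
genotypes = rng.binomial(2, 0.3, size=(n_people, n_snps)).astype(float)
X = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)

causal = [3, 50, 120, 150, 199]
true_beta = np.zeros(n_snps)
true_beta[causal] = [1.0, -0.8, 0.6, 0.9, -0.7]
y = X @ true_beta + rng.normal(0.0, 1.0, size=n_people)
y = y - y.mean()

beta = lasso_coordinate_descent(X, y, penalty=0.25)
selected = np.flatnonzero(beta)
correlation = np.corrcoef(X @ beta, y)[0, 1]
print("SNPs kept by the model:", len(selected), "of", n_snps)
print("correlation between predicted and simulated trait:", round(correlation, 2))
```

Raising the penalty drops more SNPs from the model, and lowering it lets more in. In this sketch the trade-off between sparsity and predictive accuracy is controlled by that single number.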

Having generated their algorithm, the authors then put it to the test. They constructed models for height, heel bone density, and educational attainment, and they found that their algorithm worked well, particularly for height: it produced a correlation of nearly 0.65 with actual height, and predicted heights were usually within a few centimeters of actual heights. “Our predictor actually captures almost all the heritability that we could expect to find,” says Hsu.

With enough data, Hsu believes, accurate genomic prediction for complex traits will no longer be sci-fi. As more and more genotypes are obtained, Hsu predicts that this kind of prediction could be applied to most traits in as little as five years.

CITATION:

Accurate Genomic Prediction of Human Height

Louis Lello, Steven G. Avery, Laurent Tellier, Ana I. Vazquez, Gustavo de los Campos, Stephen D. H. Hsu

Genetics October 2018 210: 477-497; https://doi.org/10.1534/genetics.118.301267

http://www.genetics.org/content/210/2/477

]]>
Nanopore sequencing of 15 Drosophila genomes https://genestogenomes.org/nanopore-sequencing-of-15-drosophila-genomes/ Wed, 03 Oct 2018 14:30:12 +0000 https://genestogenomes.org/?p=24514 Low-cost sequencing closes gaps in fly genomes. Genetic sequencing technologies have revolutionized biological science, and regular advances in these tools continue to deliver better genomic data—more accurate and more useful—at a lower cost. In G3: Genes|Genomes|Genetics, Miller et al. report the genomes of 15 Drosophila species sequenced using Oxford Nanopore technology. Their work improves on…]]>

Low-cost sequencing closes gaps in fly genomes.


Genetic sequencing technologies have revolutionized biological science, and regular advances in these tools continue to deliver better genomic data—more accurate and more useful—at a lower cost. In G3: Genes|Genomes|Genetics, Miller et al. report the genomes of 15 Drosophila species sequenced using Oxford Nanopore technology. Their work improves on prior assemblies and describes how this technology can be feasibly applied in other labs.

Nanopore sequencing is an example of “third-generation” or “long-read” sequencing technology. In contrast to “next-generation” sequencing, which typically generates reads of a few hundred base pairs in length, the Nanopore approach can produce reads of several kilobases. This allows for better coverage and deeper sequencing, but it can also make the sequencing process more error-prone.

The authors used the Oxford Nanopore MinION to sequence 15 Drosophila species, all but one of which had been previously sequenced. They also resequenced the genome of Drosophila melanogaster and published their results in a separate report. When compared against reference sequences, their sequences captured a respectable amount of the published genomes (about 83% of the total sequences, on average). To correct for sequencing errors, they employed the polishing algorithms Racon and Pilon, which correct the genome sequences using Nanopore reads or Illumina reads, respectively. The polishing algorithms significantly increased assembly quality without altering other assembly statistics.

Because Nanopore sequencing produces longer reads, the authors wondered whether their data might be able to close gaps in existing reference sequences that were generated by short-read technology. By aligning short contigs from the reference genomes to their assemblies, they were able to fill ~61% of gaps in the reference genomes, demonstrating how the combination of newer and older technologies can increase the accuracy of genome builds.

The authors also describe how Nanopore technology can be readily applied in a variety of labs, offering advice on sequencing and bioinformatics protocols. They found that using 1-10 µg of input DNA yielded better results than the manufacturer-recommended 400 ng and that the de novo assembler miniasm used fewer computational resources than alternatives but produced comparable assemblies. Excitingly, the material cost of sequencing the reported Drosophila genomes was about $1000 USD, meaning that genome sequencing via Oxford Nanopore is likely feasible for labs of all sizes.

CITATION:

Highly Contiguous Genome Assemblies of 15 Drosophila Species Generated Using Nanopore Sequencing

Danny E. Miller, Cynthia Staber, Julia Zeitlinger, R. Scott Hawley

https://doi.org/10.1534/g3.118.200160

http://www.g3journal.org/content/8/10/3131



]]>
Drosophila development in the drink https://genestogenomes.org/drosophila-development-in-the-drink/ Mon, 20 Aug 2018 14:30:11 +0000 https://genestogenomes.org/?p=22277 A fruit fly model of fetal alcohol spectrum disorder reveals a Cyclin E-centric network modifies developmental sensitivity. Alcohol exposure in utero can lead to a wide range of developmental problems, even causing fetal death in some cases. But since this exposure doesn’t always have the same outcome, is it more likely to be a problem…]]>

A fruit fly model of fetal alcohol spectrum disorder reveals that a Cyclin E-centric network modifies developmental sensitivity.


Alcohol exposure in utero can lead to a wide range of developmental problems, even causing fetal death in some cases. But since this exposure doesn’t always have the same outcome, is it more likely to be a problem for some than others? Exploring the genetic factors that influence susceptibility to fetal alcohol effects is extremely challenging in humans because both exposure levels and the spectrum of phenotypic outcomes are inherently difficult to quantify. In a report in G3: Genes|Genomes|Genetics, Morozova et al. turned to fruit flies to investigate genes that might be involved in prenatal sensitivity to alcohol.

Alcohol is a familiar hazard to the rotten-fruit-loving fly. Like humans, fruit flies are susceptible to the effects of too much booze, especially during development. When fly larvae are exposed to alcohol, the outcome can be developmental delays and even death. Morozova et al. looked for genes that influence the developmental response to alcohol by using a population of over 200 wild-derived inbred fly lines called the Drosophila melanogaster Genetic Reference Panel (DGRP). The DGRP lines capture a great deal of genetic diversity while also allowing for replication within a line, and the lines have fully sequenced, well-annotated genomes. The authors compared the effects of alcohol exposure on viability (how many of the flies survive) and development time (how long it takes for flies to reach adulthood) in the DGRP lines. They also examined how ethanol exposure affected locomotion in a subset of the lines.

Unsurprisingly, ethanol exposure in this experiment increased development time, decreased viability, and impaired locomotion in most of the lines tested. However, there was a lot of variation between lines, and a few lines actually developed faster when reared on ethanol-supplemented food.

The authors performed genome-wide association analyses to identify the genetic variants associated with different sensitivities to alcohol exposure. The genes identified were involved in a wide range of biological processes, including cytoskeleton organization, egg laying, and mitosis regulation. The authors validated the function of a number of these genes using RNAi-mediated knockdown or transposon-tagged mutational insertions.

They then constructed an interaction network using the genes associated with viability and development time, revealing that Cyclin E (CycE) was a highly connected hub gene. Since CycE is associated with cell cycle regulation and is highly expressed in Drosophila ovaries, it makes sense that it might play a key role in determining an organism’s sensitivity to developmental alcohol exposure. Their results may one day help researchers narrow the search for human gene variants that influence fetal sensitivity to this most common of drugs.
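
One simple way to spot a highly connected hub in an interaction network is to count each gene's distinct partners. The sketch below does this on a toy edge list; the placeholder gene names (`geneA` to `geneD`), the edges themselves, and the function name are invented for illustration and are not taken from the study.

```python
from collections import defaultdict

def degree_by_gene(edges):
    """Count the distinct interaction partners of each gene.

    edges: iterable of (gene_1, gene_2) pairs; self-interactions are ignored
    and repeated pairs are counted once.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {gene: len(partners) for gene, partners in neighbors.items()}

# Toy network, invented for illustration only.
toy_edges = [
    ("CycE", "geneA"),
    ("CycE", "geneB"),
    ("CycE", "geneC"),
    ("geneA", "geneB"),
    ("geneC", "geneD"),
]
degrees = degree_by_gene(toy_edges)
hub = max(degrees, key=degrees.get)
print("most connected gene in the toy network:", hub)
```

Degree is only one of several centrality measures, and the source does not say which measure the authors used. It is chosen here because it is the easiest to read off by eye.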

CITATION:

A Cyclin E Centered Genetic Network Contributes to Alcohol-Induced Variation in Drosophila Development

Tatiana V. Morozova, Yasmeen Hussain, Lenovia J. McCoy, Eugenea V. Zhirnov, Morgan R. Davis, Victoria A. Pray, Rachel A. Lyman, Laura H. Duncan, Anna McMillen, Aiden Jones, Trudy F. C. Mackay, R. H. Anholt

G3: GENES|GENOMES|GENETICS August 2018 8: 2643-2653; https://doi.org/10.1534/g3.118.200260

http://www.g3journal.org/content/8/8/2643

]]>
Navigating the maize of heritable epigenetic change https://genestogenomes.org/navigating-the-maize-of-heritable-epigenetic-change/ Tue, 07 Aug 2018 14:32:15 +0000 https://genestogenomes.org/?p=21631 Tissue culture causes heritable methylation changes in plants. Tissue culture is a useful tool for plant scientists and horticulturalists in large part because it allows them to produce clones. Inconveniently, however, these clones are not always identical to the original, as one might expect them to be. In a report in GENETICS, Han et al.…]]>

Tissue culture causes heritable methylation changes in plants.


Tissue culture is a useful tool for plant scientists and horticulturalists in large part because it allows them to produce clones. Inconveniently, however, these clones are not always identical to the original, as one might expect them to be. In a report in GENETICS, Han et al. examined how propagation by tissue culture induces heritable epigenomic changes in maize.

When a portion of a plant is grown in tissue culture, it de-differentiates into an amorphous callus. This deprogrammed tissue can be induced to form roots or shoots or even to regenerate an entire plant—but this complex process can leave its marks on the genome and epigenome of the progeny. To get a picture of how tissue culture affects the epigenome, the authors compared methylation patterns in parental plants, plants that had been cultured, and the progeny of those cultured plants.

They found that most methylation was highly stable; it was consistent among all plants and unaffected by culturing. However, a subset of the methylome was variable between cultured and uncultured plants. Many of these DNA methylation differences were passed on to the progeny of the cultured plants. Importantly, some of the changes the authors identified were shared among independently regenerated progeny, suggesting that tissue culture can prompt consistent, heritable epigenetic effects in maize.

In theory, these epigenetic changes might be due to general stress; for example, the culture process might cause the methylation machinery to become dysregulated. However, since most methylation in the genome was largely unaffected, and many changes were consistent among cultured plants, it’s more likely that these changes are targeted, with certain alleles being more sensitive than others to heritable epigenetic changes during culture. The mechanisms that lead to methylation modifications and the genetic and phenotypic consequences of those changes will be interesting avenues for further study; however, since most plant genome editing requires a culture step, researchers should be cautious about unintended epigenetic consequences.

CITATION:

Heritable Epigenomic Changes to the Maize Methylome Resulting from Tissue Culture

Zhaoxue Han, Peter A. Crisp, Scott Stelpflug, Shawn M. Kaeppler, Qing Li, Nathan M. Springer

GENETICS August 2018 209: 983-995; https://doi.org/10.1534/genetics.118.300987

http://www.genetics.org/content/209/4/983

]]>