{"id":5288,"date":"2016-01-29T05:33:33","date_gmt":"2016-01-29T05:33:33","guid":{"rendered":"http:\/\/www.kurzweilai.net\/?p=271788"},"modified":"2016-02-01T05:48:39","modified_gmt":"2016-02-01T05:48:39","slug":"machine-learning-technique-uncovers-unknown-features-of-multi-drug-resistant-pathogen","status":"publish","type":"post","link":"https:\/\/hoo.central12.com\/fugic\/2016\/01\/29\/machine-learning-technique-uncovers-unknown-features-of-multi-drug-resistant-pathogen\/","title":{"rendered":"Machine-learning technique uncovers unknown features of multi-drug-resistant pathogen"},"content":{"rendered":"<div id=\"attachment_272456\" class=\"wp-caption aligncenter\" style=\"width: 424px;  border: 1px solid #dddddd; background-color: #f3f3f3; padding-top: 4px; margin: 10px; text-align:center; display: block; margin-right: auto; margin-left: auto;\"><img class=\"size-full wp-image-272456\" title=\"Pseudomonas aeruginosa\" src=\"http:\/\/www.kurzweilai.net\/images\/Pseudomonas-aeruginosa.jpg\" alt=\"\" width=\"414\" height=\"492\" \/><p style=' padding: 0 4px 5px; margin: 0;'  class=\"wp-caption-text\">According to the CDC, <em>Pseudomonas aeruginosa<\/em> is a common cause of healthcare-associated infections, including pneumonia, bloodstream infections, urinary tract infections, and surgical site infections. Some strains of P. aeruginosa have been found to be resistant to nearly all or all antibiotics. 
(illustration credit: CDC)<\/p><\/div>\n<p>A new machine-learning technique can uncover previously unknown features of organisms and their genes in large datasets, according to researchers from the\u00a0<a href=\"http:\/\/www.med.upenn.edu\/\" >Perelman School of Medicine<\/a>\u00a0at the University of Pennsylvania and the Geisel School of Medicine at Dartmouth.<\/p>\n<p><strong>F<\/strong>or example, the technique learned to identify the characteristic gene-expression patterns that appear when a bacterium is exposed to different conditions, such as low oxygen and the presence of antibiotics.<\/p>\n<p>The technique, called &#8220;ADAGE&#8221; (Analysis using Denoising Autoencoders of Gene Expression), uses a \u201cdenoising autoencoder\u201d algorithm, which learns to identify recurring features or patterns in large datasets &#8212; without being told what specific features to look for (that is, &#8220;unsupervised&#8221;).*<\/p>\n<p>Last year, <a href=\"http:\/\/www.med.upenn.edu\/apps\/faculty\/index.php\/g5455356\/p8850805\" >Casey Greene, PhD<\/a>, an assistant professor of Systems Pharmacology and Translational Therapeutics at Penn, and his team\u00a0published, in an <a href=\"http:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC4299935\/\" >open-access paper<\/a> in the American Society for Microbiology&#8217;s <em>mSystems,<\/em> the first demonstration of ADAGE in a biological context: an analysis of two gene-expression datasets of breast cancers.<\/p>\n<p><strong>Tracking down gene patterns of a multi-drug-resistant bacterium<\/strong><\/p>\n<p>The new study, published Jan. 19 in an open-access paper in <em>mSystems,<\/em> was more ambitious. It applied ADAGE to a dataset of 950 gene-expression arrays publicly available at the time for the multi-drug-resistant bacterium <a href=\"https:\/\/en.wikipedia.org\/wiki\/Pseudomonas_aeruginosa\" ><em>Pseudomonas aeruginosa<\/em><\/a>. 
This bacterium is a notorious pathogen in hospitals and in individuals with cystic fibrosis and other chronic lung conditions; it&#8217;s often difficult to treat due to its high resistance to standard antibiotic therapies.<\/p>\n<p>The data included only the identities of the roughly 5,000 <em>P. aeruginosa<\/em> genes and their measured expression levels in each published experiment. The goal was to see if this \u201cunsupervised\u201d learning system could uncover important patterns in\u00a0<em>P. aeruginosa<\/em>\u00a0gene expression and clarify how those patterns change when the bacterium\u2019s environment changes &#8212; for example, when an antibiotic is present.<\/p>\n<p>Even though the model built with ADAGE was relatively simple &#8212; its 50 nodes make it roughly equivalent to a brain with only a few dozen neurons &#8212; it had no trouble learning which sets of\u00a0<em>P. aeruginosa<\/em>\u00a0genes tend to work together or in opposition. To the researchers\u2019 surprise, the ADAGE system also detected differences between the main laboratory strain of\u00a0<em>P. aeruginosa<\/em>\u00a0and strains isolated from infected patients. \u201cThat turned out to be one of the strongest features of the data,\u201d Greene said.<\/p>\n<p>\u201cWe expect that this approach will be particularly useful to microbiologists researching bacterial species that lack a decades-long history of study in the lab,&#8221; said Greene. &#8220;Microbiologists can use these models to identify where the data agree with their own knowledge and where the data seem to be pointing in a different direction &#8230; and to find completely new things in biology that we didn\u2019t even know to look for.&#8221;<\/p>\n<p>Support for the research came from the Gordon and Betty Moore Foundation, the William H. 
Neukom Institute for Computational Science, the National Institutes of Health, and the Cystic Fibrosis Foundation.<\/p>\n<p><em>* In 2012, Google-sponsored researchers applied a similar method to randomly selected YouTube images; their system learned to recognize major recurring features of those images &#8212; including cats of course.<\/em><\/p>\n<hr \/>\n<p><strong>Abstract of\u00a0<em>ADAGE-Based Integration of Publicly Available Pseudomonas aeruginosa Gene Expression Data with Denoising Autoencoders Illuminates Microbe-Host Interactions<\/em><\/strong><\/p>\n<p>The increasing number of genome-wide assays of gene expression available from public databases presents opportunities for computational methods that facilitate hypothesis generation and biological interpretation of these data. We present an unsupervised machine learning approach, ADAGE (<em>a<\/em>nalysis using\u00a0<em>d<\/em>enoising\u00a0<em>a<\/em>utoencoders of\u00a0<em>g<\/em>ene\u00a0<em>e<\/em>xpression), and apply it to the publicly available gene expression data compendium for\u00a0<em>Pseudomonas aeruginosa<\/em>. In this approach, the machine-learned ADAGE model contained 50 nodes which we predicted would correspond to gene expression patterns across the gene expression compendium. While no biological knowledge was used during model construction, cooperonic genes had similar weights across nodes, and genes with similar weights across nodes were significantly more likely to share KEGG pathways. By analyzing newly generated and previously published microarray and transcriptome sequencing data, the ADAGE model identified differences between strains, modeled the cellular response to low oxygen, and predicted the involvement of biological processes based on low-level gene expression differences. 
ADAGE compared favorably with traditional principal component analysis and independent component analysis approaches in its ability to extract validated patterns, and based on our analyses, we propose that these approaches differ in the types of patterns they preferentially identify. We provide the ADAGE model with analysis of all publicly available\u00a0<em>P.\u00a0aeruginosa<\/em>\u00a0GeneChip experiments and open source code for use with other species and settings. Extraction of consistent patterns across large-scale collections of genomic data using methods like ADAGE provides the opportunity to identify general principles and biologically important patterns in microbial biology. This approach will be particularly useful in less-well-studied microbial species.<\/p>\n<hr \/>\n<p><strong>Abstract of\u00a0<em>Unsupervised feature construction and knowledge extraction from genome-wide assays of breast cancer with denoising autoencoders<\/em><\/strong><\/p>\n<p>Big data bring new opportunities for methods that efficiently summarize and automatically extract knowledge from such compendia. While both supervised learning algorithms and unsupervised clustering algorithms have been successfully applied to biological data, they are either dependent on known biology or limited to discerning the most significant signals in the data. Here we present denoising autoencoders (DAs), which employ a data-defined learning objective independent of known biology, as a method to identify and extract complex patterns from genomic data. We evaluate the performance of DAs by applying them to a large collection of breast cancer gene expression data. Results show that DAs successfully construct features that contain both clinical and molecular information. There are features that represent tumor or normal samples, estrogen receptor (ER) status, and molecular subtypes. 
Features constructed by the autoencoder generalize to an independent dataset collected using a distinct experimental platform. By integrating data from ENCODE for feature interpretation, we discover a feature representing ER status through association with key transcription factors in breast cancer. We also identify a feature highly predictive of patient survival and it is enriched by FOXM1 signaling pathway. The features constructed by DAs are often bimodally distributed with one peak near zero and another near one, which facilitates discretization. In summary, we demonstrate that DAs effectively extract key biological principles from gene expression data and summarize them into constructed features with convenient properties.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A new machine-learning technique can uncover previously unknown features of organisms and their genes in large datasets, according to researchers from the&nbsp;Perelman School of Medicine&nbsp;at the University of Pennsylvania and the Geisel School of Medicine at Dartmouth. 
For example, the technique learned to identify the characteristic gene-expression patterns that appear when a bacterium is [&#8230;]<\/p>\n","protected":false},"author":13,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[46,42,43],"tags":[],"class_list":["post-5288","post","type-post","status-publish","format-standard","hentry","category-airobotics","category-biotech","category-news"],"_links":{"self":[{"href":"https:\/\/hoo.central12.com\/fugic\/wp-json\/wp\/v2\/posts\/5288"}],"collection":[{"href":"https:\/\/hoo.central12.com\/fugic\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hoo.central12.com\/fugic\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hoo.central12.com\/fugic\/wp-json\/wp\/v2\/users\/13"}],"replies":[{"embeddable":true,"href":"https:\/\/hoo.central12.com\/fugic\/wp-json\/wp\/v2\/comments?post=5288"}],"version-history":[{"count":1,"href":"https:\/\/hoo.central12.com\/fugic\/wp-json\/wp\/v2\/posts\/5288\/revisions"}],"predecessor-version":[{"id":5289,"href":"https:\/\/hoo.central12.com\/fugic\/wp-json\/wp\/v2\/posts\/5288\/revisions\/5289"}],"wp:attachment":[{"href":"https:\/\/hoo.central12.com\/fugic\/wp-json\/wp\/v2\/media?parent=5288"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hoo.central12.com\/fugic\/wp-json\/wp\/v2\/categories?post=5288"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hoo.central12.com\/fugic\/wp-json\/wp\/v2\/tags?post=5288"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}