Papers by Matthew Blumberg
arXiv (Cornell University), Mar 30, 2024
Pretrained language models underpin several AI applications, but their high computational cost for training limits accessibility. Initiatives such as BLOOM and StarCoder aim to democratize access to pretrained models for collaborative community development. However, such existing models ...

PLOS ONE, Jan 18, 2024
Candida auris is a newly emerged multidrug-resistant fungus capable of causing invasive infections with high mortality. Despite intense efforts to understand how this pathogen rapidly emerged and spread worldwide, its environmental reservoirs are poorly understood. Here, we present a collaborative effort between the U.S. Centers for Disease Control and Prevention, the National Center for Biotechnology Information, and GridRepublic (a volunteer computing platform) to identify C. auris sequences in publicly available metagenomic datasets. We developed the MetaNISH pipeline that uses SRPRISM to align sequences to a set of reference genomes and computes a score for each reference genome. We used MetaNISH to scan ~300,000 SRA metagenomic runs from 2010 onwards and identified five datasets containing C. auris reads. Finally, GridRepublic has implemented a prospective C. auris molecular monitoring system using MetaNISH and volunteer computing.
The Psychopathology of Information Processing Systems
Springer eBooks, 2013

Finding Candida auris in public metagenomic repositories
Candida auris is a newly emerged multidrug-resistant fungus capable of causing invasive infections with high mortality. Despite intense efforts to understand how this pathogen rapidly emerged and spread worldwide, its environmental reservoirs are poorly understood. Here, we present a collaborative effort between the U.S. Centers for Disease Control and Prevention, the National Center for Biotechnology Information, and GridRepublic (a volunteer computing platform) to identify C. auris sequences in publicly available metagenomic datasets. We developed the MetaNISH pipeline that uses SRPRISM to align sequences to a set of reference genomes and computes a score for each reference genome. We used MetaNISH to scan ∼300,000 SRA metagenomic runs from 2010 onwards and identified five datasets containing C. auris reads. Finally, GridRepublic has implemented a prospective C. auris molecular monitoring system using MetaNISH and volunteer computing.
Patterns of Connection
Handbook of Human Computation, 2013
Foundations in Human Computation
Handbook of Human Computation, 2013
Handbook of Human Computation, 2013
Information processing systems composed of groups of humans may exhibit modes of dysfunction that correspond to psychopathology observed in individuals. Thus, clinical models normally applied to individuals are considered as candidate models for understanding psychosis and neurosis in distributed systems. In the first part, Matthew Blumberg considers dysfunction at the interaction level in the context of schizophrenia, and in the second part, Pietro Michelucci examines dysfunction at the neurological level in the context of obsessive-compulsive disorder.