Papers by Matthijs Brouwer

Horticulture Research
The Mediterranean basin countries are considered secondary centres of tomato diversification. However, information on phenotypic and allelic variation of local tomato materials is still limited. Here we report on the evaluation of the largest traditional tomato collection, which includes 1499 accessions from Southern Europe. Analyses of 70 traits revealed a broad range of phenotypic variability with different distributions among countries, with the culinary end use within each country being the main driver of tomato diversification. Furthermore, eight main tomato types (phenoclusters) were defined by integrating phenotypic data, country of origin, and end use. Genome-wide association study (GWAS) meta-analyses identified associations in 211 loci, 159 of which were novel. The multidimensional integration of phenoclusters and the GWAS meta-analysis identified the molecular signatures for each traditional tomato type and indicated that signatures originated from differential combinatio...

Can We Use the Relationship Between Within-Field Elevation and NDVI as an Indicator of Drought-Stress?
IFIP Advances in Information and Communication Technology, 2020
Large farmers’ datasets can help shed light on agroecological processes if used in the context of hypothesis testing. Here we used an anonymized set of data from the geoplatform Akkerweb to better understand the correlation between within-field elevation and the normalized difference vegetation index (NDVI, a proxy for biomass). The dataset included 3249 Dutch potato fields, for each of which the cultivar, the field polygon, the year of cultivation and the soil type (clay or sandy) were known. We hypothesize that under dry conditions this correlation is negative, meaning that the lowest portions of the field have more biomass because of water redistribution. From the data, we observed that in dry periods, such as the summer of 2018, the correlation was negative in sandy soils. Furthermore, we observed that early cultivars show a weaker correlation between NDVI and elevation than late cultivars, possibly because early cultivars escape part of the long dry summer spells. We conclude that the correlation between NDVI and elevation may be a useful indicator of drought stress, and deviations from the norm may be useful to evaluate the resistance to drought of individual cultivars.
In recent years, multiple solutions have become available providing search on huge amounts of plain text and metadata. Scalable searchability on annotated text, however, still appears to be problematic. With Mtas, an acronym for Multi-Tier Annotation Search, we add annotation layers and structure to the existing Lucene approach of creating and searching indexes, and furthermore present an implementation as a Solr plugin providing both searchability and scalability. We present a configurable indexation process, supporting multiple document formats, and providing extended search options on both metadata and annotated text, such as advanced statistics, faceting, grouping and keyword-in-context. Mtas is currently used in production environments, with up to 15 million documents and 9.5 billion words. Mtas is available from GitHub.
MTAS - Extending Solr into a Scalable Search Solution and Analysis Tool on Multi-Tier Annotated Text
ERCIM News, 2017
GO FAIR food systems implementation network manifesto – to advance a global data ecosystem for agriculture and food by implementing FAIR data and services
F1000Research, 2019
E-infrastructure projects such as CLARIN not only make research data available to the scientific community, but also deliver a growing number of web services. While the standard methods for deploying web services using dedicated (virtual) servers may suffice in many circumstances, CLARIN centers are also faced with a growing number of services that are not frequently used and for which significant compute power needs to be reserved. This paper describes an alternative approach to service deployment capable of delivering on-demand services in a workflow using cloud infrastructure capabilities. Services are stored as disk images and deployed in a workflow scenario only when needed, thus helping to reduce the overall service footprint.
Leiden 19.12.11
Convergence on FAIR Data Trains in Hamburg
The Farm Data Train Infrastructure
F1000Research, 2020
The Nederlab project aims to bring together all digitized texts relevant to Dutch history and language, both in terms of metadata and full-text content. Given that the data comes from a plethora of data providers, we present a technical solution to deal with the heterogeneity of datasets for access, which we call the Broker. It is an extra pivotal layer between the back-end and front-end of the data infrastructure to query and retrieve massive amounts of humanities data. Moreover, extra services can be embedded in the Broker, such as a lexicon service for automated query expansion.

Applied Sciences
Genetics research is increasingly focusing on mining fully sequenced genomes and their annotations to identify the causal genes associated with traits (phenotypes) of interest. However, a complex trait is typically associated with multiple quantitative trait loci (QTLs), each comprising many genes, that can positively or negatively affect the trait of interest. To help breeders in ranking candidate genes, we developed an analytical platform called pbg-ld that provides semantically integrated geno- and phenotypic data on Solanaceae species. This platform combines both unstructured data from scientific literature and structured data from publicly available biological databases using the Linked Data approach. In particular, QTLs were extracted from tables of full-text articles from the Europe PubMed Central (PMC) repository using QTLTableMiner++ (QTM), while the genomic annotations were obtained from the Sol Genomics Network (SGN), UniProt and Ensembl Plants databases. These datasets w...
MathDox editor
We describe the MathDox Editor, a web-based editor for easy creation of semantically rich mathematical documents, enriched with services for computations and translation to various formats.
Data Mining in the Dutch (Historical) Civil Registration (1811–Present)
Human Biology, 2012
Names identify individual persons. As such, names are central in research dealing with individuals, and groups defined by properties of these individuals – such as families. In the latter, also generations come into play, carrying the dimension of time and historical developments in society. The dimension of space equally influences groups: members migrate and interact. For studies of, among others,
Data mining in the (historic) Civil Registration of The Netherlands from 1811 - present
Anim Behav, 2010
Data mining in the Dutch (historical) Civil Registration (1811–present)
Human Biology, 2012
A recent workshop entitled "The Family Name as Socio-Cultural Feature and Genetic Metaphor: From Concepts to Methods" was held in Paris in December 2010, sponsored by the French National Centre for Scientific Research (CNRS) and by the journal Human Biology. This workshop was intended to foster a debate on questions related to family names and to compare different multidisciplinary approaches involving geneticists, historians, geographers, sociologists and social anthropologists. This collective paper presents a collection of selected communications.
Drafts by Matthijs Brouwer
The Nederlab project aims to bring together all digitized texts relevant to Dutch history and language, both in terms of metadata and full-text content. Given that the data comes from a plethora of data providers, we present a technical solution to deal with the heterogeneity of datasets for access, which we call the Broker. It is an extra pivotal layer between the back-end and front-end of the data infrastructure to query and retrieve massive amounts of humanities data. Moreover, extra services can be embedded in the Broker, such as a lexicon service for automated query expansion.