Proceedings of the International Workshop on Semantic Big Data, 2016
Determining valuable data among large volumes of data is one of the main challenges in Big Data. We aim to extract knowledge from these sources using a Hierarchical Multi-Label Classification process called Semantic HMC. This process automatically learns the label hierarchy and classifies items from very large data sources. Five steps compose the Semantic HMC process: Indexation, Vectorization, Hierarchization, Resolution and Realization. The first three steps automatically construct the label hierarchy from data sources. The last two steps classify new items according to the hierarchy labels. This paper focuses on the last two steps and presents a new, highly scalable process to classify items from huge sets of unstructured text by using ontologies and rule-based reasoning. The process is implemented on a scalable and distributed platform for processing Big Data, and some results are discussed.
N. Jiao et al.: Mechanisms of microbial carbon sequestration in the ocean. Published by Copernicus Publications on behalf of the European Geosciences Union.
This paper reviews progress on understanding biological carbon sequestration in the ocean with special reference to the microbial formation and transformation of recalcitrant dissolved organic carbon (RDOC), the microbial carbon pump (MCP). We propose that RDOC is a concept with a wide continuum of recalcitrance. Most RDOC compounds maintain their levels of recalcitrance only in a specific environmental context (RDOC_t). The ocean RDOC pool also contains compounds that may be inaccessible to microbes due to their extremely low concentration (RDOC_c). This differentiation allows us to appreciate the linkage between microbial source and RDOC composition on a range of temporal and spatial scales. Analyses of biomarkers and isotopic records show intensive MCP processes in the Proterozoic oceans when the MCP could have played a significant role in regulating climate. Understanding the dynamics of the MCP in conjunction with the better-constrained biological pump (BP) over geological timescales could help to predict future climate trends. Integration of the MCP and the BP will require new research approaches and opportunities. Major goals include understanding the interactions between particulate organic carbon (POC) and RDOC that contribute to sequestration efficiency, and the concurrent determination of the chemical composition of organic carbon, microbial community composition and enzymatic activity. Molecular biomarkers and isotopic tracers should be employed to link water column processes to sediment records, as well as to link present-day observations to paleo-evolution.
Ecosystem models need to be developed based on empirical relationships derived from bioassay experiments and field investigations in order to predict the dynamics of carbon cycling along the stability continuum of POC and RDOC under potential global change scenarios. We propose that inorganic nutrient input to coastal waters may reduce the capacity for carbon sequestration as RDOC. The nutrient regime enabling maximum carbon storage from combined POC flux and RDOC formation should therefore be sought.
This work is part of a global project to develop a recommender system of economic news articles. Its objectives are threefold: (i) automatically multi-classify the economic news articles, (ii) recommend the articles by comparing the profiles of the users with the multi-classification of the articles, and (iii) manage the vocabulary of the economic news domain to improve the system through the seamless intervention of the documentalists. In this paper we focus on the automatic multi-classification of the articles and on describing and justifying the resulting classification to the documentalists. While several multi-classification solutions exist, they are not automatically adaptable to the problem at hand, as their description of the resulting multi-classification lacks substantial correlation with the documentalists' perspective. In fact, we need to consider not only the automatic classification but also its supervision by the documentalists and its evolution based on that supervision. Accordingly, it is necessary to provide a mechanism that bridges the gap between the automatic classification mechanisms and the documentalists' thesaurus, in order to support their seamless supervision of classification and of thesaurus management. Ontologies are central to our proposal, as they are used to represent and manage the thesaurus, to describe the content of the articles, and finally to automatically multi-classify them via an inference process. We also adopt a machine learning approach to generate a prediction model that supports the automatic classification. This paper presents a proposal for enriching the documentalist-oriented ontology with the prediction rules of the model, which gives the DL reasoner the capabilities it needs for automatic multi-classification.
One of the main challenges in the domain of competitive intelligence is to harness large volumes of information from the web and extract the most valuable pieces of information. As the amount of information available on the web grows rapidly and is very heterogeneous, this process becomes overwhelming for experts. To address this challenge, this paper presents a vision for a novel process that performs cross-referencing at web scale. This process uses a focused crawler and a semantic-based classifier to cross-reference textual items without expert intervention, based on Big Data and Semantic Web technologies. The system is described thoroughly, and the benefits of this work in progress are discussed.
Proceedings of The International Workshop on Semantic Big Data, 2017
Information from the web is a key resource exploited in the domain of competitive intelligence. These sources represent large volumes of information to process every day. As the amount of information available grows rapidly, this process becomes overwhelming for experts. To address this challenge, this paper presents a novel approach to process such sources and extract only the most valuable pieces of information. The approach is based on an unsupervised and adaptive ontology-learning process. The resulting ontology is used to enhance the performance of a focused crawler. The combination of Big Data and Semantic Web technologies makes it possible to classify information precisely according to domain knowledge while maintaining optimal performance. The approach and its implementation are described, and the feasibility and performance of the approach are presented.
This paper addresses a recommender system of economic news articles. More precisely, it focuses on the automatic refinement of customer profiles, an important task over time, by taking into account logs of the user concerning, in particular, his or her actions, reading time, and domain-specific knowledge. In our approach, ontologies are used to describe and automatically refine these profiles. This work focuses on one particular type of recommender system, the content-based recommender. The aim of these recommender systems is to build a user profile and to improve its precision over time. Several improvements that have been made to these recommender systems over the last decade are analyzed. We find that the improvements brought by the use of semantic knowledge are not negligible; semantic web approaches should therefore be used increasingly in the future. Nevertheless, improvements remain possible in this domain, and further research could be of interest.
Papers by Hassan Thomas