2020 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), 2020
Modeling human behavior on the Web is often performed by sequential pattern mining (SPM). However, the similarity between data elements often reduces the number of patterns mined. This work proposes to handle this similarity by managing multiple data sources that represent different views of the data. We introduce G SPM, a behavioral pattern mining algorithm that takes advantage of multi-source data to handle the problem of data similarity. It adopts a selective mining strategy to limit complexity and forms general patterns to limit the loss of patterns. Experimental results confirm that G SPM succeeds in handling the problem of item similarity. In addition, G SPM outperforms traditional approaches in terms of runtime and redundancy of the resulting set of patterns.
Data mining is the task of discovering interesting, unexpected or valuable structures in large datasets and transforming them into an understandable structure for further use. Different approaches in the domain of data mining have been proposed, among which pattern mining is the most important one. Pattern mining involves extracting interesting frequent patterns from data. It has grown to be a topic of high interest and is used for different purposes, for example, recommendations. Some of the most common challenges in this domain include reducing the complexity of the process and avoiding redundancy within the patterns. So far, pattern mining has mainly focused on mining a single data source. However, with the increase in the amount of data, in terms of volume, diversity of sources and nature of data, mining multi-source and heterogeneous data has become an emerging challenge in this domain. This challenge is the main focus of our work where we pr...
This paper describes the participation of the MRIM team in Task 3: Patient-Centered Information Retrieval, IRTask 1: Ad-hoc search, of the CLEF eHealth Evaluation Lab 2016. The aim of this task is to evaluate the effectiveness of information retrieval systems when searching for health content on the web. Our submission investigates the effectiveness of word embedding for query expansion in the health domain. We experiment with two variants of a query expansion method using word embedding. Our first run is a baseline system with default stopping and stemming. The other two runs expand the queries using two different word embedding sources. Our three runs are conducted on the Terrier platform using a Dirichlet language model.
This report discusses explanations in the domain of recommender systems: a review of the research papers in the domain, the different explanation interfaces and evaluation criteria, our vision in this domain, and its application to the e-learning project “METAL”.
This paper details the collection, systems and evaluation methods used in the IR Task of the CLEF 2016 eHealth Evaluation Lab. This task investigates the effectiveness of web search engines in providing access to medical information for laypeople who have little or no medical knowledge. The task aims to foster advances in the development of search technologies for consumer health search by providing resources and evaluation methods to test and validate search systems. The problem considered in this year's task was to retrieve web pages to support the information needs of health consumers who are confronted with a medical condition and who want to seek relevant health information online through a search engine. As part of the evaluation exercise, we gathered 300 queries that users posed with respect to 50 search task scenarios. The scenarios were developed from real cases of people seeking health information by posting requests for help on a web forum. The presence of query variations ...
Learning Analytics (LA) has many goals; helping students understand their academic progress and improve their learning process is at the core of our work. To reach this goal, LA relies on educational data: students' traces of activities on the VLE, academic and socio-demographic information, information about teachers, pedagogical resources, curricula, etc. The data sources that contain such information are multiple and diverse. Data mining, specifically pattern mining, aims at extracting valuable and understandable information from large datasets. In our work, we assume that multiple educational data sources form a rich dataset that can result in valuable patterns. Mining such data is thus a promising way to reach the goal of helping students. However, heterogeneity and interdependency within the data lead to high computational complexity. We thus aim at designing low-complexity pattern mining algorithms that mine multi-source data, taking into consider...
Huge amounts of digital data have been created over the years due to the increasing digitization of our everyday life. As a consequence, fast data collection and storage tools have been developed, and data can thus be collected in huge volumes for various research and business purposes. The collected data can come from multiple data sources and can be of heterogeneous kinds, thus forming heterogeneous multi-source datasets. These data can be analyzed in order to extract valuable information that serves research and business purposes. Data mining is known as an important task for discovering interesting and valuable information from datasets and has gained great interest over time. Different approaches in the domain of data mining have been proposed, among which pattern mining is the most important one. Pattern mining, including sequential pattern mining, discovers statistically relevant patterns (or sequential patterns) among data. This domain has grown to be very import...
This report gives a general overview of the nature of data in our research domain, the specific data available in the METAL project, and related work on the types of data used in data mining and the associated algorithms.
Track: Academic research (comprehensive evaluations of recent innovations in learning and student analytics approaches). Context and Purpose: Learning Analytics (LA) has many goals; helping students understand their academic progress and improve their learning process is at the core of our work. To reach this goal, LA relies on educational data: students' traces of activities on the VLE, academic and socio-demographic information, information about teachers, pedagogical resources, curricula, etc. The data sources that contain such information are multiple and diverse. Data mining, specifically pattern mining, aims at extracting valuable and understandable information from large datasets [2]. In our work, we assume that multiple educational data sources form a rich dataset that can result in valuable patterns. Mining such data is thus a promising way to reach the goal of helping students. However, heterogeneity and interdependency within the data lead to high computational complexity. We thus aim at designing low-complexity pattern mining algorithms that mine multi-source data, taking into consideration the dependency and heterogeneity among sources. The patterns formed
Papers by Julie BU DAHER