Academia.edu

Multimedia Annotation

About this topic
Multimedia Annotation is the process of adding descriptive metadata to various forms of media, such as text, images, audio, and video, to enhance their accessibility, usability, and searchability. This practice facilitates the organization, retrieval, and understanding of multimedia content in digital environments.

Key research themes

1. How can machine learning and NLP improve the quality and semantic consistency of textual annotations in multilingual multimedia archives?

This research area focuses on applying advanced machine learning (ML), natural language processing (NLP), and deep learning techniques to automatically enhance the quality, harmonization, and semantic coherence of textual annotations (e.g., keywords and tags) linked to multimedia content, especially in multilingual and heterogeneous digital libraries. Improving annotation quality aids effective search, navigation, and visualization of large multimedia repositories and addresses challenges such as language identification, spelling correction, semantic similarity, and term specialization.

Key finding: This work develops an integrated pipeline combining supervised and unsupervised machine learning and deep learning techniques—including automatic language detection, spelling error identification and correction, and word...
Key finding: This paper critically examines machine learning’s (ML) capabilities and limitations for automated descriptive metadata annotation in cultural heritage and scholarly collections, highlighting the scarcity of large,...
Key finding: This survey analyses AI and image processing methods for automatic metadata generation in the context of unstructured multimedia data such as video lectures. It experimentally evaluates three summarization algorithms...
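
As a rough illustration of the tag clean-up this theme describes, the sketch below maps noisy keywords onto a controlled vocabulary using fuzzy string matching from Python's standard `difflib` module. It is only a minimal stand-in for the ML and deep learning pipelines in the papers above: the function name `normalize_tags`, the sample vocabulary, and the `cutoff` value are assumptions made for this example, and it does not perform language detection or semantic similarity.

```python
from difflib import get_close_matches


def normalize_tags(tags, vocabulary, cutoff=0.8):
    """Map each raw tag to its closest vocabulary term, or keep it unchanged.

    Hypothetical helper: character-level fuzzy matching is used here as a
    simple substitute for a trained spelling-correction model.
    """
    # Index the vocabulary by lowercase form so matching ignores case.
    canonical = {term.lower(): term for term in vocabulary}
    cleaned = []
    for tag in tags:
        # Lowercase and collapse whitespace before comparing.
        key = " ".join(tag.lower().split())
        match = get_close_matches(key, list(canonical), n=1, cutoff=cutoff)
        # Fall back to the stripped original tag when nothing is close enough.
        result = canonical[match[0]] if match else tag.strip()
        if result not in cleaned:  # drop duplicates, keep first-seen order
            cleaned.append(result)
    return cleaned


if __name__ == "__main__":
    raw = ["Folkore", "folklore ", "dance", "xyz"]
    print(normalize_tags(raw, ["folklore", "dance", "music"]))
```

Raising `cutoff` makes the matcher more conservative, which matters in multilingual archives where similar spellings in different languages should not be merged.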

2. What are effective approaches to multimedia annotation that enhance collaborative reasoning and decision-making in distributed virtual and educational environments?

This research theme explores designing and evaluating multimedia annotation systems that enrich collaborative decision-making and reflective learning processes in virtual environments (VEs) and educational settings. It emphasizes multimodal annotations (audio, text, sketches, video-synchronized camera movements) combined with structured argumentation trees and shared tag vocabularies to capture provenance, facilitate asynchronous discussions, and promote critical thinking among practitioners and students. These approaches support geographically distributed teams or learners engaging with complex multimedia artifacts and professional contexts.

Key finding: Introducing a rich multimedia annotation framework embedding audio, sketches, synchronized camera movements, and structured argumentation trees, this paper shows how annotations in collaborative virtual engineering...
Key finding: This experimental study involving 274 undergraduate students assesses how multimedia annotations combined with folksonomy tag strategies (broad vs. narrow tags) influence the critical and reflective quality of student...
Key finding: Evaluating MobiTOP, a hierarchical, multimedia-rich, web-based location annotation system, this usability study finds positive user acceptance of features enabling hierarchical annotation creation, sharing and browsing of...

3. How can user-centered methodologies and tools facilitate effective manual or semi-automatic multimedia annotation integrating semantic web technologies and user expertise?

Given that fully automated semantic annotation remains inadequate for complex multimedia, this theme examines user-centered frameworks, methodologies, and tools that assist annotators—including non-expert users—in manually or semi-automatically creating ontology-based multimedia annotations. It considers approaches that lower barriers to ontology navigation and extension, synchronize structured annotations with multimedia playback, and enable rich interaction with multimedia fragments. These methods are designed to produce precise, interoperable annotations while bridging the semantic gap through collaborative user involvement.

Key finding: The paper proposes the SA (Selection and Addition) methodology that supports non-expert users in ontology-based multimedia annotation by semantically retrieving relevant ontology elements and allowing in-situ extension of...
Key finding: Presenting the Synote system, this work introduces a web-based platform that enables users to create fine-grained synchronized multimedia annotations—termed synmarks and synnotations—that link notes, tags, bookmarks, and...
Key finding: This paper introduces the LEMO Annotation Framework, a standards-based, uniform model that supports interoperable multimedia annotations across diverse content types with support for fragment addressing and web...
Key finding: This foundational survey analyzes the challenges and state-of-the-art technologies in video annotation, emphasizing the semantic gap between raw multimedia data and meaningful metadata. It advocates for hybrid man-machine...
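
To make the idea of annotations synchronized with multimedia playback more concrete, here is a minimal sketch of a time-anchored annotation record and a lookup of the annotations active at a given playback position. It is not the data model of Synote, LEMO, or any other system named above: the class `TimedAnnotation`, its fields, and the function `active_at` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class TimedAnnotation:
    """A note attached to a time interval of an audio or video resource."""
    start: float  # seconds from the beginning of the media
    end: float    # seconds; the interval is half-open: [start, end)
    note: str
    tags: list = field(default_factory=list)


def active_at(annotations, t):
    """Return the annotations whose interval contains playback time t."""
    return [a for a in annotations if a.start <= t < a.end]


if __name__ == "__main__":
    annotations = [
        TimedAnnotation(0, 10, "intro", ["overview"]),
        TimedAnnotation(5, 20, "demo"),
    ]
    for a in active_at(annotations, 7):
        print(a.note, a.tags)
```

A player could call `active_at` on each time update to decide which notes to display; for large collections, an interval tree or a list sorted by start time would avoid scanning every annotation.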

All papers in Multimedia Annotation

This paper presents a system that is designed to make possible the organization and search within the collected digitized material of intangible cultural heritage. The motivation for building the system was a vast quantity of multimedia...
University education requires students to be trained both at university and at external internship centres. Because of Covid-19, the availability of multimedia resources and examples of practical contexts has become vital. Multimedia...
We introduce MobiTOP, a Web-based system for organizing and retrieving hierarchical location-based annotations. Each annotation contains multimedia content (such as text, images, video) associated with a location, and users are able to...
A number of research groups and software companies have developed digital annotation tools for textual documents, web pages, images, audio and video resources. By annotations we mean subjective comments, notes, explanations or external...
Using the Semantic Grid to Build Bridges between Museums and Indigenous Communities. Jane Hunter, Ronald Schroeter, Bevan Koopman, and Michael Henderson, DSTC, University of Queensland, Brisbane, Australia 4072. {jane, ronalds, bevank,...
This paper presents the design process of a desk-set tangible user interface for the navigation and manipulation of media content organized by content-based similarity with off-the-shelf/flea market devices. For intra-media navigation, a...
Multisensory experiences have been increasingly undertaken in the digital world. With the emerging interest in immersive applications (i.e. 360 videos and virtual reality), more and more researchers and practitioners are in pursuit of...
Mobile devices offer people the opportunity to get useful tasks done during time previously thought to be unusable. Because mobile devices have small screens and are often used in divided attention scenarios, people are limited to using...
Video annotation is an activity that aims to supplement this type of multimedia object with additional content or information about its context, nature, content, quality and other aspects. These annotations are the basis for building a...
Human perception is inherently multi-sensorial involving five traditional senses: sight, hearing, touch, taste, and smell. In contrast to traditional multimedia, based on audio and visual stimuli, mulsemedia seek to stimulate all the...
Although information workers may complain about meetings, they are an essential part of their work life. Consequently, busy people spend a significant amount of time scheduling meetings. We present Calendar.help, a system that provides...
Serbia-Forum is a web application portal designed and implemented by the Mathematical Institute of the Serbian Academy of Sciences and Arts (MISANU) whose goal is to make digitally available many units of cultural heritage belonging to...
This paper proposes a method that extracts temperature-effect information from the color temperatures of video scenes and maps it to temperature effects, in order to author temperature effects for multiple sensorial media content...
The purpose of this research study was to explore the effectiveness of educational technology in strengthening students' academic achievement in English at secondary school level. All the students at secondary school level in Kohat...
This paper aims to present a project in progress, an interactive installation for collaborative manipulation of multimedia content. The proposed setup consists of a vertical main screen and a horizontal second screen, which is used as...
In this paper, we propose a novel semi-supervised feature analyzing framework for multimedia data understanding and apply it to three different applications: image annotation, video concept detection and 3D motion data analysis. Our...
This study intends to evaluate the effectiveness of Electronic Glossary and Non-electronic Glossary in L2 vocabulary learning among a group of low proficiency learners of English. It also seeks to determine which glossary mode is...
The Qur’an, the holy divine book of Muslims, was revealed over fourteen centuries ago, in Arabic. With the rise of Islam, the Arabic language gained popularity and became the lingua franca for large swaths of the old world. Devout Muslims...
The goal of the paper is assessing the quality of end-user tags from a video labeling game as a first step in the process of integrating them with the annotations made by professionals. Tags lack precise meaning, whereas the terms and...
Research on Quality of Experience (QoE) heavily relies on subjective evaluations of media. An important aspect of QoE concerns modeling and quantifying the subjective notions of 'beauty' (aesthetic appeal) and 'something well-known'...
Automatic image annotation, and effective methods for it, have attracted a great deal of research interest. Effectively finding the correlation among labels and images is a critical task for multi-label learning. Most of the existing...
There is a huge wealth of multimedia web resources related to the sciences of the Holy Quran, including "Tafseer" of the Holy Quran, teaching the provisions of recitation, the stories of the Holy Quran, and many other categories of...
Many Arabic websites contain phrases from the Quran. Regrettably, the Quran texts that appear on a majority of websites suffer from many mistakes and typos. Hence, finding the correct form of Quran verses has become...
Research progress in multimedia computing and systems using semantic technologies has recently been widely explored. This special issue on multimedia and semantic technologies for future computing environments provides high quality...
Feature selection is an effective way to reduce computational cost and improve feature quality for large-scale multimedia analysis systems. In this paper, we propose a novel feature selection method in which the hinge loss function...
The explosive growth of multimedia data has brought the challenge of how to efficiently browse, retrieve and organize these data. Under this circumstance, different approaches have been proposed to facilitate multimedia analysis. Several...
In multimedia annotation, labeling a large amount of training data by humans is both time-consuming and tedious. Therefore, to automate this process, a number of methods that leverage unlabeled training data have been proposed. Normally, a...
Labeling image collections is a tedious task, especially when multiple labels have to be chosen for each image. In this paper we introduce a new framework that extends state of the art models in word prediction to incorporate information...
In the audiovisual domain tagging games are explored as a method to collect user-generated metadata. For example, the Netherlands Institute for Sound and Vision deployed the video labelling game Waisda? to collect user tags for videos...