Multimedia semantics

40 papers

About this topic

Multimedia semantics is the study of meaning and interpretation in multimedia content, encompassing the integration of text, audio, images, and video. It focuses on how these elements convey information, emotions, and context, and how they interact to create a cohesive understanding for users.

Key research themes

1. How can ontologies and semantic frameworks improve the representation and retrieval of multimedia content?

This theme investigates the design, construction, and application of ontologies and semantic models to capture the meanings embedded in multimedia data. Unlike traditional metadata or low-level descriptors, ontologies provide a formal, structured representation of multimedia semantics, enabling interoperability, advanced querying, and bridging the semantic gap between raw media features and human interpretation. The theme underscores why effective semantic modeling is fundamental to multimedia indexing, search, and content management in distributed and heterogeneous environments.
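A minimal sketch of why a formal class hierarchy helps retrieval, assuming a toy ontology held in a Python dictionary rather than a real OWL/RDF store: items are annotated with specific concepts, and a query for a general concept also returns items annotated with any of its subclasses. The concept names, file names, and the `ancestors`/`retrieve` helpers are hypothetical illustrations, not part of any of the works listed below.

```python
# Hypothetical toy ontology: each concept maps to its direct superclass (None = root).
SUBCLASS_OF = {
    "Agent": None,
    "Person": "Agent",
    "Actor": "Person",
    "Vehicle": None,
    "Car": "Vehicle",
}

# Hypothetical annotations: media item -> concepts asserted for it.
ANNOTATIONS = {
    "clip1.mp4": {"Actor"},
    "clip2.mp4": {"Car"},
    "photo3.jpg": {"Person", "Car"},
}


def ancestors(concept: str) -> set[str]:
    """Return the concept itself plus every superclass reachable via SUBCLASS_OF."""
    result = set()
    while concept is not None:
        result.add(concept)
        concept = SUBCLASS_OF[concept]
    return result


def retrieve(query_concept: str) -> list[str]:
    """Return items annotated with query_concept or with any of its subclasses."""
    return sorted(
        item
        for item, concepts in ANNOTATIONS.items()
        if any(query_concept in ancestors(c) for c in concepts)
    )


print(retrieve("Person"))   # ['clip1.mp4', 'photo3.jpg']
print(retrieve("Vehicle"))  # ['clip2.mp4', 'photo3.jpg']
```

A plain keyword match on "Person" would miss `clip1.mp4`, which is annotated only as `Actor`; the subclass walk is what recovers it. A production system would typically delegate this step to an RDFS/OWL reasoner or to a SPARQL property path such as `rdfs:subClassOf*`.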

Trends in Managing Multimedia Semantics

by Lambrini Seremeti

2021, International Journal of Wireless Networks and Broadband Technologies

Key finding: This paper outlines recent efforts to bridge low-level multimedia descriptors with human-understandable semantics via ontologies. It highlights MPEG-7's limitations and advocates ontology development for interoperable...

Semantic Multimedia

by Raphael Troncy

2016

Key finding: The work proposes COMM, a core multimedia ontology founded on DOLCE, to unify manual and automatic multimedia annotations. It critiques MPEG-7's XML-based approach and demonstrates how ontology-based annotations enable...

The Role of Explicit Semantics in Search and Browsing

by Benoit Huet et al.

2015

Key finding: This survey synthesizes the design decisions in Semantic Web applications for supporting search functionality over semantic data. It explicates how explicit semantic structures underlying multimedia metadata improve query...

Context-based multimedia semantics modelling and representation

by uchechukwu Eze

2022

Key finding: Although focused on context, this thesis identifies key contextual dimensions that enhance semantic understanding of multimedia beyond low-level features. It complements ontology-based approaches by addressing knowledge...

2. What strategies and models effectively unify multimodal information to compute semantics and sentiments from multimedia content?

This area investigates models, algorithms, and computational frameworks that integrate heterogeneous modal data (e.g., audio, visual, textual) to extract unified semantic and affective information. Given user-generated content's multimodal nature, accurately deriving meaningful semantics and sentiments requires combining features across modalities and contextual metadata. Understanding effective multimodal fusion enhances multimedia summarization, tag relevance, personalized recommendation, and affective computing.
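One common fusion strategy, late (decision-level) fusion, combines per-modality predictions after each modality has been analyzed separately. The sketch below assumes hypothetical per-modality sentiment scores in [-1, 1] and reliability weights in [0, 1], and takes their weighted average; it is a generic illustration, not the method of any paper listed under this theme.

```python
from dataclasses import dataclass


@dataclass
class ModalityScore:
    modality: str      # e.g. "text", "audio", "visual", "gaze"
    sentiment: float   # hypothetical per-modality polarity in [-1, 1]
    confidence: float  # hypothetical reliability weight in [0, 1]


def late_fusion(scores: list[ModalityScore]) -> float:
    """Confidence-weighted average of per-modality sentiment scores."""
    total_weight = sum(s.confidence for s in scores)
    if total_weight == 0:
        return 0.0  # no usable modality: fall back to neutral
    return sum(s.sentiment * s.confidence for s in scores) / total_weight


scores = [
    ModalityScore("text", 0.6, 0.9),
    ModalityScore("audio", -0.2, 0.3),
    ModalityScore("visual", 0.4, 0.6),
]
print(round(late_fusion(scores), 3))  # 0.4
```

Early (feature-level) fusion would instead concatenate the modality feature vectors before a single classifier. Late fusion is easier to extend when a modality, such as gaze, is missing for some items, because that modality simply contributes no weight.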

Multimodal Semantics and Affective Computing from Multimedia Content

by Vishal Choudhary

2023, Advances in Multimedia and Interactive Technologies

Key finding: This chapter demonstrates that leveraging multimodal information (text, audio, visual, gaze) and contextual metadata enables more accurate and comprehensive extraction of semantics and sentiment from user-generated content than unimodal...

Understanding image-text relations and news values for multimodal news analysis

by John Bateman

2025, Frontiers in artificial intelligence

Key finding: The paper introduces a scalable framework combining a taxonomy of image-text relations with journalism-derived news values to interpret multimodal news content. It empirically shows that understanding cross-modal semantic...

Analysis and comprehension of multimodal texts

by Len Unsworth

2024, The Australian Journal of Language and Literacy

Key finding: This research applies systemic functional linguistics and visual grammar to model and analyze student comprehension of image-language relations in multimodal texts. Using empirical test data, it identifies how different types...

3. How can knowledge-driven, semantic-aware methods enhance multimedia segmentation, classification, and retrieval performance?

This research theme focuses on integrating domain knowledge, semantic reasoning, and contextual information to improve multimedia analysis tasks such as segmentation and classification. Traditional low-level feature-based segmentation and classification often face errors due to ambiguous boundaries or visually similar classes. Knowledge-driven approaches leverage semantic-level criteria, spatial and contextual relationships, and attribute-based classifiers to reduce these errors, enabling more accurate semantic labeling and retrieval of multimedia content.
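As a rough illustration of using spatial context to resolve visually similar classes, the sketch below rescores two ambiguous region labels ("sky" vs. "sea") with a hypothetical compatibility table for the relation "is above". The table values, labels, and single-pass update rule are assumptions chosen for clarity; the works below use richer knowledge representations and reasoning.

```python
# Hypothetical plausibility of the spatial relation "upper_label is above lower_label".
ABOVE_COMPAT = {
    ("sky", "sea"): 1.0,
    ("sea", "sky"): 0.1,
    ("sky", "sky"): 0.5,
    ("sea", "sea"): 0.5,
}


def compat(upper: str, lower: str) -> float:
    """Compatibility of `upper` lying above `lower`; unknown pairs are neutral (0.5)."""
    return ABOVE_COMPAT.get((upper, lower), 0.5)


def refine(regions: list[dict]) -> list[dict]:
    """One pass of context-based rescoring.

    Each region is {"y": top coordinate (smaller = higher), "scores": {label: conf}}.
    Every label score is multiplied by the best support each other region offers:
    relation compatibility times that region's initial label score.
    """
    refined = []
    for i, region in enumerate(regions):
        new_scores = {}
        for label, score in region["scores"].items():
            for j, other in enumerate(regions):
                if i == j:
                    continue
                if region["y"] < other["y"]:  # region lies above other
                    support = max(compat(label, m) * s for m, s in other["scores"].items())
                else:  # other lies above (or level with) region
                    support = max(compat(m, label) * s for m, s in other["scores"].items())
                score *= support
            new_scores[label] = score
        total = sum(new_scores.values())
        if total == 0:  # no consistent interpretation: keep the initial scores
            new_scores, total = dict(region["scores"]), 1.0
        refined.append({"y": region["y"],
                        "scores": {k: v / total for k, v in new_scores.items()}})
    return refined


regions = [
    {"y": 0, "scores": {"sky": 0.45, "sea": 0.55}},  # ambiguous blue region at the top
    {"y": 100, "scores": {"sky": 0.2, "sea": 0.8}},  # region at the bottom
]
for r in refine(regions):
    print(max(r["scores"], key=r["scores"].get))  # prints "sky", then "sea"
```

The top region starts slightly biased toward "sea", but because "sky above sea" is far more plausible than the reverse, one pass flips it to "sky" while the bottom region stays "sea".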

Knowledge-Driven Segmentation and Classification

by Thanos Athanasiadis

2018, Metadata, Analysis and Interaction

Key finding: The paper presents methodologies that incorporate high-level semantic knowledge and context to refine initial multimedia segmentation and classification results. It demonstrates that semantic segmentation based on similarity...

Knowledge‐Driven Segmentation and Classification

by Georgios Th. Papadopoulos et al.

2017, Multimedia Semantics

Key finding: Closely related to the former paper, this chapter further articulates the interaction between multimedia processing and knowledge representation. It details using contextual information such as spatial relations and...

Establishing Correspondences between Attribute Spaces and Complex Concept Spaces Using Meta-PGN Classifier

by Peter L Stanchev

2022

Key finding: This work introduces the Meta-PGN classifier that extends attribute feature spaces with metadata to bridge the gap between low-level feature regularities and high-level human semantic concepts. Applied in art painting...

All papers in Multimedia semantics

The Role of Explicit Semantics in Search and Browsing

by Benoit Huet et al.

2011

In recent years several Semantic Web applications have been developed that support some form of search. These applications provide different types of search functionality and make use of the explicit semantics present in the data in...

Towards integrating semantics of multi-media resources and processes in e-Learning

by Emmanuel Eze

2006, Multimedia Systems

Internet-based e-Learning has experienced a boom and bust situation in the past 10 years [32]. To bring in new forces to knowledge-oriented e-Learning, this paper addresses the semantic integration issue of multi-media resources and...

RDF-Powered Semantic Video Annotation Tools with Concept Mapping to Linked Data for Next-Generation Video Indexing

by Leslie F Sikos, Ph.D.

Video annotation tools are often compared in the literature; however, most reviews mix unstructured, semi-structured, and the very few structured annotation tools. This paper is a comprehensive review of video annotation tools...

Fig. 1. The top-left corner coordinates and dimensions of RoIs can be used for spatial annotation of movie characters. Movie scene by United Artists

Fig. 2. A graph visualizing the RDF triples
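Fig. 1 and Fig. 2 describe spatial annotation of a region of interest and its representation as RDF triples. A minimal sketch of that idea, assuming the W3C Media Fragments URI syntax (`t=start,end` for time, `xywh=x,y,w,h` for a pixel rectangle) to identify the region and plain N-Triples for output. The `example.org` namespace, the `depicts` property, and the character URI are hypothetical placeholders, not vocabulary used by the reviewed tools.

```python
from typing import Optional


def media_fragment(video_uri: str, x: int, y: int, w: int, h: int,
                   start: Optional[float] = None, end: Optional[float] = None) -> str:
    """W3C Media Fragments URI for a pixel rectangle, optionally limited to a time range."""
    parts = []
    if start is not None and end is not None:
        parts.append(f"t={start:g},{end:g}")  # temporal dimension, in seconds
    parts.append(f"xywh={x},{y},{w},{h}")     # spatial dimension, in pixels
    return f"{video_uri}#{'&'.join(parts)}"


def ntriple(subject: str, predicate: str, obj: str, literal: bool = False) -> str:
    """Serialise one triple as an N-Triples line (IRIs in <>, literals quoted and escaped)."""
    if literal:
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        obj_term = f'"{escaped}"'
    else:
        obj_term = f"<{obj}>"
    return f"<{subject}> <{predicate}> {obj_term} ."


EX = "http://example.org/"  # hypothetical namespace
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

region = media_fragment(EX + "video/scene.mp4", 160, 120, 320, 240, start=10, end=20)
print(ntriple(region, EX + "ontology#depicts", EX + "character/Tuco"))
print(ntriple(region, RDFS_LABEL, "Tuco in the climax scene", literal=True))
```

Each printed line is one subject-predicate-object triple; loading such lines into an RDF store yields a small graph of the kind Fig. 2 visualizes. Using a standard fragment URI as the subject keeps the annotation addressable by other Linked Data tools.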

Fig. 3. Vannotea, an early implementation of structured video annotation tools [33]

Fig. 4. Temporal segment annotation with Advene

...being a proprietary binary format, the native file format of Advene is not ideal. Nevertheless, the software is Linked Data-ready, because every annotation, relation, and view is identified by a URI and RDF/XML output is supported. The software does not incorporate multimedia ontologies beyond FOAF and Dublin Core, though. OntoMedia was developed in 2006 for large multimedia collection management using Semantic Web technologies. The graphical user interface of this standalone Java application offered easy metadata indexing and video retrieval. OntoMedia accepted any input media supported by QuickTime or the Java Media Framework, and could generate RDF and relational database output. Also in 2006, Bertini and his colleagues developed the Multimedia Ontology Manager (MOM) to combine multimedia ontology engineering with automatic annotation and generate textual and auditory commentary for video sequences [34]. The automatic video annotation was performed for entire video clips by using similarity checking between visual ontology concepts and extracted clips, and for video sequences by using composite concept patterns. Video clip sequences were annotated with predefined articulated sentences curated by the RACER reasoner. Annomation, published in 2008 as a collaborative Linked Data-driven narrative hypervideo application, allowed users to semantically annotate video resources using controlled vocabularies defined in the LOD Cloud. It was restricted to predefined videos hosted by the service, and the semantic annotations were saved in a local repository, making them inaccessible to external semantic agents.

Fig. 5. Comprehensive concept mapping to LOD in SemWebVid [37]

SemWebVid was an Ajax web application released in 2010, which automatically generated YouTube video descriptions in RDF, taking manually added tags and closed captions into account. SemWebVid implemented natural language processing APIs to analyze the descriptors, and mapped the results to LOD concepts, using the DBpedia, Uberblic, Any23, and rdf:about APIs, and the now-discontinued Sindice API. Provenance data was color-coded, which was an original idea; however, the resulting text was not always easy to read (see Figure 5).

Fig. 6. The ConnectME hypervideo annotation suite incorporated temporal information with labels and LOD concepts

Fig. 7. In Open Video Annotation, users can take notes on the timeline, view existing annotations, and play annotated video fragments individually

...EXMARaLDA metadata files, or SRT subtitle files, the software automatically converts television content metadata into RDF. However, the software cannot generate RDF based on the video content alone, and is basically limited to the serialization of existing textual data as structured data. The LinkedTV Editor provides a user interface for broadcasting services, which uses the automatically generated annotations of LinkedTV for the rapid generation of contextual information queues. Open Video Annotation is based on open source JavaScript libraries, such as Video.js, Annotator, and RangeSlider. The developers claim that the software is compliant with W3C's Open Annotation data formats. Open Video Annotation was designed to provide an intuitive interface for semantic tagging and the playback of semantically enriched videos (see Figure 7).
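Open Video Annotation is described as compliant with W3C's Open Annotation data formats. The W3C Web Annotation Data Model grew out of that Open Annotation work, so the sketch below assumes the newer model: it builds a JSON-LD annotation whose target is a temporal video segment selected with a Media Fragments `FragmentSelector`, and whose bodies carry a free-text note and a semantic tag. All URIs are hypothetical, and this is not claimed to be the exact output of any tool reviewed here.

```python
import json


def video_annotation(anno_id: str, video_uri: str, start: float, end: float,
                     tag_uri: str, note: str) -> dict:
    """W3C Web Annotation (JSON-LD) for a temporal video segment, with a note and a tag."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": anno_id,
        "type": "Annotation",
        "body": [
            {"type": "TextualBody", "value": note, "purpose": "commenting"},
            {"type": "SpecificResource", "source": tag_uri, "purpose": "tagging"},
        ],
        "target": {
            "source": video_uri,
            "selector": {
                "type": "FragmentSelector",
                "conformsTo": "http://www.w3.org/TR/media-frags/",
                "value": f"t={start:g},{end:g}",  # Media Fragments temporal syntax, seconds
            },
        },
    }


anno = video_annotation(
    "http://example.org/anno/1",            # hypothetical annotation IRI
    "http://example.org/video/scene.mp4",   # hypothetical video
    30, 45.5,
    "http://example.org/concept/Standoff",  # hypothetical semantic tag (e.g. an LOD concept)
    "Three-way standoff begins",
)
print(json.dumps(anno, indent=2))
```

Keeping the time range in a standard selector, rather than in a tool-specific field, is what lets other annotation clients replay the same fragment.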

Fig. 8. In MyStoryPlayer, metadata and classification are coupled with timestamp-based snapshot comments

Table 2. Syntax and semantics of SROIQ axioms

As an example, assume a file of a video scene, namely the climax of the movie “The Good, the Bad, and the Ugly” with the trio Tuco, Blondie, and Angel Eyes, portrayed...

Table 1. Syntax and semantics of SROIQ constructors

An interpretation $\mathcal{I}$ consists of a set $\Delta^{\mathcal{I}}$ (the domain of $\mathcal{I}$) and an interpretation function $\cdot^{\mathcal{I}}$, which maps each atomic concept $A$ to a set $A^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}$, each atomic role $R$ to a binary relation $R^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}}$, and each individual name $a$ to an element $a^{\mathcal{I}} \in \Delta^{\mathcal{I}}$. Similar to the constructors, the formal meaning of the axioms is defined by their model-theoretic semantics, as shown in Table 2.
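The model-theoretic semantics can be made concrete with a small finite interpretation. The sketch below is an illustration, not a reasoner: it stores the domain $\Delta^{\mathcal{I}}$, concept extensions $A^{\mathcal{I}}$, and role extensions $R^{\mathcal{I}}$ as Python sets, computes the extensions of the constructors $\exists R.C$ and $\forall R.C$, and checks whether a concept inclusion $C \sqsubseteq D$ holds, i.e. $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$. The concept and role names (`Character`, `Scene`, `appearsIn`) are hypothetical, loosely based on the movie-scene example in this review.

```python
from dataclasses import dataclass, field


@dataclass
class Interpretation:
    domain: set[str]  # the domain Δ^I
    concepts: dict[str, set[str]] = field(default_factory=dict)  # A -> A^I, a subset of Δ^I
    roles: dict[str, set[tuple[str, str]]] = field(default_factory=dict)  # R -> R^I, a subset of Δ^I × Δ^I

    def concept(self, name: str) -> set[str]:
        """A^I for an atomic concept A (empty if A has no instances)."""
        return self.concepts.get(name, set())

    def exists(self, role: str, filler: set[str]) -> set[str]:
        """(∃R.C)^I: elements with at least one R-successor in C^I."""
        return {x for (x, y) in self.roles.get(role, set()) if y in filler}

    def forall(self, role: str, filler: set[str]) -> set[str]:
        """(∀R.C)^I: elements all of whose R-successors lie in C^I."""
        pairs = self.roles.get(role, set())
        return {x for x in self.domain
                if all(y in filler for (z, y) in pairs if z == x)}

    @staticmethod
    def satisfies_gci(sub: set[str], sup: set[str]) -> bool:
        """I satisfies C ⊑ D iff C^I ⊆ D^I."""
        return sub <= sup


interp = Interpretation(
    domain={"tuco", "blondie", "angel_eyes", "climax"},
    concepts={"Character": {"tuco", "blondie", "angel_eyes"}, "Scene": {"climax"}},
    roles={"appearsIn": {("tuco", "climax"), ("blondie", "climax"), ("angel_eyes", "climax")}},
)

in_some_scene = interp.exists("appearsIn", interp.concept("Scene"))
print(sorted(in_some_scene))  # ['angel_eyes', 'blondie', 'tuco']
print(interp.satisfies_gci(in_some_scene, interp.concept("Character")))  # True
print(sorted(interp.forall("appearsIn", interp.concept("Scene"))))
```

The last line also lists `climax`: it has no `appearsIn` successors, so $\forall\,\mathit{appearsIn}.\mathit{Scene}$ holds for it vacuously. This is standard for universal restrictions and a frequent source of surprise when modeling.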

Table 3. Standards supported by structured video annotation tools

Open standards are preferred to proprietary implementations, such as the temporal annotation of Annomation, the spatial and temporal fragmentation of Advene and SemTube, and proprietary ontologies, e.g., the SALERO ontologies used by IMAS or the LinkedTV ontology implemented by the LinkedTV Editor.

Table 4. Supported data formats of structured video annotation tools

While Linked Data output is expected from semantic video annotation tools, dependence on a particular LOD dataset can be a major design issue. A good example is the now-discontinued SemTube, which implemented Freebase as the primary LOD...

Table 5. Ontology use of semantic video annotation tools

Table 6. Spatiotemporal annotation support of semantic video annotation tools

Knowledge‐Driven Segmentation and Classification

by Georgios Th. Papadopoulos et al.

2011, Multimedia Semantics

In this chapter a first attempt will be made to examine how the coupling of multimedia processing and knowledge representation techniques, presented separately in previous chapters, can improve analysis. No formal reasoning techniques...

Integrating Content Authentication Support in Media Services

by Charalampos Dimoulas

Encyclopedia of Information Science and Technology, 4th edition

The present chapter investigates content authentication strategies and their use in media practice. Remarkable research progress has been made on media veracity methods and algorithms; however, without providing that much...

Towards integrating semantics of multi-media resources and processes in e-Learning

by David Webster et al.

2006, Multimedia Systems

Internet-based e-Learning has experienced a boom and bust situation in the past 10 years. To bring in new forces to knowledge-oriented e-Learning, this paper addresses the semantic integration issue of multimedia resources and learning...

Features for art painting classification based on vector quantization

by Rajkumar Kannan

2012

Abstract. An approach for extracting higher-level visual features for art painting classification based on MPEG-7 descriptors is presented in this paper. The MPEG-7 descriptors give a good presentation of different types of visual...
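The abstract does not spell out the quantization procedure, so the following is only a generic sketch of vector quantization for image features: a k-means codebook is learned from training descriptors, and each image is then represented by the normalized histogram of its descriptors' nearest codewords. Toy 2-D vectors stand in for MPEG-7 descriptors, and the deterministic first-k initialization is an assumption made for reproducibility.

```python
import math


def nearest(vec, codebook):
    """Index of the codeword closest to vec by Euclidean distance."""
    return min(range(len(codebook)), key=lambda i: math.dist(vec, codebook[i]))


def kmeans(vectors, k, iterations=20):
    """Lloyd's k-means; initialised with the first k vectors so results are repeatable."""
    codebook = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[nearest(v, codebook)].append(v)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous codeword
                codebook[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return codebook


def vq_histogram(vectors, codebook):
    """Normalised codeword histogram: a fixed-length feature for one image."""
    counts = [0] * len(codebook)
    for v in vectors:
        counts[nearest(v, codebook)] += 1
    return [c / len(vectors) for c in counts]


# Toy 2-D vectors standing in for per-block MPEG-7 descriptors (hypothetical data).
training = [(0, 0), (0, 1), (10, 10), (10, 11)]
codebook = kmeans(training, k=2)
painting = [(1, 1), (9, 9), (11, 10), (10, 12)]
print(codebook)                          # [[0.0, 0.5], [10.0, 10.5]]
print(vq_histogram(painting, codebook))  # [0.25, 0.75]
```

The histogram has one entry per codeword regardless of how many descriptors an image yields, which is what makes images of different sizes directly comparable by a standard classifier.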
