Conference Presentations by Nikolaos Tsipas

The task of general audio detection and segmentation by means of machine learning is a very popular and highly demanding procedure nowadays. Most relevant works in the last decade aim at modelling audio in order to conduct semantic analysis and high-level categorization. A generic strategy that would detect audio events as transitions from one audio state to another is considered interesting and would support the whole classification workflow. This work investigates the possibilities in designing a robust bimodal segmentation algorithm for audio that would perform well in different conditions without relying on complicated machine learning schemes, by minimizing prior knowledge for the detection model and thus delivering consistent performance for any input signal and computing environment. Additionally, a modern user-generated content approach for populating and updating ground truth databases is presented. Both techniques are implemented and embedded as upgrades in a mobile software environment for smartphones.
With this submission, a set of ensemble learning based methods for the MIREX 2015 Speech / Music Classification and Detection task is proposed and evaluated. The main algorithm for the Detection task employs a self-similarity matrix analysis technique to detect homogeneous segments of audio that can be subsequently classified as music or speech by a Random Forest classifier. In addition to the main algorithm two variations are proposed, the first one employs a silence…
Papers by Nikolaos Tsipas

Special Issue 1-Dec2019, 2019
During the last two decades, citizens' participation in the news production process has raised great academic and entrepreneurial interest in participatory journalism. Traditional procedures and concepts such as gatekeeping have been under discussion. News organizations redesign their websites in order to adopt tools and applications that make it possible for users to be active consumers or even co-producers of journalistic content, by liking, sharing, commenting and submitting material. At the same time, huge amounts of user-generated content are uploaded every minute on social media platforms. Consequently, professionals have to deal with continually available information which requires management, classification and evaluation in order to keep high journalistic standards and to avoid problems, varying from plain grammar mistakes to serious situations of fake news, hostility or hate speech. Thus, there is an obvious need for a new model of managing participatory journalism, based on semantic technologies, which will support organized collection and moderation of content in an effective way and in a short time. The main objective of this paper is to define the requirements and describe the characteristics that the model should have. For this purpose, two online surveys of journalists and users were conducted in Greece, in order to gain some insights concerning the development of the model. The paper presents the key findings from the surveys and identifies the views, the preferences and the experiences as expressed by the respondents, which point towards a collaborative, semantic-oriented way of submitting and receiving user-generated content.
Tools and Machine Learning Techniques for Semantic Audio Analysis
Audiovisual Resource Processing Systems (Συστήματα επεξεργασίας οπτικοακουστικών πόρων)

Heliyon
Photos have been used as evident material in news reporting almost since the beginning of Journalism. In this context, manipulated or tampered pictures are very common as part of informing articles, in today's misinformation crisis. The current paper investigates the ability of people to distinguish real from fake images. The presented data derive from two studies. Firstly, an online cross-sectional survey (N = 120) was conducted to analyze ordinary human skills in recognizing forgery attacks. The target was to evaluate individuals' perception in identifying manipulated visual content, therefore, to investigate the feasibility of "crowdsourced validation". This last term refers to the process of gathering fact-checking feedback from multiple users, thus collaborating towards assembling pieces of evidence on an event. Secondly, given that contemporary veracity solutions are coupled with both journalistic principles and technology developments, an experiment in two phases was employed: a) A repeated measures experiment was conducted to quantify the associated abilities of Media and Image Experts (N = 5 + 5) in detecting tampering artifacts. In this latter case, image verification algorithms were put into the core of the analysis procedure to examine their impact on the authenticity assessment task. b) Apart from conducting interview sessions with the selected experts and their proper guidance in using the tools, a second experiment was also deployed on a larger scale through an online survey (N = 301), aiming at validating some of the initial findings. The primary intent of the deployed analysis and their combined interpretation was to evaluate image forensic services, offered as real-world tools, regarding their comprehension and utilization by ordinary people, involved in the everyday battle against misinformation.
The outcomes confirmed the suspicion that only a few subjects had prior knowledge of the implicated algorithmic solutions. Although these assistive tools often lead to controversial or even contradictory conclusions, their experimental treatment with the systematic training in their proper use boosted the participants' performance. Overall, the research findings indicate that the scores of successful detections, relying exclusively on human observations, cannot be disregarded. Hence, the ultimate challenge for the "verification industry" should be to balance between forensic automations and the human experience, aiming at defending the audience from inaccurate information propagation.
Media Assets
Springer International Publishing eBooks, 2020

MIREX 2015: Methods for Speech / Music Detection and Classification
With this submission, a set of ensemble learning based methods for the MIREX 2015 Speech / Music Classification and Detection task is proposed and evaluated. The main algorithm for the Detection task employs a self-similarity matrix analysis technique to detect homogeneous segments of audio that can be subsequently classified as music or speech by a Random Forest classifier. In addition to the main algorithm, two variations are proposed: the first one employs a silence detection algorithm, while the second one omits the self-similarity information and relies solely on the Random Forest classifier. For the Classification task two variants are proposed, both based on a sliding-window classification approach. In the first case a pre-trained model is used, while in the second case a training phase, exploiting training data provided during the submission evaluation, precedes classification.

Extending Temporal Feature Integration for Semantic Audio Analysis
Journal of The Audio Engineering Society, 2017

Semantic Tools for Participatory Journalism
The proliferation of User-Generated Content (UGC) has led to catalytic changes both in the news production process and in the journalist-audience relationship. Media organizations redesign their strategies by adopting tools of participatory journalism through which amateurs comment on stories, submit material for publication or share articles. Furthermore, a large amount of content becomes constantly available on social media platforms, increasing the need for continuous monitoring and effective management. However, the coexistence of professional and amateur content raises significant issues, forcing media organizations to spend resources in order to ensure quality. Apart from traditional methods, modern collection, management and validation methods are often based on semantic web services. The paper aims to offer an in-depth description of the already available semantic analysis tools in the context of participatory journalism, seeking to identify the existing use of semantic techn...

Information, 2020
Radio is evolving in a changing digital media ecosystem. Audio-on-demand has shaped the landscape of big unstructured audio data available online. In this paper, a framework for knowledge extraction is introduced, to improve discoverability and enrichment of the provided content. A web application for live radio production and streaming is developed. The application offers typical live mixing and broadcasting functionality, while performing real-time annotation as a background process by logging user operation events. For the needs of a typical radio station, a supervised speaker classification model is trained for the recognition of 24 known speakers. The model is based on a convolutional neural network (CNN) architecture. Since not all speakers are known in radio shows, a CNN-based speaker diarization method is also proposed. The trained model is used for the extraction of fixed-size identity d-vectors. Several clustering algorithms are evaluated, having the d-vectors as input. Th...

Journal of the Audio Engineering Society, 2020
Semantic audio analysis has become a fundamental task in modern audio applications, making the improvement and optimization of classification algorithms a necessity. Standard frame-based audio classification methods have been optimized, and modern approaches introduce engineering methodologies that capture the temporal dependency between successive feature observations, following the process of temporal feature integration. Moreover, the deployment of convolutional neural networks marked a new era in semantic audio analysis. The current paper attempts a thorough comparison between standard feature-based classification strategies, state-of-the-art temporal feature integration tactics and 1D/2D deep convolutional neural network setups, on typical audio classification tasks. Experiments focus on optimizing a lightweight configuration for convolutional network topologies on a Speech/Music/Other classification scheme that can be deployed on various audio information retrieval tasks, such as voice activity detection, speaker diarization, or speech emotion recognition. The ultimate goal of this work is to establish an optimized protocol for constructing deep convolutional topologies on general audio detection classification schemes, minimizing complexity and computational needs.
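As an illustration of what temporal feature integration generally means, the sketch below summarises frame-level feature vectors over longer "texture" windows using the per-window mean and standard deviation. This is a common baseline form of the technique and is not taken from the paper; the function name, the parameters `win` and `hop`, and the toy input are assumptions made for the example.

```python
import numpy as np

def integrate_features(frames, win, hop):
    """Summarise frame-level features over longer texture windows.

    frames: array of shape (n_frames, n_features)
    win, hop: window length and hop size in frames (hypothetical names)
    Returns an array of shape (n_windows, 2 * n_features): the per-window
    mean followed by the per-window standard deviation.
    """
    frames = np.asarray(frames, dtype=float)
    rows = []
    for start in range(0, len(frames) - win + 1, hop):
        block = frames[start:start + win]
        rows.append(np.concatenate([block.mean(axis=0), block.std(axis=0)]))
    return np.array(rows)

# Toy input: 10 frames with 2 features each.
frames = np.arange(20, dtype=float).reshape(10, 2)
integrated = integrate_features(frames, win=4, hop=2)
```

Each output row can then be passed to a classifier in place of the individual frames, so the classifier sees how the features vary over time as well as their instantaneous values.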

Social Sciences, 2020
During the last two decades, citizens’ participation in news production process has attracted significant interest from both academia and the media industry. Media production and consumption have been altered considerably and traditional concepts, such as gatekeeping, have been under discussion. Many news organisations include in their websites tools and applications that allow users to be active consumers or even co-producers of journalistic content, by liking, sharing, commenting and submitting material. At the same time, large amounts of user-generated content are uploaded every day on social media platforms. Subsequently, media organisations must deal with continually available information which requires management, classification and evaluation not only to keep high journalistic standards, but also to avoid problems. The latter category can include grammar mistakes, fake or misleading information and hate speech. All the above-mentioned parameters highlight the obvious need for...

Signal Processing: Image Communication, 2018
Sub-pixel motion estimation plays a vital role in a multitude of video applications, including encoding, audiovisual archiving/heritage and super-resolution enhancement. Most existing block-based methods rely on the implicit assumption that blocks can be accurately predicted through appropriate shifts. In particular, shifted blocks in the target frame are estimated from the associated anchor frame blocks. The present paper introduces a different strategy, which discards this assumption and treats anchor and target frame blocks equally, as sub-pixel shifted versions of an unavailable implied block. The new method attempts to construct this implied block and, by calculating the "imaginary" motion vectors that relate it to the two existing blocks, it estimates the wanted motion vectors more accurately. This approach aims at extracting motion vectors that more accurately represent the actual movements of objects, minimizing the interpolation error that is associated with sub-pixel shifting, which manifests as blurring and a lowering of contrast. The new method focuses on accurate motion estimation, paying less attention to the associated computational load. Hence, the approach is both inspired by, and proposed for, super-resolution enhancement scenarios, where higher definition motion image sequences are estimated from their available lower definition counterparts. In order to implement the new strategy, an algorithm for reversing the bilinear sub-pixel shift of a block (unshifting) is implemented and validated. Comparisons between original blocks of images and blocks that have been shifted and unshifted back to their original coordinates showcase the accuracy of the unshifting process.
The proposed motion estimation method is evaluated through a number of different experimental assessment procedures and metrics, comparing it to existing high-accuracy state-of-the-art motion estimation methods.
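To make the idea of reversing a bilinear sub-pixel shift concrete, here is a minimal one-dimensional sketch. It is not the paper's algorithm: it assumes a shift by a fraction `d` of one sample with edge replication at the right boundary, writes the shift as a matrix, and "unshifts" by solving the resulting linear system. All function names are hypothetical.

```python
import numpy as np

def shift_matrix(n, d):
    """Matrix of a 1-D bilinear shift by a fraction d (0 <= d < 1).

    Row i computes (1 - d) * x[i] + d * x[i + 1]; the last sample has no
    right neighbour, so it is replicated (an assumption of this sketch).
    """
    A = (1.0 - d) * np.eye(n) + d * np.eye(n, k=1)
    A[-1, -1] = 1.0
    return A

def shift(x, d):
    """Apply the bilinear sub-pixel shift to a 1-D signal."""
    return shift_matrix(len(x), d) @ x

def unshift(y, d):
    """Recover the original signal by solving the shift system."""
    return np.linalg.solve(shift_matrix(len(y), d), y)

x = np.array([1.0, 4.0, 2.0, 8.0, 5.0])
y = shift(x, 0.25)           # blurred, sub-pixel shifted version of x
x_recovered = unshift(y, 0.25)
```

The matrix is upper triangular with a non-zero diagonal for any `d` below 1, so the shift is invertible in this setting, although the system becomes increasingly ill-conditioned as `d` approaches 1.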

Education Sciences, 2019
Art and technology have always been very tightly intertwined, presenting strong influences on each other. On the other hand, technological evolution led to today’s digital media landscape, elaborating mediated communication tools, thus providing new creative means of expression (i.e., new-media art). Rich-media interaction can expedite the whole process into an augmented schooling experience, though art cannot be easily enclosed in classical teaching procedures. The current work focuses on the deployment of a modern-art web-guide, aiming at enhancing traditional approaches with machine-assisted blended-learning. In this perspective, “machine” has a twofold goal: to offer highly interdisciplinary multimedia services for both in-class demonstration and self-training support, and to crowdsource users’ feedback, so as to train artificial intelligence systems on painting movements semantics. The paper presents the implementation of the “Istoriart” website through the main phases of Analys...

Journal of the Audio Engineering Society, 2016
The task of general audio detection and segmentation is quite common in contemporary audio applications where computationally intensive processes are frequently involved. Machine learning is usually employed along with tedious user-driven data labeling, in order to detect, segment, and semantically annotate the implicated audio events. This work focuses on generic audio detection and classification, combining hierarchical bimodal segmentation with hybrid pattern classification that is performed at different temporal resolutions. The paper presents the algorithmic perspective of a mobile back-end system, intending to facilitate the construction, validation, and continuous update of generic audio ground-truth data. The ultimate goal is the implementation of a system capable of performing well in different conditions, without relying on complicated pattern recognition systems and taxonomies. Therefore, minimized prior knowledge of the model is considered, aiming at delivering consistent behavior for different input signals and computing environments. Improved performance is targeted through the combination of many different modalities, while novel "classification confidence" metrics are implemented for balancing between the size and the achieved accuracy of the automated annotation outcomes. The introduced novelties can confront interfacing and operating limitations of smartphones (and generally mobile terminals), facilitating their use in massive crowdsourcing of diverse and semantically enhanced audio content.

Multimedia Tools and Applications, 2017
In this paper, an audio-driven algorithm for the detection of speech and music events in multimedia content is introduced. The proposed approach is based on the hypothesis that short-time frame-level discrimination performance can be enhanced by identifying transition points between longer, semantically homogeneous segments of audio. In this context, a two-step segmentation approach is employed in order to initially identify transition points between the homogeneous regions and subsequently classify the derived segments using a supervised binary classifier. The transition point detection mechanism is based on the analysis and composition of multiple self-similarity matrices, generated using different audio feature sets. The implemented technique aims at discriminating events focusing on transition point detection with high temporal resolution, a target that is also reflected in the adopted assessment methodology. Thereafter, multimedia indexing can be efficiently deployed (for both audio and video sequences), incorporating the processes of high resolution temporal segmentation and semantic annotation extraction. The system is evaluated against three publicly available datasets and experimental results are presented in comparison with existing implementations. The proposed algorithm is provided as an open source software package in order to support reproducible research and encourage collaboration in the field.
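The general idea of locating transition points from a self-similarity matrix can be sketched as follows. This is a generic checkerboard-kernel novelty curve applied to a single cosine self-similarity matrix, used here as a stand-in; it is not the multi-matrix composition described in the paper, and the function names, the `half` parameter and the synthetic features are assumptions made for the example.

```python
import numpy as np

def self_similarity(features):
    """Cosine self-similarity matrix of features shaped (n_frames, n_features)."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)
    return unit @ unit.T

def novelty_curve(ssm, half):
    """Slide a checkerboard kernel along the main diagonal of the matrix."""
    sign = np.concatenate([-np.ones(half), np.ones(half)])
    kernel = np.outer(sign, sign)  # +1 within-segment blocks, -1 cross blocks
    n = len(ssm)
    novelty = np.zeros(n)
    for i in range(half, n - half + 1):
        patch = ssm[i - half:i + half, i - half:i + half]
        novelty[i] = np.sum(kernel * patch)
    return novelty

# Synthetic input: 20 frames of one "state" followed by 20 frames of another.
rng = np.random.default_rng(0)
a = np.array([1.0, 0.0]) + 0.01 * rng.standard_normal((20, 2))
b = np.array([0.0, 1.0]) + 0.01 * rng.standard_normal((20, 2))
features = np.vstack([a, b])

novelty = novelty_curve(self_similarity(features), half=5)
transition = int(np.argmax(novelty))  # index of the first frame of the new state
```

Inside a homogeneous region the positive and negative kernel blocks cancel out, so the curve stays near zero and peaks where the two states meet. The segments between peaks could then be handed to a classifier, in the spirit of the two-step approach described above.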

Augmenting Social Multimedia Semantic Interaction through Audio-Enhanced Web-TV Services
Proceedings of the Audio Mostly 2015 on Interaction With Sound - AM '15, 2015
Multimedia semantic analysis is a key element in managing the exponentially growing amount of produced multimedia content, available on the web and the social media. Towards this direction, a semantically enhanced Web-TV environment providing video-on-demand and simulcast streaming services is proposed. The system offers content management and analysis automation capabilities by exploiting information derived from the semantic analysis of the user-uploaded content and the social interaction of its users through the processes of annotation and tagging. A fusion-based approach is employed for the categorization of content, enabling users to combine heterogeneous semantic information, thus enhancing content exploration with rich media experience. The paper focuses on the analysis of the system's architecture, the applied methodologies for incorporating user-generated classification schemes and annotations, and the evaluation of machine learning algorithms to provide innovative multimedia content exploration methods.

Mobile Audio Intelligence
Proceedings of the Audio Mostly 2015 on Interaction With Sound - AM '15, 2015
The task of general audio detection and segmentation by means of machine learning is a very popular and highly demanding procedure nowadays. Most relevant works in the last decade aim at modelling audio in order to conduct semantic analysis and high-level categorization. A generic strategy that would detect audio events as transitions from one audio state to another is considered interesting and would support the whole classification workflow. This work investigates the possibilities in designing a robust bimodal segmentation algorithm for audio that would perform well in different conditions without relying on complicated machine learning schemes, by minimizing prior knowledge for the detection model and thus delivering consistent performance for any input signal and computing environment. Additionally, a modern user-generated content approach for populating and updating ground truth databases is presented. Both techniques are implemented and embedded as upgrades in a mobile software environment for smartphones.