Video and Audio Processing

14 papers

11 followers

About this topic

Video and audio processing is the manipulation and analysis of digital video and audio signals using algorithms and software. This field encompasses techniques for encoding, decoding, compressing, enhancing, and transforming multimedia content to improve quality, facilitate transmission, and enable various applications in entertainment, communication, and data analysis.

Key research themes

1. How can integration and synchronization techniques improve real-time joint audio-video processing in multimedia applications?

This research area focuses on methodologies and frameworks to tightly integrate and synchronize audio and video streams in real-time multimedia applications, addressing challenges posed by independent capture, transmission delays, buffering, and processing constraints. Synchronization enables coherent audiovisual experiences crucial for activities such as video conferencing, live streaming, and interactive media.

An Online System for Synchronized Processing of Video and Audio Signals

by Mohamed Elhelaly

2022, 2006 Canadian Conference on Electrical and Computer Engineering

Key finding: The authors developed an online system achieving synchronization of independently captured and separately processed audio and video streams via time-stamping techniques. This approach correlates audio packets with video...

Real-Time Video and Audio in the World Wide Web

by Roy Campbell

2025, Proceedings of the Fourth International Conference on World Wide Web

Key finding: Presented Vosaic, a WWW browser extension integrating real-time video and audio into hypertext pages with no retrieval latency by extending HTTP servers to utilize a novel Video Datagram Protocol (VDP). This allowed a 44-fold...

Current trends in joint audio-video signal processing: a review

by Roland Goecke

2016, Proceedings of the Eighth International Symposium on Signal Processing and Its Applications, 2005.

Key finding: Comprehensively reviews methods for joint audio-video signal processing emphasizing the importance of cross-modal fusion for speech recognition and person authentication. It highlights integration challenges, current...

2. What advances in video compression standards and computational frameworks drive efficient multimedia processing and adaptation?

This theme explores developments in video compression standards like H.264 and predictions on future video coding paradigms, alongside computational frameworks that allow adaptable video editing and delivery. Research emphasizes improving coding efficiency, reducing bitrates, adapting video for multiple platforms, and leveraging machine learning for content-aware processing, which are fundamental for scalable, high-quality multimedia distribution.

Video Compression Using H.264 - A Review

by MANJUSHA DESHMUKH

2023

Key finding: Analyzes H.264/AVC standard's technical innovations yielding at least a twofold compression efficiency improvement over predecessors through enhanced motion estimation with small block sizes, intra prediction, a DCT-like...

The Future of Video Coding

by Nam Ling

2023, APSIPA Transactions on Signal and Information Processing

Key finding: Synthesizes expert panel insights on emerging video coding research, including the dual-track approach integrating conventional and deep learning-based coding techniques, the ongoing importance of scalable, immersive...

3. How do audio signal processing and structured audio standards enhance multimedia audio representation, synthesis, and user interaction?

This theme covers advances in digital audio processing including perception-informed signal modeling, synthetic audio representation within multimedia standards, and novel audio coding methods. It examines computational frameworks for representing, coding, and synthesizing both natural and structured audio, facilitating flexible, high-fidelity soundtracks and enabling enriched multimedia experiences and assistive technologies.

Audio Processing

by PRIYANJANA GHOSH (11BEC1025)

2013

Key finding: Details human auditory system characteristics underlying audio processing, emphasizing the cochlea's frequency analysis via place and volley principles. This physiological understanding guides design of digital audio...

Structured audio and effects processing in the MPEG-4 multimedia standard

by Eric Scheirer

2023, Elsevier eBooks

Key finding: Describes MPEG-4's extension for 'Structured Audio,' enabling algorithmic descriptions of synthetic sounds, musical scores, and audio effects integrated with natural audio streams. This facilitates highly compressed, flexible...

Audio Signal Processing

by Preeti Rao

2022, Studies in Computational Intelligence

Key finding: Reviews audio signal characteristics and classification methodologies, highlighting time-frequency representations and feature extraction techniques imperative for audio segmentation, retrieval, and compression. Emphasizes...

Identifying Major Components of Pictures by Audio Encoding of Colours

by Michel Vinckenbosch

2023, Lecture Notes in Computer Science

Key finding: Presents a novel approach mapping image color components to musical instrument sounds for aiding visually impaired users in constructing mental spatial images via auditory cues. Experimental results show that learned...

All papers in Video and Audio Processing

FILTWAM - A Framework for Online Game-based Communication Skills Training - Using Webcams and Microphones for Enhancing Learner Support

by Kiavash Bahreini

2023

This paper provides an overarching framework embracing conceptual and technical frameworks for improving the online communication skills of lifelong learners. This overarching framework is called FILTWAM (Framework for Improving Learning...

Near-Duplicate Video Retrieval. NIT-2013

by Ilya (w-495) Nikitin

2018, Near-Duplicate Video Retrieval [Поиск нечётких дубликатов видео]

A near-duplicate is an object that is fully or partly similar to another object. Near-duplicates can be natural or artificial. Natural near-duplicates are similar objects in a similar environment, while artificial near-duplicates are objects derived from one original object.

Near-duplicate retrieval has several aims. First, it may be helpful for the optical navigation of cruise missiles or pilotless planes. Second, it is helpful for grouping search-engine snippets. It is also needed for video classification and copyright management, which matters most for online cinemas.

There are currently several ways to find near-duplicates. One is based on global video features. Another is based on local and global features of particular frames. youtube.com uses an audio-based approach, while Licenzero uses visual words. Combinations of different methods are also applied.

We propose an algorithm for near-duplicate video retrieval based on shot detection. A shot is a sequence of frames that differ considerably from the frames of another sequence. The task is as follows. There are many initial videos. Somebody gives us a new video, and we must say whether it is a copy of one or more of the initial videos. We compute scene descriptors for every shot of every initial video and for every shot of the given video, and then compare these descriptors with each other. If several descriptors of the given video coincide with descriptors of the initial videos, those parts of the videos are duplicates. Otherwise, the given video is unique.

The result of this work is an approach to near-duplicate video retrieval. First, it includes length ratios and relative lengths. Second, it uses scene alignment; the same method, the Gale-Church algorithm, is used in computational linguistics to align sentences in bilingual corpora. Third, we propose scene descriptors. Several experiments have been conducted.
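The descriptor comparison described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the descriptor layout (one fixed-length numeric vector per shot), the Euclidean distance, the `threshold` value, and the name `find_duplicate_shots` are all hypothetical.

```python
import numpy as np

def find_duplicate_shots(query, library, threshold=0.1):
    """Return (query_shot, video_id, library_shot) triples whose descriptors match.

    query:   array-like of shape (n_shots, d), descriptors of the new video.
    library: dict mapping video_id -> array-like of shape (m_shots, d).
    threshold is an assumed tolerance; the source does not give one.
    """
    query = np.asarray(query, dtype=float)
    matches = []
    for video_id, descriptors in library.items():
        descriptors = np.asarray(descriptors, dtype=float)
        # Pairwise Euclidean distances between all query and library shots.
        diff = query[:, None, :] - descriptors[None, :, :]
        dist = np.linalg.norm(diff, axis=2)
        for qi, li in zip(*np.nonzero(dist <= threshold)):
            matches.append((int(qi), video_id, int(li)))
    return matches
```

An empty result corresponds to the "given video is unique" case. How many matching shots are needed before a video counts as a duplicate is left to the caller.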

Near-Duplicate Video Retrieval. CMASS-2013

by Ilya (w-495) Nikitin

2018, Near-Duplicate Video Retrieval [Поиск нечётких дубликатов видео]

There is a wide range of tasks that require the analysis of audio-visual models of reality. In particular, many military and civilian applications need near-duplicate video retrieval. For peaceful applications, this is...

Near-Duplicate Video Retrieval. NKP-2013

by Ilya (w-495) Nikitin

2018, Near-Duplicate Video Retrieval [Поиск нечётких дубликатов видео]

XI All-Russian Conference “Neurocomputers and Their Application”, Moscow: MSUPE, 19.03.2013.

The term "near-duplicate" means an incomplete or partial match between the current document (image) and another document of a similar class.

Duplicates can be natural or artificial. Natural duplicates are similar objects under similar conditions. Artificial near-duplicates are derived from one and the same original.

Near-duplicate retrieval can be useful for the optical navigation of unmanned vehicles, for determining the character of a terrain, for compiling video catalogs, for grouping search-engine snippets, for filtering video advertising, and for finding pirated video.

Near-duplicates are closely related to the problems of video classification and video search, but these are separate tasks.

At present, near-duplicates are sought by comparing global features of the video, and global and local features of frames and sets of frames. The audio track is compared (as is done, for example, at YouTube). Visual words and characteristic sequences are found and compared (Licenzero). Combinations of methods are also used.

We propose a scene-based algorithm for near-duplicate retrieval. There is a set of source files, and scene descriptors have been computed for the video scenes of these files. A new video arrives. We need to establish whether the new video is a duplicate of the existing ones. We compute the scene descriptors of this video. The descriptor of each scene of the new video is compared with every scene descriptor of every source video.

If the descriptors match over some time interval, we consider that interval a duplicate of part of a source video. If they match on no interval, we consider the video unique.

What is a scene?

There are three distinct concepts. A frame (photographic frame) is a static picture. A scene (montage frame) is a set of frames linked by unity of place and time. A shot (cinematographic frame) is a set of frames linked by a single continuous take. One scene may include several shots. A shot is often called a "scene". From here on we consider shots but call them "scenes".

We introduce a formal definition. A scene (shot) is the set of frames in a certain time region whose frames differ significantly from the frames of neighboring regions. An audio scene can be defined in a similar way, although this raises the separate problem of delimiting audio frames.

Scene detection rests on three basic approaches: comparing the histograms of adjacent frames, comparing frame spectra, and comparing optical-flow vectors. The slide shows the first frames of the scenes of an MTS advertisement, extracted with ffmpeg. ffmpeg uses frame-histogram comparison.
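The first of these approaches, comparing histograms of adjacent frames, can be sketched as below. This is an illustrative sketch only and does not reproduce ffmpeg's actual scene-change detection. The grayscale input, the number of `bins`, the `threshold`, and the function name are assumptions.

```python
import numpy as np

def detect_scene_changes(frames, bins=16, threshold=0.5):
    """Return indices i at which frame i is taken to start a new scene.

    frames: iterable of 2-D uint8 grayscale arrays of equal size (assumed input).
    """
    boundaries = []
    prev_hist = None
    for i, frame in enumerate(frames):
        hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
        hist = hist / hist.sum()  # normalise so frame size does not matter
        if prev_hist is not None:
            # Half the L1 distance lies in [0, 1]; 1 means disjoint histograms.
            change = 0.5 * np.abs(hist - prev_hist).sum()
            if change > threshold:
                boundaries.append(i)
        prev_hist = hist
    return boundaries
```

Raising `threshold` lowers the sensitivity, which is one way the same footage can yield different scene boundaries from run to run.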

If the original advertisement is compressed with different codecs, we obtain near-duplicates of that video. Detecting scenes in each video, we see that the scene-change points of these two files do not coincide.

If we take a longer video and detect scenes at a higher sensitivity, we can see that scenes are detected quite inconsistently across codecs. Moreover, some scenes may not be detected at all.

To solve this problem we propose using relative scene lengths. For each scene we compute the ratio of its length to the lengths of the other scenes. For the first four scenes of the MTS advertisement we obtain the following matrix. In practical tasks it is more convenient to compute the length ratios with respect to the three preceding scenes. This is also convenient when the whole video is not available to us.
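A minimal sketch of the relative-length computation follows. The function names are hypothetical, and the scene lengths in the usage note are made-up numbers, not the values from the MTS advertisement.

```python
import numpy as np

def relative_length_matrix(lengths):
    """Entry [i, j] is the length of scene i divided by the length of scene j."""
    lengths = np.asarray(lengths, dtype=float)
    return lengths[:, None] / lengths[None, :]

def ratios_to_previous(lengths, k=3):
    """For each scene from index k on, the ratios of its length to the k scenes before it."""
    lengths = np.asarray(lengths, dtype=float)
    return np.array([lengths[i] / lengths[i - k:i] for i in range(k, len(lengths))])
```

For illustrative lengths `[2, 4, 8, 16]`, `relative_length_matrix` gives a 4 x 4 matrix with ones on the diagonal, and `ratios_to_previous` gives a single row for the fourth scene. The second form needs only the preceding scenes, so it can be computed on a stream.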

If the relative length of a scene in one video differs from the length of the corresponding scene in the other video by no more than a factor of two, we can assume that the scenes express the same phenomenon. In that case the length of the shorter scene can be added to the length of the next scene of the same video, and the merged scene treated as a single scene. This kind of method is called the Gale-Church algorithm. It is used in computational linguistics to align parallel corpora of texts in different languages.
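The idea of aligning two scene sequences while allowing adjacent scenes to merge can be sketched with a small dynamic program. This is a simplified, Gale-Church-style illustration and not the algorithm as published or as used by the author: the real Gale-Church method uses a probabilistic cost and more alignment types. The log-ratio cost, the `merge_penalty`, and the restriction to 1-1, 2-1, and 1-2 groupings are assumptions. Lengths must be positive.

```python
import math

def align_scenes(a, b, merge_penalty=0.1):
    """Align two scene-length sequences, allowing two adjacent scenes to merge.

    Returns a list of (indices_in_a, indices_in_b) pairs, or None when no
    alignment built from 1-1, 2-1, and 1-2 groupings covers both sequences.
    """
    moves = [(1, 1, 0.0), (2, 1, merge_penalty), (1, 2, merge_penalty)]
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj, penalty in moves:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                # Cost is the absolute log-ratio of the (possibly merged) lengths.
                step = abs(math.log(sum(a[i:ni]) / sum(b[j:nj]))) + penalty
                if cost[i][j] + step < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + step
                    back[ni][nj] = (i, j)
    if cost[n][m] == inf:
        return None
    # Walk the back-pointers from the end to recover the grouping.
    pairs, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return pairs[::-1]
```

With made-up lengths `a = [10, 20, 30]` and `b = [10, 8, 12, 30]`, the second scene of `a` is aligned with the merged second and third scenes of `b`, which models a scene boundary that one codec produced and the other did not.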

If the relative lengths of two scenes match, it does not follow that the scenes are actually the same. The internal properties of the scenes must be compared. The first and last frames can be used.

Frames can be compared pixel by pixel or on the basis of an edge detector. In this case it is convenient to use the global GIST descriptor and a bag of words.

GIST is a simple but not very accurate way of describing an image globally. It is applicable to a wide range of tasks.

A bag of words is usually more accurate, given a sufficiently large vocabulary. It requires a set of images to train on. It works well if the work stays within a specific subject domain. However, it is slower and its memory use is potentially unbounded.

Thus we obtain the scene descriptor. It consists of the vector of ratios of the scene's length to the lengths of other scenes, which is convenient to store together with the merges of adjacent scenes under the Gale-Church hypothesis, and of the characteristics of the first and last frames: a bag of words or GIST, depending on the task.

Let us return to the proposed algorithm. We compared descriptors directly in L2 space, but this is inefficient for large descriptors.

One option is to use digital semantic signatures. We introduce binary signatures. The signatures should be close for descriptors that are close in L2.

The simplest option is locality-sensitive hashing. We split the descriptor space into two half-spaces with a hyperplane and assign the descriptors in each half-space the signature 0 or 1. This gives one bit of the signature. As the number of bits grows, we asymptotically approximate the L2 metric. For tasks of this kind, learned hashes have performed better. These include boosting and the restricted Boltzmann machine. For boosting, a modification of AdaBoost is used. The goal is to obtain signatures such that the distance between descriptors can be computed as a Hamming distance.
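The hyperplane construction can be sketched as follows. This is an illustration under stated assumptions rather than the author's code: the hyperplanes are drawn at random through the origin, and the function names, `n_bits`, and `seed` are hypothetical. Hyperplanes through the origin preserve angles between descriptors, so this sketch tracks L2 distance only for descriptors of comparable norm.

```python
import numpy as np

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplane normals in a dim-dimensional space."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n_bits, dim))

def signature(descriptor, hyperplanes):
    """One bit per hyperplane: 1 if the descriptor lies on its positive side."""
    projections = hyperplanes @ np.asarray(descriptor, dtype=float)
    return (projections > 0).astype(np.uint8)

def hamming(sig_a, sig_b):
    """Number of bit positions in which two signatures differ."""
    return int(np.count_nonzero(sig_a != sig_b))
```

Descriptors are then compared by `hamming(signature(x, planes), signature(y, planes))` using one shared set of hyperplanes, which is cheaper than an L2 comparison of long vectors.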

Restricted Boltzmann machine: a stochastic recurrent neural network. It is a stochastic version of the Hopfield network, or a neural-network version of the hidden Markov model. A modification without connections within layers is used. The layer size decreases from the size of the input vector to the size of the required code.

As a result of this work, a scene-based approach to near-duplicate video retrieval was developed. Its main points are relative lengths, scene alignment based on the linguistic Gale-Church algorithm, and scene descriptors. A number of experiments were conducted, which showed the convenience of the approach.

Next we need to develop our own tool for detecting scene changes (ffmpeg is used at present), to work out the schemes that use the Boltzmann machine more thoroughly, and to conduct experiments on a real set of source data.

Video Sequences Retrieval Algorithm

by Ilya (w-495) Nikitin

2018, Algorithm for Event Retrieval in Video Sequences [Алгоритм поиска событий в видеопоследовательностях]

The paper focuses on event-detection algorithms for content-based video retrieval. Video has a complex structure and can express the same idea in different ways, which complicates video search. Video titles and text descriptions cannot convey all the information about the objects and events in a video. This creates a need for content-based video retrieval. There is a semantic gap between the low-level video features that can be extracted and the user's perception. The task of event detection is reduced to the task of video segmentation. Complex content-based video retrieval can be regarded as a bridge between traditional retrieval and semantic-based video retrieval. The properties of video as a time series are described, and the concept of anomalies in video is introduced. A method for event detection based on comparing moving averages with windows of different sizes is proposed. According to the classification given at the beginning of the article, the method is a statistical one. It differs from other methods in its low computational complexity and simplicity. A video stream processing language is proposed for function-based description of video handling algorithms, so the method is formulated as a declarative description in an interpreted programming language. Most existing video processing methods use an exclusively imperative approach, which often makes them harder to understand. Examples of the use of this language are given, and its grammar is described. Experiments show that the implementation of the proposed event retrieval method, unlike its counterparts, can also work on real-time video streams with potentially infinite frame sequences. This advantage, together with low computational requirements, makes an implementation of the method useful in aviation and space technology. The algorithm's disadvantage is that its parameters must be selected for particular classes of tasks. A theorem on near-duplicate videos is formulated at the end of the article. It asserts that near-duplicate videos express the same sequence of phenomena.
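The moving-average comparison mentioned in this abstract might look roughly like the sketch below. It is written in Python rather than in the paper's own stream-processing language, and it is not the author's method as published. The per-frame scalar feature (for example mean brightness), the window sizes, the `threshold`, and the function names are all assumptions.

```python
import numpy as np

def trailing_mean(x, window):
    """Mean of the last `window` samples at each position (fewer at the start)."""
    x = np.asarray(x, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(x)))
    idx = np.arange(1, len(x) + 1)
    start = np.maximum(idx - window, 0)
    return (csum[idx] - csum[start]) / (idx - start)

def detect_events(series, short=3, long=15, threshold=0.2):
    """Indices where the short and long trailing means differ by more than threshold."""
    gap = np.abs(trailing_mean(series, short) - trailing_mean(series, long))
    return np.nonzero(gap > threshold)[0].tolist()
```

Both means look only at past samples, so the same comparison can run on a live stream. Choosing `short`, `long`, and `threshold` for a given class of tasks is the parameter selection the abstract names as a disadvantage.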

The Methodology of Near-Duplicate Video Retrieval, CMASS-2013

by Ilya (w-495) Nikitin

2013, Methodology for Finding and Identifying Near-Duplicate Video [Методология поиска и идентификации нечетких дубликатов видео]

The Methodology of Near-Duplicate Video Retrieval, NIT-2013

by Ilya (w-495) Nikitin

2013, Methodology for Finding and Identifying Near-Duplicate Video Images [Методология поиска и идентификации нечетких дубликатов видеоизображений]

The paper considers an approach to near-duplicate video retrieval. The search is based on comparing relative scene lengths in L2 space. The comparison takes the Gale-Church hypothesis into account. The notion of a "scene descriptor" is introduced. To speed up...

The Methodology of Finding and Identifying Near-Duplicate Video, TVZ-2012

by Ilya (w-495) Nikitin

2013, Methodology for Finding and Identifying Near-Duplicate Video Images [Методология поиска и идентификации нечетких дубликатов видеоизображений]

Two video files or streams are given. We need to determine whether they are duplicates of each other. Here the word "duplicate" refers to a condition that cannot be formalized: "Do these files show the same thing?" Another formulation of this problem is also possible....
