Academia.edu

Source Coding

2,370 papers
817 followers
About this topic
Source coding is a process in information theory that converts data from its original form into a more compact representation. It works by reducing redundancy in the data, enabling efficient storage and transmission while preserving the information content either exactly (lossless coding) or up to a prescribed distortion (lossy coding).
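To make redundancy reduction concrete, here is a minimal Python sketch of Huffman coding, one classical lossless source code chosen purely for illustration (the page itself does not prescribe a method, and the helper names are hypothetical). It builds a prefix code from the empirical symbol frequencies of a short string and compares the average codeword length with the empirical entropy, which lower-bounds it.

```python
# Illustrative sketch: helper names are hypothetical, not from any cited work.
import heapq
import math
from collections import Counter

def huffman_code(text):
    """Return a binary Huffman code {symbol: bit string} for the symbols in text."""
    freq = Counter(text)
    if len(freq) == 1:  # a single-symbol source still needs one bit per symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: partial codeword}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)  # the two least frequent subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

def empirical_entropy(text):
    """Entropy (bits/symbol) of the empirical symbol distribution of text."""
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in Counter(text).values())

text = "abracadabra"
code = huffman_code(text)
encoded = "".join(code[ch] for ch in text)
print(len(encoded), len(encoded) / len(text), empirical_entropy(text))
```

For "abracadabra" the code spends 23 bits (about 2.09 bits/symbol) against an empirical entropy of roughly 2.04 bits/symbol; a Huffman code always lands within one bit per symbol of the entropy.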

Key research themes

1. How can source coding leverage structural and variability properties of code and source data to improve compression and retrieval?

This research theme investigates how the intrinsic structural properties of source code and the statistical variability of source data can be exploited to optimize source coding performance, including compression efficiency and information retrieval. It encompasses variable-to-fixed length codes that improve on classical constructions, the use of the variance of the information content (entropy variance) alongside the entropy itself, and enriched metadata for source code reuse that captures both functional and quality attributes. Understanding these properties yields coding schemes that balance average and worst-case coding rates, and enables better search and reuse in large code repositories.
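The second-order quantity mentioned here is the variance of the information content -log2 p(X), often called varentropy. As a minimal Python sketch (function names are hypothetical, and the normal-approximation interval is a textbook assumption, not the interval-based method of the cited work), both moments can be computed from a probability mass function:

```python
# Illustrative sketch: hypothetical helpers; the interval uses a normal approximation.
import math

def entropy_and_varentropy(pmf):
    """Mean H (bits) and variance V (bits^2) of the information content -log2 p(X)."""
    probs = [p for p in pmf if p > 0]
    info = [-math.log2(p) for p in probs]
    h = sum(p * i for p, i in zip(probs, info))
    v = sum(p * (i - h) ** 2 for p, i in zip(probs, info))
    return h, v

def information_interval(pmf, n, z=1.96):
    """Approximate interval for the information content of an i.i.d. block of n
    symbols, n*H +/- z*sqrt(n*V). An illustrative assumption, not a coding bound."""
    h, v = entropy_and_varentropy(pmf)
    half_width = z * math.sqrt(n * v)
    return n * h - half_width, n * h + half_width

print(entropy_and_varentropy([0.5, 0.25, 0.25]))  # (1.5, 0.25)
print(entropy_and_varentropy([0.25] * 4))         # (2.0, 0.0)
print(information_interval([0.5, 0.25, 0.25], 100))
```

Two sources can share the same entropy yet differ in varentropy; the larger V is, the more the per-block code length fluctuates around n*H, which is why worst-case and average coding rates can diverge.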

Key finding: This paper introduces a novel variable-to-fixed length (VF) code that optimizes the average pointwise coding rate (the expectation of the pointwise ratio of codeword length to phrase length), proving...
Key finding: This study characterizes the average coding rate of multi-shot Tunstall codes constructed via multiple parsing trees, revealing that when the geometric mean of the leaf counts tends to infinity, both the classical...
Key finding: This work advances source coding analysis by incorporating the variance (second central moment) of the information content per source symbol along with Shannon entropy, developing an interval-based approach rather than...
Key finding: This paper presents a comprehensive dataset of code snippets enriched with static analysis metrics, code violations, readability scores, and source code similarity indexes, based on the CodeSearchNet corpus. By accounting for...
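For readers unfamiliar with variable-to-fixed length codes, the following Python sketch builds a classical single-tree Tunstall code, the baseline these findings build on rather than the optimized or multi-shot constructions themselves. It repeatedly splits the most probable parse phrase until no further split fits within 2**k leaves, then gives each leaf a fixed k-bit codeword. The names `tunstall` and `vf_encode` are hypothetical, and symbols are assumed to be single characters.

```python
# Illustrative sketch of classical Tunstall coding; helper names are hypothetical.
import heapq

def tunstall(pmf, k):
    """Map parse phrases (tree leaves) to fixed k-bit codewords.

    pmf maps single-character symbols to probabilities (at least two symbols).
    """
    m = len(pmf)
    if m < 2 or 2 ** k < m:
        raise ValueError("need at least two symbols and 2**k >= alphabet size")
    # Max-heap of leaves keyed by probability (negated, since heapq is a min-heap).
    heap = [(-p, s) for s, p in pmf.items()]
    heapq.heapify(heap)
    leaves = m
    # Each expansion replaces one leaf by m children, adding m - 1 leaves.
    while leaves + (m - 1) <= 2 ** k:
        neg_p, phrase = heapq.heappop(heap)  # most probable leaf
        for s, p in pmf.items():
            heapq.heappush(heap, (neg_p * p, phrase + s))
        leaves += m - 1
    phrases = sorted(phrase for _, phrase in heap)
    return {ph: format(i, f"0{k}b") for i, ph in enumerate(phrases)}

def vf_encode(text, code):
    """Greedy parse; the leaves form a complete prefix-free set of phrases."""
    out, phrase = [], ""
    for ch in text:
        phrase += ch
        if phrase in code:
            out.append(code[phrase])
            phrase = ""
    return "".join(out), phrase  # leftover is an incomplete final phrase

code = tunstall({"a": 0.7, "b": 0.3}, k=2)
print(code)                       # {'aaa': '00', 'aab': '01', 'ab': '10', 'b': '11'}
print(vf_encode("aaabab", code))  # ('001110', '')
```

Here the expected phrase length is 0.3·1 + 0.21·2 + 0.49·3 = 2.19 symbols, so the code spends about 2/2.19 ≈ 0.91 bits per source symbol, against an entropy of about 0.88 bits.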

2. What are effective approaches to source code visualization and social practices influencing programming comprehension and education?

This theme explores the visualization of source code and the social dimensions of programming as critical factors that affect both programmer understanding and educational outcomes. It covers empirical and theoretical research on how code can be perceived, shared, and collectively produced within social contexts, the role of secondary syntax like spatial layout and color in code comprehension, and novel visualization techniques to support audiences with varying programming expertise. These studies illuminate the cognitive and social facets of source code work and offer insights for designing tools and pedagogies that align with human perceptual and social processing.

Key finding: The paper identifies that live coding performances, in which the code is typically projected for the audience, present comprehension challenges for audiences unfamiliar with programming languages. It highlights how secondary syntax elements such as color and...
Key finding: Through empirical classroom observations, this study demonstrates that programming is inherently a social practice governed by unwritten 'rules of the game' that go beyond formal specification-to-code transformations. ...
Key finding: Applying Self-Organizing Maps (SOMs) to analyze 3,882 Python modules from 25 educational repositories, the research reveals patterns in code structure and concept distribution that reflect curriculum density and learning...

3. How are coding skills promoted and utilized across educational and societal contexts, and what challenges affect ethical coding practices?

This theme synthesizes research on the educational strategies and societal importance of coding skills acquisition, spanning early childhood to adult learners, formal and informal settings, and the implications for workforce readiness and social participation. It also addresses ethical concerns such as source code plagiarism, debating its causes and mitigation in educational contexts. The integration of coding with creativity, critical thinking, and democratization is explored along with the challenges of code reuse in open repositories and the impact of emerging technologies like AI in media professions.

Key finding: This narrative literature review categorizes coding skill development approaches into formal education (curricular), non-formal settings (online clubs), and informal events (hackathons). It emphasizes that coding is a...
Key finding: Addressing the rise of source code plagiarism in educational environments, this paper analyzes underlying causes including students' lack of academic integrity understanding, systemic pressures, and easy access to online code...
Key finding: This chapter articulates a conceptual framework positioning coding as a medium for creativity and critical thinking, contrasting dominant functionalist paradigms with emancipatory and interpretive approaches. It advances...

All papers in Source Coding

We deal with zero-delay source coding of a vector-valued Gauss-Markov source subject to a mean-squared error (MSE) fidelity criterion characterized by the operational zero-delay vector-valued Gaussian rate distortion function (RDF). We...
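For reference, the classical Gaussian rate distortion function, which places no delay constraint on the encoder, can be computed by reverse water-filling. The Python sketch below does this for independent Gaussian components with given positive variances (for a correlated vector one would use the eigenvalues of its covariance matrix). It is only this classical benchmark, with a hypothetical function name; it models neither zero delay nor the Markov dynamics studied in the paper.

```python
# Sketch of the classical (delay-unconstrained) Gaussian RDF; not the zero-delay RDF.
import math

def reverse_water_filling(variances, distortion, iters=200):
    """Rate (bits) and per-component distortions for independent Gaussian
    components with positive variances under a total MSE budget `distortion` > 0."""
    if distortion >= sum(variances):
        return 0.0, list(variances)  # zero rate: reproduce every component by its mean
    lo, hi = 0.0, max(variances)
    for _ in range(iters):  # bisection on the water level theta
        theta = (lo + hi) / 2
        if sum(min(theta, v) for v in variances) > distortion:
            hi = theta
        else:
            lo = theta
    theta = (lo + hi) / 2
    per_component = [min(theta, v) for v in variances]
    rate = sum(0.5 * math.log2(v / d) for v, d in zip(variances, per_component))
    return rate, per_component

rate, d = reverse_water_filling([4.0, 1.0], distortion=1.0)
print(round(rate, 6), [round(x, 6) for x in d])  # 2.0 [0.5, 0.5]
```

Components whose variance falls below the water level receive zero rate and are reproduced by their mean; the others are all distorted to the same level theta.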
Code completion helps improve developers' programming productivity. However, the current support for code completion is limited to context-free code templates or a single method call of the variable in focus. Using software libraries for...
The exact expression for the bit error rate (BER) of rectangular quadrature amplitude modulation (QAM) is given. The presented closed-form formula is independent of the bit mapping in use. It is thus particularly useful in the analysis of...
When changing a source code entity (e.g., a function), developers must ensure that the change is propagated to related entities to avoid the introduction of bugs. Accurate change propagation is essential for the successful evolution of...
In this paper, in order to compress and enhance 2D images transmitted over wireless channels, a new scheme called Kalman-Turbo (KT) is introduced. In this scheme, the original image is partitioned into $2^N$ quantization levels and each of...
We present UZE-CC, a universal fixed-to-variable length coding scheme achieving zero-error communication over any discrete memoryless channel with positive zero-error capacity, under minimal one-bit-per-round feedback. The scheme is...
We present USW-X, a universal Slepian–Wolf coding framework that achieves sequence-individual performance guarantees without assuming any probabilistic source model. USW-X combines rateless 2-universal hashing with a Minimum Description...
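USW-X itself assumes no probabilistic source model, but the classical Slepian–Wolf region it targets is easy to state when the joint distribution is known: the achievable rate pairs satisfy R1 >= H(X|Y), R2 >= H(Y|X), and R1 + R2 >= H(X,Y). A minimal Python sketch with hypothetical helper names, using a doubly symmetric binary source as an assumed example:

```python
# Illustrative sketch of the classical Slepian-Wolf region; names are hypothetical.
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def slepian_wolf_bounds(joint):
    """Return (H(X|Y), H(Y|X), H(X,Y)) in bits for a joint pmf {(x, y): p}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    hxy = entropy(joint.values())
    return hxy - entropy(py.values()), hxy - entropy(px.values()), hxy

def in_region(r1, r2, joint):
    hx_y, hy_x, hxy = slepian_wolf_bounds(joint)
    return r1 >= hx_y and r2 >= hy_x and r1 + r2 >= hxy

# Doubly symmetric binary source: Y equals X flipped with probability 0.1.
dsbs = {(0, 0): 0.45, (1, 1): 0.45, (0, 1): 0.05, (1, 0): 0.05}
print(slepian_wolf_bounds(dsbs))   # approx. (0.469, 0.469, 1.469)
print(in_region(1.0, 0.47, dsbs))  # True: X at H(X), Y just above H(Y|X)
print(in_region(0.75, 0.7, dsbs))  # False: sum rate below H(X,Y)
```

The point of separate encoding is visible in the corner point: Y can be sent at about 0.47 bits/symbol instead of 1, even though its encoder never sees X.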
We present EUC-X, a universal, bit-agnostic lossless compressor that operates with a constant per-step shock bound independent of sequence length. The method aggregates Krichevsky-Trofimov (KT) predictors across contexts 0..K via...
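The Krichevsky-Trofimov predictor named here assigns probability (count + 1/2)/(n + 1) to each binary symbol, where n is the number of symbols seen so far in the current context. The following Python sketch computes the resulting ideal code length for a single KT model with an optional fixed context order; it is a hypothetical helper and not the EUC-X aggregation across contexts 0..K or its shock bound.

```python
# Illustrative sketch of one KT model; not the EUC-X mixture across context orders.
import math

def kt_codelength(bits, order=0):
    """Ideal code length in bits of a 0/1 sequence under sequential KT estimation,
    with separate counts for each context of up to `order` previous bits."""
    counts = {}  # context tuple -> [zeros, ones]
    total = 0.0
    for i, b in enumerate(bits):
        ctx = tuple(bits[max(0, i - order):i])  # shorter contexts at the start
        c = counts.setdefault(ctx, [0, 0])
        p = (c[b] + 0.5) / (c[0] + c[1] + 1)  # KT estimate of the observed bit
        total -= math.log2(p)
        c[b] += 1
    return total

print(kt_codelength([0, 0, 1]))  # ~4 bits: probability 1/2 * 3/4 * 1/6 = 1/16
alternating = [0, 1] * 50
print(kt_codelength(alternating, order=0), kt_codelength(alternating, order=1))
```

On the alternating sequence the order-0 model pays roughly one bit per symbol, while the order-1 model learns that each bit determines the next and pays only a few bits in total, which is the motivation for combining several context orders.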
Bit error rate (BER) performance of convolutional coded quaternary DPSK (QDPSK) with Viterbi decoding is theoretically investigated in Rayleigh fading environments. The probability density functions of the path and branch metric values of...
In the context of multimedia wireless transmission, a link adaptation strategy is proposed, assuming that the source decoder may accept some remaining errors at the output of the channel decoder. Based on a mean bit error rate for...
We investigate distributed source coding of two correlated sources X and Y where messages are passed to a decoder in a cascade fashion. The encoder of X sends a message at rate R1 to the encoder of Y. The encoder of Y then sends a...
The paper considers the source description problem with average distortion and per-symbol reproduction cost constraints. The source description cost-distortion function is then defined as the minimum of a weighted sum of the rate and the...
This paper deals with a universal coding problem for a certain kind of multiterminal source coding network called a generalized complementary delivery network. In this network, messages from multiple correlated sources are jointly...
Transmission of multimedia, especially of image/video information, requires high reliability and high digital transmission speed. However, in mobile communication systems, since transmission reliability is degraded by...
In avionics and other critical systems domains, adequacy of test suites is currently measured using the MC/DC metric on source code (or on a model in model-based development). We believe that the rigor of the MC/DC metric is highly...
The fixed-slope lossy algorithm derived from the $k$th-order adaptive arithmetic codeword length function is extended to the case of finite-state decoders or trellis-structured decoders. It is shown that when this algorithm is used to encode...
This paper addresses the design of joint source-channel variable-length codes with maximal free distance for given codeword lengths. While previous design methods are mainly based on bounds on the free distance of the code, the proposed...
Local File Inclusion (LFI) vulnerabilities continue to pose significant security risks to web applications despite being well-documented for over a decade. This paper presents a comprehensive analysis of LFI vulnerabilities, examining...
We introduce an open-ended test grounded in algorithmic probability that can avoid benchmark contamination in the quantitative evaluation of frontier models in the context of their Artificial General Intelligence (AGI) and...
We present an agnostic signal reconstruction method for zero-knowledge one-way communication channels in which a receiver aims to interpret a message sent by an unknown source about which no prior knowledge is available and to which no...
Conventional cryptography deals with the encryption and decryption of traditional textual data. The advent of networked multimedia systems will make continuous media streams, such as real-time audio and video, increasingly pervasive in...
We study the joint source-channel coding problem of transmitting an analog source over a Gaussian channel in two cases: (i) in the presence of interference known only to the transmitter and (ii) in the presence of side information known only...
Generative Artificial Intelligence (Gen AI) has revolutionized education by enabling personalized learning in computer programming, improving engagement and outcomes. Despite its potential, challenges like accuracy, coherence, and...
An antidictionary code is a lossless compression algorithm using an antidictionary, which is a set of minimal words that do not occur as substrings in an input string. The code was proposed by Crochemore et al. in 2000, and its asymptotic...
In this paper, we analyze several adaptive and static data models of arithmetic compression. These models are represented using Upgraded Petri nets, our original class of Petri nets. After the iterative processes of modeling,...
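For readers new to arithmetic compression, the following Python sketch encodes with a static model using exact rational arithmetic: each symbol narrows an interval in proportion to its probability, and the output is the shortest binary fraction inside the final interval. It is illustrative only (no finite-precision renormalization, no adaptive model, no Petri-net representation), the helper names are hypothetical, and the decoder is assumed to know the message length.

```python
# Illustrative exact-arithmetic sketch with a static model; names are hypothetical.
import math
from fractions import Fraction

def symbol_ranges(model):
    """Map each symbol to its [low, high) slice of [0, 1), in sorted symbol order."""
    ranges, c = {}, Fraction(0)
    for s in sorted(model):
        ranges[s] = (c, c + model[s])
        c += model[s]
    return ranges

def arith_encode(msg, model):
    ranges = symbol_ranges(model)
    low, width = Fraction(0), Fraction(1)
    for s in msg:
        a, b = ranges[s]
        low, width = low + width * a, width * (b - a)
    k = 0
    while True:  # shortest k such that some multiple of 2**-k lies in [low, low + width)
        k += 1
        num = math.ceil(low * 2 ** k)
        if Fraction(num, 2 ** k) < low + width:
            return format(num, f"0{k}b")

def arith_decode(bits, model, n):
    ranges = symbol_ranges(model)
    value = Fraction(int(bits, 2), 2 ** len(bits))
    low, width, out = Fraction(0), Fraction(1), []
    for _ in range(n):
        t = (value - low) / width  # position of value inside the current interval
        s = next(sym for sym, (a, b) in ranges.items() if a <= t < b)
        a, b = ranges[s]
        low, width = low + width * a, width * (b - a)
        out.append(s)
    return "".join(out)

model = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
code = arith_encode("abac", model)
print(code, arith_decode(code, model, 4))  # 010011 abac
```

The final interval for "abac" has width 1/2 · 1/4 · 1/2 · 1/4 = 1/64, so six bits suffice, matching the ideal code length -log2(1/64).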
Reliable image and video communication over noisy channels has been a great challenge, especially for the transmission of large volumes of data over unreliable, bandwidth-limited channels. One technique to deal with this problem...
Quantification of neuronal correlations in a neuron population helps us understand neural coding rules. Such quantification could also reveal how neurons encode information in normal and disease conditions such as Alzheimer's and...
This work provides an algebraic framework for source coding with decoder side information and its dual problem, channel coding with encoder side information, showing that nested concatenated codes can achieve the corresponding...
In many scenarios, side information naturally exists in point-to-point communications. Although side information can be present at the encoder and/or decoder, yielding several cases, the most important case, which deserves particular...
Recent advances in multimedia technology have opened the path for individual manipulation of the different audio objects within a multichannel mix, for both sampling and karaoke applications. This requires the transmission of these objects as...
In this paper we derive limit theorems for the conditional distribution of $X_1$ given $S_n = s_n$ as $n \to \infty$, where the $X_i$ are independent and identically distributed (i.i.d.) random variables, $S_n = X_1 + \cdots + X_n$, and $s_n/n$ converges or $s_n \equiv s$ is constant. We obtain...
Multiterminal source coding refers to separate encoding and joint decoding of multiple correlated sources. Joint decoding requires all the messages to be decoded simultaneously, which is exponentially more complex than a sequence of...
Polar codes, introduced recently by Arıkan, are the first family of codes known to achieve the capacity of symmetric channels using a low-complexity successive cancellation decoder. Although these codes, combined with successive cancellation,...
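As a minimal Python sketch of the polar construction (channel-coding view over a binary erasure channel; function names are hypothetical), the polar transform multiplies the input by the n-fold Kronecker power of F = [[1, 0], [1, 1]] over GF(2). For an erasure channel BEC(eps), the Bhattacharyya parameters of the synthetic bit channels follow the recursion z -> (2z - z^2, z^2); the k most reliable indices carry data and the rest are frozen to zero. This encoder omits Arıkan's bit-reversal permutation, which only reorders the output coordinates and so leaves the synthetic bit channels unchanged.

```python
# Illustrative sketch of polar encoding and BEC code construction; names are hypothetical.
def polar_encode(u):
    """x = u * F^{(x)n} over GF(2), with F = [[1, 0], [1, 1]] and len(u) = 2**n."""
    if len(u) == 1:
        return list(u)
    half = len(u) // 2
    mixed = [a ^ b for a, b in zip(u[:half], u[half:])]
    return polar_encode(mixed) + polar_encode(u[half:])

def bec_bhattacharyya(n, eps):
    """Erasure probabilities (= Bhattacharyya parameters) of the 2**n bit channels
    seen by a successive-cancellation decoder on a BEC(eps)."""
    z = [eps]
    for _ in range(n):
        z = [w for x in z for w in (2 * x - x * x, x * x)]  # (worse, better) pairs
    return z

def information_set(n, eps, k):
    """Indices of the k most reliable bit channels, in increasing order."""
    z = bec_bhattacharyya(n, eps)
    return sorted(sorted(range(len(z)), key=lambda i: z[i])[:k])

info = information_set(3, 0.5, 4)
print(info)  # [3, 5, 6, 7]
u = [0] * 8
for i, bit in zip(info, [1, 0, 1, 1]):
    u[i] = bit  # data bits on reliable positions, frozen zeros elsewhere
print(polar_encode(u))  # [1, 0, 1, 0, 0, 1, 0, 1]
```

Polarization is visible in the parameters: for BEC(0.5) with eight channels, the values spread from about 0.996 down to about 0.004, while their complements still sum to 8 · (1 - 0.5), the total capacity.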
Nowadays, multiple receiver voting systems use several receivers so that at least one receiver always receives a high-quality signal from portable and mobile units operating anywhere within the desired coverage area.
Among the most dangerous cybersecurity threats are control hijacking attacks, which hijack the control of a victim application and execute arbitrary system calls while assuming the identity of the victim program's effective user. System call...
A web application is a program executed on a web server; it should be written with security awareness to prevent malicious data from reaching sensitive sinks, which can cause various types of vulnerabilities and affect the behaviour of web...
Web applications play a very important role in many fields and have become an integral part of the daily lives of millions of users, offering business and convenience services. Most web applications are increasing their adoption of database...
The multiterminal source coding problem has gained prominence recently due to its relevance to sensor networks. In general, rate-distortion in multiterminal source coding is an open problem. However, variations of multiterminal source...
This paper considers the optimization of a class of joint source-channel codes described by finite-state encoders (FSEs) generating variable-length codes. It focuses on FSEs associated with joint source-channel integer arithmetic codes,...
Notations ... absences, and who always found a word of encouragement. If there is one thing I can affirm, it is that I have made wonderful friendships everywhere I have gone; I have been very lucky. And finally, but no less importantly, to the...
In this paper we propose a Multiple Description Image Coding (MDIC) scheme that generates two compressed descriptions with balanced rates in the wavelet domain (Daubechies biorthogonal (9, 7) wavelet) using pairwise correlating transform...
During software development, design rules and contracts in the source code are often encoded through regularities, such as API usage protocols, coding idioms and naming conventions. The structural regularities that govern a program can...
In this paper the relation between the nonanticipative rate distortion function (RDF) and Bayesian filtering theory is further investigated on general Polish spaces. The relation is established via an optimization on the space of conditional...
In this paper we analyze the probabilistic matching of sources with memory to channels with memory so that symbol-by-symbol codes with memory and without anticipation are optimal with respect to an average distortion and excess distortion...
Eight new properties of absolute moment block truncation coding (AMBTC) are presented with proofs. The main purposes of this paper are twofold: (1) to provide fundamental insights into the AMBTC algorithm and (2) to show that AMBTC is the...
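For concreteness, here is a minimal Python sketch of AMBTC as it is commonly described: each block is reduced to a one-bit-per-pixel bitmap (pixel at or above the block mean) plus two reconstruction levels, the means of the pixels above and below the block mean. Levels are kept as floats here, whereas practical coders typically round them to integers; the function names are hypothetical.

```python
# Illustrative sketch of AMBTC on a single block; helper names are hypothetical.
def ambtc_encode(block):
    """Encode one block (a list of rows) as (low, high, bitmap)."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    bitmap = [[1 if p >= mean else 0 for p in row] for row in block]
    upper = [p for p in pixels if p >= mean]  # never empty: the maximum is >= the mean
    lower = [p for p in pixels if p < mean]
    high = sum(upper) / len(upper)
    low = sum(lower) / len(lower) if lower else high  # flat block
    return low, high, bitmap

def ambtc_decode(low, high, bitmap):
    """Replace each bitmap entry by the corresponding reconstruction level."""
    return [[high if b else low for b in row] for row in bitmap]

low, high, bitmap = ambtc_encode([[1, 2], [3, 4]])
print(low, high, bitmap)                # 1.5 3.5 [[0, 0], [1, 1]]
print(ambtc_decode(low, high, bitmap))  # [[1.5, 1.5], [3.5, 3.5]]
```

Because each level is the mean of the pixels it replaces, the reconstruction preserves the block mean.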