The classification and separation of speech and music signals have attracted the attention of many researchers. The classification process is needed to build two different libraries, a speech library and a music library, from a stream of sounds; the separation process, by contrast, is needed in the cocktail-party problem to separate speech from music and remove the undesired signal. In this paper, a review of the existing classification and separation algorithms is presented and discussed. The classification algorithms are divided into three categories: time-domain, frequency-domain, and time-frequency-domain approaches. The time-domain approaches used in the literature are the zero-crossing rate (ZCR), the short-time energy (STE), the ZCR and the STE with positive derivative (along with some of their modified versions), the variance of the roll-off, and neural networks. The frequency-domain approaches are mainly based on the spectral centroid, the variance of the spectral centroid, the spectral flux, the variance of the spectral flux, the roll-off of the spectrum, the cepstral residual, and the delta pitch. The time-frequency-domain approaches have not yet been tested thoroughly in the literature, so the spectrogram and the evolutionary spectrum are introduced. Some new algorithms dealing with music and speech separation and segregation are also presented.
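As a concrete illustration of the time- and frequency-domain features listed above, here is a minimal Python/NumPy sketch (the language, frame length, hop size, and synthetic input are assumptions for illustration, not choices made in the paper) of the ZCR, STE, and spectral centroid computed frame by frame:

```python
import numpy as np

def frame_signal(x, frame_len=1024, hop=512):
    """Split a 1-D signal into overlapping frames."""
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

def zero_crossing_rate(frames):
    """Time domain: fraction of adjacent-sample sign changes per frame."""
    return np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)

def short_time_energy(frames):
    """Time domain: mean squared amplitude per frame."""
    return np.mean(frames ** 2, axis=1)

def spectral_centroid(frames, sr):
    """Frequency domain: magnitude-weighted mean frequency per frame."""
    mags = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sr)
    return (mags @ freqs) / (mags.sum(axis=1) + 1e-12)

# Speech alternates between voiced and unvoiced segments, so its ZCR and
# STE tracks fluctuate more from frame to frame than music's; the variance
# of each feature track is therefore a simple discriminating statistic.
sr = 16000
x = np.random.randn(sr)  # stand-in for one second of real audio
frames = frame_signal(x)
print(np.var(zero_crossing_rate(frames)),
      np.var(short_time_energy(frames)),
      np.var(spectral_centroid(frames, sr)))
```

A classifier of the kind surveyed would threshold such feature-track statistics (or feed them to a neural network) to label each segment as speech or music.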
This paper is an exploration of how we do things with music—that is, the way that we use music as an “esthetic technology” to enact micro-practices of emotion regulation, communicative expression, identity construction, and interpersonal coordination that drive core aspects of our emotional and social existence. The main thesis is: from birth, music is directly perceived as an affordance-laden structure. Music, I argue, affords a sonic world, an exploratory space or “nested acoustic environment” that further affords possibilities for, among other things, (1) emotion regulation and (2) social coordination. When we do things with music, we are engaged in the work of creating and cultivating the self, as well as creating and cultivating a shared world that we inhabit with others. I develop this thesis by first introducing the notion of a “musical affordance”. Next, I look at how “emotional affordances” in music are exploited to construct and regulate emotions. I summon empirical research on neonate music therapy to argue that this is something we emerge from the womb knowing how to do. I then look at “social affordances” in music, arguing that joint attention to social affordances in music alters how music is both perceived and appropriated by joint attenders within social listening contexts. In support, I describe the experience of listening to and engaging with music in a live concert setting. Thinking of music as an affordance-laden structure thus reaffirms the crucial role that music plays in constructing and regulating emotional and social experiences in everyday life.
We report five experiments in which listeners heard the beginnings of classical minuets (or similar dances). The phrase in either measures 1-2 or measures 3-4 was selected as a target, tested at the end of the excerpt. A "beep" indicated the test item, which was a continuation of the minuet as written. Test items were targets (repetitions of the selected phrase), similar lures (imitations of targets), or different lures, and occurred after delays of 4-5, 15, or 30 s. We estimated the proportion of correct discriminations of targets from similar lures and targets from different lures. In Experiment 1, discrimination of targets from similar lures (but not of targets from different lures) improved between 5 and 15 s. Experiment 2 extended this result to a delay of 30 s. Discrimination of targets from similar lures improved over time, especially for second-phrase targets. This improvement was due mainly to decreasing false alarms to similar lures. Experiments 3 and 4 replaced the continuous music with silence and with a repetitive "oom-pah-pah" pattern, and the improvement in discrimination of targets from similar lures disappeared. Experiment 5 removed listeners' expectations of being tested, and the improvement also disappeared. Results are considered in the framework of current theories of memory, and their implications for the listener's experience of hearing music are discussed.
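For illustration, a discrimination proportion of this kind can be estimated from hit and false-alarm rates; the short sketch below (Python; the averaging estimator and the example rates are assumptions, since the abstract does not specify the exact estimator) shows how falling false alarms alone can raise discrimination even when hits are flat:

```python
def discrimination(hit_rate, false_alarm_rate):
    """Estimated proportion of correct target/lure discriminations:
    the average of hits (calling targets "old") and correct rejections
    (not calling lures "old"); 0.5 is chance, 1.0 is perfect."""
    return 0.5 * (hit_rate + (1.0 - false_alarm_rate))

# Illustrative rates only, not the paper's data: the hit rate is held
# constant while false alarms to similar lures decrease with delay.
print(discrimination(0.75, 0.40))  # shorter delay -> 0.675
print(discrimination(0.75, 0.20))  # longer delay  -> 0.775
```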
We address how listeners perceive temporal regularity in music performances, which are rich in temporal irregularities. A computational model is described in which a small system of internal self-sustained oscillations, operating at different periods with specific phase and period relations, entrains to the rhythms of music performances. Based on temporal expectancies embodied by the oscillations, the model predicts the categorization of temporally changing event intervals into discrete metrical categories, as well as the perceptual salience of deviations from these categories. The model's predictions are tested in two experiments using piano performances of the same music with different phrase structure interpretations (Experiment 1) or different melodic interpretations (Experiment 2). The model successfully tracked temporal regularity amidst the temporal fluctuations found in the performances. The model's sensitivity to performed deviations from its temporal expectations compared favorably with the performers' structural (phrasal and melodic) intentions. Furthermore, the model tracked normal performances (with increased temporal variability) better than performances in which temporal fluctuations associated with individual voices were removed (with decreased variability). The small, systematic temporal irregularities characteristic of human performances (chord asynchronies) improved tracking, but randomly generated temporal irregularities did not. These findings suggest that perception of temporal regularity in complex musical sequences is based on temporal expectancies that adapt in response to temporally fluctuating input.
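To make the entrainment idea concrete, here is a minimal sketch of a single adaptive oscillator with linear phase and period correction (Python; the gain values, function names, and synthetic onset sequence are assumptions for illustration, and the authors' actual model uses a small system of coupled oscillators rather than one):

```python
import numpy as np

def track_beats(onsets, period=0.5, phase_gain=0.25, period_gain=0.1):
    """Track event onsets with one self-sustained adaptive oscillator.

    The oscillator expects each event one period after the previous
    (phase-corrected) beat; every observed onset nudges its phase and
    period toward the event, so temporal expectancies adapt to the
    performance. Returns predicted beat times and signed expectancy
    violations (how early or late each event was).
    """
    expected = onsets[0] + period
    predictions, deviations = [], []
    for onset in onsets[1:]:
        error = onset - expected                 # + means later than expected
        predictions.append(expected)
        deviations.append(error)
        period += period_gain * error            # adapt period (tempo)
        expected += phase_gain * error + period  # phase-corrected next beat
    return np.array(predictions), np.array(deviations)

# Quarter-note onsets at ~120 beats/min with a gradual slowing and small
# random timing jitter, standing in for an expressive performance.
rng = np.random.default_rng(0)
onsets = np.cumsum(0.5 + np.linspace(0.0, 0.05, 16)
                   + 0.01 * rng.standard_normal(16))
pred, dev = track_beats(onsets)
print(np.round(dev, 3))
```

Events that fall far from the adapted expectancy produce large signed errors, which is the kind of quantity the model treats as a perceptually salient deviation from a metrical category.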
In this paper I discuss four computational distinctions at the heart of natural computation, and thus relevant to the central and most interesting question of cognitive science: "how the brain computes the mind". I assume that we can think of cognition as a form of computation, implemented by the tissues of the nervous system, and that the unification of high-level computational theories of cognitive function with detailed, local-level understanding of synapses and neurons is the core goal of cognitive (neuro)science. Thus I am concerned here with how the brain computes the mind, following Alan Turing's seminal gambit, and much of subsequent cognitive science, in thinking that intelligence is a kind of computation performed by the brain. By thus asserting that the brain is a kind of computer, I must immediately clarify that the natural computations performed by the brain differ dramatically from those implemented by modern digital computers. Computation (the acquisition, processing and transformation of information) is a more general process than the serial, binary computation performed by common digital computers. From this viewpoint, the assertion that the brain is a kind of computer is a mild one. It amounts to nothing but the everyday assumption that the brain is an organ responsible for acquiring, remembering, processing and evaluating sensory stimuli, and using the knowledge thus acquired to plan and generate appropriate action.
As the Gestalt psychologists knew, and as William James and Christian Ehrenfels already saw in the 1890s, a melody is a prime example of an integrated whole in perception and memory. Changing any aspect of a melody--its pitches, rhythm, timbre, tempo, harmony, even its articulation--has an impact on how the other aspects are perceived and remembered. This does not mean that we can't analyze a melody in terms of its features. It just means we need to be cautious in drawing conclusions about the effects of those features, and be aware of the interactions operating among our variables. With that in mind we can arrive at a very good model of the relative contributions of the various analytic features of melodies.
Neuropsychological studies have suggested that imagery processes may be mediated by neuronal mechanisms similar to those used in perception. To test this hypothesis, and to explore the neural basis for song imagery, 12 normal subjects were scanned using the water bolus method to measure cerebral blood flow (CBF) during the performance of three tasks. In the control condition, subjects saw pairs of words on each trial and judged which word was longer. In the perceptual condition, subjects also viewed pairs of words, this time drawn from a familiar song; simultaneously they heard the corresponding song, and their task was to judge the change in pitch of the two cued words within the song. In the imagery condition, subjects performed precisely the same judgment as in the perceptual condition, but with no auditory input. Thus, to perform the imagery task correctly, an internal auditory representation must be accessed. Paired-image subtraction of the resulting pattern of CBF was carried out, together with matched MRI for anatomical localization.

[Figure: Summary of paradigm, showing stimuli presented and responses elicited during each of the three experimental conditions. All three tasks involved similar visual input, but only the perceptual task involved true auditory input; both perceptual and imagery tasks required the identical judgment of pitch change as cued by the visual words.]

Note 1. Results reported in this paper are for all 12 participants. A separate analysis of the PET activation data was also undertaken for the nine subjects who performed best on the imagery task (mean performance of this subgroup was 82% correct), in an attempt to remove any noise that might be contributed by those whose performance was poor and who therefore might not have been performing the task as intended. This analysis revealed a pattern of results virtually identical to that of the group as a whole, and therefore will not be discussed further.