And now for something completely different

Vincent Van Heuven

2020

Abstract

I will discuss a recent court case in the Netherlands in which forensic phonetic expertise was called upon to help settle a dispute over trade name infringement. In 2014, Dutch brewer Grolsch launched a beer called Kornuit /kɔrˈnœyt/. Recently, supermarket chain Lidl released a beer under the name Kordaat /kɔrˈdaːt/. I was asked by Grolsch to shed light on the phonetic similarity between the brand names. Using the Levenshtein distance metric (Levenshtein, 1966; Heeringa, 2004), the phonetic difference between the names is 29 percent. To show that the similarity between the brand names was very likely to be intentional rather than accidental (as Lidl would have it), I established the statistical distribution of the similarity of Dutch word pairs. I selected the 3000 most frequent monomorphemic content words from CELEX (Baayen, Piepenbrock & Gulikers, 1995) and computed the Levenshtein distance for all 4,498,500 non-identical word pairs (using Gabmap software; Leinonen, Çöltekin & Nerbonne, 2016). Distances ≤ 29% occur in 0.5 percent of the word pairs, which arguably shows that the name Kordaat was not accidentally chosen by Lidl. In my talk I will explain the Levenshtein metric and motivate the decisions made to obtain the distribution of distances between Dutch word pairs.
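The pairwise computation described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the Gabmap implementation: it uses plain unit-cost Levenshtein distance over segment lists and normalizes by the length of the longer sequence (Gabmap may normalize differently, for example by alignment length, and can weight segment substitutions). The segmentation is also an assumption: the diphthong œy and the long vowel aː count as single segments and stress is ignored. Under these assumptions Kornuit and Kordaat differ by 2 substitutions out of 6 segments (about 33 percent), so the sketch is not expected to reproduce the 29 percent reported above. The function names are hypothetical.

```python
from itertools import combinations


def levenshtein(a, b):
    """Unit-cost edit distance between two sequences (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute, free if equal
        prev = curr
    return prev[-1]


def normalized_distance(a, b):
    """Edit distance divided by the length of the longer sequence (an assumed normalization)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def share_at_or_below(words, threshold):
    """Fraction of unordered pairs of list entries whose normalized distance is <= threshold."""
    total = close = 0
    for a, b in combinations(words, 2):  # n * (n - 1) / 2 pairs, generated lazily
        total += 1
        close += normalized_distance(a, b) <= threshold
    return close / total


# Hypothetical segmentations of the two brand names; stress is ignored.
kornuit = ["k", "ɔ", "r", "n", "œy", "t"]
kordaat = ["k", "ɔ", "r", "d", "aː", "t"]
print(levenshtein(kornuit, kordaat))                    # 2
print(round(normalized_distance(kornuit, kordaat), 3))  # 0.333

n = 3000
print(n * (n - 1) // 2)                                 # 4498500
```

The last line checks the pair count: 3000 words yield 3000 × 2999 / 2 = 4,498,500 unordered pairs. Passing a list of 3000 segmented words to share_at_or_below with a threshold of 0.29 would give the kind of proportion reported above, although this pure-Python loop is slow at that scale and is meant only to show the logic.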

Evan G Cohen

2022

Many researchers have studied the similarity between languages (e.g. Eden, 2018; Crowley and Bowern, 2010; Longobardi and Guardiano, 2009, 2017), but no research has quantified the similarity between languages. The final goal of this study is to examine whether similarity can be measured and quantified using scales of the acoustic prominence of several phonetic and phonological properties, merged into one universal scale of prominence. However, since no research has measured similarity by phonetic and phonological features alone, the goal of my thesis was to determine which features should be placed on this scale in the first place. This study contains two experiments, a preliminary one and a main one. In the preliminary experiment, 132 Hebrew speakers rated their familiarity with each of the 35 languages that appeared in the main experiment. In the main experiment, 362 Hebrew speakers listened to 20 sets of three recordings (a base language and two additional languages) and were asked which of the two additional languages was more similar to the base language. Similarity was determined by the number of features shared between the base language and each of the other languages; the features (41 in total) were taken mostly from the World Atlas of Language Structures Online (WALS) and from Bradlow et al. (2010). One of the additional languages shared more features with the base language (the similar language), and the other shared fewer features with it (the dissimilar language). The results showed a significant tendency to choose the more similar language over the dissimilar one. These findings suggest that similarity can be measured by phonetic and phonological features. However, not all features are created equal; thus, this model can be improved by weighting the features, so that more prominent features carry more weight in similarity quantification.
I leave the weighting of the features for future research.
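The selection logic of the main experiment, together with the feature weighting left for future research, can be sketched as follows. This is a minimal illustration, assuming each language is coded as a dictionary of feature values. The feature names, values and weights below are hypothetical placeholders, not the 41 WALS-based features used in the study.

```python
def shared_score(base, other, weights=None):
    """Sum of the weights of features on which both languages have the same value.

    With weights=None every shared feature counts 1, i.e. a plain count of shared features.
    """
    weights = weights or {}
    return sum(weights.get(f, 1.0) for f, v in base.items()
               if f in other and other[f] == v)


def more_similar(base, cand_a, cand_b, weights=None):
    """Return 'a' or 'b' for the candidate sharing more (weighted) features with base; None on a tie."""
    score_a = shared_score(base, cand_a, weights)
    score_b = shared_score(base, cand_b, weights)
    if score_a == score_b:
        return None
    return "a" if score_a > score_b else "b"


# Hypothetical feature codings for a base language and two candidate languages.
base = {"tone": True, "vowel_length": True, "syllables": "complex", "fixed_stress": True}
lang_a = {"tone": False, "vowel_length": True, "syllables": "complex", "fixed_stress": True}
lang_b = {"tone": True, "vowel_length": False, "syllables": "simple", "fixed_stress": True}

print(shared_score(base, lang_a), shared_score(base, lang_b))  # 3.0 2.0
print(more_similar(base, lang_a, lang_b))                      # a
print(more_similar(base, lang_a, lang_b, {"tone": 3.0}))       # b
```

With equal weights the count favours lang_a, as in the "similar language" condition of the experiment. Giving a single hypothetical weight of 3 to tone reverses the choice, which illustrates how weighting could change which language counts as more similar.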

Søren Wichmann

Language Dynamics and Change, 0

Previous work using lexical data from around the world has suggested that distances among language varieties are distributed such that varieties are typically either rather similar, qualifying as dialects of one another, or rather dissimilar, qualifying as different languages, with a scarcity of varieties that are around halfway similar. Wichmann (2019) observed a bimodal distribution of distances, with two roughly normal distributions separated by a valley. The previous work was based on a database mostly containing either descriptions of single languages or surveys covering several close varieties, so the bimodal distribution could potentially be an artifact of the underlying sample. Here we test whether a similar distribution is found when using another source of data and an unbiased sample drawn from the cells of a geographical grid (of Central Europe). The data consist of 18 lexemes from 274 doculects. Using Bayesian Beta regression and leave-one-out cross-validation, we show that the data follow a bimodal distribution that is robust to sampling and also to at least some aspects of the data (coarse- vs. fine-grained phonetic transcriptions).
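As a rough descriptive check of the bimodality described above (not the Bayesian Beta regression with leave-one-out cross-validation used in the study), one can bin pairwise distances in [0, 1] and look for the emptiest bin between the two tallest local peaks. The bin count, the peak rule and the sample values below are assumptions chosen for illustration.

```python
def histogram(values, bins=10):
    """Counts of distances in equal-width bins over [0, 1]; a value of 1.0 goes in the last bin."""
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1
    return counts


def valley_bin(counts):
    """Index of the lowest bin between the two tallest local peaks, or None if there is no such bin."""
    last = len(counts) - 1
    peaks = [i for i, c in enumerate(counts)
             if (i == 0 or c > counts[i - 1]) and (i == last or c >= counts[i + 1])]
    if len(peaks) < 2:
        return None
    # sorted() is stable, so equally tall peaks keep their left-to-right order.
    lo, hi = sorted(sorted(peaks, key=lambda i: counts[i], reverse=True)[:2])
    if hi - lo < 2:
        return None
    return min(range(lo + 1, hi), key=lambda i: counts[i])


# Hypothetical distances: one cluster of dialect-like pairs, one of language-like pairs.
distances = [0.15, 0.18, 0.22, 0.25, 0.21, 0.78, 0.82, 0.85, 0.79, 0.88]
counts = histogram(distances)
print(counts)              # [0, 2, 3, 0, 0, 0, 0, 2, 3, 0]
print(valley_bin(counts))  # 3
```

When several bins in the valley are equally low, min() returns the leftmost one. A histogram cannot show that a bimodal model fits better than a unimodal one; that is what the model comparison in the study addresses.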

Measures of lexical distance between languages

Dieter Peters

The idea of measuring distance between languages seems to have its roots in the work of the French explorer Dumont D'Urville (1832) [13]. He collected comparative word lists for various languages during his voyages aboard the Astrolabe from 1826 to 1829 and, in his work concerning the geographical division of the Pacific, he proposed a method for measuring the degree of relation among languages. The method used by modern glottochronology, developed by Morris Swadesh in the 1950s, measures distances from the percentage of shared cognates, which are words with a common historical origin. Recently, we proposed a new automated method which uses the normalized Levenshtein distances among words with the same meaning and averages them over the words contained in a list. Recently, another group of scholars, Bakker et al. (2009) [8] and Holman et al. (2008) [9], proposed a refined version of our definition including a second normalization. In this paper we compare the information content of our definition with the refined version in order to decide which of the two can be applied with greater success to resolve relationships among languages.
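The lexicostatistical measure mentioned above, based on the percentage of shared cognates, can be sketched as follows. The sketch assumes cognacy has already been judged, so that each word is recorded as a cognate-class label per meaning. The meanings and labels are illustrative placeholders; in practice the judgements come from historical-comparative analysis.

```python
def cognate_distance(classes_a, classes_b):
    """1 minus the share of meanings, attested in both lists, whose words are in the same cognate class."""
    meanings = [m for m in classes_a if m in classes_b]
    if not meanings:
        raise ValueError("the two lists share no meanings")
    shared = sum(classes_a[m] == classes_b[m] for m in meanings)
    return 1 - shared / len(meanings)


# Hypothetical cognate-class labels for four meanings in two languages.
lang_x = {"hand": "A", "water": "B", "dog": "C", "eat": "D"}
lang_y = {"hand": "A", "water": "B", "dog": "E", "eat": "D"}
print(cognate_distance(lang_x, lang_y))  # 0.25
```

Three of the four meanings share a cognate class, so the distance is 1 − 3/4. The automated Levenshtein-based approach replaces the yes/no cognacy judgement with a graded string distance.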

Statistical Problems and Solutions in Onomastic Research - Exemplified by a Comparison of Given Name Distributions in Germany Throughout the 20th Century

Gert Wagner

SSRN Electronic Journal, 2000

This series presents research findings based either directly on data from the German Socio-Economic Panel Study (SOEP) or using SOEP data as part of an internationally comparable data set (e.g. CNEF, ECHP, LIS, LWS, CHER/PACO). SOEP is a truly multidisciplinary household panel study covering a wide range of social and behavioral sciences: economics, sociology, psychology, survey methodology, econometrics and applied statistics, educational science, political science, public health, behavioral genetics, demography, geography, and sport science.

Perceptive evaluation of Levenshtein dialect distance measurements using Norwegian dialect data

Charlotte Gooskens

2004

The Levenshtein dialect distance method has proven to be a successful method for measuring phonetic distances between Dutch dialects. The aim of the present investigation is to validate the Levenshtein dialect distance with perceptual data from a language area other than the Dutch, namely Norway. We calculate the correlation between the Levenshtein distances and the distances between 15 Norwegian dialects as judged by Norwegian listeners. We carry out this analysis to see the degree to which the average Levenshtein distances correspond to the psychoacoustic perception of the speakers of the dialects.
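The correlation step can be sketched with Pearson's r between two distance lists, one value per dialect pair. This is a minimal illustration: the numbers are invented, and the abstract states neither which correlation coefficient was used nor how listener judgements were averaged.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical values for four dialect pairs: Levenshtein distance (%) and mean listener judgement.
levenshtein_pct = [22.0, 31.5, 27.0, 40.2]
perceived = [3.1, 5.0, 4.2, 6.3]
print(round(pearson_r(levenshtein_pct, perceived), 3))
```

On Python 3.10 and later, statistics.correlation computes the same coefficient; the explicit version shows the formula.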

Wichmann, Søren, Eric W. Holman, Dik Bakker, and Cecil H. Brown. 2010. Evaluating linguistic distance measures. Physica A. 389: 3632-3639 (doi:10.1016/j.physa.2010.05.011)

Søren Wichmann

2010

In Ref. , Petroni and Serva discuss the use of Levenshtein distances (LD) between words referring to the same concepts as a tool for establishing overall distances among languages which can then subsequently be used to derive phylogenies. The authors modify the raw LD by dividing the LD by the length of the longer of the two words compared, to produce what could be called LDN (normalized LD). Other scholars have used a further modification, where they divide the LDN by the average LDN among words not referring to the same concept. This produces what could be called LDND. The authors of Ref.
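The two measures described above can be sketched as follows for two word lists keyed by concept. The sketch follows the definitions as stated here: LDN divides the raw distance by the length of the longer word, and LDND divides the mean same-concept LDN by the mean LDN over pairs of words for different concepts. It uses plain unit-cost Levenshtein distance over characters, and the word lists are invented examples.

```python
from itertools import permutations


def levenshtein(a, b):
    """Unit-cost edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def ldn(a, b):
    """Levenshtein distance normalized by the length of the longer word."""
    return levenshtein(a, b) / max(len(a), len(b))


def mean(values):
    return sum(values) / len(values)


def mean_ldn(list_a, list_b):
    """Average LDN over concepts present in both lists (same-concept pairs)."""
    concepts = [c for c in list_a if c in list_b]
    return mean([ldn(list_a[c], list_b[c]) for c in concepts])


def ldnd(list_a, list_b):
    """Mean same-concept LDN divided by mean LDN over pairs of different concepts."""
    concepts = [c for c in list_a if c in list_b]
    different = [ldn(list_a[c], list_b[d]) for c, d in permutations(concepts, 2)]
    return mean_ldn(list_a, list_b) / mean(different)


list_a = {"hand": "hand", "water": "water"}
list_b = {"hand": "hant", "water": "vater"}
print(round(mean_ldn(list_a, list_b), 3))  # 0.225
print(round(ldnd(list_a, list_b), 5))      # 0.28125
```

The second normalization is meant to correct for chance similarity: two languages with similar phoneme inventories and word lengths will also have relatively low LDN between unrelated words. Dividing by that background level separates this effect from genuine same-concept similarity.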

Morphological predictability and acoustic duration of interfixes in Dutch compounds

Mirjam Ernestus

The Journal of the Acoustical Society of America, 2007

This study explores the effects of informational redundancy, as carried by a word's morphological paradigmatic structure, on acoustic duration in read-aloud speech. The hypothesis that the more predictable a linguistic unit is, the less salient its realization, was tested on the basis of the acoustic duration of interfixes in Dutch compounds in two datasets: one for the interfix -s- (1155 tokens) and one for the interfix -e(n)- (742 tokens). Both datasets show that the more probable the interfix is, given the compound and its constituents, the longer it is realized. These findings run counter to the predictions of information-theoretical approaches and can be resolved by the Paradigmatic Signal Enhancement Hypothesis. This hypothesis argues that whenever selection of an element from alternatives is probabilistic, the element's duration is predicted by the amount of paradigmatic support for the element: the most likely alternative in the paradigm of selection is realized longer.
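One simple way to estimate the probability of an interfix given a constituent is to take its relative frequency within the constituent's compound family. The sketch below does this for the left constituent only. This is an assumption for illustration, not the paper's probability model, and the compounds are hypothetical examples.

```python
from collections import Counter


def interfix_probability(compounds, left, interfix):
    """Relative frequency of `interfix` among compounds whose left constituent is `left`.

    `compounds` is a list of (left_constituent, interfix) pairs; "" marks no interfix.
    Returns 0.0 if the constituent has no compounds in the list.
    """
    family = Counter(ifx for lc, ifx in compounds if lc == left)
    total = sum(family.values())
    return family[interfix] / total if total else 0.0


# Hypothetical compound families, recorded as (left constituent, interfix).
compounds = [("stad", "s"), ("stad", "s"), ("stad", "s"), ("stad", ""),
             ("schaap", "en"), ("schaap", "")]
print(interfix_probability(compounds, "stad", "s"))     # 0.75
print(interfix_probability(compounds, "schaap", "en"))  # 0.5
```

Under the Paradigmatic Signal Enhancement Hypothesis, interfixes with higher values of such a paradigmatic-support estimate would be expected to be realized longer.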

Applying the Levenshtein Distance to Catalan dialects

Esteve Clua

2012

In recent years, dialectometry has gained interest among Catalan dialectologists. As a consequence, a specific dialectometric approach has been developed at the University of Barcelona, which aims at increasing the accuracy of final groupings by means of discriminating the predictable components of the language from its unpredictable ones. Another popular method to obtain dialect distances is the Levenshtein Distance (LD) which has never been applied to a Catalan corpus so far. The goal of this paper is to present the results of applying the LD to a corpus of Catalan linguistic data, and to compare the results from this analysis both with the results from Barcelona and the traditional classifications of Catalan dialectology.

Utilizing Phonetic Similarity for Cross-source and Cross-language Toponym Matching - a Benchmark and Prototype

Sinai Rusinek

Research Square (Research Square), 2024

The writings of one ancient civilization often overlap in time and space with those of others. Many of these sources comprise unstructured text in ancient languages, causing scholars studying these civilizations to be siloed, often relying on sources in specific languages. Most recent efforts to extract structured information from historical scripts into place (toponym) and people (prosopography) databases have followed this pattern, focusing on one civilization and selected sources. The path to creating a common database runs through aligning names or toponyms between sources from disparate languages utilizing different scripts. Existing multilingual orthographic (string-based) comparison often relies on transliteration to a common script (Latin/English). Transliteration often creates multiple options and even more confusion. However, when integrating sources that overlap in space and time, the languages often share a common phonetic background. This commonality may prove beneficial. In this work, we present a benchmark for comparing toponyms from two linguistically and culturally related languages, namely Hebrew and Arabic. We provide a benchmark comprising a set of dataset pairs created from historical sources written in medieval variants of these languages, later historical gazetteers, and a modern dataset curated from Wikidata. We empirically evaluate several toponym comparison approaches over the benchmark: transliteration to a common script, direct transliteration, and phonetic comparison using a common phonetic representation. We discuss the results and the limitations of the various methods and outline future work.

Fumito Tsuchiya

Procedia Computer Science, 2014

Drug name similarity is one of the major causes of medical accidents. One of the best ways to prevent such accidents is to avoid approving drugs whose names are similar to those of existing drugs. It is well known that there are two kinds of drug name similarity: look-alikeness and sound-alikeness. Nabeta et al. proposed a look-alikeness similarity index, which excludes sound-alikeness. Although oral prescription is basically prohibited in Japan, emergencies can force a doctor to prescribe orally. In such situations, medical accidents can occur.

References (4)

Baayen, R. H., Piepenbrock, R. & Gulikers, L. (1995). CELEX2 LDC96L14. Web Download. Philadelphia: Linguistic Data Consortium.
Heeringa, W. J. (2004). Measuring dialect pronunciation differences using Levenshtein distance. Doctoral dissertation, University of Groningen.
Leinonen, T., Çöltekin, Ç. & Nerbonne, J. (2016). Using Gabmap. Lingua, 178, 71-83.
Levenshtein, V. I. (1966). Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, 10(8), 707-710.

Kamil Stachowski

Studia Linguistica Universitatis Iagellonicae …, 2011

This paper argues that automatic phonetic comparison will only return true results if the languages in question have similar and comparably lenient phonologies. In the situation where their phonologies are incompatible and / or restrictive, linguistic knowledge of both of them is necessary to obtain results matching human perception. Whilst the case is mainly exemplified by Levenshtein distance and Russian loanwords in Dolgan, the conclusion is also applicable to the approach as a whole.

Tina M. Lowrey

2012

for comments on the manuscript.

Brand Name Confusion: Subjective and Objective Measures of Orthographic Similarity

Sarah Kelly

Journal of Experimental Psychology: Applied, 2017

Determining brand name similarity is vital in areas of trademark registration and brand confusion. Students rated the orthographic (spelling) similarity of word pairs (Experiments 1, 2, and 4) and brand name pairs (Experiment 5). Similarity ratings were consistently higher when words shared beginnings rather than endings, whereas shared pronunciation of the stressed vowel had small and less consistent effects on ratings. In Experiment 3, a behavioral task confirmed the similarity of shared beginnings in lexical processing. Specifically, in a task requiring participants to decide whether two words presented in the clear (a probe and a later target) were the same or different, a masked prime word preceding the target shortened response latencies if it shared its initial three letters with the target. The ratings of students for word and brand name pairs were strongly predicted by metrics of orthographic similarity from the visual word identification literature based on the number of shared l...
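The contrast between shared beginnings and shared endings, and the three-letter overlap used for the masked primes, can be made concrete with two small helpers. These are hypothetical illustrations, not the orthographic similarity metrics from the visual word identification literature that the study evaluated.

```python
def shared_prefix_len(a, b):
    """Number of initial letters two words have in common (case-insensitive)."""
    n = 0
    for x, y in zip(a.lower(), b.lower()):
        if x != y:
            break
        n += 1
    return n


def shared_suffix_len(a, b):
    """Number of final letters two words have in common (case-insensitive)."""
    return shared_prefix_len(a[::-1], b[::-1])


def shares_initial_letters(prime, target, k=3):
    """True if the prime begins with the same k letters as the target."""
    return shared_prefix_len(prime, target) >= k


print(shared_prefix_len("Kornuit", "Kordaat"))       # 3
print(shared_suffix_len("Kornuit", "Kordaat"))       # 1
print(shares_initial_letters("Kornuit", "Kordaat"))  # True
```

A full similarity metric would combine such positional information with letter identity across the whole word; the helpers only isolate the beginning-versus-ending contrast.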

Phonetic Distance Between Dutch Dialects

Erik Hout

CLIN VI, Papers from …, 1996

Traditional dialectology relies on identifying language features which are common to one dialect area while distinguishing it from others. It has difficulty in dealing with partial matches of features and with nonoverlapping language patterns. This paper applies Levenshtein distance (a measure of string distance) to pronunciations to overcome both of these difficulties. Partial matches may be quantified, and nonoverlapping patterns may be included in weighted averages of phonetic distance. The result accords with traditional dialectology to a satisfying degree.
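The two advantages claimed above can be illustrated with an aggregate distance between two dialect sites: normalized Levenshtein distance grades partial matches between pronunciations, and words missing at either site are simply left out of the average. This is a minimal sketch assuming unit costs, normalization by the longer transcription and an unweighted mean. The transcriptions are invented, and the paper's own weighting may differ.

```python
def levenshtein(a, b):
    """Unit-cost edit distance between two transcriptions."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def site_distance(site_a, site_b):
    """Mean normalized distance over words transcribed at both sites (others are skipped)."""
    shared = [w for w in site_a if w in site_b]
    if not shared:
        raise ValueError("no words in common")
    total = 0.0
    for w in shared:
        a, b = site_a[w], site_b[w]
        total += levenshtein(a, b) / max(len(a), len(b))  # graded partial match
    return total / len(shared)


site_a = {"huis": "hoys", "melk": "melk", "brood": "brot"}
site_b = {"huis": "hus", "melk": "melk"}  # "brood" not recorded at this site
print(site_distance(site_a, site_b))      # 0.25
```

Here "huis" contributes a partial mismatch of 2/4 and "melk" a perfect match, while "brood" is excluded rather than counted as a difference.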

Evaluating linguistic distance measures

Søren Wichmann

Physica A: Statistical Mechanics and its Applications, 2010

... Levenshtein Distance Normalized Divided'). The paper is a reply to Ref. [13], where the authors also discuss the two versions, claiming that LDN may be a more adequate measure of linguistic distance than LDND. Our reply seeks to ...

Name phylogeny: a generative model of string variation

Nicholas Andrews

Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 2012

Many linguistic and textual processes involve transduction of strings. We show how to learn a stochastic transducer from an unorganized collection of strings (rather than string pairs). The role of the transducer is to organize the collection. Our generative model explains similarities among the strings by supposing that some strings in the collection were not generated ab initio, but were instead derived by transduction from other, "similar" strings in the collection. Our variational EM learning algorithm alternately reestimates this phylogeny and the transducer parameters. The final learned transducer can quickly link any test name into the final phylogeny, thereby locating variants of the test name. We find that our method can effectively find name variants in a corpus of web strings used to refer to persons in Wikipedia, improving over standard untrained distances such as Jaro-Winkler and Levenshtein distance.
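For reference, the Jaro-Winkler similarity mentioned above as an untrained baseline can be sketched as follows. This is the textbook formulation (matching window of half the longer length minus one, prefix scale 0.1, prefix capped at four characters), written from scratch for illustration. It is a baseline the paper compares against, not the paper's learned transducer.

```python
def jaro(s, t):
    """Jaro similarity in [0, 1]."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_hit = [False] * len(s)
    t_hit = [False] * len(t)
    matches = 0
    for i, ch in enumerate(s):
        for j in range(max(0, i - window), min(i + window + 1, len(t))):
            if not t_hit[j] and t[j] == ch:
                s_hit[i] = t_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters in order; positions where they disagree are half-transpositions.
    s_seq = [c for c, hit in zip(s, s_hit) if hit]
    t_seq = [c for c, hit in zip(t, t_hit) if hit]
    transpositions = sum(a != b for a, b in zip(s_seq, t_seq)) / 2
    return (matches / len(s) + matches / len(t)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s, t, scale=0.1, max_prefix=4):
    """Jaro similarity boosted by the length of the common prefix (up to max_prefix)."""
    sim = jaro(s, t)
    prefix = 0
    for a, b in zip(s[:max_prefix], t[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * scale * (1 - sim)


print(round(jaro("MARTHA", "MARHTA"), 4))          # 0.9444
print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```

Like Levenshtein distance, this measure has fixed parameters and is not trained on the data, which is the contrast the abstract draws with the learned transducer.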

Perceptive evaluation of Levenshtein dialect distance measurements using Norwegian dialect data

Charlotte Gooskens

Language Variation and Change, 2004

The Levenshtein dialect distance method has proven to be a successful method for measuring phonetic distances between Dutch dialects. The aim of the present investigation is to validate the Levenshtein dialect distance with perceptual data from a language area other than the Dutch, namely Norway. We calculate the correlation between the Levenshtein distances and the distances between 15 Norwegian dialects as judged by Norwegian listeners. We carry out this analysis in order to see the degree to which the average Levenshtein distances correspond to the psycho-acoustic perception of the speakers of the dialects.

The Place of Forensic Linguistics in the Resolution of Trademark Conflicts: Case of DOUBLEMINT & DOUBIEMLNT

帕特 Patrick S A D I MAKANGILA

International Journal of Applied Linguistics & English Literature, 2021

Forensic linguistics, which focuses on word choice and spelling, can be useful in resolving language crimes, trademark infringement, and so forth. Nowadays, trademarks are among the most infringed intellectual property in the world in terms of value. A trademark can be a single word, a combination of words and symbols, a design, or a logo that distinguishes a company or its products from others in the industry. When someone acquires a registered trademark, they are granted an exclusive right to its use, and other organizations are prohibited from using it. This paper shows how an expert in forensic linguistics should use their skill and knowledge to handle conflicts between similar trademarks, from the brand name (how it is written, upper or lower case, how many letters it has, how it sounds, how it looks, and so forth) to the logo (design, use of colors, shape, and so forth). The forensic linguistics expert tries to find scientific evidence that may help judges in decision-making. The present study scrutinized the place of forensic linguistics in the resolution of trademark conflicts, and the scientific techniques and methodologies used to analyze the similarities and differences between the trademarks in conflict. This research showed the importance of involving an expert in forensic linguistics in Community Trademark conflicts in order to reach a conclusion based on scientific evidence, and the place of forensic linguistics and related disciplines in resolving issues of trademark infringement.

Valls et al. (2012): Applying the Levenshtein Distance to Catalan Dialects: A brief comparison of two dialectometric approaches

John Nerbonne, Maria-Rosa Lloret, Esteve Clua

Verba. Anuario galego de filoloxía 39: 35-61, 2012

"In recent years, dialectometry has gained interest among Catalan dialectologists. As a consequence, a specific dialectometric approach has been developed at the University of Barcelona, which aims at increasing the accuracy of final groupings by means of discriminating the predictable components of the language from its unpredictable ones. Another popular method to obtain dialect distances is the Levenshtein distance (LD) which has never been applied to a Catalan corpus so far. The goal of this paper is to present the results of applying the LD to a corpus of Catalan linguistic data, and to compare the results from this analysis both with the results from Barcelona and the traditional classifications of Catalan dialectology."

Inducing a measure of phonetic similarity from pronunciation variation

John Nerbonne

2012

Structuralists famously observed that language is “un système où tout se tient” (Meillet, 1903, p. 407), insisting that the system of relations of linguistic units was more important than their concrete content. This study attempts to derive content from relations, in particular phonetic (acoustic) content from the distribution of alternative pronunciations used in different geographical varieties.
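One established way of deriving segment similarities from pronunciation variation is to align corresponding pronunciations and score each aligned segment pair by pointwise mutual information (PMI), so that pairs that co-occur more often than chance count as more similar. The sketch below assumes the alignments are already given as (segment, segment) pairs and treats pairs as ordered. It illustrates the general technique and is not necessarily the exact procedure of the study.

```python
from collections import Counter
from math import log2


def segment_pmi(aligned_pairs):
    """PMI of each aligned segment pair: log2(p(x, y) / (p(x) * p(y))).

    p(x, y) is the share of alignment slots holding the ordered pair (x, y);
    p(x) is the share of all aligned segments (both sides pooled) equal to x.
    """
    pair_counts = Counter(aligned_pairs)
    seg_counts = Counter()
    for x, y in aligned_pairs:
        seg_counts[x] += 1
        seg_counts[y] += 1
    n_pairs = len(aligned_pairs)
    n_segs = 2 * n_pairs
    return {
        (x, y): log2((c / n_pairs) / ((seg_counts[x] / n_segs) * (seg_counts[y] / n_segs)))
        for (x, y), c in pair_counts.items()
    }


# Hypothetical alignment slots collected from pairs of dialect pronunciations.
pairs = [("a", "o"), ("a", "o"), ("i", "e"), ("i", "e")]
print(segment_pmi(pairs)[("a", "o")])  # 3.0
```

A high PMI means the two segments correspond across varieties more often than their individual frequencies would predict. Such scores can then be turned into substitution costs, for example by subtracting them from the highest observed PMI, so that frequently alternating segments become cheap to substitute.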
