GX202411 reconstruction for Tangut phonology by Xun Gong
Tangut Pronunciation Database — 西夏文拟音数据库
2021-2025 by Xun Gong

krent käṣṣintse - Tocharian Studies in Honor of Douglas Q. Adams, 2025
(Note: this is a draft version and contains some errors; updated version will be uploaded after publication.)
The nasal preinitial hypothesis (Gong 2021) leaves Tangut R.21 (GHC: {-jaa}) as an unresolved anomaly. This essay argues, based primarily on Chinese transcription evidence, that R.21 represents a distinct final -aw, phonologically /-aɣ/. The proposed phonological history for this rime, etymologically comparable to Japhug Rgyalrong -aʁ, suggests the presence of a Pre-Tangut palatal element -j- and points to a more convoluted fate of Proto-Tangut -ʁ than previously assumed in the literature. This reinterpretation draws attention to an archaic layer of Chinese loanwords within R.21: 𘊹 kaw¹ for 脚 jiǎo (OC *k(r)ak), 𗁞 tśhaw² for 尺 chǐ (OC *tʰAk), and 𘘔 dźaw¹ for 石 shí (OC *dAk). These Tangut forms notably preserve the vocalism and coda (as /-ɣ/) of Old Chinese rhyme group 鐸 *-ak, thus predating the widespread vowel fronting that produced Middle Chinese *-ek. A comparative study places the Tangut forms in an older borrowing layer, one that retains features of OC *-ak, potentially reflecting a spirantized coda (/-aɣ/), across languages of Central Eurasia: Toch. B cāk*, Sogdian š'γ, and Bactrian σαγο /tsaɣᵘ/ from 石 shí; Toch. B cak* and Khotanese chā for 尺 chǐ. Further east, this pattern is echoed in Middle Korean cáh and Japanese saka for 尺 chǐ. In contrast, a younger stratum exhibits signs of vowel fronting and subsequent raising, as seen in Old Uyghur šïq, čïq, and Tibetan sheg. The distinct phonological shapes provide a solid starting point for tracing the historical and geographical trajectories by which Chinese units of measure spread. Sometimes this transmission occurred as part of Chinese state administrative technology; at other times, it was shaped by more intricate contact dynamics, particularly along the northwestern edges of the Sinosphere.

Bulletin of Chinese Linguistics, 2025
This essay proposes the following revised tone values for Tangut monosyllables: Tone 1 (𗗔 nye¹ = píng) as a falling tone (HL) and Tone 2 (𗨁 phu² = shàng) as a mid-level tone (M). While largely consistent with the established "1 = H, 2 = L" theory (Nishida 1964; Arakawa 1999), these contour values better account for the available evidence. (1) Regular tonal correspondences are identified between Tangut monosyllables and Northern Horpa (= Shàngzhài, West Rgyalrongic). The HL:M contrast in the latter aligns with the broader areal typology of "polysyllabic-type" Rgyalrong languages. (2) The use of Chinese tone labels by native Tangut scholars could plausibly reflect the tone values of a spoken variety of Late Northwestern Middle Chinese, analogous to Sino-Japanese kan'on readings and the Middle Japanese shōten notation. (3) This revision addresses the "Tangut tone paradox," wherein tonal correspondences between Tangut and the Rma and Prinmi languages (Sims 2020) appear inverted. The proposed falling contour for Tone 1 allows for a plausible scenario of tone reversal in Tangut and Northern Horpa, potentially representing a shared innovation.

Archiv Orientalní, 2021
Gong Hwang-cherng proposed that the Tangut language has a distinction between short and long vowels. To date, however, no reliable correlates have been found regarding the actual phonological nature of the distinction. A careful examination of Chinese loanwords in Tangut and Sino-Tangut pronunciation reveals that the "vowel length" distinction should be revised to that of the presence vs. absence of a nasal preinitial. The pair 𗻍₃₈₀₆ "weed" vs. 𗽰₂₁₃₈ "tomb," borrowed respectively from Chinese 蒲 bu and 墓 muH (the latter from a Northwest-type reflex with *mb-), hitherto reconstructed as buʶ¹ {bu¹} vs. buuʶ² {buu²}, should be revised to buʶ¹ vs. mbuʶ². The reconstructed nasal preinitial not only has a close typological parallel in Modern West Rgyalrongic, but is equally reflected in other sources of evidence, most strikingly Sanskrit transcription and fǎnqiè. The revision solves a large number of problems in the historical phonology of Tangut, though not without raising some new ones, especially in connection with the treatment of Proto-Rgyalrongic preinitials before nasals.
2016-2020 by Xun Gong

Language and Linguistics, 2020
Tangut, a mediaeval Qiangic language (Sino-Tibetan family), has a distinction of three grades (děng 等). The traditional Sofronov-Gong reconstruction of this distinction supposes different degrees of medial yod: Grade I {-Ø-}, Grade II {-i-}, Grade III {-j-}. The yods, however, are not supported by the transcriptional evidence.
Based on cognates between Tangut and Rgyalrongic languages, this study proposes the uvularization hypothesis: Tangut syllables have contrastive uvularization. Grade I/II syllables are uvularized, while Grade III syllables are plain. For phonological velars, uvularized syllables trigger a uvular allophone, while plain syllables trigger a velar allophone.
Tangut uvularization is an instance of a common typological feature in Qiangic languages, that of guttural secondary vocalic articulations (GSVA), variously termed uvularization, velarization, tenseness or Retracted Tongue Root (RTR). Recognizing Tangut grades as a case of Qiangic GSVA has far-ranging potential consequences for Sino-Tibetan comparative linguistics.
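
The allophony rule stated in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the numeric encoding of the grades, and the velar-to-uvular symbol table are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: the grade encoding and the symbol table below are
# assumptions made for this example, not the paper's notation.

# Assumed IPA pairing of velar phonemes with their uvular counterparts.
VELAR_TO_UVULAR = {"k": "q", "kh": "qh", "g": "ɢ", "ŋ": "ɴ", "x": "χ", "ɣ": "ʁ"}


def is_uvularized(grade: int) -> bool:
    """Grade I/II syllables are uvularized; Grade III syllables are plain."""
    if grade not in (1, 2, 3):
        raise ValueError(f"unknown grade: {grade}")
    return grade in (1, 2)


def surface_initial(initial: str, grade: int) -> str:
    """Return the predicted allophone of a phonological initial."""
    if is_uvularized(grade) and initial in VELAR_TO_UVULAR:
        # Phonological velar in a uvularized syllable: uvular allophone.
        return VELAR_TO_UVULAR[initial]
    # Plain syllable, or a non-velar initial: no change.
    return initial
```

Under these assumptions, `surface_initial("k", 1)` yields the uvular allophone while `surface_initial("k", 3)` leaves the velar unchanged; initials outside the table pass through untouched in every grade.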

How many vowels are there in Lhasa Tibetan?
Linguistics of the Tibeto-Burman Area, 2020
Lhasa Tibetan is described in a number of independent research traditions which give different accounts of its phonology. To what extent do these discrepancies reflect real dialectal or idiolectal differences? To what extent do they reflect different analyses of the same system?
In this paper, we examine one aspect of Lhasa Tibetan phonology on which different descriptions show substantial discrepancies: vowels. Different descriptions of Lhasa Tibetan transcribe from 8 to more than 16 vowel qualities, ascribing to them different degrees of phonemicity. A detailed comparison of the transcription systems shows that all reflect the same underlying system of 12 vowel sounds, which agrees with the transcription conventions of the Seattle Tibetanists. The discrepancies among the systems mostly concern four vowels, namely ɔ, ə, ɪ and ʊ. These vowels, which started as allophonic variants of other vowels, later appear in a set of words which cannot be explained as allophony, and hence are unambiguous phonemes in contemporary Lhasa Tibetan.
Folia Linguistica Historica, 2020
This paper proposes that Tangut should be classified as a West Gyalrongic language in the Sino-Tibetan/Trans-Himalayan family. We examine lexical commonalities, case marking, partial reduplication, and verbal morphology in Tangut and in modern West Gyalrongic languages, and point out nontrivial shared innovations between Tangut and modern West Gyalrongic languages. The analysis suggests a closer genetic relationship between Tangut and Modern West Gyalrongic than between Tangut and Modern East Gyalrongic. This paper is the first study that tackles the exact linguistic affiliation of the Tangut language based on the comparative method.

Journal of Language Relationship, 2019
While consonant clusters, taken broadly to include presyllables, are commonly hypothesized for Old Chinese, little direct evidence is available for establishing the early forms of specific words.
A number of Vietnamese words borrowed from Chinese have initial consonant lenition in Vietnamese, which corresponds to presyllables in conservative Vietic languages, e.g.: Chinese 劍 kiæmH “sword” is borrowed as Vietnamese 劍 gươm [ɣ-] and Rục təkɨəm. Baxter and Sagart (2014) understand such words as reflecting Old Chinese preinitials, relying on conservative Vietic languages for the identity of the preinitial.
This essay examines a hitherto overlooked source: Old Vietnamese, a language attested in the single document 佛說大報父母恩重經 Phật thuyết Đại báo phụ mẫu ân trọng kinh (Nguyễn Ngọc San 1982, Shimizu 1996, Hoàng Thị Ngọ 1999), which writes certain words, monosyllabic in modern Vietnamese, in an orthography suggesting sesquisyllabic phonology, e.g. rắn ‘snake’ is written 破散 (phá tản < phaH sanX), cf. Rục pəsiɲ³. For a number of words, Old Vietnamese provides the only testimony of the form of the Vietic borrowing.
The small list of currently known sesquisyllabic words of Chinese origin attested in this document includes examples of both words with a secure initial Chinese cluster and words with plausible Vietic prefixation. On the one hand, we find the word *s–kương ‘mirror’ borrowed from Chinese 鏡 kiæŋH. Here, the Vietnamese *s- is corroborated by its morphological derivation, which is an instance of the Sino-Tibetan instrumental deverbal *sV-. On the other hand, for the word 阿唱 *ʔ–ɕướng ‘to chant’, likely borrowed from Middle Chinese 唱 tɕhaŋH, the Old Vietnamese form could reflect a dummy prefix *ʔ- (Section 4) that exists in other Vietic languages.
Fifth Workshop on Sino-Tibetan Languages of Southwest China, Tianjin, Nankai University, 2019
52nd International Conference on Sino-Tibetan Languages and Linguistics (ICSTLL#52), Sydney, University of Sydney, 2019
25 June 2019, ‘xSG→3-type direct marking in Rgyalrongic and Tangut’. 52nd International Conference on Sino-Tibetan Languages and Linguistics (ICSTLL#52), Sydney, University of Sydney.
51st International Conference on Sino-Tibetan Languages and Linguistics (ICSTLL#51), 2018
51st International Conference on Sino-Tibetan Languages and Linguistics (ICSTLL#51), 2018
Colloquium ‘Tangoutologie’, Arras, Université d’Artois – Arras / ENS Ulm, 2018
23 November 2018. ‘A hypothesis on the nature of Tangut i/ə stem alternation’. Colloquium ‘Tangoutologie’, Arras, Université d’Artois – Arras / ENS Ulm.
Encyclopedia of Chinese Language and Linguistics, 2017
Consonant clusters like *pr-, *sn- and *-ks are postulated by various scholars for Old Chinese (OC). The debate about their existence and inventory runs through the modern history of Chinese historical phonology and remains the most thorny and interesting aspect of the field...
Workshop “Recent Advances in Tangut Studies”, 2017

SCRIPTA, 2017
Logographic writing systems for morphologically rich languages bring into
sharp relief the inherent tension in all human writing systems between the
lexico-morphemic and the phonetic tendencies. When the same lexical root
has, by allomorphy, several different phonetic forms, the writing system
cannot avoid either different graphemes for the same root or different
readings for the same grapheme.
This essay examines the case of verb stem alternation in Tangut, an extinct
Sino-Tibetan language mainly attested from the 11th to the 13th century,
with a syllabic and logographic writing system modeled after the Chinese. It is
shown that both the form and the distribution of the stems are closely related
to Rgyalrongic languages, especially Zbu Rgyalrong and Gexi Horpa.
This essay proposes a typology of three possible strategies of logographic
writing systems to represent root allomorphy, under which the Tangut case
is analyzed: underdifferentiation, with all phonetic forms represented by the
same grapheme, overdifferentiation, with different phonetic forms represented
by different graphemes as if they are different roots, and categoriography,
with systematic means of distinguishing between different allomorphs of the
same root. Tangut predominantly prefers overdifferentiation, followed by an
incipient form of categoriography. The Tangut orthography stands in stark
contrast to other logographic writing systems (Sumerian, Chinese, Japanese
and Middle Iranian), where underdifferentiation is predominant, followed by
categoriography. The highly deviant nature of Tangut script is hypothesized
as resulting from the imitation of the mature Chinese script, where one
character has ideally one single reading.
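
The three-way typology described in this abstract can be illustrated with a small Python classifier. This is a minimal sketch under stated assumptions: modelling a grapheme as a (base sign, marker) pair, the function name, and the placeholder labels are all hypothetical and are not the essay's formalism. The sketch treats any marker on a shared base sign as categoriography and does not check whether the marking is systematic across roots.

```python
# Illustrative sketch only: representing a grapheme as a (base, marker) pair is
# an assumption made for this example, not the essay's formalism.
from typing import Dict, Tuple

Grapheme = Tuple[str, str]  # (base sign, allomorph marker; "" if none)


def classify_strategy(spellings: Dict[str, Grapheme]) -> str:
    """Classify how the allomorphs of one root are written.

    `spellings` maps each phonetic form of the root to the grapheme used for it.
    """
    graphemes = set(spellings.values())
    if len(graphemes) <= 1:
        # Every phonetic form shares one grapheme.
        return "underdifferentiation"
    bases = {base for base, _marker in graphemes}
    if len(bases) == 1:
        # Same base sign, with a marker telling the allomorphs apart.
        return "categoriography"
    # Unrelated graphemes, as if the allomorphs were different roots.
    return "overdifferentiation"
```

With placeholder signs, `classify_strategy({"stem1": ("A", ""), "stem2": ("B", "")})` returns `"overdifferentiation"`, the strategy the abstract reports as predominant in Tangut.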
Cahiers de Linguistique Asie Orientale, 2016
Bulletin of the School of Oriental and African Studies, Jun 2016
In this study, a reconstruction is offered for the phonetic evolution of rhymes from Old Tibetan to modern-day Amdo Tibetan dialects. The relevant sound changes are proposed, along with their relative chronological precedence and the dating of some specific changes. Most interestingly, although Amdo Tibetan, identically to its ancestor Old Tibetan, does not have phonemic length, this study shows that Amdo Tibetan derives from an intermediate stage which, like many other Tibetan dialects, does make the distinction.
Fourth Workshop on Sino-Tibetan Languages of Southwest China, 2016
Tangut Pronunciation Database: https://github.com/semakosa/tangut-pronunciation-db