How People use Twitter in Different Languages
2011, Text
Abstract
In this paper, we describe how Twitter is used in various languages. We observe notable differences between languages in the use of hashtags, links, mentions, and conversations. We propose two dimensions that can be used to classify languages, each of which is likely to require a different method of analysis.
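The abstract does not say how these differences were measured. A minimal sketch, assuming each tweet is available as a record with hypothetical `lang`, `text`, and `is_reply` fields, and using simple regular expressions as stand-ins for Twitter's own entity extraction, could compute per-language rates of the four features like this:

```python
import re
from collections import defaultdict

# Simple surface patterns (an assumption of this sketch, not Twitter's entity parser).
HASHTAG = re.compile(r"#\w+")
MENTION = re.compile(r"@\w+")
LINK = re.compile(r"https?://\S+")

FEATURES = ("hashtag", "mention", "link", "reply")


def feature_rates(tweets):
    """Return {lang: {feature: share of that language's tweets with the feature}}.

    Each tweet is a dict with hypothetical keys 'lang', 'text', and 'is_reply';
    a reply is used here as a rough proxy for taking part in a conversation.
    """
    totals = defaultdict(int)
    hits = defaultdict(lambda: dict.fromkeys(FEATURES, 0))
    for t in tweets:
        lang, text = t["lang"], t["text"]
        totals[lang] += 1
        hits[lang]["hashtag"] += bool(HASHTAG.search(text))
        hits[lang]["mention"] += bool(MENTION.search(text))
        hits[lang]["link"] += bool(LINK.search(text))
        hits[lang]["reply"] += bool(t.get("is_reply", False))
    return {
        lang: {f: hits[lang][f] / totals[lang] for f in FEATURES}
        for lang in totals
    }


sample = [
    {"lang": "en", "text": "Reading #nlp papers http://example.com", "is_reply": False},
    {"lang": "en", "text": "@alice agreed!", "is_reply": True},
    {"lang": "ja", "text": "今日は雨 #天気", "is_reply": False},
]
print(feature_rates(sample)["en"])
# {'hashtag': 0.5, 'mention': 0.5, 'link': 0.5, 'reply': 0.5}
```

Because Python's `\w` is Unicode-aware, the hashtag pattern also matches non-Latin tags such as `#天気`; on real data, the entity metadata that accompanies each tweet would be a more reliable source than regular expressions.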
Related papers
arXiv (Cornell University), 2021
Many researchers have pointed to the need for a comprehensive study of the various aspects of online social media. This paper offers insight into one such platform, Twitter. In this work, we illustrate a stepwise procedure for crawling the data and discuss key issues related to extracting associated features that can be useful in Twitter-related research when crawling these data from Application Programming Interfaces (APIs). Further, the data, comprising over 86 million tweets, are analysed from various perspectives, including the most used languages, the most frequent words, the most frequent users, and the countries with the most and fewest tweets and retweets. The analysis reveals that Twitter user data has strong potential for research in various domains, including politics, social science, economics, and linguistics. In addition, a relation between the Twitter users of a country and its Human Development Index (HDI) is identified: countries with very high HDI values have a relatively higher number of tweets than countries with low HDI values. It is envisaged that the present study will open many avenues for research in information processing and data science.
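The "most used languages" and "most frequent words" views amount to frequency counts. A minimal sketch, assuming the crawled tweets have already been parsed into records with hypothetical `lang` and `text` fields (this is not the authors' pipeline), could rely on `collections.Counter`:

```python
import re
from collections import Counter

# Naive tokenizer: runs of Unicode word characters, lowercased.
TOKEN = re.compile(r"\w+")


def top_languages(tweets, k=3):
    """Return the k most common values of the hypothetical 'lang' field."""
    return Counter(t["lang"] for t in tweets).most_common(k)


def top_words(tweets, k=3):
    """Return the k most frequent tokens across all tweet texts."""
    words = Counter()
    for t in tweets:
        words.update(TOKEN.findall(t["text"].lower()))
    return words.most_common(k)


tweets = [
    {"lang": "en", "text": "Data science is fun"},
    {"lang": "en", "text": "Data is everywhere"},
    {"lang": "es", "text": "Los datos"},
]
print(top_languages(tweets))   # [('en', 2), ('es', 1)]
print(top_words(tweets, k=2))  # [('data', 2), ('is', 2)]
```

Splitting on `\w+` treats an unsegmented run of Chinese or Japanese characters as a single token, so languages written without spaces would need a dedicated word segmenter before word counts are comparable across languages.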
2017
This paper presents the Nordic Tweet Stream, a cross-disciplinary digital humanities project that downloads Twitter messages from Denmark, Finland, Iceland, Norway and Sweden. The paper first introduces some of the technical aspects of creating a real-time monitor corpus that grows every day, and then presents two case studies that illustrate how the corpus could be used as empirical evidence in studies of the global spread of English. Our approach in the case studies is sociolinguistic: we are interested in how widespread English-involving multilingualism is in the region, and in what happens to ongoing grammatical change in digital environments. The results are based on 6.6 million tweets collected during the first four months of data streaming. They show that English was the most frequently used language, accounting for almost a third of the tweets. This indicates that Nordic Twitter users choose English as a means of reaching wider audiences. The preference for English is strongest in Denmark and weakest in Finland. Tweeting mostly occurs late in the evening, and high-profile media events such as the Eurovision Song Contest produce considerable peaks in Twitter activity. The prevalent use of informal features such as univerbated verb forms (e.g., gotta for (HAVE) got to) supports previous findings on the speech-like nature of written Twitter data, but the results indicate that tweeters are pushing the limits even further.
This paper examined the tweets of five prominent personalities in each of the following fields: education, entertainment, social life, politics, and personal level. It analyzed the tone as well as the typing styles embedded in the lexical, grammatical, and rhetorical features of the tweets. The content words, or lexical features, of English used in the five categories of tweets studied were nouns neutral for number, singular and plural nouns, and proper nouns; unmarked adverbs, adverb particles, and wh-adverbs; unmarked adjectives, comparatives, and superlatives; the base form, past form, -ing form, infinitive, past participle, and -s form of the verb "be"; the base form and infinitive of the verb "do"; the infinitive of the verb "have"; and the base form, past tense form, -ing form, infinitive, past participle, and -s form of lexical verbs. The majority of the Twitter users in the five categories used lexical verbs most, followed by nouns, adjectives, and adverbs. The dominant grammatical features of English used on Twitter are prepositions; indefinite, personal, reflexive, and wh-pronouns; auxiliary verbs; the base form, past form, -ing form, infinitive, past participle, and -s form of the verb "be"; the base form, past form, and infinitive of the verb "do"; the infinitive of the verb "have"; the base form, past tense form, -ing form, infinitive, past participle, and -s form of lexical verbs; conjunctions; articles; and interjections. Among the tweets analyzed, more posts used formal than informal language.
Twitter users used more emoticons than punctuation marks to express themselves. Moreover, the Twitter users analyzed expressed more positive than negative sentiment in their posts. Future researchers could expand this study by examining other grammatical features of Twitter English that may serve as a basis for instructional materials development.
Proceedings of the 26th ACM Conference on Hypertext & Social Media - HT '15, 2015
Using Twitter during academic conferences is a way of engaging and connecting an audience that is inherently multicultural by the nature of scientific collaboration. English is expected to be the lingua franca bridging communication and integration between native speakers of different mother tongues. However, little research has been done to support this assumption. In this paper, we analyzed how integrated language communities are by examining scholars' tweets posted during 26 Computer Science conferences over a span of five years. We found that although English is the most popular language for tweeting during conferences, a significant proportion of people also tweet in other languages. In addition, people who tweet solely in English interact mostly within the same group (English monolinguals), while people who speak other languages tend to interact more diversely with other language groups. Finally, we found that people who interact with other Twitter users (through mentions or replies) show a more diverse language distribution, while people who do not interact mostly post tweets in a single language. These results suggest that the number of languages a user speaks is related to how that user interacts, which can affect the interaction dynamics of online communities.
2015
A number of recent studies have addressed issues concerning Computer Mediated Communication (henceforth CMC), either providing classification schemes for the features of computer-mediated discourse and of the grammar of electronic language in particular (Herring 2007, 2012), focussing on the stylistic diversity of Internet language (Crystal 2006), also from a sociolinguistic perspective (Androutsopoulos 2011), or attempting a methodological reflection on the use of data in language-focused research on CMC (Androutsopoulos & Beißwenger 2008, Facchinetti 2013). However, the linguistic specificities of social networks have not been tackled in detail so far; this is particularly true of the micro-blogging platform Twitter.
2020
Defined as a form of tagging that allows social media users to embed metadata in their posts, hashtags initially served to categorize topics and make them searchable online. First appearing on Twitter in 2007, hashtags have spread to other platforms, such as Instagram, Facebook, and YouTube. In addition to functioning as topic markers, hashtags have developed more complex linguistic functions. The ubiquity of this feature in the online medium, which now occupies a significant portion of our everyday communication, thus makes it worthy of investigation. Although this topic has been researched in different disciplines, such as information diffusion, marketing, sociology, and public opinion, hashtags have not yet received enough attention in linguistic research. Using a sample of hashtags from a corpus of Instagram posts by Egyptian and Arab participants, this research examines the characteristics of hashtags from a linguistic perspective, with particular focus on hashtags in Arabic. The study primarily seeks to determine the morpho-syntactic features of these recently emerging linguistic items according to the taxonomy proposed by Caleffi (2015). It also explores the pragmatic functions of hashtags based on Zappavigna's (2015) view of hashtags as technologically discursive tools. The analysis shows that most of the hashtags in the data serve the experiential function and occur as suffixes. The findings reveal both similarities and differences between English and Arabic hashtags.
2011
Hashtags are used on Twitter to classify messages, propagate ideas, and promote specific topics and people. In this paper, we present a linguistically inspired study of how these tags are created, used, and disseminated by the members of information networks. We study the propagation of hashtags on Twitter based on models for analysing the spread of linguistic innovations in speech communities, that is, in groups of people whose members linguistically influence each other.
Int. J. Interact. Mob. Technol., 2021
The present article is concerned with identifying the linguistic and extralinguistic features of Instagram, Twitter, and Live Journal hypertexts depending on their functional focus. The topic is relevant because Internet communication requires more detailed study from the point of view of its functional and stylistic aspects. The study provides a comparative analysis of the Instagram, Twitter, and Live Journal online services based on Russian-language material. The results are correlated with questionnaire data on the problem under study. The article discusses graphic, lexical, stylistic, syntactic, and spelling features. The authors conducted a comparative analysis of the hypertexts of the Instagram, Twitter, and Live Journal online services in the context of their functions, identified linguistic and extralinguistic features of the hypertext of the services under study, and established the relationship between the language of the text and the function it implements. It has been...
Journal of Logical and Algebraic Methods in Programming, 2017
More than a personal microblogging site, Twitter has been transformed by common use into an information publishing venue that public figures, media channels, and ordinary people rely on daily for, e.g., news reporting and consumption, marketing, and social messaging. The use of Twitter in a cooperative and interactive setting calls for precise awareness of the dynamics regulating message spreading. In this paper, we describe Twitlang, a language for modelling the interactions among Twitter accounts. The associated operational semantics allows users to precisely determine the effects of their actions on Twitter, such as posting, replying to, or deleting tweets. The language is implemented in the form of a Maude interpreter, Twitlanger, which takes a language term as input and explores the computations arising from the term. By combining Twitlanger with the Maude model checker, it is possible to automatically verify communication properties of Twitter accounts. We illustrate the benefits of our executable formalisation by means of an application scenario inspired by real life. While the scenario highlights the benefits of adopting Twitter for cooperative use in everyday life, our analysis shows that appropriate settings are essential for proper use of the platform with respect to fulfilling the communication properties expected in collaborative and interactive contexts.
This paper presents a multilingual study on, per single post of microblog text, (a) how much can be said, (b) how much is written in terms of characters and bytes, and (c) how much is said in terms of information content in posts by different organizations in different languages. Focusing on three different languages (English, Chinese and Japanese), this research analyses Weibo and Twitter accounts of major embassies and news agencies. We first establish our criterion for quantifying "how much can be said" in a digital text based on the openly available Universal Declaration of Human Rights and the translated subtitles from TED talks. These parallel corpora allow us to determine the number of characters and bits needed to represent the same content in different languages and character encodings. We then derive the amount of information that is actually contained in microblog posts authored by selected accounts on Weibo and Twitter. Our results confirm that languages with larger character sets such as Chinese and Japanese contain more information per character than English, but the actual information content contained within a microblog text varies depending on both the type of organization and the language of the post. We conclude with a discussion on the design implications of microblog text limits for different languages.
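Point (b) rests on the difference between counting characters and counting encoded bytes. The sketch below illustrates that difference with Python's built-in `str.encode`; the phrases are short illustrative examples, not the Universal Declaration of Human Rights or TED material the paper uses, and the sketch does not attempt the paper's information-content estimate:

```python
def size_profile(text):
    """Return character and byte counts for a string under two encodings."""
    return {
        "chars": len(text),  # Unicode code points
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_bytes": len(text.encode("utf-16-le")),  # no byte-order mark
    }


samples = {
    "en": "free and equal",
    "zh": "自由平等",
    "ja": "自由で平等",
}
for lang, text in samples.items():
    print(lang, size_profile(text))
# en {'chars': 14, 'utf8_bytes': 14, 'utf16_bytes': 28}
# zh {'chars': 4, 'utf8_bytes': 12, 'utf16_bytes': 8}
# ja {'chars': 5, 'utf8_bytes': 15, 'utf16_bytes': 10}
```

Each of these CJK characters takes three bytes in UTF-8 but two in UTF-16, whereas ASCII letters take one and two respectively, so a limit counted in characters and a limit counted in bytes constrain the two scripts very differently.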
