How People use Twitter in Different Languages

Manos Tsagkias

2011, Text

Abstract

In this paper we describe how Twitter is used in various languages. We observe notable differences between languages regarding the use of hashtags, links, mentions, and conversations. We propose two dimensions that can be used to classify languages, each of which is likely to require different ways of analysis.
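
The kind of per-language comparison the abstract describes could be sketched roughly as follows. This is a minimal illustration, not the authors' method: the regular expressions, the `feature_rates` helper, and the rule that a tweet starting with `@` counts as part of a conversation are all assumptions made for the example, and the language label is assumed to come from some upstream language identifier.

```python
import re
from collections import defaultdict

# Simple, assumed patterns; real tweet tokenization is more involved.
HASHTAG = re.compile(r"#\w+")
MENTION = re.compile(r"@\w+")
LINK = re.compile(r"https?://\S+")

FEATURES = ("hashtag", "mention", "link", "reply")


def feature_rates(tweets):
    """Return, per language, the share of tweets containing each feature.

    `tweets` is an iterable of (language, text) pairs.
    """
    counts = defaultdict(lambda: dict.fromkeys(("tweets",) + FEATURES, 0))
    for lang, text in tweets:
        c = counts[lang]
        c["tweets"] += 1
        c["hashtag"] += bool(HASHTAG.search(text))
        c["mention"] += bool(MENTION.search(text))
        c["link"] += bool(LINK.search(text))
        # Assumption: a tweet that opens with a mention is a reply.
        c["reply"] += text.lstrip().startswith("@")
    return {
        lang: {f: c[f] / c["tweets"] for f in FEATURES}
        for lang, c in counts.items()
    }
```

Calling `feature_rates` on a list of labelled tweets yields one dictionary of shares per language, which can then be compared across languages.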

Shahab Saquib Sohail

arXiv (Cornell University), 2021

The need for a comprehensive study exploring various aspects of online social media has been raised by many researchers. This paper gives an insight into the social platform Twitter. In the present work, we illustrate a stepwise procedure for crawling the data and discuss the key issues in extracting associated features that can be useful in Twitter-related research when crawling these data from Application Programming Interfaces (APIs). Further, the data, comprising over 86 million tweets, have been analysed from various perspectives, including the most used languages, most frequent words, most frequent users, and countries with the most and least tweets and re-tweets. The analysis reveals that Twitter user data has a high affinity for research in various domains, including politics, social science, economics, and linguistics. In addition, a relation between the Twitter users of a country and its human development index has been identified. It is observed that countries with very high human development indices have a relatively higher number of tweets than countries with low human development indices. It is envisaged that the present study will open many doors for research in information processing and data science.

Utilizing Multilingual Language Data in (Nearly) Real Time: The Case of the Nordic Tweet Stream

Magnus Levin

2017

This paper presents the Nordic Tweet Stream, a cross-disciplinary digital humanities project that downloads Twitter messages from Denmark, Finland, Iceland, Norway and Sweden. The paper first introduces some of the technical aspects in creating a real-time monitor corpus that grows every day, and then two case studies illustrate how the corpus could be used as empirical evidence in studies focusing on the global spread of English. Our approach in the case studies is sociolinguistic, and we are interested in how widespread multilingualism which involves English is in the region, and what happens to ongoing grammatical change in digital environments. The results are based on 6.6 million tweets collected during the first four months of data streaming. They show that English was the most frequently used language, accounting for almost a third. This indicates that Nordic Twitter users choose English as a means of reaching wider audiences. The preference for English is the strongest in Denmark and the weakest in Finland. Tweeting mostly occurs late in the evening, and high-profile media events such as the Eurovision Song Contest produce considerable peaks in Twitter activity. The prevalent use of informal features such as univerbated verb forms (e.g., gotta for (HAVE) got to) supports previous findings of the speech-like nature of written Twitter data, but the results indicate that tweeters are pushing the limits even further.

Linguistic Features of English in Twitter

Cecille Jalbuena

This paper looked into the tweets of five prominent personalities in each of the following fields (education, entertainment, social life, politics and personal level) and analyzed the tone as well as the typing styles embedded in the lexical, grammatical and rhetorical features of the tweets. The content words or lexical features of English used in the five categories of tweets studied were neutral number nouns, singulars and plurals and proper nouns; unmarked adverbs, adverb particles and wh-adverbs; unmarked adjectives, comparatives and superlatives; the base form of the verb "be", past form of the verb "be", -ing form of the verb "be", infinitive of the verb "be", past participle of the verb "be", -s form of the verb "be", base form of the verb "do", infinitive of the verb "do", infinitive form of the verb "have", base form of the lexical verb, past tense form of the lexical verb, -ing form of the lexical verb, infinitive of the lexical verb, past participle form of the lexical verb and -s form of the lexical verb. The majority of the Twitter users from the five categories used lexical verbs followed by nouns, adjectives and adverbs in their tweets. The dominant grammatical features of English used in Twitter are prepositions; indefinite, personal, reflexive and wh-pronouns; auxiliary verbs, the base form of the verb "be", past form of the verb "be", -ing form of the verb "be", infinitive of the verb "be", past participle of the verb "be", -s form of the verb "be", base form of the verb "do", past form of the verb "do", infinitive of the verb "do", infinitive form of the verb "have", base form of the lexical verb, past tense form of the lexical verb, -ing form of the lexical verb, infinitive of the lexical verb, past participle form of the lexical verb and -s form of the lexical verb; conjunctions; articles and interjections. Among the tweets analyzed, more posts utilized formal rather than informal language. More emoticons than punctuation marks were used by Twitter users to express themselves. Moreover, the Twitter users analyzed had more positive than negative sentiments in their tweet posts. Future researchers can expand this study and look into the other grammatical features of Twitter English that may be a basis for instructional materials development.

Language, Twitter and Academic Conferences

Christoph Trattner

Proceedings of the 26th ACM Conference on Hypertext & Social Media - HT '15, 2015

Using Twitter during academic conferences is a way of engaging and connecting an audience inherently multicultural by the nature of scientific collaboration. English is expected to be the lingua franca bridging the communication and integration between native speakers of different mother tongues. However, little research has been done to support this assumption. In this paper we analyzed how integrated language communities are by analyzing the scholars' tweets used in 26 Computer Science conferences over a time span of five years. We found that although English is the most popular language used to tweet during conferences, a significant proportion of people also tweet in other languages. In addition, people who tweet solely in English interact mostly within the same group (English monolinguals), while people who speak other languages tend to show a more diverse interaction with other language groups. Finally, we also found that the people who interact with other Twitter users (by mentions or replies) show a more diverse language distribution, while people who do not interact mostly post tweets in a single language. These results suggest that the number of languages a user speaks can affect the interaction dynamics of online communities.

English in social media: A linguistic analysis of tweets

Roberta Facchinetti

2015

A number of recent studies have addressed issues concerning Computer Mediated Communication (henceforth CMC), either providing classification schemes for the features of computer-mediated discourse and of the grammar of electronic language in particular (Herring 2007, 2012) or focussing on the stylistic diversity of Internet language (Crystal 2006) – also from a sociolinguistic perspective (Androutsopoulos 2011) – or again attempting a methodological reflection on the use of data in language-focused research on CMC (Androutsopoulos & Beißwenger 2008, Facchinetti 2013). However, the linguistic specificities of social networks have not been tackled in detail so far; this is particularly the case of the micro-blogging platform Twitter.

The Linguistic Characteristics and Functions of Hashtags: #Is it a New Language

Arab World English Journal (AWEJ)

2020

Defined as a form of tagging that allows social media users to embed metadata in their posts, hashtags initially served to categorize topics and make them searchable online. Originating on Twitter in 2007, hashtags have spread to other platforms, such as Instagram, Facebook, and YouTube. In addition to functioning as topic markers, hashtags have developed more complex linguistic functions. The ubiquity of this feature in the online medium, which now occupies a significant portion of our everyday communication, is thus worthy of investigation. Although this topic has been researched in different disciplines, such as information diffusion, marketing, as well as sociology and public opinion, hashtags have not yet received enough attention from linguistic research. Using a sample of hashtags from a corpus of Instagram posts by Egyptian and Arab participants, this research aims to examine the characteristics of hashtags from a linguistic perspective, with particular focus on hashtags in the Arabic language. The study primarily seeks to determine the morpho-syntactic features of these recently emerging linguistic items according to the taxonomy proposed by Caleffi (2015). It also explores the pragmatic functions of hashtags based on Zappavigna's (2015) view of hashtags as technologically discursive tools. The analysis points out that most of the hashtags in the data serve the experiential function and come as suffixes. The findings reveal both similarities and differences between English and Arabic hashtags.

Analyzing the dynamic evolution of hashtags on twitter: a language-based approach

Virgilio Almeida

2011

Hashtags are used in Twitter to classify messages, propagate ideas and also to promote specific topics and people. In this paper, we present a linguistic-inspired study of how these tags are created, used and disseminated by the members of information networks. We study the propagation of hashtags in Twitter grounded on models for the analysis of the spread of linguistic innovations in speech communities, that is, in groups of people whose members linguistically influence each other.

Linguistic Analysis of Insta, Twit Posts and LJ Blogs in the Context of Their Functions (Based on the Russian Language)

Yuliya Kalugina

Int. J. Interact. Mob. Technol., 2021

The present article is concerned with identifying the linguistic and extralinguistic features of Instagram, Twitter, and Live Journal hypertexts, depending on their functional focus. The relevance of the topic is due to the need for a more detailed study of Internet communication from the point of view of functional and stylistic aspects. The study provides a comparative analysis of Instagram, Twitter, and Live Journal online services based on the Russian language material. The results are correlated with the questionnaire data on the studied problem. The article discusses graphic, lexical, stylistic, syntactic, and spelling features. The authors conducted a comparative analysis of the hypertexts of Instagram, Twitter, and Live Journal online services in the context of their functions; identified linguistic and extralinguistic features of the hypertext of the services under study; established the relationship between the language of the text and the function implemented. It has been...

A language-based approach to modelling and analysis of Twitter interactions

Alessandro Maggi

Journal of Logical and Algebraic Methods in Programming, 2017

More than a personal microblogging site, Twitter has been transformed by common use to an information publishing venue, which public characters, media channels and common people daily rely on for, e.g., news reporting and consumption, marketing, and social messaging. The use of Twitter in a cooperative and interactive setting calls for the precise awareness of the dynamics regulating message spreading. In this paper, we describe Twitlang, a language for modelling the interactions among Twitter accounts. The associated operational semantics allows users to precisely determine the effects of their actions on Twitter, such as post, reply-to or delete tweets. The language is implemented in the form of a Maude interpreter, Twitlanger, which takes a language term as an input and explores the computations arising from the term. By combining the strength of Twitlanger and the Maude model checker, it is possible to automatically verify communication properties of Twitter accounts. We illustrate the benefits of our executable formalisation by means of an application scenario inspired from real life. While the scenario highlights the benefits of adopting Twitter for a cooperative use in the everyday life, our analysis shows that appropriate settings are essential for a proper usage of the platform, in respect of fulfilling those communication properties expected within collaborative and interactive contexts.

How much is said in a microblog? A multilingual inquiry based on Weibo and Twitter

Han-teng Liao

This paper presents a multilingual study on, per single post of microblog text, (a) how much can be said, (b) how much is written in terms of characters and bytes, and (c) how much is said in terms of information content in posts by different organizations in different languages. Focusing on three different languages (English, Chinese and Japanese), this research analyses Weibo and Twitter accounts of major embassies and news agencies. We first establish our criterion for quantifying "how much can be said" in a digital text based on the openly available Universal Declaration of Human Rights and the translated subtitles from TED talks. These parallel corpora allow us to determine the number of characters and bits needed to represent the same content in different languages and character encodings. We then derive the amount of information that is actually contained in microblog posts authored by selected accounts on Weibo and Twitter. Our results confirm that languages with larger character sets such as Chinese and Japanese contain more information per character than English, but the actual information content contained within a microblog text varies depending on both the type of organization and the language of the post. We conclude with a discussion on the design implications of microblog text limits for different languages.
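
To make the distinction between characters and bytes concrete, the following is a minimal sketch of how the size of the same content could be measured under different encodings. The `size_profile` helper and the sample strings are illustrative assumptions and are not taken from the paper's corpora; note that Python's `len` counts Unicode code points, which is only one possible definition of "character".

```python
def size_profile(text):
    """Report the length of `text` in code points and in encoded bytes."""
    return {
        "characters": len(text),                       # Unicode code points
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_bytes": len(text.encode("utf-16-le")),  # no byte-order mark
    }


# Illustrative renderings of the same phrase (assumed translations).
samples = {
    "English": "human rights",
    "Chinese": "人权",
    "Japanese": "人権",
}

for language, phrase in samples.items():
    print(language, size_profile(phrase))
```

In this small example the Chinese and Japanese phrases need far fewer characters than the English one, while the gap in encoded bytes is smaller, which is the kind of difference a per-post character limit interacts with.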

Nina Asmus

2019

This is an Open Access article distributed under the terms of the Creative Commons Attribution-Noncommercial 4.0 Unported License, permitting all non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

An Insight Into Twitter: A Corpus Based

Irina Argüelles-Álvarez

Revista de Lingüística y Lenguas Aplicadas, 2012

The aim of this paper is to study the use of Spanish and English in the micro-blogging social network Twitter from a contrastive point of view. A quantitative research methodology is applied in order, firstly, to identify specific common characteristics of language, organization and content in the medium and, secondly, to find possible differences in the use of a particular language. To carry out the experiment, two corpora were constructed using language data from Twitter, one in Spanish with a total of 4,027,746 words and another with similar characteristics in English with a total of 4,655,992 words. From the results obtained, the conclusion is that there are a number of very general discourse and organizational features common to the two corpora under study. It is also concluded that there are some particular characteristics which differentiate the use of English and Spanish in the medium.

Dude, srsly?: The Surprisingly Formal Nature of Twitter's Language

Sahib Mukhija

Twitter has become the de facto information sharing and communication platform. Given the factors that influence language on Twitter – size limitation as well as communication and content-sharing mechanisms – there is a continuing debate about the position of Twitter's language in the spectrum of language on various established mediums. These include SMS and chat on the one hand (size limitations) and email (communication), blogs and newspapers (content sharing) on the other. To provide a way of determining this, we propose a computational framework that offers insights into the linguistic style of all these mediums. Our framework consists of two parts. The first part builds upon a set of linguistic features to quantify the language of a given medium. The second part introduces a flexible factorization framework, SOCLIN, which conducts a psycholinguistic analysis of a given medium with the help of an external cognitive and affective knowledge base. Applying this analytical framework to various corpora from several major mediums, we gather statistics in order to compare the linguistics of Twitter with these other mediums via a quantitative comparative study. We present several key insights: (1) Twitter's language is surprisingly more conservative, and less informal than SMS and online chat; (2) Twitter users appear to be developing linguistically unique styles; (3) Twitter's usage of temporal references is similar to SMS and chat; and (4) Twitter has less variation of affect than other more formal mediums. The language of Twitter can thus be seen as a projection of a more formal register into a size-restricted space.

How to analyze the language of social networking sites

Michael Szurawitzki

The Twitter of Babel: Mapping World Languages through Microblogging Platforms

Bruno Goncalves

2012

Large scale analysis and statistics of socio-technical systems that just a few short years ago would have required the use of consistent economic and human resources can nowadays be conveniently performed by mining the enormous amount of digital data produced by human activities. Although a characterization of several aspects of our societies is emerging from the data revolution, a number of questions concerning the reliability and the biases inherent to the big data "proxies" of social life are still open.

I Tweet, Therefore I Am: Indexing Identity through Language Choice on Twitter Posts of @AsliSemarang Account

Nina Setyaningsih

As a trending means of communication, the Internet has a major influence on language use. Internet communication forms such as e-mails, instant messaging, blogging, and social networking sites are enriching the way people communicate. However, the dominance of English as an international language in those media can pose a serious threat to local languages, including Semarangan Javanese. The choice of language in social networking sites can indicate the status of the language in the community. Furthermore, the observation of this kind of medium can be related to the phenomenon of indexicality (Taylor-Leech, 2012). According to Bucholtz & Hall (2004), indexicality points to the social phenomena behind language choice. This paper outlines identity as produced in linguistic interaction, particularly on Twitter. It investigates how the language used on @AsliSemarang Twitter posts indexes a local identity, especially that of Semarang, Central Java. To achieve this aim, 40 tweets posted by administrators and followers of @AsliSemarang were examined. The result shows that the indices are realized in hashtags with typical Semarangan Javanese words, address terms, interjections, particles, code alternation, and other Semarangan Javanese lexicons. It can be concluded that these features index the identity of Semarangan Javanese speakers. The research also suggests that social networking media can be a valuable tool to maintain local identity, especially a language and its varieties, thus preserving the richness of Indonesian culture.

TLA: Twitter Linguistic Analysis

Tushar Sarkar

2021

Linguistics has been instrumental in developing a deeper understanding of human nature. Words are indispensable to bequeath the thoughts, emotions, and purpose of any human interaction, and critically analyzing these words can elucidate the social and psychological behavior and characteristics of these social animals. Social media has become a platform for human interaction on a large scale and thus gives us scope for collecting and using that data for our study. However, the process of iteratively collecting, labeling, and analyzing this data makes the entire procedure cumbersome. To make this process easier and more structured, we would like to introduce TLA (Twitter Linguistic Analysis). In this paper, we describe TLA and provide a basic understanding of the framework and discuss the process of collecting, labeling, and analyzing data from Twitter for a corpus of languages while providing detailed labeled datasets for all the languages and the models are trained on thes...

Different Language Usage on Social Media

Ijaems Journal

The research presents the effects of social media on the formation of new words used by social media users, words that often enter the formal use of language in the academe. The emergence of different social networking sites (SNSs) such as Facebook, Twitter, and e-mail has driven a more advanced change in the way people communicate. The study aimed to assess how social media affects the formal English language used in the academe. The results also highlight how often social media users of the Central Luzon State University, College of Education use proper abbreviations, exclamatory spelling of emoticons, letter homophones, acronyms, misspelled words, shortened words, numbers to represent words, and combinations of two different languages in their papers. The study used a qualitative research method with a survey technique for gathering data, and questionnaires served as the instrument for collecting data. The respondents were 50 students of the College of Education, English majors of the said university.

Twitter in Foreign Language Classes

Lara Lomicka

Handbook of Research on Learning Outcomes and Opportunities in the Digital Age

This chapter looks at the potential of the microblogging tool Twitter as a multifaceted resource for foreign language learners and educators. It highlights how this microblogging and social networking service provides authentic settings that are both dynamic and communicative, and which facilitate the cultural enrichment of first-year French learners, by enhancing their socio-pragmatic awareness and developing their multiliteracy skills in a second language. We argue for the importance of making students aware of this linguistic culture from an early stage of their language studies. This invisible second language culture is rarely discussed in traditional classrooms and only sporadically presented in foreign language textbooks; however, it can easily be experienced in digital environments like Twitter, making it an ideal context for such exposure. Our results suggest that the incorporation of linguistic cultural elements is indispensable to the development of intercultural communic...
