Papers by Georgios Smaragdakis

arXiv (Cornell University), May 10, 2022
Ransomware attacks are among the most severe cyber threats. They have made headlines in recent years by threatening the operation of governments, critical infrastructure, and corporations. Collecting and analyzing ransomware data is an important step towards understanding the spread of ransomware and designing effective defense and mitigation mechanisms. We report on our experience operating Ransomwhere, an open crowdsourced ransomware payment tracker that collects information from victims of ransomware attacks. With Ransomwhere, we have gathered 13.5k ransom payments to more than 87 ransomware criminal actors, with total payments of more than $101 million. Leveraging the transparent nature of Bitcoin, the cryptocurrency used for most ransomware payments, we characterize the evolving ransomware criminal structure and ransom laundering strategies. Our analysis shows that there are two parallel ransomware criminal markets: commodity ransomware and Ransomware as a Service (RaaS). We notice that there are striking differences between the two markets in the way that cryptocurrency resources are utilized, revenue per transaction, and ransom laundering efficiency. Although it is relatively easy to identify choke points in commodity ransomware payment activity, it is more difficult to do the same for RaaS.

Other governments, including the United Kingdom [37], Australia, Canada, and law enforcement agencies, such as the FBI and Europol [11], have also launched similar programs to defend against ransomware and offer help to victims. To the criminal actors behind these attacks, the resulting disruption is just 'collateral damage'. A handful of groups and individuals, with names such as NetWalker, Conti, REvil, and DarkSide, have received tens of millions of US dollars as ransom. But this is just the top of the food chain in an ecosystem with many grey areas, especially when it comes to laundering illicit proceeds. In this article, we provide a closer look at the ecosystem behind many of the attacks plaguing businesses and societies, known as Ransomware as a Service (RaaS). Cryptocurrencies remain the payment method of choice for criminal ransomware actors. While many cryptocurrencies exist, Bitcoin is preferred due to its network effects, resulting in wide exchange options. Bitcoin's sound monetary features as a medium of exchange, unit of account, and store of value make it as attractive to criminals as it is to regular citizens. According to the U.S. Department of the Treasury, based on data from the first half of 2021, the "vast majority" of reported ransomware payments were made in Bitcoin. Law enforcement agencies have started to disrupt ransomware actors by obtaining personal information of users from Bitcoin exchange platforms. This is realized through anti-money laundering regulations such as Know Your Customer (KYC), which require legal identity verification during registration with the service. While cryptocurrencies such as Bitcoin are enablers of ransomware, blockchain technology also offers unprecedented opportunities for forensic analysis and intelligence gathering. Using our crowdsourced ransomware payment tracker, Ransomwhere, we compile a dataset of 7,321 Bitcoin addresses that received ransom payments, based on which we shed light on the structure and state of the ransomware ecosystem.
Our contributions are as follows:
• We collect and analyze the largest public dataset of ransomware activity to date, which includes 13,497 ransom payments to 87 criminal actors over the last five years, worth more than 101 million USD.
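
As a rough illustration of the kind of aggregation behind these numbers, the sketch below groups crowdsourced payment records by ransomware family and sums the reported amounts. The record layout, field names, and values are hypothetical, not the actual Ransomwhere export format.

```python
from collections import defaultdict

# Hypothetical payment records; the real Ransomwhere export has its own schema.
payments = [
    {"family": "Conti",     "address": "bc1q-example-a", "amount_usd": 250_000.0},
    {"family": "Conti",     "address": "bc1q-example-b", "amount_usd": 1_100_000.0},
    {"family": "NetWalker", "address": "1Nw-example-c",  "amount_usd": 460_000.0},
]

totals = defaultdict(lambda: {"payments": 0, "usd": 0.0, "addresses": set()})
for p in payments:
    entry = totals[p["family"]]
    entry["payments"] += 1
    entry["usd"] += p["amount_usd"]
    entry["addresses"].add(p["address"])

# Rank criminal actors by total ransom received.
for family, stats in sorted(totals.items(), key=lambda kv: -kv[1]["usd"]):
    print(f"{family}: {stats['payments']} payments, "
          f"{len(stats['addresses'])} addresses, ${stats['usd']:,.0f}")
```
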
During the COVID-19 pandemic, the Internet has been of essential importance as the communication infrastructure that keeps many areas of society running, for example home schooling or working from home. As much as the pandemic has changed our social life, it has also changed Internet traffic, and the sudden surge in Internet usage raises fundamental questions about the robustness of the Internet infrastructure. In this study, we examine the changes in Internet traffic during the first wave of the COVID-19 pandemic.

arXiv (Cornell University), Feb 1, 2023
During the first days of the 2022 Russian invasion of Ukraine, Russia's media regulator blocked access to many global social media platforms and news sites, including Twitter, Facebook, and the BBC. To bypass the information controls set by Russian authorities, pro-Ukrainian groups explored unconventional ways to reach out to the Russian population, such as posting war-related content in the user reviews of Russian businesses available on Google Maps or Tripadvisor. This paper provides a first analysis of this new phenomenon, examining the creative strategies used to avoid state censorship. Specifically, we analyze reviews posted on these platforms from the beginning of the conflict to September 2022. We measure the channeling of war messages through user reviews in Tripadvisor and Google Maps, as well as in VK, a popular Russian social network. Our analysis of the content posted on these services reveals that users leveraged these platforms to seek and exchange humanitarian and travel advice, but also to disseminate disinformation and polarized messages. Finally, we analyze the platforms' response in terms of content moderation and its impact.

IEEE Access, 2023
The Dark Web, primarily Tor, has evolved to protect user privacy and freedom of speech through anonymous routing. However, Tor also facilitates cybercriminal actors who utilize it for illicit activities. Quantifying the size and nature of such activity is challenging, as Tor complicates indexing by design. This paper proposes a methodology to estimate both the size and the nature of illicit commercial activity on the Dark Web. We demonstrate it by crawling Tor for single-vendor Dark Web Shops, i.e., niche storefronts operated by single cybercriminal actors or small groups. Based on data collected from Tor, we show that in 2021 alone, Dark Web Shops generated at least 113 million USD in revenue. Sexual abuse is the top illicit revenue category, followed, at a considerable distance, by financial crime. We also compare Dark Web Shops' activity with a large Dark Web Marketplace, showing that these are parallel economies. Our methodology contributes towards automated analysis of illicit activity in Tor. Furthermore, our analysis sheds light on the evolving Dark Web Shop ecosystem and provides insights into evidence-based policymaking regarding criminal Dark Web activity.
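
A minimal sketch of the revenue-estimation step described above, assuming the crawler has already extracted each shop's category and the total incoming value observed on its payment addresses. Shop names, category labels, and the exchange rate are illustrative assumptions, not data from the paper.

```python
# Illustrative shop records; a real pipeline would derive these from crawled pages
# and blockchain lookups of each shop's payment addresses.
shops = [
    {"name": "shop-a", "category": "financial crime", "incoming_btc": 12.4},
    {"name": "shop-b", "category": "counterfeits",    "incoming_btc": 3.1},
]
BTC_USD = 47_000  # assumed average exchange rate over the measurement period

revenue_by_category = {}
for shop in shops:
    usd = shop["incoming_btc"] * BTC_USD
    revenue_by_category[shop["category"]] = revenue_by_category.get(shop["category"], 0.0) + usd

# Rank illicit revenue categories by estimated revenue.
for category, usd in sorted(revenue_by_category.items(), key=lambda kv: -kv[1]):
    print(f"{category}: ${usd:,.0f}")
```
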

In this paper, we show that adoption of the SNMPv3 network management protocol standard offers a unique, but likely unintended, opportunity for remotely fingerprinting network infrastructure in the wild. Specifically, by sending unsolicited and unauthenticated SNMPv3 requests, we obtain detailed information about the configuration and status of network devices including vendor, uptime, and the number of restarts. More importantly, the reply contains a persistent and strong identifier that allows for lightweight Internet-scale alias resolution and dual-stack association. By launching active Internet-wide SNMPv3 scan campaigns, we show that our technique can fingerprint more than 4.6 million devices of which around 350k are network routers. Not only is our technique lightweight and accurate, it is complementary to existing alias resolution, dual-stack inference, and device fingerprinting approaches. Our analysis not only provides fresh insights into the router deployment strategies of network operators worldwide, but also highlights potential vulnerabilities of SNMPv3 as currently deployed.
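
The persistent identifier mentioned above is the SNMP engine ID carried in the unauthenticated report. Below is a small sketch of decoding it according to the RFC 3411 layout (a 4-byte IANA enterprise number with the high bit set, a format byte, then a vendor-chosen identifier); the example value is made up.

```python
# Sketch: decode an snmpEngineID returned in an (unauthenticated) SNMPv3 report PDU.
ENGINE_ID_FORMATS = {1: "IPv4", 2: "IPv6", 3: "MAC", 4: "text", 5: "octets"}

def parse_engine_id(engine_id: bytes) -> dict:
    if len(engine_id) < 5:
        raise ValueError("engine ID too short")
    enterprise = int.from_bytes(engine_id[:4], "big") & 0x7FFFFFFF
    fmt = engine_id[4]
    return {
        "enterprise_number": enterprise,            # maps to the vendor (IANA PEN)
        "format": ENGINE_ID_FORMATS.get(fmt, f"vendor-specific ({fmt})"),
        "identifier": engine_id[5:].hex(),          # persistent per-device value
    }

# Example: a hypothetical engine ID with enterprise number 9 and a MAC-based identifier.
print(parse_engine_id(bytes.fromhex("80000009" + "03" + "0050569a1b2c")))
```
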

Consumer Internet of Things (IoT) devices are extremely popular, providing users with rich and diverse functionalities, from voice assistants to home appliances. These functionalities often come with significant privacy and security risks, with notable recent large-scale coordinated global attacks disrupting large service providers. Thus, an important first step to address these risks is to know what IoT devices are where in a network. While some limited solutions exist, a key question is whether device discovery can be done by Internet service providers that only see sampled flow statistics. In particular, it is challenging for an ISP to efficiently and effectively track and trace activity from IoT devices deployed by its millions of subscribers, all with sampled network data. In this paper, we develop and evaluate a scalable methodology to accurately detect and monitor IoT devices at subscriber lines with limited, highly sampled data in-the-wild. Our findings indicate that millions of IoT devices are detectable and identifiable within hours, both at a major ISP as well as an IXP, using passive, sparsely sampled network flow headers. Our methodology is able to detect devices from more than 77% of the studied IoT manufacturers, including popular devices such as smart speakers. While our methodology is effective for providing network analytics, it also highlights significant privacy consequences.
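
A much-simplified sketch of the detection idea: match the remote endpoints seen in sampled flow records of a subscriber line against per-manufacturer backend endpoint signatures. The signature table and flow records here are invented placeholders; the paper derives the real signatures from ground-truth device traffic.

```python
import ipaddress

# Illustrative backend signatures: (destination network, destination port) per vendor.
BACKEND_SIGNATURES = {
    "smart_speaker_vendor_A": {(ipaddress.ip_network("203.0.113.0/24"), 443)},
    "camera_vendor_B":        {(ipaddress.ip_network("198.51.100.0/25"), 8883)},
}

def detect_devices(sampled_flows):
    """sampled_flows: iterable of (remote_ip, remote_port) seen at one subscriber line."""
    detected = set()
    for ip_str, port in sampled_flows:
        ip = ipaddress.ip_address(ip_str)
        for vendor, endpoints in BACKEND_SIGNATURES.items():
            if any(ip in net and port == p for net, p in endpoints):
                detected.add(vendor)
    return detected

print(detect_devices([("203.0.113.7", 443), ("192.0.2.1", 80)]))

```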
We present PaDIS Emulator, a fully automated platform to evaluate CDN-ISP collaboration for better content delivery, traffic engineering, and cost reduction. The PaDIS Emulator enables researchers as well as CDN and ISP operators to evaluate the benefits of collaboration using their own operational networks, configuration, and cost functions. The PaDIS Emulator consists of three components: the network emulation, the collaboration mechanism, and the performance monitor. These layers provide scalable emulation of the interaction of an ISP, or a number of ISPs, with multiple CDNs, and vice versa. The design of the PaDIS Emulator is flexible enough to implement a wide range of collaboration mechanisms on virtualized or real hardware and to evaluate them before their introduction to operational networks.

Due to the COVID-19 pandemic, many governments imposed lockdowns that forced hundreds of millions of citizens to stay at home. The implementation of confinement measures increased Internet traffic demands of residential users, in particular, for remote working, entertainment, commerce, and education, which, as a result, caused traffic shifts in the Internet core. In this paper, using data from a diverse set of vantage points (one ISP, three IXPs, and one metropolitan educational network), we examine the effect of these lockdowns on traffic shifts. We find that the traffic volume increased by 15-20% almost within a week. While overall still modest, this constitutes a large increase within this short time period. However, despite this surge, we observe that the Internet infrastructure is able to handle the new volume, as most traffic shifts occur outside of traditional peak hours. When looking directly at the traffic sources, it turns out that, while hypergiants still contribute a significant fraction of traffic, we see (1) a higher increase in traffic of non-hypergiants, and (2) traffic increases in applications that people use when at home, such as Web conferencing, VPN, and gaming. While many networks see increased traffic demands, in particular, those providing services to residential users, academic networks experience major overall decreases. Yet, in these networks, we can observe substantial increases when considering applications associated with remote working and lecturing.

Today, there is an increasing number of peering agreements between Hypergiants and networks that benefit millions of end-users. However, the majority of Autonomous Systems do not currently enjoy the benefit of interconnecting directly with Hypergiants to optimally select the path for delivering Hypergiant traffic to their users. In this paper, we develop and evaluate an architecture that can help this long tail of networks. With our proposed architecture, a network establishes an out-of-band communication channel with Hypergiants that can be two or more AS hops away and, optionally, with the transit provider. This channel enables the exchange of network information to better assign requests of end-users to appropriate Hypergiant servers. Our analysis using operational data shows that our architecture can optimize, on average, 15% of Hypergiants' traffic and 11% of the overall traffic of networks that do not interconnect with Hypergiants. The gains are even higher during peak hours when available capacity can be scarce, up to 46% for some Hypergiants.

arXiv (Cornell University), Oct 7, 2021
BGP communities are a popular mechanism used by network operators for traffic engineering, blackholing, and to realize network policies and business strategies. In recent years, many research works have contributed to our understanding of how BGP communities are utilized, as well as how they can reveal secondary insights into real-world events such as outages and security attacks. However, one fundamental question remains unanswered: "Which ASes tag announcements with BGP communities and which remove communities in the announcements they receive?" A grounded understanding of where BGP communities are added or removed can help better model and predict BGP-based actions in the Internet and characterize the strategies of network operators. In this paper, we develop, validate, and share data from the first algorithm that can infer BGP community tagging and cleaning behavior at the AS level. The algorithm is entirely passive and uses BGP update messages and snapshots, e.g., from public route collectors, as input. First, we quantify the correctness and accuracy of the algorithm in controlled experiments with simulated topologies. To validate in the wild, we announce prefixes with communities and confirm that more than 90% of the ASes that we classify behave as our algorithm predicts. Finally, we apply the algorithm to data from four sets of BGP collectors: RIPE, RouteViews, Isolario, and PCH. Tuned conservatively, our algorithm ascribes community tagging and cleaning behaviors to more than 13k ASes, the majority of which are large networks and providers. We make our algorithm and inferences available as a public resource to the BGP research community.
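
A heavily simplified sketch of the tagging-side intuition: BGP communities are encoded as "ASN:value", so a community whose ASN part matches an AS on the announcement's path is a candidate for having been added by that AS. The actual algorithm additionally compares views across collector peers to infer cleaning behavior; this only illustrates the attribution step, with made-up ASNs.

```python
def attribute_communities(as_path, communities):
    """as_path: list of ASNs; communities: list of 'ASN:value' strings.

    Returns a mapping from on-path ASNs to the communities they likely tagged.
    """
    on_path = set(as_path)
    tagging = {}
    for comm in communities:
        asn = int(comm.split(":")[0])
        if asn in on_path:
            tagging.setdefault(asn, []).append(comm)
    return tagging

# Example: one community matches an AS on the path, the other does not.
print(attribute_communities([64500, 64496, 64511], ["64496:100", "65000:42"]))
```
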

Communications of The ACM, Jul 25, 2023
A data-driven, follow-the-money approach to characterize the ransomware ecosystem uncovers two parallel ransomware criminal markets: commodity ransomware and Ransomware as a Service (RaaS). Ransomware, a form of malware designed to encrypt a victim's files and make them unusable without payment, has quickly become a threat to the functioning of many institutions and corporations around the globe. In 2021 alone, ransomware caused major hospital disruptions in Ireland [18], empty supermarket shelves in the Netherlands [2], the closing of 800 supermarkets in Sweden, and gasoline shortages in the U.S. In a recent report, the European Union Agency for Cybersecurity (ENISA) ranked ransomware as the "prime threat for 2020-2021." The U.S. government reacted to high-profile attacks against U.S. industries by declaring ransomware a national security threat and announcing a "coordinated campaign to counter ransomware" [1]. Other governments, including the U.K. [25], Australia [28], and Canada [29], and law enforcement agencies, such as the FBI [31] and Europol [32], have launched similar programs to defend against ransomware and offer help to victims. To the criminal actors behind these attacks, the resulting disruption is just 'collateral damage.' A handful of groups and individuals, with names such as NetWalker, Conti, REvil, and DarkSide, have received tens of millions of dollars as ransom. But this is just the top of the food chain in an ecosystem with many predators and prey, especially when it comes to laundering illicit proceeds. In this article, we provide a closer look at the ecosystem behind many of the attacks plaguing businesses and societies, known as Ransomware as a Service (RaaS). Cryptocurrency remains the payment method of choice for criminal ransomware actors. While many cryptocurrencies exist, Bitcoin is preferred due to its network effects, resulting in wide exchange options. Bitcoin's sound monetary features as a medium of exchange, unit of account, and store of value make it as attractive to criminals as it is to regular citizens. According to the U.S. Department of the Treasury, based on data from 2021, the "vast majority" of reported ransomware payments were made in Bitcoin.

Key insights from "A Tale of Two Markets: Investigating the Ransomware Payments Economy":
• This research effort collects and analyzes the largest public dataset of ransomware activity to date, which includes 13,497 ransom payments to 87 criminal actors over the last five years, worth more than $101 million.
• Analysis of the evolving ransomware ecosystem shows that there are two parallel ransomware markets: commodity and RaaS.
• Analysis of more than 13,000 transfers shows striking differences in laundering time, use of exchanges, and other means to cash out ransom payments.
• Defending against professionally operated RaaS is challenging; the authors propose ways to trace back RaaS cryptocurrency activity.

Delay tolerant bulk data transfers on the internet
Performance evaluation review, Jun 15, 2009
Many emerging scientific and industrial applications require transferring multiple Tbytes of data on a daily basis. Examples include pushing scientific data from particle accelerators/colliders to laboratories around the world, synchronizing data-centers across continents, and replicating collections of high definition videos from events taking place at different time-zones. A key property of all the above applications is their ability to tolerate delivery delays ranging from a few hours to a few days. Such Delay Tolerant Bulk (DTB) data are currently being serviced mostly by the postal system using hard drives and DVDs, or by expensive dedicated networks. In this work we propose transmitting such data through commercial ISPs by taking advantage of already-paid-for off-peak bandwidth resulting from diurnal traffic patterns and percentile pricing. We show that between sender-receiver pairs with small time-zone difference, simple source scheduling policies are able to take advantage of most of the existing off-peak capacity. When the time-zone difference increases, taking advantage of the full capacity requires performing store-and-forward through intermediate storage nodes. We present an extensive evaluation of the two options based on traffic data from 200+ links of a large transit provider with PoPs on three continents. Our results indicate that there exists huge potential for performing multi-Tbyte transfers on a daily basis at little or no additional cost.
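
A small numeric sketch of the "already-paid-for" capacity argument under 95th-percentile billing: in any five-minute slot where utilization is below the billing percentile, the headroom can carry delay-tolerant bulk data at no extra transit cost. The sample values and the crude rank-based percentile estimate are illustrative only.

```python
# Illustrative link utilization samples (Mbps), one per 5-minute billing slot.
samples = [820, 640, 450, 300, 280, 310, 500, 760, 900, 950, 970, 880]

def percentile_95(values):
    ordered = sorted(values)
    index = max(0, int(0.95 * len(ordered)) - 1)   # simple rank-based estimate
    return ordered[index]

p95 = percentile_95(samples)
free_capacity = [max(0, p95 - v) for v in samples]
print(f"95th percentile (already billed): {p95} Mbps")
print("free off-peak headroom per slot:", free_capacity)
```
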

Internet of Things (IoT) devices are becoming increasingly ubiquitous, e.g., at home, in enterprise environments, and in production lines. To support the advanced functionalities of IoT devices, IoT vendors as well as service and cloud companies operate IoT backends, the focus of this paper. We propose a methodology to identify and locate them by (a) compiling a list of domains used exclusively by major IoT backend providers and (b) then identifying their server IP addresses. We rely on multiple sources, including IoT backend provider documentation, passive DNS data, and active scanning. For analyzing IoT traffic patterns, we rely on passive network flows from a major European ISP. Our analysis focuses on the top IoT backends and unveils diverse operational strategies, from operating their own infrastructure to utilizing the public cloud. We find that the majority of the top IoT backend providers are located in multiple locations and countries. Still, a handful are located only in one country, which could raise regulatory scrutiny as the client IoT devices are located in other regions. Indeed, our analysis shows that up to 35% of IoT traffic is exchanged with IoT backend servers located in other continents. We also find that at least six of the top IoT backends rely on other IoT backend providers. Finally, we evaluate whether cascading effects among the IoT backend providers are possible in the event of an outage, a misconfiguration, or an attack.
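
A sketch of the identification step, assuming a list of provider-exclusive domains is already available: resolve each domain and collect the server IP addresses. The domains below are placeholders that will not resolve, and the geolocation step (which would need a geo-IP database) is omitted.

```python
import socket

# Illustrative provider-to-domain mapping; not real IoT backend domains.
BACKEND_DOMAINS = {
    "example-iot-vendor": ["mqtt.backend.example.com", "api.backend.example.com"],
}

def resolve_backend_ips(domains):
    """Resolve each backend domain and collect the server IPs it maps to."""
    ips = set()
    for domain in domains:
        try:
            for info in socket.getaddrinfo(domain, 443, proto=socket.IPPROTO_TCP):
                ips.add(info[4][0])
        except socket.gaierror:
            pass  # placeholder domains, or ones unreachable from this vantage point
    return ips

for provider, domains in BACKEND_DOMAINS.items():
    print(provider, resolve_backend_ips(domains))
```
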

Amplification Distributed Denial of Service (DDoS) attacks' traffic and harm are at an all-time high. To defend against such attacks, distributed attack mitigation platforms, such as traffic scrubbing centers that operate in peering locations, e.g., Internet Exchange Points (IXPs), have been deployed in the Internet over the years. These attack mitigation platforms apply sophisticated techniques to detect attacks and drop attack traffic locally, and thus act as sensors for attacks. However, it has not yet been systematically evaluated and reported to what extent coordination of these views by different platforms can lead to more effective mitigation of amplification DDoS attacks. In this paper, we ask the question: "Is it possible to mitigate more amplification attacks and drop more attack traffic when distributed attack mitigation platforms collaborate?" To answer this question, we collaborate with eleven IXPs that operate in three different regions. These IXPs have more than 2,120 network members that exchange traffic at a rate of more than 11 Terabits per second. We collect network data over six months and analyze more than 120k amplification DDoS attacks. To our surprise, more than 80% of the amplification DDoS attacks are not detected locally, although the majority of the attacks are visible to at least three IXPs. A closer investigation points to shortcomings such as the multi-protocol profile of modern amplification attacks, the duration of the attacks, and the difficulty of setting appropriate local attack traffic thresholds that will trigger mitigation. To overcome these limitations, we design and evaluate a collaborative architecture that allows participant mitigation platforms to exchange information about ongoing amplification attacks. Our evaluation shows that it is possible to collaboratively detect and mitigate the majority of attacks with limited exchange of information and drop as much as 90% more attack traffic locally.
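
A toy sketch of the collaborative detection idea: each platform shares (victim, rate) observations for suspected amplification traffic, and an attack that stays below every local threshold can still be flagged on the combined view. The thresholds and report values are invented for illustration.

```python
from collections import defaultdict

LOCAL_THRESHOLD_MBPS = 1000      # per-platform trigger (illustrative)
GLOBAL_THRESHOLD_MBPS = 1500     # trigger on the combined view (illustrative)
MIN_PLATFORMS = 3                # or: seen by at least this many platforms

reports = [  # (platform, victim_ip, rate_mbps) shared by each mitigation platform
    ("IXP-A", "192.0.2.10", 600),
    ("IXP-B", "192.0.2.10", 550),
    ("IXP-C", "192.0.2.10", 500),
]

combined = defaultdict(lambda: {"rate": 0, "platforms": set()})
for platform, victim, rate in reports:
    combined[victim]["rate"] += rate
    combined[victim]["platforms"].add(platform)

for victim, view in combined.items():
    local_hit = any(r >= LOCAL_THRESHOLD_MBPS for _, v, r in reports if v == victim)
    collaborative_hit = (view["rate"] >= GLOBAL_THRESHOLD_MBPS
                         or len(view["platforms"]) >= MIN_PLATFORMS)
    print(victim, "local detection:", local_hit, "collaborative detection:", collaborative_hit)

```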
With the increasing use of the cloud for access to fast and well-connected computational power, cloud outages have become a growing risk for businesses and individuals alike. We derive a method to analyze publicly available BGP data to measure the visibility of cloud providers' outages on the Internet control plane. We then utilize this method to analyze an outage of Cloudflare, a large DNS and content provider. The study of Cloudflare's outage shows that visible traces can be found in BGP, enabling data-driven outage studies.

arXiv (Cornell University), Oct 2, 2020
BGP communities are widely used to tag prefix aggregates for policy, traffic engineering, and inter-AS signaling. Because individual ASes define their own community semantics, many ASes blindly propagate communities they do not recognize. Prior research has shown the potential security vulnerabilities when communities are not filtered. This work sheds light on a second unintended side-effect of communities and permissive propagation: an increase in unnecessary BGP routing messages. Due to its transitive property, a change in the community attribute induces update messages throughout established routes, just updating communities. We ground our work by characterizing the handling of updates with communities, including when filtered, on multiple real-world BGP implementations in controlled laboratory experiments. We then examine 10 years of BGP messages observed in the wild at two route collector systems. In 2020, approximately 25% of all announcements modify the community attribute, but retain the AS path of the most recent announcement; an additional 25% update neither community nor AS path. Using predictable beacon prefixes, we demonstrate that communities lead to an increase in update messages both at the tagging AS and at neighboring ASes that neither add nor filter communities. This effect is prominent for geolocation communities during path exploration: on a single day, 63% of all unique community attributes are revealed exclusively due to global withdrawals.
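
A minimal sketch of the classification used in this measurement: for consecutive announcements of the same prefix from the same collector peer, an update that keeps the AS path but changes the community attribute counts as a community-only update. The attribute names are illustrative.

```python
def classify_update(previous, current):
    """previous/current: dicts with 'as_path' and 'communities' (previous may be None)."""
    if previous is None:
        return "new"
    same_path = previous["as_path"] == current["as_path"]
    same_comms = previous["communities"] == current["communities"]
    if same_path and not same_comms:
        return "community-only"
    if same_path and same_comms:
        return "duplicate"
    return "path-change"

prev = {"as_path": [64500, 64496], "communities": {"64496:100"}}
curr = {"as_path": [64500, 64496], "communities": {"64496:200"}}
print(classify_update(prev, curr))   # -> community-only
```
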

arXiv (Cornell University), Jun 1, 2016
In this study, we report on techniques and analyses that enable us to capture Internet-wide activity at individual IP address-level granularity by relying on server logs of a large commercial content delivery network (CDN) that serves close to 3 trillion HTTP requests on a daily basis. Across the whole of 2015, these logs recorded client activity involving 1.2 billion unique IPv4 addresses, the highest ever measured, in agreement with recent estimates. Monthly client IPv4 address counts showed constant growth for years prior, but since 2014, the IPv4 count has stagnated while IPv6 counts have grown. Thus, it seems we have entered an era marked by increased complexity, one in which the sole enumeration of active IPv4 addresses is of little use to characterize recent growth of the Internet as a whole. With this observation in mind, we consider new points of view in the study of global IPv4 address activity. Our analysis shows significant churn in active IPv4 addresses: the set of active IPv4 addresses varies by as much as 25% over the course of a year. Second, by looking across the active addresses in a prefix, we are able to identify and attribute activity patterns to network restructurings, user behaviors, and, in particular, various address assignment practices. Third, by combining spatio-temporal measures of address utilization with measures of traffic volume, and sampling-based estimates of relative host counts, we present novel perspectives on worldwide IPv4 address activity, including empirical observation of under-utilization in some areas, and complete utilization, or exhaustion, in others.
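
The churn figure can be read as the share of addresses that are active in only one of two observation periods. A small sketch, with toy address sets in place of the CDN-derived activity sets:

```python
def churn(active_period_a: set, active_period_b: set) -> float:
    """Fraction of addresses seen in only one of the two periods."""
    union = active_period_a | active_period_b
    stable = active_period_a & active_period_b
    return (1 - len(stable) / len(union)) if union else 0.0

# Toy activity sets; the study derives these from a year of CDN server logs.
january = {"192.0.2.1", "192.0.2.2", "198.51.100.7"}
december = {"192.0.2.1", "203.0.113.5", "198.51.100.7"}
print(f"churn: {churn(january, december):.0%}")
```
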

IEEE Transactions on Network and Service Management, Dec 1, 2022
Big data analytics platforms have played a critical role in the unprecedented success of data-driven applications. However, real-time and streaming data applications, and recent legislation, e.g., GDPR in Europe, have posed constraints on exchanging and analyzing data, especially personal data, across geographic regions. To address such constraints, data has to be processed and analyzed in-situ and aggregated results have to be exchanged among the different sites for further processing. This introduces additional network delays due to the geographic distribution of the sites and potentially affects the performance of analytics platforms that are designed to operate in datacenters with low network delays. In this paper, we show that the three most popular big data analytics systems (Apache Storm, Apache Spark, and Apache Flink) fail to tolerate round-trip times of more than 30 milliseconds even when the input data rate is low. The execution time of distributed big data analytics tasks degrades substantially beyond this threshold, and some of the systems are more sensitive than others. A closer examination and understanding of the design of these systems shows that there is no winner in all wide-area settings. However, we show that it is possible to improve the performance of all these popular big data analytics systems significantly even amid transcontinental delays (where inter-node delay is more than 30 milliseconds) and achieve performance comparable to that within a datacenter for the same load.
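
A back-of-the-envelope sketch of why the round-trip time dominates in wide-area deployments: if a job requires many coordination or shuffle round trips, the added wall-clock time grows linearly with the RTT. The compute time and number of rounds below are illustrative assumptions, not measurements from the paper.

```python
def estimated_runtime(compute_seconds, sync_rounds, rtt_ms):
    """Crude model: pure compute time plus one RTT per coordination/shuffle round."""
    return compute_seconds + sync_rounds * (rtt_ms / 1000.0)

for rtt in (1, 30, 100):   # intra-datacenter vs. regional vs. transcontinental RTTs
    print(f"RTT {rtt:>3} ms -> estimated job time {estimated_runtime(20, 500, rtt):.1f} s")
```
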

2008 Proceedings IEEE INFOCOM - The 27th Conference on Computer Communications, Apr 1, 2008
In an n-way broadcast application each one of n overlay nodes wants to push its own distinct large data file to all other n-1 destinations as well as download their respective data files. BitTorrent-like swarming protocols are ideal choices for handling such massive data volume transfers. The original BitTorrent targets one-to-many broadcasts of a single file to a very large number of receivers and thus, by necessity, employs an almost random overlay topology. n-way broadcast applications on the other hand, owing to their inherent n-squared nature, are realizable only in small to medium scale networks. In this paper, we show that we can leverage this scale constraint to construct optimized overlay topologies that take into consideration the end-to-end characteristics of the network and as a consequence deliver far superior performance compared to random and myopic (local) approaches. We present the Max-Min and Max-Sum peer-selection policies used by individual nodes to select their neighbors. The first one strives to maximize the available bandwidth to the slowest destination, while the second maximizes the aggregate output rate. We design a swarming protocol suitable for n-way broadcast and operate it on top of overlay graphs formed by nodes that employ Max-Min or Max-Sum policies. Using trace-driven simulation and measurements from a PlanetLab prototype implementation, we demonstrate that the performance of swarming on top of our constructed topologies is far superior to the performance of random and myopic overlays. Moreover, we show how to modify our swarming protocol to allow it to accommodate selfish nodes.
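
A toy sketch contrasting the two policies: a node scores each candidate neighbor by the bandwidth it would obtain toward every destination (a hypothetical precomputed table here) and keeps the candidate that maximizes either the slowest destination (Max-Min) or the aggregate rate (Max-Sum). Values are illustrative only.

```python
candidates = {
    # candidate neighbor -> estimated bandwidth to each destination via that choice (Mbps)
    "peerA": {"d1": 90, "d2": 20, "d3": 30},
    "peerB": {"d1": 50, "d2": 45, "d3": 40},
}

max_min_choice = max(candidates, key=lambda c: min(candidates[c].values()))
max_sum_choice = max(candidates, key=lambda c: sum(candidates[c].values()))
print("Max-Min picks:", max_min_choice)   # peerB: its slowest destination is the fastest
print("Max-Sum picks:", max_sum_choice)   # peerA: highest aggregate output rate
```
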