Papers by Jaswinder Pal Singh

Operating Systems Review, 1994
The two predominant multiprocessor communication paradigms are implicit communication through a shared address space and explicit communication via message passing. A shared address space presents programmers with a favorable programming abstraction, and hardware cache-coherent shared-address-space machines have been shown to perform well on tasks that require fine-grain communication. Message passing machines, on the other hand, perform well on tasks that require coarse-grain communication. Integrating a coarse-grain communication facility into a hardware cache-coherent shared-address-space machine offers the potential for a favorable programming abstraction and good performance over a wide range of communication grain sizes. In this type of machine, fine-grain communication is managed by the cache-coherence hardware, while coarse-grain communication is managed by a block transfer facility external to the main processor.
SPLASH: Stanford parallel applications for shared-memory
ACM Sigarch Computer Architecture News, 1992
We present the Stanford Parallel Applications for Shared-Memory (SPLASH), a set of parallel applications for use in the design and evaluation of shared-memory multiprocessing systems. Our goal is to provide a suite of realistic applications that will serve as a well-...
ACM Sigarch Computer Architecture News, 1993
The distribution of resources among processors, memory and caches is a crucial question faced by designers of large-scale parallel machines. If a machine is to solve problems with a certain data set size, should it be built with a large number of processors each with a small amount of memory, or a smaller number of processors each with a large amount of memory? How much cache memory should be provided per processor for cost-effectiveness? And how do these decisions change as larger problems are run on larger machines?
The Performance Advantages of Integrating Message Passing in Cache-Coherent Multiprocessors
Steven Cameron Woo, Jaswinder Pal Singh, and John L. Hennessy. Computer Systems Laboratory, Stanford University, Stanford, CA 94305. ...
Journal of Parallel and Distributed Computing, 1995
Hierarchical N-body methods, which are based on a fundamental insight into the nature of many physical processes, are increasingly being used to solve large-scale problems in a variety of scientific/engineering domains. Applications that use these methods are challenging to parallelize effectively, however, owing to their nonuniform, dynamically changing characteristics and their need for long-range communication.
THE EFFECTS OF LATENCY, OCCUPANCY, AND BANDWIDTH IN DISTRIBUTED SHARED MEMORY MULTIPROCESSORS
Chris Holt, Mark Heinrich, Jaswinder Pal Singh, Edward Rothberg, and John Hennessy. Technical Report No. CSL-TR-95-660, January 1995. ...
Scaling Parallel Programs for Multiprocessors: Methodology and Examples
IEEE Computer, 1993
Jaswinder Pal Singh, John L. Hennessy, and Anoop Gupta, Stanford University. This approach scales all relevant parameters under constraints imposed by the application domain. ...

Sigplan Notices, 1994
Several multiprocessors have been proposed that offer programmable implementations of scalable cache coherence as well as support for message passing. In the FLASH machine, flexibility is obtained by the use of a programmable node controller, called MAGIC, through which all transactions in a node pass. We use the actual code sequences that implement the cache coherence protocol, together with a detailed simulator of the MAGIC design, to evaluate the performance costs of flexibility. We compare the performance of FLASH to an idealized hardwired machine on representative applications. In many cases, the overhead of the programmable protocol can be hidden behind the memory access time. When the miss rates are low, the performance differences between the ideal machine and FLASH are small. At high miss rates, performance is not good for either machine, though the increased remote access latencies and the contention within MAGIC can lead to larger performance losses for the flexible design. The results of our initial investigations point to a number of improvements that could be made to increase robustness in a flexible design such as FLASH.
Parallelizing the simulation of ocean eddy currents

IEEE Computer, 1994
Shared-address-space multiprocessors are effective vehicles for speeding up visualization and image synthesis algorithms. This article demonstrates excellent parallel speedups on some well-known sequential algorithms. Several recent algorithms have substantially sped up complex and time-consuming visualization tasks. In particular, novel algorithms for radiosity computation and volume rendering have demonstrated performance far superior to earlier methods. Despite these advances, visualization of complex scenes or data sets remains computationally expensive. Rendering a 256 x 256 x 256-voxel volume data set takes about 5 seconds per frame on a 100-MHz Silicon Graphics Indigo workstation using Levoy's ray-casting algorithm, and about a second per frame using a new shear-warp algorithm. These times are much larger than the 0.03 second per frame required for real-time rendering or the 0.1 second per frame required for interactive rendering. Realistic radiosity and ray-tracing computations are much more time-consuming.
An empirical comparison of the Kendall Square Research KSR-1 and Stanford DASH multiprocessors
Jaswinder Pal Singh, Truman Joe, Anoop Gupta, and John L. Hennessy. Computer Systems Laboratory, Stanford University. ...