580 California St., Suite 400
San Francisco, CA 94104
Figure 1. The NCCD Data Science Platform includes several core Hadoop components for data storage, ingestion, and analysis. Other infrastructure components, such as Apache ZooKeeper and Apache Ambari, support the platform. A high-performance cluster and GPU-enabled nodes for data analysis were integrated into the platform by installing the Hadoop client applications.