Overview of computing environment

In addition to desktop computing for faculty and students, the Department of Biostatistics hosts a high-performance Linux computing facility, the High Performance Scientific Computing Center (HPSCC). It is housed in the Department of Biostatistics and is also used by the Departments of Molecular Microbiology and Immunology (MMI), Epidemiology, and others. The HPSCC's mission is to create a high-performance computing environment to support research and teaching in biostatistics, statistical genetics, computational biology, and bioinformatics.

Current HPSCC resources include a high-performance Linux computing cluster housed in a dedicated 250 sq. ft. server room adjacent to the Department of Biostatistics' Collaboration and Computing Lab (aka the Genome Cafe). The cluster consists of 38 nodes (84 CPU cores) built on 64-bit AMD Opteron processors, with 282 GB of DDR-SDRAM in total, and is capable of approximately 500 billion theoretical operations per second*.

Cluster Compute Node Characteristics

Quantity   CPU Cores   Memory (GB)   Disk 1 (GB)   Disk 2 (GB)
--------   ---------   -----------   -----------   -----------
15         2           2             80            -
7          2           4             80            -
8          2           8             80            -
2          2           16            80            -
2          2           16            80            500
4          4           32            750           750
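
The figures in the table above can be tallied against the totals quoted for the cluster. The following is a minimal sketch, assuming the CPU Cores and Memory columns are per-node values (the table does not say so explicitly); the names NODE_GROUPS and cluster_totals are illustrative only.

```python
# Rows transcribed from the "Cluster Compute Node Characteristics" table.
# Assumption: cores and memory are per-node values, so each row is
# (quantity, cpu_cores_per_node, memory_gb_per_node).
NODE_GROUPS = [
    (15, 2, 2),
    (7, 2, 4),
    (8, 2, 8),
    (2, 2, 16),
    (2, 2, 16),
    (4, 4, 32),
]


def cluster_totals(groups):
    """Return (total nodes, total cores, total memory in GB)."""
    nodes = sum(qty for qty, _, _ in groups)
    cores = sum(qty * per_node_cores for qty, per_node_cores, _ in groups)
    memory_gb = sum(qty * per_node_mem for qty, _, per_node_mem in groups)
    return nodes, cores, memory_gb


nodes, cores, memory_gb = cluster_totals(NODE_GROUPS)
print(f"nodes={nodes} cores={cores} memory_gb={memory_gb}")
# prints: nodes=38 cores=84 memory_gb=314
```

Under this reading the node and core counts match the quoted 38 nodes and 84 cores. The memory sum comes to 314 GB rather than the quoted 282 GB, so either the per-node assumption does not hold for every row or one of the two figures reflects a different configuration.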

A dedicated 4-core AMD Opteron-based server with 8 GB of memory and mirrored 73 GB disks serves as the head node for the cluster. Cluster job submission, scheduling, execution, and accounting are under the control of Sun Microsystems' Grid Engine software (SGE). The Grid Engine project, sponsored by Sun Microsystems, is an open source community effort to facilitate the adoption of distributed computing solutions.

Users access the cluster via a separate 4-CPU Opteron-based login/development server with 16 GB of memory and 2 x 148 GB of mirrored local disk storage. Additional servers provide specialized functions, including dedicated database access, NFS shares, and tape backup.

The heart of the system is a Global File System (GFS) that provides access to more than 5 TB of disk storage over a Fibre Channel storage area network (SAN). GFS allows users to access their home directories from any of the servers currently attached to the SAN. From each of the cluster compute nodes, access to home directories is provided seamlessly via NFS connections to the GFS servers. Users' home directories are backed up to tape nightly and weekly, and backup tape sets are cycled to an off-site facility weekly. Additionally, users have access to a 7.5 TB network attached storage (NAS) disk array for static storage.

Major statistical and mathematical computing packages are available, including R, SAS, STATA, Matlab, and Mathematica.

For security reasons, the system is situated behind the School of Public Health firewall.  The cluster is accessible from outside the firewall via a secure shell (SSH) cut-through on the login/development server. Outside of the School's firewall, the HPSCC maintains web servers for serving departmental pages and hosting individual faculty projects.

To maintain state-of-the-art facilities, upgrades are included in the HPSCC operating budget; operating costs are shared by all users. More than 100 active users access the login/development server, the cluster, or both. Users' research areas include microarray analysis, gene sequencing, Bayesian model selection, analysis of high-throughput genotype data in large population-based cohorts, and statistical methods for environmental epidemiology.

Marvin Newhouse and Jiong Yang, with combined experience in Linux, Windows, Macintosh, and Solaris, support the faculty and students at the departmental level. Together with other key faculty and students, they offer computer support through two email help systems: BITSUPPORT ( bitsupport at jhsph.edu ) for cluster and systems issues, and BITHELP ( bithelp at jhsph.edu ) for applications issues. The School's Information Systems group provides support for school-wide technology issues.

HPSCC Scientific Director:  Fernando J. Pineda
Technology Director/Manager:  Marvin Newhouse
Systems Engineer:  Jiong Yang
Administrative Assistant:  Cindy Hockett

*Composite Theoretical Performance (CTP) calculations ("Calculations") for AMD Opteron microprocessors. Official calculations stated by AMD are in Millions of Theoretical Operations Per Second (MTOPS) and are based upon a formula in the United States Department of Commerce Export Administration Regulations 15 CFR 774 (Advisory Note 4 for Category 4).