ASRC Federal Holding Company HPC (High Performance Computing) System Administrator in Greenbelt, Maryland
Position: HPC (High Performance Computing) System Administrator
Location: Greenbelt, MD
ASRC Federal InuTeq provides High Performance Computing services throughout the HPC lifecycle for computational requirements, architecture, acquisition, and operations to federal government customers. Our employees embrace innovation and are committed to a culture of continuous, standards-driven process improvement and assimilation of industry best practices. We are seeking an HPC System Administrator for our NASA NACS High Performance Computing contract.
This position is part of an HPC support team that focuses on storage hardware and software for two supercomputing clusters. You will specialize in monitoring and managing storage systems and the storage-related network for a large supercomputer.
Duties and Responsibilities:
Perform hardware testing and daily maintenance and monitoring, LUN configuration and presentation across various storage controller operating systems, and filesystem and cluster management with GPFS
Monitor and maintain Discover's storage hardware (spinning disk and NVMe-based) and backend storage network (Fibre Channel)
Monitor and maintain Discover's GPFS cluster, including all 3,700 clients and 60 NSD servers (plus manager and quorum nodes)
Monitor and maintain Discover's three high-speed interconnect fabrics (two FDR InfiniBand fabrics and one Omni-Path OPA100 fabric), including cables, switches, firmware, and software-level components such as the subnet managers (SMs)
Address user tickets and resolve issues across various areas of the cluster
Attend meetings with high-priority user groups to maintain open channels of communication and address their concerns
Maintain the test and development system so that it stays consistent with the production cluster
Advise the customer on new cluster hardware purchases (both storage and compute)
Assist with benchmarking new products (storage systems and switches) that will potentially be used in production
Test and verify hardware such as storage systems and high-speed fabrics to validate it for production use
Qualifications:
Bachelor's degree in Computer Science, Management Information Systems, or another technical discipline, plus 3 years of relevant work experience, or equivalent
Experience with HPC parallel filesystems (e.g., GPFS, Lustre)
Experience with storage systems (data/metadata/IO server configurations in GPFS, spinning disk, SSD, and NVMe)
Experience with high-speed interconnect networking (e.g., InfiniBand, Omni-Path, Fibre Channel) - cabling, cards, switches, OFED/MOFED, etc.
Working knowledge of scripting and programming languages such as C, C++, Fortran, Bash, CSH, TCSH, Perl, Python, and Ruby
Good organizational skills to balance and prioritize work, and the ability to multitask
Good communication skills for working with support personnel, the customer, and managers
US citizenship and the ability to obtain a Public Trust security clearance are mandatory requirements for this position
ASRC Federal and its Subsidiaries are Equal Opportunity / Affirmative Action employers. All qualified applicants will receive consideration for employment without regard to race, gender, color, age, sexual orientation, gender identification, national origin, religion, marital status, ancestry, citizenship, disability, protected veteran status, or any other factor prohibited by applicable law.