Infrastructure Engineer – 2022386

The Pittsburgh Supercomputing Center (PSC) is a joint research center of Carnegie Mellon University and the University of Pittsburgh. Established in 1986, PSC has for nearly 40 years provided university, government, and industrial researchers with access to several of the most powerful computational systems in the country for unclassified research. The work done at PSC advances science across a wide variety of fields, including artificial intelligence, medical imaging, weather modeling, cell biology, and genomics.

PSC is seeking an experienced Infrastructure Engineer to join our new Advanced Systems Infrastructure Team. The focus of this group is the maintenance and development of PSC’s shared infrastructure. We work with the other operations and development teams to build and support the systems needed to keep our advanced clusters, and the center as a whole, running smoothly. The ideal candidate will have a versatile skill set and a willingness to pick up new things, but is not expected to be a day-one expert on anything specific.

Core responsibilities include:

Routine maintenance and system administration of Linux servers
Codification of system configuration into our configuration management system
Participating in on-call rotations
Deployment and maintenance of web applications
Care and feeding of VM hosting environments
Development, testing, and deployment of new technologies as needed
Writing and maintaining internal documentation
Working with the cluster operations team, as well as other groups at PSC, to support their infrastructure-related needs
Other duties as assigned

Flexibility, excellence, and passion are vital qualities within PSC. Inclusion, collaboration, and cultural sensitivity are valued competencies at CMU. Therefore, we are in search of a team member who is able to effectively interact with a varied population of internal and external partners at a high level of integrity. We are looking for someone who shares our values and who will support the mission of the university through their work.

You should demonstrate:

Attention to detail
A habit of not considering a job complete until everything is cleaned up, squared away, and documented
Enjoyment of troubleshooting interesting problems, where you can’t just Google the answer
Clear communication of technical concepts
Willingness to teach and learn
Some experience with the following technologies: Python, or other programming languages in a similar niche, such as Ruby or Go; Bash; Kubernetes; Prometheus, or other monitoring systems; Puppet, or other configuration management systems; DNS, LDAP, HTTP, and other common networking protocols; Git; CI/CD systems; SQL; Docker, or other container systems such as Podman

Qualifications:

Bachelor’s degree
At least 2 years of professional experience in the field. Prior experience in HPC environments is a plus, but not required. People with a background in DevOps, Site Reliability Engineering, or related areas are especially encouraged to apply.
A combination of education and relevant experience from which comparable knowledge is demonstrated may be considered.

Requirements:

Successful background check

Additional Information:

This is a full-time (37.5 hours), exempt position.

Work Posture: This position operates on a hybrid schedule, with an on-campus/in-office presence 3 days a week.

Are you interested in this exciting opportunity? Please apply