Senior Engineer, High-Performance Computing

Bengaluru, India

Post Date: February 6, 2025

Apply Before: March 23, 2025

Job Details

Job ID: 2158

Experience: 10
Career Level: Managerial (Strategic Contributor)

Preferred Skills

AI, networking, Python

Job Description

We are seeking a highly skilled HPC/GPU Operations Engineer with 6-10 years of experience to manage, optimize, and maintain high-performance computing (HPC) infrastructure, with a focus on GPU-accelerated workloads. The ideal candidate will be responsible for ensuring the reliability, efficiency, and scalability of HPC systems used for scientific computing, AI/ML, and data-intensive applications.

Career Level – IC3

HPC & GPU System Management

Administer and maintain HPC clusters, GPU nodes, and high-speed interconnects.

Deploy and configure GPU-accelerated workloads for AI/ML, scientific computing, and simulations.

Monitor system performance, troubleshoot issues, and optimize resource utilization.

Software & Middleware Support

Install, configure, and maintain HPC-related software, libraries, and tools (CUDA, OpenMP, MPI, etc.).

Support containerized workflows using Docker, Singularity, or similar technologies.

Ensure compatibility of software stacks with GPU architectures (NVIDIA, AMD, Intel).

Performance Optimization & Monitoring

Tune GPU and CPU performance for specific workloads, including benchmarking and profiling.

Utilize monitoring tools (e.g., Prometheus, Grafana, Slurm, Ganglia) to track system health and efficiency.

Optimize scheduling and resource allocation in workload managers (Slurm, PBS, LSF, etc.).

Security & Compliance

Ensure system security and access control for HPC resources.

Apply software patches, firmware updates, and security best practices.

Assist in regulatory compliance for HPC environments.

User Support & Documentation

Provide support to researchers, data scientists, and engineers using HPC resources.

Develop and maintain documentation on best practices, troubleshooting, and system usage.

Conduct training sessions or workshops on HPC/GPU computing.

Required Qualifications

Technical Skills

Experience managing HPC clusters and GPU-based computing environments.

Proficiency in Linux system administration, scripting (Bash, Python), and automation (Ansible, Terraform).

Knowledge of parallel computing, GPU programming (CUDA, OpenCL), and HPC frameworks.

Familiarity with networking (InfiniBand, RDMA), storage (Lustre, GPFS, NFS), and virtualization.
