Principal/Sr Staff Engineer – AI Infrastructure Management (AIMS), UI, Orchestration & Lifecycle Management - Riyadh, KSA
📍 Job Overview
Job Title: Principal/Sr Staff Engineer – AI Infrastructure Management (AIMS), UI, Orchestration & Lifecycle Management
Company: Qualcomm
Location: Riyadh, KSA
Job Type: Full-time
Category: Engineering / Software Engineering / AI Infrastructure Management
Date Posted: 2026-05-15
Experience Level: 10+ Years (Principal/Sr Staff)
Remote Status: On-site
🚀 Role Summary
- Lead the technical development and strategic direction of the AI Infrastructure Management Suite (AIMS), a critical software platform for large-scale AI data center deployments.
- Drive the design, implementation, and operationalization of core AIMS components, encompassing operator-facing UI, robust orchestration services, and comprehensive lifecycle management workflows.
- Spearhead the development of reliable, scalable, and observable control-plane services essential for managing AI infrastructure from rack to fleet levels.
- Foster cross-functional collaboration with UI, platform, firmware, and hardware teams to ensure seamless integration and cohesive system behavior across the entire AI stack, from silicon to software.
📝 Enhancement Note: This role is positioned at a Principal/Senior Staff Engineer level, indicating a strong emphasis on technical leadership, architectural design, and hands-on problem-solving within a complex AI infrastructure environment. The focus on AIMS suggests a need for deep understanding of data center operations, cloud-native technologies, and the specific demands of AI/ML workloads.
📈 Primary Responsibilities
- Design, implement, and take ownership of major components within the AIMS platform, including operator-facing User Interfaces (UI), orchestration services, and lifecycle management workflows.
- Drive technical execution for all operational phases (Day-0 through Day-2), covering infrastructure provisioning, configuration management, software upgrades, system expansion, and decommissioning of AI infrastructure.
- Architect and build reliable, observable, and scalable control-plane services responsible for managing racks, clusters, and entire fleets of AI systems.
- Collaborate closely with UI, platform, firmware, and hardware engineering teams to ensure cohesive system behavior and integration from the silicon level through to the software layer.
- Actively participate in and lead design reviews and code reviews, contributing to debugging efforts and performance analysis of complex distributed systems.
- Serve as a technical mentor for other senior and staff engineers, elevating the team's overall engineering capabilities through guidance, best practices, and exemplary technical execution.
- Develop and maintain operator-facing web applications and dashboards leveraging Node.js, focusing on intuitive workflows for monitoring, inventory management, telemetry visualization, and lifecycle operations.
- Implement orchestration services for resource abstraction, scheduling, and workload lifecycle management tailored for AI workloads, ensuring clean integration with underlying rack and node management systems.
- Utilize Infrastructure as Code (IaC) principles with Terraform and configuration management with Ansible for automated provisioning, configuration, and updates of AI infrastructure.
- Design and implement safe rollout, upgrade, rollback, and recovery mechanisms suitable for mission-critical, fleet-scale AI deployments.
- Develop Python-based automation scripts and services to support complex lifecycle workflows and enhance operational tooling.
📝 Enhancement Note: The responsibilities highlight a blend of strategic architectural input and deep, hands-on execution across multiple technology domains, including UI development, distributed systems (Kubernetes), IaC (Terraform), and configuration management (Ansible). The emphasis on "operator-facing" and "Day-0 through Day-2 operations" points to a role deeply embedded in the practicalities of managing and automating complex data center infrastructure.
🎓 Skills & Qualifications
Education:
- Bachelor's degree in Engineering, Information Systems, Computer Science, or a related field, coupled with 8+ years of Software Engineering or equivalent experience.
- Master's degree in Engineering, Information Systems, Computer Science, or a related field, coupled with 7+ years of Software Engineering or equivalent experience.
Experience:
- Demonstrated experience in building and operating large-scale infrastructure or platform software within production environments.
Required Skills:
- Strong proficiency in Node.js for backend development and building infrastructure-centric UI systems.
- Hands-on experience with Terraform for Infrastructure as Code (IaC) and managing complex cloud or data center environments.
- Expertise in Ansible for configuration management, automation, and deployment orchestration.
- Solid understanding and practical experience with Kubernetes and Docker for deploying, scaling, and managing distributed systems.
- Strong Python programming skills, essential for developing orchestration services, automation scripts, and operational tooling.
- Experience with C++ for systems-level programming or performance-critical components.
- Proficiency in designing, implementing, and operating distributed systems.
- Experience in UI development, specifically for operator-facing dashboards and management interfaces.
- Understanding of control-plane design principles for managing complex infrastructure.
Preferred Skills:
- Experience with telemetry, monitoring, and observability tools for distributed systems.
- Knowledge of hardware-level interactions or firmware management in a data center context.
- Familiarity with cloud computing platforms and services.
- Experience with large-scale AI/ML model training and deployment infrastructure.
- Proven ability to mentor and guide other engineers.
📝 Enhancement Note: The experience requirements are tiered based on educational attainment, reflecting a typical structure for senior engineering roles. The "4+ years of work experience with Programming Language such as C, C++, Java, Python, etc." is a baseline that is exceeded by the overall experience level expectation (Principal/Sr Staff). The presence of both C++ and Python indicates a need for both systems-level performance optimization and higher-level automation/scripting capabilities.
📊 Process & Systems Portfolio Requirements
Portfolio Essentials:
- Demonstrate successful implementation of Infrastructure as Code (IaC) using Terraform, showcasing the automation of complex environment provisioning and management.
- Provide case studies of automated configuration management and deployment orchestration using Ansible, highlighting efficiency gains and consistency improvements.
- Showcase experience with Kubernetes deployments, including complex application packaging, scaling, and lifecycle management within containerized environments.
- Present examples of building and managing operator-facing UIs or dashboards that improve operational visibility and simplify complex tasks.
Process Documentation:
- Documented workflows for automated infrastructure provisioning, configuration, and updates using tools like Terraform and Ansible.
- Examples of implemented lifecycle automation for software deployments, including safe upgrade, rollback, and recovery mechanisms.
- Evidence of developing processes for monitoring, inventory management, and telemetry visualization within complex infrastructure environments.
- Demonstrated process optimization for Day-0 through Day-2 operations, reducing manual intervention and improving operational efficiency.
- Examples of designing and documenting safe rollout, expansion, and decommissioning procedures for large-scale systems.
📝 Enhancement Note: For a Principal/Sr Staff Engineer role focused on infrastructure management and automation, a portfolio should strongly emphasize hands-on experience with IaC, configuration management, container orchestration, and the development of operational tooling. Candidates should be prepared to discuss the architectural decisions, trade-offs, and impact of their work in these areas, particularly concerning scalability, reliability, and operational efficiency.
💵 Compensation & Benefits
Salary Range:
- Given the Principal/Sr Staff Engineer title, the location in Riyadh, KSA, and the specific technical expertise required (AI Infrastructure, Orchestration, Lifecycle Management), a competitive compensation package is expected. Based on industry benchmarks for similar senior engineering roles in major technology hubs, a base salary range of SAR 40,000 - SAR 70,000+ per month (approximately USD 10,700 - 18,700+ per month at the pegged rate of SAR 3.75 per USD) is a reasonable estimate. This would translate to an annual base salary of SAR 480,000 - 840,000+ (approximately USD 128,000 - 224,000+ annually).
Benefits:
- Financial:
  - Competitive base salary inclusive of housing and transport allowances.
  - Stock (RSUs) and performance-related bonuses, reflecting company and individual performance.
  - Employee Stock Purchase Plan (ESPP) allowing employees to purchase Qualcomm stock at a discount.
  - Child Education Allowance to support family expenses.
- Work-Life Balance & Family:
  - Generous paid leave: 16 weeks fully paid Maternity Leave and 6 weeks fully paid Paternity Leave.
  - Relocation and immigration support for candidates requiring it.
- Health & Wellness:
  - Comprehensive Life and Medical Insurance coverage.
Working Hours:
- A standard full-time work week is typically 40 hours. However, given the nature of senior engineering roles and critical infrastructure management, flexibility may be required to address urgent operational needs or project deadlines. The role is on-site, suggesting adherence to local business hours with potential for on-call responsibilities.
📝 Enhancement Note: The salary estimate is based on high-level industry data for senior engineering roles in technology hubs and the specified location. It's crucial for candidates to understand that this is an estimate, and the actual offer will depend on Qualcomm's internal compensation structure, the candidate's specific experience and qualifications, and prevailing market rates in Riyadh. The benefits package is quite comprehensive, particularly the generous parental leave and the "Live+ Well" program.
🎯 Team & Company Context
🏢 Company Culture
Industry: Semiconductors, Wireless Technology, AI, Data Center Infrastructure
Company Size: Qualcomm is a large, multinational corporation with tens of thousands of employees globally. Its significant presence in Riyadh indicates a strategic investment in AI and data center capabilities within Saudi Arabia, aligning with the Kingdom's Vision 2030 digital transformation initiatives.
Founded: 1985. Qualcomm has a long history of innovation in mobile technology and is now a major player in areas like AI, automotive, and IoT.
Team Structure:
- The AI Infrastructure Management (AIMS) team is likely a specialized engineering group focused on building and operating the software platform that underpins Qualcomm's large-scale AI data center deployments.
- This role reports into a senior engineering leadership position, likely a Director or VP of Engineering managing AI infrastructure or data center operations.
Methodology:
- The team operates with a strong emphasis on Software Engineering best practices, including Agile methodologies, CI/CD pipelines, rigorous code and design reviews, and a data-driven approach to system design and performance analysis.
- A focus on Infrastructure as Code (IaC) and extensive automation is central to managing complex, large-scale deployments efficiently and reliably.
- Emphasis on observability, reliability engineering, and robust lifecycle management for software and infrastructure components.
Company Website: https://www.qualcomm.com/
📝 Enhancement Note: Qualcomm's expansion into Riyadh signifies a commitment to building cutting-edge AI and data center capabilities locally. The culture will likely blend a global, established tech company's rigor with the dynamism of a growing regional hub focused on future technologies. The emphasis on collaboration and technical excellence is a hallmark of such organizations.
📈 Career & Growth Analysis
Operations Career Level: Principal/Sr Staff Engineer. This level signifies a highly experienced individual contributor who is expected to provide deep technical expertise, architectural leadership, and mentor other senior engineers. They are often responsible for defining technical strategy in their domain and driving complex, critical projects to completion.
Reporting Structure: This role will likely report to a Director or VP of Engineering within the AI Infrastructure or Data Center Operations division. The engineer will collaborate closely with product managers and architects, and will be expected to lead technical initiatives and mentor other engineers on the team.
Operations Impact: The AIMS platform is fundamental to Qualcomm's ability to deploy and manage AI infrastructure at scale. Success in this role directly impacts the efficiency, reliability, and cost-effectiveness of these critical deployments, which in turn supports Qualcomm's broader AI, cloud, and advanced connectivity strategies. This role has a direct impact on Qualcomm's ability to deliver AI-powered solutions and services.
Growth Opportunities:
- Technical Specialization: Deepen expertise in AI infrastructure, distributed systems, cloud-native technologies, and large-scale data center operations, potentially leading to Principal/Distinguished Engineer roles.
- Leadership Development: Transition into management roles (e.g., Engineering Manager, Director of Engineering) by leveraging mentoring and technical leadership experience gained at this level.
- Architectural Influence: Play a key role in defining the future architecture and technology roadmap for AI infrastructure management at Qualcomm globally.
- Cross-Functional Exposure: Gain broad exposure to hardware, firmware, software, and product development across Qualcomm's diverse business units.
📝 Enhancement Note: The Principal/Sr Staff Engineer designation implies a high degree of autonomy and influence. Growth opportunities are likely focused on continued technical mastery, architectural leadership, or a transition into formal management roles, supported by a large, globally recognized technology company.
🌐 Work Environment
Office Type: This is an on-site role in Riyadh, KSA, within a modern office environment established by Qualcomm to support its growing regional operations, likely including dedicated R&D and engineering facilities.
Office Location(s): Riyadh, Saudi Arabia. Specific office details would be confirmed during the interview process, but it's expected to be in a professional business district.
Workspace Context:
- The workspace will foster a collaborative environment, encouraging interaction with a diverse team of engineers and product professionals.
- Access to state-of-the-art development tools, high-performance computing resources, and potentially direct interaction with data center infrastructure will be available.
Work Schedule:
- Standard full-time work hours (approximately 40 hours per week) are expected, aligned with local business practices in Riyadh. However, the nature of infrastructure management may necessitate flexibility for on-call duties or addressing critical operational issues outside of standard hours to ensure the continuous operation of AI data centers.
📝 Enhancement Note: The on-site requirement in Riyadh suggests a dynamic and growing regional technology hub environment. Candidates should expect a professional, collaborative setting with access to modern tools and potentially direct engagement with critical infrastructure.
📄 Application & Portfolio Review Process
Interview Process:
- Initial Screening: HR or recruiter call to assess basic qualifications, interest, and cultural fit.
- Technical Phone Screen: An interview with a senior engineer or manager focusing on core technical skills, experience with key technologies (Node.js, Terraform, Kubernetes, Python), and general problem-solving abilities.
- On-site/Virtual Loop (Multiple Rounds):
  - System Design Interview: Focus on designing complex distributed systems, specifically AI infrastructure management components, orchestration, or lifecycle automation. Expect questions about scalability, reliability, fault tolerance, and observability.
  - Coding/Problem-Solving Interviews: Hands-on coding exercises, likely in Python or Node.js, to assess algorithmic thinking, data structure knowledge, and clean code implementation. May involve debugging scenarios or implementing specific functionalities.
  - Behavioral/Leadership Interview: Assess leadership potential, mentorship capabilities, conflict resolution, and alignment with Qualcomm's values and culture. This is where your experience as a Principal/Sr Staff engineer will be evaluated.
  - Domain-Specific Interview: May delve deeper into AI infrastructure specifics, data center operations, or detailed aspects of UI, orchestration, or lifecycle management.
- Final Round/Executive Interview: A discussion with senior leadership (e.g., Director/VP) to finalize assessment of technical leadership, strategic thinking, and overall fit.
Portfolio Review Tips:
- Focus on Impact: For each project presented, clearly articulate the problem you solved, the technical approach you took, your specific contributions, and the quantifiable impact (e.g., efficiency gains, cost savings, reduced downtime, improved developer productivity).
- Showcase IaC and Automation: Highlight projects demonstrating robust use of Terraform and Ansible. Be prepared to discuss state management, module design, idempotency, and complex deployment strategies.
- Kubernetes Expertise: Present examples of deploying and managing complex applications on Kubernetes. Discuss challenges related to scaling, service discovery, networking, and stateful workloads.
- UI/Operator Experience: If applicable, show examples of operator-facing dashboards or tools you've developed, emphasizing usability and how they streamlined operational tasks.
- Code Quality & Architecture: Be ready to discuss architectural decisions, trade-offs, and best practices in your code. If possible, share anonymized code snippets or diagrams that illustrate your design patterns.
- Mentorship and Leadership: Provide examples of how you've mentored junior engineers, led technical discussions, or influenced technical direction on past projects.
Challenge Preparation:
- System Design: Practice designing scalable, fault-tolerant systems for tasks like fleet management, resource orchestration, or automated provisioning. Think about APIs, data models, failure modes, and monitoring strategies.
- Coding: Sharpen your skills in Python and Node.js. Be prepared for typical LeetCode-style problems, but also for practical coding challenges related to API integration, data processing, or automation scripts.
- Terraform/Ansible: Review best practices for writing reusable modules, managing complex state, and implementing secure and efficient deployments.
- Kubernetes: Understand core concepts like Pods, Deployments, Services, StatefulSets, Operators, and Helm charts. Be prepared to discuss deployment strategies and troubleshooting.
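As a practice aid for the automation and idempotency topics above, here is a minimal Python sketch of an idempotent "ensure desired state" step. It is not taken from AIMS, Terraform, or Ansible; the `Node` class, the `ensure_packages` function, and the package names are hypothetical and only illustrate the idea that running the same step twice should produce no further changes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory stand-in for a managed node (an assumption)."""
    name: str
    packages: dict[str, str] = field(default_factory=dict)  # package -> version


def ensure_packages(node: Node, desired: dict[str, str]) -> list[str]:
    """Bring node.packages in line with `desired` and return the changes made.

    Idempotent: a second call with the same `desired` returns an empty list.
    """
    changes = []
    for pkg, version in sorted(desired.items()):
        current = node.packages.get(pkg)
        if current != version:
            # Only act when the observed state differs from the desired state.
            node.packages[pkg] = version
            changes.append(f"{pkg}: {current} -> {version}")
    return changes


node = Node("rack1-node01", {"driver": "1.0"})
desired = {"driver": "1.1", "agent": "2.3"}
first = ensure_packages(node, desired)   # applies two changes
second = ensure_packages(node, desired)  # nothing left to change
print(first)
print(second)
```

Comparing observed state with desired state before acting is the property interviewers usually probe when they ask how a workflow stays safe to re-run after a partial failure.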
📝 Enhancement Note: The interview process for a Principal/Sr Staff Engineer role is rigorous and multi-faceted, designed to assess not only technical depth but also leadership, strategic thinking, and communication skills. A well-curated portfolio that demonstrates tangible impact and technical expertise in the required areas is crucial.
🛠 Tools & Technology Stack
Primary Tools:
- Node.js: For backend services and operator-facing UI development. Candidates should be proficient in its ecosystem and best practices.
- Python: For orchestration services, automation tooling, and scripting. Expertise in Python for systems programming and automation is key.
- Terraform: The primary tool for Infrastructure as Code (IaC), used for provisioning and managing AI infrastructure declaratively.
- Ansible: For configuration management, application deployment, and automating operational tasks across servers and infrastructure.
- Kubernetes: The core platform for container orchestration, managing the deployment, scaling, and operation of services.
- Docker: For containerization, packaging applications and their dependencies.
Analytics & Reporting:
- Tools for monitoring and telemetry visualization (specifics not provided, but common in this space include Prometheus, Grafana, ELK stack, Datadog).
CRM & Automation:
- While not a direct CRM role, the AIMS platform acts as a system of record and control for infrastructure. Automation tools and workflow engines are central.
- Integration tools and methodologies for connecting different components of the AIMS platform and with underlying hardware/firmware systems.
📝 Enhancement Note: This role requires deep expertise in a modern cloud-native and DevOps-oriented technology stack. Proficiency in Node.js, Python, Terraform, Ansible, and Kubernetes is non-negotiable and will be heavily scrutinized.
👥 Team Culture & Values
Operations Values:
- Technical Excellence: A commitment to high-quality code, robust architecture, and deep understanding of underlying systems.
- Automation First: A strong belief in automating all repetitive tasks and managing infrastructure through code.
- Reliability & Scalability: Designing and building systems that are resilient, performant, and can scale to meet demanding requirements.
- Collaboration & Mentorship: A culture of teamwork, knowledge sharing, and actively supporting the growth of colleagues.
- Data-Driven Decision Making: Utilizing metrics and telemetry to inform design choices, troubleshoot issues, and measure operational impact.
- Customer Focus (Internal & External): Understanding the needs of operators and end-users, and delivering solutions that meet those needs effectively.
Collaboration Style:
- Cross-functional Integration: Active engagement with hardware, firmware, software, and product teams to ensure a cohesive and integrated AI infrastructure solution.
- Open Communication: Encouraging transparent discussions, constructive feedback during code and design reviews, and proactive problem-solving.
- Agile Execution: Working in iterative cycles, adapting to new requirements, and maintaining a focus on delivering value incrementally.
- Knowledge Sharing: Actively participating in technical discussions, documentation, and internal presentations to disseminate expertise.
📝 Enhancement Note: Qualcomm, as a global tech leader, likely fosters a culture of innovation, technical rigor, and collaborative problem-solving. The emphasis on AI infrastructure in Riyadh suggests a forward-looking and dynamic environment, aligning with Saudi Arabia's Vision 2030.
⚡ Challenges & Growth Opportunities
Challenges:
- Complexity of Scale: Managing and orchestrating large-scale AI data center deployments presents significant technical challenges related to performance, reliability, and distributed systems coordination.
- Integration Across Domains: Ensuring seamless integration between UI, orchestration, lifecycle management, and underlying hardware/firmware requires meticulous design and execution.
- Rapid Evolution of AI: The fast-paced nature of AI technology necessitates continuous learning and adaptation of the infrastructure management platform to support new hardware, software frameworks, and workloads.
- Balancing Innovation and Stability: Implementing new features and improvements while maintaining the stability and reliability of critical production infrastructure.
- Mentoring and Technical Leadership: Effectively guiding and uplifting a team of senior engineers while also managing personal technical contributions.
Learning & Development Opportunities:
- Deep Dive into AI Infrastructure: Gain unparalleled expertise in the specialized domain of managing AI hardware and software at scale.
- Advanced Cloud-Native Technologies: Further develop skills in Kubernetes, distributed systems, and modern infrastructure automation patterns.
- Architectural Leadership: Opportunities to define and influence the technical direction of critical infrastructure management platforms.
- Industry Exposure: Working at the forefront of AI and data center technology in a rapidly developing region like Saudi Arabia.
- Professional Development: Qualcomm typically offers resources for continuous learning, including access to training platforms, conferences, and internal knowledge-sharing sessions.
📝 Enhancement Note: This role offers the opportunity to tackle significant engineering challenges at the cutting edge of AI infrastructure. The growth potential is substantial, both in terms of technical depth and leadership influence within a major technology company.
💡 Interview Preparation
Strategy Questions:
- System Design: "Design an orchestration system for managing AI model training jobs across a fleet of GPU servers. How would you handle resource allocation, scheduling, fault tolerance, and monitoring?"
- Orchestration & Lifecycle: "Describe your approach to implementing safe, automated rolling upgrades for a complex, multi-component distributed system like AIMS. What mechanisms would you employ for rollback and recovery?"
- UI & Operator Experience: "Imagine you need to build a dashboard for operators to monitor the health and status of hundreds of AI racks. What key information would you prioritize, and how would you design the UI for clarity and efficiency?"
- Automation & IaC: "Walk me through a complex infrastructure deployment you've automated using Terraform and Ansible. What were the key challenges, and how did you ensure idempotency and reliability?"
- Problem Solving: "A production AI cluster is experiencing intermittent performance degradation. As the owner of the AIMS platform, what steps would you take to diagnose and resolve the issue, considering both software and potential hardware factors?"
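For the rolling-upgrade question above, it can help to have a small mental model ready. The Python sketch below is one possible illustration, not a description of how AIMS works: `rolling_upgrade`, the `healthy` callback, the version strings, and the node names are all hypothetical. It upgrades nodes in batches, runs a health check after each batch, and restores every touched node if a check fails.

```python
from typing import Callable


def rolling_upgrade(
    nodes: list[str],
    versions: dict[str, str],
    target: str,
    healthy: Callable[[str], bool],
    batch_size: int = 2,
) -> bool:
    """Upgrade nodes batch by batch; roll back all touched nodes on failure.

    `versions` maps node name -> running version and is mutated in place.
    Returns True if every node reached `target`, False if rolled back.
    """
    previous = dict(versions)  # snapshot used for rollback
    upgraded: list[str] = []
    for start in range(0, len(nodes), batch_size):
        batch = nodes[start:start + batch_size]
        for node in batch:
            versions[node] = target
            upgraded.append(node)
        if not all(healthy(node) for node in batch):
            # Stop the rollout and restore every node touched so far.
            for node in upgraded:
                versions[node] = previous[node]
            return False
    return True


fleet = ["n1", "n2", "n3", "n4"]

state_ok = {n: "1.0" for n in fleet}
ok = rolling_upgrade(fleet, state_ok, "1.1", healthy=lambda n: True)

state_bad = {n: "1.0" for n in fleet}
failed = rolling_upgrade(fleet, state_bad, "1.1", healthy=lambda n: n != "n3")
print(ok, state_ok)
print(failed, state_bad)
```

In an interview answer, the same skeleton extends naturally to topics the question hints at, such as limiting batch size to bound the blast radius, pausing instead of rolling back, and recording each step so a recovery can resume after a control-plane restart.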
Company & Culture Questions:
- "Based on your understanding of Qualcomm's expansion in Riyadh and Vision 2030, how do you see the AIMS platform contributing to these strategic goals?"
- "Describe a situation where you had to influence stakeholders with differing technical opinions. How did you navigate that and reach a consensus?"
- "How do you approach mentoring junior or peer engineers? Can you provide an example?"
Portfolio Presentation Strategy:
- Structure: For each project, use a STAR (Situation, Task, Action, Result) or similar framework. Clearly define the problem, your role, the specific actions you took (highlighting technologies used), and the measurable outcomes.
- Visuals: Use diagrams (architecture, workflow), code snippets (well-commented), and metrics dashboards to illustrate your points effectively.
- Focus on IaC & Automation: Explicitly detail your experience with Terraform and Ansible, showcasing reusable modules, complex state management, and robust automation workflows.
- Quantify Impact: Whenever possible, present data on efficiency improvements, cost savings, reduction in deployment time, or increase in system uptime.
- Technical Depth: Be prepared to dive deep into the technical details of your projects, discussing architectural choices, trade-offs, and challenges overcome.
📝 Enhancement Note: Candidates should prepare to demonstrate not only strong technical skills in the specified tools and concepts but also the ability to think strategically, lead technically, and communicate complex ideas clearly, especially concerning the operationalization and management of AI infrastructure.
📌 Application Steps
To apply for this position:
- Submit your application through the Qualcomm Careers portal via the provided URL.
- Portfolio Customization: Curate your resume and portfolio to prominently feature projects showcasing your expertise in Node.js, Python, Terraform, Ansible, Kubernetes, and distributed systems management, especially within infrastructure or platform contexts. Prepare specific case studies demonstrating your contributions to AI infrastructure, orchestration, or lifecycle management.
- Resume Optimization: Tailor your resume to highlight achievements related to infrastructure automation, system design for scale and reliability, operator UI development, and technical leadership/mentorship. Use keywords from the job description naturally.
- Interview Preparation: Practice answering system design, coding, and behavioral questions. Be ready to present your portfolio projects with a focus on impact and technical depth. Research Qualcomm's recent AI initiatives and Saudi Arabia's Vision 2030 to frame your contributions.
- Company Research: Understand Qualcomm's business, its role in the semiconductor and AI industries, and its strategic expansion in Riyadh. Familiarize yourself with their values and engineering culture to articulate your alignment.
⚠️ Important Notice: This enhanced job description includes AI-generated insights and operations industry-standard assumptions. All details should be verified directly with the hiring organization before making application decisions.
Application Requirements
Requires a degree in Engineering or Computer Science with 6-8+ years of experience depending on the degree level. Must have 4+ years of experience in programming languages like Python, C++, or Java, along with expertise in Node.js, Terraform, and Kubernetes.