AMD
Company
AI/ML Engineer
Hyderabad, India
Job Description
WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture. We push the limits of innovation to solve the world’s most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

THE ROLE:

We are looking for an AIOps Software Development Engineer who designs and builds intelligent systems that automate IT operations using AI/ML, big data analytics, and automation tools. The role focuses on predicting incidents, reducing downtime, automating root-cause analysis, and improving overall system reliability.

KEY RESPONSIBILITIES:

1. AI/ML Engineering
Build and deploy ML models for anomaly detection, event correlation, log analysis, capacity forecasting, and predictive maintenance.
Develop real-time data pipelines for metrics, logs, traces, and alerts.
Perform feature engineering on operational data (system metrics, logs, traces, events).

2. Software Development & Automation
Design and develop automation workflows for self-healing and preventive remediation.
Build microservices, APIs, and automation platforms to integrate with monitoring tools.
Implement end-to-end CI/CD pipelines.

3. Monitoring & Observability
Integrate with tools like Nagios, Prometheus, PowerBI, Grafana, ELK/EFK, Splunk, AppDynamics, OpenTelemetry, etc.
Develop dashboards, alert systems, and visualization for operational insights.
Use distributed tracing and log aggregation to support automated analysis.

4. Incident & RCA Prediction & Fix Automation
Build ML-based correlation engines for RCA.
Develop systems to predict incidents based on patterns in logs/metrics.
Automate incident detection, ticket classification, and probable cause inference.

5. Reliability Engineering
Work with SRE teams to implement automated remediation (restart services, scale resources, patch nodes, heal containers, etc.).
Improve SLAs, SLOs, and MTTR using automation and ML insights.

PREFERRED EXPERIENCE:

Programming & Development
Python (must), Java/Go (optional)
Strong understanding of data structures & algorithms
API development (REST, gRPC)
Microservices & containerization (Docker, Kubernetes)

AI/ML Skills
Machine learning (supervised, unsupervised)
Anomaly detection algorithms
Model deployment (MLflow, SageMaker, custom APIs)

Data Engineering
Kafka, Spark, Flink, Kinesis, or similar streaming systems
SQL/NoSQL databases

DevOps & Cloud
CI/CD tools: Jenkins, GitHub Actions, GitLab CI
Cloud platforms: AWS, Azure, GCP
IaC: Terraform, Ansible

ACADEMIC CREDENTIALS:

Bachelor’s degree in Computer/Software Engineering, Computer Science, or related technical discipline

Benefits offered are described: AMD benefits at a glance.

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants’ needs under the respective laws throughout all stages of the recruitment and selection process.