Position Overview
We are hiring an AI Engineer to build, fine-tune, deploy, and scale large language model–based systems. The role focuses on LLM optimization, backend API development, and MLOps, including RAG pipelines, efficient model serving, and automated evaluation. You’ll work on taking LLMs from experimentation to production-ready, scalable AI solutions.
ShyftLabs is a growing data product company founded in early 2020 that works primarily with Fortune 500 companies. We deliver digital solutions that help accelerate business growth across industries, focusing on creating value through innovation.
Job Responsibilities:
- Design and implement traditional ML and LLM-based systems and applications
- Optimize model inference performance and cost efficiency
- Fine-tune foundation models for specific use cases and domains
- Implement diverse prompt engineering strategies
- Build robust backend infrastructure for AI-powered applications
- Implement and maintain MLOps pipelines for AI lifecycle management
- Design and implement comprehensive monitoring and evaluation systems for traditional ML and LLM models
- Develop automated testing frameworks for model quality and performance tracking
Basic Qualifications:
4–8 years of relevant experience in LLMs, Backend Engineering, and MLOps.

LLM Expertise
- Model Fine-tuning: Experience with parameter-efficient fine-tuning methods (LoRA, QLoRA, adapter layers)
- Inference Optimization: Knowledge of quantization, pruning, caching strategies, and serving optimizations
- Prompt Engineering: Prompt design, few-shot learning, chain-of-thought prompting, and retrieval-augmented generation (RAG)
- Model Evaluation: Experience with AI evaluation frameworks and metrics for different use cases
- Monitoring & Testing: Design of automated evaluation pipelines, A/B testing for models, and continuous monitoring systems

Backend Engineering
- Languages: Proficiency in Python, with experience in FastAPI, Flask, or similar frameworks
- APIs: Design and implementation of RESTful APIs and real-time systems
- Databases: Experience with vector databases and traditional databases
- Cloud Platforms: AWS, GCP, or Azure, with a focus on ML services

MLOps & Infrastructure
- Deployment: Experience with model serving frameworks (vLLM, SGLang, TensorRT)
- Containerization: Docker and Kubernetes for ML workloads
- Monitoring: ML model monitoring, performance tracking, and alerting systems
- Evaluation Systems: Building automated evaluation pipelines with custom metrics and benchmarks
- CI/CD: MLOps pipelines for automated testing and deployment
- Orchestration: Experience with workflow tools like Airflow
Preferred Qualifications:
- LLM Frameworks: Hands-on experience with Transformers, LangChain, LlamaIndex, or similar
- Monitoring Platforms: Knowledge of LLM-specific monitoring tools and general ML monitoring
- Distributed Training and Inference: Experience with multi-GPU and distributed training and inference setups
- Model Compression: Knowledge of techniques like distillation, quantization, and efficient architectures
- Production Scale: Experience deploying models that meet high-throughput, low-latency requirements
- Research Background: Familiarity with recent LLM research and the ability to implement novel techniques

Tools & Technologies We Use
- Frameworks: PyTorch, Transformers, TensorFlow
- Serving: vLLM, TensorRT-LLM, SGLang, OpenAI API
- Infrastructure: Kubernetes, Docker, AWS/GCP
- Databases: PostgreSQL, Redis, vector DBs

We are proud to offer a competitive salary alongside a strong insurance package. We pride ourselves on the growth of our employees, offering extensive learning and development resources.