Senior Quantization Engineer - Edge AI

NXP Semiconductors


Date: 3 weeks ago
City: Hyderabad
Contract type: Full time

Senior Quantization Engineer - Edge AI Model Optimization


We at NXP have an environment that fosters innovation. Our team has technology experts who understand the big picture and mentors who coach passionate professionals to work on the most exciting challenges. We share responsibilities in everything we do, where every point of view is valued. Join us!

Job Summary

We are seeking a highly skilled Edge AI Engineer/Scientist with a strong theoretical foundation in AI and solid software engineering expertise to contribute to our Edge AI Model Optimization program. While the primary focus of this role is on model quantization, the scope also includes complementary optimization strategies such as speculative decoding, pruning, and other methods for ensuring highly efficient on-device deployment.

You will work at the forefront of innovation, bridging the gap between research and practice and focusing on optimizing CNNs, Large Language Models (LLMs), and Vision Language Models (VLMs) to bring advanced capabilities to NXP’s Ara2 family of NPUs, directly supporting the future of on‑device intelligence.

If you want to shape the future of efficient on‑device AI, this is the place to be.

Job Responsibilities

Research: Actively survey the latest research (e.g., NeurIPS, ICLR, CVPR) on model optimization and compression, with a particular focus on neural network quantization as well as techniques such as speculative decoding and pruning.
Prototyping: Adapt state-of-the-art methods to NXP’s hardware constraints and build proofs of concept (POCs) that demonstrate their effectiveness on NXP hardware.
Production Implementation: Translate research prototypes into robust, optimized production code (C++/Python), ensuring strict memory and compute efficiency standards.
Systems Integration: Document algorithmic tradeoffs, derive deployment recipes, and mentor the engineering team on numerical methods and optimization.
Cross-Functional Leadership: Act as the technical bridge between AI Research, Hardware Engineering, and other teams, providing quantified guidance on how design choices affect model accuracy and performance.
IP Generation: Contribute to NXP’s intellectual property portfolio through patents and technical publications.

Job Qualifications

Required Background

Education: MSc in Computer Science, Electrical Engineering, or Mathematics with a focus on Machine Learning or Deep Learning; a Ph.D. is a plus.
AI Expertise: Proven practical experience in AI/ML with a deep understanding of CNN architectures and Generative AI (Transformers, LLMs, VLMs, etc.).
Technical Stack: Strong hands-on experience with PyTorch, ONNX, and model conversion/optimization pipelines.
Software Engineering: Proficiency in Python and C++, with a solid grasp of software development best practices.
Embedded Mindset: Familiarity with the constraints of embedded systems (latency, power, memory bandwidth) and how code interacts with underlying hardware.

Preferred

Advanced AI: Experience with state-of-the-art quantization techniques for discriminative and generative AI (e.g., GPTQ, SpinQuant).
Hardware Acceleration: Experience with NPUs, device-level profiling, and diagnosing memory bottlenecks.
Kernel Development: Experience with custom kernel development is a plus.
Compilers: Knowledge of MLIR or TVM is a significant plus.


Similar jobs

TECHNICAL LEAD

Coforge Ltd., Hyderabad
1 day ago
We are looking for a highly skilled Full Stack Developer to design, develop, and maintain scalable web applications. The ideal candidate will have hands-on experience across frontend and backend technologies, strong problem-solving capabilities, and experience working in a fast-paced Agile environment. Key Responsibilities: Design, develop, and maintain end-to-end web applications across frontend and backend layers; Build scalable APIs...

consultant

Tata Consultancy Services, Hyderabad
4 days ago
Location Hyderabad Job Function CONSULTANCY Role Consultant Job Id 417578 Desired Skills SAP AII Desired Candidate Profile Qualifications : BACHELOR OF ENGINEERING, BACHELOR OF TECHNOLOGY

Network Engineer

Tata Consultancy Services, Hyderabad
5 days ago
Network Wireless LAN - Cisco and Aruba Location Hyderabad Job Function IT INFRASTRUCTURE SERVICES Role Consultant Job Id 417375 Desired Skills Cisco Network Desired Candidate Profile Qualifications : BACHELOR OF ENGINEERING