CENTRALE LYON - Post-doctoral position in computer science
- On-site
- Ecully, Auvergne-Rhône-Alpes, France
- €30,900 - €34,416 per year
- LIRIS - Laboratoire d'Informatique en Image et Systèmes d'Information
Job description
Heterogeneous compression of AI models
Efficient modern AI approaches rely heavily on Deep Neural Networks (DNNs), also known as deep learning. They are used in many application domains (industrial production, entertainment, security, etc.) to solve complex problems in computer vision, natural language processing, and beyond. Recent AI models are very powerful, but they contain millions or even billions of parameters, which makes them costly to train and also to use at inference time. This is why several techniques have been designed to reduce this cost, such as pruning part of the model weights [1] or reducing value precision through quantization [2].
Quantization is a widely used technique to reduce the memory footprint, computational cost and power consumption of deep neural networks by lowering the precision of weights and activations (e.g., from 32-bit floating point to 8-bit integer or even fewer bits). Traditional quantization methods, such as the GPTQ algorithm [3], tend to apply a uniform precision (bit-width) and a uniform quantization scheme across all layers or all parameters of the network. In contrast, heterogeneous quantization (also called mixed precision) means that different parts of the network (different layers, different channels, even individual parameters) can be assigned different precisions or different quantization schemes according to their sensitivity, distribution of values, or hardware needs [4]. This more fine-grained approach enables more aggressive compression (lower bits where tolerable) while preserving accuracy where it matters.
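As a purely illustrative sketch (not part of the project code), the difference between uniform and heterogeneous granularity can be shown with symmetric integer quantization in NumPy: a per-tensor variant uses one scale for the whole weight matrix, while a per-row variant assigns each row its own scale.

```python
import numpy as np

def quantize_symmetric(w, bits=8, axis=None):
    """Symmetric quantization: map w onto signed integers in [-qmax, qmax]."""
    qmax = 2 ** (bits - 1) - 1
    if axis is None:
        # per-tensor: a single scale for the whole array (uniform scheme)
        scale = np.max(np.abs(w)) / qmax
    else:
        # per-slice: one scale per row/channel (heterogeneous granularity)
        scale = np.max(np.abs(w), axis=axis, keepdims=True) / qmax
    scale = np.maximum(scale, 1e-8)  # guard against all-zero slices
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map integer codes back to floats (inverse of the scaling step)."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 16)).astype(np.float32)
q_t, s_t = quantize_symmetric(w)           # per-tensor (uniform)
q_c, s_c = quantize_symmetric(w, axis=1)   # per-row (heterogeneous)
err_t = np.abs(dequantize(q_t, s_t) - w).mean()
err_c = np.abs(dequantize(q_c, s_c) - w).mean()
```

Per-slice scales usually track the local value distribution better, which is exactly the sensitivity argument behind mixed-precision schemes.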
Despite the theoretical benefits of the heterogeneous quantization framework, a practical limitation comes from the hardware (HW) architecture used to deploy the quantized model. Indeed, HW architectures are designed to handle a fixed set of pre-defined data precisions and types (e.g., 4-bit integer, 8-bit integer, etc.), both in memory/processing-unit data transfers and in arithmetic circuits. For custom precisions (e.g., arbitrary n-bit data), conversion and cast operations therefore have to be added, which can increase the overhead of the overall implementation [5].
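The cast overhead can be made concrete with a small, hypothetical sketch: since commodity memory is byte-addressed, storing 4-bit values requires explicit software packing and unpacking, two extra steps that standard 8-bit data do not need.

```python
import numpy as np

def pack_int4(values):
    """Pack signed 4-bit values (range [-8, 7]) two per byte."""
    v = np.asarray(values, dtype=np.int8)
    assert v.size % 2 == 0, "need an even number of nibbles"
    u = (v & 0x0F).astype(np.uint8)            # keep two's-complement nibbles
    return (u[0::2] | (u[1::2] << 4)).astype(np.uint8)

def unpack_int4(packed):
    """Inverse of pack_int4: recover the signed 4-bit values."""
    lo = (packed & 0x0F).astype(np.int8)
    hi = ((packed >> 4) & 0x0F).astype(np.int8)
    # sign-extend: nibble values >= 8 encode negatives
    lo = np.where(lo >= 8, lo - 16, lo).astype(np.int8)
    hi = np.where(hi >= 8, hi - 16, hi).astype(np.int8)
    out = np.empty(packed.size * 2, dtype=np.int8)
    out[0::2], out[1::2] = lo, hi
    return out
```

On hardware with native 4-bit datapaths these conversions disappear, which is precisely the motivation for co-designing the quantization scheme and the architecture.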
The goal of this post-doc position is to investigate heterogeneous quantization of a given AI model with respect to both efficiency and trustworthiness, in order to derive the requirements for designing a custom hardware architecture.
Tasks:
1) Investigation of quantization techniques using the PyTorch framework. In this first task, the goal will be to target a "simple" Convolutional Neural Network (CNN) architecture for object recognition (e.g., MobileNet) and to explore different granularities of quantization (e.g., by layer, by filter, by parameter, …). This task will require manipulating/modifying PyTorch library functions such as Conv2d.
2) Extension of the analysis of Task 1 to small Language Models (LMs) based on Transformer architectures for classification tasks on textual data, starting with reasonably sized encoder architectures such as TinyBERT [6].
3) Based on the outcomes of Tasks 1 and 2, requirements for the design of a custom hardware architecture able to efficiently execute the quantized model will be produced. A model of this custom hardware architecture will be developed in cooperation with ongoing research activities and will make it possible to evaluate heterogeneous quantization from an energy-efficiency point of view. The metrics used will be energy per inference, energy per token and, of course, accuracy.
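A minimal sketch of the kind of experiment Task 1 involves (illustrative only; the helper name and the fixed bit-width are assumptions, not the project's actual code) is "fake quantization" of a Conv2d's weights, switching between layer-level and filter-level granularity:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def fake_quantize_conv_weights(conv, bits=8, per_channel=True):
    """Simulate weight quantization in place ("fake quantization"):
    snap weights to the integer grid, then dequantize back to float."""
    w = conv.weight                            # shape: (out_ch, in_ch, kH, kW)
    qmax = 2 ** (bits - 1) - 1
    if per_channel:
        # one scale per output channel (filter-level granularity)
        scale = w.abs().amax(dim=(1, 2, 3), keepdim=True) / qmax
    else:
        # one scale for the whole tensor (layer-level granularity)
        scale = w.abs().max() / qmax
    scale = scale.clamp(min=1e-8)              # guard against all-zero filters
    conv.weight.copy_((w / scale).round().clamp(-qmax, qmax) * scale)

conv = nn.Conv2d(3, 16, kernel_size=3)
fake_quantize_conv_weights(conv, bits=4, per_channel=True)
```

Running the quantized model on a validation set then shows how far each granularity can lower the bit-width before accuracy degrades.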
Profile:
We are seeking a postdoctoral researcher with a PhD in computer science or a closely related field, and a strong background in machine learning and deep learning. The ideal candidate should be proficient with modern frameworks and methodologies in computer vision and/or natural language processing, and capable of applying these techniques to complex, real-world problems. A solid understanding of model architectures, training strategies, and evaluation methods is expected. The candidate should also be able to understand how the computations are done at the matrix level. Experience or familiarity with model compression and optimization techniques—such as pruning, quantization, or knowledge distillation—would be a significant advantage.
Dates:
The post-doc is expected to start in April or May 2026, for a duration of 12 months.
Environment:
The post-doctoral researcher will be supervised by the LIRIS (expertise in machine learning) and INL (expertise in hardware architecture) teams in Lyon (Ecole Centrale campus). The salary will follow standard French rates.
Contacts:
Julien Velcin / LIRIS Ecole Centrale Lyon – email: julien.velcin@ec-lyon.fr
Alberto Bosio / INL - Ecole Centrale Lyon – email: alberto.bosio@ec-lyon.fr
References
[1] Hassibi, B., Stork, D. G., and Wolff, G. J. (1993). "Optimal brain surgeon and general network pruning." In: IEEE International Conference on Neural Networks. IEEE, pp. 293–299.
[2] Gupta, S., Agrawal, A., Gopalakrishnan, K., and Narayanan, P. (2015). "Deep learning with limited numerical precision." In: International Conference on Machine Learning. PMLR, pp. 1737–1746.
[3] Frantar, E., Ashkboos, S., Hoefler, T., and Alistarh, D. (2023). "GPTQ: Accurate post-training quantization for generative pre-trained transformers." In: ICLR.
[4] Micikevicius, P., Narang, S., Alben, J., Diamos, G., Elsen, E., Garcia, D., Ginsburg, B., Houston, M., Kuchaiev, O., Venkatesh, G., et al. (2018). "Mixed precision training." In: ICLR.
[5] Ali, S. B., Filip, S.-I., Sentieys, O., and Lemieux, G. (2025). "MPTorch-FPGA: A custom mixed-precision framework for FPGA-based DNN training." In: Design, Automation & Test in Europe Conference (DATE). Lyon, France, pp. 1–7. doi: 10.23919/DATE64628.2025.10993010.
[6] Jiao, X., et al. (2020). "TinyBERT: Distilling BERT for natural language understanding." In: Findings of the Association for Computational Linguistics: EMNLP 2020.
Job requirements
Required skills / qualifications________________________________
Diplomas: PhD in Computer Science (or a related field such as Computer Engineering)
Experience: deep learning
Knowledge required: mathematics of deep learning, software implementation of deep learning
Operational skills: Python programming (in particular, deep-learning libraries such as PyTorch)
Behavioural skills: ability to work effectively in a multi-disciplinary team environment
Work context / environment_______________________________
The post-doctoral researcher will be supervised by the LIRIS (expertise in machine learning) and INL (expertise in hardware architecture) teams in Lyon (Ecole Centrale campus). The recruited postdoc is expected to come to the lab in person on a daily basis. She or he will have their own desk and access to the computation facilities of the lab.
Recruitment process_______________________________
The recruitment process takes place in two stages, supervised by a recruitment committee, in accordance with Centrale Lyon's OTMR policy.
Recruitment timetable:
review of written applications: end of December 2025
interviews (possibly by videoconference): 5–15 January 2026
decision: by the end of January 2026
Selection criteria:
excellence of the profile, experience and motivation
