Hilde Kuehne - Personal Homepage

News

Two papers accepted to ECCV - Check out DEX-AR! The second paper, "Decompose, Compare, and Decide", will be on arxiv soon!

The 5th Workshop on "What is Next in Multimodal Foundation Models?" will take place at CVPR 2026. Check out the program!

I'll give three workshop talks at CVPR - Check out the Urvis Workshop, the 2nd VideoLLM Workshop, and BigMAC.

Four papers accepted to CVPR - Check out VOLD, SigLino, VisualOverload, and TTRV.

M2SVid got accepted to 3DV - Big congrats to Nina and everybody involved!

MaskInversion got accepted to ICLR - Big congrats to Walid and everybody involved!

Our paper on Multimodal Temperature Schedules got accepted as an Oral at WACV 2026 - Big congrats to Siarhei, Anna, and everybody involved!

Papers

		TTA-Vid: Generalized Test-Time Adaptation for Video Reasoning Soumya Shamarao Jahagirdar, Edson Araujo, Anna Kukleva, M. Jehanzeb Mirza, Saurabhchand Bhati, Samuel Thomas, Brian Kingsbury, Rogerio Feris, James R. Glass, Hilde Kuehne arxiv 2026 (pdf)
		AVRT: Audio-Visual Reasoning Transfer through Single-Modality Teachers Edson Araujo, Saurabhchand Bhati, M. Jehanzeb Mirza, Brian Kingsbury, Samuel Thomas, Rogerio Feris, James R. Glass, Hilde Kuehne arxiv 2026 (pdf, website)
		Decompose, Compare, and Decide: Multimodal LLMs are Implicit Few-Shot Learners Yunhan Wang, Eshika Khandelwal, Edson Araujo, Walid Bousselham, Nina Shvetsova, Hilde Kuehne ECCV 2026 (coming soon)
		DEX-AR: A Dynamic Explainability Method for Autoregressive Vision-Language Models Walid Bousselham, Angie Boggust, Hendrik Strobelt, Hilde Kuehne ECCV 2026 (pdf, website)
		VOLD: Reasoning Transfer from LLMs to Vision-Language Models via On-Policy Distillation Walid Bousselham, Hilde Kuehne, Cordelia Schmid CVPR 2026 (pdf, website)
		SigLino: Efficient Multi-Teacher Distillation for Agglomerative Vision Foundation Models Sofian Chaybouti, Sanath Narayan, Yasser Dahou, Phúc H. Lê Khac, Ankit Singh, Ngoc Dung Huynh, Wamiq Reyaz Para, Hilde Kuehne, Hakim Hacid CVPR 2026 (Spotlight + Best Paper Award at A2A-MML Workshop) (pdf, code, website)
		TTRV: Test-Time Reinforcement Learning for Vision Language Models Akshit Singh, Shyam Marjit, Wei Lin, Paul Gavrikov, Serena Yeung-Levy, Hilde Kuehne, Rogerio Feris, Sivan Doveh, James Glass, M. Jehanzeb Mirza CVPR 2026 (pdf, code, website)
		VisualOverload: Probing Visual Understanding of VLMs in Really Dense Scenes Paul Gavrikov, Wei Lin, M. Jehanzeb Mirza, Soumya Jahagirdar, Muhammad Huzaifa, Sivan Doveh, Serena Yeung-Levy, James Glass, Hilde Kuehne CVPR 2026 (pdf, code, website)
		MaskInversion: Localized Embeddings via Optimization of Explainability Maps Walid Bousselham, Sofian Chaybouti, Christian Rupprecht, Vittorio Ferrari, Hilde Kuehne ICLR 2026 (pdf, code, website)
		M2SVid: End-to-End Inpainting and Refinement for Monocular-to-Stereo Video Conversion Nina Shvetsova, Goutam Bhat, Prune Truong, Hilde Kuehne, Federico Tombari 3DV 2026 (pdf, website)
		MM-TS: Multi-Modal Temperature and Margin Schedules for Contrastive Learning with Long-Tail Data Siarhei Sheludzko, Dhimitrios Duka, Bernt Schiele, Hilde Kuehne, Anna Kukleva WACV 2026 (Oral) (pdf, code)
		LeGrad: An Explainability Method for Vision Transformers via Feature Formation Sensitivity Walid Bousselham, Angie Boggust, Sofian Chaybouti, Hendrik Strobelt, Hilde Kuehne ICCV 2025 (pdf, code, website, HuggingFace)
		Teaching VLMs to Localize Specific Objects from In-context Examples (IPLoc) Sivan Doveh, Nimrod Shabtay, Wei Lin, Eli Schwartz, Hilde Kuehne, Raja Giryes, Rogerio Feris, Leonid Karlinsky, James Glass, Assaf Arbelle, Shimon Ullman, M. Jehanzeb Mirza ICCV 2025 (pdf, code)
		Canonical rank adaptation: An efficient fine-tuning strategy for vision transformers Lokesh Veeramacheneni, Moritz Wolter, Hilde Kuehne, Juergen Gall ICML 2025 (pdf, code)
		CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment Edson Araujo, Andrew Rouditchenko, Yuan Gong, Saurabhchand Bhati, Samuel Thomas, Brian Kingsbury, Leonid Karlinsky, Rogerio Feris, James R. Glass, Hilde Kuehne CVPR 2025 (pdf, website, code)
		Unbiasing through Textual Descriptions: Mitigating Representation Bias in Video Benchmarks Nina Shvetsova, Arsha Nagrani, Bernt Schiele, Hilde Kuehne, Christian Rupprecht CVPR 2025 (pdf, website, code)
		VideoGEM: Training-free Action Grounding in Videos Felix Vogel, Walid Bousselham, Anna Kukleva, Nina Shvetsova, Hilde Kuehne CVPR 2025 (pdf, code)
		Convolutional Differentiable Logic Gate Networks Felix Petersen, Hilde Kuehne, Christian Borgelt, Julian Welzel, Stefano Ermon NeurIPS 2024 (oral) (pdf)
		Fishers and Hessians of Continuous Relaxations Felix Petersen, Christian Borgelt, Tobias Sutter, Hilde Kuehne, Oliver Deussen, Stefano Ermon NeurIPS 2024 (pdf)
		ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs Irene Huang, Wei Lin, Muhammad Mirza, Jacob Hansen, Sivan Doveh, Victor Butoi, Roei Herzig, Assaf Arbelle, Hilde Kuehne, Trevor Darrell, Chuang Gan, Aude Oliva, Rogerio Feris, Leonid Karlinsky NeurIPS D&B 2024 (pdf, code)
		HowToCaption: Prompting LLMs to Transform Video Annotations at Scale Nina Shvetsova, Anna Kukleva, Xudong Hong, Christian Rupprecht, Bernt Schiele, Hilde Kuehne ECCV 2024 (pdf, code)
		Meta-Prompting for Automating Zero-shot Visual Recognition with LLMs M. Jehanzeb Mirza, Leonid Karlinsky, Wei Lin, Sivan Doveh, Jakub Micorek, Mateusz Kozinski, Hilde Kuehne, Horst Possegger ECCV 2024 (pdf, website, code)
		Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation Andrew Rouditchenko, Yuan Gong, Samuel Thomas, Leonid Karlinsky, Hilde Kuehne, Rogerio Feris, James Glass Interspeech 2024 (pdf, code, YouTube Presentation, Colab)
		Grounding Everything: Emerging Localization Properties in Vision-Language Transformers Walid Bousselham, Felix Petersen, Vittorio Ferrari, Hilde Kuehne CVPR 2024 (pdf, code, HuggingFace, Colab)
		What, when, and where? - Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions Brian Chen, Nina Shvetsova, Andrew Rouditchenko, Daniel Kondermann, Samuel Thomas, Shih-Fu Chang, Rogerio Feris, James Glass, Hilde Kuehne CVPR 2024 (pdf, website, data, code coming soon)
		Uncertainty Quantification via Stable Distribution Propagation Felix Petersen, Aashwin Mishra, Hilde Kuehne, Christian Borgelt, Oliver Deussen, Mikhail Yurochkin ICLR 2024 (pdf, code coming soon)
		What a MESS: Multi-Domain Evaluation of Zero-Shot Semantic Segmentation Benedikt Blumenstiel, Johannes Jakubik, Hilde Kühne, Michael Voessing NeurIPS D&B 2023 (pdf, code)
		Learning Human Action Recognition Representations Without Real Humans Howard Zhong, Samarth Mishra, Donghyun Kim, SouYoung Jin, Rameswar Panda, Hilde Kuehne, Leonid Karlinsky, Venkatesh Saligrama, Aude Oliva, Rogerio Feris NeurIPS D&B 2023 (pdf, code)
		In-Style: Unsupervised Text-Video Retrieval with Style Preservation Nina Shvetsova, Anna Kukleva, Bernt Schiele, Hilde Kuehne ICCV 2023 (pdf, code)
		Preserving Modality Structure Improves Multi-Modal Learning Sirnam Swetha, Mamshad Nayeem Rizve, Nina Shvetsova, Hilde Kuehne, Mubarak Shah ICCV 2023 (pdf)
		MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge Wei Lin, Leonid Karlinsky, Nina Shvetsova, Horst Possegger, Mateusz Kozinski, Rameswar Panda, Rogerio Feris, Hilde Kuehne, Horst Bischof ICCV 2023 (pdf, code)
		Learning by Sorting: Self-supervised Learning with Group Ordering Constraints Nina Shvetsova, Felix Petersen, Anna Kukleva, Bernt Schiele, Hilde Kuehne ICCV 2023 (pdf, code)
		Learning Situation Hyper-Graphs for Video Question Answering Aisha Urooj, Hilde Kuehne, Bo Wu, Kim Chheu, Walid Bousselham, Chuang Gan, Niels Lobo, Mubarak Shah CVPR 2023 (pdf), (code)
		Video Test-Time Adaptation for Action Recognition Wei Lin, Muhammad Jehanzeb Mirza, Mateusz Kozinski, Horst Possegger, Hilde Kuehne, Horst Bischof CVPR 2023 (pdf), (code)
		Temperature Schedules for self-supervised contrastive methods on long-tail data Anna Kukleva, Moritz Boehle, Bernt Schiele, Hilde Kuehne, Christian Rupprecht arxiv 2022 (pdf), (code)
		ISAAC Newton: Input-based Approximate Curvature for Newton's Method Felix Petersen, Tobias Sutter, Christian Borgelt, Dongsung Huh, Hilde Kuehne, Yuekai Sun, Oliver Deussen ICLR 2023 (pdf), (code)
		Contrastive audio-visual masked autoencoder Yuan Gong, Andrew Rouditchenko, Alexander H Liu, David Harwath, Leonid Karlinsky, Hilde Kuehne, James Glass ICLR 2023 (pdf), (code)
		Deep Differentiable Logic Gate Networks Felix Petersen, Christian Borgelt, Hilde Kuehne, Oliver Deussen NeurIPS 2022 (pdf), (code)
		C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval Andrew Rouditchenko, Yung-Sung Chuang, Nina Shvetsova, Samuel Thomas, Rogerio Feris, Brian Kingsbury, Leonid Karlinsky, David Harwath, Hilde Kuehne, James Glass arxiv 2022 (pdf)
		VL-Taboo: An Analysis of Attribute-based Zero-shot Capabilities of Vision-Language Models Felix Vogel, Nina Shvetsova, Leonid Karlinsky, Hilde Kuehne arxiv 2022 (pdf)
		Differentiable top-k classification learning Felix Petersen, Hilde Kuehne, Christian Borgelt, Oliver Deussen ICML 2022 (pdf), (code)
		Augmentation Learning for Semi-Supervised Classification Tim Frommknecht, Pedro Alves Zipf, Quanfu Fan, Nina Shvetsova, Hilde Kuehne GCPR 2022 (pdf)
		CycDA: Unsupervised Cycle Domain Adaptation to Learn from Image to Video Wei Lin, Anna Kukleva, Kunyang Sun, Horst Possegger, Hilde Kuehne, Horst Bischof ECCV 2022 (pdf)
		Weakly Supervised Grounding for VQA in Vision-Language Transformers Aisha Urooj Khan, Hilde Kuehne, Chuang Gan, Niels Da Vitoria Lobo, Mubarak Shah ECCV 2022 (Oral) (pdf), (code)
		Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval Nina Shvetsova, Brian Chen, Andrew Rouditchenko, Samuel Thomas, Brian Kingsbury, Rogerio Feris, David Harwath, James Glass, Hilde Kuehne. CVPR 2022 (pdf), (code)
		Unsupervised Domain Generalization by Learning a Bridge Across Domains Sivan Harary, Eli Schwartz, Assaf Arbelle, Peter Staar, Shady Abu-Hussein, Elad Amrani, Roei Herzig, Amit Alfassy, Raja Giryes, Hilde Kuehne, Dina Katabi, Kate Saenko, Rogerio Feris, Leonid Karlinsky. CVPR 2022 (pdf), (code)
		Monotonic Differentiable Sorting Networks Felix Petersen, Christian Borgelt, Hilde Kuehne, Oliver Deussen. ICLR 2022 (pdf), (Code), (YouTube)
		Style Agnostic 3D Reconstruction via Adversarial Style Transfer Felix Petersen, Bastian Goldluecke, Oliver Deussen, Hilde Kuehne. WACV 2022 (pdf), (Code), (YouTube)
		Learning with Algorithmic Supervision via Continuous Relaxations Felix Petersen, Christian Borgelt, Hilde Kuehne, Oliver Deussen. NeurIPS 2021 (pdf), (code), (Youtube)
		Detector-Free Weakly Supervised Grounding by Separation Assaf Arbelle, Sivan Doveh, Amit Alfassy, Joseph Shtok, Guy Lev, Eli Schwartz, Hilde Kuehne, Hila Barak Levi, Prasanna Sattigeri, Rameswar Panda, Chun-Fu Chen, Alex Bronstein, Kate Saenko, Shimon Ullman, Raja Giryes, Rogerio Feris, Leonid Karlinsky. ICCV 2021 (oral) (pdf)
		Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos Brian Chen, Andrew Rouditchenko, Kevin Duarte, Hilde Kuehne, Samuel Thomas, Angie Boggust, Rameswar Panda, Brian Kingsbury, Rogerio Feris, David Harwath, James Glass, Michael Picheny, Shih-Fu Chang. ICCV 2021 (pdf), (code)
		Generalized and Incremental Few-Shot Learning by Explicit Learning and Calibration without Forgetting Anna Kukleva, Hilde Kuehne, Bernt Schiele. ICCV 2021 (pdf)
		AVLnet: Learning Audio-Visual Language Representations from Instructional Videos Andrew Rouditchenko, Angie Boggust, David Harwath, Brian Chen, Dhiraj Joshi, Samuel Thomas, Kartik Audhkhasi, Hilde Kuehne, Rameswar Panda, Rogerio Feris, Brian Kingsbury, Michael Picheny, Antonio Torralba, James Glass. Interspeech 2021 (pdf), (AVLNet code)
		Cascaded Multilingual Audio-Visual Learning from Videos Andrew Rouditchenko, Angie Boggust, David Harwath, Samuel Thomas, Hilde Kuehne, Brian Chen, Rameswar Panda, Rogerio Feris, Brian Kingsbury, Michael Picheny, James Glass. Interspeech 2021 (pdf), (code)
		Differentiable Sorting Networks for Scalable Sorting and Ranking Supervision Felix Petersen, Christian Borgelt, Hilde Kuehne, Oliver Deussen. ICML 2021 (pdf), (DiffSort code), (YouTube)
		Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules Aisha Urooj Khan, Hilde Kuehne, Kevin Duarte, Chuang Gan, Niels Lobo, Mubarak Shah. CVPR 2021 (pdf), (code)
		Unsupervised Discriminative Embedding for Sub-Action Learning in Complex Activities Sirnam Swetha, Hilde Kuehne, Yogesh S Rawat, Mubarak Shah. ICIP 2021 (pdf)
		Joint visual-temporal embedding for unsupervised learning of actions in untrimmed sequences Rosaura G VidalMata, Walter J Scheirer, Anna Kukleva, David Cox, Hilde Kuehne. WACV 2021 (pdf)
		More Is Less: Learning Efficient Video Representations by Big-Little Network and Depthwise Temporal Aggregation Quanfu Fan, Chun-Fu (Richard) Chen, Hilde Kuehne, Marco Pistoia, David Cox. NeurIPS 2019 (pdf), (code)
		Unsupervised learning of action classes with continuous temporal embedding A. Kukleva, H. Kuehne, F. Sener, J. Gall. CVPR 2019 (pdf), (code)
		A Hybrid RNN-HMM Approach for Weakly Supervised Temporal Action Segmentation Hilde Kuehne, Alexander Richard, Juergen Gall. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 2019 (open access) (pdf)
		NeuralNetwork-Viterbi: A Framework for Weakly Supervised Video Learning Alexander Richard, Hilde Kuehne, Ahsan Iqbal, Juergen Gall. CVPR 2018 (pdf), (code)
		Action Sets: Weakly Supervised Action Segmentation without Ordering Constraints Alexander Richard, Hilde Kuehne, Juergen Gall. CVPR 2018 (pdf), (bibtex), (code)
		Recurrent Residual Learning for Action Recognition Ahsan Iqbal, Alexander Richard, Hilde Kuehne, Juergen Gall. GCPR 2017 (Best Master's Award) (pdf), (bibtex)
		Weakly Supervised Action Learning with RNN based Fine-to-coarse Modeling A. Richard, H. Kuehne and J. Gall. CVPR 2017 (oral) (website & downloads)
		Weakly supervised learning of actions from transcripts H. Kuehne, A. Richard and J. Gall. CVIU 2017 (website & downloads)
		An end-to-end generative framework for video segmentation and recognition H. Kuehne, J. Gall and T. Serre. WACV 2016 (website & downloads)
		The Language of Actions: Recovering the Syntax and Semantics of Goal-Directed Human Activities H. Kuehne, A. B. Arslan and T. Serre. CVPR 2014 (Breakfast dataset: data & code)
		On-line Action Recognition from sparse Feature Flow H. Kuehne, D. Gehrig, T. Schultz, R. Stiefelhagen. VISAPP 2012 (data & annotations)
		HMDB: A Large Video Database for Human Motion Recognition H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, T. Serre. ICCV 2011 (project website)
		Motion Segmentation of Articulated Structures by Integration of Visual Perception Criteria H. Kuehne, A. Woerner. VISAPP 2010, Angers, France (pdf), (bibtex)
		An Iterative Scheme for Motion-Based Scene Segmentation A. Bachmann, H. Kuehne. ICCV 2009, Workshop on Dynamical Vision (DV), Kyoto, Japan (pdf), (bibtex)