Hilde Kuehne - Personal Homepage

News

Our paper on Multimodal Temperature Schedules was accepted as an oral at WACV 2026 - Big congrats to Siarhei, Anna, and everybody involved!

LeGrad and IPLoc got accepted to ICCV 2025 - Big congrats to Walid and Sivan and everybody involved!

One paper accepted to ICML 2025 - Big congrats to Lokesh, Moritz and everybody involved!

Three papers accepted to CVPR 2025 - Big congrats to Nina, Edson, and Felix and everybody involved!

3rd Workshop on What is Next in Multimodal Foundation Models? has been accepted for CVPR 2025. Check out the website

Workshop on New Frontiers in Associative Memories at ICLR 2025. Check out the website and call!

Two papers accepted to NeurIPS 2024, one as oral! - Big congrats to Felix and everybody involved!

One paper accepted to NeurIPS D&B 2024 - Big congrats to Irene and everybody involved!

Two papers accepted to ECCV 2024 - Big congrats to Nina, Anna, and Mirza and everybody involved!

Whisper-Flamingo has been accepted to Interspeech 2024 - Big congrats and kudos to Andrew!

I'll start a new position as a full professor at the Tuebingen AI Center

One workshop (incl. challenge) accepted for CVPR 2024. Check out the website and call:
- 2nd Workshop on What is Next in Multimodal Foundation Models?
- MMFM-Challenge

Two papers accepted to CVPR 2024 - Big congrats to Walid and Brian and everybody involved!

I had the pleasure of contributing my expertise to the 2024 annual report of the Commission of Experts for Research and Innovation

I'm honored to be a member of the Scientific Advisory Board of the Carl-Zeiss-Foundation and to have the chance to discuss the future of AI research with such great colleagues: Article

Walid got featured on the title page of RSIP ComputerVisionNews in January! Check out the GEM article: https://www.rsipvision.com/ComputerVisionNews-2024January/

The second edition of the Differentiable Almost Everything Workshop will be held at ICML 2024. Website and call will be available soon! (2023 Edition)

One paper accepted for ICLR 2024 - Congratulations Felix! Check it out: Uncertainty Quantification via Stable Distribution Propagation

Papers

		LeGrad: An Explainability Method for Vision Transformers via Feature Formation Sensitivity Walid Bousselham, Angie Boggust, Sofian Chaybouti, Hendrik Strobelt, Hilde Kuehne ICCV 2025 (pdf, code, website, HuggingFace)
		Teaching VLMs to Localize Specific Objects from In-context Examples (IPLoc) Sivan Doveh, Nimrod Shabtay, Wei Lin, Eli Schwartz, Hilde Kuehne, Raja Giryes, Rogerio Feris, Leonid Karlinsky, James Glass, Assaf Arbelle, Shimon Ullman, M. Jehanzeb Mirza ICCV 2025 (pdf, code)
		CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment Edson Araujo, Andrew Rouditchenko, Yuan Gong, Saurabhchand Bhati, Samuel Thomas, Brian Kingsbury, Leonid Karlinsky, Rogerio Feris, James R. Glass, Hilde Kuehne CVPR 2025 (pdf, website, code)
		Unbiasing through Textual Descriptions: Mitigating Representation Bias in Video Benchmarks Nina Shvetsova, Arsha Nagrani, Bernt Schiele, Hilde Kuehne, Christian Rupprecht CVPR 2025 (pdf, website, code)
		VideoGEM: Training-free Action Grounding in Videos Felix Vogel, Walid Bousselham, Anna Kukleva, Nina Shvetsova, Hilde Kuehne CVPR 2025 (pdf, code)
		Convolutional Differentiable Logic Gate Networks Felix Petersen, Hilde Kuehne, Christian Borgelt, Julian Welzel, Stefano Ermon NeurIPS 2024 (oral) (pdf)
		Fishers and Hessians of Continuous Relaxations Felix Petersen, Christian Borgelt, Tobias Sutter, Hilde Kuehne, Oliver Deussen, Stefano Ermon NeurIPS 2024 (pdf)
		ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs Irene Huang, Wei Lin, Muhammad Mirza, Jacob Hansen, Sivan Doveh, Victor Butoi, Roei Herzig, Assaf Arbelle, Hilde Kuehne, Trevor Darrell, Chuang Gan, Aude Oliva, Rogerio Feris, Leonid Karlinsky NeurIPS D&B 2024 (pdf, code)
		MaskInversion: Localized Embeddings via Optimization of Explainability Maps Walid Bousselham, Sofian Chaybouti, Christian Rupprecht, Vittorio Ferrari, Hilde Kuehne arxiv 2024 (pdf, code, website)
		HowToCaption: Prompting LLMs to Transform Video Annotations at Scale Nina Shvetsova, Anna Kukleva, Xudong Hong, Christian Rupprecht, Bernt Schiele, Hilde Kuehne ECCV 2024 (pdf, code)
		Meta-Prompting for Automating Zero-shot Visual Recognition with LLMs M. Jehanzeb Mirza, Leonid Karlinsky, Wei Lin, Sivan Doveh, Jakub Micorek, Mateusz Kozinski, Hilde Kuehne, Horst Possegger ECCV 2024 (pdf, website, code)
		Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation Andrew Rouditchenko, Yuan Gong, Samuel Thomas, Leonid Karlinsky, Hilde Kuehne, Rogerio Feris, James Glass Interspeech 2024 (pdf, code, YouTube Presentation, Colab)
		Grounding Everything: Emerging Localization Properties in Vision-Language Transformers Walid Bousselham, Felix Petersen, Vittorio Ferrari, Hilde Kuehne CVPR 2024 (pdf, code, HuggingFace, Colab)
		What, when, and where? - Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions Brian Chen, Nina Shvetsova, Andrew Rouditchenko, Daniel Kondermann, Samuel Thomas, Shih-Fu Chang, Rogerio Feris, James Glass, Hilde Kuehne CVPR 2024 (pdf, website, data, code coming soon)
		Uncertainty Quantification via Stable Distribution Propagation Felix Petersen, Aashwin Mishra, Hilde Kuehne, Christian Borgelt, Oliver Deussen, Mikhail Yurochkin ICLR 2024 (pdf, code coming soon)
		What a MESS: Multi-Domain Evaluation of Zero-Shot Semantic Segmentation Benedikt Blumenstiel, Johannes Jakubik, Hilde Kühne, Michael Voessing NeurIPS D&B 2023 (pdf, code)
		Learning Human Action Recognition Representations Without Real Humans Howard Zhong, Samarth Mishra, Donghyun Kim, SouYoung Jin, Rameswar Panda, Hilde Kuehne, Leonid Karlinsky, Venkatesh Saligrama, Aude Oliva, Rogerio Feris NeurIPS D&B 2023 (pdf, code)
		In-Style: Unsupervised Text-Video Retrieval with Style Preservation Nina Shvetsova, Anna Kukleva, Bernt Schiele, Hilde Kuehne ICCV 2023 (pdf, code)
		Preserving Modality Structure Improves Multi-Modal Learning Sirnam Swetha, Mamshad Nayeem Rizve, Nina Shvetsova, Hilde Kuehne, Mubarak Shah ICCV 2023 (pdf)
		MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge Wei Lin, Leonid Karlinsky, Nina Shvetsova, Horst Possegger, Mateusz Kozinski, Rameswar Panda, Rogerio Feris, Hilde Kuehne, Horst Bischof ICCV 2023 (pdf, code)
		Learning by Sorting: Self-supervised Learning with Group Ordering Constraints Nina Shvetsova, Felix Petersen, Anna Kukleva, Bernt Schiele, Hilde Kuehne ICCV 2023 (pdf, code)
		Learning Situation Hyper-Graphs for Video Question Answering Aisha Urooj, Hilde Kuehne, Bo Wu, Kim Chheu, Walid Bousselham, Chuang Gan, Niels Lobo, Mubarak Shah CVPR 2023 (pdf), (code)
		Video Test-Time Adaptation for Action Recognition Wei Lin, Muhammad Jehanzeb Mirza, Mateusz Kozinski, Horst Possegger, Hilde Kuehne, Horst Bischof CVPR 2023 (pdf), (code)
		Temperature Schedules for self-supervised contrastive methods on long-tail data Anna Kukleva, Moritz Boehle, Bernt Schiele, Hilde Kuehne, Christian Rupprecht arxiv 2022 (pdf), (code)
		ISAAC Newton: Input-based Approximate Curvature for Newton's Method Felix Petersen, Tobias Sutter, Christian Borgelt, Dongsung Huh, Hilde Kuehne, Yuekai Sun, Oliver Deussen ICLR 2023 (pdf), (code)
		Contrastive audio-visual masked autoencoder Yuan Gong, Andrew Rouditchenko, Alexander H Liu, David Harwath, Leonid Karlinsky, Hilde Kuehne, James Glass ICLR 2023 (pdf), (code)
		Deep Differentiable Logic Gate Networks Felix Petersen, Christian Borgelt, Hilde Kuehne, Oliver Deussen NeurIPS 2022 (pdf), (code)
		C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval Andrew Rouditchenko, Yung-Sung Chuang, Nina Shvetsova, Samuel Thomas, Rogerio Feris, Brian Kingsbury, Leonid Karlinsky, David Harwath, Hilde Kuehne, James Glass arxiv 2022 (pdf)
		VL-Taboo: An Analysis of Attribute-based Zero-shot Capabilities of Vision-Language Models Felix Vogel, Nina Shvetsova, Leonid Karlinsky, Hilde Kuehne arxiv 2022 (pdf)
		Differentiable top-k classification learning Felix Petersen, Hilde Kuehne, Christian Borgelt, Oliver Deussen ICML 2022 (pdf), (code)
		Augmentation Learning for Semi-Supervised Classification Tim Frommknecht, Pedro Alves Zipf, Quanfu Fan, Nina Shvetsova, Hilde Kuehne GCPR 2022 (pdf)
		CycDA: Unsupervised Cycle Domain Adaptation to Learn from Image to Video Wei Lin, Anna Kukleva, Kunyang Sun, Horst Possegger, Hilde Kuehne, Horst Bischof ECCV 2022 (pdf)
		Weakly Supervised Grounding for VQA in Vision-Language Transformers Aisha Urooj Khan, Hilde Kuehne, Chuang Gan, Niels Da Vitoria Lobo, Mubarak Shah ECCV 2022 (Oral) (pdf), (code)
		Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval Nina Shvetsova, Brian Chen, Andrew Rouditchenko, Samuel Thomas, Brian Kingsbury, Rogerio Feris, David Harwath, James Glass, Hilde Kuehne. CVPR 2022 (pdf), (code)
		Unsupervised Domain Generalization by Learning a Bridge Across Domains Sivan Harary, Eli Schwartz, Assaf Arbelle, Peter Staar, Shady Abu-Hussein, Elad Amrani, Roei Herzig, Amit Alfassy, Raja Giryes, Hilde Kuehne, Dina Katabi, Kate Saenko, Rogerio Feris, Leonid Karlinsky. CVPR 2022 (pdf), (code)
		Monotonic Differentiable Sorting Networks Felix Petersen, Christian Borgelt, Hilde Kuehne, Oliver Deussen. ICLR 2022 (pdf), (Code), (YouTube)
		Style Agnostic 3D Reconstruction via Adversarial Style Transfer Felix Petersen, Bastian Goldluecke, Oliver Deussen, Hilde Kuehne. WACV 2022 (pdf), (Code), (YouTube)
		Learning with Algorithmic Supervision via Continuous Relaxations Felix Petersen, Christian Borgelt, Hilde Kuehne, Oliver Deussen. NeurIPS 2021 (pdf), (code), (Youtube)
		Detector-Free Weakly Supervised Grounding by Separation Assaf Arbelle, Sivan Doveh, Amit Alfassy, Joseph Shtok, Guy Lev, Eli Schwartz, Hilde Kuehne, Hila Barak Levi, Prasanna Sattigeri, Rameswar Panda, Chun-Fu Chen, Alex Bronstein, Kate Saenko, Shimon Ullman, Raja Giryes, Rogerio Feris, Leonid Karlinsky. ICCV 2021 (oral) (pdf)
		Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos Brian Chen, Andrew Rouditchenko, Kevin Duarte, Hilde Kuehne, Samuel Thomas, Angie Boggust, Rameswar Panda, Brian Kingsbury, Rogerio Feris, David Harwath, James Glass, Michael Picheny, Shih-Fu Chang. ICCV 2021 (pdf), (code)
		Generalized and Incremental Few-Shot Learning by Explicit Learning and Calibration without Forgetting Anna Kukleva, Hilde Kuehne, Bernt Schiele. ICCV 2021 (pdf)
		AVLnet: Learning Audio-Visual Language Representations from Instructional Videos Andrew Rouditchenko, Angie Boggust, David Harwath, Brian Chen, Dhiraj Joshi, Samuel Thomas, Kartik Audhkhasi, Hilde Kuehne, Rameswar Panda, Rogerio Feris, Brian Kingsbury, Michael Picheny, Antonio Torralba, James Glass. Interspeech 2021 (pdf), (AVLNet code)
		Cascaded Multilingual Audio-Visual Learning from Videos Andrew Rouditchenko, Angie Boggust, David Harwath, Samuel Thomas, Hilde Kuehne, Brian Chen, Rameswar Panda, Rogerio Feris, Brian Kingsbury, Michael Picheny, James Glass. Interspeech 2021 (pdf), (code)
		Differentiable Sorting Networks for Scalable Sorting and Ranking Supervision Felix Petersen, Christian Borgelt, Hilde Kuehne, Oliver Deussen. ICML 2021 (pdf), (DiffSort code), (YouTube)
		Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules Aisha Urooj Khan, Hilde Kuehne, Kevin Duarte, Chuang Gan, Niels Lobo, Mubarak Shah. CVPR 2021 (pdf), (code)
		Unsupervised Discriminative Embedding for Sub-Action Learning in Complex Activities Sirnam Swetha, Hilde Kuehne, Yogesh S Rawat, Mubarak Shah. ICIP 2021 (pdf)
		Joint visual-temporal embedding for unsupervised learning of actions in untrimmed sequences Rosaura G VidalMata, Walter J Scheirer, Anna Kukleva, David Cox, Hilde Kuehne. WACV 2021 (pdf)
		More Is Less: Learning Efficient Video Representations by Big-Little Network and Depthwise Temporal Aggregation Quanfu Fan, Chun-Fu (Richard) Chen, Hilde Kuehne, Marco Pistoia, David Cox. NeurIPS 2019 (pdf), (code)
		Unsupervised learning of action classes with continuous temporal embedding A. Kukleva, H. Kuehne, F. Sener, J. Gall. CVPR 2019 (pdf), (code)
		A Hybrid RNN-HMM Approach for Weakly Supervised Temporal Action Segmentation Hilde Kuehne, Alexander Richard, Juergen Gall. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 2019 (open access) (pdf)
		NeuralNetwork-Viterbi: A Framework for Weakly Supervised Video Learning Alexander Richard, Hilde Kuehne, Ahsan Iqbal, Juergen Gall. CVPR 2018 (pdf), (code)
		Action Sets: Weakly Supervised Action Segmentation without Ordering Constraints Alexander Richard, Hilde Kuehne, Juergen Gall. CVPR 2018 (pdf), (bibtex), (code)
		Recurrent Residual Learning for Action Recognition Ahsan Iqbal, Alexander Richard, Hilde Kuehne, Juergen Gall. GCPR 2017 (Best Master's Award) (pdf), (bibtex)
		Weakly Supervised Action Learning with RNN based Fine-to-coarse Modeling A. Richard, H. Kuehne and J. Gall. CVPR 2017 (oral) (website & downloads)
		Weakly supervised learning of actions from transcripts H. Kuehne, A. Richard and J. Gall. CVIU 2017 (website & downloads)
		An end-to-end generative framework for video segmentation and recognition H. Kuehne, J. Gall and T. Serre. WACV 2016 (website & downloads)
		The Language of Actions: Recovering the Syntax and Semantics of Goal-Directed Human Activities H. Kuehne, A. B. Arslan and T. Serre. CVPR 2014 (Breakfast dataset: data & code)
		On-line Action Recognition from sparse Feature Flow H. Kuehne, D. Gehrig, T. Schultz, R. Stiefelhagen. VISAPP 2012 (data & annotations)
		HMDB: A Large Video Database for Human Motion Recognition H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, T. Serre. ICCV 2011 (project website)
*Visapp 2010* *Angers, France*		Motion Segmentation of Articulated Structures by Integration of Visual Perception Criteria H. Kuehne, A. Woerner. VisApp 2010 (pdf), (bibtex)
*ICCV 2009,* *Kyoto, Japan*		An Iterative Scheme for Motion-Based Scene Segmentation A. Bachmann, H. Kuehne. ICCV 2009, Workshop on Dynamical Vision (DV) (pdf), (bibtex)