The ability to extend a model beyond the domain of its training data is central to building robust computer vision models. Methods for handling unseen test distributions or biased training data often require additional image data, whereas linguistic knowledge of the task and potential domain shifts is far cheaper and easier to obtain. In this talk, I will present three recent works, each exploring a different way to improve accuracy from language advice and incomplete training data using large-scale vision-and-language models.
Lisa Dunlap (University of California, Berkeley)
PhD student
Lisa Dunlap is a second-year PhD student at UC Berkeley working on understanding large datasets and improving robustness in vision systems with language. Recently, her focus has been on using diffusion-based image editing to improve accuracy in fine-grained classification and on summarizing differences between image datasets with text. She is co-advised by Trevor Darrell in BAIR and Joseph Gonzalez in the Sky Computing Lab.