Undergrad Research Project - Visual Dialog

Fall 2017

Sam Fazel-Sarjui
José Moura

Project description

Reinforcement learning (RL), a field of machine learning, has grown considerably in popularity in recent years, and its applications are many. I hope to understand RL fundamentals and applications well enough to apply them to computer vision, where RL-based approaches have become much more common in the past few years.

Visual Dialog is an AI task grounded in computer vision. It requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a question about the image, the agent has to ground the question in the image, infer context from the history, and answer the question accurately. Visual Dialog is disentangled enough from any specific downstream task to serve as a general test of machine intelligence, while being grounded enough in vision to allow objective evaluation of individual responses and benchmarking of progress. My partner and I will work under Professor Moura and his PhD student Satwik Kottur to extend existing work on Visual Dialog, learning the fundamentals of RL and applying RL to the Visual Dialog task.
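
To make the task inputs concrete, here is a minimal Python sketch of what a single Visual Dialog query might look like. The class `VisualDialogInput`, the helper `flatten_context`, and the file name `kitchen.jpg` are hypothetical names chosen for illustration; they do not come from the Visual Dialog codebase or any library, and a real agent would also need an image encoder and an answer model, which are omitted here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class VisualDialogInput:
    """One query to a Visual Dialog agent (hypothetical container for illustration)."""
    image_path: str                  # the image the dialog is about
    history: List[Tuple[str, str]]   # earlier (question, answer) rounds
    question: str                    # the current question to answer


def flatten_context(query: VisualDialogInput) -> str:
    """Join the dialog history and the current question into one text context."""
    lines = [f"Q: {q} A: {a}" for q, a in query.history]
    lines.append(f"Q: {query.question}")
    return "\n".join(lines)


query = VisualDialogInput(
    image_path="kitchen.jpg",  # placeholder file name, not a real dataset entry
    history=[("Is anyone in the room?", "Yes, one man.")],
    question="What is he holding?",
)
context = flatten_context(query)
print(context)
```

In this example the word "he" in the current question can only be resolved from the earlier round, which is the kind of context the agent has to infer from the history.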

Note: this research may pivot at the discretion of Satwik and Professor Moura, depending on their interests.
