This project concerns the acquisition, learning, and representation of visual knowledge. It seeks to build systems that exhibit visual common sense and can integrate visual information with textual information to improve knowledge acquisition and reasoning. Problems it has tackled or is tackling include predicting the dynamics of objects in static images (Newtonian Neural Network), visually learning concepts such as 'chair' (Levan), visually verifying relation phrases such as 'does the dog eat ice cream' (VisKE), and visually recognizing daily human activities (Charades).