Visual Understanding using Analogical Learning over Qualitative Representations

Visual understanding is an important area of artificial intelligence. Many researchers have proposed novel deep learning models for it, and these models have achieved impressive results. However, deep learning models of visual understanding do not model human-like cognition and remain far from human-level ability: standard deep learning models lack the structural, relational representations humans use, and consequently lack understandability and data efficiency. Symbolic methods, by contrast, are widely used to model human-like cognition. Symbolic qualitative representations express structure naturally, are easy to interpret, and adapt and generalize to new input domains. However, some researchers argue that symbolic methods perform well below state-of-the-art models. My aim is therefore to design AI approaches that model human cognition while achieving results competitive with the state of the art. In this thesis, I focus on improving analogical learning, a symbolic machine learning approach, using novel qualitative representations across multiple visual understanding tasks.

First, I explore a visual task, sketched object recognition. To describe the geometric information of objects, I create two novel object-level encoding schemes: geon-based encoding [Biederman, 1987] and part-based encoding. I then extend the approach to real images, for which I create a hybrid architecture, the Hybrid Primal Sketch Processor (HPSP), which combines deep learning and qualitative representations to generate comprehensive and accurate descriptions of images. The HPSP is inspired by Marr's Primal Sketch [Marr, 1982] and extends it with richer semantic information. It applies deep learning for low-level perception and analogical learning over qualitative representations for high-level perception, aiming to combine the broad image coverage of deep learning models with the high data efficiency, clear understandability, and strong adaptability of analogical generalization. Specifically, the HPSP generates qualitative representations via two types of novel encoding schemes, pair-level encoding and scene-level encoding, which I use for visual relationship detection and question answering tasks. Finally, I propose a novel encoding scheme for temporal data and apply analogical learning over these representations to human action recognition.

The encoding strategies described in this thesis provide rich information for visual understanding, and experiments on these visual tasks support the claims above about analogical learning over qualitative representations.
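To make the kind of representation at stake concrete, the following is a minimal sketch of what a part-based qualitative encoding of a sketched object could look like, together with a toy structural comparison. The predicate vocabulary (cylinderShape, attached, and so on) and the relational_overlap function are illustrative assumptions, not the thesis's actual encoding schemes or matcher; real analogical matchers such as SME align whole relational structures rather than counting shared predicates.

# A minimal, hypothetical sketch of a part-based qualitative encoding.
# Each fact is a (predicate, argument, ...) tuple: unary facts describe a
# part's qualitative shape, binary facts describe qualitative spatial
# relations. The predicates are invented for illustration; they are not
# the thesis's actual vocabulary.

mug = [
    ("cylinderShape", "body"),
    ("curvedShape", "handle"),
    ("rightOf", "handle", "body"),
    ("attached", "handle", "body"),
]

cup = [
    ("cylinderShape", "body"),
    ("above", "rim", "body"),
    ("attached", "rim", "body"),
]

def relational_overlap(case_a, case_b):
    """Toy similarity: count predicate/arity pairs the two cases share,
    ignoring entity names. Real analogical matching (e.g., SME) aligns
    entire relational structures; this only shows why a relational
    encoding supports structural comparison at all."""
    def skeleton(case):
        return {(fact[0], len(fact) - 1) for fact in case}
    return len(skeleton(case_a) & skeleton(case_b))

print(relational_overlap(mug, cup))  # -> 2 (cylinderShape/1, attached/2)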
