Learning Beyond Limited Labels

By Discovering Disentangled and Compact Representations of Abstract Concepts

As most machine learning practitioners know, the primary challenge for machine learning as a field continues to be ensuring that trained learning systems are capable of generalizing beyond limited training datasets — and humans are much better at this than current state-of-the-art learning systems. As a member of the applied machine learning research group at Intuit, I have been toying with the idea of combining domain knowledge and human interactions with data-driven approaches to teach learning systems to learn more effectively.

Dr. Bengio spoke next about the tried and true strategies that encourage learning systems to “learn how the world ticks,” including equivariance/invariance, symmetries, and leveraging spatial and temporal scales. Inspired by the neuroscientific approach to understanding how human brains work, these strategies attempt to capture meaningful low-dimensional representations of the world and to maximize independence among the explanatory factors behind what a learning system observes.
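
Of these strategies, equivariance is the easiest to make concrete in a few lines of code. The PyTorch sketch below is my own toy check rather than anything shown in the talk: a convolutional layer is translation-equivariant, so shifting the input image and then encoding it gives (away from the borders) the same feature map as encoding first and shifting afterwards.

```python
# A minimal check of translation equivariance for a convolution (PyTorch).
# Layer sizes and the interior slice are arbitrary illustrative choices.
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=3, padding=1, bias=False)
img = torch.randn(1, 1, 16, 16)

shifted_img = torch.roll(img, shifts=2, dims=-1)            # translate input 2 pixels right
shift_then_conv = conv(shifted_img)                         # act on the world, then encode
conv_then_shift = torch.roll(conv(img), shifts=2, dims=-1)  # encode, then act on the features

# Away from the borders (where zero padding and the circular roll disagree),
# the two orders give the same feature map: the representation is equivariant.
print(torch.allclose(shift_then_conv[..., 3:-1], conv_then_shift[..., 3:-1], atol=1e-5))
```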

Why does capturing a meaningful low-dimensional representation allow for generalization? From a neuroscientific perspective, there is evidence that human thought is inherently low-dimensional, hierarchical, and attentional. The human brain can hold about seven items in working memory, which is small compared to the hundreds of dimensions of the internal representations used in state-of-the-art machine learning models.

Why might maximizing independence among external factors improve generalization? It could be argued that the human brain can come up with control policies that influence independent and distinct aspects of the world. For example, consider an object that we can manipulate/control — let’s say a coffee mug on a desk. One can choose to perform two independent actions on the mug, for instance: 1. turning the mug at its base by 90 degrees, and 2. filling up the mug with coffee to its rim. The outcome of this first action could be called “observing how the coffee mug looks from a 90-degree clockwise viewpoint,” while the outcome of the second is “observing the volume capacity of the mug.” These independent action-outcome pairs are maximally independent factors that explain how our perception of the coffee mug transitions from the original state (of sitting on a desk) to the two final states (turned by 90 degrees, and then filled with coffee). By mentally abstracting these action-outcome pairs, a human can generalize and predict the final states of the mug after these independent actions are taken in any sequence or combination. More importantly, humans can extend such generalization beyond coffee mugs to any object of similar physical properties.
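
As a toy illustration of this reasoning (my own sketch, with a made-up Mug class rather than anything from the talk), two actions that each control their own independent factor of the state can be applied in either order and still land in the same final state:

```python
# Two actions that touch independent attributes of the state compose in any order.
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Mug:
    angle_deg: int = 0       # orientation of the mug on the desk
    fill_level: float = 0.0  # fraction of the mug's volume filled

def rotate(mug: Mug, degrees: int) -> Mug:
    return replace(mug, angle_deg=(mug.angle_deg + degrees) % 360)

def fill(mug: Mug, fraction: float) -> Mug:
    return replace(mug, fill_level=min(1.0, mug.fill_level + fraction))

start = Mug()
# Because each action controls its own factor, the final state is the same
# no matter which order the two actions are applied in.
assert fill(rotate(start, 90), 1.0) == rotate(fill(start, 1.0), 90)
```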

The Conscious Prior

In brief, a conscious representation can be viewed as a low-dimensional combination of a few (concrete or abstract) concepts constituting a “thought.” State-of-the-art learning systems are often trained to capture a high-dimensional representation of the input space. With the Conscious Prior, there are two levels of representation: a high-dimensional “unconscious (hidden)” representation (h) and a low-dimensional “conscious” representation (c). The low dimensionality of the conscious representation serves as a strong regularization for the learning problem. Attention can be applied over the conscious and unconscious state in such a way that the learning system output/prediction synthesized from the unconscious representation maps to a simple combination in the conscious representation. We can think of attention as a model-training mechanism that encourages a learning system to discover a conscious representation in which learned concepts can be “mentally” manipulated and/or referred to compactly. It should be noted that merely encouraging lower-dimensional representation does not guarantee that the discovered/learned representation is “conscious” in the neuroscience sense.
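
To make the two-level picture a bit more concrete, here is a minimal sketch in PyTorch of how such an attention bottleneck might look. The module and its names (ConsciousBottleneck, n_concepts, k_active) are my own illustrative assumptions, not Bengio's reference implementation: attention scores pick out a handful of learned concepts from the high-dimensional hidden state h, and the resulting sparse vector plays the role of the conscious state c.

```python
# Illustrative sketch only: one way to carve a sparse, low-dimensional
# "conscious" state c out of a high-dimensional hidden state h.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConsciousBottleneck(nn.Module):
    """Select a few 'conscious' concepts out of a high-dimensional hidden state."""

    def __init__(self, hidden_dim=512, n_concepts=64, k_active=4):
        super().__init__()
        self.scores = nn.Linear(hidden_dim, n_concepts)  # how relevant each concept is to h
        self.values = nn.Linear(hidden_dim, n_concepts)  # the content of each concept
        self.k_active = k_active                         # only a handful of items "in mind"

    def forward(self, h):
        scores = self.scores(h)                                      # (batch, n_concepts)
        top_idx = scores.topk(self.k_active, dim=-1).indices         # keep the k most relevant
        keep = torch.zeros_like(scores).scatter_(-1, top_idx, 1.0)   # 1 for kept concepts
        attn = F.softmax(scores.masked_fill(keep == 0, float("-inf")), dim=-1)
        return attn * self.values(h)  # sparse "thought": non-zero only for attended concepts

h = torch.randn(8, 512)           # unconscious state, e.g. the output of a large encoder
c = ConsciousBottleneck()(h)      # conscious state c
print((c != 0).sum(dim=-1))       # -> 4 active concepts per example
```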

The Independent Controllable Explanatory Factors

The idea behind Independent Controllable Explanatory Factors is that by acting in the real world, a learning system (or agent) can learn “disentangled representations.” Because an agent can act on a particular aspect of the environment while leaving others unaffected, the learned internal representation must be able to distinguish perturbations coming from independent factors. As in the example of manipulating the coffee mug, an agent learns to associate its actions in the environment with internal representations that are unique to the action and the aspect of the environment being acted upon (i.e., Independent Controllable Explanatory Factors).
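
One way this intuition can be turned into a training signal is a “selectivity” score: for each policy, measure how much of the total change it causes in the learned factors is concentrated in its own factor. The snippet below is a rough sketch of that idea under my own simplifying assumptions (a factor encoder that returns one scalar per factor, and a single state transition rather than an expectation over trajectories), so it should not be read as the exact objective from the paper.

```python
# Rough sketch of a selectivity signal for independently controllable factors.
import torch

def selectivity(f_before, f_after, k, eps=1e-8):
    """Fraction of the total change between two states explained by factor k.

    f_before, f_after: (n_factors,) factor encodings of the state before and
    after acting with policy k. A value near 1 means policy k moved only its
    own factor and left the others alone.
    """
    change = (f_after - f_before).abs()
    return change[k] / (change.sum() + eps)

# Toy check: a policy that only moves factor 2 is maximally selective.
f0 = torch.tensor([0.5, 1.0, 0.0])
f1 = torch.tensor([0.5, 1.0, 2.0])
print(selectivity(f0, f1, k=2))   # ~1.0
```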

In closing, here are my thoughts on the two priors. Reflecting on my understanding of the Conscious Prior, the idea of using a lower-dimensional conscious space to represent a “concept” seems to correspond well to the current understanding of how thoughts occur in human brains. This type of training should lead to models that are not only more easily trainable and generalizable, but also more explainable (e.g., model decisions explained by the activation of a few meaningful concepts rather than the activation of a large sparse array of neural weights). Having fewer, more coherent explanatory factors, and making those factors maximally independent of one another, makes their effects easier to tease apart. I see a parallel between Independent Controllable Explanatory Factors and the idea of learning with equivariance, which Dr. Bengio pointed out has led to the discovery of good representations in many visual learning tasks. Notably, not every aspect of an environment, such as its color, is something an agent can act upon (i.e., controllable). So there must be other strategies the human brain uses to learn such concepts in generalizable ways.

How did Bengio’s group measure the tendency of complex models to latch on to surface statistical regularities rather than generalizable features?

Joy Rimchala is a Data Scientist in Applied Machine Learning Research in Intuit’s Technology Futures Group. She has implemented a synthetic data approach to overcome the scarcity of labeled data in computer vision and natural language settings, and has built model pipelines with TensorFlow, PyTorch, and AWS SageMaker. Joy is currently leading an initiative on information extraction from images of structured documents using ideas from computer vision, natural language models, and representation learning. Joy holds a PhD from MIT, where she spent five years doing biological object tracking experiments and modeling them using Markov Decision Processes.

First and foremost, I would like to thank Heather White for her coaching and constructive feedback. Heather’s comments and suggestions brought much clarity and readability to the article. I would also like to thank Alex Gude, Andrew Mattarella-Micke, Conrad De Peuter, Riley Edmunds, Sricharan Kumar, Sumayah Rahman, and Yang Li for advice and feedback on the technical aspects of the article.
