#### ABSTRACT

Gated Linear Networks (GLNs) are a recently introduced family of deep neural network architectures. Their distinguishing feature is the use of hard gating and local learning to add representational power, as opposed to the more widely used combination of non-linear transfer functions and backpropagation. The interaction of local learning and gating gives rise to very different learning dynamics, whose range of applications is only just starting to be explored. GLNs can be viewed as a machine-learning-specific generalization of the PAQ family of context mixing networks, which are a key component of state-of-the-art online language/data compression models. In particular, a new form of gating, half-space gating, was proposed to deal with real-valued vector feature spaces. This formulation enjoys universality guarantees, has empirical capacity that compares favourably with deep ReLU networks, and has recently been shown to give state-of-the-art results in contextual bandit and regression applications. The key steps were to interpret context mixing networks as data-dependent linear networks, to explain the local learning procedure in terms of online convex programming that models a feature-dependent target density, to abstract and understand the role of various gating functions and their effect on representational power, and finally to show that certain forms of gating allow for universal learning. Here we will present a unified view of these architectures, discuss their strengths and current limitations, and highlight promising directions for future investigation. Joint work with Avishkar Bhoopchand, David Budden, Agnieszka Grabska-Barwinska, Marcus Hutter, Tor Lattimore, Adam Marblestone, Christopher Mattern, Eren Sezener, Peter Toth, Jianan Wang, Simon Schmitt, Greg Wayne.
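As a rough illustration of how half-space gating and local learning fit together, the following is a minimal sketch of a single gated neuron, not the authors' implementation. It assumes that inputs are probabilities mixed in logit space (as in PAQ-style context mixing), that the gate selects one weight vector from the signs of a few random hyperplane tests on a side-information vector `z`, and that the neuron is trained by a plain online gradient step on its own log loss. The class name `HalfSpaceGatedNeuron` and all parameter names are hypothetical.

```python
import numpy as np


def logit(p, eps=1e-6):
    # Clip for numerical stability before mapping probabilities to logits.
    p = np.clip(p, eps, 1.0 - eps)
    return np.log(p / (1.0 - p))


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class HalfSpaceGatedNeuron:
    """One gated linear neuron (illustrative sketch, hypothetical names)."""

    def __init__(self, n_inputs, side_dim, n_hyperplanes=2, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed random hyperplanes: they define the gate and are never trained.
        self.normals = rng.normal(size=(n_hyperplanes, side_dim))
        self.offsets = rng.normal(size=n_hyperplanes)
        # One weight vector per region carved out by the hyperplanes.
        self.weights = np.full((2 ** n_hyperplanes, n_inputs), 1.0 / n_inputs)
        self.lr = lr

    def context(self, z):
        # Hard gate: which side of each hyperplane does z fall on?
        bits = (self.normals @ z > self.offsets).astype(int)
        return int(bits @ (2 ** np.arange(bits.size)))

    def predict(self, p, z):
        # Given the gate, the neuron is linear in logit space.
        return sigmoid(self.weights[self.context(z)] @ logit(p))

    def update(self, p, z, target):
        # Local learning: a gradient step on this neuron's own log loss,
        # touching only the weight vector selected by the gate.
        c = self.context(z)
        x = logit(p)
        out = sigmoid(self.weights[c] @ x)
        self.weights[c] -= self.lr * (out - target) * x
        return out


neuron = HalfSpaceGatedNeuron(n_inputs=3, side_dim=2)
p = np.array([0.6, 0.7, 0.8])   # probabilities from upstream predictors
z = np.array([1.0, -0.5])       # side information that drives the gate
before = neuron.predict(p, z)
for _ in range(20):
    neuron.update(p, z, target=1)
after = neuron.predict(p, z)
```

Because the gate is fixed and the loss is convex in the selected weight vector, each update is an online convex optimization step that needs no gradient from any other neuron; this is the sense in which learning is local in this sketch.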