Alvin Zhang
About
I am currently a research engineer at Matician, working on perception for autonomous robots. Over the past summer, I worked with Dr. Bruno Olshausen in the Redwood Center for Theoretical Neuroscience at UC Berkeley. Previously, I received a B.Sc. in Electrical Engineering and Computer Science from UC Berkeley.

Goals
I am seeking a Ph.D. in Computer Vision or Robotics.

Research Interests
My goal is to build robust, intelligent autonomous systems.
After receiving my Bachelor’s from UC Berkeley, I spent two years in industry developing algorithms for real-world robots. To my surprise, across several applications, I found that classical methods were preferable and that learning-based approaches should be used only as a last resort. Why is this? Shouldn’t a system that learns be more robust?

The hallmark of intelligence is the ability to adapt. During training, neural networks adjust their weights to improve their performance through repeated exposure to examples and feedback. At test time, however, they lose this learning capability: a neural network presented with a scenario outside its training distribution may perform arbitrarily poorly. Worse, it provides no signal of this poor performance. This lack of robustness has catastrophic consequences for real-world robots, and it necessitates that any deep-learning-based module be carefully isolated from the rest of the system. This analysis has led me to the following proposal:

PROPOSAL: If we are to trust a learning-based module in a real-world application, then it must have a mechanism to evaluate its performance and improve its behavior, regardless of the input.

To achieve this, I propose to enforce self-consistency in learned systems, not just during training but also at run-time. This not only provides a measure of confidence in the robot’s internal representation of the world, but is also a natural framework for continual learning. My approach is a natural extension of self-supervised representation learning, and it can leverage exactly the same techniques. However, by removing the dichotomy between “train-time” and “test-time”, it enables on-line system evaluation and continual learning.

For example, a photometric loss can be used to supervise training for stereo matching or optical flow. This is the end of the story for current “deep” approaches: at test time, a red pixel can still be matched to a green pixel with high confidence. Instead, by using the photometric loss at test time, the robot gets a signal that the prediction is incorrect, which can then be used to refine the prediction, adapt behavior, and supervise further training (a minimal sketch of this idea appears at the bottom of this page).

More generally, the theory of “Perception as Inference” casts perceptual inference as an on-line optimization problem. It posits that the goal of perception is to infer, from observations, an actionable internal state that reflects an agent’s environment. This internal state should be verified by further observations; any inconsistencies should be identified and resolved, either through careful exploratory actions or an internal reasoning process. In particular, for robotics, this suggests that visual perception should be grounded in touch, since that is ultimately how robots interact with the world.

Projects
See my Projects (Serious) and Projects (Fun) pages. Also check out my code libraries for tensor dimension-naming and coordinate-transform type-safety!

Publications
A. Zhang, “Generalized Skill Learning, Safety, and Exploration with Flow-Based Models”, Workshop on Task-Agnostic Reinforcement Learning, International Conference on Learning Representations, 2019.
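The sketch below illustrates the test-time photometric-consistency idea from the research statement above. It is a minimal, hypothetical example assuming a PyTorch-style dense optical-flow setting; the function names (warp, refine_flow_at_test_time), step count, and learning rate are illustrative choices, not part of any published method. The point is simply that the same photometric loss used for training can be re-evaluated and minimized on-line, yielding both a refined prediction and a confidence signal.

```python
import torch
import torch.nn.functional as F

def warp(image, flow):
    # Backward-warp `image` (B,C,H,W) by `flow` (B,2,H,W) with bilinear sampling.
    b, _, h, w = image.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=image.device, dtype=image.dtype),
        torch.arange(w, device=image.device, dtype=image.dtype),
        indexing="ij",
    )
    grid_x = xs.unsqueeze(0) + flow[:, 0]  # sampling x-coordinates
    grid_y = ys.unsqueeze(0) + flow[:, 1]  # sampling y-coordinates
    # Normalize coordinates to [-1, 1] as required by grid_sample.
    grid = torch.stack(
        (2.0 * grid_x / (w - 1) - 1.0, 2.0 * grid_y / (h - 1) - 1.0), dim=-1
    )
    return F.grid_sample(image, grid, align_corners=True)

def refine_flow_at_test_time(frame0, frame1, flow_init, steps=20, lr=0.05):
    # Refine a network's flow prediction by minimizing photometric error on-line.
    # (Illustrative: real systems would also handle occlusions and outliers.)
    flow = flow_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([flow], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.l1_loss(warp(frame1, flow), frame0)  # photometric consistency
        loss.backward()
        opt.step()
    # The residual loss doubles as an on-line confidence signal: a large value
    # flags that the prediction (even after refinement) should not be trusted.
    return flow.detach(), loss.item()
```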