deepvision.data61.csiro.au
CVPR 2018 WORKSHOP
June 18th, 2018, Salt Lake City, Utah

Previous editions: 2017, 2016, 2015, 2014

CONTACT: Jose Alvarez

PROGRAM
9:15 Welcome
9:20 Kevin Murphy (Google)
10:00 Morning Break
10:30 Josef Sivic (INRIA)
11:05 Adriana Romero (Facebook AI)
11:40 Olga Russakovsky (Princeton)
12:25 Lunch
14:00 Vittorio Ferrari (Google)
14:35 Chris Re (Stanford)
15:10 Devi Parikh (Georgia Tech and Facebook AI)
15:45 Afternoon Break & Poster Session

LIST OF EXTENDED ABSTRACTS (POSTERS)
* A probabilistic constrained clustering for transfer learning and image category discovery, Yen-Chang Hsu, Zhaoyang Lv, Joel Schlosser, Phillip Odom, Zsolt Kira
* Near-field Depth Estimation using Monocular Fisheye Camera: A Semi-supervised learning approach using Sparse Velodyne Data, Varun Ravi Kumar, Stefan Milz, Martin Simon, Christian Witt, Karl Amende, Johannes Petzold, Senthil Yogamani, Timo Pech
* Comparison of Deep Learning Models for Semantic Segmentation on Domain Specific Data in Food Processing, Nicolas Loerbroks, Piyawat Suwanvithaya, Isabel Schwende
* Material Segmentation from Local Appearance and Global Context, Gabriel Schwartz, Ko Nishino
* Fusion Scheme for Semantic and Instance-level Segmentation, Arthur Costea, Andra Petrovai, Sergiu Nedevschi
* Weakly Supervised Object Localization via Sensitivity Analysis, Mohammad K. Ebrahimpour, David C. Noelle
* A Multi-Layer Approach to Superpixel-based Higher-order Conditional Random Field for Semantic Image Segmentation, Li Sulimowicz, Ishfaq Ahmad, Alexander Aved
* Two Stream Self-Supervised Learning for Action Recognition, Ahmed Taha, Moustafa Meshry, Xitong Yang, Yi-Ting Chen, Larry Davis
* Object Detection using Domain Randomization and Generative Adversarial Refinement of Synthetic Images, Fernando Camaro Nogues, Andrew Huie, Sakya Dasgupta
* Scaling Neural Programmer-Interpreter For Real-Life Tasks, Himadri Mishra, K K Shukla
* Fast and Light-weight Unsupervised Depth Estimation for Mobile GPU Hardware, Sangyun Oh, Jongeun Lee, Hye-Jin S. Kim
* Semi-supervised Learning: Fusion of Self-supervised, Supervised Learning, and Multimodal Cues for Tactical Driver Behavior Detection, Athmanarayanan Lakshmi Narayanan, Yi-Ting Chen, Srikanth Malla
* Recurrent Neural Networks for Semantic Instance Segmentation, Amaia Salvador, Míriam Bellver, Manel Baradad, Victor Campos, Ferran Marques, Jordi Torres, Xavier Giro-i-Nieto
* Action2Vec: A Crossmodal Embedding Approach to Zero Shot Action Learning, Meera Hahn, Andrew Silva, James M. Rehg
* Generating superpixels with deep representations, Thomas Verelst, Maxim Berman, Matthew B. Blaschko

INVITED SPEAKERS

* KEVIN MURPHY (GOOGLE, USA)
TITLE: Generative Models for Images
ABSTRACT: In this talk, I summarize two recent generative models for images that we have developed. The first is a conditional model of color images, given an input gray image. The basic idea is to use a conditional autoregressive model to generate multiple, diverse low-resolution color images, and then to upsample them and use them to colorize the high-resolution gray image. For details, see "PixColor: Pixel Recursive Colorization", BMVC 2017. The second is a latent variable model of (color) images and attributes.
The basic idea is to use a joint VAE to allow us to generate images at differing levels of abstraction, conditioned on attributes with differing degrees of missing information. For details, see "Generative Models of Visually Grounded Imagination", ICLR 2018.

* DEVI PARIKH (GEORGIA TECH AND FACEBOOK AI, USA)
TITLE: Embodied Question Answering
ABSTRACT: Embodied Question Answering is a new AI task where an agent is spawned at a random location in a 3D environment and asked a question ("What color is the car?"). In order to answer, the agent must first intelligently navigate to explore the environment, gather information through first-person (egocentric) vision, and then answer the question ("orange"). EmbodiedQA requires a range of AI skills: language understanding, visual recognition, active perception, goal-driven navigation, commonsense reasoning, long-term memory, and grounding language into actions. I will present a dataset of questions and answers in simulated indoor environments, evaluation metrics, and a hierarchical model trained with imitation and reinforcement learning.

* CHRIS RE (STANFORD, USA)
TITLE: Software 2.0 and Snorkel: Beyond Hand-Labeled Data
ABSTRACT: In the last several years, deep learning models have simultaneously become more performant and more readily available as easy-to-use, commodity tools; however, their deployment in practice is bottlenecked by the need for large, hand-labeled training sets. This talk describes Snorkel, a system that focuses on this emerging training-data bottleneck in the Software 2.0 stack. In Snorkel, instead of tediously hand-labeling individual data items, a user implicitly defines large training sets by writing simple programs, called labeling functions, that label subsets of data points. This allows users to build high-quality models even though these labeling functions have varying quality, coverage, and specificity, and may be correlated in unknown ways. A key technical challenge in Snorkel is to estimate the quality and correlations among these labeling functions without hand-labeled data. This talk will explain a theory of learning without labeled data, and a host of recent applications in natural language processing, structured data problems, and computer vision. It will also briefly discuss recent extensions of these core ideas to automatically generating data augmentations, synthesizing training data, and learning from multi-task supervision. Snorkel is open source on GitHub. Technical blog posts and tutorials are available at Snorkel.Stanford.edu.

* ADRIANA ROMERO (FACEBOOK AI, USA)
TITLE: Graph Attention Networks
ABSTRACT: In recent years, deep learning has achieved remarkable results in many computer vision, speech, and natural language processing problems. However, many interesting tasks involve data that cannot be represented in a grid-like structure and instead lies in an irregular domain. This is the case for 3D meshes, social networks, biological networks, and brain connectomes. Such data can usually be represented in the form of graphs. In this talk, I will present our recent work on Graph Attention Networks (GATs). I will start by reviewing early approaches to leveraging neural networks for processing graph-structured data, with special emphasis on graph convolutions, highlighting potential issues and motivating our work. Then, I will introduce GATs, a novel neural network architecture that leverages masked self-attentional layers to address the shortcomings of prior methods based on graph convolutions or their approximations. Finally, I will discuss the results we obtained on well-established transductive and inductive benchmarks, and show a recent application of our model to mesh-based parcellation of the cerebral cortex.
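The phrase "masked self-attentional layers" in the Graph Attention Networks abstract can be made concrete with a small sketch. The following is a minimal single-head layer in plain Python, assuming the usual GAT formulation: each node scores its neighbours with LeakyReLU(a · [W h_i || W h_j]), normalises the scores with a softmax restricted to its neighbourhood (the "mask"), and outputs the attention-weighted sum of projected neighbour features. It is not the speaker's implementation; the names `gat_layer` and `project`, the neighbour-list input format, and the identity default activation are illustrative assumptions, and the original model additionally uses multiple attention heads and an ELU nonlinearity.

```python
import math


def leaky_relu(x, slope=0.2):
    # LeakyReLU used to score node pairs (negative slope 0.2)
    return x if x > 0 else slope * x


def project(W, h):
    # Shared linear map W (one row per output dimension) applied to vector h
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def gat_layer(H, neighbours, W, a, activation=lambda v: v):
    """Single-head graph attention layer (illustrative sketch).

    H:          list of N node feature vectors, each of length F
    neighbours: neighbours[i] lists the nodes i attends to (include i itself)
    W:          F' x F weight matrix shared by all nodes
    a:          attention vector of length 2 * F'
    Returns the new node features and the attention coefficients per node.
    """
    Wh = [project(W, h) for h in H]
    out, attention = [], []
    for i, nbrs in enumerate(neighbours):
        # e_ij = LeakyReLU(a . [W h_i || W h_j]), computed only for j in N(i);
        # skipping non-neighbours is the masking step
        scores = [leaky_relu(sum(ak * v for ak, v in zip(a, Wh[i] + Wh[j])))
                  for j in nbrs]
        # Softmax over the neighbourhood, shifted by the max for stability
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alpha = [e / total for e in exps]
        # h_i' = activation(sum_j alpha_ij * W h_j)
        agg = [sum(alpha_ij * Wh[j][k] for alpha_ij, j in zip(alpha, nbrs))
               for k in range(len(W))]
        out.append([activation(v) for v in agg])
        attention.append(dict(zip(nbrs, alpha)))
    return out, attention


if __name__ == "__main__":
    # Toy path graph 0 - 1 - 2 with self-loops
    H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    neighbours = [[0, 1], [0, 1, 2], [1, 2]]
    W = [[1.0, 0.0], [0.0, 1.0]]
    a = [0.5, -0.5, 1.0, 0.2]
    feats, att = gat_layer(H, neighbours, W, a)
    print(feats)
    print(att)
```

With an all-zero attention vector every neighbour receives the same weight and the layer reduces to neighbourhood averaging; a learned `a` is what lets each node weight its neighbours differently without any fixed, graph-dependent normalisation.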
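The labeling-function workflow in Chris Re's Snorkel abstract can be illustrated with a self-contained sketch. It does not use the Snorkel library: the caption data, the three labeling functions, and the helpers `apply_lfs`, `coverage`, and `majority_vote` are hypothetical, chosen only to show the shape of the idea. Each labeling function votes for a class or abstains, the votes form a label matrix, and the votes are then combined. Majority vote is only the naive baseline; the abstract describes Snorkel's actual contribution as estimating each function's quality and correlations without hand-labeled data, which this sketch does not attempt.

```python
from collections import Counter

ABSTAIN = -1
INDOOR, OUTDOOR = 0, 1


# Hypothetical labeling functions: each inspects one caption and either
# votes for a class or abstains. They are noisy and cover different subsets.
def lf_mentions_sky(caption):
    return OUTDOOR if "sky" in caption.lower() else ABSTAIN


def lf_mentions_street(caption):
    return OUTDOOR if "street" in caption.lower() else ABSTAIN


def lf_mentions_furniture(caption):
    words = caption.lower().split()
    return INDOOR if any(w in words for w in ("sofa", "table", "bed")) else ABSTAIN


LABELING_FUNCTIONS = [lf_mentions_sky, lf_mentions_street, lf_mentions_furniture]


def apply_lfs(captions, lfs):
    # Label matrix: one row per data point, one column per labeling function
    return [[lf(c) for lf in lfs] for c in captions]


def coverage(label_matrix):
    # Fraction of data points on which each labeling function did not abstain
    n = len(label_matrix)
    return [sum(row[j] != ABSTAIN for row in label_matrix) / n
            for j in range(len(label_matrix[0]))]


def majority_vote(label_matrix):
    # Naive combiner: most common non-abstain vote per data point.
    # Ties and all-abstain rows stay ABSTAIN.
    labels = []
    for row in label_matrix:
        votes = Counter(v for v in row if v != ABSTAIN)
        if not votes:
            labels.append(ABSTAIN)
            continue
        ranked = votes.most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            labels.append(ABSTAIN)
        else:
            labels.append(ranked[0][0])
    return labels


if __name__ == "__main__":
    captions = [
        "a car parked on the street under a blue sky",
        "a cat sleeping on the sofa",
        "a table on the street",
        "a close-up of a red apple",
    ]
    L = apply_lfs(captions, LABELING_FUNCTIONS)
    print(coverage(L))       # → [0.25, 0.5, 0.5]
    print(majority_vote(L))  # → [1, 0, -1, -1]
```

The third caption shows why estimating labeling-function quality matters: two conflicting votes produce a tie that majority vote can only abstain on, whereas a model that knew one function was more accurate than the other could still assign a label.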
* JOSEF SIVIC (INRIA, FRANCE)
TITLE: Weakly Supervised Learning for Visual Recognition
ABSTRACT: The current successes in visual recognition are, in large part, due to a combination of learnable visual representations, supervised machine learning techniques, and large-scale, carefully annotated image collections. In this talk, I will argue that in order to build machines that understand the changing visual world around us, the next challenges lie in developing visual representations that generalize to never-before-seen conditions and are learnable in a weakly supervised manner, i.e., from noisy and only partially annotated data. I will show examples of our work in this direction with applications in understanding narrated instructional videos, visual localization across changing conditions, and finding visual correspondence.

* OLGA RUSSAKOVSKY (PRINCETON, USA)
TITLE:
ABSTRACT:

* VITTORIO FERRARI (GOOGLE, SWITZERLAND)
TITLE: Knowledge Transfer and Human-Machine Collaboration for Training Visual Models
ABSTRACT: Object class detection and segmentation are challenging tasks that typically require tedious and time-consuming manual annotation for training. In this talk, I will present three techniques we recently developed for reducing this effort. In the first part, I will explore a knowledge transfer scenario: training object detectors for target classes with only image-level labels, helped by a set of source classes with bounding-box annotations. In the second and third parts, I will consider human-machine collaboration scenarios (for annotating bounding boxes of one object class, and for annotating the class label and approximate segmentation of every object and background region in an image).

ORGANIZING COMMITTEE
* Jose M. Alvarez, Data61 (CSIRO), Australia
* Nathan Silberman, 4Catalyzer, USA
* Dhruv Batra, Facebook AI Research / Georgia Tech, USA
* Yann LeCun, Facebook AI Research / NYU, USA

DESCRIPTION OF THE WORKSHOP
Most of the major advances in deep learning have come from supervised learning. Despite these successes, supervised learning algorithms have a major limitation: they require massive amounts of carefully, and typically expensively, annotated data. This workshop will emphasize future directions beyond supervised learning, such as reinforcement learning and weakly supervised learning. Such approaches require far less supervision and allow computers to learn beyond mimicking what is explicitly encoded in a large-scale set of annotations.

We encourage researchers to formulate innovative learning theories, feature representations, and end-to-end vision systems based on deep learning. We also encourage new theories and processes for dealing with large-scale image datasets through deep learning architectures.

We are soliciting original contributions that address a wide range of theoretical and practical issues including, but not limited to:
* Large-scale image and video understanding with limited annotations:
  * Video classification
  * Object recognition
  * Object tracking
  * Scene understanding
  * Industrial and medical applications
* Theoretical foundations of unsupervised learning
* Unsupervised feature learning and feature selection
* Deep learning in mobile platforms and embedded systems
* Advancements in semi-supervised learning and transfer learning algorithms
* Inference and optimization
* Applications of unsupervised learning
* Deep learning for robotics
* Lifelong learning
* Reinforcement learning

Unlike previous years, papers for this edition of the workshop are meant to be extended abstracts showing current, preliminary, or novel results, to encourage discussion during the workshop.

© 2017 Data61, CSIRO Australia. All Rights Reserved.