
CVPR 2018 WORKSHOP

June 18th, 2018, Salt Lake City, Utah

 * Program
 * Invited Speakers
 * Organizing Committee
 * Description
 * Previous Edition'2017
 * Previous Edition'2016
 * Previous Edition'2015
 * Previous Edition'2014
   

Sponsored by


CONTACT

Jose Alvarez


CALL FOR PAPERS

Download CfP


PROGRAM



 * 9:15  Welcome
 * 9:20  Kevin Murphy (Google)
 * 10:00 Morning Break
 * 10:30 Josef Sivic (INRIA)
 * 11:05 Adriana Romero (Facebook AI)
 * 11:40 Olga Russakovsky (Princeton)
 * 12:25 Lunch
 * 14:00 Vittorio Ferrari (Google)
 * 14:35 Chris Re (Stanford)
 * 15:10 Devi Parikh (Georgia Tech and Facebook AI)
 * 15:45 Afternoon Break & Poster Session

List of Extended Abstracts (Posters):

 * A probabilistic constrained clustering for transfer learning and image
   category discovery (Yen-Chang Hsu, Zhaoyang Lv, Joel Schlosser, Phillip
   Odom, Zsolt Kira)
 * Near-field Depth Estimation using Monocular Fisheye Camera: A
   Semi-supervised learning approach using Sparse Velodyne Data (Varun Ravi
   Kumar, Stefan Milz, Martin Simon, Christian Witt, Karl Amende, Johannes
   Petzold, Senthil Yogamani, Timo Pech)
 * Comparison of Deep Learning Models for Semantic Segmentation on Domain
   Specific Data in Food Processing (Nicolas Loerbroks, Piyawat Suwanvithaya,
   Isabel Schwende)
 * Material Segmentation from Local Appearance and Global Context (Gabriel
   Schwartz, Ko Nishino)
 * Fusion Scheme for Semantic and Instance-level Segmentation (Arthur Costea,
   Andra Petrovai, Sergiu Nedevschi)
 * Weakly Supervised Object Localization via Sensitivity Analysis (Mohammad
   K. Ebrahimpour, David C. Noelle)
 * A Multi-Layer Approach to Superpixel-based Higher-order Conditional Random
   Field for Semantic Image Segmentation (Li Sulimowicz, Ishfaq Ahmad,
   Alexander Aved)
 * Two Stream Self-Supervised Learning for Action Recognition (Ahmed Taha,
   Moustafa Meshry, Xitong Yang, Yi-Ting Chen, Larry Davis)
 * Object Detection using Domain Randomization and Generative Adversarial
   Refinement of Synthetic Images (Fernando Camaro Nogues, Andrew Huie, Sakya
   Dasgupta)
 * Scaling Neural Programmer-Interpreter For Real-Life Tasks (Himadri Mishra,
   K K Shukla)
 * Fast and Light-weight Unsupervised Depth Estimation for Mobile GPU Hardware
   (Sangyun Oh, Jongeun Lee, Hye-Jin S. Kim)
 * Semi-supervised Learning: Fusion of Self-supervised, Supervised Learning,
   and Multimodal Cues for Tactical Driver Behavior Detection (Athmanarayanan
   Lakshmi Narayanan, Yi-Ting Chen, Srikanth Malla)
 * Recurrent Neural Networks for Semantic Instance Segmentation (Amaia
   Salvador, Míriam Bellver, Manel Baradad, Victor Campos, Ferran Marques,
   Jordi Torres, Xavier Giro-i-Nieto)
 * Action2Vec: A Crossmodal Embedding Approach to Zero Shot Action Learning
   (Meera Hahn, Andrew Silva, James M. Rehg)
 * Generating superpixels with deep representations (Thomas Verelst, Maxim
   Berman, Matthew B. Blaschko)






INVITED SPEAKERS


 * KEVIN MURPHY, (GOOGLE, USA)


   TITLE:

   Generative models for images


   ABSTRACT:

   In this talk, I summarize two recent generative models for images that we
   have developed. The first is a conditional model of color images, given an
   input gray image. The basic idea is to use a conditional autoregressive
   model to generate multiple, diverse low-resolution color images, and then to
   upsample them and use them to colorize the high-resolution gray image. For
   details, see "PixColor: Pixel Recursive Colorization", BMVC 2017. The second
   is a latent variable model of (color) images and attributes. The basic idea
   is to use a joint VAE to allow us to generate images of differing levels of
   abstraction, conditioned on attributes with differing degrees of missing
   information. For details, see "Generative Models of Visually Grounded
   Imagination", ICLR 2018.
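
   As a rough illustration of the two-stage pipeline sketched above, the
   following Python snippet (PyTorch-style pseudocode; the module names
   ar_model and refine_net are hypothetical placeholders, not the paper's
   code) shows how several diverse low-resolution color samples could be
   upsampled and fused with the high-resolution gray input:

   import torch
   import torch.nn.functional as F

   def colorize(gray_hr, ar_model, refine_net, num_samples=3):
       """gray_hr: (1, 1, H, W) grayscale image in [0, 1]."""
       # Downsample the gray image to condition the autoregressive model.
       gray_lr = F.interpolate(gray_hr, size=(28, 28), mode="bilinear",
                               align_corners=False)
       results = []
       for _ in range(num_samples):
           # Autoregressive sampling yields one diverse low-res chroma map.
           chroma_lr = ar_model.sample(gray_lr)            # (1, 2, 28, 28)
           chroma_hr = F.interpolate(chroma_lr, size=gray_hr.shape[-2:],
                                     mode="bilinear", align_corners=False)
           # Fuse the upsampled chroma with the high-resolution luminance.
           results.append(refine_net(torch.cat([gray_hr, chroma_hr], dim=1)))
       return results  # multiple plausible colorizations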


 * DEVI PARIKH, (GEORGIA TECH AND FACEBOOK AI, USA)


   TITLE:

   Embodied Question Answering


   ABSTRACT:

   Embodied Question Answering is a new AI task where an agent is spawned at a
   random location in a 3D environment and asked a question ("What color is the
   car?"). In order to answer, the agent must first intelligently navigate to
   explore the environment, gather information through first-person (egocentric)
   vision, and then answer the question ("orange"). EmbodiedQA requires a range
   of AI skills: language understanding, visual recognition, active
   perception, goal-driven navigation, commonsense reasoning, long-term memory,
   and grounding language into actions. I will present a dataset of questions
   and answers in simulated indoor environments, evaluation metrics, and a
   hierarchical model trained with imitation and reinforcement learning.
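
   A schematic of the EmbodiedQA episode described above, in Python (the env
   and agent interfaces here are hypothetical placeholders, not the authors'
   released code):

   def run_episode(env, agent, question, max_steps=100):
       obs = env.reset()                      # agent spawned at a random location
       frames = []
       for _ in range(max_steps):
           frames.append(obs)                 # first-person (egocentric) view
           action = agent.navigate(obs, question)
           if action == "STOP":               # agent decides it has seen enough
               break
           obs = env.step(action)
       return agent.answer(frames, question)  # e.g. "orange"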


 * CHRIS RE, (STANFORD, USA)


   TITLE:

   Software 2.0 and Snorkel: Beyond hand-labeled data


   ABSTRACT:

   In the last several years, deep learning models have simultaneously become
   more performant and more readily available as easy-to-use, commodity
   tools--however, their deployment in practice is bottlenecked by the need for
   large, hand-labeled training sets. This talk describes Snorkel, a system that
   focuses on this emerging training data bottleneck in the Software 2.0 stack.
   In Snorkel, instead of tediously hand-labeling individual data items, a user
   implicitly defines large training sets by writing simple programs, called
   labeling functions, that label subsets of data points. This allows users to
   build high-quality models despite the fact that these labeling functions will
   have varying quality, coverage, and specificity--and be correlated in unknown
   ways. A key technical challenge in Snorkel is to estimate the quality and
   correlations among these labeling functions without hand-labeled data. This
   talk will explain a theory of learning without labeled data, and a host of
   recent applications in natural language processing, structured data problems,
   and computer vision. This talk will also briefly discuss recent extensions of
   these core ideas to automatically generating data augmentations, synthesizing
   training data, and learning from multi-task supervision. Snorkel is open
   source on GitHub. Technical blog posts and tutorials are available at
   Snorkel.Stanford.edu.
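
   To make the labeling-function idea concrete, here is a toy sketch in plain
   Python (not the actual Snorkel API): each function votes SPAM, NOT_SPAM, or
   abstains on a data point, and votes are combined here by simple majority,
   whereas Snorkel instead learns the functions' accuracies and correlations
   without labeled data:

   from collections import Counter

   ABSTAIN, NOT_SPAM, SPAM = -1, 0, 1

   def lf_contains_offer(text):
       # Heuristic: promotional phrasing suggests spam.
       return SPAM if "special offer" in text.lower() else ABSTAIN

   def lf_short_message(text):
       # Heuristic: very short messages are usually benign.
       return NOT_SPAM if len(text.split()) < 5 else ABSTAIN

   def lf_many_exclamations(text):
       return SPAM if text.count("!") >= 3 else ABSTAIN

   def weak_label(text, lfs=(lf_contains_offer, lf_short_message,
                             lf_many_exclamations)):
       # Collect non-abstaining votes and take the majority label.
       votes = [v for v in (lf(text) for lf in lfs) if v != ABSTAIN]
       return Counter(votes).most_common(1)[0][0] if votes else ABSTAIN

   print(weak_label("Special offer!!! Click now!"))  # -> 1 (SPAM)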


 * ADRIANA ROMERO, (FACEBOOK AI, USA)


   TITLE:

   Graph Attention Networks


   ABSTRACT:

   In recent years, deep learning has achieved remarkable results in many
   computer vision, speech and natural language processing problems. However,
   many interesting tasks involve data that cannot be represented in a
   grid-like structure and that instead lies in an irregular domain. This is the
   case for 3D meshes, social networks, biological networks, and brain connectomes.
   Such data can usually be represented in the form of graphs. In this talk, I
   will present our recent work on Graph Attention Networks (GATs). I will start
   by reviewing early approaches to leverage neural networks for processing
   graph structured data, with special emphasis on graph convolutions,
   highlighting potential issues and motivating our work. Then, I will introduce
   GATs, a novel neural network architecture that leverages masked
   self-attentional layers to address the shortcomings of prior methods based on
   graph convolutions or their approximations. Finally, I will discuss the
   results we obtained on well-established transductive and inductive
   benchmarks, and show a recent application of our model to mesh-based
   parcellation of the cerebral cortex.
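
   The core attention computation of a GAT layer, e_ij = LeakyReLU(a^T [W h_i
   || W h_j]) normalized by a softmax over each node's neighborhood, can be
   sketched in a few lines of NumPy (single attention head; random weights for
   illustration only):

   import numpy as np

   def gat_layer(H, A, W, a, slope=0.2):
       """H: (N, F) node features; A: (N, N) adjacency with self-loops;
       W: (F, Fp) weight matrix; a: (2*Fp,) attention vector."""
       Wh = H @ W                                  # (N, Fp) transformed features
       Fp = Wh.shape[1]
       # a^T [Wh_i || Wh_j] splits into a source term plus a target term.
       e = (Wh @ a[:Fp])[:, None] + (Wh @ a[Fp:])[None, :]
       e = np.where(e > 0, e, slope * e)           # LeakyReLU
       e = np.where(A > 0, e, -1e9)                # mask non-neighbors
       att = np.exp(e - e.max(axis=1, keepdims=True))
       att /= att.sum(axis=1, keepdims=True)       # softmax over neighbors
       return np.maximum(att @ Wh, 0)              # nonlinearity on aggregation

   # Tiny usage example on a 3-node graph:
   rng = np.random.default_rng(0)
   A = np.array([[1, 1, 0], [1, 1, 1], [0, 1, 1]])
   H = rng.normal(size=(3, 4))
   print(gat_layer(H, A, rng.normal(size=(4, 2)), rng.normal(size=4)))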


 * JOSEF SIVIC, (INRIA, FRANCE)


   TITLE:

   Weakly supervised learning for visual recognition


   ABSTRACT:

   The current successes in visual recognition are, in large part, due to a
   combination of learnable visual representations, supervised machine learning
   techniques and large-scale carefully annotated image collections. In this
   talk, I will argue that in order to build machines that understand the
   changing visual world around us, the next challenges lie in developing visual
   representations that generalize to never-before-seen conditions and are
   learnable in a weakly supervised manner, i.e., from noisy and only partially
   annotated data. I will show examples of our work in this direction, with
   applications in understanding narrated instructional videos, visual
   localization across changing conditions, and finding visual correspondences.


 * OLGA RUSSAKOVSKY, (PRINCETON, USA)


   TITLE:


   ABSTRACT:


 * VITTORIO FERRARI, (GOOGLE, SWITZERLAND)


   TITLE:

   Knowledge transfer and human-machine collaboration for training visual models


   ABSTRACT:

   Object class detection and segmentation are challenging tasks that typically
   require tedious and time-consuming manual annotation for training. In this
   talk, I will present three techniques we recently developed for reducing this
   effort. In the first part I will explore a knowledge transfer scenario:
   training object detectors for target classes with only image-level labels,
   helped by a set of source classes with bounding-box annotations. In the
   second and third parts I will consider human-machine collaboration scenarios
   (for annotating bounding-boxes of one object class, and for annotating the
   class label and approximate segmentation of every object and background
   region in an image).


ORGANIZING COMMITTEE

 * Jose M. Alvarez, Data61 (CSIRO), Australia
 * Nathan Silberman, 4Catalyzer, USA
 * Dhruv Batra, Facebook AI Research / Georgia Tech, USA
 * Yann LeCun, Facebook AI Research / NYU, USA


DESCRIPTION OF THE WORKSHOP

Most of the major advances in Deep Learning have come from supervised learning.
Despite these successes, supervised learning algorithms are characterized by a
major limitation: they necessitate massive amounts of carefully, and typically
expensively, annotated data. This workshop will emphasize future directions
beyond supervised learning such as reinforcement learning and weakly supervised
learning. Such approaches require far less supervision and allow computers to
learn beyond mimicking what is explicitly encoded in a large-scale set of
annotations. We encourage researchers to formulate innovative learning theories,
feature representations, and end-to-end vision systems based on deep learning.
We also encourage new theories and processes for dealing with large-scale image
datasets through deep learning architectures. We are soliciting original
contributions that address a wide range of theoretical and practical issues
including, but not limited to:

 * Large-scale image and video understanding with limited annotations:
   * Video classification
   * Object recognition
   * Object tracking
   * Scene understanding
   * Industrial and medical applications
 * Theoretical foundations of unsupervised learning.
 * Unsupervised feature learning and feature selection.
 * Deep learning in mobile platforms and embedded systems.
 * Advancements in semi-supervised learning and transfer learning algorithms.
 * Inference and optimization.
 * Applications of unsupervised learning.
 * Deep learning for robotics.
 * Lifelong learning.
 * Reinforcement learning.

As the main difference from previous years, papers for this edition of the
workshop are extended abstracts presenting current, preliminary, or novel
results, in order to encourage discussion during the workshop.



© 2017 Data61, CSIRO Australia. All Rights Reserved.