Code & Data

Perceiving Systems The MIT License Active Domain Adaptation via Clustering Uncertainty-weighted Embeddings Generalizing deep neural networks to new target domains is critical to their real-world utility. When labeling data from the target domain, it is desirable for cost-effectiveness to select a maximally informative subset, a problem known as Active Learning. The ADA-CLUE algorithm addresses Active Learning under a domain shift. The GitHub repo contains code to train models with the ADA-CLUE algorithm for multiple source and target domain shifts. Pre-trained models are also available.

Perceiving Systems The MIT License Efficient Learning on Point Clouds with Basis Point Sets Basis Point Set (BPS) is a simple and efficient method for encoding 3D point clouds into fixed-length representations. It is based on a simple idea: select k fixed points in space and compute vectors from these basis points to the nearest points in a point cloud; use these vectors (or simply their norms) as features. The basis points are kept fixed for all the point clouds in the dataset, so every point cloud is represented as a vector of the same length. This representation can then be used as input to arbitrary machine learning methods; in particular, it can be used as input to off-...
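The encoding described in the BPS entry above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code from the repository: the function names `sample_basis` and `bps_encode`, the choice to draw basis points uniformly inside a unit ball, and the fixed random seed are all assumptions made for the example.

```python
import numpy as np


def sample_basis(k, dim=3, radius=1.0, seed=0):
    """Draw k basis points uniformly inside a ball (an assumed sampling scheme)."""
    rng = np.random.default_rng(seed)
    directions = rng.normal(size=(k, dim))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Scaling radii by u**(1/dim) makes the points uniform in volume.
    radii = radius * rng.uniform(size=(k, 1)) ** (1.0 / dim)
    return directions * radii


def bps_encode(cloud, basis, use_norms=True):
    """Encode an (n, dim) point cloud against a fixed (k, dim) basis.

    Returns a (k,) array of distances if use_norms is True, otherwise a
    (k, dim) array of vectors from each basis point to its nearest cloud point.
    """
    # diffs[i, j] is the vector from basis point i to cloud point j.
    diffs = cloud[None, :, :] - basis[:, None, :]
    dists = np.linalg.norm(diffs, axis=2)
    nearest = dists.argmin(axis=1)
    rows = np.arange(len(basis))
    if use_norms:
        return dists[rows, nearest]
    return diffs[rows, nearest]


# Clouds with different numbers of points map to features of the same length,
# because the same basis is reused for every cloud.
basis = sample_basis(k=64)
rng = np.random.default_rng(1)
cloud_a = rng.normal(scale=0.3, size=(500, 3))
cloud_b = rng.normal(scale=0.3, size=(1200, 3))
features_a = bps_encode(cloud_a, basis)
features_b = bps_encode(cloud_b, basis)
vectors_a = bps_encode(cloud_a, basis, use_norms=False)
```

The brute-force distance matrix keeps the sketch short; for large clouds a nearest-neighbor index such as a k-d tree would be the more usual choice.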

Perceiving Systems The MIT License Markerless Outdoor Human Motion Capture Using Multiple Autonomous Micro Aerial Vehicles Capturing human motion in natural scenarios means moving motion capture out of the lab and into the wild. Typical approaches rely on fixed, calibrated cameras and reflective markers on the body, significantly limiting the motions that can be captured. To make motion capture truly unconstrained, we describe the first fully autonomous outdoor capture system based on flying vehicles. We use multiple micro aerial vehicles (MAVs), each equipped with a monocular RGB camera, an IMU, and a GPS receiver module. These detect the person, optimize their position, and localize themselves approximately. ...

Perceiving Systems The MIT License Three-D Safari: Learning to Estimate Zebra Pose, Shape, and Texture from Images "In the Wild" We present the first method to perform automatic 3D pose, shape and texture capture of animals from images acquired in-the-wild. In particular, we focus on the problem of capturing 3D information about Grevy's zebras from a collection of images. We integrate the recent SMAL animal model into a network-based regression pipeline, which we train end-to-end on synthetically generated images with pose, shape, and background variation. We couple 3D pose and shape prediction with the task of texture synthesis, obtaining a full texture map of the animal from a single image. The predicted textur...

Perceiving Systems The MIT License Competitive Collaboration Competitive Collaboration is a generic framework in which networks learn to collaborate and compete, thereby achieving specific goals. Competitive Collaboration is a three-player game in which two players compete for a resource that is regulated by a third player, the moderator. This framework is similar in spirit to expectation-maximization (EM) but is formulated for neural network training.

Perceiving Systems The MIT License RingNet: Learning to Regress 3D Face Shape and Expression from an Image without 3D Supervision Code: We provide the inference code of RingNet. Please check the repository, which is self-explanatory. NoW Benchmark Dataset and Challenge: Please check the external link to download the data and participate in the challenge.

Perceiving Systems The MIT License VOCA: Capture, Learning, and Synthesis of 3D Speaking Styles VOCA (Voice Operated Character Animation) is a framework that takes a speech signal as input and realistically animates a wide range of adult faces. Code: We provide Python demo code that outputs a 3D head animation given a speech signal and a static 3D head mesh. The codebase further provides animation control to alter the speaking style, identity-dependent facial shape, and head pose (i.e. head rotation around the neck) during animation. The code further demonstrates how to sample 3D head meshes from the publicly available FLAME model, that can then be animated...

Perceiving Systems The MIT License Convolutional Mesh Autoencoders The code makes it possible to build convolutional networks on mesh structures, analogous to CNNs on images. It includes mesh convolutions and introduces downsampling and upsampling operators that can be applied directly to the mesh structure. The code implements a Convolutional Mesh Autoencoder using these mesh processing operators and achieves state-of-the-art results on generating 3D facial meshes.

Perceiving Systems The MIT License Learning Human Optical Flow The optical flow of humans is well known to be useful for the analysis of human action. Given this, we devise an optical flow algorithm specifically for human motion and show that it is superior to generic flow methods. Designing a method by hand is impractical, so we develop a new training database of image sequences with ground-truth optical flow. For this we use a 3D model of the human body and motion capture data to synthesize realistic flow fields. We then train a convolutional neural network to estimate human flow fields from pairs of images. Since many applications in human motion an...