What happens when drones and people sync their vision?

Multiple recon drones in the sky suddenly aim their cameras at a person of interest on the ground, synced to what observers on the ground are seeing …

That could be a reality soon, thanks to an agreement just announced by the mysterious SICdrone, an unmanned aircraft system manufacturer, and CrowdOptic, an “interactive streaming platform that connects the world through smart devices.”

A CrowdOptic “cluster” — multiple people focused on the same object.  (credit: CrowdOptic)

CrowdOptic’s technology lets a “cluster” (multiple people or objects) point their cameras or smartphones at the same thing (say, at a concert or sporting event), with different views, allowing for group chat or sharing content.
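CrowdOptic hasn’t published the algorithm behind its clusters, but the underlying geometry is straightforward: if each camera’s position and viewing direction are known, the shared point of focus is roughly where the viewing rays converge. Here is a minimal sketch (hypothetical names and numbers, not CrowdOptic’s code) that estimates that point by least-squares ray intersection.

```python
# Minimal sketch (not CrowdOptic's actual algorithm): estimate the point that a
# "cluster" of cameras is collectively aimed at, given each camera's position
# and a unit vector for its viewing direction, via least-squares ray intersection.
import numpy as np

def estimate_focus_point(positions, directions):
    """positions: (n, 3) camera locations; directions: (n, 3) view vectors.
    Returns the 3D point minimizing the summed squared distance to all view rays."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for p, d in zip(positions, directions):
        d = d / np.linalg.norm(d)
        M = np.eye(3) - np.outer(d, d)   # projector onto the plane normal to the ray
        A += M
        b += M @ p
    return np.linalg.solve(A, b)

# Hypothetical example: three cameras looking at roughly the same spot.
positions = np.array([[0.0, 0.0, 1.7], [10.0, 0.0, 1.7], [5.0, 8.0, 1.7]])
target = np.array([5.0, 3.0, 1.0])
directions = target - positions
print(estimate_focus_point(positions, directions))  # ~ [5, 3, 1]
```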

Drone air control

For SICdrone, the idea is to use CrowdOptic tech to automatically orchestrate the drones’ onboard cameras to track and capture multiple camera angles (and views) of a single point of interest.* Beyond that, this tech could provide vital flight-navigation systems to coordinate multiple drones without having them conflict (or crash), says CrowdOptic CEO Jon Fisher.
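Going the other way, aiming every drone camera at a known point of interest, is also simple geometry. The sketch below (with hypothetical coordinates and function names, not SICdrone’s or CrowdOptic’s implementation) computes a pan/tilt command for each drone’s gimbal in a local east/north/up frame.

```python
# Sketch (not SICdrone/CrowdOptic's implementation): compute gimbal pan/tilt for
# each drone so its camera points at a shared ground target. Works in a local
# east/north/up (ENU) frame; all names and numbers are hypothetical.
import math

def aim_camera(drone_enu, target_enu):
    """Return (pan_deg, tilt_deg): pan measured clockwise from north,
    tilt negative when looking down."""
    de = target_enu[0] - drone_enu[0]   # east offset
    dn = target_enu[1] - drone_enu[1]   # north offset
    du = target_enu[2] - drone_enu[2]   # up offset (negative: target below drone)
    pan = math.degrees(math.atan2(de, dn)) % 360.0
    tilt = math.degrees(math.atan2(du, math.hypot(de, dn)))
    return pan, tilt

target = (50.0, 120.0, 0.0)             # person of interest on the ground
drones = [(0.0, 0.0, 60.0), (200.0, 40.0, 80.0), (100.0, 300.0, 50.0)]
for i, d in enumerate(drones):
    pan, tilt = aim_camera(d, target)
    print(f"drone {i}: pan {pan:.1f} deg, tilt {tilt:.1f} deg")
```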

This disruptive innovation might become essential (and mandated by law?) as Amazon, Flirtey, and others compete to dominate drone delivery. It could also possibly help with the growing concern about drone risk to airplanes.**

Other current (and possible) uses of CrowdOptic tech include first response, news and sports reporting, advertising analytics (seeing what people focus on), linking up augmented-reality and VR headset users, and “social TV” (live attendees — using the Periscope app, for example — provide the most interesting video to people watching at home), Fisher explained to KurzweilAI.

* This uses several CrowdOptic patents (U.S. Patents 8,527,340, 9,020,832, and 9,264,474).

** Drone Comes Within 200 Feet Of Passenger Jet Coming In To Land At LAX

TED releases Meta 2 augmented-reality presentation video

TED just released the full video of Meta CEO Meron Gribetz’s preview of Meta’s next-generation augmented reality (AR) technology at the TED 2016 conference on Feb. 17. It can be found online at metavision.com and TED.com.

The presentation, which Forbes said “dazzles TED crowd” and received a standing ovation from TED attendees, dramatically showcases the capabilities of the Meta 2 Development Kit. Launched two weeks ago, the Meta 2 kit is now available for pre-order at $949 at metavision.com. (Also see “First ‘natural machine’ augmented reality product Meta 2 launches to developers.”)

First ‘natural machine’ augmented reality product Meta 2 launches to developers

Meta 2 (credit: Meta)

Last month, Meta CEO Meron Gribetz wowed TED with a sneak peek at the company’s new Meta 2 augmented-reality product. Today, Meta announced that the Meta 2 Development Kit is now available for pre-orders.

Meta 2’s Iron-Man-like immersive functionality appears similar to HoloLens and Magic Leap, but with a wider 90-degree field of view, a 2560 x 1440 high-DPI display, and natural hand-controlled operation.


Meta | Meta 2 Development Kit — Launch Video

Technology pundit and futurist Robert Scoble called Meta 2 “the most important new product since the original Macintosh.”

In his TED Talk, Gribetz, a neuroscientist, described a neuroscience-based design that “merges the art of user interface design with the science of the brain, creating ‘natural machines’ that feel like extensions of ourselves rather than the other way around.”

Meta 2 was designed with input from nearly 1000 companies that were users of the first-generation Meta 1 (see “Meta’s AR headset lets you play with virtual objects in 3D space“). Meta believes the new version has the potential to “fundamentally change the way people collaborate, communicate and engage with information and each other, including medicine, education, and manufacturing.”

With Meta, you’ll be able to directly grab items you’re interested in and interact with them. (credit: Meta)

Importantly, Meta 2 is fully hand-controlled; no input device is required. Meta says AR applications include 3D modeling (which has advantages over viewing 3D on a flat screen), web browsing (adding holograms to any existing webpage), and remote collaboration (colleagues can view and manipulate holograms with their hands).

Meta 2 has an on-board color camera (720p) and four speakers for near-ear audio. It supports Windows-based applications (Mac later this year) and initially requires a modern computer running Windows 8 or 10.

Steve Mann with 3 of his inventions: EyeTap Digital Eye Glass, smartwatch, and SWIM (Sequential Wave Imprinting Machine) phenomenological augmented reality. (credit: Steve Mann)

Meta’s chief scientist is legendary inventor Steve Mann, PhD, a professor at the University of Toronto (see “First attack on a cyborg”).


Meta | Meta Pioneers: Holograam

Real or computer-generated: can you tell the difference?

Which of these are photos vs. computer-generated images? (credit: Olivia Holmes et al./ACM Transactions on Applied Perception)

As computer-generated characters become increasingly photorealistic, people are finding it harder to distinguish between real and computer-generated, a Dartmouth College-led study has found.

This has introduced complex forensic and legal issues*, such as how to distinguish between computer-generated and photographic images of child pornography, says Hany Farid, a professor of computer science, pioneering researcher in digital forensics at Dartmouth, and senior author of a paper in the journal ACM Transactions on Applied Perception.

“This can be problematic when a photograph is introduced into a court of law and the jury has to assess its authenticity,” Farid says.

Training helps … for now

In their study, Farid’s team conducted perceptual experiments in which 60 high-quality computer-generated and photographic images of men’s and women’s faces were shown to 250 observers. Each observer was asked to classify each image as either computer generated or photographic. Observers correctly classified photographic images 92 percent of the time, but correctly classified computer-generated images only 60 percent of the time.
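For readers who want to see how such figures are tallied, here is a minimal sketch (with made-up responses, not the study’s data) of how per-class accuracy and the overall bias toward answering “photographic” are computed from raw observer answers.

```python
# Sketch of the accuracy bookkeeping behind numbers like "92% on photographic,
# 60% on computer-generated": responses are tallied per true image class.
# The data below are made up for illustration; the study used 60 images and
# 250 observers.
from collections import defaultdict

# (true_label, observer_response) pairs; labels: "photo" or "cg"
responses = [
    ("photo", "photo"), ("photo", "photo"), ("photo", "cg"),
    ("cg", "photo"), ("cg", "cg"), ("cg", "photo"),
]

correct = defaultdict(int)
total = defaultdict(int)
for truth, answer in responses:
    total[truth] += 1
    correct[truth] += int(truth == answer)

for label in ("photo", "cg"):
    print(f"{label}: {100.0 * correct[label] / total[label]:.0f}% correct")

# The study also reports a response bias: observers say "photo" more often
# than "cg" overall.
photo_rate = sum(ans == "photo" for _, ans in responses) / len(responses)
print(f"fraction of all responses labeled 'photo': {photo_rate:.2f}")
```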

The top row images are all computer-generated, paired here with their photographic matches below (credit: Olivia Holmes et al./ACM Transactions on Applied Perception)

But in a follow-up experiment, in which the researchers gave a second set of observers some training beforehand, accuracy in classifying photographic images fell slightly, to 85 percent, while accuracy on computer-generated images jumped to 76 percent.

With or without training, observers performed much worse than participants in a similar study Farid’s team conducted five years ago, when computer-generated imagery was not as photorealistic.

When humans can no longer judge what’s real

“We expect that human observers will be able to continue to perform this task for a few years to come, but eventually we will have to refine existing techniques and develop new computational methods that can detect fine-grained image details that may not be identifiable by the human visual system,” says Farid.

The study, which included Dartmouth student Olivia Holmes and Professor Martin Banks at the University of California, Berkeley, was supported by the National Science Foundation.

* Legal background

  • In 1996, Congress passed the Child Pornography Prevention Act (CPPA), which made illegal “any visual depiction including any photograph, film, video, picture or computer-generated image that is, or appears to be, of a minor engaging in sexually explicit conduct.”
  • In 2002, the U.S. Supreme Court ruled that the CPPA infringed on the First Amendment and classified computer-generated child pornography as protected speech. As a result, defense attorneys need only claim their client’s images of child pornography are computer generated.
  • In 2003, Congress passed the PROTECT Act, which classified computer-generated child pornography as “obscene,” but this law didn’t eliminate the so-called “virtual defense” because juries are reluctant to send a defendant to prison for merely possessing computer-generated imagery when no real child was harmed.

Abstract of Assessing and Improving the Identification of Computer-Generated Portraits

Modern computer graphics are capable of generating highly photorealistic images. Although this can be considered a success for the computer graphics community, it has given rise to complex forensic and legal issues. A compelling example comes from the need to distinguish between computer-generated and photographic images as it pertains to the legality and prosecution of child pornography in the United States. We performed psychophysical experiments to determine the accuracy with which observers are capable of distinguishing computer-generated from photographic images. We find that observers have considerable difficulty performing this task—more difficulty than we observed 5 years ago when computer-generated imagery was not as photorealistic. We also find that observers are more likely to report that an image is photographic rather than computer generated, and that resolution has surprisingly little effect on performance. Finally, we find that a small amount of training greatly improves accuracy.

Physical rehab and athlete training in VR

ICSPACE exercise feedback display (credit: ICSPACE)

A virtual “intelligent coaching space” (ICSPACE), developed by the Cluster of Excellence Cognitive Interaction Technology (CITEC) at Bielefeld University in Germany, is assisting patients with physical rehabilitation and helping athletes improve their performance on sports exercises.

The user is 3D-scanned in advance, and the scan is used to create an avatar. Participants wear 3D stereoscopic glasses, which create the impression of working out in a gym with a coach. Reflective markers attached to the user are tracked by infrared cameras, and a virtual display allows users to watch themselves from various angles to see how they are performing the exercises and where to improve.

Mistakes made during movement exercises, such as bending one’s neck too far during a squat, are depicted in the display in an exaggerated way to draw attention to the error.
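CITEC’s exact feedback code isn’t given here, but the general idea can be sketched simply: scale the deviation between the user’s pose and a reference pose by a gain before rendering it on the avatar, so small mistakes become visible. The joint names and numbers below are hypothetical.

```python
# Minimal sketch of exaggerated error feedback (not CITEC's actual code):
# amplify the difference between the user's pose and a reference pose before
# displaying it on the avatar, so subtle mistakes (e.g., neck bent too far
# in a squat) become easy to see. Poses are dicts of joint angles in degrees.
def exaggerate_pose(user_pose, reference_pose, gain=2.5):
    """Return a display pose in which each joint's deviation from the
    reference is scaled by `gain`."""
    return {
        joint: reference_pose[joint] + gain * (user_pose[joint] - reference_pose[joint])
        for joint in reference_pose
    }

reference = {"knee_flexion": 90.0, "hip_flexion": 80.0, "neck_flexion": 10.0}
user      = {"knee_flexion": 85.0, "hip_flexion": 78.0, "neck_flexion": 25.0}

display = exaggerate_pose(user, reference)
print(display)  # the 15-degree neck error shows up as a 37.5-degree deviation
```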

A virtual coach instructs a user how to do a squat (credit: CITEC/Bielefeld University)

A virtual coach is also available and can mark individual body areas on the display to show needed improvement. A slow-motion video of the user performing the exercise can demonstrate correct motions.

“The planned range of activities will include gymnastics exercises, tai chi, yoga, or, for example, how to swing a golf club,” says cognitive scientist Professor Thomas Schack.

The research is funded by the German Research Foundation (DFG).


Research TV Bielefeld University | ICSPACE: Exercise training in virtual reality


Abstract of Multi-Level Analysis of Motor Actions as a Basis for Effective Coaching in Virtual Reality

In order to effectively support motor learning in Virtual Reality, real-time analysis of motor actions performed by the athlete is essential. Most recent work in this area rather focuses on feedback strategies, and not primarily on systematic analysis of the motor action to be learnt. Aiming at a high-level understanding of the performed motor action, we introduce a two-level approach. On the one hand, we focus on a hierarchical motor performance analysis performed online in a VR environment. On the other hand, we introduce an analysis of cognitive representation as a complement for a thorough analysis of motor action.


Open-source GPU could push computing power to the next level


Nvidia | Mythbusters Demo GPU versus CPU

Binghamton University researchers have developed Nyami, a synthesizable graphics processor unit (GPU) architectural model for general-purpose and graphics-specific workloads, and have run a series of experiments on it to see how different hardware and software configurations would affect the circuit’s performance.

Binghamton University computer science assistant professor Timothy Miller said the results will help other scientists make their own GPUs and “push computing power to the next level.”

GPUs are typically found on commercial video or graphics cards inside a computer or gaming console. These specialized circuits are designed to accelerate graphics rendering, making images appear smoother and more vibrant on a screen. There has recently been a movement to apply the chips to non-graphical computations as well, such as algorithms that process large chunks of data.

GPU programming model “unfamiliar”

Maxwell, Nvidia’s most powerful GPU architecture (credit: Nvidia)

“In terms of performance per Watt and performance per cubic meter, GPUs can outperform CPUs by orders of magnitude on many important workloads,” the researchers note in an open-access paper by Miller and co-authors presented at the International Symposium on Performance Analysis of Systems and Software (Jeff Bush, the director of software engineering at Roku, was lead author).

“The adoption of GPUs into HPC [high-performance computing] has therefore been both a major boost in performance and a shift in how supercomputers are programmed. Unfortunately, this shift has suffered slow adoption because the GPU programming model is unfamiliar to those who are accustomed to writing software for traditional CPUs.”
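To make the “unfamiliar programming model” point concrete, here is a toy contrast, in plain Python rather than Nyami’s Verilog or any vendor API, between a serial CPU-style loop and the per-element “kernel” formulation that GPUs expect (NumPy’s vectorized call stands in for a parallel kernel launch).

```python
# Illustration of the shift in programming model the paper describes (this is
# plain Python, not Nyami's Verilog or a vendor GPU API). The same SAXPY
# computation (y = a*x + y) is written two ways.
import numpy as np

a = 2.0
x = np.arange(1_000_000, dtype=np.float32)
y = np.ones_like(x)

# CPU habit: one thread walks the data in a loop.
def saxpy_serial(a, x, y):
    out = y.copy()
    for i in range(len(x)):
        out[i] = a * x[i] + out[i]
    return out

# GPU habit: write the body for a single element (a "kernel"); the hardware
# runs it for every index in parallel across thousands of threads. Here
# NumPy's vectorized form stands in for that kernel launch.
def saxpy_kernel(a, x, y):
    return a * x + y

assert np.allclose(saxpy_serial(a, x[:1000], y[:1000]), saxpy_kernel(a, x[:1000], y[:1000]))
```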

This slow adoption was a result of GPU manufacturers’ decision to keep their chip specifications secret, said Miller. “That prevented open source developers from writing software that could utilize that hardware. Nyami makes it easier for other researchers to conduct experiments of their own, because they don’t have to reinvent the wheel. With contributions from the ‘open hardware’ community, we can incorporate more creative ideas and produce an increasingly better tool.

“The ramifications of the findings could make processors easier for researchers to work with and explore different design tradeoffs. We can also use [Nyami] as a platform for conducting research that isn’t GPU-specific, like energy efficiency and reliability,” he added.


Abstract of Nyami: a synthesizable GPU architectural model for general-purpose and graphics-specific workloads

Graphics processing units (GPUs) continue to grow in popularity for general-purpose, highly parallel, high-throughput systems. This has forced GPU vendors to increase their focus on general purpose workloads, sometimes at the expense of the graphics-specific workloads. Using GPUs for general-purpose computation is a departure from the driving forces behind programmable GPUs that were focused on a narrow subset of graphics rendering operations. Rather than focus on purely graphics-related or general-purpose use, we have designed and modeled an architecture that optimizes for both simultaneously to efficiently handle all GPU workloads. In this paper, we present Nyami, a co-optimized GPU architecture and simulation model with an open-source implementation written in Verilog. This approach allows us to more easily explore the GPU design space in a synthesizable, cycle-precise, modular environment. An instruction-precise functional simulator is provided for co-simulation and verification. Overall, we assume a GPU may be used as a general-purpose GPU (GPGPU) or a graphics engine and account for this in the architecture’s construction and in the options and modules selectable for synthesis and simulation. To demonstrate Nyami’s viability as a GPU research platform, we exploit its flexibility and modularity to explore the impact of a set of architectural decisions. These include sensitivity to cache size and associativity, barrel and switch-on-stall multithreaded instruction scheduling, and software vs. hardware implementations of rasterization. Through these experiments, we gain insight into commonly accepted GPU architecture decisions, adapt the architecture accordingly, and give examples of the intended use as a GPU research tool.

How to create a synthesized actor performance in post-production

Given a pair of facial performances (horizontal and vertical faces, left), a new performance (film strip, right) can be blended (credit: Charles Malleson et al./Disney Research)

Disney Research has devised a way to blend an actor’s facial performances from a few or multiple takes to allow a director to get just the right emotion, instead of re-shooting the scene multiple times.

“It’s not unheard of for a director to re-shoot a crucial scene dozens of times, even 100 or more times, until satisfied,” said Markus Gross, vice president of research at Disney Research. “That not only takes a lot of time — it also can be quite expensive. Now our research team has shown that a director can exert control over an actor’s performance after the shoot with just a few takes, saving both time and money.”

And the work can be done in post-production, rather than on an expensive film set.

How FaceDirector works

Developed jointly with the University of Surrey, the system, called FaceDirector, works with normal 2D video input acquired by standard cameras, without the need for additional hardware or 3D face reconstruction.

“The central challenge for combining an actor’s performances from separate takes is video synchronization,” said Jean-Charles Bazin, associate research scientist at Disney Research. “But differences in head pose, emotion, expression intensity, as well as pitch accentuation and even the wording of the speech, are just a few of many difficulties in syncing video takes.”

The system analyzes both facial expressions and audio cues, then identifies frames that correspond between the takes, using a graph-based framework. Once this synchronization has occurred, the system enables a director to control the performance by choosing the desired facial expressions and timing from either video, which are then blended together using facial landmarks, optical flow, and compositing.
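FaceDirector’s actual synchronization is the graph-based combination of audio and facial cues described above. As a rough stand-in, the sketch below aligns two takes by dynamic time warping (DTW) over a simple per-frame audio energy feature; all signals and frame sizes are hypothetical.

```python
# Minimal stand-in for take synchronization (FaceDirector itself uses a
# graph-based combination of audio and facial-expression cues): align two takes
# by dynamic time warping (DTW) over a simple per-frame audio feature.
import numpy as np

def frame_energy(signal, frame_len=735):   # ~30 fps at 22.05 kHz (hypothetical)
    n = len(signal) // frame_len
    frames = signal[: n * frame_len].reshape(n, frame_len)
    return np.log1p((frames ** 2).sum(axis=1))

def dtw_path(feat_a, feat_b):
    """Return a list of (i, j) frame correspondences between the two takes."""
    na, nb = len(feat_a), len(feat_b)
    cost = np.full((na + 1, nb + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, na + 1):
        for j in range(1, nb + 1):
            d = abs(feat_a[i - 1] - feat_b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # Backtrack from the end to recover the warping path.
    path, i, j = [], na, nb
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

# Hypothetical audio tracks for two takes (e.g., decoded to mono float arrays).
take_a = np.random.default_rng(0).standard_normal(22050 * 3)
take_b = np.random.default_rng(1).standard_normal(22050 * 3)
pairs = dtw_path(frame_energy(take_a), frame_energy(take_b))
print(pairs[:5])  # frame-to-frame correspondences used to pick matching video frames
```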


Disney Research | FaceDirector: Continuous Control of Facial Performance in Video

To test the system, actors performed several lines of dialog, repeating the performances to convey different emotions – happiness, sadness, excitement, fear, anger, etc. The line readings were captured in HD resolution using standard compact cameras. The researchers were able to synchronize the videos automatically and in real time on a standard desktop computer. Users could then generate novel versions of the performances by interactively blending the video takes.

Multiple uses

The researchers showed how it could be used for a variety of purposes, including generation of multiple performances from just a few video takes (for use elsewhere in the video), for script correction and editing, and switching between voices (for example to create an entertaining performance with a sad voice over a happy face).

Speculation: it might also be possible to use this to create a fake video in which a person’s facial expressions from different takes are combined, along with audio clips, to make the person appear to show inappropriate emotions, for example.

The researchers will present their findings at ICCV 2015, the International Conference on Computer Vision, Dec. 11–18, in Santiago, Chile.


Abstract of FaceDirector: Continuous Control of Facial Performance in Video

We present a method to continuously blend between multiple facial performances of an actor, which can contain different facial expressions or emotional states. As an example, given sad and angry video takes of a scene, our method empowers a movie director to specify arbitrary weighted combinations and smooth transitions between the two takes in post-production. Our contributions include (1) a robust nonlinear audio-visual synchronization technique that exploits complementary properties of audio and visual cues to automatically determine robust, dense spatio-temporal correspondences between takes, and (2) a seamless facial blending approach that provides the director full control to interpolate timing, facial expression, and local appearance, in order to generate novel performances after filming. In contrast to most previous works, our approach operates entirely in image space, avoiding the need of 3D facial reconstruction. We demonstrate that our method can synthesize visually believable performances with applications in emotion transition, performance correction, and timing control.


MIT invention could boost resolution of 3-D depth cameras 1,000-fold

By combining the information from the Kinect depth frame in (a) with polarized photographs, MIT researchers reconstructed the 3-D surface shown in (c). Polarization cues can allow coarse depth sensors like Kinect to achieve laser scan quality (b). (credit: courtesy of the researchers)

MIT researchers have shown that by exploiting light polarization (as in polarized sunglasses) they can increase the resolution of conventional 3-D imaging devices such as the Microsoft Kinect as much as 1,000 times.

The technique could lead to high-quality 3-D cameras built into cellphones, and perhaps the ability to snap a photo of an object and then use a 3-D printer to produce a replica. Further out, the work could also help the development of driverless cars.

Led by Ramesh Raskar, associate professor of media arts and sciences in the MIT Media Lab, the researchers describe the new system, which they call Polarized 3D, in a paper they’re presenting at the International Conference on Computer Vision in December.

How polarized light works

If an electromagnetic wave can be thought of as an undulating squiggle, polarization refers to the squiggle’s orientation. It could be undulating up and down, or side to side, or somewhere in-between.

Polarization also affects the way in which light bounces off of physical objects. If light strikes an object squarely, much of it will be absorbed, but whatever reflects back will have the same mix of polarizations (horizontal and vertical) that the incoming light did. At wider angles of reflection, however, light within a certain range of polarizations is more likely to be reflected.

This is why polarized sunglasses are good at cutting out glare: Light from the sun bouncing off asphalt or water at a low angle features an unusually heavy concentration of light with a particular polarization. So the polarization of reflected light carries information about the geometry of the objects it has struck.

This relationship has been known for centuries, but it’s been hard to do anything with it, because of a fundamental ambiguity in polarized light: light with a particular polarization, reflecting off a surface with a particular orientation and passing through a polarizing lens, is indistinguishable from light with the opposite polarization reflecting off a surface with the opposite orientation.

This means that for any surface in a visual scene, measurements based on polarized light offer two equally plausible hypotheses about its orientation. Canvassing all the possible combinations of the two candidate orientations of every surface, in order to identify the one that makes the most sense geometrically, would be a prohibitively time-consuming computation.

Polarization plus depth sensing

To resolve this ambiguity, the Media Lab researchers use coarse depth estimates provided by some other method, such as the time a light signal takes to reflect off of an object and return to its source. Even with this added information, calculating surface orientation from measurements of polarized light is complicated, but it can be done in real-time by a graphics processing unit, the type of special-purpose graphics chip found in most video game consoles.

The researchers’ experimental setup consisted of a Microsoft Kinect — which gauges depth using reflection time — with an ordinary polarizing photographic lens placed in front of its camera. In each experiment, the researchers took three photos of an object, rotating the polarizing filter each time, and their algorithms compared the light intensities of the resulting images.
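The paper’s full pipeline is more involved, but the core per-pixel step is standard shape-from-polarization: fit the sinusoidal model I(φ_pol) = I_avg · (1 + ρ·cos(2φ_pol − 2φ)) to the intensities at three filter rotations, recover the polarization azimuth φ up to a 180-degree ambiguity, and use the coarse depth (here, from the Kinect) to pick between the two candidates. The sketch below uses hypothetical values and assumed rotation angles of 0, 45, and 90 degrees; it is not the authors’ code.

```python
# Sketch of the standard shape-from-polarization step (the full Polarized 3D
# pipeline is more involved): per pixel, fit
#   I(phi_pol) = I_avg * (1 + rho * cos(2*phi_pol - 2*phi))
# to three images taken with the polarizer at 0, 45, and 90 degrees, then use a
# coarse-depth normal to break the 180-degree ambiguity in the azimuth phi.
import numpy as np

def polarization_from_three(i0, i45, i90):
    """i0, i45, i90: intensities at polarizer angles 0/45/90 degrees.
    Returns (azimuth phi in radians, degree of polarization rho, I_avg)."""
    i_avg = (i0 + i90) / 2.0
    c = (i0 - i90) / (2.0 * i_avg)               # rho * cos(2*phi)
    s = (2.0 * i45 - i0 - i90) / (2.0 * i_avg)   # rho * sin(2*phi)
    phi = 0.5 * np.arctan2(s, c)
    rho = np.hypot(c, s)
    return phi, rho, i_avg

def disambiguate(phi, coarse_normal_azimuth):
    """Choose phi or phi+pi, whichever is closer to the coarse normal's azimuth."""
    alt = phi + np.pi
    d0 = np.abs(np.angle(np.exp(1j * (phi - coarse_normal_azimuth))))
    d1 = np.abs(np.angle(np.exp(1j * (alt - coarse_normal_azimuth))))
    return np.where(d0 <= d1, phi, alt)

# Hypothetical single-pixel example: true azimuth 130 deg, rho = 0.4, I_avg = 1.
true_phi, rho, i_avg = np.deg2rad(130.0), 0.4, 1.0
meas = [i_avg * (1 + rho * np.cos(2 * np.deg2rad(a) - 2 * true_phi)) for a in (0, 45, 90)]
phi, rho_est, _ = polarization_from_three(*meas)
phi = disambiguate(phi, coarse_normal_azimuth=np.deg2rad(120.0))
print(np.rad2deg(phi), rho_est)   # ~130 degrees, ~0.4
```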

On its own, at a distance of several meters, the Kinect can resolve physical features as small as a centimeter or so across. But with the addition of the polarization information, the researchers’ system could resolve features in the range of hundreds of micrometers, or one-thousandth the size.

For comparison, the researchers also imaged several of their test objects with a high-precision laser scanner, which requires that the object be inserted into the scanner bed. Polarized 3D still offered the higher resolution.

Uses in cameras and self-driving cars

A mechanically rotated polarization filter would probably be impractical in a cellphone camera, but grids of tiny polarization filters that can overlay individual pixels in a light sensor are commercially available. Capturing three pixels’ worth of light for each image pixel would reduce a cellphone camera’s resolution, but no more than the color filters that existing cameras already use.

The new paper also offers the tantalizing prospect that polarization systems could aid the development of self-driving cars. Today’s experimental self-driving cars are, in fact, highly reliable under normal illumination conditions, but their vision algorithms go haywire in rain, snow, or fog.

That’s because water particles in the air scatter light in unpredictable ways, making it much harder to interpret. The MIT researchers show that in some very simple test cases their system can exploit information contained in interfering waves of light to handle scattering.


Abstract of Polarized 3D: High-Quality Depth Sensing with Polarization Cues

Coarse depth maps can be enhanced by using the shape information from polarization cues. We propose a framework to combine surface normals from polarization (hereafter polarization normals) with an aligned depth map. Polarization normals have not been used for depth enhancement before. This is because polarization normals suffer from physics-based artifacts, such as azimuthal ambiguity, refractive distortion and fronto-parallel signal degradation. We propose a framework to overcome these key challenges, allowing the benefits of polarization to be used to enhance depth maps. Our results demonstrate improvement with respect to state-of-the-art 3D reconstruction techniques.