How to turn audio clips into realistic lip-synced video


UW (University of Washington) | UW researchers create realistic video from audio files alone

University of Washington researchers at the UW Graphics and Image Laboratory have developed new algorithms that turn audio clips of a person speaking into realistic, lip-synced video, starting with an existing video of that person speaking on a different topic.

As detailed in a paper to be presented Aug. 2 at SIGGRAPH 2017, the team successfully generated a highly realistic video of former president Barack Obama talking about terrorism, fatherhood, job creation and other topics, using audio clips of those speeches and existing weekly video addresses in which he originally spoke on different topics.

Realistic audio-to-video conversion has practical applications like improving video conferencing for meetings (streaming audio over the internet takes up far less bandwidth than video, reducing video glitches), or holding a conversation with a historical figure in virtual reality, said Ira Kemelmacher-Shlizerman, an assistant professor at the UW’s Paul G. Allen School of Computer Science & Engineering.


Supasorn Suwajanakorn | Teaser — Synthesizing Obama: Learning Lip Sync from Audio

This beats previous audio-to-video conversion processes, which have involved filming multiple people in a studio saying the same sentences over and over to capture how particular sounds correlate with different mouth shapes, an approach that is expensive, tedious and time-consuming. The new machine learning tool may also help overcome the “uncanny valley” problem, which has dogged efforts to create realistic video from audio.

How to do it

A neural network first converts the sounds from an audio file into basic mouth shapes. Then the system grafts and blends those mouth shapes onto an existing target video and adjusts the timing to create a realistic, lip-synced video of the person delivering the new speech. (credit: University of Washington)

1. Find or record a video of the person (or use video chat tools like Skype to create a new video) for the neural network to learn from. There are millions of hours of video that already exist from interviews, video chats, movies, television programs and other sources, the researchers note. (Obama was chosen because there were hours of presidential videos in the public domain.)

2. Train the neural network to watch videos of the person and translate different audio sounds into basic mouth shapes.

3. The system then uses the audio of an individual’s speech to generate realistic mouth shapes, which are grafted onto and blended with the head of that person in the target video. A small time shift lets the neural network anticipate what the person is going to say next. (A minimal sketch of this audio-to-mouth-shape mapping appears after this list.)

4. Currently, the neural network is designed to learn on one individual at a time, meaning that Obama’s voice — speaking words he actually uttered — is the only information used to “drive” the synthesized video. Future steps, however, include helping the algorithms generalize across situations to recognize a person’s voice and speech patterns with less data, with only an hour of video to learn from, for instance, instead of 14 hours.
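
A rough sketch of the mapping behind steps 2 and 3 is shown below: a recurrent network turns a sequence of audio features into low-dimensional mouth-shape coefficients, with a short look-ahead delay so it can anticipate upcoming speech. This is not the UW implementation; the feature types, dimensions, and the 20-frame delay are assumptions chosen for illustration.

```python
# Rough sketch of the audio -> mouth-shape mapping (steps 2-3), not the UW code.
# Assumptions for illustration: MFCC-like audio features, PCA mouth-shape
# coefficients as targets, and a fixed look-ahead delay of 20 frames.
import torch
import torch.nn as nn

class AudioToMouth(nn.Module):
    def __init__(self, n_audio_feats=28, n_mouth_coeffs=20, hidden=60, delay=20):
        super().__init__()
        self.delay = delay                       # frames of look-ahead (assumed value)
        self.rnn = nn.LSTM(n_audio_feats, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_mouth_coeffs)

    def forward(self, audio_feats):              # (batch, time, n_audio_feats)
        h, _ = self.rnn(audio_feats)
        mouth = self.out(h)                      # (batch, time, n_mouth_coeffs)
        # Drop the first `delay` outputs so the prediction aligned with frame t
        # was computed after seeing audio up to frame t + delay (anticipation).
        return mouth[:, self.delay:, :]

# Training sketch: regress predicted mouth coefficients onto coefficients
# extracted from the training video, aligned with the same time shift.
model = AudioToMouth()
loss_fn = nn.MSELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(audio_feats, mouth_targets):
    pred = model(audio_feats)
    loss = loss_fn(pred, mouth_targets[:, :pred.shape[1], :])
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```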

Fakes of fakes

So the obvious question is: Can you put someone else’s voice into such a video (given enough source video of that person)? The researchers said they decided against going down that path, but they didn’t say it was impossible.

Even more pernicious: the words of the person in the original video (not just the voice) could be faked using Princeton/Adobe’s “VoCo” software (when available), simply by editing a text transcript of their voice recording; or the fake voice itself could be modified.

Or Disney Research’s FaceDirector could be used to edit recorded substitute facial expressions (along with the fake voice) into the video.

However, by reversing the process — feeding video into the neural network instead of just audio — one could also potentially develop algorithms that could detect whether a video is real or manufactured, the researchers note.

The research was funded by Samsung, Google, Facebook, Intel, and the UW Animation Research Labs. You can contact the research team at audiolipsync@cs.washington.edu.


Abstract of Synthesizing Obama: Learning Lip Sync from Audio

Given audio of President Barack Obama, we synthesize a high quality video of him speaking with accurate lip sync, composited into a target video clip. Trained on many hours of his weekly address footage, a recurrent neural network learns the mapping from raw audio features to mouth shapes. Given the mouth shape at each time instant, we synthesize high quality mouth texture, and composite it with proper 3D pose matching to change what he appears to be saying in a target video to match the input audio track. Our approach produces photorealistic results.

How to ‘talk’ to your computer or car with hand or body poses

Researchers at Carnegie Mellon University’s Robotics Institute have developed a system that can detect and understand body poses and movements of multiple people from a video in real time — including, for the first time, the pose of each individual’s fingers.

The ability to recognize finger or hand poses, for instance, will make it possible for people to interact with computers in new and more natural ways, such as simply pointing at things.

That will also allow robots to perceive what you’re doing, what mood you’re in, and whether you can be interrupted, for example. Your self-driving car could get an early warning that a pedestrian is about to step into the street by monitoring your body language. The technology could also be used for behavioral diagnosis and rehabilitation for conditions such as autism, dyslexia, and depression, the researchers say.

This new method was developed at CMU’s NSF-funded Panoptic Studio, a two-story dome embedded with 500 video cameras, but the researchers can now do the same thing with a single camera and laptop computer.

The researchers have released their computer code. It’s already being widely used by research groups, and more than 20 commercial groups, including automotive companies, have expressed interest in licensing the technology, according to Yaser Sheikh, associate professor of robotics.

Tracking multiple people in real time, particularly in social situations where they may be in contact with each other, presents a number of challenges. Sheikh and his colleagues took a bottom-up approach, which first localizes all the body parts in a scene — arms, legs, faces, etc. — and then associates those parts with particular individuals.
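
A toy version of that bottom-up grouping step appears below: given candidate detections for two part types and a pairwise affinity score, an assignment step links the parts that most likely belong to the same person. The distance-based affinity, the 0.3 threshold, and the use of SciPy’s Hungarian solver are illustrative stand-ins, not the CMU method’s learned part-affinity fields.

```python
# Toy version of bottom-up grouping (not the CMU code): detect all candidates
# for two part types first, score every possible connection, then assign
# connections so that each part is used by at most one person.
import numpy as np
from scipy.optimize import linear_sum_assignment

def link_parts(candidates_a, candidates_b, affinity, min_score=0.3):
    """candidates_a/b: (Na, 2) and (Nb, 2) arrays of (x, y) detections.
    affinity(pa, pb) -> float in [0, 1]; higher means the two parts more
    likely belong to the same person (stand-in for a learned affinity field)."""
    cost = np.zeros((len(candidates_a), len(candidates_b)))
    for i, pa in enumerate(candidates_a):
        for j, pb in enumerate(candidates_b):
            cost[i, j] = -affinity(pa, pb)          # negate: the solver minimizes cost
    rows, cols = linear_sum_assignment(cost)
    return [(i, j) for i, j in zip(rows, cols) if -cost[i, j] > min_score]

# Toy usage: two people plus one spurious detection; affinity decays with distance.
necks     = np.array([[100.0, 80.0], [300.0, 85.0]])
shoulders = np.array([[130.0, 90.0], [330.0, 95.0], [500.0, 400.0]])
links = link_parts(necks, shoulders,
                   lambda pa, pb: np.exp(-np.linalg.norm(pa - pb) / 50.0))
print(links)   # expect [(0, 0), (1, 1)]: one neck-shoulder link per person
```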

Sheikh and his colleagues will present reports on their multiperson and hand-pose detection methods at CVPR 2017, the Computer Vision and Pattern Recognition Conference, July 21–26 in Honolulu.

Radical new vertically integrated 3D chip design combines computing and data storage

Four vertical layers in new 3D nanosystem chip. Top (fourth layer): sensors and more than one million carbon-nanotube field-effect transistor (CNFET) logic inverters; third layer, on-chip non-volatile RRAM (1 Mbit memory); second layer, CNFET logic with classification accelerator (to identify sensor inputs); first (bottom) layer, silicon FET logic. (credit: Max M. Shulaker et al./Nature)

A radical new 3D chip that combines computation and data storage in vertically stacked layers — allowing for processing and storing massive amounts of data at high speed in future transformative nanosystems — has been designed by researchers at Stanford University and MIT.

The new 3D-chip design* replaces silicon with carbon nanotubes (sheets of 2-D graphene formed into nanocylinders) and integrates resistive random-access memory (RRAM) cells.

Carbon-nanotube field-effect transistors (CNFETs) are an emerging transistor technology that can scale beyond the limits of silicon MOSFETs (conventional chips), and promise an order-of-magnitude improvement in energy-efficient computation. However, experimental demonstrations of CNFETs so far have been small-scale and limited to integrating only tens or hundreds of devices (see earlier 2015 Stanford research, “Skyscraper-style carbon-nanotube chip design…”).

The researchers integrated more than 1 million RRAM cells and 2 million carbon-nanotube field-effect transistors in the chip, making it the most complex nanoelectronic system ever made with emerging nanotechnologies, according to the researchers. RRAM is an emerging memory technology that promises high-capacity, non-volatile data storage, with improved speed, energy efficiency, and density, compared to dynamic random-access memory (DRAM).

Instead of requiring separate components, the RRAM cells and carbon nanotubes are built vertically over one another, creating a dense new 3D computer architecture** with interleaving layers of logic and memory. By using ultradense through-chip vias (electrical interconnecting wires passing between layers), the high delay with conventional wiring between computer components is eliminated.

The new 3D nanosystem can capture massive amounts of data every second, store it directly on-chip, perform in situ processing of the captured data, and produce “highly processed” information. “Such complex nanoelectronic systems will be essential for future high-performance, highly energy-efficient electronic systems,” the researchers say.
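
To picture that sense-store-process dataflow in software terms, here is a toy behavioral model. It is purely illustrative and says nothing about the actual circuit design; the array sizes and the linear “classifier” are made-up assumptions.

```python
# Toy behavioral model of the sense -> store -> process-in-place dataflow.
# Purely illustrative software: it says nothing about the actual circuit design,
# and the array sizes and linear "classifier" weights are made-up assumptions.
import numpy as np

N_SENSORS = 1_000_000                 # order of magnitude of the top-layer CNT sensors
rng = np.random.default_rng(0)

# Layer 4 (top): all sensors sampled at once; one vectorized read stands in
# for the massively parallel per-sensor interlayer vias.
sensor_frame = rng.normal(size=N_SENSORS).astype(np.float32)

# Layer 3: on-chip RRAM, modeled as a pre-allocated array the frame is written
# into directly, with no off-chip transfer.
rram = np.empty(N_SENSORS, dtype=np.float32)
rram[:] = sensor_frame

# Layer 2: in-situ classification accelerator, modeled as a fixed linear
# projection of the stored frame onto a handful of gas classes.
weights = rng.normal(size=(N_SENSORS, 4)).astype(np.float32) / np.sqrt(N_SENSORS)
scores = rram @ weights
print("predicted gas class:", int(np.argmax(scores)))
```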

How to combine computation and storage

Illustration of separate CPU (bottom) and RAM memory (top) in current computer architecture (images credit: iStock)

The new chip design aims to replace current chip designs, which separate computing and data storage, resulting in limited-speed connections.

Separate 2D chips have been required because “building conventional silicon transistors involves extremely high temperatures of over 1,000 degrees Celsius,” explains Max Shulaker, an assistant professor of electrical engineering and computer science at MIT and lead author of a paper published July 5, 2017 in the journal Nature. “If you then build a second layer of silicon circuits on top, that high temperature will damage the bottom layer of circuits.”

Instead, carbon nanotube circuits and RRAM memory can be fabricated at much lower temperatures: below 200°C. “This means they can be built up in layers without harming the circuits beneath,” says Shulaker.

Overcoming communication and computing bottlenecks

As applications analyze increasingly massive volumes of data, the limited rate at which data can be moved between different chips is creating a critical communication “bottleneck.” And with limited real estate on increasingly miniaturized chips, there is not enough room to place chips side-by-side.

At the same time, embedded intelligence in areas ranging from autonomous driving to personalized medicine is now generating huge amounts of data, but silicon transistors are no longer improving at the historic rate that they have for decades.

Instead, three-dimensional integration is the most promising approach to continue the technology-scaling path set forth by Moore’s law, allowing an increasing number of devices to be integrated per unit volume, according to Jan Rabaey, a professor of electrical engineering and computer science at the University of California at Berkeley, who was not involved in the research.

Three-dimensional integration “leads to a fundamentally different perspective on computing architectures, enabling an intimate interweaving of memory and logic,” he says. “These structures may be particularly suited for alternative learning-based computational paradigms such as brain-inspired systems and deep neural nets, and the approach presented by the authors is definitely a great first step in that direction.”

The new 3D design provides several benefits for future computing systems, including:

  • Logic circuits made from carbon nanotubes can be an order of magnitude more energy-efficient compared to today’s logic made from silicon.
  • RRAM memory is denser, faster, and more energy-efficient compared to conventional DRAM (dynamic random-access memory) devices.
  • The dense through-chip vias (wires) can enable vertical connectivity that is 1,000 times more dense than conventional packaging and chip-stacking solutions allow, which greatly improves the data communication bandwidth between vertically stacked functional layers. For example, each sensor in the top layer can connect directly to its respective underlying memory cell with an inter-layer via. This enables the sensors to write their data in parallel directly into memory and at high speed.
  • The design is compatible in both fabrication and design with today’s CMOS silicon infrastructure.

Shulaker next plans to work with Massachusetts-based semiconductor company Analog Devices to develop new versions of the system.

This work was funded by the Defense Advanced Research Projects Agency, the National Science Foundation, Semiconductor Research Corporation, STARnet SONIC, and member companies of the Stanford SystemX Alliance.

* As a working-prototype demonstration of the potential of the technology, the researchers took advantage of the ability of carbon nanotubes to also act as sensors. On the top layer of the chip, they placed more than 1 million carbon nanotube-based sensors, which they used to detect and classify ambient gases for detecting signs of disease by sensing particular compounds in a patient’s breath, says Shulaker. By layering sensing, data storage, and computing, the chip was able to measure each of the sensors in parallel, and then write directly into its memory, generating huge bandwidth in just one device, according to Shulaker. The top layer could be replaced with additional computation or data storage subsystems, or with other forms of input/output, he explains.

** Previous R&D in 3D chip technologies and their limitations are covered here, noting that “in general, 3D integration is a broad term that includes such technologies as 3D wafer-level packaging (3DWLP); 2.5D and 3D interposer-based integration; 3D stacked ICs (3D-SICs), monolithic 3D ICs; 3D heterogeneous integration; and 3D systems integration.” The new Stanford-MIT nanosystem design significantly expands this definition.


Abstract of Three-dimensional integration of nanotechnologies for computing and data storage on a single chip

The computing demands of future data-intensive applications will greatly exceed the capabilities of current electronics, and are unlikely to be met by isolated improvements in transistors, data storage technologies or integrated circuit architectures alone. Instead, transformative nanosystems, which use new nanotechnologies to simultaneously realize improved devices and new integrated circuit architectures, are required. Here we present a prototype of such a transformative nanosystem. It consists of more than one million resistive random-access memory cells and more than two million carbon-nanotube field-effect transistors—promising new nanotechnologies for use in energy-efficient digital logic circuits and for dense data storage—fabricated on vertically stacked layers in a single chip. Unlike conventional integrated circuit architectures, the layered fabrication realizes a three-dimensional integrated circuit architecture with fine-grained and dense vertical connectivity between layers of computing, data storage, and input and output (in this instance, sensing). As a result, our nanosystem can capture massive amounts of data every second, store it directly on-chip, perform in situ processing of the captured data, and produce ‘highly processed’ information. As a working prototype, our nanosystem senses and classifies ambient gases. Furthermore, because the layers are fabricated on top of silicon logic circuitry, our nanosystem is compatible with existing infrastructure for silicon-based technologies. Such complex nano-electronic systems will be essential for future high-performance and highly energy-efficient electronic systems.

Carbon nanotubes found safe for reconnecting damaged neurons

(credit: Polina Shuvaeva/iStock)

Multiwall carbon nanotubes (MWCNTs) could safely help repair damaged connections between neurons by serving as supporting scaffolds for growth or as connections between neurons.

That’s the conclusion of an in-vitro (lab) open-access study with cultured neurons (taken from the hippocampus of neonatal rats) by a multi-disciplinary team of scientists in Italy and Spain, published in the journal Nanomedicine: Nanotechnology, Biology, and Medicine.

A multi-walled carbon nanotube (credit: Eric Wieser/CC)

The study addressed whether MWCNTs that are interfaced to neurons affect synaptic transmission by modifying the lipid (fatty) cholesterol structure in artificial neural membranes.

Significantly, they found that MWCNTs:

  • Facilitate the full growth of neurons and the formation of new synapses. “This growth, however, is not indiscriminate and unlimited since, as we proved, after a few weeks, a physiological balance is attained.”
  • Do not interfere with the composition of lipids (cholesterol in particular), which make up the cellular membrane in neurons.
  • Do not interfere in the transmission of signals through synapses.

The researchers also noted that they recently reported (in an open access paper) low tissue reaction when multiwall carbon nanotubes were implanted in vivo (in live animals) to reconnect damaged spinal neurons.

The researchers say they proved that carbon nanotubes “perform excellently in terms of duration, adaptability and mechanical compatibility with tissue” and that “now we know that their interaction with biological material, too, is efficient. Based on this evidence, we are already studying an in vivo application, and preliminary results appear to be quite promising in terms of recovery of lost neurological functions.”

The research team comprised scientists from SISSA (International School for Advanced Studies), the University of Trieste, ELETTRA Sincrotrone, and two Spanish institutions, Basque Foundation for Science and CIC BiomaGUNE.


Abstract of Sculpting neurotransmission during synaptic development by 2D nanostructured interfaces

Carbon nanotube-based biomaterials critically contribute to the design of many prosthetic devices, with a particular impact in the development of bioelectronics components for novel neural interfaces. These nanomaterials combine excellent physical and chemical properties with peculiar nanostructured topography, thought to be crucial to their integration with neural tissue as long-term implants. The junction between carbon nanotubes and neural tissue can be particularly worthy of scientific attention and has been reported to significantly impact synapse construction in cultured neuronal networks. In this framework, the interaction of 2D carbon nanotube platforms with biological membranes is of paramount importance. Here we study carbon nanotube ability to interfere with lipid membrane structure and dynamics in cultured hippocampal neurons. While excluding that carbon nanotubes alter the homeostasis of neuronal membrane lipids, in particular cholesterol, we document in aged cultures an unprecedented functional integration between carbon nanotubes and the physiological maturation of the synaptic circuits.

Meditation, yoga, and tai chi can reverse damaging effects of stress, new study suggests

Gentle exercise like tai chi can reduce the risk of inflammation-related diseases like cancer and accelerated aging. (credit: iStock)

Mind-body interventions such as meditation, yoga*, and tai chi can reverse the molecular reactions in our DNA that cause ill-health and depression, according to a study by scientists at the universities of Coventry and Radboud.

When a person is exposed to a stressful event, their sympathetic nervous system (responsible for the “fight-or-flight” response) is triggered, which increases production of a molecule called nuclear factor kappa B (NF-kB). That molecule then activates genes to produce proteins called cytokines that cause inflammation at the cellular level, affecting the body, brain, and immune system.

That’s useful as a short-lived fight-or-flight reaction. However, if persistent, it leads to a higher risk of cancer, accelerated aging, and psychiatric disorders like depression.

But in a paper published June 16, 2017 in the open-access journal Frontiers in Immunology, the researchers reveal findings of 18 studies (featuring 846 participants over 11 years) indicating that people who practice mind-body interventions exhibit the opposite effect. They showed a decrease in production of NF-kB and cytokines — reducing the pro-inflammatory gene expression pattern and the risk of inflammation-related diseases and conditions.

David Gorski, MD, PhD, has published a critique of this study here. (Lead author Ivana Burić has replied in the comments below.)

Lowering risks from sitting

Brisk walks can offset health hazards of sitting (credit: iStock)

In addition to stress effects, increased sitting is known to be associated with an increased risk of cardiovascular disease, diabetes, and death from all causes.

But regular two-minute brisk walks every 30 minutes (in addition to daily 30-minute walks) significantly reduce the triglyceride (lipid, or fatty acid) levels that lead to clogged arteries, researchers from New Zealand’s University of Otago report in a paper published June 19, 2017 in the Journal of Clinical Lipidology.**

The lipid levels were measured in response to a meal consumed around 24 hours after starting the activity. High levels of triglycerides are linked to hardening of the arteries and other cardiovascular conditions.

They previously found that brisk walks for two minutes every 30 minutes also lower blood glucose and insulin levels.

OK, it’s time for that two-minute brisk walk. … So, you’re still sitting there, aren’t you? :)

* However, yoga causes musculoskeletal pain in more than 10 per cent of practitioners per year, according to recent research at the University of Sydney published in the Journal of Bodywork and Movement Therapies. “We also found that yoga can exacerbate existing pain, with 21 per cent of existing injuries made worse by doing yoga, particularly pre-existing musculoskeletal pain in the upper limbs,” said lead researcher Associate Professor Evangelos Pappas from the University’s Faculty of Health Sciences.

“In terms of severity, more than one-third of cases of pain caused by yoga were serious enough to prevent yoga participation and lasted more than 3 months.” The study found that most “new” yoga pain was in the upper extremities (shoulder, elbow, wrist, hand), possibly due to downward dog and similar postures that put weight on the upper limbs. However, 74 per cent of participants in the study reported that existing pain was actually improved by yoga, highlighting the complex relationship between musculoskeletal pain and yoga practice.

** Scientists at the University of Utah School of Medicine previously came to a similar conclusion in a 2015 paper published in the Clinical Journal of the American Society of Nephrology (CJASN).

They used observational data from the National Health and Nutrition Examination Survey (NHANES) to examine whether longer durations of low-intensity activities (e.g., standing) vs. light-intensity activities (e.g., casual walking, light gardening, cleaning) extend the lifespan of people who are sedentary for more than half of their waking hours.

They found that adding two minutes of low-intensity activities every hour (plus 2.5 hours of moderate exercise each week, which strengthens the heart, muscles, and bones) was associated with a 33 percent lower risk of dying. “It was fascinating to see the results because the current national focus is on moderate or vigorous activity,” says lead author Srinivasan Beddhu, M.D., professor of internal medicine. “To see that light activity had an association with lower mortality is intriguing.”

UPDATE July 5, 2017 — Added mention of a critique to the Coventry–Radboud study.


‘Mind reading’ technology identifies complex thoughts, using machine learning and fMRI

(Top) Predicted brain activation patterns and semantic features (colors) for two sentences (left: “The flood damaged the hospital”; right: “The storm destroyed the theater”). (Bottom) Observed activation patterns and semantic features, which are similar. (credit: Jing Wang et al./Human Brain Mapping)

By combining machine-learning algorithms with fMRI brain imaging technology, Carnegie Mellon University (CMU) scientists have discovered, in essence, how to “read minds.”

The researchers used functional magnetic resonance imaging (fMRI) to view how the brain encodes various thoughts (based on blood-flow patterns in the brain). They discovered that the mind’s building blocks for constructing complex thoughts are formed, not by words, but by specific combinations of the brain’s various sub-systems.

Following up on previous research, the findings, published in Human Brain Mapping (open-access preprint here) and funded by the U.S. Intelligence Advanced Research Projects Activity (IARPA), provide new evidence that the neural dimensions of concept representation are universal across people and languages.

“One of the big advances of the human brain was the ability to combine individual concepts into complex thoughts, to think not just of ‘bananas,’ but ‘I like to eat bananas in evening with my friends,’” said CMU’s Marcel Just, the D.O. Hebb University Professor of Psychology in the Dietrich College of Humanities and Social Sciences. “We have finally developed a way to see thoughts of that complexity in the fMRI signal. The discovery of this correspondence between thoughts and brain activation patterns tells us what the thoughts are built of.”

Goal: A brain map of all types of knowledge

(Top) Specific brain regions associated with the four large-scale semantic factors: people (yellow), places (red), actions and their consequences (blue), and feelings (green). (Bottom) Word clouds associated with each large-scale semantic factor underlying sentence representations. These word clouds comprise the seven “neurally plausible semantic features” (such as “high-arousal”) most associated with each of the four semantic factors. (credit: Jing Wang et al./Human Brain Mapping)

The researchers used 240 specific events (described by sentences such as “The storm destroyed the theater”) in the study, with seven adult participants. They measured the brain’s coding of these events using 42 “neurally plausible semantic features” — such as person, setting, size, social interaction, and physical action (as shown in the word clouds in the illustration above). By measuring the specific activation of each of these 42 features in a person’s brain system, the program could tell what types of thoughts that person was focused on.

The researchers used a computational model to assess how the detected brain activation patterns (shown in the top illustration, for example) for 239 of the event sentences corresponded to the detected neurally plausible semantic features that characterized each sentence. The program was then able to decode the features of the 240th left-out sentence. (For “cross-validation,” they did the same for the other 239 sentences.)

The model was able to predict the features of the left-out sentence with 87 percent accuracy, despite never being exposed to its activation before. It was also able to work in the other direction: to predict the activation pattern of a previously unseen sentence, knowing only its semantic features.
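
The sketch below replays that leave-one-sentence-out protocol on synthetic data, with a generic ridge regression standing in for the paper’s models. Only the counts (240 sentences, 42 semantic features) are taken from the article; the voxel count, noise level, and rank-1 scoring rule are illustrative assumptions rather than the study’s 87-percent metric.

```python
# Sketch of the leave-one-sentence-out protocol with synthetic data and a
# generic ridge regression standing in for the paper's models. Only the counts
# (240 sentences, 42 semantic features) come from the article; voxel count,
# noise level, and the rank-1 scoring rule are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_sentences, n_features, n_voxels = 240, 42, 500
semantic = rng.normal(size=(n_sentences, n_features))         # per-sentence feature codes
true_map = rng.normal(size=(n_features, n_voxels))
activation = semantic @ true_map + 0.1 * rng.normal(size=(n_sentences, n_voxels))

correct = 0
for left_out in range(n_sentences):
    train = np.arange(n_sentences) != left_out
    # Decode: map activation patterns to semantic features on 239 sentences...
    decoder = Ridge(alpha=1.0).fit(activation[train], semantic[train])
    # ...then predict the features of the held-out sentence it has never seen.
    pred = decoder.predict(activation[left_out:left_out + 1])[0]
    # Score: is the prediction closer (cosine similarity) to the held-out
    # sentence's true features than to any other sentence's features?
    sims = semantic @ pred / (np.linalg.norm(semantic, axis=1) * np.linalg.norm(pred))
    correct += int(np.argmax(sims) == left_out)

print(f"rank-1 identification accuracy: {correct / n_sentences:.2f}")
```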

“Our method overcomes the unfortunate property of fMRI to smear together the signals emanating from brain events that occur close together in time, like the reading of two successive words in a sentence,” Just explained. “This advance makes it possible for the first time to decode thoughts containing several concepts. That’s what most human thoughts are composed of.”

“A next step might be to decode the general type of topic a person is thinking about, such as geology or skateboarding,” he added. “We are on the way to making a map of all the types of knowledge in the brain.”

Future possibilities

It’s conceivable that the CMU brain-mapping method might be combined one day with other “mind reading” methods, such as UC Berkeley’s method for using fMRI and computational models to decode and reconstruct people’s imagined visual experiences. Plus whatever Neuralink discovers.

Or if the CMU method could be replaced by noninvasive functional near-infrared spectroscopy (fNIRS), Facebook’s Building8 research concept (proposed by former DARPA head Regina Dugan) might be incorporated (a filter for creating quasi ballistic photons, avoiding diffusion and creating a narrow beam for precise targeting of brain areas, combined with a new method of detecting blood-oxygen levels).

Using fNIRS might also allow for adapting the method to infer thoughts of locked-in paralyzed patients, as in the Wyss Center for Bio and Neuroengineering research. It might even lead to ways to generally enhance human communication.

The CMU research is supported by the Office of the Director of National Intelligence (ODNI) via the Intelligence Advanced Research Projects Activity (IARPA) and the Air Force Research Laboratory (AFRL).

CMU has created some of the first cognitive tutors, helped to develop the Jeopardy-winning Watson, founded a groundbreaking doctoral program in neural computation, and is the birthplace of artificial intelligence and cognitive psychology. CMU also launched BrainHub, an initiative that focuses on how the structure and activity of the brain give rise to complex behaviors.


Abstract of Predicting the Brain Activation Pattern Associated With the Propositional Content of a Sentence: Modeling Neural Representations of Events and States

Even though much has recently been learned about the neural representation of individual concepts and categories, neuroimaging research is only beginning to reveal how more complex thoughts, such as event and state descriptions, are neurally represented. We present a predictive computational theory of the neural representations of individual events and states as they are described in 240 sentences. Regression models were trained to determine the mapping between 42 neurally plausible semantic features (NPSFs) and thematic roles of the concepts of a proposition and the fMRI activation patterns of various cortical regions that process different types of information. Given a semantic characterization of the content of a sentence that is new to the model, the model can reliably predict the resulting neural signature, or, given an observed neural signature of a new sentence, the model can predict its semantic content. The models were also reliably generalizable across participants. This computational model provides an account of the brain representation of a complex yet fundamental unit of thought, namely, the conceptual content of a proposition. In addition to characterizing a sentence representation at the level of the semantic and thematic features of its component concepts, factor analysis was used to develop a higher level characterization of a sentence, specifying the general type of event representation that the sentence evokes (e.g., a social interaction versus a change of physical state) and the voxel locations most strongly associated with each of the factors.

How to capture videos of brains in real time

Individual neurons firing within a volume of brain tissue (credit: The Rockefeller University)

A team of scientists has peered into a mouse brain with light, capturing live neural activity of hundreds of individual neurons in a 3D section of tissue at video speed (30 Hz) in a single recording for the first time.

Besides serving as a powerful research tool, this discovery means it may now be possible to “alter stimuli in real time based on what we see going on in the animal’s brain,” said Rockefeller University’s Alipasha Vaziri, senior author of an open-access paper published June 26, 2017 in Nature Methods.

By dramatically reducing the time and computational resources required to generate such an image, the algorithm opens the door to more sophisticated experiments, says Vaziri, head of the Rockefeller Laboratory of Neurotechnology and Biophysics. “Our goal is to better understand brain function by monitoring the dynamics within densely interconnected, three-dimensional networks of neurons,” Vaziri explained.

The research “may open the door to a range of applications, including real-time whole-brain recording and closed-loop interrogation of neuronal population activity in combination with optogenetics and behavior,” the paper authors suggest.

Watching mice think in real time

The scientists first engineered the animals’ neurons to fluoresce (glow), using a method called optogenetics. The stronger the neural signal, the brighter the cells shine. To capture this activity, they used a technique known as “light-field microscopy,” in which an array of lenses generates views from a variety of perspectives. These images are then combined to create a three-dimensional rendering, using a new algorithm called “seeded iterative demixing” (SID) developed by the team.

Without the new algorithm, the individual neurons are difficult to distinguish. (credit: The Rockefeller University)

To record the activity of all neurons at the same time, their images have to be captured on a camera simultaneously. In earlier research, this has made it difficult to distinguish the signals emitted by all cells as the light from the mouse’s neurons bounces off the surrounding, opaque tissue. The neurons typically show up as an indistinct, flickering mass.

The SID algorithm now makes it possible to simultaneously capture both the location of the individual neurons and the timing of their signals within a three-dimensional section of brain containing multiple layers of neurons, down to a depth of 0.38 millimeters.* Vaziri and his colleagues were able to track the precise coordinates of hundreds of active neurons over an extended period of time in mice that were awake and had the option of walking on a customized treadmill.
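
As a greatly simplified stand-in for what a demixing step does, the sketch below factors a synthetic movie into per-neuron spatial footprints and time courses using off-the-shelf non-negative matrix factorization. It is not the SID algorithm, and it ignores the light-field optics and tissue scattering entirely.

```python
# Greatly simplified stand-in for a demixing step (not the SID algorithm):
# factor a synthetic movie matrix (pixels x frames) into per-neuron spatial
# footprints and temporal traces with off-the-shelf non-negative matrix
# factorization. Light-field optics and tissue scattering are ignored entirely.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(1)
n_pixels, n_frames, n_neurons = 400, 300, 5

footprints = rng.random((n_pixels, n_neurons)) ** 4                  # sparse-ish spatial components
traces = np.clip(rng.normal(size=(n_neurons, n_frames)), 0, None)    # non-negative activity traces
movie = footprints @ traces + 0.01 * rng.random((n_pixels, n_frames))

model = NMF(n_components=n_neurons, init="nndsvda", max_iter=500)
est_footprints = model.fit_transform(movie)     # where each recovered source lives (pixels x neurons)
est_traces = model.components_                  # when each recovered source is active (neurons x frames)
print(est_footprints.shape, est_traces.shape)   # (400, 5) (5, 300)
```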


Three-dimensional view of stained hippocampus with Stanford University’s CLARITY system, showing fluorescent-expressing neurons (green), connecting interneurons (red) and supporting glia (blue). (Credit: Deisseroth lab)

Researchers were previously only able to look into brains of transparent organisms, such as the larvae of zebrafish. Stanford University scientists were able to image mouse brains in 3D (with the CLARITY system), but only for static images.

* “SID can capture neuronal dynamics in vivo within a volume of 900 × 900 × 260 μm located as deep as 380 μm in the mouse cortex or hippocampus at a 30-Hz volume rate while discriminating signals from neurons as close as 20 μm apart, at a computational cost three orders of magnitude less than that of frame-by-frame image reconstruction.” – Tobias Nöbauer et al./Nature Methods

UPDATE June 29, 2017 — Added: “The research ‘may open the door to a range of applications, including real-time whole-brain recording and closed-loop interrogation of neuronal population activity in combination with optogenetics and behavior,’ the paper authors suggest.”


Abstract of Video rate volumetric Ca2+ imaging across cortex using seeded iterative demixing (SID) microscopy

Light-field microscopy (LFM) is a scalable approach for volumetric Ca2+ imaging with high volumetric acquisition rates (up to 100 Hz). Although the technology has enabled whole-brain Ca2+ imaging in semi-transparent specimens, tissue scattering has limited its application in the rodent brain. We introduce seeded iterative demixing (SID), a computational source-extraction technique that extends LFM to the mammalian cortex. SID can capture neuronal dynamics in vivo within a volume of 900 × 900 × 260 μm located as deep as 380 μm in the mouse cortex or hippocampus at a 30-Hz volume rate while discriminating signals from neurons as close as 20 μm apart, at a computational cost three orders of magnitude less than that of frame-by-frame image reconstruction. We expect that the simplicity and scalability of LFM, coupled with the performance of SID, will open up a range of applications including closed-loop experiments.

Smart algorithm automatically adjusts exoskeletons for best walking performance

Walk this way: Metabolic feedback and optimization algorithm automatically tweaks exoskeleton for optimal performance. (credit: Kirby Witte, Katie Poggensee, Pieter Fiers, Patrick Franks & Steve Collins)

Researchers at the College of Engineering at Carnegie Mellon University (CMU) have developed a new automated feedback system for personalizing exoskeletons to achieve optimal performance.

Exoskeletons can be used to augment human abilities. For example, they can provide more endurance while walking, help lift a heavy load, improve athletic performance, and help a stroke patient walk again.

But current one-size-fits-all exoskeleton devices, despite their potential, “have not improved walking performance as much as we think they should,” said Steven Collins, a professor of Mechanical Engineering and senior author of a paper published Friday June 23, 2017 in Science.

The problem: An exoskeleton needs to be adjusted (and re-adjusted) to work effectively for each user — currently, a time-consuming, iffy manual process.

So the CMU engineers developed a more effective “human-in-the-loop optimization” technique that measures the amount of energy the walker expends by monitoring their breathing* — automatically adjusting the exoskeleton’s ankle dynamics to minimize required human energy expenditure.**

Using real-time metabolic cost estimation for each individual, the CMU software algorithm, combined with versatile emulator hardware, optimized the exoskeleton torque pattern for one ankle while walking, running, and carrying a load on a treadmill. The algorithm automatically made optimized adjustments for each pattern, based on measurements of a person’s energy use for 32 different walking patterns over the course of an hour. (credit: Juanjuan Zhang et al./Science, adapted by KurzweilAI)

In a lab study with 11 healthy volunteers, the new technique resulted in an average reduction in effort of 24% compared to participants walking with the exoskeleton powered off. The technique yielded higher user benefits than in any exoskeleton study to date, including devices acting at all joints on both legs, according to the researchers.

* “In daily life, a proxy measure such as heart rate or muscle activity could be used for optimization, providing noisier but more abundant performance data.” — Juanjuan Zhang et al./Science

** Ankle torque in the lab study was determined by four parameters: peak torque, timing of peak torque, and rise and fall times. This method was chosen to allow comparisons to a prior study that used the same hardware.
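
The sketch below illustrates the general human-in-the-loop idea over the four torque parameters named above: propose candidate settings, “measure” a cost for each, and keep the best. The metabolic-cost function is a synthetic stand-in for the respiratory measurement, the parameter ranges are invented, and the simple keep-the-best loop is not the optimizer used in the study.

```python
# Sketch of human-in-the-loop optimization over the four torque parameters
# named above (peak torque, timing of peak torque, rise time, fall time).
# The metabolic_cost() function is a synthetic stand-in for the respiratory
# measurement, and this simple keep-the-best evolutionary loop is illustrative,
# not the optimizer used in the study. Parameter ranges are made up.
import numpy as np

rng = np.random.default_rng(0)
# Parameters: peak torque magnitude, plus peak timing / rise time / fall time
# expressed as fractions of the stride (illustrative ranges).
lower = np.array([0.2, 0.40, 0.10, 0.05])
upper = np.array([0.8, 0.60, 0.35, 0.20])

def metabolic_cost(params):
    """Stand-in for a short treadmill bout with respirometry (W/kg).
    In the real study this value is measured on the person, not computed."""
    hidden_optimum = np.array([0.55, 0.52, 0.25, 0.10])      # unknown in reality
    return 3.0 + 8.0 * np.sum((params - hidden_optimum) ** 2) + 0.05 * rng.normal()

best = (lower + upper) / 2
best_cost = metabolic_cost(best)
for generation in range(8):                       # 8 generations x 4 candidates = 32 bouts
    candidates = np.clip(best + 0.08 * rng.normal(size=(4, 4)), lower, upper)
    costs = [metabolic_cost(c) for c in candidates]
    if min(costs) < best_cost:
        best, best_cost = candidates[int(np.argmin(costs))], min(costs)

print("optimized torque parameters:", np.round(best, 3), "estimated cost:", round(best_cost, 2))
```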


Science/AAAS | Personalized Exoskeletons Are Taking Support One Step Farther


Abstract of Human-in-the-loop optimization of exoskeleton assistance during walking

Exoskeletons and active prostheses promise to enhance human mobility, but few have succeeded. Optimizing device characteristics on the basis of measured human performance could lead to improved designs. We have developed a method for identifying the exoskeleton assistance that minimizes human energy cost during walking. Optimized torque patterns from an exoskeleton worn on one ankle reduced metabolic energy consumption by 24.2 ± 7.4% compared to no torque. The approach was effective with exoskeletons worn on one or both ankles, during a variety of walking conditions, during running, and when optimizing muscle activity. Finding a good generic assistance pattern, customizing it to individual needs, and helping users learn to take advantage of the device all contributed to improved economy. Optimization methods with these features can substantially improve performance.

Tactile sensor lets robots gauge objects’ hardness and manipulate small tools

A GelSight sensor attached to a robot’s gripper enables the robot to determine precisely where it has grasped a small screwdriver, removing it from and inserting it back into a slot, even when the gripper screens the screwdriver from the robot’s camera. (credit: Robot Locomotion Group at MIT)

Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have added sensors to grippers on robot arms to give robots greater sensitivity and dexterity. The sensor can judge the hardness of surfaces it touches, enabling a robot to manipulate smaller objects than was previously possible.

The “GelSight” sensor consists of a block of transparent soft rubber — the “gel” of its name — with one face coated with metallic paint. It is mounted on one side of a robotic gripper. When the paint-coated face is pressed against an object, the face conforms to the object’s shape and the metallic paint makes the object’s surface reflective. Mounted on the sensor opposite the paint-coated face of the rubber block are three colored lights at different angles and a single camera.

Humans gauge hardness by the degree to which the contact area between the object and our fingers changes as we press on it. Softer objects tend to flatten more, increasing the contact area. The MIT researchers used the same approach.

A GelSight sensor, pressed against each object manually, recorded how the contact pattern changed over time, essentially producing a short movie for each object. A neural network was then used to look for correlations between changes in contact patterns and hardness measurements. The resulting system takes frames of video as inputs and produces hardness scores with very high accuracy.
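
A minimal sketch of the “frames of a press in, hardness score out” idea follows; the network layout, input resolution, and time-pooling scheme are assumptions for illustration, not the MIT model.

```python
# Minimal sketch of the "frames of a press in, hardness score out" idea,
# not the MIT model: a small CNN encodes each GelSight frame and the per-frame
# features of one press are averaged into a single regressed hardness score.
# Layer sizes, input resolution, and pooling are assumptions for illustration.
import torch
import torch.nn as nn

class HardnessNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),   # one 32-d vector per frame
        )
        self.head = nn.Linear(32, 1)                 # regress a hardness score

    def forward(self, press_clip):                   # (frames, 3, H, W): one press sequence
        feats = self.encoder(press_clip)             # (frames, 32)
        return self.head(feats.mean(dim=0))          # pool over time -> scalar score

model = HardnessNet()
clip = torch.rand(15, 3, 128, 128)                   # 15 frames of one simulated press
print("predicted hardness score:", model(clip).item())
```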

The researchers also designed control algorithms that use a computer vision system to guide the robot’s gripper toward a tool and then turn location estimation over to a GelSight sensor once the robot has the tool in hand.

“I think that the GelSight technology, as well as other high-bandwidth tactile sensors, will make a big impact in robotics,” says Sergey Levine, an assistant professor of electrical engineering and computer science at the University of California at Berkeley. “For humans, our sense of touch is one of the key enabling factors for our amazing manual dexterity. Current robots lack this type of dexterity and are limited in their ability to react to surface features when manipulating objects. If you imagine fumbling for a light switch in the dark, extracting an object from your pocket, or any of the other numerous things that you can do without even thinking — these all rely on touch sensing.”

The researchers presented their work in two papers at the International Conference on Robotics and Automation.


Wenzhen Yuan | Measuring hardness of fruits with GelSight sensor


Abstract of Tracking Objects with Point Clouds from Vision and Touch

We present an object-tracking framework that fuses point cloud information from an RGB-D camera with tactile information from a GelSight contact sensor. GelSight can be treated as a source of dense local geometric information, which we incorporate directly into a conventional point-cloud-based articulated object tracker based on signed-distance functions. Our implementation runs at 12 Hz using an online depth reconstruction algorithm for GelSight and a modified second-order update for the tracking algorithm. We present data from hardware experiments demonstrating that the addition of contact-based geometric information significantly improves the pose accuracy during contact, and provides robustness to occlusions of small objects by the robot’s end effector.

Two drones see through walls in 3D using WiFi signals

Transmit and receive drones perform 3D imaging through walls using WiFi (credit: Chitra R. Karanam and Yasamin Mostofi/ACM)

Researchers at the University of California Santa Barbara have demonstrated the first three-dimensional imaging of objects through walls using an ordinary wireless signal.

Applications could include emergency search-and-rescue, archaeological discovery, and structural monitoring, according to the researchers. Other applications could include military and law-enforcement surveillance.

Calculating 3D images from WiFi signals

In the research, two octo-copters (drones) took off and flew outside an enclosed, four-sided brick structure whose interior was unknown to the drones. One drone continuously transmitted a WiFi signal; the other drone (located on a different side of the structure) received that signal and transmitted the changes in received signal strength (“RSSI”) during the flight to a computer, which then calculated 3D high-resolution images of the objects inside (which do not need to move).

Structure and resulting 3D image (credit: Chitra R. Karanam and Yasamin Mostofi/ACM)

Interestingly, the equipment is all commercially available: two drones with “yagi” antennas, a WiFi router, a Tango tablet (for real-time localization), and a Raspberry Pi computer with a network interface to record measurements.

This development builds on previous 2D work by professor Yasamin Mostofi’s lab, which has pioneered sensing and imaging with everyday radio frequency signals such as WiFi. Mostofi says the success of the 3D experiments is due to the drones’ ability to approach the area from several angles, and to new methodology* developed by her lab.

The research is described in an open-access paper published April 2017 in proceedings of the Association for Computing Machinery/Institute of Electrical and Electronics Engineers International Conference on Information Processing in Sensor Networks (IPSN).

A later paper by Technical University of Munich physicists also reported a system intended for 3D imaging with WiFi, but with only simulated (and cruder) images. (An earlier 2009 paper by Mostofi et al. also reported simulated results for 3D see-through imaging of structures.)

Block diagram of the 3D through-wall imaging system (credit: Chitra R. Karanam and Yasamin Mostofi/ACM)

* The researchers’ approach to enabling 3D through-wall imaging utilizes four tightly integrated key components, according to the paper.

(1) They proposed robotic paths that can capture the spatial variations in all three dimensions as much as possible, while maintaining the efficiency of the operation. 

(2) They modeled the three-dimensional unknown area of interest as a Markov Random Field to capture the spatial dependencies, and utilized a graph-based belief propagation approach to update the imaging decision of each voxel (the smallest unit of a 3D image) based on the decisions of the neighboring voxels. 

(3) To approximate the interaction of the transmitted wave with the area of interest, they used a linear wave model.

(4) They took advantage of the compressibility of the information content to image the area with a very small number of WiFi measurements (less than 4 percent); a simplified sketch of this sparse-recovery step appears below.
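
The sketch below illustrates only that sparse-recovery component: each wireless measurement is modeled as a linear function of the unknown voxel occupancies, and a sparse occupancy vector is recovered from far fewer measurements than voxels. The random sensing matrix and LASSO solver are stand-ins; the Markov random field and belief-propagation stages are not reproduced.

```python
# Simplified sketch of the sparse-recovery component only (the Markov random
# field and belief-propagation stages are not reproduced): treat each wireless
# measurement as a linear function of the unknown voxel occupancies, then
# recover a sparse occupancy vector from far fewer measurements than voxels.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_voxels, n_measurements = 1000, 40            # about 4% as many measurements as unknowns
occupancy = np.zeros(n_voxels)
occupancy[rng.choice(n_voxels, size=5, replace=False)] = 1.0   # a few occupied voxels

# A random matrix stands in for the linear wave model that relates voxels to
# the received signal strength along each transmitter -> receiver path.
A = rng.normal(size=(n_measurements, n_voxels))
rssi = A @ occupancy + 0.05 * rng.normal(size=n_measurements)

estimate = Lasso(alpha=0.01, positive=True, max_iter=10000).fit(A, rssi).coef_
print("true occupied voxels:     ", np.sort(np.flatnonzero(occupancy)))
print("recovered occupied voxels:", np.flatnonzero(estimate > 0.5))
```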


Mostofi Lab | X-ray Eyes in the Sky: Drones and WiFi for 3D Through-Wall Imaging


Abstract of 3D Through-Wall Imaging with Unmanned Aerial Vehicles Using WiFi

In this paper, we are interested in the 3D through-wall imaging of a completely unknown area, using WiFi RSSI and Unmanned Aerial Vehicles (UAVs) that move outside of the area of interest to collect WiFi measurements. It is challenging to estimate a volume represented by an extremely high number of voxels with a small number of measurements. Yet many applications are time-critical and/or limited on resources, precluding extensive measurement collection. In this paper, we then propose an approach based on Markov random field modeling, loopy belief propagation, and sparse signal processing for 3D imaging based on wireless power measurements. Furthermore, we show how to design efficient aerial routes that are informative for 3D imaging. Finally, we design and implement a complete experimental testbed and show high-quality 3D robotic through-wall imaging of unknown areas with less than 4% of measurements.