Human vs. deep-neural-network performance in object recognition

(credit: UC Santa Barbara)

Before you read this: look for toothbrushes in the photo above.

Did you notice the huge toothbrush on the left? Probably not. That’s because when humans search through scenes for a particular object, we often miss objects whose size is inconsistent with the rest of the scene, according to scientists in the Department of Psychological & Brain Sciences at UC Santa Barbara.

The scientists are investigating this phenomenon in an effort to better understand how humans and computers compare in doing visual searches. Their findings are published in the journal Current Biology.

Hiding in plain sight

“When something appears at the wrong scale, you will miss it more often because your brain automatically ignores it,” said UCSB professor Miguel Eckstein, who specializes in computational human vision, visual attention, and search.

The experiment used computer-generated scenes containing ordinary objects that varied in color, viewing angle, and size, mixed with “target-absent” scenes (containing no target). The researchers asked 60 viewers to search for these objects (e.g., toothbrush, parking meter, computer mouse) while eye-tracking software monitored the paths of their gaze.

The researchers found that people tended to miss the target more often when it was mis-scaled (too large or too small) — even when looking directly at the target object.

Computer vision, by contrast, doesn’t have this issue, the scientists reported. However, in the experiments, the researchers found that the most advanced form of computer vision — deep neural networks — had its own limitations.

Human search strategies that could improve computer vision

Red rectangle marks incorrect image identification as a cell phone by a deep-learning algorithm (credit: UC Santa Barbara)

For example, a deep-learning convolutional neural network (CNN) incorrectly identified a computer keyboard as a cell phone, based on its shape and its proximity to a human hand (as would be expected of a cell phone). For humans, however, the object’s size relative to the nearby hands clearly rules it out as a cell phone.

“This strategy allows humans to reduce false positives when making fast decisions,” the researchers note in the paper.

“The idea is when you first see a scene, your brain rapidly processes it within a few hundred milliseconds or less, and then you use that information to guide your search towards likely locations where the object typically appears,” Eckstein said. “Also, you focus your attention on objects that are actually at the size that is consistent with the object that you’re looking for.”

That is, human brains use the relationships between objects and their context within the scene to guide their eyes — a useful strategy to process scenes rapidly, eliminate distractors, and reduce false positives.

This finding might suggest ways to improve computer vision by implementing some of the tricks the brain utilizes to reduce false positives, according to the researchers.
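As a rough illustration of that idea (a sketch, not the researchers’ implementation), a detector’s candidate outputs could be post-filtered by checking whether each detection’s apparent size is plausible relative to a reference object in the scene, such as a detected hand. The size ratios, tolerance, and box format below are illustrative assumptions.

```python
# Illustrative sketch: reject detections whose size is implausible
# relative to a reference object (here, a detected hand).
# Typical size ratios and the tolerance factor are rough assumptions.

TYPICAL_RATIO_TO_HAND = {  # object height / hand height, very rough
    "cell phone": 0.8,
    "toothbrush": 1.0,
    "keyboard": 2.5,
}

def box_height(box):
    # box = (x_min, y_min, x_max, y_max) in pixels
    return box[3] - box[1]

def plausible(label, box, hand_box, tolerance=2.0):
    """Return True if the detection's height is within a factor of
    `tolerance` of the size expected for `label` next to a hand."""
    expected = TYPICAL_RATIO_TO_HAND.get(label)
    if expected is None or hand_box is None:
        return True  # no size prior available; keep the detection
    ratio = box_height(box) / box_height(hand_box)
    return expected / tolerance <= ratio <= expected * tolerance

# Example: a keyboard-sized object labeled "cell phone" next to a hand
hand = (400, 300, 460, 420)                        # ~120 px tall
detection = ("cell phone", (100, 100, 600, 430))   # ~330 px tall
print(plausible(*detection, hand_box=hand))        # False: too big for a phone
```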

Future research

“There are some theories that suggest that people with autism spectrum disorder focus more on local scene information and less on global structure,” said Eckstein, who is contemplating a follow-up study. “So there is a possibility that people with autism spectrum disorder might miss the mis-scaled objects less often, but we won’t know that until we do the study.”

In the more immediate future, the team’s research will look into the brain activity that occurs when we view mis-scaled objects.

“Many studies have identified brain regions that process scenes and objects, and now researchers are trying to understand which particular properties of scenes and objects are represented in these regions,” said postdoctoral researcher Lauren Welbourne, whose current research concentrates on how objects are represented in the cortex, and how scene context influences the perception of objects.

“So what we’re trying to do is find out how these brain areas respond to objects that are either correctly or incorrectly scaled within a scene. This may help us determine which regions are responsible for making it more difficult for us to find objects if they are mis-scaled.”


Abstract of Humans, but Not Deep Neural Networks, Often Miss Giant Targets in Scenes

Even with great advances in machine vision, animals are still unmatched in their ability to visually search complex scenes. Animals from bees [1, 2] to birds [3] to humans [4–12] learn about the statistical relations in visual environments to guide and aid their search for targets. Here, we investigate a novel manner in which humans utilize rapidly acquired information about scenes by guiding search toward likely target sizes. We show that humans often miss targets when their size is inconsistent with the rest of the scene, even when the targets were made larger and more salient and observers fixated the target. In contrast, we show that state-of-the-art deep neural networks do not exhibit such deficits in finding mis-scaled targets but, unlike humans, can be fooled by target-shaped distractors that are inconsistent with the expected target’s size within the scene. Thus, it is not a human deficiency to miss targets when they are inconsistent in size with the scene; instead, it is a byproduct of a useful strategy that the brain has implemented to rapidly discount potential distractors.

Artificial ‘skin’ gives robotic hand a sense of touch

University of Houston researchers have reported a development in stretchable electronics that can serve as artificial skin for a robotic hand and biomedical devices (credit: University of Houston)

A team of researchers from the University of Houston has reported a development in stretchable electronics that can serve as an artificial skin, allowing a robotic hand to sense the difference between hot and cold, and also offering advantages for a wide range of biomedical devices.

The work, reported in the open-access journal Science Advances, describes a new mechanism for producing stretchable electronics, a process that relies upon readily available materials and could be scaled up for commercial production.

Cunjiang Yu, Bill D. Cook Assistant Professor of mechanical engineering and lead author of the paper, said the work is the first to create a semiconductor in a rubber composite format, designed to allow the electronic components to retain functionality even after the material is stretched by 50 percent.

He noted that traditional semiconductors are brittle and using them in otherwise stretchable materials has required a complicated system of mechanical accommodations. That’s both more complex and less stable than the new discovery, as well as more expensive, he said. “Our strategy has advantages for simple fabrication, scalable manufacturing, high-density integration, large strain tolerance, and low cost,” he said.

Photograph of a robotic hand with intrinsically stretchable rubbery sensors (credit: Hae-Jin Kim et al./Science Advances)

The team used the skin to demonstrate that a robotic hand could sense the temperature of hot and iced water in a cup. The skin also was able to interpret computer signals sent to the hand and reproduce the signals as American Sign Language.
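As a toy illustration of how the skin’s temperature readings might be turned into a hot-versus-iced decision (a sketch only; the linear calibration constants and thresholds below are hypothetical, not values from the paper):

```python
# Hypothetical sketch: classify a cup as hot or iced from a rubbery
# temperature sensor whose resistance varies roughly linearly with
# temperature. Calibration constants are made up for illustration.

R0_OHMS = 1000.0      # assumed resistance at 0 °C
OHMS_PER_DEG_C = 4.0  # assumed sensitivity

def resistance_to_celsius(resistance_ohms):
    return (resistance_ohms - R0_OHMS) / OHMS_PER_DEG_C

def classify(resistance_ohms, hot_above_c=40.0, cold_below_c=15.0):
    t = resistance_to_celsius(resistance_ohms)
    if t >= hot_above_c:
        return "hot"
    if t <= cold_below_c:
        return "iced"
    return "ambient"

print(classify(1300.0))  # ~75 °C -> "hot"
print(classify(1020.0))  # ~5 °C  -> "iced"
```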

Uses of the stretchable skin include soft wearable electronics such as health monitors, medical implants, and human-machine interfaces.

The stretchable composite semiconductor was prepared by combining a silicon-based polymer known as polydimethylsiloxane (PDMS) with tiny nanowires in a solution, which was then hardened into a material in which the nanowires transport the electric current.


Abstract of Rubbery electronics and sensors from intrinsically stretchable elastomeric composites of semiconductors and conductors

A general strategy to impart mechanical stretchability to stretchable electronics involves engineering materials into special architectures to accommodate or eliminate the mechanical strain in nonstretchable electronic materials while stretched. We introduce an all solution–processed type of electronics and sensors that are rubbery and intrinsically stretchable as an outcome from all the elastomeric materials in percolated composite formats with P3HT-NFs [poly(3-hexylthiophene-2,5-diyl) nanofibrils] and AuNP-AgNW (Au nanoparticles with conformally coated silver nanowires) in PDMS (polydimethylsiloxane). The fabricated thin-film transistors retain their electrical performances by more than 55% upon 50% stretching and exhibit one of the highest P3HT-based field-effect mobilities of 1.4 cm2/V∙s, owing to crystallinity improvement. Rubbery sensors, which include strain, pressure, and temperature sensors, show reliable sensing capabilities and are exploited as smart skins that enable gesture translation for sign language alphabet and haptic sensing for robotics to illustrate one of the applications of the sensors.

A battery-free origami robot powered and controlled by external magnetic fields

Wirelessly powered and controlled magnetic folding robot arm can grasp and bend (credit: Wyss Institute at Harvard University)

Harvard University researchers have created a battery-free, folding robot “arm” with multiple “joints,” gripper “hand,” and actuator “muscles” — all powered and controlled wirelessly by an external resonant magnetic field.

The design is inspired by the traditional Japanese art of origami (used to transform a simple sheet of paper into complex, three-dimensional shapes through a specific pattern of folds, creases, and crimps). The prototype device is capable of complex, repeatable movements at millimeter to centimeter scales.

The research, by scientists at the Wyss Institute for Biologically Inspired Engineering and the John A. Paulson School of Engineering and Applied Sciences (SEAS), is reported in Science Robotics.

How it works

Design of small-scale-structure prototype of wirelessly controlled robotic arm (credit: Mustafa Boyvat et al./Science Robotics)

The researchers designed a 0.8-gram small-scale-structure* prototype robotic “arm” capable of bending and opening or closing a gripper around an object. The “arm” is constructed with a special origami-like pattern that uses hinges (“joints”) to permit it to bend. There is also a “hand” (gripper — left panel in above image) that opens or closes.

To power the device, an external coil with its own power source (see video below) is used to generate a low-frequency magnetic field that induces an electrical current in three magnetic coils. The current heats the spiral-wire shape-memory-alloy actuator wires (coiled wire shown in inset above). That causes the actuator wires (“muscles”) to contract, making the attached nearby “joints” bend, and folding the robot body.

Mechanism of the origami gripper (small-scale prototype design). (Left) The coil SMA actuator pushes the center link connected to both fingers, and the gripper opens its fingers via dynamic folding at the joints. (Center) The plate spring, a passive compression spring, pulls the link back as the gripper closes its fingers, again by rotation at the folding joints. (Right) A photo of the gripper showing the SMA actuator wire attached at the center link. (credit: Mustafa Boyvat et al./Science Robotics)

By changing the resonant frequency of the external electromagnetic field, the two longer actuator wires (coiled wires shown in above illustration) are instead heated and stretched, opening the gripper (“hand”).

In both cases, when the external field-induced current stops, the actuators relax, springing back to their “memory” positions and causing the robot body to straighten out or the gripper’s outer triangles to close.
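A minimal sketch of the addressing idea: each on-board receiver circuit is tuned to a different resonant frequency, f = 1/(2π√(LC)), and the controller actuates a particular joint by transmitting at that joint’s frequency. The component values and the transmit stub below are assumptions for illustration, not the paper’s actual circuit parameters.

```python
import math

# Hypothetical LC values for each on-board receiver circuit.
# Real values would come from the fabricated coils and capacitors.
JOINT_CIRCUITS = {
    "bend_joint": {"L_henry": 10e-6, "C_farad": 100e-9},
    "gripper":    {"L_henry": 10e-6, "C_farad": 47e-9},
}

def resonant_frequency_hz(L_henry, C_farad):
    """f = 1 / (2 * pi * sqrt(L * C)) for an LC receiver circuit."""
    return 1.0 / (2.0 * math.pi * math.sqrt(L_henry * C_farad))

def actuate(joint, duration_s):
    """Drive the external coil at the joint's resonant frequency so
    only that joint's SMA actuator heats and contracts (stub)."""
    f = resonant_frequency_hz(**JOINT_CIRCUITS[joint])
    print(f"transmit {f / 1e3:.0f} kHz for {duration_s} s -> {joint}")

actuate("bend_joint", 2.0)  # fold the arm
actuate("gripper", 1.5)     # open the gripper
```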

Minimally invasive medicine and surgery applications

As an example of a practical future application, instead of having an uncomfortable endoscope put down their throat to assist a doctor with surgery, a patient could just swallow a micro-robot that could move around and perform simple tasks, like holding tissue or filming, powered by a coil outside their body.

Using a much larger source coil — on the order of yards in diameter — could enable wireless, battery-free communication between multiple “smart” objects in a room or building.

“Medical devices today are commonly limited by the size of the batteries that power them, whereas these remotely powered origami robots can break through that size barrier and potentially offer entirely new, minimally invasive approaches for medicine and surgery in the future,” says Wyss Founding Director Donald Ingber, who is also the Judah Folkman Professor of Vascular Biology at Harvard Medical School and the Vascular Biology Program at Boston Children’s Hospital, as well as a Professor of Bioengineering at Harvard’s School of Engineering and Applied Sciences.

This work was supported by the National Science Foundation, the U.S. Army Research Laboratory, and the Swiss National Science Foundation.

* A large-scale-structure prototype version has minor differences, including 12-cm folding lines vs. 1.7-cm folding lines in the smaller version.

Wyss Institute | Battery-Free Folding Robots


Abstract of Addressable wireless actuation for multijoint folding robots and devices

“Printing” robots and other complex devices through a process of origami-like folding is an emerging and promising manufacturing method due to the inherent simplicity and low cost of folding-based assembly. Folding is used in this class of device to create both complex static structures and flexure-based compliant mechanisms. Dependency on batteries to power these folds with no external wires is a hurdle to giving small-scale folding robots and devices functionality. We demonstrate a battery-free wireless folding method for dynamic multijoint structures, achieving addressable folding motions—both individual and collective folding—using only basic passive electronic components on the device. The method is based on electromagnetic power transmission and resonance selectivity for actuation of resistive shape memory alloy actuators without the need for physical connection or line of sight. We demonstrate the utility of this approach using two folded devices at different sizes using different circuit approaches.

Scientists remove one of the final barriers to making lifelike robots

(L) The electrically actuated muscle with thin resistive wire in a rest position; (R) The muscle is expanded using only a low voltage (8 V). (credit: Aslan Miriyev/Columbia Engineering)

Researchers at the Columbia Engineering Creative Machines lab have developed a 3D-printable, synthetic soft muscle that can mimic natural biological systems, lifting 1000 times its own weight. The artificial muscle is three times stronger than natural muscle and can push, pull, bend, twist, and lift weight — no external devices required.

Existing soft-actuator technologies are typically based on bulky pneumatic or hydraulic inflation of elastomer skins that expand when air or liquid is supplied to them, which require external compressors and pressure-regulating equipment.

“We’ve been making great strides toward making robot minds, but robot bodies are still primitive,” said Hod Lipson, PhD, a professor of mechanical engineering. “This is a big piece of the puzzle and, like biology, the new actuator can be shaped and reshaped a thousand ways. We’ve overcome one of the final barriers to making lifelike robots.”

The research findings are described in an open-access study published Tuesday Sept. 19, 2017 by Nature Communications.

Replicating natural motion

Inspired by living organisms, soft-material robotics hold promise for areas where robots need to contact and interact with humans, such as manufacturing and healthcare. Unlike rigid robots, soft robots can replicate natural motion — grasping and manipulation — to provide medical and other types of assistance, perform delicate tasks, or pick up soft objects.

Structure and principle of operation of the soft composite material (stereoscope image scale bar is 1 mm). Upon heating the composite to a temperature of 78.4 °C, ethanol boils and the local pressure inside the micro-bubbles grows, forcing the elastic silicone elastomer matrix to comply by expansion in order to reduce the pressure. (credit: Aslan Miriyev et al./Nature Communications)

To achieve an actuator with high stress and high strain coupled with low density, the researchers used a silicone rubber matrix with ethanol (alcohol) distributed throughout in micro-bubbles. This design combines the elastic properties and extreme volume change attributes of other material systems while also being easy to fabricate, low cost, and made of environmentally safe materials.*

The researchers next plan to use conductive (heatable) materials to replace the embedded wire, accelerate the muscle’s response time, and increase its shelf life. Long-term, they plan to involve artificial intelligence to learn to control the muscle — perhaps a final milestone towards replicating natural human motion.

* After being 3D-printed into the desired shape, the artificial muscle was electrically actuated using a thin resistive wire and a low-power (8 V) supply. It was tested in a variety of robotic applications, where it showed significant expansion-contraction ability and was capable of expansion up to 900% when electrically heated to 80°C. The new material has a strain density (the amount of deformation in the direction of an applied force without damage) that is 15 times larger than natural muscle.
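A back-of-the-envelope check of why heating to roughly 78–80°C drives the expansion: ethanol’s vapor pressure reaches about one atmosphere at its 78.4°C boiling point, so the pressure inside the micro-bubbles climbs steeply as that temperature is approached. The sketch below uses commonly tabulated Antoine-equation constants for ethanol and is an illustration, not the authors’ model.

```python
import math

def ethanol_vapor_pressure_mmhg(temp_c):
    """Antoine equation, log10(P) = A - B / (C + T), with commonly
    tabulated constants for ethanol (P in mmHg, T in deg C)."""
    A, B, C = 8.20417, 1642.89, 230.300
    return 10 ** (A - B / (C + temp_c))

for t in (25, 50, 70, 78.4, 80):
    p_atm = ethanol_vapor_pressure_mmhg(t) / 760.0
    print(f"{t:5.1f} C: ~{p_atm:.2f} atm")
# Output climbs from ~0.08 atm at 25 C to ~1.0 atm at 78.4 C,
# consistent with ethanol boiling inside the silicone matrix.
```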


Columbia Engineering | Soft Materials for Soft Actuators

Roboticists show off their new advances in “soft robots” (credit: Reuters TV)


Abstract of Soft material for soft actuators

Inspired by natural muscle, a key challenge in soft robotics is to develop self-contained electrically driven soft actuators with high strain density. Various characteristics of existing technologies, such as the high voltages required to trigger electroactive polymers (>1 kV), low strain (<10%) of shape memory alloys and the need for external compressors and pressure-regulating components for hydraulic or pneumatic fluidic elastomer actuators, limit their practicality for untethered applications. Here we show a single self-contained soft robust composite material that combines the elastic properties of a polymeric matrix and the extreme volume change accompanying liquid–vapor transition. The material combines a high strain (up to 900%) and correspondingly high stress (up to 1.3 MPa) with low density (0.84 g cm−3). Along with its extremely low cost (about 3 cent per gram), simplicity of fabrication and environment-friendliness, these properties could enable new kinds of electrically driven entirely soft robots.

Leading AI country will be ‘ruler of the world,’ says Putin

DoD autonomous drone swarms concept (credit: U.S. Dept. of Defense)

Russian President Vladimir Putin warned Friday (Sept. 1, 2017) that the country that becomes the leader in developing artificial intelligence will be “the ruler of the world,” reports the Associated Press.

AI development “raises colossal opportunities and threats that are difficult to predict now,” Putin said in a lecture to students, warning that “it would be strongly undesirable if someone wins a monopolist position.”

Future wars will be fought by autonomous drones, Putin suggested, and “when one party’s drones are destroyed by drones of another, it will have no other choice but to surrender.”

U.N. urged to address lethal autonomous weapons

AI experts worldwide are also concerned. On August 20, 116 founders of robotics and artificial intelligence companies from 26 countries, including Elon Musk* and Google DeepMind’s Mustafa Suleyman, signed an open letter asking the United Nations to “urgently address the challenge of lethal autonomous weapons (often called ‘killer robots’) and ban their use internationally.”

“Lethal autonomous weapons threaten to become the third revolution in warfare,” the letter states. “Once developed, they will permit armed conflict to be fought at a scale greater than ever, and at timescales faster than humans can comprehend. These can be weapons of terror, weapons that despots and terrorists use against innocent populations, and weapons hacked to behave in undesirable ways. We do not have long to act. Once this Pandora’s box is opened, it will be hard to close.”

Unfortunately, the box may have already been opened. Three examples:

Russia. In 2014, Dmitry Andreyev of the Russian Strategic Missile Forces announced that mobile robots would be standing guard over five ballistic missile installations, New Scientist reported. Armed with a heavy machine gun, this “mobile robotic complex … can detect and destroy targets, without human involvement.”

Uran-9 unmanned combat ground vehicle (credit: Vitaly V. Kuzmin/CC)

In 2016, Russian military equipment manufacturer JSC 766 UPTK announced what appears to be the commercial version: the Uran-9 multipurpose unmanned ground combat vehicle. “In autonomous mode, the vehicle can automatically identify, detect, track and defend [against] enemy targets based on the pre-programmed path set by the operator,” the company said.

United States. In a 2016 report, the U.S. Department of Defense advocated self-organizing “autonomous unmanned” (UA) swarms of small drones that would assist frontline troops in real time by surveillance, jamming/spoofing enemy electronics, and autonomously firing against the enemy.

The authors warned that “autonomy — fueled by advances in artificial intelligence — has attained a ‘tipping point’ in value. Autonomous capabilities are increasingly ubiquitous and are readily available to allies and adversaries alike.” The report advised that the Department of Defense “must take immediate action to accelerate its exploitation of autonomy while also preparing to counter autonomy employed by adversaries.”**

South Korea. Designed initially for the DMZ, Super aEgis II, a robot-sentry machine gun designed by Dodaam Systems, can identify, track, and automatically destroy a human target 3 kilometers away, assuming that capability is turned on.

* “China, Russia, soon all countries w strong computer science. Competition for AI superiority at national level most likely cause of WW3 imo.” — Elon Musk tweet 2:33 AM – 4 Sep 2017

** While it doesn’t use AI, the U.S. Navy’s computer-controlled, radar-guided Phalanx gun system can automatically detect, track, evaluate, and fire at incoming missiles and aircraft that it judges to be a threat.

UPDATE Sept. 5, 2017: Added Musk tweet in footnote

Will AI enable the third stage of life?

In his new book Life 3.0: Being Human in the Age of Artificial Intelligence, MIT physicist and AI researcher Max Tegmark explores the future of technology, life, and intelligence.

The question of how to define life is notoriously controversial. Competing definitions abound, some of which include highly specific requirements such as being composed of cells, which might disqualify both future intelligent machines and extraterrestrial civilizations. Since we don’t want to limit our thinking about the future of life to the species we’ve encountered so far, let’s instead define life very broadly, simply as a process that can retain its complexity and replicate.

What’s replicated isn’t matter (made of atoms) but information (made of bits) specifying how the atoms are arranged. When a bacterium makes a copy of its DNA, no new atoms are created, but a new set of atoms are arranged in the same pattern as the original, thereby copying the information.

In other words, we can think of life as a self-replicating information-processing system whose information (software) determines both its behavior and the blueprints for its hardware.

Like our Universe itself, life gradually grew more complex and interesting, and as I’ll now explain, I find it helpful to classify life forms into three levels of sophistication: Life 1.0, 2.0 and 3.0.

It’s still an open question how, when and where life first appeared in our Universe, but there is strong evidence that here on Earth life first appeared about 4 billion years ago.

Before long, our planet was teeming with a diverse panoply of life forms. The most successful ones, which soon outcompeted the rest, were able to react to their environment in some way.

Specifically, they were what computer scientists call “intelligent agents”: entities that collect information about their environment from sensors and then process this information to decide how to act back on their environment. This can include highly complex information processing, such as when you use information from your eyes and ears to decide what to say in a conversation. But it can also involve hardware and software that’s quite simple.

For example, many bacteria have a sensor measuring the sugar concentration in the liquid around them and can swim using propeller-shaped structures called flagella. The hardware linking the sensor to the flagella might implement the following simple but useful algorithm: “If my sugar concentration sensor reports a lower value than a couple of seconds ago, then reverse the rotation of my flagella so that I change direction.”
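That rule is simple enough to transcribe literally; the sketch below does just that, with a hypothetical sensor reading and flagella-direction interface standing in for the bacterium’s hardware.

```python
def chemotaxis_step(current_sugar, previous_sugar, flagella_direction):
    """Reverse flagella rotation if the sugar reading has dropped since
    the last reading (the rule quoted above); otherwise keep going."""
    if current_sugar < previous_sugar:
        return -flagella_direction  # reverse to change direction
    return flagella_direction       # keep swimming the same way

# Example: sugar concentration fell from 0.9 to 0.7, so reverse.
direction = chemotaxis_step(current_sugar=0.7, previous_sugar=0.9,
                            flagella_direction=+1)
print(direction)  # -1
```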

You’ve learned how to speak and countless other skills. Bacteria, on the other hand, aren’t great learners. Their DNA specifies not only the design of their hardware, such as sugar sensors and flagella, but also the design of their software. They never learn to swim toward sugar; instead, that algorithm was hard-coded into their DNA from the start.

There was of course a learning process of sorts, but it didn’t take place during the lifetime of that particular bacterium. Rather, it occurred during the preceding evolution of that species of bacteria, through a slow trial-and-error process spanning many generations, where natural selection favored those random DNA mutations that improved sugar consumption. Some of these mutations helped by improving the design of flagella and other hardware, while other mutations improved the bacterial information-processing system that implements the sugar-finding algorithm and other software.


“Tegmark’s new book is a deeply thoughtful guide to the most important conversation of our time, about how to create a benevolent future civilization as we merge our biological thinking with an even greater intelligence of our own creation.” — Ray Kurzweil, Inventor, Author and Futurist, author of The Singularity Is Near and How to Create a Mind


Such bacteria are an example of what I’ll call “Life 1.0”: life where both the hardware and software are evolved rather than designed. You and I, on the other hand, are examples of “Life 2.0”: life whose hardware is evolved, but whose software is largely designed. By your software, I mean all the algorithms and knowledge that you use to process the information from your senses and decide what to do—everything from the ability to recognize your friends when you see them to your ability to walk, read, write, calculate, sing and tell jokes.

You weren’t able to perform any of those tasks when you were born, so all this software got programmed into your brain later through the process we call learning. Whereas your childhood curriculum is largely designed by your family and teachers, who decide what you should learn, you gradually gain more power to design your own software.

Perhaps your school allows you to select a foreign language: Do you want to install a software module into your brain that enables you to speak French, or one that enables you to speak Spanish? Do you want to learn to play tennis or chess? Do you want to study to become a chef, a lawyer or a pharmacist? Do you want to learn more about artificial intelligence (AI) and the future of life by reading a book about it?

This ability of Life 2.0 to design its software enables it to be much smarter than Life 1.0. High intelligence requires both lots of hardware (made of atoms) and lots of software (made of bits). The fact that most of our human hardware is added after birth (through growth) is useful, since our ultimate size isn’t limited by the width of our mom’s birth canal. In the same way, the fact that most of our human software is added after birth (through learning) is useful, since our ultimate intelligence isn’t limited by how much information can be transmitted to us at conception via our DNA, 1.0-style.

I weigh about twenty-five times more than when I was born, and the synaptic connections that link the neurons in my brain can store about a hundred thousand times more information than the DNA that I was born with. Your synapses store all your knowledge and skills as roughly 100 terabytes’ worth of information, while your DNA stores merely about a gigabyte, barely enough to store a single movie download. So it’s physically impossible for an infant to be born speaking perfect English and ready to ace her college entrance exams: there’s no way the information could have been preloaded into her brain, since the main information module she got from her parents (her DNA) lacks sufficient information-storage capacity.
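A rough back-of-the-envelope version of that comparison, using commonly cited ballpark figures (order-of-magnitude assumptions, not precise measurements):

```python
# Rough order-of-magnitude comparison; the per-synapse storage
# estimate in particular is a crude assumption.
DNA_BASE_PAIRS = 3e9       # ~3 billion base pairs in the human genome
BITS_PER_BASE_PAIR = 2     # 4 possible bases -> 2 bits each

SYNAPSES = 1e15            # roughly 10^14 to 10^15 synapses
BITS_PER_SYNAPSE = 1       # crude storage estimate per synapse

dna_bytes = DNA_BASE_PAIRS * BITS_PER_BASE_PAIR / 8
synapse_bytes = SYNAPSES * BITS_PER_SYNAPSE / 8

print(f"DNA:      ~{dna_bytes / 1e9:.2f} GB")       # ~0.75 GB
print(f"Synapses: ~{synapse_bytes / 1e12:.0f} TB")  # ~125 TB
```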

The ability to design its software enables Life 2.0 to be not only smarter than Life 1.0, but also more flexible. If the environment changes, 1.0 can only adapt by slowly evolving over many generations. Life 2.0, on the other hand, can adapt almost instantly, via a software update. For example, bacteria frequently encountering antibiotics may evolve drug resistance over many generations, but an individual bacterium won’t change its behavior at all; in contrast, a girl learning that she has a peanut allergy will immediately change her behavior to start avoiding peanuts.

This flexibility gives Life 2.0 an even greater edge at the population level: even though the information in our human DNA hasn’t evolved dramatically over the past fifty thousand years, the information collectively stored in our brains, books and computers has exploded. By installing a software module enabling us to communicate through sophisticated spoken language, we ensured that the most useful information stored in one person’s brain could get copied to other brains, potentially surviving even after the original brain died.

By installing a software module enabling us to read and write, we became able to store and share vastly more information than people could memorize. By developing brain software capable of producing technology (i.e., by studying science and engineering), we enabled much of the world’s information to be accessed by many of the world’s humans with just a few clicks.

This flexibility has enabled Life 2.0 to dominate Earth. Freed from its genetic shackles, humanity’s combined knowledge has kept growing at an accelerating pace as each breakthrough enabled the next: language, writing, the printing press, modern science, computers, the internet, etc. This ever-faster cultural evolution of our shared software has emerged as the dominant force shaping our human future, rendering our glacially slow biological evolution almost irrelevant.

Yet despite the most powerful technologies we have today, all life forms we know of remain fundamentally limited by their biological hardware. None can live for a million years, memorize all of Wikipedia, understand all known science or enjoy spaceflight without a spacecraft. None can transform our largely lifeless cosmos into a diverse biosphere that will flourish for billions or trillions of years, enabling our Universe to finally fulfill its potential and wake up fully. All this requires life to undergo a final upgrade, to Life 3.0, which can design not only its software but also its hardware. In other words, Life 3.0 is the master of its own destiny, finally fully free from its evolutionary shackles.

The boundaries between the three stages of life are slightly fuzzy. If bacteria are Life 1.0 and humans are Life 2.0, then you might classify mice as 1.1: they can learn many things, but not enough to develop language or invent the internet. Moreover, because they lack language, what they learn gets largely lost when they die, not passed on to the next generation. Similarly, you might argue that today’s humans should count as Life 2.1: we can perform minor hardware upgrades such as implanting artificial teeth, knees and pacemakers, but nothing as dramatic as getting ten times taller or acquiring a thousand times bigger brain.

In summary, we can divide the development of life into three stages, distinguished by life’s ability to design itself:

• Life 1.0 (biological stage): evolves its hardware and software

• Life 2.0 (cultural stage): evolves its hardware, designs much of its software

• Life 3.0 (technological stage): designs its hardware and software

After 13.8 billion years of cosmic evolution, development has accelerated dramatically here on Earth: Life 1.0 arrived about 4 billion years ago, Life 2.0 (we humans) arrived about a hundred millennia ago, and many AI researchers think that Life 3.0 may arrive during the coming century, perhaps even during our lifetime, spawned by progress in AI. What will happen, and what will this mean for us? That’s the topic of this book.

From the book Life 3.0: Being Human in the Age of Artificial Intelligence by Max Tegmark, © 2017 by Max Tegmark. Published by arrangement with Alfred A. Knopf, an imprint of The Knopf Doubleday Publishing Group, a division of Penguin Random House LLC.

How to design a custom robot in minutes without being a roboticist

Robot designs by novices using Interactive Robogami (credit: MIT CSAIL)

MIT’s new “Interactive Robogami” system will let you design a robot in minutes and then 3D-print and assemble it in about four hours.

“Designing robots usually requires expertise that only mechanical engineers and roboticists have,” says PhD student Adriana Schulz, co-lead author of a paper in The International Journal of Robotics Research and a researcher in MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL). “What’s exciting here is that we’ve created a tool that allows a casual user to design their own robot by giving them this expert knowledge.”

Interactive Robogami uses simulations and interactive feedback with algorithms for design composition, allowing users to focus on high-level conceptual design. Users can choose from a library of more than 50 different bodies, wheels, legs and “peripherals,” as well as a selection of different gaits (step sequences).

Gallery of designs created by a novice user after a 20-minute training session. Each of the models took between 3 and 25 minutes to design and contains multiple components from the database. (credit: Adriana Schulz et al./The International Journal of Robotics Research)

The system checks to make sure a design is actually physically possible, analyzing factors such as speed and stability. Once a design is complete, the team’s origami-inspired “3-D print and fold” technique allows for printing the design as flat faces connected at joints, and then folding the design into the final shape, combining the most effective parts of 2D and 3D printing.*
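As a loose sketch of what such a feasibility check might look like (not MIT’s actual implementation; the part list, stability rule, and drive parameters below are invented for illustration), a composed design can be screened by checking that its center of mass sits over its wheel contacts and by estimating top speed from the chosen drive parts.

```python
# Illustrative feasibility check for a composed design.
# Parts, masses, and the stability rule are hypothetical.

parts = [
    # (name, mass_kg, (x, y) position of part's center of mass, m)
    ("body",        0.30, (0.00,  0.00)),
    ("battery",     0.10, (0.05,  0.00)),
    ("left wheel",  0.05, (-0.06, -0.04)),
    ("right wheel", 0.05, (-0.06,  0.04)),
    ("front wheel", 0.05, (0.08,   0.00)),
]
contact_points = [(-0.06, -0.04), (-0.06, 0.04), (0.08, 0.00)]  # wheel contacts

total_mass = sum(m for _, m, _ in parts)
com_x = sum(m * x for _, m, (x, y) in parts) / total_mass
com_y = sum(m * y for _, m, (x, y) in parts) / total_mass

# Crude stability test: COM must lie inside the bounding box of the
# contacts (a real tool would test against the true support polygon).
xs = [x for x, _ in contact_points]
ys = [y for _, y in contact_points]
stable = min(xs) <= com_x <= max(xs) and min(ys) <= com_y <= max(ys)

wheel_radius_m, motor_rpm = 0.03, 120                      # assumed drive parts
top_speed = 2 * math.pi * wheel_radius_m * motor_rpm / 60  # m/s

print(f"COM = ({com_x:.3f}, {com_y:.3f}), stable = {stable}")
print(f"Estimated top speed: {top_speed:.2f} m/s")
```

(The sketch assumes `import math` at the top if run standalone.)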

CSAIL director Daniela Rus, a Professor of Electrical Engineering and Computer Science, hopes people will be able to use the system to incorporate robots that can help with everyday tasks, and that similar systems with rapid printing technologies will enable large-scale customization and production of robots.

“These tools enable new approaches to teaching computational thinking and creating,” says Rus. “Students can not only learn by coding and making their own robots, but by bringing to life conceptual ideas about what their robots can actually do.”

This research was supported by the National Science Foundation’s Expeditions in Computing program.

* To test the system, the team used eight subjects who were given 20 minutes of training and asked to perform two tasks: Create a mobile, stable car design in just ten minutes, and create a trajectory to navigate a robot through an obstacle course in the least amount of travel time. The team fabricated a total of six robots, each of which took 10 to 15 minutes to design, 3 to 7 hours to print and 30 to 90 minutes to assemble. The team found that their 3D print-and-fold method reduced printing time by 73 percent and the amount of material used by 70 percent. The robots also demonstrated a wide range of movement, like using single legs to walk, using different step sequences, and using legs and wheels simultaneously.


Abstract of Interactive robogami: An end-to-end system for design of robots with ground locomotion

This paper aims to democratize the design and fabrication of robots, enabling people of all skill levels to make robots without needing expert domain knowledge. Existing work in computational design and rapid fabrication has explored this question of customization for physical objects but so far has not been able to conquer the complexity of robot designs. We have developed Interactive Robogami, a tool for composition-based design of ground robots that can be fabricated as flat sheets and then folded into 3D structures. This rapid prototyping process enables users to create lightweight, affordable, and materially versatile robots with short turnaround time. Using Interactive Robogami, designers can compose new robot designs from a database of print-and-fold parts. The designs are tested for the users’ functional specifications via simulation and fabricated on user satisfaction. We present six robots designed and fabricated using a 3D printing based approach, as well as a larger robot cut from sheet metal. We have also conducted a user study that demonstrates that our tool is intuitive for novice designers and expressive enough to create a wide variety of ground robot designs.

Ray Kurzweil reveals plans for ‘linguistically fluent’ Google software

Smart Reply (credit: Google Research)

Ray Kurzweil, a director of engineering at Google, reveals plans for a future version of Google’s “Smart Reply” machine-learning email software (and more) in a Wired article by Tom Simonite published Wednesday (Aug. 2, 2017).

Running on mobile Gmail and Google Inbox, Smart Reply suggests up to three replies to an email message, saving typing time or giving you ideas for a better reply.

Smarter autocomplete

Kurzweil’s team is now “experimenting with empowering Smart Reply to elaborate on its initial terse suggestions,” Simonite says.

“Tapping a Continue button [in response to an email] might cause ‘Sure I’d love to come to your party!’ to expand to include, for example, ‘Can I bring something?’ He likes the idea of having AI pitch in anytime you’re typing, a bit like an omnipresent, smarter version of Google’s search autocomplete. ‘You could have similar technology to help you compose documents or emails by giving you suggestions of how to complete your sentence,’ Kurzweil says.”

As Simonite notes, Kurzweil’s software is based on his hierarchical theory of intelligence, articulated in Kurzweil’s latest book, How to Create a Mind, and in more detail in an arXiv paper by Kurzweil and key members of his team, published in May.

“Kurzweil’s work outlines a path to create a simulation of the human neocortex (the outer layer of the brain where we do much of our thinking) by building a hierarchy of similarly structured components that encode increasingly abstract ideas as sequences,” according to the paper. “Kurzweil provides evidence that the neocortex is a self-organizing hierarchy of modules, each of which can learn, remember, recognize and/or generate a sequence, in which each sequence consists of a sequential pattern from lower-level modules.”

The paper further explains that Smart Reply previously used “long short-term memory” (LSTM) networks*, “which are much slower than feed-forward networks [used in the new software] for training and inference” because with LSTM, it takes more computation to handle longer sequences of words.

Kurzweil’s team was able to produce email responses of similar quality to LSTM, but using fewer computational resources by training hierarchically connected layers of simulated neurons on clustered numerical representations of text. Essentially, the approach propagates information through a sequence of ever more complex pattern recognizers until the final patterns are matched to optimal responses.
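A highly simplified sketch in the same general spirit (emphatically not Google’s Smart Reply or Kona code): embed the incoming message as an average of word vectors, pass it through small feed-forward layers, and rank a fixed set of candidate replies. The vocabulary, weights, and replies below are random toy placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary of word embeddings and a fixed set of candidate replies.
# In a real system these would be learned; here they are random placeholders.
vocab = {w: rng.normal(size=16) for w in
         "are you free for lunch tomorrow thanks sounds great sorry".split()}
replies = ["Sounds great!", "Sorry, I can't make it.", "What time works for you?"]

# Two small feed-forward layers (random weights stand in for trained ones).
W1 = rng.normal(size=(16, 32))
W2 = rng.normal(size=(32, len(replies)))

def suggest(message, top_k=3):
    words = [w for w in message.lower().split() if w in vocab]
    x = np.mean([vocab[w] for w in words], axis=0)  # bag-of-words embedding
    h = np.maximum(0, x @ W1)                       # hidden layer (ReLU)
    scores = h @ W2                                 # one score per candidate reply
    order = np.argsort(scores)[::-1][:top_k]
    return [replies[i] for i in order]

print(suggest("Are you free for lunch tomorrow"))
```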

Kona: linguistically fluent software

But underlying Smart Reply is “a system for understanding the meaning of language, according to Kurzweil,” Simonite reports.

“Codenamed Kona, the effort is aiming for nothing less than creating software as linguistically fluent as you or me. ‘I would not say it’s at human levels, but I think we’ll get there,’ Kurzweil says. More applications of Kona are in the works and will surface in future Google products, he promises.”

* The previous sequence-to-sequence (Seq2Seq) framework [described in this paper] uses “recurrent neural networks (RNNs), typically long short-term memory (LSTM) networks, to encode sequences of word embeddings into representations that depend on the order, and uses a decoder RNN to generate output sequences word by word. …While Seq2Seq models provide a generalized solution, it is not obvious that they are maximally efficient, and training these systems can be slow and complicated.”
