Can you teach AI common sense?

Even before they speak their first words, human babies develop mental models about objects and people. This is one of the key capabilities that allow us humans to learn to live socially and to cooperate (or compete) with each other. But for artificial intelligence, even the most basic behavioral reasoning tasks remain a challenge.

Advanced deep learning models can perform complex tasks such as detecting people and objects in images, sometimes even better than humans. But they struggle to move beyond the visual features of images and draw conclusions about what other agents are doing or want to achieve.

To help fill this gap, researchers at IBM, the Massachusetts Institute of Technology and Harvard University have developed a series of tests that help evaluate the capacity of AI models to reason as children do, by observing and making sense of the world.

“Like infants, it is essential for machine agents to develop a sufficient capacity to understand human minds in order to participate in social interactions,” the AI researchers write in a new paper that introduces the dataset, called AGENT.

AGENT was presented at this year’s International Conference on Machine Learning (ICML) and is an important benchmark for measuring AI systems’ reasoning capabilities.

Observing and predicting agent behavior

There is a large body of work on testing common sense and reasoning in AI systems. Much of it focuses on natural language understanding, including the famous Turing test and Winograd schemas. In contrast, the AGENT project focuses on the kind of reasoning capabilities that people learn before they can speak.

“Our goal, following the literature in developmental psychology, is to create a benchmark for evaluating specific common-sense abilities related to intuitive psychology that babies learn in the pre-linguistic phase (in the first 18 months of their lives),” Dan Gutfreund, principal investigator at the MIT-IBM Watson AI Lab, told TechTalks.

As children, we learn to tell the difference between objects and agents by observing our environments. As we watch events unfold, we develop intuitive psychological skills, predict other people’s goals by observing their actions, and continue to correct and update our mental models. We learn all this with little or no instruction.

The idea behind the AGENT test (Action, Goal, Efficiency, coNstraint, uTility) is to assess how well AI systems can mimic this basic skill, whether they can develop psychological reasoning skills, and how well the representations they learn generalize to new situations. The dataset contains short sequences that show an agent navigating to one of several objects. The sequences are produced in ThreeDWorld, a virtual 3D environment designed for the training of AI agents.

The AGENT test takes place in two phases. First, the AI is presented with one or two sequences depicting the agent’s behavior. These examples should familiarize the AI with the preferences of the virtual agent. For example, an agent might always choose one type of object regardless of the obstacles that stand in its way, or it might choose the nearest and most accessible object regardless of its type.

After the familiarization phase, the AI is shown a test sequence and has to decide whether the agent is acting in an expected or a surprising way.
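The two-phase setup can be sketched as a small data structure plus a scoring loop. This is only an illustration: the names `Trial`, `rate_surprise` and `accuracy` are hypothetical, and the rule of thresholding a surprise rating at 0.5 is an assumption, since the article does not say how answers are scored.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Trial:
    """One test item (hypothetical layout, not the dataset's actual format)."""
    familiarization: List[str]  # identifiers of the one or two familiarization videos
    test: str                   # identifier of the test video
    surprising: bool            # label: does the agent act in a surprising way?


def accuracy(trials: Sequence[Trial],
             rate_surprise: Callable[[List[str], str], float],
             threshold: float = 0.5) -> float:
    """Fraction of trials where the thresholded rating matches the label.

    `rate_surprise` stands in for any model that maps the familiarization
    videos and the test video to a surprise rating between 0 and 1.
    """
    correct = 0
    for trial in trials:
        rating = rate_surprise(trial.familiarization, trial.test)
        predicted_surprising = rating > threshold  # assumed decision rule
        correct += predicted_surprising == trial.surprising
    return correct / len(trials)
```

Any model, from a physics-based planner to a neural network, could be plugged in as `rate_surprise`, which is what makes this kind of protocol usable as a common benchmark.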

The tests, a total of 3,360, span four types of scenarios, ranging from very simple behavior (the agent prefers one type of object regardless of the environment) to more complicated challenges (the agent performs cost-reward estimation, weighing the difficulty of achieving a goal against the reward it receives). The AI must also consider the acting agent’s action efficiency (for example, it should not make unnecessary jumps when there are no obstacles). And in some of the challenges, the scene is partially occluded to make it more difficult to reason about the environment.
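The cost-reward trade-off can be made concrete with a toy calculation. The sketch below assumes a utility of reward minus cost, with a cost that grows linearly with path length and obstacle height. The `Option` fields, the cost weights and the linear form are all illustrative assumptions, not values taken from the dataset.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Option:
    """A candidate goal object (hypothetical fields for illustration)."""
    name: str
    reward: float           # how much the agent values the object (assumed scale)
    distance: float         # path length to the object
    obstacle_height: float  # 0.0 when the path is clear


def cost(option: Option, step_cost: float = 1.0, jump_cost: float = 2.0) -> float:
    """Assumed cost model: walking cost plus a jump cost that grows with height."""
    return step_cost * option.distance + jump_cost * option.obstacle_height


def utility(option: Option) -> float:
    """Net value of pursuing an option: reward minus the cost of reaching it."""
    return option.reward - cost(option)


def rational_choice(options: Sequence[Option]) -> Option:
    """A rational agent picks the option with the highest utility."""
    return max(options, key=utility)
```

With these assumed weights, an agent would jump a low wall to reach an object worth 10 rather than walk to a nearby object worth 5, but it would settle for the nearby object once the wall becomes tall enough. A test video that shows the opposite choice would count as surprising.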

Realistic scenarios in an artificial environment

The test’s designers have included human inductive biases, meaning that the agents and the environment are governed by rules that would be rational to humans (e.g., the cost of jumping over or climbing an obstacle grows with its height). This decision helps make the challenges more realistic and easier to evaluate. The researchers also note that such biases are important for creating AI systems that are better aligned and compatible with human behavior and can cooperate with human counterparts.

The AI researchers tested the challenges on human volunteers through Amazon Mechanical Turk. Their findings show that humans can solve an average of 91 percent of the challenges by observing the familiarization sequences and judging the test examples. This means that people use their prior knowledge of the world and of human and animal behavior to make sense of how the agents make decisions (e.g., all else being equal, an agent will choose the object with the higher reward).

The AI researchers deliberately limited the size of the dataset to prevent unintelligent shortcuts to solving the problems. With a very large dataset, a machine learning model might learn to make accurate predictions without gaining the underlying knowledge of agent behavior. “Training from scratch on just our dataset does not work. Instead, we suggest that in order to pass the tests, it is necessary to acquire additional knowledge either via inductive biases in the architecture or from training on additional data,” the researchers write.

However, the researchers have implemented some shortcuts in the tests. The AGENT dataset includes depth maps, segmentation maps, and bounding boxes of objects and obstacles for every frame of each scene. The scenes are also extremely simple in visual detail and are composed of eight distinct colors. All of this makes it easier for AI systems to process the information in the scene and focus on the reasoning part of the challenge.

Can current AI solve the AGENT challenges?

The researchers tested the AGENT challenge on two baseline AI models. The first, Bayesian Inverse Planning and Core Knowledge (BIPaCK), is a generative model that integrates physics simulation and planning.

Above: The BIPaCK model uses planning and physics engines to predict the agent’s trajectory

This model uses the full ground-truth information provided by the dataset and feeds it into its physics and planning engine to predict the agent’s trajectory. The researchers’ experiments show that BIPaCK is able to perform on par with or even better than humans when it has full information about the scene.
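The idea of Bayesian inverse planning can be illustrated with a toy example: infer what the agent prefers from its familiarization choices, then ask how probable the test choice is under that belief. This is not the researchers' model. It has no physics engine or trajectory planner, and the softmax choice rule, the uniform prior, the reward and cost numbers, and all function names are assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

# One observation: (cost of reaching each object, object the agent chose).
Observation = Tuple[Dict[str, float], str]


def choice_likelihood(rewards: Dict[str, float], costs: Dict[str, float],
                      chosen: str, beta: float = 1.0) -> float:
    """P(chosen) for a noisily rational agent: softmax over reward minus cost."""
    utilities = {obj: rewards[obj] - costs[obj] for obj in rewards}
    top = max(utilities.values())  # subtract the max for numerical stability
    weights = {obj: math.exp(beta * (u - top)) for obj, u in utilities.items()}
    return weights[chosen] / sum(weights.values())


def posterior(hypotheses: Dict[str, Dict[str, float]],
              familiarization: List[Observation],
              beta: float = 1.0) -> Dict[str, float]:
    """Bayes rule with a uniform prior over preference hypotheses."""
    scores = {}
    for name, rewards in hypotheses.items():
        likelihood = 1.0
        for costs, chosen in familiarization:
            likelihood *= choice_likelihood(rewards, costs, chosen, beta)
        scores[name] = likelihood
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}


def surprise(hypotheses: Dict[str, Dict[str, float]],
             familiarization: List[Observation],
             test: Observation, beta: float = 1.0) -> float:
    """One minus the predicted probability of the choice seen in the test."""
    belief = posterior(hypotheses, familiarization, beta)
    test_costs, test_choice = test
    predicted = sum(
        p * choice_likelihood(hypotheses[name], test_costs, test_choice, beta)
        for name, p in belief.items()
    )
    return 1.0 - predicted
```

For instance, if the agent crosses a costly obstacle to reach a cone during familiarization, the belief shifts strongly toward a "prefers cones" hypothesis, and a test video in which it then walks to a sphere at equal cost receives a high surprise score.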

However, in the real world, AI systems do not have access to precisely annotated ground-truth information and must perform the complicated task of detecting objects against different backgrounds and lighting conditions, a problem that humans and animals solve easily but that remains a challenge for computer vision systems.

In their paper, the researchers acknowledge that BIPaCK “requires an accurate reconstruction of the 3D state and a built-in model of the physical dynamics that will not necessarily be available in real scenes.”

The second model that the researchers tested, named ToMnet-G, is an extended version of the Theory of Mind Neural Network (ToMnet), proposed by researchers at DeepMind in 2018. ToMnet-G uses graph neural networks to encode the state of the scenes, including objects, obstacles, and the location of the agent. These encodings are then fed into long short-term memory (LSTM) networks to track the agent’s path across the sequence of frames. The model uses the representations it extracts from the familiarization videos to predict the agent’s behavior in the test videos and rate them as expected or surprising.

Above: The ToMnet-G model uses graph neural networks and LSTMs to embed scene representations and predict agent behavior

The advantage of ToMnet-G is that it does not require the pre-engineered physics and common-sense knowledge of BIPaCK. It learns everything from the videos and from previous training on other datasets. On the other hand, ToMnet-G often learns the wrong representations and cannot generalize its behavior to new scenarios or to cases where it has limited familiarization information.

“Without many built-in priors, ToMnet-G shows promising results when trained and tested on similar scenarios, but it still lacks a strong generalization capacity both within scenarios and across them,” the researchers note in their paper.

The contrast between the two models highlights how challenging even the simplest tasks, which people learn without any instruction, remain for AI.

“We need to remember that our benchmark, by design, shows very simple synthetic scenarios that each address one specific aspect of common sense,” Gutfreund said. “In the real world, people are able to analyze complex scenes very quickly, where many aspects of common sense related to physics, psychology, language and more are in play at the same time. AI models are still far from capable of doing anything close to that.”

Common sense and the future of AI

“We believe that the path from narrow to broad AI should include models that have common sense,” Gutfreund said. “Common sense capabilities are important building blocks of understanding and interaction in the world and can facilitate the acquisition of new capabilities.”

Many researchers believe that common sense and reasoning can solve many of the problems that current AI systems face, such as their need for extensive amounts of training data, their struggle with causality, and their fragility in dealing with new situations. Common sense and reasoning are important areas of research for the AI community, and they have become the focus of some of the brightest minds in the field, including the pioneers of deep learning.

Solving AGENT can be a small but important step toward creating AI agents that behave robustly in the unpredictable world of humans.

“It will be difficult to convince people to trust autonomous agents that do not behave in a common-sense way,” said Gutfreund. “Consider, for example, a robot that helps the elderly. If this robot does not follow the common-sense principle that agents pursue their goals efficiently, and moves in a zigzag rather than in a straight line when asked to fetch milk from the refrigerator, it will not be very practical or trustworthy.”

AGENT is part of the Machine Common Sense (MCS) program of the Defense Advanced Research Projects Agency (DARPA). MCS pursues two broad goals. The first is to create machines that can learn, as children do, to reason about objects, agents, and space. AGENT falls into this category. The second goal is to develop systems that can learn by reading structured and unstructured knowledge from the web, as a human researcher would do. This is different from current approaches to natural language understanding, which focus only on capturing statistical correlations between words and word sequences in very large text corpora.

“We are now working on using AGENT as a test environment for babies. Together with the rest of the DARPA MCS program performers, we plan to explore more complex scenarios of common sense related to multiple agents (e.g., helping or hindering each other) and the use of tools to achieve goals (e.g., keys to open doors). We are also working on other core areas of knowledge related to intuitive physics and spatial understanding,” said Gutfreund.

Ben Dickson is a software engineer and founder of TechTalks, a blog that explores ways in which technology solves and creates problems.

This story originally originated on Copyright 2021

