Multiple AI models help robots execute complex plans more transparently

2024/01/15/07:28

A multimodal system uses models trained on language, vision, and action data to help robots develop and execute plans for household, construction, and manufacturing tasks.

Your daily to-do list is likely pretty straightforward: wash the dishes, buy groceries, and other minutiae. It’s unlikely you wrote out “pick up the first dirty dish,” or “wash that plate with a sponge,” because each of these miniature steps within the chore feels intuitive. While we can routinely complete each step without much thought, a robot requires a complex plan that involves more detailed outlines.

MIT’s Improbable AI Lab, a group within the Computer Science and Artificial Intelligence Laboratory (CSAIL), has offered these machines a helping hand with a new multimodal framework: Compositional Foundation Models for Hierarchical Planning (HiP), which develops detailed, feasible plans with the expertise of three different foundation models. Like OpenAI’s GPT-4, the foundation model that ChatGPT and Bing Chat were built upon, these foundation models are trained on massive quantities of data for applications like generating images, translating text, and robotics.

Unlike RT2 and other multimodal models that are trained on paired vision, language, and action data, HiP uses three different foundation models, each trained on a different data modality. Each model captures a different part of the decision-making process, and the three work together when it’s time to make decisions. HiP removes the need for access to paired vision, language, and action data, which is difficult to obtain. HiP also makes the reasoning process more transparent.

What’s considered a daily chore for a human can be a robot’s “long-horizon goal” — an overarching objective that involves completing many smaller steps first — requiring sufficient data to plan, understand, and execute objectives. While computer vision researchers have attempted to build monolithic foundation models for this problem, pairing language, visual, and action data is expensive. Instead, HiP represents a different, multimodal recipe: a trio that cheaply incorporates linguistic, physical, and environmental intelligence into a robot.

“Foundation models do not have to be monolithic,” says NVIDIA AI researcher Jim Fan, who was not involved in the paper. “This work decomposes the complex task of embodied agent planning into three constituent models: a language reasoner, a visual world model, and an action planner. It makes a difficult decision-making problem more tractable and transparent.”

The team believes that their system could help these machines accomplish household chores, such as putting away a book or placing a bowl in the dishwasher. Additionally, HiP could assist with multistep construction and manufacturing tasks, like stacking and placing different materials in specific sequences.

Evaluating HiP
The CSAIL team tested HiP’s acuity on three manipulation tasks, where it outperformed comparable frameworks. The system reasoned by developing intelligent plans that adapt to new information.

First, the researchers requested that it stack different-colored blocks on each other and then place others nearby. The catch: Some of the correct colors weren’t present, so the robot had to place white blocks in a color bowl to paint them. HiP often adjusted to these changes accurately, especially compared to state-of-the-art task planning systems like Transformer BC and Action Diffuser, by adjusting its plans to stack and place each square as needed.

Another test: arranging objects such as candy and a hammer in a brown box while ignoring other items. Some of the objects it needed to move were dirty, so HiP adjusted its plans to place them in a cleaning box, and then into the brown container. In a third demonstration, the bot was able to ignore unnecessary objects to complete kitchen sub-goals such as opening a microwave, clearing a kettle out of the way, and turning on a light. Some of the prompted steps had already been completed, so the robot adapted by skipping those directions.

A three-pronged hierarchy
HiP’s three-pronged planning process operates as a hierarchy, with the ability to pre-train each of its components on different sets of data, including information outside of robotics. At the bottom of that order is a large language model (LLM), which starts to ideate by capturing all the symbolic information needed and developing an abstract task plan. Applying the common sense knowledge it finds on the internet, the model breaks its objective into sub-goals. For example, “making a cup of tea” turns into “filling a pot with water,” “boiling the pot,” and the subsequent actions required.

“All we want to do is take existing pre-trained models and have them successfully interface with each other,” says Anurag Ajay, a PhD student in the MIT Department of Electrical Engineering and Computer Science (EECS) and a CSAIL affiliate. “Instead of pushing for one model to do everything, we combine multiple ones that leverage different modalities of internet data. When used in tandem, they help with robotic decision-making and can potentially aid with tasks in homes, factories, and construction sites.”

These models also need some form of “eyes” to understand the environment they’re operating in and correctly execute each sub-goal. The team used a large video diffusion model to augment the initial planning completed by the LLM, which collects geometric and physical information about the world from footage on the internet. In turn, the video model generates an observation trajectory plan, refining the LLM’s outline to incorporate new physical knowledge.

This process, known as iterative refinement, allows HiP to reason about its ideas, taking in feedback at each stage to generate a more practical outline. The flow of feedback is similar to writing an article, where an author may send their draft to an editor, and with those revisions incorporated, the publisher reviews for any last changes and finalizes.
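As a rough illustration of that feedback loop, the sketch below has a higher level propose several candidate sub-goals and a lower level score each one, keeping the best. It is a minimal toy under stated assumptions, not the team's implementation: the function names `refine` and `make_scene_feedback`, the word-matching score, and the example proposals are all hypothetical stand-ins for real model outputs.

```python
from typing import Callable, List


def refine(candidates: List[str], feasibility: Callable[[str], float]) -> str:
    """Return the candidate sub-goal that the lower level rates most feasible."""
    if not candidates:
        raise ValueError("need at least one candidate sub-goal")
    # max() keeps the first candidate when scores tie.
    return max(candidates, key=feasibility)


def make_scene_feedback(visible_objects: List[str]) -> Callable[[str], float]:
    """Hypothetical stand-in for feedback from a video model.

    It counts how many visible objects a sub-goal mentions; a real system
    would use a learned model rather than word matching.
    """
    def score(sub_goal: str) -> float:
        words = sub_goal.lower().split()
        return float(sum(obj in words for obj in visible_objects))
    return score


# Toy scene for the tea example: the robot can see a pot and a sink.
feedback = make_scene_feedback(["pot", "sink"])
proposals = [
    "fill the kettle with water",
    "fill the pot with water at the sink",
    "boil the pot",
]
best = refine(proposals, feedback)
print(best)
```

In this toy setup, the proposal that mentions both visible objects wins, which mirrors the idea that an abstract plan is adjusted once physical information about the scene is taken into account.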

In this case, the top of the hierarchy is an egocentric action model, which uses a sequence of first-person images to infer which actions should take place based on the robot’s surroundings. During this stage, the observation plan from the video model is mapped over the space visible to the robot, helping the machine decide how to execute each task within the long-horizon goal. If a robot uses HiP to make tea, this means it will have mapped out exactly where the pot, sink, and other key visual elements are, and begin completing each sub-goal.
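To make the data flow between the three levels concrete, here is a minimal sketch of how a goal could pass from a language planner to a video planner to an action planner. Everything in it is an assumption for illustration: the function names, the lookup table standing in for an LLM, the string "frames" standing in for images, and the choice to derive one action from each consecutive pair of frames are not taken from the article.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    sub_goal: str        # text produced by the language level
    frames: List[str]    # placeholder observations from the video level
    actions: List[str]   # placeholder commands from the action level


def language_planner(goal: str) -> List[str]:
    """Hypothetical stand-in for an LLM: a lookup table of sub-goals."""
    plans = {"make a cup of tea": ["fill a pot with water", "boil the pot"]}
    return plans.get(goal, [goal])


def video_planner(sub_goal: str, n_frames: int = 3) -> List[str]:
    """Hypothetical stand-in for a video model: labeled placeholder frames."""
    return [f"{sub_goal} [frame {i}]" for i in range(n_frames)]


def action_planner(frames: List[str]) -> List[str]:
    """Assumed design: one action per consecutive pair of frames."""
    return [f"move: {a} -> {b}" for a, b in zip(frames, frames[1:])]


def plan(goal: str) -> List[Step]:
    """Run the three levels in order for every sub-goal of the goal."""
    steps = []
    for sub_goal in language_planner(goal):
        frames = video_planner(sub_goal)
        steps.append(Step(sub_goal, frames, action_planner(frames)))
    return steps


tea_plan = plan("make a cup of tea")
for step in tea_plan:
    print(step.sub_goal, len(step.frames), len(step.actions))
```

Because each level only consumes the output of the previous one, any of the three stand-ins could be swapped for a separately trained model without changing the others, which is the compositional property the article describes.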

Still, the multimodal work is limited by the lack of high-quality video foundation models. Once available, they could interface with HiP’s small-scale video models to further enhance visual sequence prediction and robot action generation. A higher-quality version would also reduce the current data requirements of the video models.

That being said, the CSAIL team’s approach only used a tiny bit of data overall. Moreover, HiP was cheap to train and demonstrated the potential of using readily available foundation models to complete long-horizon tasks. “What Anurag has demonstrated is proof-of-concept of how we can take models trained on separate tasks and data modalities and combine them into models for robotic planning. In the future, HiP could be augmented with pre-trained models that can process touch and sound to make better plans,” says senior author Pulkit Agrawal, MIT assistant professor in EECS and director of the Improbable AI Lab. The group is also considering applying HiP to solving real-world long-horizon tasks in robotics.

Ajay and Agrawal are lead authors on a paper describing the work. They are joined by MIT professors and CSAIL principal investigators Tommi Jaakkola, Joshua Tenenbaum, and Leslie Pack Kaelbling; CSAIL research affiliate and MIT-IBM AI Lab research manager Akash Srivastava; graduate students Seungwook Han and Yilun Du ’19; former postdoc Abhishek Gupta, who is now assistant professor at University of Washington; and former graduate student Shuang Li PhD ’23.

The team’s work was supported, in part, by the National Science Foundation, the U.S. Defense Advanced Research Projects Agency, the U.S. Army Research Office, the U.S. Office of Naval Research Multidisciplinary University Research Initiatives, and the MIT-IBM Watson AI Lab. Their findings were presented at the 2023 Conference on Neural Information Processing Systems (NeurIPS).

Source: MIT
