Large language models don’t behave like people, even though we may expect them to

2024/07/28, 11:22
in AI

A new study shows someone’s beliefs about an LLM play a significant role in the model’s performance and are important for how it is deployed.

One thing that makes large language models (LLMs) so powerful is the diversity of tasks to which they can be applied. The same machine-learning model that can help a graduate student draft an email could also aid a clinician in diagnosing cancer.

However, the wide applicability of these models also makes them challenging to evaluate in a systematic way. It would be impossible to create a benchmark dataset to test a model on every type of question that can be asked.

In a new paper, MIT researchers took a different approach. They argue that, because humans decide when to deploy large language models, evaluating a model requires an understanding of how people form beliefs about its capabilities.

For example, the graduate student must decide whether the model could be helpful in drafting a particular email, and the clinician must determine which cases would be best to consult the model on.

Building on this idea, the researchers created a framework to evaluate an LLM based on how well it aligns with a human's beliefs about how it will perform on a given task.

They introduce a human generalization function — a model of how people update their beliefs about an LLM’s capabilities after interacting with it. Then, they evaluate how aligned LLMs are with this human generalization function.

Their results indicate that when models are misaligned with the human generalization function, a user could be overconfident or underconfident about where to deploy it, which might cause the model to fail unexpectedly. Furthermore, due to this misalignment, more capable models tend to perform worse than smaller models in high-stakes situations.

“These tools are exciting because they are general-purpose, but because they are general-purpose, they will be collaborating with people, so we have to take the human in the loop into account,” says study co-author Ashesh Rambachan, assistant professor of economics and a principal investigator in the Laboratory for Information and Decision Systems (LIDS).

Rambachan is joined on the paper by lead author Keyon Vafa, a postdoc at Harvard University; and Sendhil Mullainathan, an MIT professor in the departments of Electrical Engineering and Computer Science and of Economics, and a member of LIDS. The research will be presented at the International Conference on Machine Learning.

Human generalization
As we interact with other people, we form beliefs about what we think they do and do not know. For instance, if your friend is finicky about correcting people’s grammar, you might generalize and think they would also excel at sentence construction, even though you’ve never asked them questions about sentence construction.

“Language models often seem so human. We wanted to illustrate that this force of human generalization is also present in how people form beliefs about language models,” Rambachan says.

As a starting point, the researchers formally defined the human generalization function, which involves asking questions, observing how a person or LLM responds, and then making inferences about how that person or model would respond to related questions.

If someone sees that an LLM can correctly answer questions about matrix inversion, they might also assume it can ace questions about simple arithmetic. A model that is misaligned with this function — one that doesn’t perform well on questions a human expects it to answer correctly — could fail when deployed.
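The update-and-infer loop described above can be illustrated with a toy rule. This is a hypothetical sketch for intuition only, not the paper's formal definition: the update rule and the weight value are assumptions.

```python
# Toy sketch of belief updating about a respondent's capabilities; the
# linear update rule and weight are illustrative assumptions, not the
# paper's formal human generalization function.

def update_belief(prior: float, observed_correct: bool, weight: float = 0.3) -> float:
    """Shift the belief that the respondent will answer a related
    question correctly toward the outcome just observed."""
    target = 1.0 if observed_correct else 0.0
    return prior + weight * (target - prior)

belief = 0.5                          # no prior information
belief = update_belief(belief, True)  # observed a correct answer
print(round(belief, 2))               # belief rises to 0.65
```

The asymmetry the study later reports (people update more on wrong answers than right ones) would correspond to using a larger weight when `observed_correct` is false.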

With that formal definition in hand, the researchers designed a survey to measure how people generalize when they interact with LLMs and other people.

They showed survey participants questions that a person or LLM got right or wrong and then asked if they thought that person or LLM would answer a related question correctly. Through the survey, they generated a dataset of nearly 19,000 examples of how humans generalize about LLM performance across 79 diverse tasks.

Measuring misalignment

They found that participants did quite well when asked whether a human who got one question right would answer a related question right, but they were much worse at generalizing about the performance of LLMs.

“Human generalization gets applied to language models, but that breaks down because these language models don’t actually show patterns of expertise like people would,” Rambachan says.

People were also more likely to update their beliefs about an LLM when it answered questions incorrectly than when it got questions right. They also tended to believe that LLM performance on simple questions would have little bearing on its performance on more complex questions.

In situations where people put more weight on incorrect responses, simpler models outperformed very large models like GPT-4.

“Language models that get better can almost trick people into thinking they will perform well on related questions when, in actuality, they don’t,” he says.

One possible explanation for why humans are worse at generalizing for LLMs could come from their novelty — people have far less experience interacting with LLMs than with other people.

“Moving forward, it is possible that we may get better just by virtue of interacting with language models more,” he says.

To this end, the researchers want to conduct additional studies of how people’s beliefs about LLMs evolve over time as they interact with a model. They also want to explore how human generalization could be incorporated into the development of LLMs.

“When we are training these algorithms in the first place, or trying to update them with human feedback, we need to account for the human generalization function in how we think about measuring performance,” he says.

In the meantime, the researchers hope their dataset can be used as a benchmark to compare how LLMs perform relative to the human generalization function, which could help improve the performance of models deployed in real-world situations.

“To me, the contribution of the paper is twofold. The first is practical: The paper uncovers a critical issue with deploying LLMs for general consumer use. If people don’t have the right understanding of when LLMs will be accurate and when they will fail, then they will be more likely to see mistakes and perhaps be discouraged from further use. This highlights the issue of aligning the models with people’s understanding of generalization,” says Alex Imas, professor of behavioural science and economics at the University of Chicago’s Booth School of Business, who was not involved with this work. “The second contribution is more fundamental: The lack of generalization to expected problems and domains helps in getting a better picture of what the models are doing when they get a problem ‘correct.’ It provides a test of whether LLMs ‘understand’ the problem they are solving.”

Source: MIT News

    © 2023-2025
    MSTRpay & PAXIT
    Legal & Disclosure
