Saturday, May 10, 2025
newshub

OpenAI enhances AI safety with new red teaming methods

2024/12/04/07:53
in AI

A critical part of OpenAI’s safeguarding process is “red teaming” — a structured methodology using both human and AI participants to explore potential risks and vulnerabilities in new systems.

Historically, OpenAI has carried out red teaming predominantly through manual testing, in which individuals probe a system for weaknesses. This approach was notably used during the testing of its DALL·E 2 image generation model in early 2022, when external experts were invited to identify potential risks. Since then, OpenAI has expanded and refined its methodologies, incorporating automated and mixed approaches for a more comprehensive risk assessment.

“We are optimistic that we can use more powerful AI to scale the discovery of model mistakes,” OpenAI stated. This optimism is rooted in the idea that automated processes can help evaluate models and train them to be safer by recognising patterns and errors on a larger scale.

In its latest push for advancement, OpenAI is sharing two documents on red teaming: a white paper detailing external engagement strategies and a research study introducing a novel method for automated red teaming. These contributions aim to strengthen the process and outcomes of red teaming, ultimately leading to safer and more responsible AI implementations.

As AI continues to evolve, understanding user experiences and identifying risks such as abuse and misuse are crucial for researchers and developers. Red teaming provides a proactive method for evaluating these risks, especially when supplemented by insights from a range of independent external experts. This approach not only helps establish benchmarks but also facilitates the enhancement of safety evaluations over time.

The human touch

In its white paper, “OpenAI’s Approach to External Red Teaming for AI Models and Systems,” OpenAI sets out four fundamental steps for designing effective red teaming campaigns:

  1. Composition of red teams: The selection of team members is based on the objectives of the campaign. This often involves individuals with diverse perspectives, such as expertise in natural sciences, cybersecurity, and regional politics, ensuring assessments cover the necessary breadth.
  2. Access to model versions: Clarifying which versions of a model red teamers will access can influence the outcomes. Early-stage models may reveal inherent risks, while more developed versions can help identify gaps in planned safety mitigations.
  3. Guidance and documentation: Effective interactions during campaigns rely on clear instructions, suitable interfaces, and structured documentation. This involves describing the models, existing safeguards, testing interfaces, and guidelines for recording results.
  4. Data synthesis and evaluation: Post-campaign, the data is assessed to determine if examples align with existing policies or require new behavioural modifications. The assessed data then informs repeatable evaluations for future updates.
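
As a rough illustration of step 4, the sketch below sorts recorded findings into those that fall under an existing policy category and those that may need a new policy decision, then turns the covered ones into reusable evaluation cases. It is a minimal sketch under stated assumptions, not something taken from the white paper: the `Finding` record, the `triage` and `to_eval_cases` helpers, and the category labels are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """One red teaming result (hypothetical record layout)."""
    prompt: str     # what the red teamer sent to the model
    response: str   # what the model returned
    category: str   # the red teamer's own label, e.g. "cybersecurity"


def triage(findings: list[Finding], policy_categories: set[str]):
    """Split findings into those an existing policy covers and the rest."""
    covered, needs_review = [], []
    for finding in findings:
        if finding.category in policy_categories:
            covered.append(finding)
        else:
            needs_review.append(finding)
    return covered, needs_review


def to_eval_cases(findings: list[Finding]) -> list[dict]:
    """Keep only the fields needed to re-run a finding against a later model."""
    return [{"prompt": f.prompt, "category": f.category} for f in findings]
```

In this sketch, findings that land in `needs_review` would go to people for a policy decision, while the output of `to_eval_cases` could be replayed against future model versions.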

A recent application of this methodology involved preparing the OpenAI o1 family of models for public use—testing their resistance to potential misuse and evaluating their application across various fields such as real-world attack planning, natural sciences, and AI research.

Automated red teaming

Automated red teaming seeks to identify instances where AI may fail, particularly regarding safety-related issues. This method excels at scale, generating numerous examples of potential errors quickly. However, traditional automated approaches have struggled with producing diverse, successful attack strategies.

OpenAI’s research introduces “Diverse And Effective Red Teaming With Auto-Generated Rewards And Multi-Step Reinforcement Learning,” a method which encourages greater diversity in attack strategies while maintaining effectiveness.

This method involves using AI to generate different scenarios, such as illicit advice, and training red teaming models to evaluate these scenarios critically. The process rewards diversity and efficacy, promoting more varied and comprehensive safety evaluations.
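
To make the idea of rewarding both diversity and efficacy concrete, here is a minimal sketch of one way such a reward could be combined. It is an assumption-laden toy, not the method from OpenAI's research: the word-count similarity measure, the function names, and the `diversity_weight` parameter are all hypothetical, and the `success` score is assumed to come from some separate judge of whether an attack worked.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Turn a prompt into a lowercase word-count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def diversity_bonus(candidate: str, previous: list[str]) -> float:
    """Near 1.0 for an attack unlike earlier ones, near 0.0 for a repeat."""
    if not previous:
        return 1.0
    vec = bag_of_words(candidate)
    return 1.0 - max(cosine_similarity(vec, bag_of_words(p)) for p in previous)


def combined_reward(success: float, candidate: str, previous: list[str],
                    diversity_weight: float = 0.5) -> float:
    """Blend an effectiveness score in [0, 1] with the diversity bonus."""
    bonus = diversity_bonus(candidate, previous)
    return (1.0 - diversity_weight) * success + diversity_weight * bonus
```

Under this toy scoring, an attack that succeeds but repeats an earlier prompt word for word earns roughly half the reward of an equally successful attack that shares no words with earlier ones, which is the kind of pressure toward variety the paragraph above describes.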

Despite its benefits, red teaming does have limitations. It captures risks only at a specific point in time, and those risks may change as AI models develop. Additionally, the red teaming process can inadvertently create information hazards, potentially alerting malicious actors to vulnerabilities that are not yet widely known. Managing these risks requires stringent protocols and responsible disclosure.

While red teaming continues to be pivotal in risk discovery and evaluation, OpenAI acknowledges the necessity of incorporating broader public perspectives on AI’s ideal behaviours and policies to ensure the technology aligns with societal values and expectations.

Source: AI NEWS
