
New AI Defense Detects Hidden Malicious Intent in Conversations
Researchers have developed a method to spot harmful intentions hidden in multi-turn AI conversations. This helps prevent AI models from being tricked into harmful behavior over time.
All AI stories, newest first.

Researchers created a dataset to teach AI when to speak in group chats, preventing interruptions. This could make AI assistants more useful in meetings and group discussions.

Mozilla has discovered 271 vulnerabilities in Firefox using AI, with almost no false positives. This marks a major shift toward AI-assisted bug discovery in software development.

Mira Murati’s deposition in the Musk v. Altman trial revealed new details about Sam Altman’s abrupt removal as OpenAI CEO. The testimony shed light on internal conflicts and communications within the AI company.

Google has removed privacy promises about its on-device AI, admitting data may now leave your device. This affects users who relied on these assurances for secure personal data processing.

A new open-source AI model is designed to run locally and help detect cyber threats. It's small, specialized, and built for everyday users to protect their devices.

Moonshot AI, a Chinese AI company, has raised $2 billion at a $20 billion valuation. This reflects the growing global demand for open-source AI solutions. The company's annual revenue topped $200 million in April, driven by paid subscriptions and API usage.

OpenAI has introduced a Trusted Contact feature in ChatGPT to alert a chosen friend or family member if the AI detects serious self-harm concerns. This optional safety tool aims to provide an extra layer of support for users.

Researchers found that AI models can make worse predictions when given accurate context. This happens because the models sometimes ignore good information. The study highlights a hidden flaw in how AI systems process data.

New research shows that AI models handle negative emotions in early stages and positive ones later. This could help make AI responses more emotionally balanced and nuanced.

Researchers developed AdaGATE, a new method to help AI answer complex questions that require multiple steps. It improves accuracy by selecting the most relevant information and filling in gaps automatically.

Fine-tuning AI models on even small amounts of harmless data can erase safety measures learned from much larger datasets. Researchers have identified a key mechanism behind this safety degradation, offering a way to predict and prevent it.

Current AI safety tests focus on models in isolation, but a new study warns this doesn't prove real-world safety. The research argues we need to test AI in actual use cases, not just lab settings.

Researchers have developed a method called SWAN that embeds hidden watermarks in the meaning of sentences, not just the words. This could help track AI-generated text more effectively than current methods.

A new AI system called SensingAgents improves activity tracking using wearable sensors. It overcomes common challenges in recognizing daily movements like walking or running. The system could make fitness trackers and health monitoring devices more accurate and reliable.

Researchers have developed a new AI assistant called Pro²Assist that can proactively help with multi-step tasks, like cooking or assembling furniture. Unlike current assistants, it tracks your progress and predicts what you'll need next, making it more helpful for complex activities.

Parloa uses OpenAI's models to build voice-driven AI customer service agents. These agents can handle real-time interactions, making customer service more efficient and pleasant.

OpenAI has added new voice models to its API that can understand, translate, and transcribe speech in real time. This makes voice interactions more natural and intelligent for everyday users.

Researchers have developed a new reinforcement learning technique called Adaptive Power-Mean Policy Optimization (APMPO) that improves how AI models reason. This method adapts to the evolving capabilities of large language models, making them more effective at problem-solving.

Researchers have developed ANDRE, a new AI system that extracts logical rules from data more effectively than previous methods. This could make AI systems more interpretable and reliable in real-world, uncertain situations.