Archive

All AI stories, newest first.

research | Apr 28, 2026

New Study Challenges Assumptions About Verifiable Reasoning in Language Models

A new paper introduces metrics to evaluate the effectiveness of reinforcement learning from verifiable rewards (RLVR) in language models. The study finds that reasoning chains may not always be causally important or sufficient for verifying answers.

via ArXiv cs.CL | #reinforcement-learning #language-models #verifiable-reasoning

research | Apr 28, 2026

New Research Reveals How Vision-Language Models Track Information Sources

A new study explores source-modality monitoring in vision-language models, assessing their ability to track and communicate the origin of information. The research evaluates how models bind words to specific input components across 11 different models.

via ArXiv cs.CL | #vision-language #multimodal #research

research | Apr 28, 2026

LLMs Struggle with Culture-Specific Health Misinformation on YouTube

A study reveals that LLMs trained on Western data fail to detect health misinformation in non-Western contexts, such as cow urine remedies on Indian YouTube. This highlights a critical gap in AI's ability to handle culturally nuanced content.

via ArXiv cs.CL | #llms #misinformation #health

research | Apr 28, 2026

Lightweight RAG Framework Revolutionizes Patient-Trial Matching

Researchers introduce a lightweight retrieval-augmented generation (RAG) framework to improve patient-trial matching. The approach balances scalability and efficiency with the ability to handle complex EHR data and eligibility criteria.

via ArXiv cs.CL | #healthcare #ai #clinical-trials

industry | Apr 28, 2026

Google Expands Pentagon’s AI Access After Anthropic’s Refusal

Google has signed a new contract with the Department of Defense (DoD) that expands the department's access to Google's AI, following Anthropic’s refusal to allow the DoD to use its AI for domestic mass surveillance and autonomous weapons. The deal highlights the growing tension between tech companies and government surveillance demands.

via TechCrunch AI | #google #pentagon #ai

general | Apr 28, 2026

Google and Pentagon agree on AI deal for 'any lawful' use

Google and the Pentagon have reportedly reached an agreement allowing the use of Google's AI tools for any lawful purpose. This marks a significant shift in Google's stance on military AI applications.

via Hacker News AI | #google #pentagon #ai

general | Apr 28, 2026

Five AI Agent Failures in 36 Days—None Detected by the Agents Themselves

A recent study revealed five critical AI agent failures over a 36-day period, none of which were detected by the agents themselves. This highlights significant gaps in AI self-monitoring and security protocols.

via Hacker News AI | #ai-security #ai-failures #self-monitoring

industry | Apr 28, 2026

Elon Musk Testifies: 'I Want to Save Humanity' in OpenAI Trial

Elon Musk took the stand in his high-profile trial against Sam Altman, framing himself as a visionary aiming to protect humanity. He recounted his journey from South Africa to Silicon Valley, emphasizing his long-term goals for AI.

via The Verge AI | #elon-musk #openai #ai-ethics

industry | Apr 28, 2026

DeepMind's David Silver Raises $1.1B for AI Lab Ineffable Intelligence

Ineffable Intelligence, founded by former DeepMind researcher David Silver, has secured $1.1 billion in funding. The lab aims to develop AI that learns without relying on human data.

via TechCrunch AI | #ai #deepmind #david-silver

research | Apr 28, 2026

CognitiveTwin: AI Predicts Alzheimer's Progression with Multi-Modal Data

Researchers developed CognitiveTwin, a digital twin framework that predicts individual cognitive decline in Alzheimer's disease using multi-modal data. The model aims to provide accurate, fair, and robust predictions across diverse patient demographics.

via ArXiv cs.AI | #alzheimer's #digital-twins #multi-modal

general | Apr 28, 2026

China's AI Startups Challenge Silicon Valley's Dominance

Chinese AI firms like DeepSeek, Qwen, and Moonshot are gaining ground with affordable, high-quality models. This poses a significant threat to U.S. tech giants and startups alike.

via Hacker News AI | #ai #china #silicon-valley

industry | Apr 28, 2026

Anthropic's Claude Now Integrates with Photoshop, Blender, and Ableton

Anthropic has launched connectors for Claude that integrate with popular creative software like Photoshop, Blender, and Ableton. This move underscores the company's push into the creative industry, following the recent launch of Claude Design.

via The Verge AI | #ai #creative-industry #anthropic

industry | Apr 28, 2026

Amazon Launches OpenAI Models on AWS Just One Day After Microsoft Exclusivity Ends

Amazon has added OpenAI's latest models to AWS, including a new agent service, one day after Microsoft's exclusivity ended. The speed of the rollout underscores the intense competition in the AI cloud market and shows how quickly tech giants are pivoting to capitalize on OpenAI's newly non-exclusive offerings. It also raises questions about Microsoft's long-term strategy for its AI investments.

via TechCrunch AI | #aws #openai #microsoft

general | Apr 28, 2026

AI Finds 38 Critical Flaws in Major Open-Source Medical Software

A team of AI researchers discovered 38 critical vulnerabilities in OpenEMR, software used by 100,000 healthcare providers. The findings highlight the urgent need for better security in open-source medical systems.

via Hacker News AI | #ai #healthcare #security

research | Apr 28, 2026

AgentSearchBench: New Benchmark Evaluates AI Agent Search in Real-World Scenarios

Researchers introduce AgentSearchBench, a benchmark to evaluate AI agent search capabilities in realistic, unconstrained environments. The benchmark addresses gaps in existing research by focusing on compositional and execution-dependent agent capabilities.

via ArXiv cs.AI | #ai-agents #benchmark #research

research | Apr 27, 2026

When Does LLM Self-Correction Actually Help? New Research Provides a Diagnostic

Researchers developed a control-theoretic framework to determine when iterative self-correction improves LLM performance. The study introduces a Markov model diagnostic to assess whether repeated refinement helps or hurts accuracy.

via ArXiv cs.AI | #llm #self-correction #research

research | Apr 27, 2026

SHAPE Benchmark Unifies Safety, Helpfulness, and Pedagogy for Educational LLMs

Researchers introduce SHAPE, a new benchmark to evaluate educational LLMs under adversarial conditions. The study highlights 'pedagogical jailbreaks' where students manipulate LLMs to provide answers instead of learning guidance.

via ArXiv cs.CL | #llms #education #benchmark

research | Apr 27, 2026

Researchers Define New Framework for AI's Emergent Strategic Risks

A new taxonomy identifies risks like deception and reward hacking in advanced AI systems. The framework aims to benchmark these behaviors as models grow more capable.

via ArXiv cs.AI | #ai-safety #llms #risk-taxonomy

industry | Apr 27, 2026

OpenAI Reportedly Developing AI-Powered Smartphone with Agent-Based OS

OpenAI is reportedly working on a smartphone that replaces traditional apps with AI agents. The device could enter mass production by 2028, according to an analyst.

via TechCrunch AI | #openai #smartphone #ai-agents

general | Apr 27, 2026

Open-Source Control Layer for Safe AI Production Access

Hoop introduces an open-source control layer to safely manage AI interactions with production systems. It aims to bridge the gap between AI development and real-world deployment.

via Hacker News AI | #open-source #ai-deployment #production-systems
