Archive

All AI stories, newest first.

general · Apr 17, 2026

Google Unveils Stitch: A New AI-Powered Design Tool

Google has launched Stitch, an AI tool designed to streamline the creative process for designers. It offers features intended to make design workflows faster and more productive.

via Hacker News AI · #ai #design #google

research · Apr 17, 2026

GFT: Bridging Imitation and Reward Fine-Tuning for Better LLMs

Researchers propose Group Fine-Tuning (GFT), a method that combines imitation and reward learning to improve LLM training. GFT addresses key challenges like single-path dependency and gradient instability.

via arXiv cs.AI · #llm #training #fine-tuning

research · Apr 17, 2026

Geometric Routing Unlocks Causal Control in Mixture of Experts Models

Researchers demonstrate that individual experts in sparse Mixture-of-Experts (MoE) models have causally meaningful identities. This discovery enables more precise control over model behavior through geometric routing techniques.

via arXiv cs.AI · #mixture-of-experts #causal-control #geometric-routing

industry · Apr 17, 2026

Gemini Now Uses Google Photos to Generate Personalized Images

Google's Gemini can now draw on a user's Google Photos library to create personalized images. The feature uses the Nano Banana 2 model to tailor the generated images to the user's own photos.

via The Verge AI · #gemini #google-photos #personalized-ai

research · Apr 17, 2026

Fun-TSG: New Tool for Generating Multivariate Time Series with Anomaly Labels

Researchers introduce Fun-TSG, a function-driven tool for generating multivariate time series with detailed anomaly labels. This addresses key limitations in current benchmark datasets for anomaly detection.

via arXiv cs.AI · #time-series #anomaly-detection #research

research · Apr 17, 2026

Credo: A New Framework for Declarative Control of LLM Pipelines

Researchers introduce Credo, a framework for declarative control of LLM pipelines using beliefs and policies. It aims to make agent behavior more transparent and adaptable than current imperative approaches.

via arXiv cs.AI · #llms #ai-agents #declarative-control

research · Apr 17, 2026

AIBuildAI: An AI Agent for Automatically Building AI Models

Researchers introduce AIBuildAI, an AI agent designed to automate the complex process of building AI models. It aims to reduce the manual effort involved in model development, which could make building AI models accessible to more people.

via arXiv cs.AI · #ai #automl #research

research · Apr 16, 2026

Weight Patching: A Breakthrough in Mechanistic Interpretability of LLMs

Researchers introduce Weight Patching, a new method for source-level mechanistic localization in LLMs. The approach aims to identify the specific parameters responsible for particular capabilities, advancing understanding of how these models work.

via arXiv cs.AI · #mechanistic-interpretability #llms #ai-research

research · Apr 16, 2026

WebXSkill: Bridging the Gap in Autonomous Web Agents

Researchers introduce WebXSkill, a framework that pairs executable skills with natural-language understanding to improve autonomous web agents. It targets the grounding gap in current LLM-powered agents, with the aim of improving their ability to complete complex browser tasks.

via arXiv cs.AI · #autonomous-agents #web-automation #llms

research · Apr 16, 2026

Study Reveals Numerical Instability as Root Cause of LLM Unpredictability

A new arXiv paper quantifies how floating-point precision issues in large language models lead to chaotic behavior. The research highlights the need for better numerical stability in AI systems.

via arXiv cs.AI · #llms #numerical-stability #ai-reliability

research · Apr 16, 2026

SciFi: A Safe, Lightweight Framework for Autonomous Scientific AI Workflows

Researchers introduce SciFi, a new agentic AI framework designed for safe, autonomous execution of scientific tasks. The system combines isolated environments and self-assessment mechanisms to enhance reliability in research applications.

via arXiv cs.AI · #ai #research #scientific-applications

research · Apr 16, 2026

RiskWebWorld: New Benchmark Tests GUI Agents in E-commerce Risk Management

Researchers introduce RiskWebWorld, a realistic benchmark for evaluating GUI agents in high-stakes e-commerce risk management. It features 1,513 tasks from production risk-control pipelines, addressing a gap in current benchmarks.

via arXiv cs.AI · #gui-agents #e-commerce #risk-management

research · Apr 16, 2026

ReSS Framework Combines Symbolic and Neural Models for Tabular Data Prediction

Researchers introduce ReSS, a hybrid framework that merges symbolic and neural models to improve tabular data prediction. The approach aims to enhance both accuracy and human-understandable reasoning in high-stakes domains like healthcare and finance.

via arXiv cs.AI · #tabular-data #symbolic-models #neural-models

models · Apr 16, 2026

OpenAI Unveils GPT-Rosalind for Advanced Life Sciences Research

OpenAI has introduced GPT-Rosalind, a specialized AI model designed to support drug discovery, genomics analysis, and protein reasoning. It aims to accelerate research workflows in the life sciences.

via OpenAI Blog · #ai #life-sciences #drug-discovery

research · Apr 16, 2026

New Research Highlights Uncertainty in Large Reasoning Models

A new study applies conformal prediction to quantify uncertainty in large reasoning models, addressing gaps in traditional methods. The approach produces uncertainty sets with statistical coverage guarantees, which matters for complex reasoning tasks.

via arXiv cs.AI · #large-reasoning-models #conformal-prediction #uncertainty-quantification

research · Apr 16, 2026

Measuring Exploration vs. Exploitation in Language Model Agents

Researchers have developed a method to quantify exploration and exploitation in language model agents without access to their internal policies. The approach could help improve AI decision-making in complex tasks.

via arXiv cs.AI · #language-models #ai-research #decision-making

research · Apr 16, 2026

Knowledge Density, Not Task Format, Key to Multimodal Scaling

Researchers find that the primary bottleneck in scaling multimodal large language models (MLLMs) is knowledge density in training data, not task format. Task-specific supervision like Visual Question Answering (VQA) adds little incremental semantic information beyond image captions.

via arXiv cs.CL · #mllms #scaling #knowledge-density

research · Apr 16, 2026

GeoAgentBench: New Benchmark Tests Dynamic GIS Agent Performance

Researchers introduce GeoAgentBench, a dynamic evaluation framework for LLM-based geographic information systems (GIS). It addresses gaps in static testing by assessing real-time, multimodal spatial analysis capabilities.

via arXiv cs.AI · #llms #gis #benchmarks

general · Apr 16, 2026

Fleeks: A Production-Ready Substrate for Autonomous AI Agents

Fleeks is a new infrastructure platform designed to remove bottlenecks for AI agents by letting them execute, verify, and integrate code. The platform aims to bridge the gap between generating code and using it in real-world applications.

via Hacker News AI · #ai-agents #infrastructure #automation

research · Apr 16, 2026

CONCORD: Privacy-Preserving AI Collaboration for Always-Listening Assistants

Researchers introduce CONCORD, a framework for privacy-aware AI collaboration. It enables always-listening assistants to work together while capturing only their owner's speech, addressing key privacy concerns.

via arXiv cs.AI · #ai #privacy #assistants
