Category: AI Related

  • ChatGPT is bringing back GPT-4o!

    OpenAI is bringing back GPT-4o as an option for ChatGPT Plus users after users expressed strong dissatisfaction with its removal and the transition to GPT-5. GPT-4o will no longer be the default model, but paid users can choose to continue using it. OpenAI CEO Sam Altman confirmed this reinstatement on social media, acknowledging the user feedback and stating they will monitor usage to decide how long to keep legacy models available.

    GPT-4o is a multimodal AI model capable of handling text, audio, and images with faster responses (twice as fast as GPT-4 Turbo), enhanced language support (over 50 languages), and advanced multimodal interaction features, including real-time voice and image understanding and generation. Users appreciated GPT-4o for its personable, nuanced, and emotionally supportive responses, which some found missing in GPT-5.

    The return of GPT-4o responds to a significant user backlash expressed in communities like Reddit, where users described losing GPT-4o as “losing a close friend,” highlighting its unique voice and interaction style compared to GPT-5. OpenAI had initially removed the model selection feature in ChatGPT, replacing older versions directly with GPT-5, which caused confusion and dissatisfaction. Now, legacy models like GPT-4o will remain accessible for a time, allowing users to switch between GPT-5 and older versions based on preference and task requirements.

    Read Sam Altman’s post on X

  • Graph RAG vs. Naive RAG vs. a hybrid of both

    Retrieval Augmented Generation (RAG) is a widely adopted technique that enhances large language models (LLMs) by retrieving relevant information from a specific dataset before generating a response. While traditional or “Naive RAG” relies on vector (semantic) search to find contextually similar text chunks, it treats each data point as independent and does not capture deeper relationships between entities. This limitation becomes apparent when working with interconnected data, such as contracts, organizational records, or research papers, where understanding relationships is crucial. To address this, Graph RAG has emerged as a powerful extension that leverages knowledge graphs to improve retrieval quality by incorporating structural and relational context.
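    The "independent chunks" limitation of Naive RAG is easiest to see in code. Below is a minimal sketch of vector retrieval using toy hand-written embeddings (in practice these would come from an embedding model, and the store would be a vector database): each chunk is scored against the query in isolation, with no notion of how chunks relate to each other.

    ```python
    import math

    # Toy embeddings stand in for a real embedding model; in practice these
    # would come from a sentence-transformer or an embeddings API.
    CHUNKS = {
        "Company X signed a supply contract with Company Y.": [0.9, 0.1, 0.2],
        "The weather in Berlin was mild in October.":          [0.1, 0.8, 0.3],
        "Company Y renegotiated its terms with Company X.":    [0.8, 0.2, 0.3],
    }

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    def naive_rag_retrieve(query_vec, k=2):
        """Return the top-k chunks by cosine similarity; each chunk is
        scored independently, so no relationships between chunks are used."""
        ranked = sorted(CHUNKS.items(),
                        key=lambda kv: cosine(query_vec, kv[1]),
                        reverse=True)
        return [text for text, _ in ranked[:k]]

    # A query vector close to the "contract" chunks:
    print(naive_rag_retrieve([0.85, 0.15, 0.25]))
    ```

    The retriever correctly surfaces both contract chunks, but nothing in this pipeline knows that Company X and Company Y are the *same entities* across chunks — exactly the gap Graph RAG fills.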

    Graph RAG, particularly Microsoft’s implementation, uses LLMs to extract entities (e.g., people, organizations, locations) and their relationships from raw text in a two-stage process. First, entities and relations are identified and stored in a knowledge graph. Then, these are summarized and organized into communities—clusters of densely connected nodes—using graph algorithms like Leiden. This enables the system to generate high-level, domain-specific summaries of entity groups, providing a more holistic view of complex, fragmented information.

    A key advantage of Graph RAG over Naive RAG is its ability to perform entity-centric retrieval. Instead of retrieving isolated text chunks, it navigates the graph to find related entities, their attributes, and community-level insights. This is especially effective for detailed, entity-focused queries, such as “What are the business relationships of Company X?” or “Which individuals are linked to Project Y?”

    The blog illustrates this with a hybrid approach combining Weaviate (a vector database) and Neo4j (a graph database). In this setup, a user query first triggers a semantic search in Weaviate to identify relevant entities. Their IDs are then used to traverse the Neo4j knowledge graph, uncovering connections, community summaries, and contextual text chunks. A Cypher query orchestrates this multi-source retrieval, merging entity descriptions, relationships, and source content into a comprehensive response.
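    The two-stage flow described above can be sketched as follows, with plain dicts standing in for Weaviate (vector store) and Neo4j (graph store). All names and the schema here are illustrative assumptions, not the blog's actual implementation; a real deployment would issue a Weaviate query followed by a Cypher traversal.

    ```python
    # Step 1 stand-in: semantic search returns IDs of matching entities
    # (in the real setup this is a Weaviate vector query).
    def vector_search(query):
        index = {"weaviate": ["e1"]}  # query term -> entity IDs (toy index)
        return index.get(query.lower(), [])

    # Step 2 stand-in: a tiny "knowledge graph" keyed by entity ID
    # (in the real setup this is a Neo4j graph traversed via Cypher).
    GRAPH = {
        "e1": {
            "name": "Weaviate",
            "relationships": [("PARTNER_OF", "e2")],
            "community_summary": "Vector-database vendor with several partnerships.",
            "chunks": ["Weaviate B.V. is registered in the Netherlands."],
        },
        "e2": {"name": "Company Y", "relationships": [],
               "community_summary": "", "chunks": []},
    }

    def graph_rag_retrieve(query):
        """Hybrid retrieval: vector search first, then graph traversal
        from the hit entities, merging summaries, chunks, and relations."""
        context = []
        for eid in vector_search(query):
            node = GRAPH[eid]
            context.append(node["community_summary"])
            context.extend(node["chunks"])
            for rel, other in node["relationships"]:
                context.append(f"{node['name']} -{rel}-> {GRAPH[other]['name']}")
        return context

    print(graph_rag_retrieve("Weaviate"))
    ```

    The payoff is in the last loop: the retrieved context includes relationship edges and community summaries, not just the chunks that happened to match the query semantically.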

    For example, querying “Weaviate” returns not just isolated mentions but a synthesized answer detailing its legal status, locations, business activities, and partnerships—information pieced together from multiple contracts and relationships in the graph.

    Despite its strengths, Graph RAG has limitations. The preprocessing pipeline is computationally expensive and requires full reindexing to update summaries when new data arrives, unlike Naive RAG, which can incrementally add new chunks. Scalability can also be challenging with highly connected nodes, and generic entities (e.g., “CEO”) may skew results if not filtered.

    In summary, while Naive RAG is effective for straightforward, content-based queries, Graph RAG excels in complex, relationship-rich domains. By combining vector and graph-based retrieval in a hybrid system, organizations can achieve deeper insights, leveraging both semantic meaning and structural intelligence. The choice between RAG methods ultimately depends on the nature of the data and the complexity of the questions being asked.

    Source link

  • A new active learning method from Google for curating high-quality data that reduces the training data required to fine-tune LLMs by orders of magnitude

    Google researchers Markus Krause and Nancy Chang present a novel active learning approach that reduces the training data required to fine-tune large language models (LLMs) by up to 10,000 times (four orders of magnitude), while significantly improving model alignment with human experts. This breakthrough addresses the challenge of curating high-quality, high-fidelity training data for complex tasks like identifying unsafe ad content—such as clickbait—where contextual understanding and policy interpretation are critical.

    Fine-tuning LLMs traditionally demands vast labeled datasets, which are costly and time-consuming to produce, especially when policies evolve or new content types emerge (concept drift). Standard methods using crowdsourced labels often lack the nuance required for safety-critical domains, leading to suboptimal model performance. To overcome this, Google developed a scalable curation process that prioritizes the most informative and diverse training examples, minimizing data needs while maximizing model alignment with domain experts.

    The method begins with a zero- or few-shot LLM (LLM-0) that preliminarily labels a large set of ads as either clickbait or benign. Due to the rarity of policy-violating content, the dataset is highly imbalanced. The labeled examples are then clustered separately by predicted label. Overlapping clusters—where similar examples receive different labels—highlight regions of model uncertainty along the decision boundary. From these overlapping clusters, the system identifies pairs of similar examples with differing labels and sends them to human experts for high-fidelity annotation. To manage annotation costs, priority is given to pairs that span broader regions of the data space, ensuring diversity.
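    The pair-selection step above can be illustrated with a small sketch. For brevity this skips the clustering stage and goes straight to the underlying idea: pair each example with its nearest differently-labeled neighbor, so that the closest pairs — the most ambiguous regions of the decision boundary — are sent to experts first. The data and distance metric are toy assumptions; the real pipeline operates on ad embeddings.

    ```python
    import math

    # Toy feature vectors with LLM-0's preliminary labels.
    examples = [
        ((0.0, 0.0), "benign"),
        ((0.1, 0.1), "clickbait"),   # very close to a benign point: uncertain region
        ((5.0, 5.0), "clickbait"),
        ((5.1, 4.9), "benign"),      # another uncertain pair, far from the first
        ((9.0, 0.0), "benign"),      # no nearby clickbait: model is confident here
    ]

    def boundary_pairs(items):
        """Pair each example with its nearest differently-labeled neighbor,
        sorted by distance: the closest pairs sit on the decision boundary
        and are the best candidates for expert annotation."""
        pairs = []
        for i, (p, label_p) in enumerate(items):
            rivals = [(math.dist(p, q), i, j)
                      for j, (q, label_q) in enumerate(items)
                      if label_q != label_p]
            pairs.append(min(rivals))
        # Deduplicate (i, j) vs (j, i) and sort by ambiguity (distance).
        seen, out = set(), []
        for d, i, j in sorted(pairs):
            key = frozenset((i, j))
            if key not in seen:
                seen.add(key)
                out.append((i, j, round(d, 3)))
        return out

    print(boundary_pairs(examples))
    ```

    The two tight cross-label pairs surface at the top of the list, while the confidently benign outlier only appears with a large distance — mirroring how the real system prioritizes annotation budget, with the additional diversity criterion favoring pairs that span different regions of the data space.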

    These expert-labeled examples are split into two sets: one for fine-tuning the next iteration of the model, and another for evaluating model–human alignment. The process iterates, with each new model version improving its ability to distinguish subtle differences in content. Iterations continue until model–human alignment plateaus or matches internal expert agreement.

    Crucially, the approach does not rely on traditional metrics like precision or recall, which assume a single “ground truth.” Instead, it uses Cohen’s Kappa, a statistical measure of inter-annotator agreement that accounts for chance. Kappa values above 0.8 indicate exceptional alignment, and this serves as both a data quality benchmark and a performance metric.
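    Cohen's Kappa is simple to compute: it is the observed agreement between two annotators, corrected for the agreement they would reach by chance given each annotator's label frequencies. A minimal implementation with illustrative labels:

    ```python
    from collections import Counter

    def cohens_kappa(labels_a, labels_b):
        """Cohen's kappa: (observed - expected) / (1 - expected), where
        'expected' is the chance agreement implied by each annotator's
        marginal label distribution."""
        n = len(labels_a)
        observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
        count_a, count_b = Counter(labels_a), Counter(labels_b)
        # Chance agreement: probability both pick the same label independently.
        expected = sum((count_a[c] / n) * (count_b[c] / n) for c in count_a)
        return (observed - expected) / (1 - expected)

    # A model and an expert labeling ten ads; they disagree on one.
    model  = ["clickbait", "benign", "benign", "clickbait", "benign",
              "benign", "clickbait", "benign", "benign", "benign"]
    expert = ["clickbait", "benign", "benign", "clickbait", "benign",
              "benign", "benign", "benign", "benign", "benign"]
    print(round(cohens_kappa(model, expert), 3))  # 0.737
    ```

    Note that 90% raw agreement yields a kappa of only ~0.74 here, because the "benign" class dominates and drives up chance agreement — which is exactly why kappa, not raw accuracy, is the right yardstick for imbalanced safety data.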

    Experiments compared models trained on ~100,000 crowdsourced labels (baseline) versus those trained on expert-curated data using the new method. Two LLMs—Gemini Nano-1 (1.8B parameters) and Nano-2 (3.25B)—were tested on tasks of varying complexity. While smaller models showed limited gains, the 3.25B model achieved a 55–65% improvement in Kappa alignment using only 250–450 expert-labeled examples—three orders of magnitude fewer than the baseline. In production with larger models, reductions reached 10,000x.

    The results demonstrate that high-fidelity labeling, combined with intelligent data curation, allows models to achieve superior performance with minimal data. This is especially valuable for dynamic domains like ad safety, where rapid retraining is essential. The method effectively combines the broad coverage of LLMs with the precision of human experts, offering a path to overcome the data bottleneck in LLM fine-tuning.

    Source link

  • Claude Opus 4.1, an upgrade to Claude Opus 4 on agentic tasks, real-world coding, and reasoning

    Claude Opus 4.1 is an upgrade to Claude Opus 4 that significantly enhances performance on agentic tasks, real-world coding, and complex reasoning. It features a large 200,000-token context window, improved long-term memory support, and advanced capabilities in multi-file code refactoring, debugging, and sustained reasoning over long problem-solving sequences. The model scores 74.5% on the SWE-bench Verified software-engineering benchmark, outperforming models such as GPT-4.1 and GPT-4o, and demonstrates strong autonomy and precision in tasks such as agentic search, multi-step task management, and detailed data analysis.

    Claude Opus 4.1 offers hybrid reasoning allowing both instant and extended step-by-step thinking with user-controllable “thinking budgets” to optimize cost and performance. Key improvements include better memory and context management, more stable tool usage, lower latency, stronger coherence over long conversations, and enhanced ability to adapt to coding style. It supports up to 32,000 output tokens, making it suitable for complex, large-scale coding projects and enterprise autonomous workflows.

    Use cases span AI agents managing multi-channel tasks, advanced coding with deep codebase understanding, agentic search synthesizing insights from vast data sources, and high-quality content creation with rich prose and character. It is available to paid Claude users, in Claude Code, and via API on platforms like Amazon Bedrock and Google Cloud Vertex AI with pricing consistent with Opus 4.

    Organizations such as GitHub have noted its improved multi-file refactoring, Rakuten appreciates its precise debugging without unnecessary changes, and Windsurf reports a one standard deviation performance gain over Opus 4 for junior developer tasks. The upgrade embodies a focused refinement on reliability, contextual reasoning, and autonomy, making it particularly valuable for advanced engineering, AI agent deployment, and research workflows.

  • Microsoft rolls out GPT-5 across entire Copilot ecosystem

    As of August 7, 2025, Microsoft has officially launched GPT-5 integration into Microsoft 365 Copilot, marking a significant advancement in AI-powered productivity tools for businesses and individuals. This update represents a major leap forward in how users interact with everyday applications such as Word, Excel, PowerPoint, Outlook, and Teams, making workflows smarter, faster, and more intuitive.

    GPT-5, the latest iteration of OpenAI’s advanced language model, is now deeply embedded within Microsoft 365 Copilot, enhancing its ability to understand complex user requests, generate high-quality content, and perform multi-step tasks with greater accuracy and contextual awareness. Unlike earlier versions, GPT-5 demonstrates improved reasoning, fewer hallucinations, and a deeper understanding of organizational data when used within the secure boundaries of Microsoft’s cloud infrastructure.

    One of the key benefits of this integration is the ability for Copilot to act as a proactive assistant across the Microsoft 365 suite. In Word, users can generate well-structured drafts from simple prompts, revise tone and style, and even pull in relevant data from other documents or emails. In Excel, GPT-5 enables natural language queries to analyze data, create formulas, and suggest visualizations—democratizing data analysis for non-technical users. PowerPoint users benefit from AI-generated storyboards and slide content based on outlines or documents, significantly reducing presentation preparation time.

    Outlook and Teams see transformative upgrades as well. Copilot can now summarize lengthy email threads, draft context-aware replies, and prioritize action items—all powered by GPT-5’s enhanced comprehension. In Teams meetings, real-time transcription, intelligent note-taking, and post-meeting action item generation are now more accurate and insightful, helping teams stay aligned and productive.

    Security and compliance remain central to this rollout. Microsoft emphasizes that GPT-5 in Copilot operates within its trusted cloud environment, ensuring that organizational data is not used to train the underlying AI models. Enterprises retain full control over data access, and all interactions are subject to existing compliance policies, including GDPR, HIPAA, and other regulatory standards.

    Microsoft also highlights new customization options for IT administrators, allowing organizations to tailor Copilot’s behavior based on role, department, or business process. This ensures that AI assistance remains relevant and aligned with company workflows. Additionally, developers can now extend Copilot’s capabilities using the Microsoft 365 Copilot Extensibility Framework, integrating internal apps and data sources securely.

    User adoption is supported by intuitive design and seamless integration—users don’t need to learn new interfaces. The AI works behind the scenes, activated through familiar commands in the ribbon or via natural language prompts in the Copilot sidebar.

    Microsoft positions this GPT-5-powered Copilot as a cornerstone of the future of work, enabling users to focus on creativity, decision-making, and collaboration by automating routine tasks. Early adopters report significant gains in productivity, with some teams reducing document creation time by up to 50% and improving email response efficiency by 40%.

    Why It Matters:

    The launch of GPT-5 in Microsoft 365 Copilot represents a pivotal moment in enterprise AI, combining cutting-edge generative AI with robust security and deep application integration. As AI becomes an everyday collaborator, Microsoft aims to lead the shift toward intelligent productivity: reducing repetitive tasks and empowering users to focus on creative, strategic work.

  • OpenAI is releasing GPT-5, its new flagship model, to all of its ChatGPT users and developers

    GPT-5 has officially been released as of early August 2025 and is now available to all ChatGPT users, including free, Plus, Pro, and Team accounts. OpenAI announced GPT-5 on August 7, 2025, marking it as their most advanced AI model to date, with significant improvements in intelligence, speed, reasoning, and safety over GPT-4 and earlier versions.

    Here are the key details about GPT-5:

    • Expert-level intelligence across domains such as math, coding, science, and health.
    • Reduced hallucinations and improved truthful, honest answers.
    • A reasoning model (“GPT-5 thinking”) for harder problems, automatically invoked or selectable by users.
    • Real-time model routing for efficiency and quality of responses.
    • Enhanced capabilities for creative writing and complex software generation.
    • Integrated safety mechanisms including safe completions to balance helpfulness with risk.
    • Accessibility to all ChatGPT users, including free tier, Plus, Pro, and Teams, with extended capabilities for paid users.
    • Availability in developer tools like GitHub Copilot and Azure AI.

    GPT-5 essentially replaces previous ChatGPT models and represents a significant upgrade in real-world use, combining speed, accuracy, and safety for a wide range of users and applications.

  • Are Google’s AI Features a Threat to Web Traffic? Critics say they undermine SEO strategies and threaten online journalism

    Google Search chief Liz Reid defended Google’s AI features in a blog post by stating that total organic click volume from Google Search to websites remains “relatively stable” year-over-year, and claimed that AI is driving more searches and higher quality clicks to websites. She argued that AI Overviews are a natural evolution of previous Google search features and emphasized the increase in overall search queries and stable or slightly improved click quality, defined as clicks where users do not quickly return to search results.

    However, this defense contrasts with multiple independent studies and reports that show significant reductions in website traffic due to Google’s AI Overviews. Research by the Pew Research Center and others indicates that AI summaries reduce click-through rates by nearly half in some cases, decreasing external site visits from about 15% to around 8% on searches with AI Overviews. Studies from SEO firms such as Ahrefs, Amsive, and BrightEdge find click rate declines ranging from roughly 30% to over 50% depending on the query type, especially for informational, non-branded keywords. The rise of “zero-click” searches—where users get answers directly from AI summaries without visiting any site—has been noted as a major factor, with estimates that around 60% of Google searches now fall into this category. This trend has caused concern among publishers and SEO experts who report significant traffic drops and threats to online content monetization.

    Google disputes the methodologies and conclusions of these external studies, arguing that their internal data shows overall stable or slightly improved click volumes and that some sites are indeed losing traffic while others gain. However, Google has not publicly released detailed data to substantiate all of these claims, leading to ongoing debate about the true impact of AI features on web traffic.

    While Liz Reid asserts that total organic clicks remain stable year-over-year despite AI integration, independent research and publisher reports overwhelmingly show that Google’s AI features—particularly AI Overviews—have caused significant reductions in website traffic and click-through rates, especially for informational and non-branded queries.

  • OpenAI GPT OSS, the new open-source model designed for efficient on-device use and local inference

    OpenAI has released an open-weight model called gpt-oss-20b, a medium-sized model with about 21 billion parameters designed for efficient on-device use and local inference. It operates with a Mixture-of-Experts (MoE) architecture, having 32 experts but activating 4 per token, resulting in 3.6 billion active parameters during each forward pass. This design grants strong reasoning and tool-use capabilities with relatively low memory requirements — it can run on systems with as little as 16GB of RAM. The model supports up to 128k tokens of context length, enabling it to handle very long inputs.

    “gpt-oss-20b” achieves performance comparable to OpenAI’s o3-mini model across common benchmarks, including reasoning, coding, and function calling tasks. It leverages modern architectural features such as Pre-LayerNorm for training stability, Gated SwiGLU activations, and Grouped Query Attention for faster inference. This model is intended to provide strong real-world performance while being accessible for consumer hardware deployments. Both gpt-oss-20b and the larger gpt-oss-120b (117B parameters) are released under the Apache 2.0 license, aiming to foster transparency, accessibility, and efficient usage by developers and researchers.
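    The gap between 21B total and 3.6B active parameters comes from top-k expert routing. The toy sketch below illustrates the routing pattern (32 experts, top-4 per token) with trivial scalar "experts"; the router weights and expert functions are made-up stand-ins, since real experts are full feed-forward networks and the router is a learned linear layer.

    ```python
    import math
    import random

    random.seed(0)
    NUM_EXPERTS, TOP_K = 32, 4
    # Toy router: one 3-dim weight vector per expert (normally a learned
    # linear layer over the token's hidden state).
    ROUTER = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(NUM_EXPERTS)]

    def softmax(xs):
        m = max(xs)
        exps = [math.exp(x - m) for x in xs]
        total = sum(exps)
        return [e / total for e in exps]

    def moe_forward(token_vec):
        """Route one token to its top-4 experts and mix their outputs,
        weighted by renormalized router probabilities. Only 4 of the 32
        experts run, which is why active parameters << total parameters."""
        scores = [sum(w * x for w, x in zip(ws, token_vec)) for ws in ROUTER]
        top = sorted(range(NUM_EXPERTS), key=scores.__getitem__, reverse=True)[:TOP_K]
        weights = softmax([scores[i] for i in top])
        # Stand-in expert computation: expert i just scales the input sum;
        # real experts are full feed-forward networks.
        out = sum(w * (i + 1) * sum(token_vec) for w, i in zip(weights, top))
        return out, top

    output, active = moe_forward([0.2, -0.5, 0.9])
    print(f"active experts: {sorted(active)} out of {NUM_EXPERTS}")
    ```

    Per token, only the selected experts' weights participate in the forward pass, so memory bandwidth and compute scale with the 3.6B active parameters rather than the full 21B.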

    In summary:

    • Parameters: ~21 billion total, 3.6 billion active per token
    • Experts: 32 total, 4 active per token (Mixture-of-Experts)
    • Context length: 128k tokens
    • Runs with as little as 16GB memory
    • Performance matches o3-mini benchmarks, strong at coding, reasoning, few-shot function calling
    • Released open-weight under Apache 2.0 license for broad developer access

    This model is a step toward powerful reasoning AI that is more accessible and runs efficiently on local or edge devices. Follow the link

  • Google DeepMind just launched Genie 3 that can generate detailed, interactive 3D environments from a simple text prompt or image

    Google DeepMind has just launched Genie 3, an advanced AI “world model” that can generate detailed, interactive 3D environments from a simple text prompt or image. Unlike its predecessor Genie 2, Genie 3 allows real-time exploration and modification of these worlds. Users can change objects, weather, or add characters dynamically—referred to as “promptable world events.” The environments maintain visual consistency over time, remembering the placement of objects for up to about a minute, and run at 720p resolution and 24 frames per second.

    Genie 3 is positioned as a significant step toward artificial general intelligence (AGI) by providing complex, realistic interactive worlds that can train AI agents. This model does not rely on hard-coded physics but learns how the world works by remembering and reasoning about what it generates. It supports longer interactions than Genie 2—several minutes versus just 10-20 seconds—and enables AI agents and humans to move around and interact in these simulated worlds in real time.

    Google DeepMind is currently releasing Genie 3 as a limited research preview to select academics and creators to study its risks and safety before wider access. It is not yet publicly available for general use. It is a breakthrough world model that creates immersive, interactive 3D environments useful both for gaming-type experiences and advancing AI research toward human-level intelligence.

    Genie 3’s key technical differences that enable it to modify worlds dynamically on the fly include several innovations over previous models:

    1. Frame-by-frame Real-time Generation at 24 FPS and 720p resolution: Genie 3 generates the environment live and continuously, allowing seamless, game-like interaction that feels immediate and natural.
    2. Persistent World Memory: The model retains a “long-term visual memory” of the environment for several minutes, enabling the world to keep consistent state and the effects of user actions (e.g., painted walls stay painted even after moving away and returning) without re-generating from scratch.
    3. Promptable World Events: Genie 3 supports dynamic insertion and alteration of elements in the generated world during real-time interaction via text prompts—for example, adding characters, changing weather, or introducing new objects on the fly. This is a major advancement over earlier systems that required pre-generated or less flexible environments.
    4. More Sophisticated Physical and Ecological Modeling: The system models environments with realistic physical behaviors like water flow, lighting changes, and ecological dynamics, allowing more natural interactions and consistent environment evolution.
    5. Real-time Response to User Actions: Unlike Genie 2, which processed user inputs with lag and limited real-time interaction, Genie 3 swiftly integrates user controls and environmental modifications frame by frame, resulting in highly responsive navigation and modification capabilities.
    6. Underlying Architecture Improvements: While details are proprietary, Genie 3 leverages advances from over a decade of DeepMind’s research in simulated environments and world models, emphasizing multi-layered memory systems and inference mechanisms to maintain coherence and enable prompt-grounded modification of the simulation in real time.

    Together, these technologies allow Genie 3 to generate, sustain, and modify richly detailed simulated worlds interactively, making it suitable for both immersive gaming experiences and as a robust platform for training advanced AI agents in complex, dynamic scenarios.

  • Apple’s “Answers, Knowledge and Information” (AKI) team is developing a stripped down ChatGPT experience

    Apple has formed a new internal team called “Answers, Knowledge and Information” (AKI) that is developing a stripped-down ChatGPT-like AI experience. This team is building an “answer engine” designed to crawl the web and respond to general-knowledge questions, effectively creating a lightweight competitor to ChatGPT. The goal is to integrate this AI-powered search capability into Apple’s products such as Siri, Spotlight, and Safari, and also potentially as a standalone app.

    This marks a shift from Apple’s previous approach, where Siri relied on third-party AI like ChatGPT via partnerships, resulting in a somewhat fragmented user experience. The new Apple-built system aims to deliver more direct, accurate answers rather than defaulting frequently to Google Search results, improving usability especially on devices without screens like HomePod. The team, led by senior director Robby Walker, is actively hiring engineers experienced in search algorithms and engine development to accelerate this project.

    Apple CEO Tim Cook has emphasized the importance of AI, considering it a major technological revolution comparable to the internet and smartphones, and is backing the investment in this AI initiative accordingly. While the project is still in early stages, it represents Apple’s growing commitment to developing its own conversational AI and search capabilities rather than relying heavily on external partnerships.

    Apple’s “Answers” team is creating a streamlined ChatGPT rival focused on delivering web-based, AI-driven answers within Apple’s ecosystem, intending to enhance Siri and other services with conversational AI search.