• Microsoft Word can now read your document overviews like podcasts

    Microsoft Word, integrated with Microsoft 365 Copilot, now offers a feature that generates audio overviews of documents that you can listen to like podcasts. This tool produces smart, summarized narrations of Word documents, PDFs, or Teams meeting recordings stored in OneDrive. Users can customize the listening experience with playback controls such as speed adjustment, skipping forward/backward, and pausing, and can save the audio to OneDrive for later listening or sharing.

    There are two styles available for the audio overviews:

    • Summary Style: A single AI voice provides a clear, quick summary of the main points.
    • Podcast Style: Two AI voices (male and female, with neutral American accents) engage in a conversational discussion about the document’s content, creating a dynamic, story-like podcast feel.

    This feature is currently available only in English and requires a Microsoft 365 Copilot license. It works on documents stored online in OneDrive or SharePoint but doesn’t support local files. Generation time is typically a few minutes, even for large documents.

    To use it, open a document in Word on Windows or the web, click the Copilot button on the Home tab, and ask the AI to generate an audio overview. The resulting audio has a media player embedded with controls, and you can switch between summary and podcast styles.

    This audio overview feature enhances productivity by allowing users to absorb key document insights hands-free, useful for multitasking or on the move.

  • ChatGPT is bringing back GPT-4o!

    OpenAI is bringing back GPT-4o as an option for ChatGPT Plus users after users expressed strong dissatisfaction with its removal and the transition to GPT-5. GPT-4o will no longer be the default model, but paid users can choose to continue using it. OpenAI CEO Sam Altman confirmed this reinstatement on social media, acknowledging the user feedback and stating they will monitor usage to decide how long to keep legacy models available.

    GPT-4o is a multimodal AI model capable of handling text, audio, and images with faster responses (twice as fast as GPT-4 Turbo), enhanced language support (over 50 languages), and advanced multimodal interaction features, including real-time voice and image understanding and generation. Users appreciated GPT-4o for its personable, nuanced, and emotionally supportive responses, which some found missing in GPT-5.

    The return of GPT-4o responds to a significant user backlash expressed in communities like Reddit, where users described losing GPT-4o as “losing a close friend,” highlighting its unique voice and interaction style compared to GPT-5. OpenAI had initially removed the model selection feature in ChatGPT, replacing older versions directly with GPT-5, which caused confusion and dissatisfaction. Now, legacy models like GPT-4o will remain accessible for a time, allowing users to switch between GPT-5 and older versions based on preference and task requirements.

    Read Sam Altman's post on X

  • Graph RAG vs Naive RAG vs hybrid of both

    Retrieval Augmented Generation (RAG) is a widely adopted technique that enhances large language models (LLMs) by retrieving relevant information from a specific dataset before generating a response. While traditional or “Naive RAG” relies on vector (semantic) search to find contextually similar text chunks, it treats each data point as independent and does not capture deeper relationships between entities. This limitation becomes apparent when working with interconnected data, such as contracts, organizational records, or research papers, where understanding relationships is crucial. To address this, Graph RAG has emerged as a powerful extension that leverages knowledge graphs to improve retrieval quality by incorporating structural and relational context.
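    To make the baseline concrete, here is a minimal sketch of the Naive RAG retrieval step, with toy 3-dimensional vectors standing in for a real embedding model. Note that each chunk is scored against the query independently; no relationship between the chunks, or the entities they mention, is considered.

```python
import numpy as np

def cosine_sim(query, chunk_matrix):
    # Cosine similarity between one query vector and each row of a chunk matrix.
    q = query / np.linalg.norm(query)
    m = chunk_matrix / np.linalg.norm(chunk_matrix, axis=1, keepdims=True)
    return m @ q

def naive_rag_retrieve(query_vec, chunk_vecs, chunks, k=2):
    """Return the top-k chunks most semantically similar to the query.

    Each chunk is scored on its own -- the independence assumption that
    Graph RAG is designed to relax.
    """
    scores = cosine_sim(query_vec, chunk_vecs)
    top = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in top]

# Toy 3-d "embeddings" standing in for a real embedding model.
chunks = ["Company X signed a contract.", "Weather was sunny.", "Company X hired staff."]
chunk_vecs = np.array([[1.0, 0.1, 0.0], [0.0, 1.0, 0.0], [0.9, 0.2, 0.1]])
query_vec = np.array([1.0, 0.0, 0.0])  # a query about Company X

print(naive_rag_retrieve(query_vec, chunk_vecs, chunks))
```

    In a production system the toy vectors would come from an embedding model and the search would run inside a vector database, but the scoring logic is the same.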

    Graph RAG, particularly Microsoft’s implementation, uses LLMs to extract entities (e.g., people, organizations, locations) and their relationships from raw text in a two-stage process. First, entities and relations are identified and stored in a knowledge graph. Then, these are summarized and organized into communities—clusters of densely connected nodes—using graph algorithms like Leiden. This enables the system to generate high-level, domain-specific summaries of entity groups, providing a more holistic view of complex, fragmented information.

    A key advantage of Graph RAG over Naive RAG is its ability to perform entity-centric retrieval. Instead of retrieving isolated text chunks, it navigates the graph to find related entities, their attributes, and community-level insights. This is especially effective for detailed, entity-focused queries, such as “What are the business relationships of Company X?” or “Which individuals are linked to Project Y?”

    The blog illustrates this with a hybrid approach combining Weaviate (a vector database) and Neo4j (a graph database). In this setup, a user query first triggers a semantic search in Weaviate to identify relevant entities. Their IDs are then used to traverse the Neo4j knowledge graph, uncovering connections, community summaries, and contextual text chunks. A Cypher query orchestrates this multi-source retrieval, merging entity descriptions, relationships, and source content into a comprehensive response.
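    The two-stage flow can be sketched with in-memory stand-ins for the two stores. The entity data below is invented for illustration; a real deployment would run a Weaviate near-text query followed by a Cypher traversal in Neo4j rather than these dictionary lookups.

```python
# Stage 1 stand-in: semantic search maps a query to candidate entity IDs.
def semantic_search(query, vector_index, k=1):
    # In Weaviate this would be a near-text / hybrid vector query; here we
    # do a trivial keyword match over stored entity descriptions.
    hits = [eid for eid, text in vector_index.items() if query.lower() in text.lower()]
    return hits[:k]

# Stage 2 stand-in: traverse the knowledge graph from the seed entities,
# collecting summaries, relationships, and related-entity context.
def graph_expand(entity_ids, graph):
    context = []
    for eid in entity_ids:
        node = graph[eid]
        context.append(node["summary"])
        for rel, target in node["relations"]:
            context.append(f"{eid} -[{rel}]-> {target}: {graph[target]['summary']}")
    return context

vector_index = {
    "weaviate": "Weaviate is a vector database vendor.",
    "neo4j": "Neo4j is a graph database vendor.",
}
graph = {
    "weaviate": {"summary": "Weaviate: vector DB vendor (toy data).",
                 "relations": [("PARTNER_OF", "neo4j")]},
    "neo4j": {"summary": "Neo4j: graph DB vendor (toy data).",
              "relations": []},
}

seeds = semantic_search("Weaviate", vector_index)
print(graph_expand(seeds, graph))
```

    The key point is the hand-off: the vector hit supplies entity IDs, and the graph walk supplies the relational context that isolated chunks cannot.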

    For example, querying “Weaviate” returns not just isolated mentions but a synthesized answer detailing its legal status, locations, business activities, and partnerships—information pieced together from multiple contracts and relationships in the graph.

    Despite its strengths, Graph RAG has limitations. The preprocessing pipeline is computationally expensive and requires full reindexing to update summaries when new data arrives, unlike Naive RAG, which can incrementally add new chunks. Scalability can also be challenging with highly connected nodes, and generic entities (e.g., “CEO”) may skew results if not filtered.

    In summary, while Naive RAG is effective for straightforward, content-based queries, Graph RAG excels in complex, relationship-rich domains. By combining vector and graph-based retrieval in a hybrid system, organizations can achieve deeper insights, leveraging both semantic meaning and structural intelligence. The choice between RAG methods ultimately depends on the nature of the data and the complexity of the questions being asked.

    Source link

  • a new active learning method from Google for curating high-quality data that reduces training data requirements for fine-tuning LLMs by orders of magnitude

    Google researchers Markus Krause and Nancy Chang present a novel active learning approach that reduces the training data required to fine-tune large language models (LLMs) by up to 10,000 times (four orders of magnitude), while significantly improving model alignment with human experts. This breakthrough addresses the challenge of curating high-quality, high-fidelity training data for complex tasks like identifying unsafe ad content—such as clickbait—where contextual understanding and policy interpretation are critical.

    Fine-tuning LLMs traditionally demands vast labeled datasets, which are costly and time-consuming to produce, especially when policies evolve or new content types emerge (concept drift). Standard methods using crowdsourced labels often lack the nuance required for safety-critical domains, leading to suboptimal model performance. To overcome this, Google developed a scalable curation process that prioritizes the most informative and diverse training examples, minimizing data needs while maximizing model alignment with domain experts.

    The method begins with a zero- or few-shot LLM (LLM-0) that preliminarily labels a large set of ads as either clickbait or benign. Due to the rarity of policy-violating content, the dataset is highly imbalanced. The labeled examples are then clustered separately by predicted label. Overlapping clusters—where similar examples receive different labels—highlight regions of model uncertainty along the decision boundary. From these overlapping clusters, the system identifies pairs of similar examples with differing labels and sends them to human experts for high-fidelity annotation. To manage annotation costs, priority is given to pairs that span broader regions of the data space, ensuring diversity.
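    As an illustrative simplification of the selection step (not Google's actual pipeline), the sketch below skips the per-label clustering and directly picks the closest pairs of examples with differing preliminary labels; such pairs likewise sit near the decision boundary, which is where expert annotation is most informative.

```python
import numpy as np

def disagreement_pairs(embs, labels, max_pairs=2):
    # For each preliminarily-positive example, find its nearest
    # preliminarily-negative neighbor; the closest such pairs approximate
    # the "overlapping cluster" regions along the decision boundary.
    pos = np.where(labels == 1)[0]
    neg = np.where(labels == 0)[0]
    pairs = []
    for i in pos:
        dists = np.linalg.norm(embs[neg] - embs[i], axis=1)
        j = neg[np.argmin(dists)]
        pairs.append((int(i), int(j), float(dists.min())))
    pairs.sort(key=lambda p: p[2])  # most similar (most uncertain) first
    return [(i, j) for i, j, _ in pairs[:max_pairs]]

# Toy 2-d "embeddings" and preliminary LLM-0 labels (1 = clickbait).
embs = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [0.15, 0.05], [4.0, 4.0]])
labels = np.array([1, 0, 1, 0, 0])
print(disagreement_pairs(embs, labels))  # candidate pairs for expert annotation
```

    The real method additionally prioritizes pairs that span broad regions of the data space, so the expert budget buys diversity as well as boundary coverage.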

    These expert-labeled examples are split into two sets: one for fine-tuning the next iteration of the model, and another for evaluating model–human alignment. The process iterates, with each new model version improving its ability to distinguish subtle differences in content. Iterations continue until model–human alignment plateaus or matches internal expert agreement.

    Crucially, the approach does not rely on traditional metrics like precision or recall, which assume a single “ground truth.” Instead, it uses Cohen’s Kappa, a statistical measure of inter-annotator agreement that accounts for chance. Kappa values above 0.8 indicate exceptional alignment, and this serves as both a data quality benchmark and a performance metric.
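    Cohen's Kappa is easy to compute directly from two annotators' label sequences: observed agreement minus chance agreement, scaled by the maximum possible improvement over chance.

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's Kappa: agreement between two annotators, corrected for chance."""
    assert len(a) == len(b)
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: probability both annotators pick the same label
    # independently, given each annotator's label frequencies.
    ca, cb = Counter(a), Counter(b)
    expected = sum((ca[k] / n) * (cb[k] / n) for k in set(a) | set(b))
    return (observed - expected) / (1 - expected)

model  = ["bad", "ok", "ok", "bad", "ok", "ok",  "bad", "ok"]
expert = ["bad", "ok", "ok", "bad", "ok", "bad", "bad", "ok"]
print(cohens_kappa(model, expert))  # 7/8 observed, 0.5 by chance -> 0.75
```

    A value of 0.75 like the toy example above would count as substantial agreement; the paper's bar of 0.8+ is stricter still.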

    Experiments compared models trained on ~100,000 crowdsourced labels (baseline) versus those trained on expert-curated data using the new method. Two LLMs—Gemini Nano-1 (1.8B parameters) and Nano-2 (3.25B)—were tested on tasks of varying complexity. While smaller models showed limited gains, the 3.25B model achieved a 55–65% improvement in Kappa alignment using only 250–450 expert-labeled examples—three orders of magnitude fewer than the baseline. In production with larger models, reductions reached 10,000x.

    The results demonstrate that high-fidelity labeling, combined with intelligent data curation, allows models to achieve superior performance with minimal data. This is especially valuable for dynamic domains like ad safety, where rapid retraining is essential. The method effectively combines the broad coverage of LLMs with the precision of human experts, offering a path to overcome the data bottleneck in LLM fine-tuning.

    Source link

  • Claude Opus 4.1, an upgrade to Claude Opus 4 on agentic tasks, real-world coding, and reasoning

    Claude Opus 4.1 is an upgrade to Claude Opus 4 that significantly enhances performance on agentic tasks, real-world coding, and complex reasoning. It features a large 200,000-token context window, improved long-term memory support, and advanced capabilities in multi-file code refactoring, debugging, and sustained reasoning over long problem-solving sequences. The model scores 74.5% on the SWE-bench Verified benchmark for software engineering tasks, outperforming models such as OpenAI’s GPT-4.1 and GPT-4o, and demonstrating strong autonomy and precision in tasks such as agentic search, multi-step task management, and detailed data analysis.

    Claude Opus 4.1 offers hybrid reasoning allowing both instant and extended step-by-step thinking with user-controllable “thinking budgets” to optimize cost and performance. Key improvements include better memory and context management, more stable tool usage, lower latency, stronger coherence over long conversations, and enhanced ability to adapt to coding style. It supports up to 32,000 output tokens, making it suitable for complex, large-scale coding projects and enterprise autonomous workflows.
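    As a hypothetical sketch, a "thinking budget" might be set like this when calling the model through the Anthropic Messages API. The parameter names here are assumptions that should be checked against the current API reference, and the snippet only builds the request payload rather than sending it.

```python
# Hypothetical request payload for extended thinking with a token budget.
# Field names are assumptions; verify against the Anthropic API docs.
def build_request(prompt, thinking_budget=None, max_tokens=2048):
    req = {
        "model": "claude-opus-4-1",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    if thinking_budget is not None:
        # Extended step-by-step reasoning, capped at a token budget to
        # trade off answer quality against cost and latency.
        req["thinking"] = {"type": "enabled", "budget_tokens": thinking_budget}
    return req

print(build_request("Refactor this module.", thinking_budget=8000)["thinking"])
```

    Omitting the budget falls back to instant responses, which is the "hybrid reasoning" trade-off the paragraph above describes.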

    Use cases span AI agents managing multi-channel tasks, advanced coding with deep codebase understanding, agentic search synthesizing insights from vast data sources, and high-quality content creation with rich prose and character. It is available to paid Claude users, in Claude Code, and via API on platforms like Amazon Bedrock and Google Cloud Vertex AI with pricing consistent with Opus 4.

    Organizations such as GitHub have noted its improved multi-file refactoring, Rakuten appreciates its precise debugging without unnecessary changes, and Windsurf reports a one standard deviation performance gain over Opus 4 for junior developer tasks. The upgrade embodies a focused refinement on reliability, contextual reasoning, and autonomy, making it particularly valuable for advanced engineering, AI agent deployment, and research workflows.

  • Microsoft rolls out GPT-5 across entire Copilot ecosystem

    As of August 7, 2025, Microsoft has officially launched GPT-5 integration into Microsoft 365 Copilot, marking a significant advancement in AI-powered productivity tools for businesses and individuals. This update represents a major leap forward in how users interact with everyday applications such as Word, Excel, PowerPoint, Outlook, and Teams, making workflows smarter, faster, and more intuitive.

    GPT-5, the latest iteration of OpenAI’s advanced language model, is now deeply embedded within Microsoft 365 Copilot, enhancing its ability to understand complex user requests, generate high-quality content, and perform multi-step tasks with greater accuracy and contextual awareness. Unlike earlier versions, GPT-5 demonstrates improved reasoning, fewer hallucinations, and a deeper understanding of organizational data when used within the secure boundaries of Microsoft’s cloud infrastructure.

    One of the key benefits of this integration is the ability for Copilot to act as a proactive assistant across the Microsoft 365 suite. In Word, users can generate well-structured drafts from simple prompts, revise tone and style, and even pull in relevant data from other documents or emails. In Excel, GPT-5 enables natural language queries to analyze data, create formulas, and suggest visualizations—democratizing data analysis for non-technical users. PowerPoint users benefit from AI-generated storyboards and slide content based on outlines or documents, significantly reducing presentation preparation time.

    Outlook and Teams see transformative upgrades as well. Copilot can now summarize lengthy email threads, draft context-aware replies, and prioritize action items—all powered by GPT-5’s enhanced comprehension. In Teams meetings, real-time transcription, intelligent note-taking, and post-meeting action item generation are now more accurate and insightful, helping teams stay aligned and productive.

    Security and compliance remain central to this rollout. Microsoft emphasizes that GPT-5 in Copilot operates within its trusted cloud environment, ensuring that organizational data is not used to train the underlying AI models. Enterprises retain full control over data access, and all interactions are subject to existing compliance policies, including GDPR, HIPAA, and other regulatory standards.

    Microsoft also highlights new customization options for IT administrators, allowing organizations to tailor Copilot’s behavior based on role, department, or business process. This ensures that AI assistance remains relevant and aligned with company workflows. Additionally, developers can now extend Copilot’s capabilities using the Microsoft 365 Copilot Extensibility Framework, integrating internal apps and data sources securely.

    User adoption is supported by intuitive design and seamless integration—users don’t need to learn new interfaces. The AI works behind the scenes, activated through familiar commands in the ribbon or via natural language prompts in the Copilot sidebar.

    Microsoft positions this GPT-5-powered Copilot as a cornerstone of the future of work, enabling users to focus on creativity, decision-making, and collaboration by automating routine tasks. Early adopters report significant gains in productivity, with some teams reducing document creation time by up to 50% and improving email response efficiency by 40%.

    Why It Matters:

    The launch of GPT-5 in Microsoft 365 Copilot represents a pivotal moment in enterprise AI, combining cutting-edge generative AI with robust security and deep application integration. As AI becomes an everyday collaborator, Microsoft aims to lead the shift toward intelligent productivity, empowering users to achieve more with less effort.

  • OpenAI is releasing GPT-5, its new flagship model, to all of its ChatGPT users and developers

    GPT-5 has officially been released as of early August 2025 and is now available to all ChatGPT users, including free, Plus, Pro, and Team accounts. OpenAI announced GPT-5 on August 7, 2025, marking it as their most advanced AI model to date, with significant improvements in intelligence, speed, reasoning, and safety over GPT-4 and earlier versions.

    Here are the key details about GPT-5:

    • Expert-level intelligence across domains such as math, coding, science, and health.
    • Reduced hallucinations and improved truthful, honest answers.
    • A reasoning model (“GPT-5 thinking”) for harder problems, automatically invoked or selectable by users.
    • Real-time model routing for efficiency and quality of responses.
    • Enhanced capabilities for creative writing and complex software generation.
    • Integrated safety mechanisms including safe completions to balance helpfulness with risk.
    • Accessibility to all ChatGPT users, including free tier, Plus, Pro, and Team, with extended capabilities for paid users.
    • Availability in developer tools like GitHub Copilot and Azure AI.

    GPT-5 essentially replaces previous ChatGPT models and represents a significant upgrade in real-world use, combining speed, accuracy, and safety for a wide range of users and applications.

  • Apple partners with Samsung for revolutionary chip production

    Apple has partnered with Samsung to produce next-generation chips, specifically advanced image sensors, for upcoming iPhones. These chips will be manufactured at Samsung’s semiconductor fab in Austin, Texas, using a chipmaking technology that has not previously been used anywhere else in the world. This collaboration is part of Apple’s broader $100 billion expansion under its American Manufacturing Program to bolster domestic supply chains and technology development. The technology involves a specialized hybrid bonding process for vertically stacking wafers, aimed at enhancing power efficiency and performance in Apple products, including the iPhone 18 lineup expected next year. This marks a significant revival of Apple-Samsung semiconductor cooperation, which had been dormant since their past legal disputes.

    Key details include:

    • Samsung’s System LSI division designing the custom image sensors.
    • Manufacturing to occur at the Austin, Texas plant.
    • The technology is expected to optimize power consumption and performance of iPhone devices globally.
    • This partnership helps Apple diversify its supply chain, reducing reliance on previous dominant suppliers like Sony.
    • The move aligns with U.S. efforts to reshore semiconductor manufacturing amid geopolitical and trade challenges.

    This agreement is an important strategic win for Samsung, which has faced losses in its logic chip business, while for Apple it represents a major investment in U.S.-based chip production and technological innovation.

  • Are Google’s AI Features a Threat to Web Traffic? Critics claim they undermine SEO strategies and threaten online journalism

    Google Search chief Liz Reid defended Google’s AI features in a blog post by stating that total organic click volume from Google Search to websites remains “relatively stable” year-over-year, and claimed that AI is driving more searches and higher quality clicks to websites. She argued that AI Overviews are a natural evolution of previous Google search features and emphasized the increase in overall search queries and stable or slightly improved click quality, defined as clicks where users do not quickly return to search results.

    However, this defense contrasts with multiple independent studies and reports that show significant reductions in website traffic due to Google’s AI Overviews. Research by the Pew Research Center and others indicates that AI summaries reduce click-through rates by nearly half in some cases, decreasing external site visits from about 15% to around 8% on searches with AI Overviews. Studies from SEO firms such as Ahrefs, Amsive, and BrightEdge find click rate declines ranging from roughly 30% to over 50% depending on the query type, especially for informational, non-branded keywords. The rise of “zero-click” searches—where users get answers directly from AI summaries without visiting any site—has been noted as a major factor, with estimates that around 60% of Google searches now fall into this category. This trend has caused concern among publishers and SEO experts who report significant traffic drops and threats to online content monetization.

    Google disputes the methodologies and conclusions of these external studies, arguing that their internal data shows overall stable or slightly improved click volumes and that some sites are indeed losing traffic while others gain. However, Google has not publicly released detailed data to substantiate all of these claims, leading to ongoing debate about the true impact of AI features on web traffic.

    While Liz Reid asserts that total organic clicks remain stable year-over-year despite AI integration, independent research and publisher reports overwhelmingly show that Google’s AI features—particularly AI Overviews—have caused significant reductions in website traffic and click-through rates, especially for informational and non-branded queries.

  • OpenAI GPT OSS, the new open-source model designed for efficient on-device use and local inference

    OpenAI has released an open-weight model called gpt-oss-20b, a medium-sized model with about 21 billion parameters designed for efficient on-device use and local inference. It operates with a Mixture-of-Experts (MoE) architecture, having 32 experts but activating 4 per token, resulting in 3.6 billion active parameters during each forward pass. This design grants strong reasoning and tool-use capabilities with relatively low memory requirements — it can run on systems with as little as 16GB of RAM. The model supports up to 128k tokens of context length, enabling it to handle very long inputs.

    “gpt-oss-20b” achieves performance comparable to OpenAI’s o3-mini model across common benchmarks, including reasoning, coding, and function calling tasks. It leverages modern architectural features such as Pre-LayerNorm for training stability, Gated SwiGLU activations, and Grouped Query Attention for faster inference. This model is intended to provide strong real-world performance while being accessible for consumer hardware deployments. Both gpt-oss-20b and the larger gpt-oss-120b (117B parameters) models are released under the Apache 2.0 license, aiming to foster transparency, accessibility, and efficient usage by developers and researchers.
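    The active-parameter arithmetic follows directly from top-k expert routing: only 4 of 32 expert blocks run per token, so most of the 21B parameters sit idle on any given forward pass. The sketch below shows the routing step with tiny linear maps standing in for full expert FFN blocks; the shapes and gating details are illustrative, not gpt-oss's exact implementation.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=4):
    """Top-k Mixture-of-Experts routing for a single token.

    Score every expert with a gating network, keep the k highest-scoring
    experts, and combine their outputs with softmax-renormalized weights.
    Only k expert blocks execute, which is why active parameters per token
    are a small fraction of the total.
    """
    logits = x @ gate_w                 # one gating score per expert
    top = np.argsort(logits)[::-1][:k]  # indices of the k best experts
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                        # softmax over the selected k only
    return sum(wi * experts[i](x) for wi, i in zip(w, top))

rng = np.random.default_rng(0)
d, n_experts, k = 8, 32, 4
gate_w = rng.normal(size=(d, n_experts))
# Each "expert" is a tiny linear map standing in for a full FFN block.
expert_mats = [rng.normal(size=(d, d)) for _ in range(n_experts)]
experts = [lambda x, m=m: x @ m for m in expert_mats]

x = rng.normal(size=d)
print(moe_forward(x, gate_w, experts, k=k).shape)
```

    Scaling the same idea up, 4 active experts out of 32 is what turns 21B total parameters into roughly 3.6B active ones per token.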

    In summary:

    • Parameters: ~21 billion total, 3.6 billion active per token
    • Experts: 32 total, 4 active per token (Mixture-of-Experts)
    • Context length: 128k tokens
    • Runs with as little as 16GB memory
    • Performance matches o3-mini benchmarks, strong at coding, reasoning, few-shot function calling
    • Released open-weight under Apache 2.0 license for broad developer access

    This model is a step toward more accessible, powerful reasoning AI that can run efficiently on local or edge devices.

    Follow the link