Category: AI Related

  • Fellou CE (Concept Edition): The Agentic Browser Redefines Web Interaction (executes tasks, automates workflows, and conducts deep research on behalf of users)

    On August 11, 2025, Fellou, a Silicon Valley-based startup, announced the upcoming launch of Fellou CE (Concept Edition), which it bills as the world’s first agentic AI browser. Unlike traditional browsers like Chrome or Safari, Fellou doesn’t just display web content—it actively executes tasks, automates workflows, and conducts deep research on behalf of users. With over 1 million users since its 2025 debut, Fellou is redefining browsing as a proactive, AI-driven experience, positioning itself as a digital partner for professionals, researchers, and creators.

    Fellou’s standout feature, Deep Action, enables the browser to interpret natural language commands and perform complex, multi-step tasks autonomously. For example, users can instruct Fellou to “find the cheapest flights from New York to London and book them” or “draft a LinkedIn article on AI trends.” The browser navigates websites, fills forms, and completes actions without user intervention, leveraging its Eko framework to integrate with platforms like GitHub, LinkedIn, and Notion. This capability, tested successfully in creating private GitHub repositories in under three minutes, showcases Fellou’s ability to handle real-world tasks efficiently.

    The browser’s Deep Search feature conducts parallel searches across public and login-required platforms like X, Reddit, and Quora, generating comprehensive, traceable reports in minutes. For instance, a market analyst can request a report on 2025 EdTech startups, and Fellou will compile funding details, investor data, and market trends from multiple sources, saving hours of manual research. Its Agentic Memory learns from user behavior, refining suggestions and streamlining tasks over time. This adaptive intelligence, combined with a shadow workspace that runs tasks in the background, ensures users can multitask without disruption.

    Fellou prioritizes privacy, processing data locally with AES-256 encryption and deleting cloud-processed data post-task. Its Agent Studio, a marketplace for custom AI agents, fosters a developer ecosystem where users can create or access tailored workflows using natural language. Currently available for Windows and macOS (with Linux and mobile versions in development), Fellou operates a freemium model, offering free access during its Early Adopter Program and planned premium tiers for advanced features.

    Posts on X highlight enthusiasm for Fellou’s potential to “make Chrome look ancient,” with users praising its hands-free automation and report quality. However, the beta phase may involve bugs, and advanced commands come with a learning curve. Compared to rivals like Perplexity’s Comet, Fellou’s claimed 5.2x faster task completion (3.7 minutes vs. 11–18 minutes) and context-aware automation set it apart. Co-founded by Yang Xie, a 2021 Forbes 30 Under 30 Asia honoree, Fellou is poised to lead the agentic browser revolution, empowering users to focus on creativity while AI handles the web’s grunt work.

  • GitHub CEO Thomas Dohmke: “Embrace AI or Leave the Profession”. A clear warning that AI is reshaping software development

    GitHub CEO Thomas Dohmke has issued a strong warning to software developers: they must embrace artificial intelligence (AI) or leave the profession. His message reflects how AI is reshaping software development, transforming developers from traditional coders into “AI managers” or “creative directors of code” who guide, prompt, and review AI-generated code rather than manually writing every line themselves.

    Dohmke’s stance is based on an in-depth study by GitHub involving 22 developers who already extensively use AI tools. He predicts that AI could write up to 90% of all code within the next two to five years, making AI proficiency essential for career survival in software engineering. Developers who adapt are shifting to higher-level roles involving system architecture, critical review of AI output, quality control, and prompt engineering. Those who resist this transformation risk becoming obsolete or forced to leave the field.

    • Next two to five years: AI tools may write up to 90% of all code
    • By 2030: 90% automation predicted, with developers urged to upskill amid ethical and competitive challenges

    This evolution entails a fundamental reinvention of the developer role: from manual coding to managing AI systems and focusing on complex design and problem-solving tasks. Dohmke emphasizes that developers should not see AI as a threat but as a collaborative partner that enhances productivity and creativity.

    GitHub’s CEO frames AI adoption not merely as a technological shift but as a critical career imperative, urging the developer community to embrace AI-driven workflows or face obsolescence.

  • Apple’s LLM Technology Boosts Prediction Speed. What is “multi-token prediction” (MTP) framework?

    Apple’s innovation in large language models centers on a “multi-token prediction” (MTP) framework, which enables models to predict multiple tokens simultaneously rather than generating text one token at a time as in traditional autoregressive models. This approach improves inference speed significantly, with reported speedups of 2–3× on general tasks and up to 5× in more predictable domains like coding and math, while maintaining output quality.

    The core of Apple’s MTP framework involves inserting special “mask” tokens into the input prompts. These placeholders allow the model to speculate on several upcoming tokens at once. Each predicted token sequence is then immediately verified against what standard sequential decoding would produce, reverting to single-token prediction if needed to ensure accuracy. This leads to faster text generation without degrading quality, thanks to techniques such as a “gated LoRA adaptation” that balances speculation and verification.
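    Below is a minimal sketch of the speculate-then-verify loop described above, with a toy deterministic next-token function standing in for the real model; the function names and acceptance rule are illustrative assumptions, not Apple’s implementation. The speculative head is deliberately imperfect so verification sometimes rejects its tail.

    ```python
    # Sketch of speculate-then-verify decoding (illustrative only; `next_token`
    # and `propose_k` stand in for the real model and its mask-token head).

    def next_token(prefix):
        # Toy deterministic "model": next token is a function of the prefix.
        return (sum(prefix) * 31 + len(prefix)) % 50

    def propose_k(prefix, k):
        # Speculative head: guesses k upcoming tokens at once. Made imperfect
        # on purpose so the verifier sometimes rejects the tail of the draft.
        guesses, p = [], list(prefix)
        for i in range(k):
            g = next_token(p) if i < k - 1 else (next_token(p) + 1) % 50
            guesses.append(g)
            p.append(g)
        return guesses

    def generate(prompt, n_tokens, k=4):
        out = list(prompt)
        while len(out) - len(prompt) < n_tokens:
            draft = propose_k(out, k)
            # Accept the longest prefix of the draft that matches what
            # sequential decoding would emit, then take one verified step.
            p = list(out)
            for g in draft:
                if next_token(p) != g:
                    break
                p.append(g)
            out = p
            out.append(next_token(out))
        return out[len(prompt):len(prompt) + n_tokens]

    print(generate([1, 2, 3], 10))
    ```

    Because every accepted token is checked against what sequential decoding would produce, the output matches single-token generation exactly; the speedup comes from accepting several verified tokens per model pass.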

    In training, Apple’s method augments input sequences by appending multiple mask tokens corresponding to future tokens to be predicted. The model learns to output these future tokens jointly while preserving its ability to predict the next token normally. This involves a carefully designed attention mechanism that supports parallel prediction while maintaining autoregressive properties. The training process parallelizes what would otherwise be sequential queries, improving training efficiency and strengthening the model’s ability to “think ahead” beyond the immediate next token.
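    As a rough illustration of the training-side augmentation, the sketch below appends mask tokens to a prefix and supervises them on the ground-truth future tokens; the placeholder token ID and layout are assumptions for clarity, not Apple’s exact recipe.

    ```python
    # Hypothetical layout of an MTP training example: append k mask tokens
    # after a prefix and supervise them on the k future ground-truth tokens.
    MASK = -1  # placeholder id; a real tokenizer would reserve a special token

    def make_mtp_example(tokens, cut, k):
        prefix, future = tokens[:cut], tokens[cut:cut + k]
        inputs = prefix + [MASK] * len(future)
        # Supervise only the mask positions; the prefix keeps its usual
        # next-token objective elsewhere in training.
        labels = [None] * len(prefix) + future
        return inputs, labels

    inputs, labels = make_mtp_example([5, 9, 2, 7, 4, 8], cut=3, k=2)
    print(inputs)  # [5, 9, 2, -1, -1]
    print(labels)  # [None, None, None, 7, 4]
    ```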

    This innovation addresses the inherent bottleneck in traditional autoregressive models, which generate text sequentially, limiting speed and efficiency. By enabling multi-token simultaneous prediction, Apple’s research unlocks latent multi-token knowledge implicitly present in autoregressive models, essentially teaching them to anticipate multiple future words at once, much like human language planning.

    Overall, Apple’s multi-token prediction framework represents a significant advancement in AI language model inference, promising faster, more efficient generation without sacrificing accuracy—key for real-world applications like chatbots and coding assistants.

  • OpenAI gives $1M+ bonuses to 1,000 employees amid talent war

    OpenAI awarded special bonuses exceeding $1 million each to about 1,000 employees on August 7, 2025, as part of its strategy amid intense competition for AI talent. The move came just hours after the launch of a major product, reflecting the high stakes of the ongoing war to secure and retain top AI researchers and engineers.

    In the broader context, this talent war in AI includes massive compensation packages from leading AI and tech companies like Google DeepMind, Meta, and Microsoft, with top researchers receiving offers that can reach tens of millions of dollars annually. OpenAI’s bonuses and compensation packages form part of this competitive landscape, where retaining specialized AI talent is critical due to their immense impact on innovation and company success.

    Total compensation for OpenAI engineers varies widely: some senior engineers earn in excess of $1 million annually, and top researchers receive over $10 million per year when stock and bonuses are included. The $1M+ bonuses to roughly 1,000 employees signal a large-scale, strategic investment by OpenAI to maintain its leadership and workforce stability amid fierce recruiting battles in AI development.

    These large bonuses reflect both the high stakes of the AI talent war and OpenAI’s transition to a for-profit structure, which allows more flexible, lucrative employee compensation.

  • Microsoft Word can now read you document overviews like podcasts

    Microsoft Word, integrated with Microsoft 365 Copilot, now offers a feature that generates audio overviews of documents, which you can listen to like podcasts. This tool produces smart, summarized narrations of Word documents, PDFs, or Teams meeting recordings stored in OneDrive. Users can customize the listening experience with playback controls such as speed adjustment, jumping forward/backward, pausing, and saving the audio to OneDrive for later use or sharing.

    There are two styles available for the audio overviews:

    • Summary Style: A single AI voice provides a clear, quick summary of the main points.
    • Podcast Style: Two AI voices (male and female, with neutral American accents) engage in a conversational discussion about the document’s content, creating a dynamic, story-like podcast feel.

    This feature is currently available only in English and requires a Microsoft 365 Copilot license. It works on documents stored online in OneDrive or SharePoint but doesn’t support local files. Generation time is typically a few minutes, even for large documents.

    To use it, open a document in Word on Windows or the web, click the Copilot button on the Home tab, and ask the AI to generate an audio overview. The resulting audio has a media player embedded with controls, and you can switch between summary and podcast styles.

    This audio overview feature enhances productivity by allowing users to absorb key document insights hands-free, useful for multitasking or on the move.

  • ChatGPT is bringing back GPT-4o!

    OpenAI is bringing back GPT-4o as an option for ChatGPT Plus users after users expressed strong dissatisfaction with its removal and the transition to GPT-5. GPT-4o will no longer be the default model, but paid users can choose to continue using it. OpenAI CEO Sam Altman confirmed this reinstatement on social media, acknowledging the user feedback and stating they will monitor usage to decide how long to keep legacy models available.

    GPT-4o is a multimodal AI model capable of handling text, audio, and images with faster responses (twice as fast as GPT-4 Turbo), enhanced language support (over 50 languages), and advanced multimodal interaction features, including real-time voice and image understanding and generation. Users appreciated GPT-4o for its personable, nuanced, and emotionally supportive responses, which some found missing in GPT-5.

    The return of GPT-4o responds to a significant user backlash expressed in communities like Reddit, where users described losing GPT-4o as “losing a close friend,” highlighting its unique voice and interaction style compared to GPT-5. OpenAI had initially removed the model selection feature in ChatGPT, replacing older versions directly with GPT-5, which caused confusion and dissatisfaction. Now, legacy models like GPT-4o will remain accessible for a time, allowing users to switch between GPT-5 and older versions based on preference and task requirements.

    Read Sam Altman’s post on X

  • Graph RAG vs. Naive RAG vs. a hybrid of both

    Retrieval Augmented Generation (RAG) is a widely adopted technique that enhances large language models (LLMs) by retrieving relevant information from a specific dataset before generating a response. While traditional or “Naive RAG” relies on vector (semantic) search to find contextually similar text chunks, it treats each data point as independent and does not capture deeper relationships between entities. This limitation becomes apparent when working with interconnected data, such as contracts, organizational records, or research papers, where understanding relationships is crucial. To address this, Graph RAG has emerged as a powerful extension that leverages knowledge graphs to improve retrieval quality by incorporating structural and relational context.

    Graph RAG, particularly Microsoft’s implementation, uses LLMs to extract entities (e.g., people, organizations, locations) and their relationships from raw text in a two-stage process. First, entities and relations are identified and stored in a knowledge graph. Then, these are summarized and organized into communities—clusters of densely connected nodes—using graph algorithms like Leiden. This enables the system to generate high-level, domain-specific summaries of entity groups, providing a more holistic view of complex, fragmented information.
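    As a small sketch of the community-detection stage, the snippet below builds an entity graph and clusters it with the Leiden algorithm via python-igraph and leidenalg; the hard-coded edge list stands in for LLM-extracted entity relations, and each resulting cluster would then be summarized by an LLM.

    ```python
    # Build an entity graph and detect communities with Leiden
    # (pip install python-igraph leidenalg). The edge list stands in for
    # LLM-extracted (entity, relation, entity) triples.
    import igraph as ig
    import leidenalg

    edges = [
        ("Weaviate", "Bob van Luijt"), ("Weaviate", "Amsterdam"),
        ("Weaviate", "vector search"), ("Neo4j", "Emil Eifrem"),
        ("Neo4j", "Cypher"), ("Neo4j", "graph database"),
    ]
    g = ig.Graph.TupleList(edges, directed=False)
    partition = leidenalg.find_partition(g, leidenalg.ModularityVertexPartition)

    for i, community in enumerate(partition):
        members = [g.vs[v]["name"] for v in community]
        print(f"community {i}: {members}")  # each cluster gets an LLM summary
    ```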

    A key advantage of Graph RAG over Naive RAG is its ability to perform entity-centric retrieval. Instead of retrieving isolated text chunks, it navigates the graph to find related entities, their attributes, and community-level insights. This is especially effective for detailed, entity-focused queries, such as “What are the business relationships of Company X?” or “Which individuals are linked to Project Y?”

    The blog illustrates this with a hybrid approach combining Weaviate (a vector database) and Neo4j (a graph database). In this setup, a user query first triggers a semantic search in Weaviate to identify relevant entities. Their IDs are then used to traverse the Neo4j knowledge graph, uncovering connections, community summaries, and contextual text chunks. A Cypher query orchestrates this multi-source retrieval, merging entity descriptions, relationships, and source content into a comprehensive response.
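    A hedged sketch of that flow with the official Weaviate and Neo4j Python clients follows; the collection name, properties, and graph schema are assumptions for illustration, not the blog’s exact code.

    ```python
    # Hybrid retrieval sketch: vector search in Weaviate finds entry-point
    # entities, then a Cypher traversal in Neo4j gathers their neighborhood
    # and community summaries. Schema names are illustrative assumptions.
    import weaviate
    from neo4j import GraphDatabase

    user_query = "What are the business relationships of Weaviate?"

    # 1) Semantic search over entity descriptions (assumes an "Entity"
    #    collection with a configured vectorizer and an entity_id property).
    wv = weaviate.connect_to_local()
    res = wv.collections.get("Entity").query.near_text(query=user_query, limit=5)
    entity_ids = [o.properties["entity_id"] for o in res.objects]
    wv.close()

    # 2) Graph traversal from those entities (assumes :Entity and :Community
    #    nodes linked by IN_COMMUNITY, plus relations between entities).
    cypher = """
    MATCH (e:Entity) WHERE e.id IN $ids
    OPTIONAL MATCH (e)-[r]-(nbr:Entity)
    OPTIONAL MATCH (e)-[:IN_COMMUNITY]->(c:Community)
    RETURN e.name AS entity, e.description AS description,
           type(r) AS relation, nbr.name AS neighbor, c.summary AS community
    """
    driver = GraphDatabase.driver("bolt://localhost:7687",
                                  auth=("neo4j", "password"))
    with driver.session() as session:
        context = session.run(cypher, ids=entity_ids).data()
    driver.close()

    # 3) The merged records become the context handed to the LLM.
    print(context[:3])
    ```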

    For example, querying “Weaviate” returns not just isolated mentions but a synthesized answer detailing its legal status, locations, business activities, and partnerships—information pieced together from multiple contracts and relationships in the graph.

    Despite its strengths, Graph RAG has limitations. The preprocessing pipeline is computationally expensive and requires full reindexing to update summaries when new data arrives, unlike Naive RAG, which can incrementally add new chunks. Scalability can also be challenging with highly connected nodes, and generic entities (e.g., “CEO”) may skew results if not filtered.

    In summary, while Naive RAG is effective for straightforward, content-based queries, Graph RAG excels in complex, relationship-rich domains. By combining vector and graph-based retrieval in a hybrid system, organizations can achieve deeper insights, leveraging both semantic meaning and structural intelligence. The choice between RAG methods ultimately depends on the nature of the data and the complexity of the questions being asked.

    Source link

  • A new active learning method from Google for curating high-quality data that reduces training-data requirements for fine-tuning LLMs by orders of magnitude

    Google researchers Markus Krause and Nancy Chang present a novel active learning approach that reduces the training data required to fine-tune large language models (LLMs) by up to 10,000 times (four orders of magnitude), while significantly improving model alignment with human experts. This breakthrough addresses the challenge of curating high-quality, high-fidelity training data for complex tasks like identifying unsafe ad content—such as clickbait—where contextual understanding and policy interpretation are critical.

    Fine-tuning LLMs traditionally demands vast labeled datasets, which are costly and time-consuming to produce, especially when policies evolve or new content types emerge (concept drift). Standard methods using crowdsourced labels often lack the nuance required for safety-critical domains, leading to suboptimal model performance. To overcome this, Google developed a scalable curation process that prioritizes the most informative and diverse training examples, minimizing data needs while maximizing model alignment with domain experts.

    The method begins with a zero- or few-shot LLM (LLM-0) that preliminarily labels a large set of ads as either clickbait or benign. Due to the rarity of policy-violating content, the dataset is highly imbalanced. The labeled examples are then clustered separately by predicted label. Overlapping clusters—where similar examples receive different labels—highlight regions of model uncertainty along the decision boundary. From these overlapping clusters, the system identifies pairs of similar examples with differing labels and sends them to human experts for high-fidelity annotation. To manage annotation costs, priority is given to pairs that span broader regions of the data space, ensuring diversity.
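    The sketch below illustrates the boundary-pair selection step with scikit-learn; KMeans and random embeddings are stand-ins, since the paper’s actual clustering and prioritization details may differ.

    ```python
    # Cluster each predicted label separately, find cluster pairs that nearly
    # overlap across labels, and pull the closest cross-label example pair
    # from each region for expert annotation.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import pairwise_distances

    rng = np.random.default_rng(0)
    emb = rng.normal(size=(2000, 32))     # stand-in example embeddings
    pred = rng.integers(0, 2, size=2000)  # LLM-0's preliminary labels

    pos, neg = emb[pred == 1], emb[pred == 0]
    kp = KMeans(n_clusters=8, n_init=10, random_state=0).fit(pos)
    kn = KMeans(n_clusters=8, n_init=10, random_state=0).fit(neg)

    # Cluster pairs with nearby centroids mark the uncertain decision boundary.
    cd = pairwise_distances(kp.cluster_centers_, kn.cluster_centers_)
    to_annotate = []
    for i, j in zip(*np.unravel_index(np.argsort(cd, axis=None)[:5], cd.shape)):
        p_idx = np.where(kp.labels_ == i)[0]
        n_idx = np.where(kn.labels_ == j)[0]
        d = pairwise_distances(pos[p_idx], neg[n_idx])
        a, b = np.unravel_index(d.argmin(), d.shape)
        to_annotate.append((p_idx[a], n_idx[b]))  # similar pair, different labels

    print(to_annotate)  # index pairs (into pos/neg) sent to human experts
    ```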

    These expert-labeled examples are split into two sets: one for fine-tuning the next iteration of the model, and another for evaluating model–human alignment. The process iterates, with each new model version improving its ability to distinguish subtle differences in content. Iterations continue until model–human alignment plateaus or matches internal expert agreement.

    Crucially, the approach does not rely on traditional metrics like precision or recall, which assume a single “ground truth.” Instead, it uses Cohen’s Kappa, a statistical measure of inter-annotator agreement that accounts for chance. Kappa values above 0.8 indicate exceptional alignment, and this serves as both a data quality benchmark and a performance metric.
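    Concretely, Cohen’s Kappa is kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance; a minimal computation with scikit-learn:

    ```python
    # Cohen's kappa: agreement corrected for chance.
    from sklearn.metrics import cohen_kappa_score

    model_labels  = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
    expert_labels = [1, 0, 1, 0, 0, 1, 0, 0, 1, 1]

    kappa = cohen_kappa_score(model_labels, expert_labels)
    print(f"kappa = {kappa:.2f}")  # values above 0.8 indicate exceptional alignment
    ```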

    Experiments compared models trained on ~100,000 crowdsourced labels (baseline) versus those trained on expert-curated data using the new method. Two LLMs—Gemini Nano-1 (1.8B parameters) and Nano-2 (3.25B)—were tested on tasks of varying complexity. While smaller models showed limited gains, the 3.25B model achieved a 55–65% improvement in Kappa alignment using only 250–450 expert-labeled examples—three orders of magnitude fewer than the baseline. In production with larger models, reductions reached 10,000x.

    The results demonstrate that high-fidelity labeling, combined with intelligent data curation, allows models to achieve superior performance with minimal data. This is especially valuable for dynamic domains like ad safety, where rapid retraining is essential. The method effectively combines the broad coverage of LLMs with the precision of human experts, offering a path to overcome the data bottleneck in LLM fine-tuning.

    Source link

  • Claude Opus 4.1, an upgrade to Claude Opus 4 on agentic tasks, real-world coding, and reasoning

    Claude Opus 4.1 is an upgrade to Claude Opus 4 that significantly enhances performance on agentic tasks, real-world coding, and complex reasoning. It features a large 200,000-token context window, improved long-term memory support, and advanced capabilities in multi-file code refactoring, debugging, and sustained reasoning over long problem-solving sequences. The model scores 74.5% on the SWE-bench Verified benchmark for software engineering tasks, outperforming models such as OpenAI’s GPT-4.1 and GPT-4o, and demonstrating strong autonomy and precision in tasks such as agentic search, multi-step task management, and detailed data analysis.

    Claude Opus 4.1 offers hybrid reasoning allowing both instant and extended step-by-step thinking with user-controllable “thinking budgets” to optimize cost and performance. Key improvements include better memory and context management, more stable tool usage, lower latency, stronger coherence over long conversations, and enhanced ability to adapt to coding style. It supports up to 32,000 output tokens, making it suitable for complex, large-scale coding projects and enterprise autonomous workflows.

    Use cases span AI agents managing multi-channel tasks, advanced coding with deep codebase understanding, agentic search synthesizing insights from vast data sources, and high-quality content creation with rich prose and character. It is available to paid Claude users, in Claude Code, and via API on platforms like Amazon Bedrock and Google Cloud Vertex AI with pricing consistent with Opus 4.
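    For orientation, here is a minimal sketch of calling the model through Anthropic’s Python SDK; the model identifier shown is an assumption, so check Anthropic’s current model list before use.

    ```python
    # Minimal call via Anthropic's Python SDK (pip install anthropic;
    # requires ANTHROPIC_API_KEY in the environment).
    import anthropic

    client = anthropic.Anthropic()
    message = client.messages.create(
        model="claude-opus-4-1",  # assumed identifier for Opus 4.1
        max_tokens=2048,          # the model supports up to 32,000 output tokens
        messages=[{
            "role": "user",
            "content": "Refactor this function for readability: def f(x): return x*x",
        }],
    )
    print(message.content[0].text)
    ```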

    Organizations such as GitHub have noted its improved multi-file refactoring, Rakuten appreciates its precise debugging without unnecessary changes, and Windsurf reports a one standard deviation performance gain over Opus 4 for junior developer tasks. The upgrade embodies a focused refinement on reliability, contextual reasoning, and autonomy, making it particularly valuable for advanced engineering, AI agent deployment, and research workflows.

  • Microsoft rolls out GPT-5 across entire Copilot ecosystem

    As of August 7, 2025, Microsoft has officially launched GPT-5 integration into Microsoft 365 Copilot, marking a significant advancement in AI-powered productivity tools for businesses and individuals. This update represents a major leap forward in how users interact with everyday applications such as Word, Excel, PowerPoint, Outlook, and Teams, making workflows smarter, faster, and more intuitive.

    GPT-5, the latest iteration of OpenAI’s advanced language model, is now deeply embedded within Microsoft 365 Copilot, enhancing its ability to understand complex user requests, generate high-quality content, and perform multi-step tasks with greater accuracy and contextual awareness. Unlike earlier versions, GPT-5 demonstrates improved reasoning, fewer hallucinations, and a deeper understanding of organizational data when used within the secure boundaries of Microsoft’s cloud infrastructure.

    One of the key benefits of this integration is the ability for Copilot to act as a proactive assistant across the Microsoft 365 suite. In Word, users can generate well-structured drafts from simple prompts, revise tone and style, and even pull in relevant data from other documents or emails. In Excel, GPT-5 enables natural language queries to analyze data, create formulas, and suggest visualizations—democratizing data analysis for non-technical users. PowerPoint users benefit from AI-generated storyboards and slide content based on outlines or documents, significantly reducing presentation preparation time.

    Outlook and Teams see transformative upgrades as well. Copilot can now summarize lengthy email threads, draft context-aware replies, and prioritize action items—all powered by GPT-5’s enhanced comprehension. In Teams meetings, real-time transcription, intelligent note-taking, and post-meeting action item generation are now more accurate and insightful, helping teams stay aligned and productive.

    Security and compliance remain central to this rollout. Microsoft emphasizes that GPT-5 in Copilot operates within its trusted cloud environment, ensuring that organizational data is not used to train the underlying AI models. Enterprises retain full control over data access, and all interactions are subject to existing compliance policies, including GDPR, HIPAA, and other regulatory standards.

    Microsoft also highlights new customization options for IT administrators, allowing organizations to tailor Copilot’s behavior based on role, department, or business process. This ensures that AI assistance remains relevant and aligned with company workflows. Additionally, developers can now extend Copilot’s capabilities using the Microsoft 365 Copilot Extensibility Framework, integrating internal apps and data sources securely.

    User adoption is supported by intuitive design and seamless integration—users don’t need to learn new interfaces. The AI works behind the scenes, activated through familiar commands in the ribbon or via natural language prompts in the Copilot sidebar.

    Microsoft positions this GPT-5-powered Copilot as a cornerstone of the future of work, enabling users to focus on creativity, decision-making, and collaboration by automating routine tasks. Early adopters report significant gains in productivity, with some teams reducing document creation time by up to 50% and improving email response efficiency by 40%.

    Why It Matters:

    The launch of GPT-5 in Microsoft 365 Copilot represents a pivotal moment in enterprise AI, combining cutting-edge generative AI with robust security and deep application integration. As AI becomes an everyday collaborator, Microsoft aims to lead the shift toward intelligent productivity: by automating repetitive tasks, the GPT-5-powered Copilot frees users to focus on creativity, decision-making, and strategic work.