Author: admin

  • Kombai, the first AI agent built for frontend development

    Kombai is characterized as the first AI agent purpose-built for real-world frontend development tasks. It specializes in converting UX designs from sources like Figma, images, or text prompts into high-fidelity, clean frontend code such as HTML, CSS, and React components. According to its makers, Kombai significantly outperforms generic coding AI agents and the latest frontier models at building user interfaces from design specs. It understands your existing codebase, works inside your integrated development environment (IDE), and delivers backend-agnostic code optimized for frontend stacks and repositories.

    Key capabilities include deep-learning models tailored for frontend fidelity, specialized tools for indexing and searching frontend codebases to accurately and efficiently reuse code, and the ability to generate editable, task-optimized plans with previews before code changes. It supports projects of all sizes and complexities, from small components to entire app UIs. Importantly, Kombai does not alter database or backend logic, isolating its focus to frontend development.

    For enterprise customers, Kombai offers custom context engines to accommodate complex technology stacks. It is SOC 2 certified, ensuring data security and that user data is not used for further model training or improvements.

    Overall, Kombai fills a unique niche as the first domain-specific AI coding agent built exclusively for the frontend development domain, delivering unmatched code quality, developer velocity, and accuracy compared to generalist AI coding tools.

  • CEO Tim Cook says Apple ready to open its wallet to catch up in AI

    Apple CEO Tim Cook has recently confirmed that Apple is now “very open” to making bigger acquisitions in the AI space to accelerate its AI development roadmap. This marks a significant shift from Apple’s historically cautious approach to acquisitions. Cook emphasized that Apple is not constrained by the size of potential acquisition targets but focuses on whether a company can help speed up its AI efforts. While Apple has acquired about seven companies so far in 2025, those were relatively small deals; the company is open to much larger deals if they align with its AI acceleration goals.

    This move responds to growing pressure from Wall Street and investors who view Apple as falling behind rivals like Microsoft, Google, and Meta in AI innovation. There are reports that Apple has had internal discussions about acquiring Perplexity AI, a conversational search startup valued at around $14–18 billion; such a deal would dwarf Apple’s prior largest acquisition, the $3 billion purchase of Beats in 2014.

    In addition to considering large acquisitions, Apple plans to significantly grow its investments in AI, including reallocating resources internally and increasing capital expenditures on data centers, although it still uses a hybrid model that relies partially on third parties for infrastructure.

    In summary, Tim Cook’s latest statements reflect Apple’s readiness to “open its wallet” for major AI acquisitions and ramp up investments to catch up with competitors, signaling a strategic acceleration of its AI ambitions in 2025.

  • Ollama’s new app delivers a user-friendly way to interact with large language models on both macOS and Windows

    Ollama’s new app, released on July 30, 2025, delivers a user-friendly way to interact with large language models on both macOS and Windows. Here’s an overview of the standout features and capabilities:

    Core Features:

    • Download and Chat with Models Locally: The app provides an intuitive interface to download, run, and chat with a wide range of AI models, including advanced options like Google DeepMind’s Gemma 3.

    • Chat with Files: Users can easily drag and drop text or PDF files into the chat. The app processes the file’s contents, enabling meaningful conversations or question answering about the document. For handling large documents, you can increase the context length in the app’s settings, though higher values require more system memory.

    • Multimodal Support: Thanks to Ollama’s new multimodal engine, the app lets you send images to models that can process them, such as Google’s Gemma 3, enabling use cases like image analysis and visual question answering alongside typical text-based interactions (see the sketch after this list). Gemma 3 in particular boasts a context window of up to 128,000 tokens and can process both text and images in over 140 languages.

    • Documentation Writing and Code Understanding: The app enables you to submit code files for analysis by the models, making it easier to generate documentation or understand complex code snippets. Developers can automate workflows such as summarizing codebases or generating documentation directly from source files.
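
    As an illustration of the multimodal and code-analysis points above, here is a minimal sketch using Ollama’s official Python library (`pip install ollama`); the model name and image path are placeholders, assuming a vision-capable model such as Gemma 3 has already been pulled locally:

    ```python
    # Minimal sketch: multimodal chat via the ollama Python package.
    # Assumes the Ollama app is running locally and "gemma3" has been pulled.
    import ollama

    response = ollama.chat(
        model="gemma3",  # placeholder: any locally available vision-capable model
        messages=[
            {
                "role": "user",
                "content": "What does this image show?",
                "images": ["./screenshot.png"],  # placeholder local file path
            }
        ],
    )
    print(response["message"]["content"])
    ```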

    Additional Improvements

    • Optimized for Desktop: The latest macOS and Windows versions feature improved performance, a reduced installation footprint, and a model directory that users can change to save disk space or use external storage.

    • Network Access & Automation: Ollama can be accessed over the network, allowing headless operation or connecting to the app from other devices. Through Python and CLI support, users can easily integrate Ollama-powered AI features into their own workflows or automation scripts (a brief sketch follows the summary below).

    In summary:
    • Drag-and-drop files → chat with text/PDFs; increase context for large documents
    • Multimodal support → send images to vision-capable models, e.g., Gemma 3
    • Documentation writing → analyze code, generate documentation
    • Model downloads → choose and run a large selection of LLMs locally
    • Network/API access → expose Ollama for remote or automated workflows
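
    To illustrate the network-access row above: the Ollama server listens on port 11434 by default and can be exposed to the network (e.g., by setting OLLAMA_HOST=0.0.0.0 before starting the server), after which the Python client can target a remote host. The host address below is a placeholder:

    ```python
    # Minimal sketch: talking to an Ollama instance on another machine.
    # The host URL is a placeholder; 11434 is Ollama's default port.
    from ollama import Client

    client = Client(host="http://192.168.1.50:11434")
    reply = client.chat(
        model="gemma3",  # placeholder model name
        messages=[{"role": "user", "content": "Summarize this release note: ..."}],
    )
    print(reply["message"]["content"])
    ```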

  • International Olympiad in Artificial Intelligence (IOAI) 2025, in Beijing, China, from August 2nd to August 9th

    The International Olympiad in Artificial Intelligence (IOAI) is an international science competition focused on artificial intelligence, designed for high school students. Each participating country or territory can send up to two teams, with each team consisting of up to four students supported by one leader.

    The International Olympiad in Artificial Intelligence (IOAI) 2025 will be held in Beijing, China, from August 2nd to August 9th, 2025. This will be the second edition of the IOAI, following the inaugural event in 2024.

    The competition has two main rounds:

    • Scientific round: This comprises a lower-weighted at-home portion, for which participants have a month to solve problems, and an on-site portion with an 8-hour time limit. The scientific round tests theoretical and problem-solving knowledge in AI.
    • Practical round: Conducted on-site, lasting four hours, where participants solve two tasks related to AI applications like image and video generation, using existing AI tools to produce results.

    Roughly half of the participants receive distinctions, with gold, silver, and bronze medals awarded in a 1:2:3 ratio for the scientific round (for example, among 60 medalists that would be 10 gold, 20 silver, and 30 bronze) and corresponding awards in the practical round. The top 3 teams receive honorary trophies.

    Beyond competition, IOAI fosters discussion on ethical AI issues and aims to engage the broader community, with activities such as involving local celebrities for promotion and hosting conferences where teams attend lectures and practical sessions on current AI topics.

    IOAI is a growing global event, with over 85 countries involved as of 2025. The Olympiad encourages international collaboration and talent development, supporting educational initiatives and national AI Olympiads through the Global AI Talent Empowerment (GAITE) program to promote equal participation worldwide.

    The official syllabus covers both theoretical foundations (“how it works”) and practical coding skills (“what it does and how to implement”), focusing on areas such as machine learning, natural language processing, and computer vision, ensuring students develop a balanced understanding and proficiency in AI.

  • China’s AI startup Zhipu releases GLM-4.5 and GLM-4.5 Air

    Zhipu AI (also known as Z.ai or 智谱AI) is a leading Chinese AI company specializing in large language models and other artificial intelligence technologies. Originating from Tsinghua University, Zhipu AI has attracted major investment from top Chinese tech firms and international backers. By 2024, it was regarded as one of the “AI Tiger” companies in China and is a significant player in the global AI landscape. The company is known for rapidly developing innovative LLMs, releasing open-source models, and building tools focused on agentic and reasoning capabilities.

    GLM-4.5 and GLM-4.5 Air: Overview

    Both GLM-4.5 and its compact sibling, GLM-4.5 Air, are foundation large language models designed for advanced reasoning, coding, and agentic tasks. They mark Zhipu AI’s push to unify general cognitive capabilities and serve as powerful backbones for intelligent agent applications.

    GLM-4.5

    • Size: 355 billion total parameters, 32 billion active parameters at runtime.

    • Core Features:

      • Hybrid Reasoning: Supports a “thinking mode” for tool use and multi-step reasoning (e.g., solving math, code, and logical problems) and a “non-thinking mode” for instant responses.
      • Agent Readiness: Designed for agent-centric workflows, integrating tool-calling natively for seamless automation and coding.
      • Performance:
        • Ranks in the top three across many industry benchmarks, comparable to leading models such as Claude 4 Opus and Gemini 2.5 Pro.
        • Particularly excels in mathematics, coding, data analysis, and scientific reasoning—achieving near or at state-of-the-art results in tests like MMLU Pro and AIME24.
        • Demonstrates a high tool-calling success rate (90.6%) and strong coding benchmark performance.
    • Context Window: 128,000 tokens.
    • Open source: Weights and implementation are available for research and commercial use under the MIT license.

    GLM-4.5 Air

    • Size: 106 billion total parameters, 12 billion active parameters during inference.
    • Design: Lightweight, mixture-of-experts architecture for optimal efficiency and deployment flexibility, including running locally on consumer-grade hardware.
    • Same 128K context window as GLM-4.5.
    • Hybrid Reasoning & Agentic Capabilities:

      • Maintains strong reasoning and tool-use abilities, a hallmark of the GLM-4.5 family.
      • Offers a balance of performance and resource consumption, making it well suited to cost-sensitive and high-throughput applications.
      • On benchmarks, it scores competitively with other industry-leading models while using far fewer compute resources.
    • Use cases: Efficient deployment for enterprise AI assistants, automation, coding support, customer service, and affordable large-scale deployments.

    Performance and Accessibility

    • Competitive Pricing: API costs are among the lowest on the market, reflecting Zhipu AI’s strategy to undercut competitors and democratize access to advanced AI.
    • Open Source Access: Both models are available for free testing and deployment through multiple platforms like Hugging Face, Zhipu AI Open Platform, and third-party APIs.
    • Community and Ecosystem: Zhipu AI encourages developer and research engagement, providing technical blogs, documentation, and standard model APIs.
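
    As a rough sketch of the “standard model APIs” point: GLM-4.5 and GLM-4.5 Air are commonly served behind OpenAI-compatible endpoints, so the openai Python client can be pointed at such a provider. The base URL, API key, and exact model identifier below are placeholders to verify against your provider’s documentation:

    ```python
    # Hedged sketch: calling GLM-4.5 through an OpenAI-compatible endpoint.
    # Base URL, API key, and model id are placeholders, not official values.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://your-provider.example/v1",  # placeholder endpoint
        api_key="YOUR_API_KEY",
    )
    resp = client.chat.completions.create(
        model="glm-4.5",  # or "glm-4.5-air" where offered
        messages=[
            {"role": "user", "content": "Write a Python function that merges two sorted lists."}
        ],
    )
    print(resp.choices[0].message.content)
    ```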

    In Summary

    • Zhipu AI is a dominant force in China’s rapidly growing AI industry, focusing on high-performance, open-source language models.
    • GLM-4.5 is a very large LLM targeting top-tier reasoning, agentic, and coding abilities.
    • GLM-4.5 Air offers similar power but much higher efficiency for wider, cost-effective deployment.

    These models are part of a new wave of AI technologies enabling more accessible, adaptable, and powerful agentic applications in both research and enterprise settings.

  • OpenAI launches Study Mode (“Study and learn”) in ChatGPT

    ChatGPT Study Mode is a new feature designed to transform the ChatGPT experience from just giving quick answers into a guided, step-by-step learning process. It helps users build a deeper understanding of any topic by engaging them with Socratic-style questions that prompt critical thinking, breaking down concepts into manageable parts, and personalizing lessons based on the user’s skill level and past interactions if memory is enabled.

    Key features of Study Mode include:

    • Asking interactive, guiding questions to stimulate reasoning rather than providing direct answers immediately.

    • Breaking down complex topics into easy-to-follow sections, progressively increasing in complexity.

    • Offering quizzes, open-ended questions, and personalized feedback to check understanding and track progress.

    • Supporting multimodal inputs like images, PDFs, and voice dictation to work with users’ learning materials.

    • Being accessible on all ChatGPT plans (Free, Plus, Pro, Team) globally, with availability soon in ChatGPT Edu plans.

    Study Mode was developed in collaboration with educators, scientists, and pedagogy experts, incorporating proven learning science principles such as managing cognitive load, encouraging active participation, fostering curiosity, and promoting metacognition and self-reflection.

    It is particularly useful for homework help, exam preparation, and unpacking class concepts while encouraging students to think critically rather than just completing tasks. Although it enhances learning engagement, users can still switch back to the standard ChatGPT interface for direct answers, which some critics argue may limit its impact if students prefer shortcuts.

    To activate it, users select Tools in the ChatGPT prompt window and choose Study and learn, then specify their topic, level, and goals for a tailored learning session.

    ChatGPT Study Mode is an AI-powered interactive tutoring experience aimed at supporting deeper, more active learning through guided questioning, personalized content, and multimodal inputs, rather than just providing finished answers. It represents OpenAI’s effort to make ChatGPT a more effective educational tool.

  • Jack Dorsey officially launches Bitchat messaging app that works offline

    Jack Dorsey officially launched Bitchat, a decentralized messaging app that works offline using Bluetooth Low Energy (BLE) mesh networking, on the Apple App Store on July 29, 2025. The app allows users to send end-to-end encrypted messages to others nearby without needing internet, Wi-Fi, or cellular service by creating a mesh network where phones relay messages to extend the communication range potentially up to 300 meters. Bitchat does not require accounts, phone numbers, or logins—users can start messaging immediately after installation with randomly assigned or customizable display names.

    Bitchat initially launched in beta via Apple’s TestFlight, reaching its 10,000-user limit quickly before the full release. The app is currently available only on iOS, with an Android version in development but not yet released. However, a known bug prevents iOS devices from connecting to Android devices so far, with a fix submitted to Apple for approval.

    The app aims to provide secure, private communication in low-connectivity or no-connectivity scenarios, such as natural disasters, protests, or areas with internet restrictions. Despite promoting strong security and privacy, some researchers have flagged potential vulnerabilities, including risks of user impersonation, and Dorsey has acknowledged the app lacks an external security review currently.

    Bitchat represents an innovative offline messaging platform officially launched by Jack Dorsey in late July 2025 on iOS, leveraging Bluetooth mesh networking to enable peer-to-peer communication without internet reliance.

  • Rakuten launches comprehensive AI platform across services

    Rakuten has launched a comprehensive AI platform called “Rakuten AI,” designed to enhance and streamline user experiences across its entire ecosystem of services. The full-scale launch began on July 30, 2025, with initial integration into Rakuten Link, the communication app for Rakuten Mobile subscribers. Rakuten AI is accessible free of charge and also available as a standalone web app in beta, aimed at broad user engagement in digital communications.

    The platform features advanced agentic AI capabilities, including chat functions, automatic search prompts, voice-to-text and image input, AI research, personalized shopping recommendations, translation, text reading support, programming assistance, and image generation. Rakuten AI has deep Japanese language and cultural awareness, which enables it to provide personalized, context-rich interactions in areas like e-commerce, fintech, travel, education, wellness, and entertainment.

    A significant expansion is planned for autumn 2025, when Rakuten AI will be integrated into Rakuten Ichiba, Rakuten’s flagship e-commerce marketplace. This integration will offer users real-time product recommendations based on behavioral data and purchasing insights, broadening the AI’s role in personalized customer experiences.

    Additionally, Rakuten Mobile offers a corporate-focused generative AI service called “Rakuten AI for Business,” launched earlier in 2025, which supports business tasks such as document creation, translation, brainstorming, and analysis. This service is optimized for the Japanese market and business customs, emphasizing security and ease of deployment at a monthly subscription rate.

    Rakuten’s AI initiative, branded under the term “AI-nization,” reflects the company’s strategic commitment to embedding AI deeply within its product ecosystem to empower both consumers and businesses. This approach was highlighted at Rakuten AI Optimism 2025, a three-day event dedicated to showcasing AI technologies and innovations across Rakuten’s services.

  • Gemini 2.5 Flash-Lite is now stable and generally available

    Gemini 2.5 Flash-Lite is Google DeepMind’s most cost-efficient and fastest model in the Gemini 2.5 family, designed specifically for high-volume, latency-sensitive AI tasks such as translation, classification, and other real-time uses. It balances performance and low cost without compromising quality, making it ideal for applications requiring both speed and efficiency.

    Key features of Gemini 2.5 Flash-Lite include:

    • Low latency and high throughput optimized for real-time, high-volume workloads.
    • Optional native reasoning (“thinking”) capabilities that can be toggled on for more complex tasks, enhancing output quality.
    • Tool use support including abilities like search and code execution.
    • Cost efficiency at about $0.10 per million input tokens and $0.40 per million output tokens, an economical choice for large-scale use.
    • Supports multiple input types including text, images, video, audio, and PDF.
    • Token limits of up to 1,048,576 input tokens and 65,536 output tokens.
    • Available for production use via Google AI Studio and Vertex AI.

    It stands out for combining speed, cost-effectiveness, quality reasoning, and multitasking capabilities, making it suitable for developers needing scalable, interactive, and real-time AI services.
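
    As a minimal sketch of calling the model through the Gemini API with Google’s google-genai Python SDK: the optional “thinking” behavior mentioned above can be tuned via a thinking budget (a budget of 0 disables it for maximum speed); treat exact parameter names as something to verify against the current SDK docs:

    ```python
    # Minimal sketch: Gemini 2.5 Flash-Lite via the google-genai SDK.
    # Assumes `pip install google-genai` and GEMINI_API_KEY in the environment.
    from google import genai
    from google.genai import types

    client = genai.Client()  # picks up GEMINI_API_KEY from the environment

    response = client.models.generate_content(
        model="gemini-2.5-flash-lite",
        contents="Classify this ticket as billing, technical, or other: ...",
        # Thinking is optional on Flash-Lite; a budget of 0 turns it off for speed.
        config=types.GenerateContentConfig(
            thinking_config=types.ThinkingConfig(thinking_budget=0)
        ),
    )
    print(response.text)
    ```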

  • Gemini Embedding now generally available in the Gemini API

    The Gemini Embedding model, called gemini-embedding-001, is a state-of-the-art text embedding model recently made generally available by Google through the Gemini API and Vertex AI. It is designed to generate dense vector representations of text that capture semantic meaning, enabling advanced natural language processing applications.

    Key features of gemini-embedding-001 include:

    • High Performance and Versatility: It consistently ranks top on the Massive Text Embedding Benchmark (MTEB) for multilingual tasks, outperforming previous Google embedding models and many commercial alternatives.
    • Multilingual Support: Supports over 100 languages, making it ideal for global and cross-lingual applications such as translation, semantic search, and classification.
    • Long Input Handling: Accepts input sequences up to 2048 tokens, allowing for longer and more complex text or document embeddings.
    • Large Embedding Dimension: Outputs vectors with a default size of 3072 dimensions, offering detailed semantic representation. Developers can scale down the output dimensions to 1536 or 768 using Matryoshka Representation Learning (MRL) to balance between embedding quality, computational cost, and storage needs.
    • Unified Across Domains: Performs well across diverse fields—science, legal, finance, software development—offering a single solution for multiple enterprise and research use cases.
    • Flexible Usage: Available with free and paid tiers on Google’s Gemini API, allowing experimentation at no cost and scaling for production.

    Overall, gemini-embedding-001 provides a cutting-edge, flexible, and efficient embedding solution that can be integrated easily to enhance tasks like semantic search, classification, recommendation, and more sophisticated AI workflows across many languages and domains.
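
    As a minimal sketch of generating an embedding with the google-genai Python SDK, including the MRL-based output dimensionality option described above (assuming the SDK is installed and a GEMINI_API_KEY is set):

    ```python
    # Minimal sketch: text embeddings with gemini-embedding-001.
    # Assumes `pip install google-genai` and GEMINI_API_KEY in the environment.
    from google import genai
    from google.genai import types

    client = genai.Client()

    result = client.models.embed_content(
        model="gemini-embedding-001",
        contents="What is the meaning of life?",
        # MRL lets you trade fidelity for storage: 3072 (default), 1536, or 768.
        config=types.EmbedContentConfig(output_dimensionality=768),
    )
    print(len(result.embeddings[0].values))  # expected: 768
    ```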