Gemma 3 270M embodies this “right tool for the job” philosophy. It’s a high-quality foundation model that follows instructions well out of the box, and its true power is unlocked through fine-tuning. Once specialized, it can execute tasks like text classification and data extraction with remarkable accuracy, speed, and cost-effectiveness. By starting with a compact, capable model, you can build production systems that are lean, fast, and dramatically cheaper to operate.Gemma 3 270M is designed to let developers take this approach even further, unlocking even greater efficiency for well-defined tasks. It’s the perfect starting point for creating a fleet of small, specialized models, each an expert at its own task.
Here is the core capabilities of Gemma 3 270M:
- Compact and capable architecture: Our new model has a total of 270 million parameters: 170 million embedding parameters due to a large vocabulary size and 100 million for our transformer blocks. Thanks to the large vocabulary of 256k tokens, the model can handle specific and rare tokens, making it a strong base model to be further fine-tuned in specific domains and languages.
- Extreme energy efficiency: A key advantage of Gemma 3 270M is its low power consumption. Internal tests on a Pixel 9 Pro SoC show the INT4-quantized model used just 0.75% of the battery for 25 conversations, making it our most power-efficient Gemma model.
- Instruction following: An instruction-tuned model is released alongside a pre-trained checkpoint. While this model is not designed for complex conversational use cases, it’s a strong model that follows general instructions right out of the box.
- Production-ready quantization: Quantization-Aware Trained (QAT) checkpoints are available, enabling you to run the models at INT4 precision with minimal performance degradation, which is essential for deploying on resource-constrained devices.
Leave a Reply