AI-trends.today

Taalas has replaced programmable graphics cards with AI chips that are hardwired to reach 17,000 tokens/second.

Tech · By Gavin Wallace · 23/02/2026 · 5 Mins Read

The high-stakes AI industry operates on a core assumption: flexibility is everything. Because AI models change constantly, we build general-purpose GPUs and programmable silicon that can adapt to the latest research.

Taalas, a Toronto-based start-up, disagrees: it argues that flexibility is precisely what holds AI back. In the Taalas team's view, if we want AI to be as common and cheap as plastic, we have to stop ‘simulating’ intelligence on general-purpose computers and start ‘casting’ it directly into silicon.

The Problem: The ‘Memory Wall’ and the GPU Tax

The current cost of operating a Large Language Model is driven by a physical bottleneck: the ‘Memory Wall’.

Traditional processors (GPUs) are built around an Instruction Set Architecture (ISA) that separates memory from compute. To run inference on a model like Llama-3, the processor spends enormous time and energy shuttling weights in from High Bandwidth Memory (HBM). This ‘data movement tax’ accounts for nearly 90% of the power consumption in modern AI data centers.
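To see why this wall exists, consider a rough back-of-envelope sketch (my own illustrative figures, not the article's): in single-user autoregressive decoding, every weight must be streamed from HBM once per token, so throughput is capped by memory bandwidth rather than compute.

```python
# Back-of-envelope "memory wall" ceiling for single-user decoding.
# Assumed figures (approximate, not from the article): 8B parameters
# stored in FP16, H100 SXM HBM3 bandwidth of roughly 3.35 TB/s.

PARAMS = 8e9                   # Llama-3 8B parameter count
BYTES_PER_PARAM = 2            # FP16 weights
HBM_BANDWIDTH = 3.35e12        # bytes/s streamed from HBM

bytes_per_token = PARAMS * BYTES_PER_PARAM     # ~16 GB of weights per token
ceiling_tps = HBM_BANDWIDTH / bytes_per_token  # bandwidth-bound upper limit

print(f"bandwidth-bound ceiling: {ceiling_tps:.0f} tokens/s")  # ~209 tokens/s
```

Real deployments land below this ceiling (kernel overheads, KV-cache traffic), which is consistent with the roughly 150 tokens-per-second per-user GPU figure the article cites later.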

Taalas’s radical solution: remove the memory-retrieval cycle entirely. The company uses a proprietary automated design flow to translate the computation graph of a specific model directly into the physical layout of a chip. In the resulting HC1 (Hardcore 1) chip, the model’s weights and architecture are literally imprinted into the wiring.
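The ‘casting into silicon’ idea has a rough software analogue: partial evaluation, where a generic routine specialized to one fixed set of weights becomes a dedicated artifact with no parameter fetch at run time. A toy sketch (purely illustrative, not Taalas's actual flow):

```python
# Toy analogy for "hardwiring" weights: specialize a generic layer to a
# fixed weight matrix, so no weights are passed in (fetched) per call.

def generic_layer(weights, x):
    # GPU-style: weights are data, supplied from memory on every invocation
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def harden(weights):
    # Taalas-style (by analogy): bake the weights into the function itself;
    # the returned callable *is* the layer, as the chip *is* the model
    frozen = [tuple(row) for row in weights]
    def hardwired(x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in frozen]
    return hardwired

layer = harden([[1, 2], [3, 4]])
print(layer([10, 20]))  # [50, 110]
```

The trade-off is the same in both worlds: the specialized artifact is faster and cheaper per call, but changing the weights means producing a new artifact.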

https://taalas.com/the-path-to-ubiquitous-ai/

Hardcore Models: 17,000 Tokens Per Second

The results of this ‘direct-to-silicon’ approach redefine the performance ceiling for inference. Taalas recently demonstrated the HC1 running a Llama-3 8B model. A top-tier NVIDIA H100 GPU can serve a single user at roughly 150 tokens per second; the HC1, by contrast, can handle a staggering 30,000 users at 16 to 17 tokens per second each.

This changes the ‘unit economics’ of AI:

  • Performance: One HC1 chip can outperform a small GPU data center in raw throughput.
  • Efficiency: Taalas claims a 1000x improvement in efficiency (performance-per-watt and performance-per-dollar) compared to conventional chips.
  • Infrastructure: Because the weights are hardwired, there is no need for external HBM stacks or complicated liquid cooling. The 250W cards sit in a standard air-cooled server rack, packing the throughput of an entire GPU cluster into one box.
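Taken at face value, the figures above support a crude sanity check (the HC1 numbers are vendor claims from the article; the H100 board power is my own assumption):

```python
# Sanity-check the unit economics using the article's figures (vendor
# claims), plus an assumed ~700 W board power for an H100 SXM.

h100_user_tps = 150        # tokens/s for a single user on an H100 (claimed)
hc1_tps = 17_000           # tokens/s on an HC1 (claimed)
hc1_watts = 250            # per-card power of the HC1 boards (claimed)
h100_watts = 700           # typical H100 SXM board power (assumed)

speedup = hc1_tps / h100_user_tps                             # ~113x throughput
eff_ratio = (hc1_tps / hc1_watts) / (h100_user_tps / h100_watts)

print(f"throughput: {speedup:.0f}x, tokens-per-joule: {eff_ratio:.0f}x")
```

Even this single-user comparison yields a few-hundred-fold tokens-per-joule gap; the 1,000x figure Taalas quotes presumably also folds in dollar cost and multi-user serving.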

The Automated Foundry: From Two Years to 60 Days

The obvious ‘catch’ for an AI developer is flexibility. What happens if a new model is released tomorrow and you have hard-wired a particular model onto a chip? Historically, designing an ASIC (Application-Specific Integrated Circuit) took two years and tens of millions of dollars.

Taalas’s answer is automation. Its system works like a compiler for foundries, able to generate a chip design in a matter of days. And by streamlining manufacturing so that only the top metal masks of the silicon change between models, the company has collapsed the ‘weights-to-silicon’ turnaround to just two months.

This allows for a ‘seasonal’ hardware cycle. In the spring, a company can fine-tune its frontier model and deploy thousands of hyper-efficient, specialized inference chips by summer.


The Market Shifts: From Shovels to Stamps

The AI hype cycle is at a turning point. We are moving from the ‘Research & Training’ phase—where GPUs are essential for their flexibility—to the ‘Deployment & Inference’ phase, where cost-per-token is the only metric that matters.

If Taalas succeeds, the AI market could split into two tiers:

  1. General-Purpose Training: NVIDIA, AMD and others provide the flexible, massive clusters required to train and discover new architectures.
  2. Specialized Inference: Led by ‘foundries’ like Taalas, which take those proven architectures and ‘print’ them into cheap, ubiquitous silicon for everything from smartphones to industrial sensors.

The Key Takeaways

  • The ‘Hardwired’ Paradigm Shift: Taalas has moved from software-defined AI (models running on general-purpose GPUs) to hardware-defined AI. By ‘baking’ a specific model’s weights and architecture directly into the silicon, it eliminates traditional instruction-set overhead, effectively making the model the processor itself.
  • The Death of the Memory Wall: Data movement accounts for nearly 90% of the power draw in conventional AI hardware. Taalas’s HC1 (Hardcore 1) chip eliminates the Memory Wall by physically encoding the model parameters in the chip’s metal layers, so expensive High Bandwidth Memory is no longer needed.
  • 1,000x Improvement in Efficiency: By stripping away the ‘programmability tax’, Taalas claims a 1,000x improvement in performance-per-watt and performance-per-dollar. At a maximum of 450 watts, the HC1 sustains 17,000 tokens per second on a Llama 3.1 8B model, massively outperforming a standard GPU rack while using far less power.
  • Automated ‘Direct-to-Silicon’ Foundry: To solve model obsolescence, Taalas has developed a proprietary automated design flow that turns a fine-tuned model into a custom AI chip in roughly two months instead of years, letting companies ‘print’ their models into silicon on a seasonal basis.
  • Commodity Artificial Intelligence Future: This technology signals a shift from ‘Cloud-First’ to ‘Device-Native’ AI. As inference becomes a cheap, hardwired commodity, AI will move off centralized servers and into local, low-power hardware—ranging from smartphones to industrial sensors—with zero latency and no subscription costs.
