Infrastructure Redundancy Stops Before the CDN

Azure, AWS, and Cloudflare all experienced significant outages in recent weeks. Different providers, same story: configuration changes triggering cascading failures across infrastructure that's supposed to be resilient.

The interesting part isn't that infrastructure fails. It's what gets exposed about the gap between architected resilience and actual resilience.

The multi-cloud gap

Companies might use AWS for one application and Azure for another, but any given application typically runs on a single cloud. They build redundancy within that provider (multiple regions, availability zones), but the provider itself is treated as permanent infrastructure.

Then Cloudflare goes down and everything stops.

The pattern shows up consistently: sophisticated redundancy for compute, single-provider dependency for CDNs, DNS, and edge infrastructure. Like installing a backup generator but running everything through a single electrical panel.

Configuration as failure mode

All three outages share the same root cause pattern: configuration changes, not hardware failures or attacks.

Azure's outage started with a networking configuration change that created inconsistent state. AWS's disruption began when two automated systems tried to update the same database simultaneously. Cloudflare's global failure came from a database permissions change that corrupted the Bot Management system.

Infrastructure complexity creates failure modes that are hard to predict. Routine configuration changes can trigger cascading failures across regions or global networks.

This shifts the threat model. Traditional redundancy focuses on external threats: datacenter failures, provider outages, hardware degradation. But when configuration complexity is the primary failure mode, redundancy alone doesn't solve it. You need loose coupling so failures don't cascade.

The CDN blindspot

Multi-CDN strategies exist. Load balancing across providers, health checks, automated failover: these are solved technical problems. CloudFront, Bunny.net, Akamai, Azure CDN all offer alternatives.
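
To make that concrete, here's a minimal sketch of provider failover at the application layer; the hostnames, health-check path, and thresholds are hypothetical, not any vendor's API.

```python
import urllib.request

# Hypothetical CDN endpoints, listed in order of preference.
CDN_HOSTS = [
    "https://cdn-primary.example.com",
    "https://cdn-secondary.example.net",
]

def healthy(base_url: str, timeout: float = 2.0) -> bool:
    """Probe a lightweight health-check path on the CDN."""
    try:
        with urllib.request.urlopen(f"{base_url}/healthz", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

def active_cdn() -> str:
    """Return the first healthy CDN, falling back to origin if none respond."""
    for host in CDN_HOSTS:
        if healthy(host):
            return host
    return "https://origin.example.com"  # serve directly from origin as a last resort

if __name__ == "__main__":
    print(f"Serving assets from {active_cdn()}")
```

In production this logic usually lives in DNS (weighted records with health checks) or in a load balancer rather than application code, but the principle is the same: no single edge provider should be a hard dependency.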

What's less common is treating CDN infrastructure with the same redundancy thinking applied to compute. When Cloudflare went down, companies with sophisticated multi-cloud architectures went offline just as completely as companies running on a single EC2 instance.

The gap shows up in infrastructure assumptions. Most organizations cluster around accidental multi-cloud. Different teams chose different providers over time, creating redundancy architectures that exist on paper but haven't been tested under actual failure conditions.

What changes this is intentionality. Some organizations have made explicit decisions about where redundancy matters and where it doesn't. They've calculated the cost of downtime for different parts of their product and architected accordingly.

They've also made a harder decision: accepting that some level of downtime is inevitable and building products that degrade gracefully rather than fail catastrophically.

As infrastructure complexity increases, new failure modes emerge faster than old ones get solved. The organizations that navigate this aren't the ones with maximum redundancy. They're the ones who've thought clearly about what they're optimizing for and built systems that fail gracefully.

World Models Teach AI to See

On Lenny's recent podcast, Fei-Fei Li called LLMs "wordsmiths in the dark": eloquent but ungrounded in physical reality. The phrase resonated because it captures exactly what language models can't do: understand space, navigate environments, predict physics, or reason about the 3D world we inhabit.

I've been following world models with growing curiosity. The contrast with LLMs is stark. Where language models learn statistical patterns from text, world models learn by watching: absorbing spatial relationships, temporal dynamics, and cause-effect from video and sensory data. They're designed to answer the question LLMs fundamentally can't: what happens next in physical space?

What's happening now

There is a clear acceleration in 2024-2025. Google's Genie 2 generates playable 3D worlds from a single image. NVIDIA's Cosmos trained on 20 million hours of real-world footage, creating physics-aware simulations that companies like Uber and XPENG are deploying.

Meta's V-JEPA 2 learns 5-6x more efficiently by predicting abstract representations rather than raw pixels.

Fei-Fei Li's World Labs just launched Marble, the first commercial world model product. The technology she's building toward is spatial intelligence: AI that understands the physical world the way humans do.

According to a recent WSJ profile, Yann LeCun (Meta's Chief AI Scientist) is telling PhD candidates to focus on world models instead of LLMs. His prediction: world models could replace the LLM paradigm within 3-5 years.

What this could unlock

Autonomous vehicles are the obvious application, but I'm watching a broader pattern. Robotics companies use world models as virtual simulators, training robots in generated scenarios before deploying to reality. Industrial automation benefits from synthetic data generation for rare edge cases.

The shift runs deeper. LLMs process language, world models process reality. One understands how to describe gravity, the other understands falling.

Where this seems to be headed

This feels like 2018-era LLMs: early, expensive, limited to well-funded teams. Genie 2 generates 10-60 seconds of stable video. Cosmos requires massive GPU clusters for training. The sim-to-real gap remains a real challenge: small simulation differences cause real-world failures in safety-critical systems.

But the trajectory is visible. Google formed a new team for world simulation models. NVIDIA is making Cosmos open-source to accelerate the robotics community.

For most companies, there's no tangible bet to make yet. This technology isn't accessible enough for broad experimentation. But it's worth following closely.

World models feel like they're approaching their ChatGPT moment. GPT-3 existed for years before ChatGPT made it accessible enough to spark the LLM application wave. When world models hit that inflection point, the teams that have been tracking the space will know where to tinker first.

LLMs taught AI to speak. World models are teaching it to see.

AI Agents Multiply Work and Eliminate Jobs Simultaneously

Traditional automation follows a script. You map the steps, define the logic, and the system executes. If-then-else at scale.

AI agents are different. They have decision-making authority. You give them a goal, and they figure out the path, making choices on the fly based on context. That shift from scripted execution to delegated judgment changes what happens to your workload.

What the data shows

A recent study from Faros AI analyzed over 10,000 developers across 1,255 teams to understand what happens when AI adoption ramps up. The productivity story looks clear at first: teams completed 21% more tasks and merged 98% more pull requests.

But the same data revealed the downstream effects. PR review time increased 91%. Bug rates went up 9%. The agents didn't just speed up the work developers were already doing. They revealed new work that hadn't existed before.

Someone has to review what the agent produced. Someone has to validate the decisions it made. Someone has to integrate its output with the existing codebase. The cognitive load didn't disappear: it moved downstream and multiplied.

Two different reads of the same pattern

One interpretation: this is Jevons Paradox for knowledge work. When you make something more efficient, consumption increases rather than decreases. The efficiency gains are real, but they're not reducing the total work in the system. They're expanding what's possible, which creates new categories of work that didn't exist before. Agent management. Agent training. Quality control for autonomous decisions.

The other interpretation: Anthropic CEO Dario Amodei warned that AI could eliminate roughly 50% of all entry-level white-collar jobs within the next one to five years. His logic centers on a shift from augmentation (AI helps people do jobs) to automation (AI does the job). If agents can handle the execution work, you don't need as many people doing it. The efficiency doesn't create more work. It reallocates the dollars to different problems.

The core tension

Both patterns are showing up simultaneously. The Faros data demonstrates work multiplication downstream. The Anthropic warning points to headcount reduction upstream, particularly at entry-level roles where tasks are more structured and agent-friendly.

It's too early to tell which dynamic dominates, or whether they operate in parallel across different types of work. But the pattern is clear enough to plan for. If you're deploying agents expecting simple headcount reduction, you might be underestimating the new work they create. If you're assuming efficiency always expands the team, you might be overestimating the number of people you'll need to manage what agents produce.

The shifting baseline

Here's what complicates both interpretations: the definition of "entry-level" is moving. What we consider entry-level today might be three notches higher in eighteen months. College graduates entering the workforce with AI fluency might start at what we'd call mid-level today, because the baseline expectations have shifted.

The agents aren't just changing how much work gets done or who does it. They're changing what counts as foundational capability. If that's true, continuous leveling up isn't optional. It's the only defense available. The landscape is changing too fast for static skillsets to hold value.

What new work will your agents reveal that you can't see yet? And what work will disappear faster than you expect?

Context Engineering Turns AI Agents From Goldfish Into Assistants

Your AI agent is brilliant. It can write code, analyze documents, and answer complex questions with remarkable sophistication.

It is also a goldfish. Every conversation starts from scratch. Every user is a stranger. Every context is new.

Google just released a whitepaper on context engineering that tackles this fundamental problem. The paper introduces a systematic framework for making LLM agents stateful using two core primitives: Sessions and Memory.

The framework formalizes the architectural patterns that separate toy demos from production AI systems.

The statelessness problem

LLMs are fundamentally stateless. Outside their training data, their awareness is confined to the immediate context window of a single API call.

You can craft the perfect prompt, tune every parameter, and still end up with an agent that forgets the user's name between conversations. The model doesn't remember. It doesn't learn. It processes each turn in isolation.

Context Engineering is the discipline of dynamically assembling and managing all information within that context window to make agents stateful and intelligent. It is prompt engineering evolved: shifting from crafting static instructions to constructing the entire state-aware payload for every turn.
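
As a rough illustration of what "constructing the state-aware payload" means in practice (a hypothetical sketch, not the whitepaper's API), a single turn might stitch together system instructions, retrieved long-term memories, and recent session history:

```python
from dataclasses import dataclass, field

@dataclass
class Session:
    """Temporary workbench: the turns of one continuous conversation."""
    turns: list[str] = field(default_factory=list)

@dataclass
class Memory:
    """Long-term filing cabinet: durable facts about the user."""
    facts: dict[str, str] = field(default_factory=dict)

def build_context(system_prompt: str, memory: Memory, session: Session, user_msg: str) -> str:
    """Assemble the state-aware payload for this turn."""
    memory_block = "\n".join(f"- {k}: {v}" for k, v in memory.facts.items())
    history_block = "\n".join(session.turns[-10:])  # keep the window bounded
    return (
        f"{system_prompt}\n\n"
        f"Known about this user:\n{memory_block}\n\n"
        f"Conversation so far:\n{history_block}\n\n"
        f"User: {user_msg}"
    )

# Usage: the model now sees memory and history it would otherwise forget.
memory = Memory(facts={"name": "Dana", "preferred_tone": "concise"})
session = Session(turns=["User: Summarize my notes.", "Agent: Done. Anything else?"])
print(build_context("You are a helpful assistant.", memory, session, "Same format as yesterday."))
```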

The business impact is direct. Stateless agents can't personalize. They can't maintain coherent multi-turn workflows. They can't reduce repetitive questions or remember user preferences.

Context Engineering Framework (Google Whitepaper)
│
├─ Core Primitives
│  ├─ Sessions (temporary workbench)
│  └─ Memory (long-term filing cabinet)
│
├─ Key Distinctions
│  └─ Memory vs RAG
│
├─ Production Challenges
│  ├─ Latency & cost
│  ├─ Data isolation
│  └─ Memory poisoning
│
└─ Advanced Concepts
   ├─ Memory provenance (trust layer)
   └─ Procedural memory (workflows)

Sessions: The temporary workbench

A Session is the container for a single, continuous conversation. Think of it as the workbench where the agent does its immediate work. [...]

Goal Clarity Without Strategy Clarity Is Just Noise

The dynamic is shifting. AI tools let startups go from idea to credible prototype in weeks, not quarters. Technical execution gaps are narrowing. For enterprises, this changes the calculus.

The advantages used to be resources, data, distribution, and customer relationships. Those still matter. But only if you can deploy them before the market moves.

The real enterprise problem isn't speed

It's coordination.

Everyone knows the goal. "AI transformation." "Double growth." "Modernize the platform." Leadership repeats it constantly. Town halls, all-hands, strategic decks.

But goal clarity without strategy clarity is paralysis.

Teams know the destination but have no shared map. So they make local decisions that feel rational but don't compound. Five teams each moving at reasonable pace, solving adjacent problems in isolation. Velocity looks fine locally. Strategic progress is zero.

The turf wars amplify this. "That's my lane." "No, it's mine." Some of this is healthy. You need clear ownership. But it becomes extreme when there's no strategy to adjudicate scope conflicts.

The coordination paradox

Here's the tension: You need input from multiple teams to form a coherent strategy. You need to identify who's already doing adjacent work, who controls critical capabilities, and who has context that would change the plan.

But decision-by-committee is doomed. If everyone needs to agree, you ship nothing.

The resolution isn't eliminating coordination. It's designing coordination for a 10x faster cycle time.

Include for input. Decide with authority. Move with speed.

Time-box strategy formulation to weeks, not months. Three weeks from "we need a strategy" to "teams are executing," not three quarters. Separate the input phase (broad consultation) from the decision phase (narrow authority). Default to leveraging existing capabilities unless there's a specific blocker.

Pre-negotiate escalation paths so turf conflicts get resolved in 48 hours, not 48 email threads.

Why this matters more now

Because the execution gap is narrowing. If a startup can prototype in 4 weeks and your enterprise takes 14 months to coordinate, your advantages evaporate.

Data moats, distribution, brand trust, and enterprise relationships only matter if you deploy them before competitors establish alternatives.

The question isn't whether you're moving fast in absolute terms. It's whether you're moving fast enough relative to how quickly the market is learning.

What's missing from your coordination system?

If you're in an established company trying to move with purpose:

Is it the actual decision-making structure? Who has authority at each level, and is that explicit?

Is it the incentive alignment? How do you get teams to cooperate instead of compete for scope?

Is it the measurement system? How do you know if you're actually moving faster, or just feeling busy?

Is it the cultural shift? From "coordination equals consensus" to "coordination equals speed"?

The answers determine whether your resources compound into leverage or fragment into theater.

Fast Teams Don't Ship More, They Learn Faster

Two teams both ship every week. One is learning. The other is just busy.

The difference isn't work ethic or talent. It's what they optimize for. Slow teams measure velocity by features shipped. Fast teams measure it by hypotheses validated. One counts outputs. The other measures learning rate.

The Learning Rate Problem

Shipping is easy. Learning is hard. Most teams can release code weekly but take months to figure out if it worked.

They ship a feature, watch some dashboards, have a few meetings, and eventually form an opinion. By the time they know what happened, the context has shifted and the team has moved on.

Fast teams collapse that loop. They don't ship faster because they cut corners. They ship faster because feedback arrives in hours, not weeks.

Each deployment answers a specific question. The instrumentation was built before the feature. The rollback is one click. The metrics update in real time.
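
Here's a hedged sketch of what that looks like in code, assuming a hypothetical in-process flag store and a metrics sink that just prints JSON lines; the point is that exposure and outcome are recorded from the first deploy, and rollback is a single flag flip.

```python
import json
import time

FLAGS = {"new_checkout": True}  # hypothetical flag store; flipping to False is the rollback

def emit(event: str, **fields) -> None:
    """Send one metric event; here it just prints a JSON line."""
    print(json.dumps({"event": event, "ts": time.time(), **fields}))

def checkout(user_id: str, cart_total: float) -> str:
    # Instrumentation first: every exposure is recorded, so results are measurable from day one.
    variant = "new" if FLAGS["new_checkout"] else "old"
    emit("checkout_exposure", user_id=user_id, variant=variant)

    if variant == "new":
        result = "new_flow_ok"  # the hypothesis being tested
    else:
        result = "old_flow_ok"  # the control path, kept alive for instant rollback

    emit("checkout_result", user_id=user_id, variant=variant, result=result, cart_total=cart_total)
    return result

checkout("user-42", 129.99)
```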

By most accounts, Amazon's two-pizza teams work this way. Each team owns metrics, deployment, and learning. They don't wait for data teams to build dashboards or ask permission to roll back. The loop from "we think this will work" to "here's what actually happened" runs in days, sometimes hours.

What Fast Actually Means

Fast isn't about more features. It's about more learning cycles in the same time period. A team that ships one feature and validates it in a week is faster than a team that ships three features and validates them in a month.

The constraint isn't coding speed. It's learning infrastructure. Can you deploy without friction? Can you measure what matters automatically? Can you see results without waiting for someone else? Can you kill a feature as easily as you launched it?

Reportedly, Stripe optimized for this early. Every experiment had clear success metrics defined upfront. Results populated dashboards automatically.

Teams could see within 48 hours whether their hypothesis held. That learning rate compounded. More cycles meant more validated insights. More insights meant better decisions. Better decisions meant sustainable velocity.

The Real Metric

Your velocity metric shouldn't count story points or features shipped. It should measure time from hypothesis to validated learning. How many days from "we believe X" to "we now know Y"?

This is the same principle behind measuring outcomes rather than outputs. The question isn't what you built. It's what you learned.

If that number is more than two weeks, you don't have a shipping problem. You have a learning problem. And no amount of faster coding will fix it.

What's slowing down your learning loops right now?

Why Retention Starts at Onboarding, Not Growth

Most products lose 80% of users within 30 days. Teams see this happening and hand the problem to growth. They add email campaigns, push notifications, re-engagement hooks.

None of it moves the number because the retention problem wasn't created in month six. It was locked in during week one.

This isn't about better onboarding flows or slicker tutorials. It's about product decisions made before launch that determine whether users stay or leave months later. By the time your growth team measures retention, your product team already decided it.

Time-to-Value Determines Everything

Users don't leave because they forgot about your product. They leave because they never experienced its core value. The gap between signup and first meaningful outcome is where retention dies.

Consider Slack versus most enterprise tools. Slack delivers value in the first conversation. You invite a teammate, send a message, get a reply. That loop completes in minutes.

Most B2B products make you wait weeks: configure settings, integrate systems, import data, train your team. By the time value might arrive, the user already decided you're not worth it.

The best products collapse time-to-value ruthlessly. Figma lets you design in the browser with zero setup. Stripe processes your first test payment in minutes. Linear creates your first issue before you've read the docs.

Each optimized for the moment a user thinks "this actually works."

Complexity Curves Kill Quietly

Every feature you add increases the burden on new users. The complexity that delights power users in month twelve crushes new users in week one. This tradeoff is unavoidable, but most teams get it backwards. They design for the expert and hope beginners will figure it out.

Notion is the cautionary tale. Infinitely flexible, incredibly powerful, and overwhelming to 90% of new users who just wanted a place to write notes. The product's strength became its retention weakness.

Compare that to Linear, which hides advanced features behind progressive disclosure. New users see a clean issue tracker. Power users discover shortcuts, automations, and integrations as they need them.

The complexity curve should match the value curve. Early experience should be simple with obvious wins. Advanced capability should reveal itself gradually as users build competence and need more leverage.

Habit Formation, Not Feature Adoption

Retention isn't about using all your features. It's about embedding one habit that brings users back without thinking. The products with the best retention aren't the most feature-rich. They're the ones that become part of your daily rhythm.

GitHub doesn't retain engineers because of Actions or Projects. It retains them because checking pull requests becomes a morning ritual. Superhuman doesn't retain users through keyboard shortcuts; it retains them by making inbox zero feel achievable daily.

The habit is the moat.

Your onboarding should optimize for one thing: get the user to repeat the core action enough times that it becomes automatic. Three times is a trial. Seven times is a pattern. Thirty times is a habit.

The Real Metric

The metric that predicts retention isn't MAU or feature adoption. It's how many days until a new user completes the core loop three times. If that number exceeds seven (adjusted for your domain's complexity), you have a retention problem that no growth campaign can fix.
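
As an illustration, that number is cheap to compute from a raw event log; the events, action names, and thresholds below are hypothetical.

```python
from datetime import datetime
from statistics import median

# Hypothetical event log: (user_id, action, timestamp)
events = [
    ("u1", "signup",    datetime(2024, 5, 1)),
    ("u1", "core_loop", datetime(2024, 5, 1)),
    ("u1", "core_loop", datetime(2024, 5, 3)),
    ("u1", "core_loop", datetime(2024, 5, 4)),
    ("u2", "signup",    datetime(2024, 5, 1)),
    ("u2", "core_loop", datetime(2024, 5, 9)),
]

def days_to_third_core_action(user_id: str) -> int | None:
    """Days from signup to the third completion of the core loop, or None if never reached."""
    signup = next(ts for uid, action, ts in events if uid == user_id and action == "signup")
    core = sorted(ts for uid, action, ts in events if uid == user_id and action == "core_loop")
    if len(core) < 3:
        return None
    return (core[2] - signup).days

reached = [d for d in (days_to_third_core_action(u) for u in {"u1", "u2"}) if d is not None]
print(f"Median days to habit: {median(reached) if reached else 'n/a'}")
```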

The window to build retention is narrow. What product decision are you making today that will determine whether users are still here six months from now?

Why AI Platforms Are Testing User-Paid Sharing

Most platforms face a brutal tradeoff when enabling sharing. Charge creators for hosting and you limit adoption. Charge end-users at the point of distribution and you create friction. Subsidize usage yourself and the costs don't scale.

Each path blocks something you need: viral growth, sustainable economics, or both.

For years, platforms have picked their poison. SaaS tools charge creators monthly fees, killing casual sharing. Consumer apps eat infrastructure costs to drive growth, then scramble to monetize. Marketplaces take cuts that creators resent.

None of these models naturally align creator incentives with platform growth.

Anthropic's Artifacts feature tests a fourth path. When you build and share an interactive app in Claude, you pay nothing for distribution (no hosting fees, no infrastructure costs, no matter how many people use it). Instead, anyone who uses your shared artifact authenticates with their own Claude account, and their usage counts against their subscription.

The cost doesn't disappear. It just shifts to whoever's getting the value.

How the Model Works

Artifacts let you build interactive applications directly inside Claude. React-based UIs powered by Claude's API. You can create data analysis tools, games with adaptive AI, educational apps, writing assistants, or multi-step agent workflows.

Once you've built something, sharing is a single click. No deployment pipeline. No server configuration. No domain setup.

Here's where the economics diverge from traditional platforms. Users must authenticate with their Claude account to interact with shared artifacts. That authentication isn't just for access control. It determines who pays.

Every API call your shared app makes runs against the end user's Claude subscription, not yours. If you're on the free tier and share a tool that goes viral, you still pay nothing.

The platform handles scaling, hosting, and infrastructure. Users burn their own credits.
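
Stripped of any specific platform, the underlying pattern is bring-your-own-credentials. The sketch below is purely illustrative and doesn't reflect Anthropic's actual API: each end user supplies their own key, so usage is billed to whoever gets the value, not to the creator.

```python
# Illustrative only: a shared app where each end user authenticates with their
# own account, so model usage is charged to them rather than to the creator.

def handle_request(prompt: str, end_user_api_key: str | None) -> str:
    if not end_user_api_key:
        # No creator-funded fallback: the user must sign in with their own account.
        return "Please sign in so usage counts against your own subscription."

    # In a real system this call would hit the model provider using the
    # end user's credentials; here it's stubbed out.
    return call_model(prompt, api_key=end_user_api_key)

def call_model(prompt: str, api_key: str) -> str:
    return f"[response to '{prompt}' billed to key ending ...{api_key[-4:]}]"

print(handle_request("Summarize this report", end_user_api_key=None))
print(handle_request("Summarize this report", end_user_api_key="sk-user-1234"))
```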

This creates unusual incentives. As a creator, you have zero reason to limit distribution. More users cost you nothing. [...]