The Tool Spectrum is Collapsing

Marty Cagan's recent piece on prototyping tools draws a clean line: build-to-learn tools on one side, build-to-earn tools on the other. He's right about the hype problem: product managers confusing high-fidelity prototypes with production-ready systems. But the binary he describes is already dissolving.

The categorization reflects tool architecture. Lovable and Bolt for prototyping; Claude Code and Cursor for production. UI-first tools abstract complexity and accelerate visual validation. Terminal-based tools expose code and configuration, giving engineers control over reliability, observability, and scale.

But that architectural difference doesn't lock tools into single purposes anymore.

What's Shifting

Claude Code sits in both camps. Non-technical product managers can use it to generate working prototypes that run on live data and simulate complex business logic. Then they hand that same generated code to engineering, who refactor what's useful, discard what isn't, and keep building in the same environment with the same tool.

This isn't theoretical. It's happening now with Claude Code, Codex CLI, and tools like Droid Factory. The learning curve exists. Terminal interfaces intimidate at first, but the investment pays off in continuity. One tool, two phases, no translation layer.

Why the Convergence Matters

When prototyping and production share infrastructure, handoffs get cleaner. Product managers generate testable hypotheses in code, not static mockups. Engineers inherit working logic they can evaluate and extend, not wireframes they interpret from scratch. The gap between discovery and delivery narrows.

This doesn't erase Cagan's core warning: prototypes still aren't products. Business complexity, runtime demands, and operational constraints (reliability, telemetry, fault tolerance, compliance) remain non-negotiable for commercial-grade systems. Prototyping sophistication doesn't eliminate that work.

But it does change the question. It's not "Can this prototype become a product?" It's "How much of this prototype's logic survives into production, and how quickly can we validate the rest?"

The Implications

For product managers: Learning terminal-based tools is now a leverage move, not a technical detour. If you can prototype in the same environment engineering uses for delivery, you reduce interpretation overhead and accelerate feedback loops.

For engineering teams: Code generated during discovery becomes a starting point, not a distraction. You're evaluating real logic against real constraints, not translating concepts across tool boundaries.

For organizations: The "build-to-learn versus build-to-earn" framing still holds. Separating discovery from delivery remains essential. But the tooling gap that once reinforced that separation is closing. That's a workflow shift, not a conceptual collapse.

Open Questions

Can these converged tools handle the full complexity Cagan describes (thousands of use cases, enterprise-grade reliability, zero-downtime deployments) within the next three years? Unknown. Spec-driven development is picking up, so I'm optimistic.

If tools serve both discovery and delivery well enough to accelerate learning and reduce handoff friction, that's sufficient. Perfect continuity isn't the goal. Better sequencing is.

The hype Cagan warns against (thinking prototypes are products) still deserves the warning. But the tools enabling that confusion are also solving a different problem: making the path from prototype to product less lossy. That's not hype. That's progress.

Don't Contain Innovation—Spread It

Your innovation lab isn't the problem. Keeping innovation isolated there is.

The pattern shows up in different forms—innovation labs, Centers of Excellence, digital transformation teams. The setup looks similar: bring in smart people, give them freedom to experiment, then wait for breakthroughs while the core business operates exactly as before.

I've watched this play out at multiple Fortune 500 companies. The lab discovers what customers need. Core business teams keep building the same way they always have. When the lab's insights finally reach leadership, the response is "interesting, we'll prioritize that next quarter." By then, a competitor has already shipped.

Here's the test: Try naming three companies where innovation labs successfully spread their methods across the entire organization. Not companies with famous labs. Companies where the lab's thinking transformed how core business teams operate.

I can't name many. That scarcity tells you something.

The Superstar Trap

The traditional model treats innovation like a superstar problem. Hire the best people. Give them the best tools. Isolate them from the constraints of the core business. Wait for breakthroughs.

This fails more often than it succeeds.

When innovation lives in one team, that team becomes a bottleneck. Every insight has to flow through them. Every experiment needs their approval. The rest of the organization waits for direction instead of building their own capacity to improve.

The labs that actually work don't stay separate. They seed methods, not solutions. They build innovation muscle across teams instead of concentrating it in one place. They make their job obsolete by teaching the rest of the organization how to think.

That's not a superstar model. That's a systems model.

What Spreading Looks Like

Innovation in your core business doesn't look like moonshots. It looks like your claims team cutting approval time from five days to three with better automation, your integration team building reusable patterns instead of one-off custom work, your product managers using AI tools to run discovery in days instead of weeks.

These changes compound. Today you improve one process. Tomorrow, another team sees what you did and adapts it. Next week, an adjacent team asks how you did it. That's how innovation spreads—not through mandates, but through demonstration.

The lab's job isn't to own innovation. It's to seed champions across the organization. Find the engineer in your payments team who sees a better way. The product manager in member services who wants to rethink onboarding. Give them methods, support, and air cover. Let them show their peers what's possible.

The Directional Test

You won't get 100% of your organization innovating. Some teams will lag. Some functions exist just to keep things running. That's fine.

The question is directional. Are more teams thinking about improvement this quarter than last? Is innovation behavior spreading or staying contained?

When only one part of your organization thinks about innovation, you're in trouble. When your core business teams stop asking "how do we improve this?" they treat their domain as frozen. They wait for the lab to tell them what's next. Meanwhile, your competitors are moving.

Innovation dies when one team owns it. It lives when everyone has permission to improve how they work. The companies winning today aren't the ones with the best labs. They're the ones where core business teams have built the innovation muscle—where innovation becomes part of the culture, not a separate function.

What's your lab spreading this quarter?

The Gap Between AI Adoption and Enterprise Value

Two-thirds of organizations are stuck in the pilot phase with AI. They run experiments, they test use cases, they see promising results—then nothing scales.

McKinsey's latest State of AI report (November 2025) reveals the pattern: 90% of organizations regularly use AI, but only 39% report enterprise-level EBIT impact. The gap between adoption and value isn't a mystery. It's a choice.

What the data shows

The survey reveals three distinct patterns:

Most organizations (nearly two-thirds): still experimenting or piloting, chasing efficiency gains.

High performers: setting growth and innovation as objectives alongside efficiency, redesigning workflows, and transforming business models.

The gap: 62% now experiment with AI agents, but few have moved beyond testing to production systems that reshape how work flows.

Not efficiency, but expansion

Here's the contrast: 80% of companies set efficiency as their AI objective. Cut costs, automate tasks, compress timelines. That's table stakes.

(Figure: High performers in the AI survey. Source: McKinsey State of AI Report)

High performers add a second intent: growth. They ask what new capabilities AI unlocks, what products become possible, and what customer experiences were previously out of reach.

Efficiency optimizes what exists. Growth expands what's possible.

The organizations seeing material value aren't just automating workflows—they're redesigning them. They're asking: if this task takes 10% of the time, what does the team do with the other 90%? If we remove this coordination bottleneck, what new bets become viable?

The workflow redesign test

Most AI deployments layer intelligence on top of existing processes. A copilot here, a classification model there, a summarization tool for that document type.

High performers reverse the question: If we assume AI can handle these tasks, what would the workflow look like from scratch?

That's not augmentation. That's transformation.

Example pattern: Radiology departments don't just use AI to flag potential findings—they redesign the entire diagnostic workflow, routing routine reads through automated triage while radiologists focus on complex cases, with AI pre-assembling patient history and relevant priors.

Another: Prior authorization teams don't just speed up approval requests—they restructure coverage determination so standard cases auto-approve based on clinical guidelines while utilization management focuses on edge cases, care optimization, and high-cost interventions.

The workflow redesign loop:

  1. Identify the bottleneck or high-volume task
  2. Prototype AI-native version (not AI-assisted)
  3. Map new human role (expanded scope, not old scope faster)
  4. Test enterprise impact (EBIT, not use-case ROI)
  5. Scale only if the impact holds
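
To make step 4 concrete, here is a back-of-the-envelope sketch, with entirely hypothetical numbers, of why a pilot can clear any reasonable use-case ROI bar and still be invisible at the enterprise level:

    # Why use-case ROI and enterprise EBIT impact tell different stories.
    # Every figure below is hypothetical.
    annual_ebit = 500_000_000          # company EBIT, $

    pilot_cost = 200_000               # build + run cost of one automation pilot, $
    hours_saved = 8_000                # analyst hours saved per year
    loaded_hourly_rate = 90            # $ per hour
    pilot_savings = hours_saved * loaded_hourly_rate

    use_case_roi = (pilot_savings - pilot_cost) / pilot_cost
    ebit_impact = pilot_savings / annual_ebit

    print(f"Use-case ROI: {use_case_roi:.0%}")   # ~260%: looks like a clear win
    print(f"EBIT impact:  {ebit_impact:.2%}")    # ~0.14%: invisible at enterprise scale

A hundred pilots like this, scattered and never scaled into redesigned workflows, still round to zero at the EBIT line. That is the gap the survey describes.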

The agent signal

The 62% experimenting with AI agents isn't just about technology maturity. It's a signal that organizations are starting to think about automation differently.

Agents don't just complete tasks. They coordinate across systems, make sequential decisions, and route work based on context. That's infrastructure, not tooling.

The question isn't whether to experiment with agents. It's whether your organization is ready to redesign workflows to take advantage of what agents enable: continuity of context, asynchronous coordination, and decision automation.

Implications for product and engineering leaders

If you're building AI features, here's the fork in the road:

Path one: Add AI capabilities that make existing workflows 20% faster. Ship incremental value. Stay in the pilot phase with everyone else.

Path two: Identify one workflow that could expand your team's leverage by 3-5x if redesigned. Prototype the AI-native version. Measure enterprise impact, not feature adoption.

The data suggests most organizations will stay on path one. High performers are choosing path two.

Where to start

Pick one workflow where your team spends significant time but sees marginal strategic impact. Map the current state. Then ask: if coordination and routine decisions were automated, what would this process look like?

Don't optimize the workflow. Redesign it.

The gap between pilot and enterprise value isn't about better models or more compute. It's about asking better questions.

Count Dependency, Not Customers

Your competitor just announced 10,000 new customers. You added 200 developers to your API program. Who wins?

Traditional B2B thinking says the customer count matters most. Ecosystem thinking says dependency beats scale every time.

The moat has moved

For decades, B2B competitive advantage meant user acquisition. More customers, more revenue, stronger position. Simple math.

That math doesn't work anymore. The $420B business SaaS market is reorganizing around a different principle: the companies that provide infrastructure capture more value than the companies that chase end users.

Stripe doesn't compete for the most direct customers. It becomes the payment rails that thousands of products build on. Twilio doesn't fight for the largest user base. It provides the communication layer that other companies depend on. AWS doesn't win by having the most developers—it wins because entire businesses can't function without it.

The shift is from counting users to measuring dependency.

Why dependency compounds differently

User acquisition scales linearly. More sales capacity, more ads, more reps. Growth requires continuous input.

Ecosystem dependency scales exponentially. Every product built on your infrastructure creates switching costs. Every integration increases lock-in. Every developer trained on your API expands your moat. These aren't distribution channels—they're foundational infrastructure that others can't easily replace.

When a consultant builds their workflow around your SDK, they bring every client with them. When an agency standardizes on your API, they create dependency across their entire book of business. When a developer ecosystem emerges, your growth becomes their growth.

The ecosystem GDP metric

Traditional metrics measure your direct impact: MRR, user count, and retention. Ecosystem metrics measure what others build on top of you.

The most revealing number isn't on your P&L—it's ecosystem GDP: the revenue, products, and workflows created on your platform. Salesforce's AppExchange and AWS's partner ecosystems demonstrate this at massive scale. The value created on top of the platform dwarfs the platform's direct revenue.

When ecosystem GDP grows faster than your direct revenue, you've built real infrastructure. When it stagnates, you've built a product that others use but don't depend on.
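
If you want to operationalize that signal, a minimal sketch might look like the following; the figures are invented, and "ecosystem GDP" is simplified here to partner-generated revenue:

    # Compare how fast value built on your platform compounds versus your own revenue.
    # Figures are invented; "ecosystem GDP" is simplified to partner-generated revenue.
    years = [2023, 2024, 2025]
    direct_revenue = [40_000_000, 52_000_000, 65_000_000]    # your own ARR, $
    ecosystem_gdp = [60_000_000, 95_000_000, 160_000_000]    # value built on top of you, $

    def cagr(first, last, periods):
        # Compound annual growth rate between the first and last observation.
        return (last / first) ** (1 / periods) - 1

    periods = len(years) - 1
    print(f"Direct revenue CAGR: {cagr(direct_revenue[0], direct_revenue[-1], periods):.0%}")   # ~27%
    print(f"Ecosystem GDP CAGR:  {cagr(ecosystem_gdp[0], ecosystem_gdp[-1], periods):.0%}")     # ~63%

When the second number consistently outruns the first, dependency is deepening. When it doesn't, you're still a product, not infrastructure.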

What this means in practice

The investment priorities shift. Documentation becomes infrastructure that scales adoption, not just marketing collateral. Developer evangelism turns into moat construction. Open APIs become the foundation of exponential growth, not nice-to-have features.

The companies making this work:

Open APIs early, tier access strategically. Free tier seeds the ecosystem. Paid tiers capture value as dependency grows.

Target developers, consultants, and agencies with evangelist programs. These multipliers bring entire networks with them.

Treat documentation as a growth engine. Clarity compounds adoption. Better docs pay off exponentially through ecosystem expansion.

Make it profitable for others to build on you. Affiliate revenue, marketplace access, or data sharing—whatever incentivizes ecosystem creation.

The uncomfortable truth

Building for ecosystem dependency means sharing control. Your roadmap gets influenced by what developers build. Your pricing faces pressure from what the ecosystem needs. Your product decisions have to serve builders, not just end users.

Most B2B companies resist this. They want to own the customer relationship, control the experience, and capture all the value. That worked when distribution was scarce and switching costs were high. It doesn't work when infrastructure becomes the moat.

The companies winning the next decade understand: you don't need to own everything. You need to become the layer that everything else depends on.

Your competitor is still counting customers. The smarter bet is counting dependency.

Feature Factories Build AI Wrappers, Product Orgs Build Moats

Every company can call the OpenAI API. Every developer can wrap Claude in a decent UI. Every product team can ship "AI-powered" features in a sprint or two.

The hard part isn't adding AI. It's building something users can't easily replicate elsewhere.

The Wrapper Trap

Integrating GPT-5 or Claude takes a weekend. Polish the UI, tune some prompts, add it to your feature list. Congratulations—you've built what a hundred competitors can build in the same timeframe.

In users' eyes, AI features commoditize instantly. If your "AI-powered writing assistant" offers the same value as ChatGPT with better instructions, why wouldn't they just use ChatGPT? If your meeting summarizer works like every other meeting summarizer, you're competing on price and distribution alone.

Grammarly spent years building writing data, style guides, and brand trust. Then ChatGPT offered similar writing assistance for free. Their remaining moat? Enterprise IT relationships and procurement inertia. That's the lesson: yesterday's data moats don't automatically transfer to the LLM era.

What Actually Creates Moats

Real differentiation comes from what you build around the AI:

Proprietary data flywheels. Your system gets smarter from user interactions—trained on their workflows, their terminology, their edge cases. The value compounds with usage and can't be copied by competitors starting from scratch. Real data moats require uniqueness, structure, and continuous feedback loops.

Deep workflow integration. AI embedded in tools users already depend on creates switching costs. GitHub Copilot works because it's native to VS Code, understands repo context, and fits developer workflow. A standalone AI code editor faces an uphill adoption battle.

Trust infrastructure. SOC2 compliance, HIPAA certification, audit trails, data residency guarantees. These take 6-12 months to build properly and can't be copied overnight. Enterprise buyers care more about this than model performance.

Domain expertise encoded. Generic prompts produce generic results. Real value comes from vertical knowledge baked into your system—industry terminology, regulatory requirements, workflow patterns that took months to map.

B2B Moats Look Different

Consumer AI wrappers commoditize in weeks. B2B wrappers have more time because of procurement cycles—but without real integration, they'll commoditize too. Just slower. The squeeze forces startups to turn early distribution into real defensibility before incumbents close the window.

System integration creates defensibility. APIs that connect to Salesforce, Jira, Slack, and internal tools take months to build and certify. Users don't switch because removal disrupts workflows across multiple systems.

Switching costs through team adoption matter more than individual user preferences. When entire teams are trained on your tool, share templates and configurations, collaborate through your platform—that's organizational lock-in, not just feature preference.

Professional services create procurement moats. If implementation requires weeks of configuration, change management, and custom integration work, you're selling transformation, not software. ChatGPT Enterprise can't replace that.

The Test

Can a user get identical value from ChatGPT Enterprise with custom instructions? If yes, you're in the wrapper business. And that's a race to zero margin.

Can a competitor replicate your core value in a quarter? If yes, your moat is brand and distribution, not product. Those erode faster in the AI era.

The window for experimentation without consequences is closing. Users are developing AI feature fatigue. Capital is flowing to companies with actual defensibility. The gap between wrapper products and moat products is widening every quarter.

The question isn't whether to ship AI features. It's whether you're building something users can only get from you.

What makes your AI defensible?

How to Build Product Sense

Everyone agrees product sense separates good PMs from great ones. Nobody can define what it actually means.

Here's the paradox: product sense feels like intuition, but it's built through systematic practice. It looks like magic, but it's earned through reflection, pattern recognition, and user empathy. The vagueness isn't because it's mystical—it's because it's contextual. What works in enterprise software fails in consumer apps. What matters in healthcare differs from fintech.

But certain principles hold. Here's how to strengthen product sense in any domain.

Make decisions with incomplete data

Every meaningful product decision happens with incomplete information. Perfect data doesn't exist, and waiting for it means losing momentum or missing insight.

The best PMs build pattern recognition by comparing current situations with past experiences, analyzing what worked and what failed, and using that history to decide faster next time. They make small, reversible bets—features they can test quickly or hypotheses they can validate with minimal cost.

Each decision, even wrong ones, refines your mental model. You learn what "good" looks like by making calls and watching what happens. Wrong decisions teach you more than safe ones.

Watch users struggle

Good intuition about products comes from unusually deep understanding of users. You can't build product sense by staring at dashboards. You have to see real people use your product and struggle with it.

Sit in on support calls. Watch user sessions. Read customer feedback without defensiveness. Ask: "What job is this user trying to get done?"—a core principle from the Jobs To Be Done framework.

The real insight comes from what users don't say. Notice the moments of confusion, the workarounds, the features they use in ways you didn't expect. When you internalize those observations, your intuition becomes sharper and more empathetic.

Start with curiosity, not confirmation. You're looking for truth, not validation.

Solve problems, not symptoms

Many teams rush to solutioning, mistaking activity for progress. Strong product sense means slowing down to define the real problem before moving to design.

Ask "why" until you reach a root cause. If users churn after onboarding, the issue might not be your onboarding flow—it might be that users didn't see value in the first place. Frameworks like the 5 Whys or Opportunity Solution Trees help trace symptoms back to underlying needs.

If you can't describe the user pain in one sentence, you don't understand it yet.

Think in trade-offs

Product sense isn't only about users. It's about understanding how daily decisions shape long-term direction. Every roadmap choice reflects strategy, even the small ones.

Before saying yes to a feature, ask: "If we do this, what becomes easier or harder later?" That question forces alignment between tactical work and strategic intent. Sometimes that means saying no to a customer request that fits today's demand but erodes tomorrow's flexibility.

Strong product sense turns intuition into discipline—connecting the dots between now and next.

The path forward

Product sense is practical wisdom built through reflection, user contact, and repeated decision-making. It's not design taste or technical expertise, though both help.

To strengthen it: stay curious, test your assumptions, seek feedback constantly. Over time, the patterns you see become instinctive. Your decisions feel less like guesses and more like grounded judgment.

Great product sense isn't magic. It's earned.

You're Not an AI User, You're an AI Manager

A year ago, an engineer typed code into an IDE. Maybe GitHub Copilot suggested lines. Maybe they asked ChatGPT for help.

Today, that same engineer prompts an agent to write substantial chunks of code, then reviews what comes back. The work that used to take days now takes hours.

The job didn't disappear. It became something fundamentally different.

Aaron Levie, Box CEO, puts it directly: "The job of an individual contributor really begins to change because you are now a manager of agents."

The Management Shift

The old knowledge work loop: receive task, execute task, deliver output.

The new loop:

  1. Receive objective
  2. Decompose into agent-appropriate chunks
  3. Allocate chunks to agents or self
  4. Review agent outputs
  5. Integrate, iterate, or reject
  6. Deliver outcome

Every step except execution has become a management function. You're no longer measured on how fast you execute. You're measured on how well you allocate and how accurately you evaluate.

This shift requires skills most ICs haven't developed: decomposition, allocation, review velocity, and orchestration. These are management skills—planning, resource management, quality control, and workflow design.

When you're reviewing 10x more output than you used to produce, your error rate needs to drop proportionally, or you're introducing more mistakes, not fewer. You need explicit quality benchmarks, fast feedback loops, and clear prioritization.
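
The arithmetic is unforgiving. A rough sketch, with invented numbers, shows why per-item review rigor has to tighten as volume grows:

    # Reviewing 10x more output with the same miss rate ships 10x the defects.
    # All numbers are invented for illustration.
    def shipped_defects(items_reviewed, defect_rate, review_miss_rate):
        # Defects that slip past review into the final deliverable.
        return items_reviewed * defect_rate * review_miss_rate

    defect_rate = 0.05   # 5% of generated items contain a defect

    before = shipped_defects(items_reviewed=50, defect_rate=defect_rate, review_miss_rate=0.20)
    after_same_rigor = shipped_defects(500, defect_rate, 0.20)      # 10x volume, same rigor
    after_tighter_review = shipped_defects(500, defect_rate, 0.02)  # miss rate must fall ~10x

    print(before, after_same_rigor, after_tighter_review)   # 0.5  5.0  0.5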

Why Work Expands Instead of Disappearing

When you make knowledge work more efficient, the intuitive response is: we'll need fewer people.

That's not how it plays out. When work becomes cheaper and faster, you don't do less—you do more.

At Box, if lawyers can review contracts twice as fast, the company doesn't cut the legal team. They review every contract faster, respond to customers faster, close more deals—which creates more legal work, not less. The bottleneck moves. It doesn't disappear.

This is Jevons Paradox applied to knowledge work: when you make a resource more efficient, demand expands to consume the new capacity.

Spreadsheets didn't reduce accounting jobs—they created more accountants. Photoshop didn't shrink design work—it exploded the number of designers. AI won't reduce knowledge work jobs—it'll expand what gets built.

The work that becomes economical at new price points opens entirely new categories of demand. You couldn't afford a second legal review before. Now you can run contracts through AI at a fraction of the cost. That's not replacing lawyers; it's expanding the total market.

What Changes Now

Start thinking like an allocator. Audit your tasks: which are you doing because they're important versus because they're "your job"? The second category is your target for agent allocation.

Build review systems. Define what "good enough" looks like. How do you evaluate outputs quickly? What triggers iteration versus rejection?

Practice decomposition. Break complex projects into chunks that could be handed to agents. What can run in parallel? Where do you need human judgment?
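
As a toy illustration, with invented task names and routing rules, a decomposition pass captured in code might be no more than this:

    # Toy sketch of a decomposition pass: route chunks of a project to agents or
    # to yourself, and note what can run in parallel. Tasks and rules are made up.
    tasks = [
        # (name, needs_human_judgment, parallelizable)
        ("draft API integration boilerplate", False, True),
        ("write migration script for legacy data", False, True),
        ("decide deprecation policy for old endpoints", True, False),
        ("summarize customer feedback into themes", False, True),
        ("negotiate scope with the platform team", True, False),
    ]

    agent_queue = [name for name, judgment, _ in tasks if not judgment]
    human_queue = [name for name, judgment, _ in tasks if judgment]
    parallel = [name for name, judgment, par in tasks if not judgment and par]

    print("Delegate to agents:", agent_queue)
    print("Keep for yourself: ", human_queue)
    print("Can run in parallel:", parallel)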

Small startups with no legacy workflows are designing around this from day one. They're prompt-driven, spec-driven, agent-reviewing. They operate in ways that larger companies with established processes can't match yet.

That's not about technology. It's about workflow.

The Transformation

You're not becoming an AI user. You're becoming an AI manager.

That role requires judgment, prioritization, quality evaluation, and workflow orchestration. It's less about executing tasks yourself and more about deciding which tasks matter, allocating resources efficiently, and integrating outputs into business value.

The skill that matters most is changing from "how well do you execute this task" to "how well do you decide which tasks matter and allocate intelligence accordingly."

The shift is already happening in engineering. It will ripple through every knowledge work domain over the next few years.

Traffic Metrics Are Lying to You

Your traffic is down. Your growth team is panicking. And your product metrics might be telling you absolutely nothing useful.

Kyle Poyar's 2025 State of B2B GTM report uncovered something fascinating: Webflow's aggregate traffic is declining while their business is accelerating. ChatGPT referrals convert at 24% compared to 4% from Google. Two-thirds convert within 7 days.

This isn't a Webflow-specific anomaly. It's what happens when AI search reshapes discovery.

The death of aggregate traffic as a north star

Google AI overviews are fundamentally changing what traffic even means. Low-intent, high-volume queries that used to pad your metrics are vanishing into AI-generated answer boxes. The traffic that remains is radically higher quality.

"A lot of our lower value and lower intent traffic has gone down, but there's higher quality traffic occurring even as the aggregate declines," Josh Grant, Webflow's VP of Growth, told Poyar.

Aggregate traffic is completely misleading without a quality metric. You're not measuring growth. You're measuring noise. This is a symptom of a broader problem with how metrics fail to capture what actually matters.

The new metrics layer: Visibility, comprehension, conversion

If traditional traffic metrics don't work, what does? Webflow built a three-layer framework for AI discovery:

Visibility: How often you're cited in AI search results. Not impressions or rankings. Citations. They track this across ChatGPT, Perplexity, and Claude using tools like Profound.

Comprehension: How accurately AI models describe your product versus competitors. Grant's team prompts multiple LLMs side by side to audit their narrative. If the description is wrong, they know where to improve.

Conversion: Signup rates and time-to-conversion from LLM-referred traffic. High-intent traffic doesn't just convert better. It converts faster.

Traditional SEO dashboards track rankings and clicks. This framework tracks whether AI systems understand, trust, and recommend you.
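
The conversion layer, at least, is easy to start measuring with data most teams already have. Here is a minimal sketch, with invented events and field names, of comparing LLM-referred and search-referred traffic:

    # Compare signup rate and time-to-convert by referral source.
    # The visits data and field layout are invented for illustration.
    from statistics import median

    visits = [
        # (source, signed_up, days_to_signup or None)
        ("chatgpt", True, 2), ("chatgpt", True, 5), ("chatgpt", False, None),
        ("google", True, 21), ("google", False, None), ("google", False, None),
    ]

    def source_stats(source):
        rows = [v for v in visits if v[0] == source]
        converted = [v for v in rows if v[1]]
        rate = len(converted) / len(rows)
        days = median(v[2] for v in converted) if converted else None
        return rate, days

    for source in ("chatgpt", "google"):
        rate, days = source_stats(source)
        print(f"{source}: conversion {rate:.0%}, median days to convert: {days}")

Visibility and comprehension take more tooling, but this split alone tells you whether the quality shift Webflow describes is showing up in your own funnel.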

What this means for product teams

This isn't just a marketing problem. Your product's narrative has to work for both humans and AI models. How ChatGPT describes your product when users aren't searching for you by name is your new positioning test.

The metrics you're optimizing for might be pushing you in the wrong direction. Volume-based goals (MAU, traffic, impressions) reward low-quality interactions. Quality-based goals (conversion rate, time-to-convert, citation frequency) reward relevance and trust. Instead of chasing traffic, map your work by customer value and business value to reveal what's actually moving the needle.

AI discovery is volatile, not fixed like Google rankings. Grant's observation: "Every query is a fresh model run that reshuffles sources in real time based on context, trust, and recency." You can't optimize once and coast.

Teams treating AI discovery as optional or as a one-time project will spend the next year explaining why their metrics look strong but their pipeline has dried up.

The question you should be asking

If aggregate traffic is misleading, what quality metrics are you tracking today? If you're not tracking quality separately from volume, how do you know whether your growth is real or just noise?

The shift from traffic to intent, from volume to quality, from rankings to comprehension is not a future state. It's happening now. Webflow's data proves it. The question is whether your metrics can see it.