AI PM · 7 min read

The AI PM Tech Stack I Use in 2025 (And Why Each Tool Earns Its Place)

The exact tools, APIs, and frameworks I use as an AI PM building real LLM-powered products. Not a list of buzzwords — a working stack with specific reasons for every choice.

Every "AI tools for PMs" list I've read is either a recycled list of ChatGPT, Notion AI, and Midjourney — or a VC's wish list of enterprise software. Neither is useful.

This is the actual stack I use as of mid-2025, with the specific reason each tool is in it and what I tried first that didn't make the cut.

The framing: three layers

I think about my stack in three layers:

  1. Understanding — research, synthesis, thinking
  2. Building — designing and shipping AI-powered products
  3. Operating — running experiments, measuring, iterating

Most "AI PM tools" lists only cover layer 1. If you're actually building AI products — not just using AI for note-taking — you need all three.

Layer 1: Understanding

Claude (Anthropic) — Primary LLM

Best instruction-following of any model I've tested, especially for structured output. When I need a PRD section, competitive analysis in a specific format, or user stories that follow a schema, Claude returns it clean.

How I actually use it:

  • First-draft PRD sections from bullet notes
  • Synthesizing user research transcripts into themes
  • Stress-testing product reasoning ("argue against this feature decision")
  • Generating edge cases I haven't thought of

Perplexity — Research

Real-time web search plus synthesis. For competitive research, market sizing checks, and "what's the current state of [technology]" questions, it is faster than running a search → read → synthesize loop by hand.

NotebookLM — Document synthesis

When I have 10 PDFs or product docs and need to ask questions across all of them, NotebookLM is unmatched. I load a competitor's entire documentation site and interrogate it.

Layer 2: Building

Claude API — LLM backbone

When I'm building something like InstantPlan, Claude is the backbone. I default to Claude Sonnet for most tasks, Claude Opus for deep reasoning or evaluation. I avoid using the most powerful model by default — cost compounds fast.
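To make the "cost compounds fast" point concrete, here is a minimal sketch of tier routing with a rough cost estimate. The tier labels, the task names, and especially the prices are placeholders invented for illustration, not real model IDs or real pricing; swap in your provider's current values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    usd_per_million_input: float
    usd_per_million_output: float


# Placeholder labels and prices (assumptions, not real pricing).
DEFAULT_TIER = ModelTier("sonnet-tier", 1.0, 5.0)
DEEP_TIER = ModelTier("opus-tier", 5.0, 25.0)

# Hypothetical task labels that justify the more expensive tier.
DEEP_TASKS = {"deep_reasoning", "evaluation"}


def pick_tier(task: str) -> ModelTier:
    """Use the cheaper tier unless the task is explicitly marked as deep."""
    return DEEP_TIER if task in DEEP_TASKS else DEFAULT_TIER


def estimate_cost(tier: ModelTier, input_tokens: int, output_tokens: int, calls: int) -> float:
    """Estimated USD cost for `calls` requests of the given size."""
    per_call = (
        input_tokens * tier.usd_per_million_input
        + output_tokens * tier.usd_per_million_output
    ) / 1_000_000
    return per_call * calls


if __name__ == "__main__":
    for task in ("summarize", "evaluation"):
        tier = pick_tier(task)
        print(task, tier.name, estimate_cost(tier, 2000, 500, 1000))
```

With these placeholder prices the same 1,000 calls cost several times more on the deep tier, which is the whole argument for not defaulting to it.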

n8n — Agent orchestration and automation

The best non-code-heavy way to build multi-agent pipelines. I use n8n to orchestrate LLM workflows: trigger → fetch data → process with LLM → format output → deliver.

What I've built with it:

  • Content repurposing pipeline: LinkedIn post → email → blog outline
  • Lead research automation for Zerton
  • Listen2RE content ingestion and processing pipeline

Why not Zapier or Make: n8n is self-hostable, has direct HTTP request nodes, and LLM integration is first-class. For serious automation, it's the better choice.
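For readers who think in code, the trigger → fetch data → process with LLM → format output → deliver shape can be sketched in a few lines of Python. This is not n8n's API or an export of any real workflow. Every function name here is hypothetical, the LLM is a stub, and "deliver" just appends to a list.

```python
from typing import Callable

# A step takes the running payload (a dict) and returns an updated payload.
Step = Callable[[dict], dict]


def run_pipeline(payload: dict, steps: list[Step]) -> dict:
    """Pass the trigger payload through each step in order."""
    for step in steps:
        payload = step(payload)
    return payload


def fetch_data(payload: dict) -> dict:
    # Stand-in for an HTTP request or database read.
    return {**payload, "source_text": f"Post about {payload['topic']}"}


def make_llm_step(llm: Callable[[str], str]) -> Step:
    def process(payload: dict) -> dict:
        prompt = f"Turn this into a blog outline:\n{payload['source_text']}"
        return {**payload, "llm_output": llm(prompt)}
    return process


def format_output(payload: dict) -> dict:
    return {**payload, "formatted": payload["llm_output"].strip()}


def make_deliver_step(outbox: list) -> Step:
    def deliver(payload: dict) -> dict:
        # Stand-in for sending an email, posting to Slack, and so on.
        outbox.append(payload["formatted"])
        return payload
    return deliver


def fake_llm(prompt: str) -> str:
    # Stub so the sketch runs without network access or an API key.
    return "  1. Intro\n2. Main point\n3. Takeaway  "


if __name__ == "__main__":
    outbox: list = []
    steps = [fetch_data, make_llm_step(fake_llm), format_output, make_deliver_step(outbox)]
    run_pipeline({"topic": "AI PM stacks"}, steps)
    print(outbox[0])
```

Each function corresponds roughly to one node in a visual workflow. Passing the LLM in as a plain callable makes it easy to swap the stub for a real API call later.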

Cursor / Claude Code — Implementation

I'm a PM, not an engineer. But I can direct AI to implement things I've designed. Cursor with Claude lets me build production Next.js code through conversation. Claude Code goes further — it can execute, test, and iterate autonomously.

I review everything. I understand the structure even if I couldn't write it from scratch.

Vercel — Deployment

Zero-config deployment for Next.js. Push to GitHub, site is live. Preview deployments per branch are invaluable for testing before shipping.

Supabase — Database for AI apps

Postgres with a generous free tier, built-in auth, and a REST API out of the box. For AI products that need to store conversation history or evaluation results, Supabase is the fastest path to production.

Layer 3: Operating

PostHog — Product analytics

Open source, self-hostable, generous free tier. Session recording is the fastest way to understand where users get confused.

What I track for AI features:

  • Generation trigger rate (how often the AI feature is used)
  • Acceptance rate (output kept vs regenerated)
  • Time-to-first-value
  • Drop-off point
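The first two metrics are simple ratios once the events are exported. Below is a minimal sketch that assumes a flat list of event dicts. The event names ("generation_triggered", "output_accepted", "output_regenerated") and the exact definitions are assumptions for illustration, not PostHog's schema or API.

```python
def ai_feature_metrics(events: list[dict]) -> dict:
    """Compute trigger rate and acceptance rate from raw events.

    Assumed definitions:
      trigger_rate    = sessions with at least one generation / all sessions
      acceptance_rate = accepted outputs / (accepted + regenerated outputs)
    """
    sessions = {e["session_id"] for e in events}
    triggered = {
        e["session_id"] for e in events if e["event"] == "generation_triggered"
    }
    accepted = sum(1 for e in events if e["event"] == "output_accepted")
    regenerated = sum(1 for e in events if e["event"] == "output_regenerated")
    decided = accepted + regenerated

    return {
        "trigger_rate": len(triggered) / len(sessions) if sessions else 0.0,
        "acceptance_rate": accepted / decided if decided else 0.0,
    }


if __name__ == "__main__":
    sample = [
        {"session_id": "a", "event": "generation_triggered"},
        {"session_id": "a", "event": "output_accepted"},
        {"session_id": "b", "event": "page_view"},
    ]
    print(ai_feature_metrics(sample))
```

Counting acceptance per output rather than per session is a design choice: a user who regenerates three times before keeping a result should pull the rate down.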

LLM-as-a-Judge (custom) — AI output quality

For AI products, you need a way to evaluate output quality at scale. One LLM generates output, another evaluates it against criteria. This is how I track quality without manual review of every response.

Evaluation dimensions I use:

  • Relevance
  • Completeness
  • Format adherence
  • Hallucination risk

What's NOT in my stack (and why)

GitHub Copilot — I use Claude Code instead. Better for my workflow.

Notion AI — I use Notion for docs but write with Claude in a separate tab and paste results in. Native Notion AI doesn't match Claude's output quality.

Jasper / Copy.ai — Outdated category. Claude does everything these do, better.

LangChain — Too heavy for most of what I build. n8n covers 80% of my orchestration needs without the Python complexity.

The one-sentence version

Claude for thinking, n8n for orchestration, Cursor/Claude Code for building, Vercel for shipping, PostHog for learning.

Everything else is context-specific.


Sujit Chankhore is an AI Product Manager and founder based in Pune, India. Open to Senior AI PM roles globally.

