Embracing ProseMirror: Why other rich text editors felt like fighting the framework.

If you've ever tried to build a rich text editor into a web app, you already know the feeling. You find a library, follow the quickstart, marvel at your beautiful bold text, and then three weeks later you're reading GitHub issues at 2 AM trying to figure out why pasting from Google Docs inserts seven invisible zero-width spaces and your cursor teleports into the void. Welcome to contentEditable.

Rich text editing on the web is a problem that has humbled billion-dollar companies. Google built their own rendering engine for Docs because they gave up on contentEditable. Notion built a custom block system on top of it. Slack rewrote their editor so many times they probably considered carrier pigeons. And yet here we are, in 2025, still debating which library wraps this cursed browser API most gracefully.

I've spent an unreasonable amount of time in this space — evaluating, prototyping, and eventually building a block-based, Notion-style editor. I also needed something that could work on React Native, sharing the document model and core editing logic between web and mobile. After looking at every reasonable option (and a few unreasonable ones), I landed on ProseMirror. Not because it was easy. Because it was right.

The Landscape: A Buffet of Trade-offs

Let me set the scene. If you want to build a rich text editor in JavaScript today, you're choosing between roughly three tiers.

The batteries-included tier gives you BlockNote, CKEditor, TinyMCE, or Editor.js. You drop in a component, tweak some config, and get a working editor with toolbars and slash menus. This is great for content management systems where the editing experience isn't your core product. It's less great when your PM asks "can we make the drag handle do something slightly different" and the answer is "no, not without forking the library."

The mid-abstraction tier gives you Tiptap, Remirror, Plate, and Lexical. These wrap a lower-level engine (ProseMirror for Tiptap and Remirror, Slate for Plate, and Lexical is its own thing from Meta). They provide extension systems, React integrations, and a developer experience that won't make you cry on day one. Tiptap is probably the most popular choice here, and for good reason — it's well-documented and genuinely pleasant to use.

The engine tier is ProseMirror and Slate. These are the foundations that the mid-tier libraries build on. No UI. No toolbar. No opinions about what your editor should look like. Just a document model, a state management system, and a rendering layer.

Most people would — and probably should — start with Tiptap or BlockNote. So why did I go straight to the engine?

The Moment Tiptap Stopped Being Enough

I don't want to bash Tiptap. I genuinely like it. The extension API is elegant, the documentation is solid, and for 80% of rich text editing use cases, it's the right call. Their StarterKit gets you headings, lists, code blocks, and basic marks in roughly six lines of code:

import { useEditor } from '@tiptap/react'
import StarterKit from '@tiptap/starter-kit'

const editor = useEditor({
  extensions: [StarterKit],
  content: '<p>Hello, world.</p>',
})

Beautiful. Ship it. Go home early.

But I wasn't building "80% of use cases." I was building a block-based editor where every block is a first-class entity with its own type, attributes, drag handles, nesting behavior, and potentially a completely custom rendering pipeline. Think Notion, but with domain-specific block types that don't exist in any extension marketplace.

With Tiptap, the moment you step outside the extension system, you're writing ProseMirror code anyway. You're defining NodeSpec objects, writing plugin StateField logic, building custom NodeView classes. Tiptap's abstraction becomes a middleman — sometimes helpful, sometimes just another layer to debug through. When I found myself routinely reaching past Tiptap into its ProseMirror internals, I realized I was paying the complexity cost of two APIs without the benefit of either.

The Liveblocks team put it perfectly in their editor comparison: "Unless you're a purist, masochist, or both, we recommend starting with one of the excellent ProseMirror-based editors." Fair enough. But if you're going to be modifying those editors at the ProseMirror level anyway — and you will, if your editor is ambitious enough — you might as well understand what's underneath.

What Makes ProseMirror Different

The thing that separates ProseMirror from everything else is its architecture. Most editors give you an API shaped like "call this function and stuff happens." ProseMirror gives you a system shaped like "here's how documents work, here's how changes work, here's how rendering works — now compose them however you want."

The core modules tell you everything:

prosemirror-model: The document is a tree of typed nodes, described by a schema you define. No magic, no guessing. You say "a doc contains block+ nodes, a paragraph contains inline* content, a heading has a level attribute from 1 to 6." The schema enforces this at all times.
prosemirror-state: Editor state is immutable. You don't mutate the document; you create a Transaction describing what changed, and apply it to get a new state. If this sounds like Redux, that's because it basically is. Marijn Haverbeke figured out unidirectional data flow for editors before half the React ecosystem caught on.
prosemirror-view: The view renders state to the DOM and translates DOM events back into transactions. It's the bridge between your pure data model and the browser's chaotic contentEditable behavior.
prosemirror-transform: Steps are the atomic unit of document changes. They can be serialized, sent over a wire, rebased, and replayed. This is what makes collaborative editing possible: the prosemirror-collab module builds on steps with a central authority that orders changes, an approach related to (though not identical to) the operational transformation behind Google Docs.
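
To make the state-and-transaction loop concrete, here's a toy model in plain TypeScript. It is not ProseMirror's API: ToyState, ToyTransaction, and applyTransaction are names I made up for this sketch, and the "document" is just a string. The shape is the point. State is never mutated, a transaction is a list of plain-data steps, and applying it hands you a new state.

```typescript
// Toy model only: these types are hypothetical and far simpler than ProseMirror's.
type ToyStep =
  | { type: 'insert'; pos: number; text: string }
  | { type: 'delete'; from: number; to: number }

interface ToyState {
  readonly doc: string
  readonly version: number
}

interface ToyTransaction {
  readonly steps: readonly ToyStep[]
}

// Apply one step to a document string, returning a new string.
function applyStep(doc: string, step: ToyStep): string {
  if (step.type === 'insert') {
    return doc.slice(0, step.pos) + step.text + doc.slice(step.pos)
  }
  return doc.slice(0, step.from) + doc.slice(step.to)
}

// Produce the next state; the previous state object is left untouched.
function applyTransaction(state: ToyState, tr: ToyTransaction): ToyState {
  const doc = tr.steps.reduce(applyStep, state.doc)
  return Object.freeze({ doc, version: state.version + 1 })
}

const prevState: ToyState = Object.freeze({ doc: 'Hello world', version: 0 })
const toyTr: ToyTransaction = {
  steps: [
    { type: 'insert', pos: 5, text: ',' },
    { type: 'delete', from: 6, to: 12 },
  ],
}
const nextState = applyTransaction(prevState, toyTr)

// Steps are plain data, so they survive a JSON round trip (say, over a socket).
const replayedState = applyTransaction(prevState, JSON.parse(JSON.stringify(toyTr)))
```

The real thing works on a typed tree rather than a string, but the contract is the same: old state in, transaction in, new state out.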

Here's what a basic ProseMirror schema for a block-based editor looks like:

import { Schema } from 'prosemirror-model'

const schema = new Schema({
  nodes: {
    doc: { content: 'block+' },
    paragraph: {
      group: 'block',
      content: 'inline*',
      toDOM: () => ['p', 0],
      parseDOM: [{ tag: 'p' }],
    },
    blockquote: {
      group: 'block',
      content: 'block+',
      toDOM: () => ['blockquote', 0],
      parseDOM: [{ tag: 'blockquote' }],
    },
    heading: {
      group: 'block',
      content: 'inline*',
      attrs: { level: { default: 1, validate: 'number' } },
      toDOM: (node) => [`h${node.attrs.level}`, 0],
      parseDOM: [1, 2, 3, 4, 5, 6].map((level) => ({
        tag: `h${level}`,
        attrs: { level },
      })),
    },
    text: { group: 'inline' },
  },
  marks: {
    bold: {
      toDOM: () => ['strong', 0],
      parseDOM: [{ tag: 'strong' }],
    },
    italic: {
      toDOM: () => ['em', 0],
      parseDOM: [{ tag: 'em' }],
    },
    link: {
      attrs: { href: { validate: 'string' } },
      toDOM: (mark) => ['a', { href: mark.attrs.href }, 0],
      parseDOM: [{ tag: 'a[href]', getAttrs: (dom) => ({
        href: dom.getAttribute('href'),
      })}],
    },
  },
})

Yes, this is more code than extensions: [StarterKit]. Substantially more. But look at what you get: a complete, explicit description of every node your document can contain, how they nest, how they serialize to the DOM, and how they're parsed back. Nothing is hidden behind a magic extension. Nothing surprises you at runtime.
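
If "the schema enforces this" feels abstract, here's a dependency-free sketch of the idea. It is a toy, not ProseMirror's content-matching machinery: ToyNodeSpec, ToyNode, and validateToyNode are hypothetical names, and it only understands the two simplest content expressions, "group+" and "group*".

```typescript
// Toy model only: supports just "<group>+" and "<group>*" content expressions.
interface ToyNodeSpec {
  group?: string
  content?: string // e.g. 'block+' or 'inline*'; omitted means "no children"
}

interface ToyNode {
  type: string
  children?: ToyNode[]
}

const toySpecs: Record<string, ToyNodeSpec> = {
  doc: { content: 'block+' },
  paragraph: { group: 'block', content: 'inline*' },
  blockquote: { group: 'block', content: 'block+' },
  text: { group: 'inline' },
}

// Return a list of violations; an empty list means the tree is valid.
function validateToyNode(node: ToyNode, specs: Record<string, ToyNodeSpec>): string[] {
  const spec = specs[node.type]
  if (!spec) return [`unknown node type "${node.type}"`]

  const children = node.children ?? []
  const errors: string[] = []

  if (!spec.content) {
    if (children.length > 0) errors.push(`"${node.type}" cannot have children`)
    return errors
  }

  const group = spec.content.slice(0, -1) // 'block+' -> 'block'
  if (spec.content.endsWith('+') && children.length === 0) {
    errors.push(`"${node.type}" needs at least one ${group} child`)
  }
  for (const child of children) {
    if (specs[child.type]?.group !== group) {
      errors.push(`"${child.type}" is not allowed inside "${node.type}"`)
    }
    errors.push(...validateToyNode(child, specs))
  }
  return errors
}

const validToyDoc: ToyNode = {
  type: 'doc',
  children: [{ type: 'paragraph', children: [{ type: 'text' }] }],
}
const invalidToyDoc: ToyNode = {
  type: 'doc',
  children: [{ type: 'text' }, { type: 'blockquote', children: [] }],
}

const validErrors = validateToyNode(validToyDoc, toySpecs)
const invalidErrors = validateToyNode(invalidToyDoc, toySpecs)
```

The second document fails twice: bare text directly under doc, and an empty blockquote. A schema-driven editor is designed so that documents like that never get constructed in the first place.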

The Plugin System: Where It Gets Good

ProseMirror's plugin system is where the architecture really shines. A plugin can:

Maintain its own state field that updates with every transaction
Intercept and filter transactions before they're applied
Append follow-up transactions after an edit
Add decorations (visual overlays that don't touch the document)
Provide custom NodeView renderers for specific node types

Here's a plugin that keeps a running word count for the document, just to show the pattern:

import { Plugin, PluginKey } from 'prosemirror-state'

const wordCountKey = new PluginKey('wordCount')

const wordCountPlugin = new Plugin({
  key: wordCountKey,
  state: {
    init(_, state) {
      return countWords(state.doc)
    },
    apply(tr, value, _, newState) {
      if (tr.docChanged) {
        return countWords(newState.doc)
      }
      return value
    },
  },
})

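function countWords(doc) {
  // Join text across inline nodes first, so a word that is only partly bold
  // (and therefore split over two text nodes) still counts once.
  // The ' ' separator keeps words in neighboring blocks from running together.
  const text = doc.textBetween(0, doc.content.size, ' ')
  return text.split(/\s+/).filter(Boolean).length
}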

// Later, anywhere you have access to state:
const count = wordCountKey.getState(editorState)

The state field in a plugin is the key insight. Each plugin gets its own slice of the state, updated on every transaction. This is how you build features like change tracking, cursor position indicators, selection-aware toolbars, or collaborative editing awareness — as composable, isolated units that don't interfere with each other.

Compare this to Lexical, where extending behavior means subclassing node types and registering update listeners that fire imperatively. Or Slate, where plugins are essentially middleware functions that mutate the editor object. ProseMirror's approach is more verbose, but it's also more predictable. You always know when state changes, why it changed, and what the new state looks like.
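
The filtering and follow-up hooks from the capability list earlier are easier to picture with a toy pipeline. This is a deliberately tiny model in plain TypeScript, not ProseMirror's Plugin API: MiniTr, MiniPlugin, and dispatchMini are hypothetical names, a "transaction" can only append text, and the two example plugins are invented for illustration.

```typescript
// Toy model only: hypothetical names, not ProseMirror's Plugin API.
interface MiniTr { insert: string } // a "transaction" that appends text
interface MiniState { readonly doc: string }

interface MiniPlugin {
  filter?: (tr: MiniTr, state: MiniState) => boolean // false vetoes the transaction
  append?: (tr: MiniTr, newState: MiniState) => MiniTr | null // optional follow-up
}

function dispatchMini(state: MiniState, tr: MiniTr, plugins: MiniPlugin[]): MiniState {
  // 1. Any plugin may reject the transaction outright.
  if (plugins.some((p) => p.filter && !p.filter(tr, state))) return state

  // 2. Apply it to get a new state object.
  let next: MiniState = { doc: state.doc + tr.insert }

  // 3. Plugins may react with follow-ups, which go through the same pipeline.
  for (const p of plugins) {
    const followUp = p.append ? p.append(tr, next) : null
    if (followUp) next = dispatchMini(next, followUp, plugins)
  }
  return next
}

// Example plugin: refuse edits that would push the doc past 10 characters.
const maxLength: MiniPlugin = {
  filter: (tr, state) => state.doc.length + tr.insert.length <= 10,
}

// Example plugin: after the user types a period, append a space.
const spaceAfterPeriod: MiniPlugin = {
  append: (tr) => (tr.insert.endsWith('.') ? { insert: ' ' } : null),
}

const miniPlugins = [maxLength, spaceAfterPeriod]
const miniS1 = dispatchMini({ doc: '' }, { insert: 'Hi.' }, miniPlugins)
const miniS2 = dispatchMini(miniS1, { insert: 'this is too long' }, miniPlugins)
```

After the first dispatch the doc is "Hi. " (the follow-up added the space). The second dispatch is vetoed, so the caller gets the very same state object back. Every change, including the automatic ones, passes through one visible choke point.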

NodeViews: Custom Rendering Without the Pain

When you need a block that renders as something more than just HTML — say, an embedded code editor, an image with resize handles, or a custom widget — ProseMirror gives you NodeView. A NodeView is an object that takes over rendering for a specific node type. You provide a dom element, optionally a contentDOM for editable content, and lifecycle hooks for updates and cleanup.

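import type { Node as ProseMirrorNode } from 'prosemirror-model'
import type { EditorView, NodeView } from 'prosemirror-view'

class ImageView implements NodeView {
  dom: HTMLElement
  private img: HTMLImageElement
  private caption: HTMLElement

  constructor(
    node: ProseMirrorNode,
    view: EditorView,
    getPos: () => number | undefined,
  ) {
    // <figure> is the valid parent element for a <figcaption>
    this.dom = document.createElement('figure')
    this.dom.className = 'image-block'

    this.img = document.createElement('img')
    this.dom.appendChild(this.img)

    this.caption = document.createElement('figcaption')
    this.dom.appendChild(this.caption)

    this.render(node)
  }

  // Copy the node's attributes onto the DOM elements
  private render(node: ProseMirrorNode) {
    this.img.src = node.attrs.src
    this.img.alt = node.attrs.alt || ''
    this.caption.textContent = node.attrs.caption || ''
  }

  update(node: ProseMirrorNode) {
    // Returning false tells ProseMirror to throw this view away and build a new one
    if (node.type.name !== 'image') return false
    this.render(node)
    return true
  }

  destroy() {
    // Cleanup: remove event listeners, unmount framework roots, etc.
  }
}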

This is also how you plug React components into ProseMirror nodes: your NodeView creates a DOM container, and you mount a React component into it with createRoot (and unmount it again in destroy). It's not zero-config, but it gives you complete control over the rendering lifecycle without fighting the framework.

Tiptap does offer this through its ReactNodeViewRenderer, and it works well for simple cases. But when you need fine-grained control over update batching, selection handling inside custom views, or interop between ProseMirror's transaction system and React's state — you end up needing to understand the NodeView protocol anyway.

Why Not Slate or Lexical?

I evaluated both seriously.

Slate has an elegant API, and its React integration is genuinely nice. The Plate ecosystem on top of it is impressively comprehensive. But Slate's document model is a plain JSON tree with no schema enforcement at the framework level. You can create structurally invalid documents, and Slate will happily render them until something breaks in a confusing way. Slate also doesn't have ProseMirror's step-based transform system, which means collaborative editing requires bolting on something like slate-yjs and hoping for the best.

Lexical is backed by Meta and has strong momentum. But it's still pre-1.0, and it shows. The lack of pure decorations — visual overlays that don't modify the document — is a real limitation. Want collaborative cursors? You're manually positioning div elements on top of the text and recalculating their position on scroll and resize. ProseMirror solves this with Decoration.widget and Decoration.inline, which are anchored to document positions and reflow automatically. That's not a minor convenience — it's an architectural difference that affects every feature you build.

Lexical also hardcodes assumptions about document structure (like the root node name) that make it difficult to have multiple editor instances in one Yjs document. When Liveblocks spent months integrating with Lexical, their takeaway was diplomatic but clear: it "needs more time to mature."

Quill deserves a mention for its 47k GitHub stars and its use by companies like Slack and Figma. But it also lacks pure decorations, and its plugin ecosystem hasn't fully caught up to the Quill 2 rewrite. For a greenfield project in 2025, there are better options.
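
The "anchored to document positions" property I praised in decorations comes down to position mapping: every change knows how to translate an old position into a new one. Here's a toy version in plain TypeScript. ToyChange and mapPosition are hypothetical names, and a flat character offset stands in for a real document position. In ProseMirror itself the equivalent call is tr.mapping.map(pos).

```typescript
// Toy model only: a flat character offset stands in for a document position.
type ToyChange =
  | { type: 'insert'; pos: number; length: number }
  | { type: 'delete'; from: number; to: number }

// Map a position from "before the change" to "after the change".
function mapPosition(pos: number, change: ToyChange): number {
  if (change.type === 'insert') {
    // Content inserted at or before the position pushes it to the right.
    return change.pos <= pos ? pos + change.length : pos
  }
  if (pos <= change.from) return pos // before the deleted range: unaffected
  if (pos >= change.to) return pos - (change.to - change.from) // after it: shift left
  return change.from // inside it: collapse to where the range used to start
}

// A remote collaborator's cursor sits at offset 12.
const remoteCursor: number = 12

const localEdits: ToyChange[] = [
  { type: 'insert', pos: 0, length: 5 }, // local user types 5 characters at the start
  { type: 'delete', from: 20, to: 30 }, // then deletes a range after the cursor
  { type: 'delete', from: 10, to: 20 }, // then deletes a range around the cursor
]

// Run the cursor through each change in order.
const mappedCursor = localEdits.reduce(mapPosition, remoteCursor)
```

The cursor goes 12, 17, 17, 10. No scroll listeners, no pixel math. An editor without this primitive leaves you to reinvent it in screen coordinates.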

The React Native Angle

Here's a benefit of ProseMirror that nobody talks about enough: its document model runs anywhere JavaScript runs.

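The core of prosemirror-model, prosemirror-transform, and prosemirror-state has no DOM dependency; only the DOMParser and DOMSerializer helpers in prosemirror-model need a document object. You can use these modules in Node.js, in a Web Worker, or on React Native. The schema, the document tree, transactions, plugins with state fields: all of it works without a browser.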

For my project, this was decisive. I'm building a mobile editor in React Native that needs to share the exact same document model and business logic as the web editor. The view layer is different (React Native's text rendering isn't contentEditable, obviously), but the entire model layer — schema validation, transaction logic, plugin state, even collaborative editing transforms — is shared code.

Try doing that with Tiptap, which is inherently coupled to prosemirror-view and the DOM. Or with Lexical, which has an iOS package but a completely separate architecture. ProseMirror's clean separation of model and view isn't just good architecture for its own sake — it's a genuine technical advantage when you need to target multiple platforms.

Building a Block-Based Editor

The Notion-style block editor is essentially the "can it run Doom" test for rich text frameworks. Every block is a draggable, nestable entity. You need slash commands, drag handles, block type switching, and a document model that treats blocks as first-class citizens rather than just "paragraphs with extra steps."

In ProseMirror, blocks are just nodes with a group: 'block' designation. Nesting is handled by content expressions — a toggle_list node might have content: 'list_item+', and a list_item might have content: 'block+', which lets you nest blocks arbitrarily. Drag-and-drop is a transaction that removes a node from one position and inserts it at another, with the schema validating the result.
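
Here's the "remove here, insert there, validate the result" idea as a dependency-free sketch. It is a toy: Block, allowedChildren, and moveBlock are hypothetical names, blocks are addressed by index paths rather than ProseMirror positions, and a lookup table stands in for content expressions.

```typescript
// Toy model only: index paths and a lookup table, not ProseMirror positions or schemas.
interface Block { type: string; text?: string; children: Block[] }
type Path = number[] // child indexes from the root, e.g. [1, 0]

const allowedChildren: Record<string, string[]> = {
  doc: ['paragraph', 'toggle_list'],
  toggle_list: ['list_item'],
  list_item: ['paragraph', 'toggle_list'],
  paragraph: [],
}

// Return a copy of `node` with `update` applied to the child list at `parentPath`.
function updateChildren(node: Block, parentPath: Path, update: (kids: Block[]) => Block[]): Block {
  if (parentPath.length === 0) return { ...node, children: update(node.children) }
  const [head, ...rest] = parentPath
  return {
    ...node,
    children: node.children.map((child, i) =>
      i === head ? updateChildren(child, rest, update) : child),
  }
}

function getBlock(node: Block, path: Path): Block | undefined {
  return path.reduce<Block | undefined>((n, i) => n?.children[i], node)
}

// Move the block at `from` to `to`, where `to` is a path in the tree AFTER removal.
// Returns a new tree, or null if the move is not allowed. The input is never mutated.
function moveBlock(root: Block, from: Path, to: Path): Block | null {
  if (from.length === 0 || to.length === 0) return null // the root cannot move
  const block = getBlock(root, from)
  if (!block) return null

  const fromIndex = from[from.length - 1]
  const without = updateChildren(root, from.slice(0, -1),
    (kids) => kids.filter((_, i) => i !== fromIndex))

  const parent = getBlock(without, to.slice(0, -1))
  if (!parent || !(allowedChildren[parent.type] ?? []).includes(block.type)) return null

  const toIndex = to[to.length - 1]
  return updateChildren(without, to.slice(0, -1),
    (kids) => [...kids.slice(0, toIndex), block, ...kids.slice(toIndex)])
}

const blockDoc: Block = {
  type: 'doc',
  children: [
    { type: 'paragraph', text: 'Intro', children: [] },
    { type: 'toggle_list', children: [
      { type: 'list_item', children: [{ type: 'paragraph', text: 'Nested', children: [] }] },
    ] },
  ],
}

// Drag the intro paragraph into the list item, after its existing paragraph.
const movedDoc = moveBlock(blockDoc, [0], [0, 0, 1])
// Try to drop a list_item at the top level: the doc does not accept it.
const rejectedMove = moveBlock(blockDoc, [1, 0], [0])
```

The first move succeeds and returns a fresh tree. The second returns null because a list_item has no business living directly under doc. In the real editor the schema plays the role of that lookup table, so the drag handler can stay dumb.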

The command system ties it together:

import { Command } from 'prosemirror-state'
import { setBlockType } from 'prosemirror-commands'

const makeHeading = (level: number): Command => {
  return (state, dispatch) => {
    const { heading } = state.schema.nodes
    return setBlockType(heading, { level })(state, dispatch)
  }
}

const turnIntoCodeBlock: Command = (state, dispatch) => {
  const { code_block } = state.schema.nodes
  return setBlockType(code_block)(state, dispatch)
}

Commands follow a simple protocol: take state and an optional dispatch function, return true if applicable. When called without dispatch, they do a dry run ("could I apply this?"), which is exactly how you determine whether a toolbar button should be enabled or disabled. No special "can I do this?" API needed. The command pattern just handles it.
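
The dry-run trick is small enough to model without any library. In this sketch SketchState, SketchCommand, and isEnabled are hypothetical names, and a single blockType field stands in for the whole editor state. Only the calling convention mirrors what commands do.

```typescript
// Toy model only: one field stands in for the entire editor state.
interface SketchState { blockType: 'paragraph' | 'heading' | 'code_block' }
type SketchDispatch = (next: SketchState) => void
type SketchCommand = (state: SketchState, dispatch?: SketchDispatch) => boolean

// Turn the current block into a heading; only paragraphs qualify in this toy.
const toHeading: SketchCommand = (state, dispatch) => {
  if (state.blockType !== 'paragraph') return false // not applicable
  if (dispatch) dispatch({ blockType: 'heading' }) // only act when asked to
  return true
}

// Toolbar state falls out of the dry run: call the command without dispatch.
function isEnabled(command: SketchCommand, state: SketchState): boolean {
  return command(state)
}

let sketchState: SketchState = { blockType: 'paragraph' }

const enabledBefore = isEnabled(toHeading, sketchState) // dry run, nothing changes
const wasApplied = toHeading(sketchState, (next) => { sketchState = next })
const enabledAfter = isEnabled(toHeading, sketchState) // now a heading, so disabled
```

The button is enabled, the click applies the command, and the button then disables itself, all from one function. There's no second code path to drift out of sync with the first.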

The Learning Curve Is Real (And Worth It)

I won't pretend ProseMirror is easy to learn. The first time you look at ResolvedPos, ContentMatch, or the difference between from, to, anchor, and head in a selection, your brain will protest. The documentation is thorough but dense — Marijn Haverbeke writes documentation the way he writes code: precisely, with no filler, and with the expectation that you'll read it carefully.

But here's the thing: every hour you spend learning ProseMirror is an hour invested in understanding rich text editing fundamentals. The concepts aren't ProseMirror-specific — they're the concepts. Document trees. Schema constraints. Immutable state transitions. Position mapping through changes. Operational transforms. These ideas show up in every serious editor, because they're the only ideas that work at scale.

When you learn Tiptap, you learn Tiptap's API. When you learn ProseMirror, you learn how rich text editors work. And when Tiptap or Lexical or whatever comes next inevitably does something weird, you'll know immediately whether the bug is in the abstraction layer or in the underlying model — because you understand the underlying model.

Should You Use ProseMirror?

Probably not, if we're being honest. If you need a rich text editor for a CMS, blog, or internal tool, use Tiptap. It's excellent. If you want a Notion-style block editor and don't want to build one from scratch, use BlockNote. If your company already uses Lexical because a Meta engineer on the team evangelized it, that's fine too.

Use ProseMirror directly if:

Your editor is a core product feature, not a commodity
You need custom block types that don't map to standard extensions
You need the document model to work outside the browser (server-side processing, mobile apps)
You're building collaborative features and want to understand the transform pipeline
You've already outgrown Tiptap and are spending more time fighting its abstractions than using them

The ProseMirror ecosystem is nothing flashy. There's no VC-funded company behind it, no marketing team, no cloud product. There's just Marijn Haverbeke, an excellent set of modules, stellar documentation, and an active community on the discussion forum. It's quietly powering some of the most sophisticated editors on the web — including the ones that get all the credit for "making rich text editing easy."

Every abstraction has a cost. Sometimes the cost is worth paying. Sometimes you're better off understanding the machine and building exactly what you need. For me, ProseMirror was the second kind of choice. It wasn't the easy path, but it was the one where I actually understood every line of code in my editor — and that's made all the difference when things inevitably go sideways at 2 AM.