[back to writing]

mcps

AI • mcp • agents

My understanding of MCP is still being refined, as this is an evolving field.

Here are some gaps I had to fill for myself; obvious to others, but clearly I needed reinforcement. Essentially I thought MCPs were simply just servers (running on your own system along with the client) whose tool calls, described in the tool schema, are really just wrapping APIs; they’re wrapping RPC calls to simpler functions exposed over HTTP. (I could still be wrong on this.)

When I’m teaching myself concepts I already half understand, it’s simpler for me to dump all my questions into the LLM chat on the page itself and start filling in gaps without starting from a new baseline; the idea is to reinforce my assumptions.

Before we begin, it’s helpful to have a diagram of what the predominant flow looks like in MCP client-server architecture.

sequenceDiagram
    actor User
    participant Client as Client App<br/>(Agent Harness, e.g. Codex, Claude Code)
    participant LLM as LLM
    participant MCP as MCP Server<br/>(Tools API)

    User->>Client: Ask question (e.g. "Weather in Austin, TX?")
    Client->>LLM: Prompt + tool schemas<br/>(MCP tools as callable functions)

    note right of LLM: LLM generates output tokens

    LLM-->>Client: Normal text tokens<br/>("I'll look up the weather...")
    LLM-->>Client: <tool_call> token<br/>+ JSON tool call<br/>{ name, arguments }<br/><end_tool_call>

    Client->>Client: Parse tool call JSON
    Client->>MCP: Invoke tool RPC<br/>get_current_weather({ location })

    MCP-->>Client: Tool result JSON<br/>{ location, temperature, ... }

    Client->>LLM: <tool_result> token<br/>+ result JSON<br/><end_tool_result>

    note right of LLM: LLM reads tool result<br/>as part of its next input

    LLM-->>Client: Final natural language answer
    Client-->>User: "It's 93°F and sunny in Austin, TX."
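The loop in the diagram can be sketched in a few lines of Python. This is a toy harness, not real Codex or Claude Code internals: the `<tool_call>` delimiters mirror the diagram, and the weather tool is a stand-in for what an MCP server would actually execute.

```python
import json

# Hypothetical tool registry: what an MCP server might expose via tools/list.
# Here the "server" is just a local function returning canned data.
TOOLS = {
    "get_current_weather": lambda args: {
        "location": args["location"], "temperature": 93, "conditions": "sunny"
    }
}

def run_turn(model_output: str) -> str:
    """If the model emitted a tool call, parse the JSON between the
    delimiter tokens, invoke the tool, and wrap the result so it can be
    fed back into the model's next input. Otherwise pass text through."""
    if "<tool_call>" in model_output:
        payload = model_output.split("<tool_call>")[1].split("<end_tool_call>")[0]
        call = json.loads(payload)                     # { name, arguments }
        result = TOOLS[call["name"]](call["arguments"])
        return "<tool_result>" + json.dumps(result) + "<end_tool_result>"
    return model_output                                # plain text, no tool use

out = run_turn('<tool_call>{"name": "get_current_weather", '
               '"arguments": {"location": "Austin, TX"}}<end_tool_call>')
```

The key point the diagram makes is that the LLM only ever produces and consumes tokens; the client is the part that parses the JSON and actually performs the RPC.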

The tools seem simple. So don’t LLMs confuse themselves when tools from multiple MCPs are all named create_task? They have context from what you might already be chatting about, can confirm with the user if it’s not clear, or make assumptions (a tool naming collision). Tools are also called by the LLM with namespaces, so linear.create_task vs simply create_task.
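The namespacing idea is simple enough to sketch. Assuming two hypothetical servers that both expose create_task, a harness can disambiguate by prefixing each tool with its server’s registered name:

```python
# Hypothetical: two registered servers both expose a create_task tool.
servers = {
    "linear": ["create_task", "list_tasks"],
    "asana":  ["create_task"],
}

# The harness prefixes each tool name with its server name, so the LLM
# sees unambiguous identifiers instead of two colliding create_task tools.
namespaced = {
    f"{server}.{tool}": (server, tool)
    for server, tools in servers.items()
    for tool in tools
}

sorted(namespaced)
# ['asana.create_task', 'linear.create_task', 'linear.list_tasks']
```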

One benefit of MCPs is that they don’t create walled-garden clients. Since several providers already offer interoperable MCP servers that can talk to any client (desktop apps such as Claude Desktop, terminal-based tools such as Codex or Claude Code), most providers will not build a client locked to their own ecosystem.

What is an MCP server in practice, when I “add MCPs” to something like Codex or Claude Code? When you add MCPs to an agent harness, you’re registering MCP servers. Each server exposes tools and resources via a JSON-RPC protocol (`tools/list`, `tools/call`, `resources/*`, etc.). The harness connects to each server, pulls its schema, and hands that to the LLM as: “Here are the capabilities you can use.” Auth is handled separately (API keys, OAuth, env vars), and once the server is reachable, the model only sees named tools with JSON schemas.
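Concretely, those JSON-RPC messages are just JSON objects. Here is a sketch of a `tools/list` exchange and a follow-up `tools/call`; the message shapes follow the MCP spec, but the weather tool itself is made up for illustration:

```python
# What the harness sends when it connects (JSON-RPC 2.0, over stdio or HTTP):
list_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# A response the server might return: each tool is a name, a description,
# and a JSON Schema describing its arguments.
list_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [{
            "name": "get_current_weather",
            "description": "Get current weather for a location",
            "inputSchema": {
                "type": "object",
                "properties": {"location": {"type": "string"}},
                "required": ["location"],
            },
        }]
    },
}

# Later, when the LLM decides to use the tool, the harness invokes it:
call_request = {
    "jsonrpc": "2.0", "id": 2, "method": "tools/call",
    "params": {"name": "get_current_weather",
               "arguments": {"location": "Austin, TX"}},
}
```

The `result.tools` list from `tools/list` is exactly the schema text that gets injected into the LLM’s prompt as its available capabilities.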

What does an MCP schema actually look like?

Here is an example of what one of these JSON payloads might look like: an elicitation request asking the user to confirm vacation booking details. Now imagine this for the many use cases of a single service. Fairly verbose.

{
  "method": "elicitation/requestInput",
  "params": {
    "message": "Please confirm your Barcelona vacation booking details:",
    "schema": {
      "type": "object",
      "properties": {
        "confirmBooking": {
          "type": "boolean",
          "description": "Confirm the booking (Flights + Hotel = $3,000)"
        },
        "seatPreference": {
          "type": "string",
          "enum": ["window", "aisle", "no preference"],
          "description": "Preferred seat type for flights"
        },
        "roomType": {
          "type": "string",
          "enum": ["sea view", "city view", "garden view"],
          "description": "Preferred room type at hotel"
        },
        "travelInsurance": {
          "type": "boolean",
          "default": false,
          "description": "Add travel insurance ($150)"
        }
      },
      "required": ["confirmBooking"]
    }
  }
}

Why does registering MCP servers eat so much context compared to “skills”? The schema text (tool lists, descriptions, JSON schemas) from MCPs is, as of today, loaded into every prompt, so carrying all this capability upfront when you don’t plan to use it eats your tokens. Skills are designed to be on-demand recipes, but if they use CLIs, you need to be authorized beforehand. They can also use MCPs, but it’s unclear to me how authorization works in that case.
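You can get a feel for the overhead by serializing the schemas and applying the rough rule of thumb of about four characters per token for English/JSON text. The single tool below is hypothetical; a real server would ship many of them:

```python
import json

# Hypothetical tool schema, of the kind a server returns from tools/list.
tool_schemas = [
    {"name": "create_task",
     "description": "Create a task in the tracker",
     "inputSchema": {"type": "object",
                     "properties": {"title": {"type": "string"},
                                    "due": {"type": "string"}},
                     "required": ["title"]}},
    # ...imagine dozens more of these per registered server
]

schema_text = json.dumps(tool_schemas)
approx_tokens = len(schema_text) // 4  # crude ~4 chars/token estimate

# This cost is paid on every prompt, whether or not a tool is ever called.
```

Multiply this by every tool on every registered server and the fixed prompt overhead adds up quickly, which is the context cost being compared against on-demand skills.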

The Val Town MCP article by Steve has some really great insights into the benefits of MCPs overall.