The Model Context Protocol (MCP) ecosystem is exploding with specialized tools—weather lookups, file managers, database connectors, and dozens more. But what if we’re approaching this all wrong? What if instead of creating 30 different tools, we should give AI agents access to something much more powerful: the ability to write and execute code directly?

This isn’t just a theoretical idea. Developers are already building MCP servers that expose programming languages as tools, and the results are surprisingly effective. Here’s why this approach might be the future of AI tool integration.

The Problem with Tool Proliferation

Traditional MCP implementations face several fundamental challenges:

Platform Dependencies

Command-line tools often break across different operating systems or versions. An AI agent that works perfectly on macOS might fail completely on Windows because of subtle differences in how tools handle arguments or file paths.

Context Limitations

Each tool requires its own documentation, parameters, and examples. With 30 tools, you’re consuming massive amounts of context just describing what’s available—leaving less room for actual problem-solving.

State Management Issues

CLI tools are stateless by design. If an AI agent needs to run a multi-step debugging session (like using LLDB to debug a crash), it has to remember session state across multiple tool calls. This leads to agents losing track of their work and starting over.

Composability Problems

Real-world tasks require combining multiple tools in creative ways. But when each tool is isolated, agents struggle to chain operations effectively. They can’t easily pipe output from one tool to another or maintain shared state across operations.

The Code-First Alternative

Instead of exposing dozens of specialized tools, what if we exposed a single powerful tool: a programming language interpreter? This is exactly what projects like pexpect-mcp are doing.

Here’s the core idea:

{
  "name": "python_executor",
  "description": "Execute Python code in a persistent session. The session maintains state between calls, and you have access to all standard libraries plus specialized tools like pexpect for process interaction.",
  "parameters": {
    "code": {"type": "string", "description": "Python code to execute"}
  }
}

That’s it. One tool. But this single tool is incredibly powerful because:

  1. Persistent State: Variables and objects persist between calls (see the sketch after this list)
  2. Full Language Access: All the programming patterns the AI learned during training
  3. Library Ecosystem: Access to thousands of Python packages
  4. Composability: Natural ability to combine operations in code
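
To make the first point concrete, here is a minimal sketch of two consecutive calls to that python_executor tool (the data is made up); the second call simply reuses objects created by the first:

# --- First tool call: set up some state ---
import statistics
samples = [12.1, 13.4, 9.8, 15.2]
print(f"Loaded {len(samples)} samples")

# --- Second tool call, sent later: 'samples' is still defined ---
print(f"Mean: {statistics.mean(samples):.2f}")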

Real-World Example: Debugging with Code

Consider debugging a crashing C program. With traditional MCP tools, you’d need:

  • An LLDB tool
  • A file reading tool
  • A process spawning tool
  • A session management tool
  • Error handling across all of them

With a code-first approach, the AI agent simply writes Python:

import pexpect

# Start LLDB with the crashing program
child = pexpect.spawn('lldb ./demo-buggy')
# expect() treats its argument as a regex, so the prompt's parentheses need escaping
child.expect(r'\(lldb\)')

# Set a breakpoint and run
child.sendline('breakpoint set -n main')
child.expect(r'\(lldb\)')
child.sendline('run')
child.expect(r'\(lldb\)')

# Examine the crash
child.sendline('bt')
child.expect(r'\(lldb\)')
backtrace = child.before.decode()
print(f"Backtrace: {backtrace}")

This approach has several advantages:

Natural Patterns

The AI doesn’t need to learn new tool APIs—it’s writing standard Python code using libraries it already understands from training.

Error Handling

Python’s exception handling works naturally. If something goes wrong, the AI can catch exceptions and adapt its approach.
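
For example, here is a sketch of how the agent might recover when the LLDB session above stops responding (the retry strategy is illustrative, not prescriptive):

import pexpect

try:
    child.expect(r'\(lldb\)', timeout=5)
except pexpect.TIMEOUT:
    # The debugger is slower than expected; look at what it printed so far and adapt
    print("Still waiting, output so far:", child.before.decode())
except pexpect.EOF:
    # The debugged process exited entirely; restart the session and try again
    child = pexpect.spawn('lldb ./demo-buggy')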

Reusability

After solving a problem once, the AI can easily convert its solution into a standalone script that humans can run independently.

Debugging Capability

The AI can inspect its own state using standard Python debugging techniques like print(), dir(), and repr().
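
Continuing the LLDB example, a couple of lines like these let the agent check what already lives in its session before deciding on the next step:

# List the non-private names currently defined in the session
print([name for name in dir() if not name.startswith('_')])

# Inspect the pexpect session object created earlier
print(repr(child))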

Beyond Python: JavaScript for Web Automation

The same principle applies to other domains. Instead of exposing 30 Playwright tools for web automation, expose a JavaScript interpreter with Playwright available:

// Navigate to a page and extract data
const page = await browser.newPage();
await page.goto('https://example.com');

// Collect data from multiple pages
const results = [];
for (let i = 1; i <= 10; i++) {
  await page.goto(`https://example.com/page/${i}`);
  const data = await page.evaluate(() => {
    return document.querySelector('.content').textContent;
  });
  results.push(data);
}

// Results automatically available for next operations
console.log(`Collected ${results.length} items`);

This eliminates the need for separate tools for navigation, data extraction, iteration, and state management.

The Security Question

Critics often raise security concerns about giving AI agents the ability to execute arbitrary code. But this misses a key point: we’re already doing this.

When we give an AI agent access to:

  • File system tools
  • Network tools
  • Process spawning tools
  • Database tools

We’ve essentially given it the ability to do anything code execution could do, just in a more cumbersome way.

The real security comes from proper sandboxing and monitoring, not from restricting the interface. Whether an AI agent deletes your files through a “file deletion tool” or through Python’s os.remove() doesn’t matter—the key is preventing unauthorized file deletion entirely.

Implementation Patterns

Building code-first MCP servers is surprisingly straightforward:

Basic Python Executor

class PythonMCP:
    def __init__(self):
        self.globals = {}  # Persistent state shared across calls

    def execute_code(self, code, timeout=30):
        try:
            # Try the code as an expression first so its value can be returned;
            # fall back to exec() for statements (assignments, imports, loops, ...)
            try:
                result = eval(code, self.globals)
            except SyntaxError:
                exec(code, self.globals)
                result = None
            return {"success": True, "result": str(result)}
        except Exception as e:
            return {"success": False, "error": str(e)}

Enhanced with Libraries

import json
import pexpect
import requests

class EnhancedPythonMCP:
    def __init__(self):
        self.globals = {
            'pexpect': pexpect,
            'requests': requests,
            'json': json,
            # Pre-load commonly used libraries
        }

    def execute_code(self, code, timeout=30):
        # Add timeout handling, output capture, etc.
        pass
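
One way that stub could be filled in is by capturing printed output with contextlib.redirect_stdout. This is a sketch of the general idea, not the pexpect-mcp implementation, and the timeout parameter is still left unhandled:

import contextlib
import io

def execute_code(self, code, timeout=30):
    buffer = io.StringIO()
    try:
        # Capture anything the code prints so it can be returned to the agent
        with contextlib.redirect_stdout(buffer):
            exec(code, self.globals)
        return {"success": True, "output": buffer.getvalue()}
    except Exception as e:
        return {"success": False, "error": str(e), "output": buffer.getvalue()}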

When Code-First Makes Sense

This approach works particularly well for:

Development Tasks

  • Debugging applications
  • Running tests
  • Code analysis
  • Build automation

Data Processing

  • ETL operations
  • Analysis workflows
  • Report generation

System Administration

  • Log analysis
  • Monitoring setup
  • Configuration management

Research and Exploration

  • API discovery
  • Data investigation
  • Prototype development

Limitations and Considerations

The code-first approach isn’t perfect:

Learning Curve

AI agents need to understand the available libraries and their APIs. For well-known libraries like requests or pexpect, this works great. For custom or obscure libraries, more guidance is needed.

Verbosity

Code can be more verbose than specialized tool calls. A simple weather lookup might be longer in Python than calling a dedicated weather tool.
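
As a rough illustration, the code-first version of that lookup might look like this (the endpoint and response shape are hypothetical), versus a single call to a dedicated tool that takes just a city name:

import requests

# Hypothetical weather API; a dedicated weather tool would hide all of this
resp = requests.get("https://weather.example.com/v1/current", params={"city": "Berlin"})
resp.raise_for_status()
print(resp.json()["temperature_c"])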

Context Usage

While one tool description is simpler than thirty, the actual code calls can be longer and use more context.

The Future of AI Tool Integration

The success of code-first MCP implementations suggests a broader principle: instead of creating ever-more-specialized tools, we should focus on giving AI agents access to powerful, general-purpose capabilities they already understand.

This doesn’t mean abandoning specialized tools entirely. But it does mean being thoughtful about when to create a new tool versus when to leverage programming languages that AI agents already know well.

The most powerful AI systems of the future won’t be those with the most tools—they’ll be those that can write and execute code to solve problems we haven’t even thought of yet.

Getting Started

If you want to experiment with code-first MCP:

  1. Try existing implementations: Projects like pexpect-mcp provide working examples
  2. Start simple: Begin with basic Python execution and add libraries as needed
  3. Focus on state: The persistent session is what makes this approach powerful
  4. Think about sandboxing: Use containers or VMs to limit blast radius
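
For the sandboxing point, one minimal starting point is to launch the interpreter inside a throwaway container with networking disabled; the image and resource limits below are illustrative, and a real deployment would add more isolation:

import subprocess

# Run a short-lived Python container with no network access and a memory cap
subprocess.run([
    "docker", "run", "--rm", "--network", "none", "--memory", "256m",
    "python:3.12-slim", "python", "-c", "print('sandboxed interpreter ready')",
], check=True)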

The paradigm shift from “many specialized tools” to “one powerful programming interface” represents a fundamental change in how we think about AI tool integration. It’s worth exploring, especially as AI agents become more sophisticated and capable of handling complex, multi-step tasks.


This approach to MCP design prioritizes the programming capabilities that AI agents already possess, rather than forcing them to learn dozens of new tool interfaces. The results suggest that sometimes the best tool is the ability to write tools.
