So I've been vibe coding full time for a few weeks now, but I can't yet understand what is so good or worthwhile about MCP servers versus just prompting, RAG style. Can you help enlighten me?
It's a pseudo-plugin system for chatbots, specifically the popular ones (Claude, ChatGPT).
It is presented as a scalable way to provide tools to LLMs but that's only if you assume every use of LLMs is via the popular chatbot interfaces, which isn't the case.
Basically it's Anthropic's idea for extending their chatbot's toolset into apps such as Google Drive, and into anything else whose makers may wish to integrate their software's capabilities into chatbots as tools.
Of course, as with everything in tech, especially anything AI-related, it has been cargo-culted into the second coming of the messiah while all nuance about its suitability and applicability is ignored.
MCP can be used as a form of context augmentation (i.e., RAG). It allows models to specify how that context augmentation is generated through tool use.
It's a formalized way of allowing developers to implement tools (using JSON-RPC) in such a way that the model is provided with a menu of tools that it can call on in each generation. The output is then included in the next generation.
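As a rough illustration of that "menu of tools" idea, here is a minimal MCP server sketch using the official Python SDK's FastMCP helper; the server name and the tool itself are made-up placeholders, not anything from the comments above:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("docs-search")

@mcp.tool()
def search_docs(query: str, limit: int = 5) -> str:
    """Search internal documentation and return matching snippets."""
    # Hypothetical lookup; a real server would hit a database or API here.
    return f"Top {limit} results for {query!r}: ..."

if __name__ == "__main__":
    # Speaks JSON-RPC over stdio; the client lists these tools, offers them
    # to the model, and feeds tool output back into the next generation.
    mcp.run(transport="stdio")
```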
I think 90% of the hype could be understood if you look at things from a non-coder's perspective. All of this tooling helps non-engineers with building AI applications, because they don't know how to code.
They don't know how to write a simple function to call a REST API, store the results in a database, etc. So they need this tooling.
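For concreteness, a minimal sketch of the kind of "simple function" being described here, fetching from a REST API and storing the results in a local database (the endpoint URL and field names are made up):

```python
import sqlite3
import requests

def fetch_and_store(url: str = "https://api.example.com/items") -> None:
    # Call the (hypothetical) REST API and parse the JSON response.
    items = requests.get(url, timeout=10).json()

    # Store the results in a local SQLite database.
    conn = sqlite3.connect("items.db")
    conn.execute("CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT OR REPLACE INTO items (id, name) VALUES (?, ?)",
        [(item["id"], item["name"]) for item in items],
    )
    conn.commit()
    conn.close()
```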
There's also the fact that humans love to abstract things, even when the thing they're trying to abstract already does the job fairly well (see: Kubernetes, GraphQL)
I use Tidewave, which is a package for my Elixir app, and it allows the LLM to get access to some internals of my app. For example, Tidewave exposes tools to inspect the database schema, internal Hex documentation for packages, introspection to see what functions are available on a module, etc.
While I’m not “vibe” coding, it is nice to be able to ask human language questions and have the LLM query the database to answer questions. Or while working on a feature, I can ask it to delete all the test records I created, etc. I can do that in a repl myself, but it’s sometimes a nice shortcut.
Note, this only runs in dev, so it’s not querying my production database or anything.
Basically, they can be a way to expose additional data or tools to the LLM.
Ignoring for a moment all of the other functions that MCP can allow an agent to do (open a webpage, query a database, run another agent, execute local commands etc) and only focussing on the use of MCP to provide context, the big advantage of MCP over RAG is that a RAG system needs to be built and maintained: you need to extract your content, vectorise it, store it in a database, query it, update it etc etc. With MCP, you just point it at your database and the agent gets up-to-date info.
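To make the "built and maintained" point concrete, here is a toy sketch of the RAG side of that comparison, assuming sentence-transformers for the embeddings and an in-memory array standing in for the vector store; the whole index has to be rebuilt whenever the content changes:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Extract + vectorise + store: this step must be re-run whenever the docs change.
docs = ["MCP is a tool protocol.", "RAG retrieves chunks by similarity."]
index = model.encode(docs)

def retrieve(question: str, k: int = 1) -> list[str]:
    # Query: embed the question and rank the docs by cosine similarity.
    q = model.encode([question])[0]
    scores = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q))
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

print(retrieve("how does retrieval work?"))
```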
Those things are not mutually exclusive. We use RAG and vector stores to index terabytes of data.
Then we use tool calls (MCP) to allow the AI to write SQL to directly query the data (the vector store).
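A rough sketch of that pattern: a single MCP tool that accepts model-written SQL and runs it read-only against the indexed data. The server name, database file, and the naive read-only guard are all assumptions for illustration:

```python
import sqlite3
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("warehouse")

@mcp.tool()
def run_sql(query: str) -> str:
    """Run a read-only SQL query against the analytics database."""
    # Naive guard so the model can only read, not modify, the data.
    if not query.lstrip().lower().startswith("select"):
        return "Only SELECT statements are allowed."
    with sqlite3.connect("file:warehouse.db?mode=ro", uri=True) as conn:
        rows = conn.execute(query).fetchall()
    return "\n".join(str(row) for row in rows) or "(no rows)"

if __name__ == "__main__":
    mcp.run(transport="stdio")
```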
I think in that case you would still need RAG - I can't imagine someone is going to build an MCP server for a folder of docs, and even if they did, it would still need to index them, extract data, etc. BUT if you were feeding your Confluence pages into RAG, then that's probably not worth doing anymore (because there is an MCP server for that).
In short, MCP servers won't make RAG obsolete, but the number of use cases for RAG is definitely lower than it was before.
Isn't that indexing part of RAG? I've always read about RAG as a two-step process: creating and maintaining the vector database, and then using that database to feed the AI.
It's just a new way to vibe-integrate with a bunch of server data or APIs without hand-crafting individual integrations. 90% of the hype is due to developer FOMO.
Everything that didn't have an API I could integrate with, but does have a janky website, is now something I can put into a locally-run workflow.
It's not a panacea, since I can't deploy it anywhere beyond my colleagues' dev machines, but it enables a ton of automation that was otherwise a big commitment, both from my team and from each of those janky website owners.
It was possible to do this website scraping before, but nobody was thinking about it in a plug-and-play manner.
An MCP server lets an agent call functions. These can in turn even issue queries to an LLM. E.g. an agent can issue natural-language queries to a database by calling a function query("what is the answer to life, the universe and everything?") and the function will return "42" to the agent.
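A sketch of that "function that itself asks an LLM" idea, assuming the Anthropic Python SDK; the model name and prompt wiring are placeholders, not part of the comment above:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def query(question: str) -> str:
    """Answer a natural-language question; exposed to the agent as a tool."""
    msg = client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=128,
        messages=[{"role": "user", "content": question}],
    )
    return msg.content[0].text

# query("what is the answer to life, the universe and everything?")  # -> "42", hopefully
```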
Text-to-text LLMs can only do one thing: output text.
These other capabilities that chat tools provide are actually extras built on top of the output sequence:
- reading and editing files
- searching the web
- executing commands
If your favorite chat tool (ChatGPT, Gemini, Claude, Cursor, whatever) already has all the tools you want, then you don't need to add more via an MCP server.
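A toy illustration of the point that those extras are built on top of the output sequence: the model only ever emits text, and the harness parses that text and performs the side effects itself. The tool-call format below is made up for illustration; real clients use structured tool-call messages:

```python
import json
import subprocess

# What the model "did" is really just text it emitted.
model_output = '{"tool": "execute_command", "arguments": {"cmd": "ls -la"}}'

call = json.loads(model_output)
if call["tool"] == "execute_command":
    result = subprocess.run(
        call["arguments"]["cmd"], shell=True, capture_output=True, text=True
    )
    next_prompt = result.stdout  # fed back into the next generation
```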
Note that text includes CLI commands, so technically they can do anything that way. But an MCP server might be able to hold state about something (e.g. keep an SSH connection open), and it might also be easier to teach an MCP server new things than to teach Claude itself.
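A sketch of that statefulness point, assuming paramiko and the FastMCP helper: the server opens one SSH connection at startup and reuses it across tool calls (hostname and username are placeholders):

```python
import paramiko
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("remote-shell")

# State held between tool calls: one long-lived SSH connection.
ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect("build-server.internal", username="deploy")

@mcp.tool()
def remote_run(command: str) -> str:
    """Run a command on the already-connected remote host."""
    _, stdout, stderr = ssh.exec_command(command)
    return stdout.read().decode() + stderr.read().decode()

if __name__ == "__main__":
    mcp.run(transport="stdio")
```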
I've also seen a lot of amateur ones with grandiose claims about how they enable AGI thinking ability by trying slightly harder to plan things.
I'm using Claude Code - with some MCP servers installed - so you would assume the whole MCP thing would work with an agentic product from the makers of this standard. In 9 out of 10 cases where an MCP server would make sense to use, it doesn't know when to call it. And yes, I've done all the claude.md crap. There is no transparency in this protocol about how the AI would know when to call an MCP server (besides direct prompting). To cut it short: it's not reliable.
This is an issue with the prompt and/or the tool descriptions and/or the model.
An MCP server is really just a collection of functions the model can call, plus a list of those functions with their descriptions and input params. Under the hood, the MCP client calls tools/list on each MCP server and injects the result into the prompt, so the model knows which tools are available, their descriptions, and their input params. It's then up to the model to pick a specific tool to use.
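For reference, roughly the shape of what tools/list returns and what the client surfaces to the model; the field names follow the MCP spec, but the example tool itself is made up:

```python
tool_menu = {
    "tools": [
        {
            "name": "query_orders",
            "description": (
                "Run a read-only SQL query against the orders database. "
                "Use this whenever the user asks about order data."
            ),
            "inputSchema": {
                "type": "object",
                "properties": {"sql": {"type": "string"}},
                "required": ["sql"],
            },
        }
    ]
}
```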
I don't know where specifically the MCP tools are injected into the prompt or what the original system prompt is (ie: maybe it's saying "always give the internal tools priority and only use MCPs if nothing else fits").
It could be the MCP server has poor descriptions of the tool. That is what the model uses to decide to use it or not.
It could also be that the model just sucks. Claude Opus/Sonnet seem to be some of the best at tool calling, but it's still up to the model to pick which tools to use. Some models are better than others. Some models start to regress in their tool-calling abilities as the context window fills up.
My instinct is that the MCP tools have bad descriptions. I've done a bit of reverse engineering of Claude Code, and most of the tool descriptions are very detailed. "Use this tool to call a bash command" would be a bad tool description. The Claude Code bash tool description is 110 lines long, containing detailed usage information, when to use it, when not to use it, example usage, etc. It also has a summary at the bottom of very important things (that were just written above in the same description) for the model not to forget (they use the words IMPORTANT and YOU MUST a lot in the prompts/descriptions).
I'm pretty new to this, so I'm curious about the answers as well. But as far as I understand, an MCP server enables you to connect different applications to your vibe-coding journey. For example: keep track of your worklog, write documentation on your wiki, generate social media posts about your coding progress, etc.
If you use LLM CLI tools like Claude Code, you can let the model just call shell commands directly instead of MCP. Or does MCP have some advantage even in that scenario?
It basically gives you a way to easily extend the LLM's capabilities by providing it different kinds of tools, whether that's reading resources or performing certain update tasks.
Instead of looking at the code behind your website, you can just have it browse the web and log in to your site itself. Instead of telling it about your database, just have it log in and look at the structure itself.
If you're trying to get back into full-stack JavaScript or Python engineering, you get to practice writing your own authentication layers and self-managing any dependencies you use for edge cases that don't make sense when you're normally working on the backend.
It's great! *crazy eyes* In all seriousness though, it's a terrible solution for the "vibe" space given how careless people are with it. There are thousands of "who-knows-who-made-this" servers for major integrations out there.
MCP is an open protocol, and everyone half-competent has an MCP for their product/service.
RAG is a bespoke effort per implementation to vectorize data for model consumption.
Ok, but what if you're dealing with thousands of PDFs? I thought that was the whole point (or at least, killer feature) of RAG.
> Let's create new feature XYZ. Use Postgres MCP to verify the schema of relevant tables instead of making assumptions.
> Use Supabase MCP to see if user@domain.com has the correct permissions to have the Create Project button present in the UI.
NOTE: only run the Supabase MCP with the --read-only flag; doing otherwise will lead to a bad time.
Almost like an API for LLM-driven actions.