A related idea is to have the LLM quiz you, Socratic-style about a topic of interest. It persists in asking questions at deeper levels until you arrive at the answer yourself. This forces you to think hard about a problem, and this effort helps with understanding, learning and retention. Of course I made a Socratic-quiz skill for this, to use with any coding agent or similar:
For example I’ve used this to better understand counter-intuitive things about diabetes/insulin, dopamine and motivation, Claude’s implementations, etc (to combat so-called cognitive debt).
Strong LLMs are surprisingly good at this type of quizzing, they display a semblance of “theory of mind”.
I have updated the popular /grill-me skill for this exact purpose! I had a very insightful grilling session yesterday on what exactly happens when you try to load an extremely large dataset in pandas, covering everything down to the last detail !
I’ve been using this general pattern - a custom cli app for deterministic tasks, skills for the agent harness, run the skills in the agent and it produces artifacts for you by using the cli and its own agentic reasoning - a lot lately for work. Things like “give me an executive brief of the activity in these teams backlogs over the last month” and in 5-10 minutes I have a few page doc I can read that is cited with the tickets it analyzed and I don’t have to go bug people or ask them to do yet another task for me, just make sure your backlog is updated and detailed like normal practice. It’s awesome and really fits a useful spot between pure agent usage (which is hard to get consistent results from on repeat tasks) and not having to build/buy a full blown app for every random thing.
This approach works well, I agree. But I keep wishing that I could invert it. The architecture I feel like I keep yearning for, is a traditional CLI program that encodes most workflow knowledge/decisions as real code; but which does "just a little bit of coding agent invocation" during one specific workflow step.
Not sure how to accomplish this. Anyone have any suggestions? Are there libraries for this yet? (And how would they even work? It feels like, to do this right, there would have to be some background service that CLI software could expect to interact with via a well-known local IPC socket — similar to how e.g. the docker daemon works. But I'm unaware of any coding agent software/frameworks that expose such an IPC capability...)
I’m building this! It was originally designed for human accessibility for interactive CLIs, but it turned out to be really useful for giving agents the ability to follow structured workflows.
It runs as a background terminal that the agent can observe, and then exposes all interaction options as structured commands that can be run from the foreground CLI which then update the state of the background terminal via IPC. My hope is to establish a sort of “ARIA for terminals” standard to improve accessibility for both humans and agents. Email in profile, ping me if you’re interested in giving it a spin (just have plugins for Inquirer + Commander right now, hoping to broaden to other frameworks & TUIs soon).
I reverted this due to impending billing changes, but Claude and most LLM providers to my knowledge do offer a way to directly fire a prompt to the LLM in a "headless" or non-interactive mode. Specifically "claude -p <your_prompt_here>" is the way to do it with Claude Code. It allows for using the agent to do a one-off command with a given structured prompt. Originally Lathe would use this from the Go application to allow you to extend a tutorial directly from the UI without directly interacting with the LLM.
You'd have to exec out, so it's alittle clunkier than an IPC, but I think you could achieve what you want with it.
Can you give some examples of the deterministic tasks? So in your example, was the deterministic task “fetch this team’s backlog”? And then the LLM parts are “process each backlog” and “combine a summary”?
I agree! I want to say I first saw this pattern in some work Simon Willison did (Rodney and Showboat). For certain workflows the pair of Skills + CLI give me a nice balance between the flexibility of LLMs and the consistency of a CLI.
Cool project! I'll be trying it out. I've been a big fan of throwing whatever sources I have on a new topic i'm trying to get into into a llm "project" and then asking it to teach me, grounded on the actual content to speed things up.
But at the same time, I'm afraid getting everything laid out for you in exactly the way you want will erode some of the understanding you build by going through a primary source directly and figuring things out the hard way. So this having more focus on actually doing stuff by yourself seems right up my alley (while still tending to the LLM induced intellecutal laziness... ) .
What I'm more looking at is your own experience with a vibed tool. I cannot really tell from this introduction whether you actually use and like it (you mentioned you use it and sometimes push back, which is a learning strategy of its own?)
Also, I wouldn't say "have another model test the tutorial compiles" a feature, but also I do not expect a fool-proof tutorial from a one-shot, I guess.
Not sure why I would try this over a hand-written promot. Also wondering why ChatGPT Study mode failed, it seemed interesting.
I've been using it quite a bit and I like it a lot! You certainly could roll your own prompt for this. The value I'm seeing is in the reusable skill/prompt to structure tutorials in a way that help me think and learn a new concept (rather than Claude just giving me code to copy/paste), and the local UI that makes working through the tutorial much more pleasant than scrolling through Claude's markdown output. Plus tutorial series are persistent so I can easily come back around later with a `/lathe-extend` to explore an extension to a topic/tutorial I'm interested in.
That said, it's been a tool that's been helpful for me personally, but doesn't have to be for everyone! I've never used ChatGPT Study, I'll look into it more. Thanks for sharing!
This is a very cool idea, feels like a sane way to use LLMs in this crazy time! Could be a very good way to break the ice when starting a new project and everything is friction.
Yea that’s definitely been a primary usecase for me! Easing the barrier to entry into a new project, and giving me the foundation to take it further on my own once I’m comfortable.
Did you write the skill.md files yourself? I often wonder this; there’s so much text in most skills, and I can’t imagine it’s human generated.
I don’t write my own - I can’t optimize for the models understanding, and so I just give the skill-creator skill an outline and then have it refine until the output is what I want.
Thanks for checking it out! ollama wasn't top of my list for support, just because I don't have a machine powerful enough to run decent local LLMs (I wish I did!). I'll look into it though, nothing here should be locked in to any one LLM, as long as it has the concept of a skill/slash command/reusable prompt.
Someone else asked about Gemini, so I think broader LLM support will be my focus for v0.4.0
https://pchalasani.github.io/claude-code-tools/plugins-detai...
For example I’ve used this to better understand counter-intuitive things about diabetes/insulin, dopamine and motivation, Claude’s implementations, etc (to combat so-called cognitive debt).
Strong LLMs are surprisingly good at this type of quizzing, they display a semblance of “theory of mind”.
The harder questions will only arrive when the context is getting full.
Not sure how to accomplish this. Anyone have any suggestions? Are there libraries for this yet? (And how would they even work? It feels like, to do this right, there would have to be some background service that CLI software could expect to interact with via a well-known local IPC socket — similar to how e.g. the docker daemon works. But I'm unaware of any coding agent software/frameworks that expose such an IPC capability...)
It runs as a background terminal that the agent can observe, and then exposes all interaction options as structured commands that can be run from the foreground CLI which then update the state of the background terminal via IPC. My hope is to establish a sort of “ARIA for terminals” standard to improve accessibility for both humans and agents. Email in profile, ping me if you’re interested in giving it a spin (just have plugins for Inquirer + Commander right now, hoping to broaden to other frameworks & TUIs soon).
You'd have to exec out, so it's alittle clunkier than an IPC, but I think you could achieve what you want with it.
But at the same time, I'm afraid getting everything laid out for you in exactly the way you want will erode some of the understanding you build by going through a primary source directly and figuring things out the hard way. So this having more focus on actually doing stuff by yourself seems right up my alley (while still tending to the LLM induced intellecutal laziness... ) .
Also, I wouldn't say "have another model test the tutorial compiles" a feature, but also I do not expect a fool-proof tutorial from a one-shot, I guess.
Not sure why I would try this over a hand-written promot. Also wondering why ChatGPT Study mode failed, it seemed interesting.
That said, it's been a tool that's been helpful for me personally, but doesn't have to be for everyone! I've never used ChatGPT Study, I'll look into it more. Thanks for sharing!
I don’t write my own - I can’t optimize for the models understanding, and so I just give the skill-creator skill an outline and then have it refine until the output is what I want.
If it does find some, maybe it could supplement them instead of just from scratch
Someone else asked about Gemini, so I think broader LLM support will be my focus for v0.4.0