34 comments

  • gbnwl 1 hour ago
    I'm not sure how many HN users frequent other places related to agentic coding like the subreddits of particular providers, but this has got to be the 1000th "ultimate memory system"/break-free-of-the-context-limit-tyranny! project I've seen, and like all other similar projects there's never any evidence or even attempt at measuring any metric of performance improved by it. Of course it's hard to measure such a thing, but that's part of exactly why it's hard to build something like this. Here's user #1001 that's been told by Claude "What a fascinating idea! You've identified a real gap in the market for a simple database based memory system to extend agent memory."
    • austinbaggio 1 hour ago
      Which of the 1000 is your favorite? There does seem to be a shallow race to optimizing xyz benchmark for some narrow sliver of the context problem, but you're right, context problem space is big, so I don't think we'll hurry to join that narrow race.
      • gbnwl 1 hour ago
        | Which of the 1000 is your favorite?

        None, that's what I'm trying to say. My favorite is just storing project context locally in docs that agents can discover on their own or I can point to if needed. This doesn't require me to upload sensitive code or information to anonymous people's side projects and has an equivalent amount of hard evidence for efficacy (zero), but at least has my own anecdotal evidence of helping and doesn't invite additional security risk.

        People go way overboard with MCPs and armies of subagents built on wishes and unproven memory systems because no one really knows for sure how to get past the spot we all hit where the agentic project that was progressing perfectly hits a sharp downtrend in progress. Doesn't mean it's time to send our data to strangers.

    • AndyNemmity 1 hour ago
      The funny part is, the vast majority of them are barely doing anything at all.

      All of these systems are for managing context.

      You can generally tell which ones are actually doing something if they are using skills, with programs in them.

      Because then, you're actually attaching some sort of feature to the system.

      Otherwise, you're just feeding in different prompts and steps, which can add some value, but okay, it doesn't take much to do that.

      Like adding image generation to claude code with google nano banana, a python script that does it.

      That's actually adding something claude code doesn't have, instead of just saying "You are an expert in blah"

      • austinbaggio 57 minutes ago
        It sounds like you've used quite a few. What programs are you expecting? Assuming you're talking about doing some inference on the data? Or optimizing for some RAG or something?
        • AndyNemmity 55 minutes ago
          An example of a skill I gave: adding image generation with nano banana.

          Another is one claude code ships with, using ripgrep.

          Those are actual features. It's adding deterministic programs that the llm calls when it needs something.

          • austinbaggio 26 minutes ago
            Oh got it - tool use
            • AndyNemmity 24 minutes ago
              Exactly. That adds actual value. Some of the 1000s of projects do this. Those pieces add value, if the tool adds value which also isn’t a given
    • Forgeties79 1 hour ago
      Have you tried using it? Not being flippant and annoying. Just curious if you tried it and what the results were
    • johnnyfived 29 minutes ago
      I imagine HN, despite being full of experts and vet devs, also might have a prevalent attitude of looking down on using tools like MCP servers or agentic AI libraries for coding, which might be why something like this advertised seems novel rather than redundant.
  • ramoz 2 hours ago
    I struggle with these abstractions over context windows, esp. when Anthropic is actively focused on improving things like compaction, and knowing the eventual* goal is for the models to have real memory layers baked in. Until then we have to optimize for how agents work best, and ephemeral context is a part of that (they weren’t RL’d/trained with memory abstractions so we shouldn’t use them at inference either). Constant rediscovery that is task specific has worked well for me; it doesn’t suffer from context decay, though it does eat more tokens.

    Otherwise, the ability to search back through history is valuable, and it can be as simple as a git log/diff or a (rip)grep/jq combo over the session directory. Simple example of mine: https://github.com/backnotprop/rg_history
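
    A minimal sketch of that kind of history search, written in Python rather than rg/jq. It assumes sessions are stored as `.jsonl` files with one JSON object per line and the message under a `text` key; both the layout and the key name are assumptions for illustration, not the format of any particular tool or of the linked repo.

```python
import json
from pathlib import Path


def search_sessions(session_dir, needle):
    """Yield (file name, line number, text) for JSONL records mentioning needle."""
    needle = needle.lower()
    for path in sorted(Path(session_dir).glob("*.jsonl")):
        with path.open(encoding="utf-8") as handle:
            for line_no, raw in enumerate(handle, start=1):
                raw = raw.strip()
                if not raw:
                    continue
                try:
                    record = json.loads(raw)
                except json.JSONDecodeError:
                    continue  # skip partially written or corrupt lines
                if not isinstance(record, dict):
                    continue
                # Assumed schema: the message text lives under a "text" key.
                text = str(record.get("text", ""))
                if needle in text.lower():
                    yield path.name, line_no, text
```

    Calling `list(search_sessions("sessions", "auth"))` would return every matching record with its file and line, which is roughly what a grep over the same directory gives you.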

    • AndyNemmity 2 hours ago
      There is certainly a level where at any time you could be building some abstraction that is no longer required in a month, or 3.

      I feel that way too. I have a lot of these things.

      But the reality is, it doesn't really happen that often in my actual experience. Everyone is very slow as a whole to understand what these things mean, so far you get quite a bit of time just with an improved, customized system of your own.

      • ramoz 2 hours ago
        My somewhat naive heuristic would be that memory abstractions are a complete misstep in terms of optimization. There is no "super claude mem" or "continual claude" until there actually is.

        https://backnotprop.com/blog/50-first-dates-with-mr-meeseeks...

        • AndyNemmity 2 hours ago
          I tend to agree with you, however compacting has gotten much worse.

          So... it's tough. I think memory abstractions are generally a mistake, and generally not needed, however I also think that compacting has gotten so wrong recently that they are also required until Claude Code releases a version with improved compacting.

          But I don't do memory abstraction like this at all. I use skills to manage plans, and the plans are the memory abstraction.

          But that is more than memory. That is also about having a detailed set of things that must occur.

          • ramoz 2 hours ago
            I’m interested to see your setup.

            I think planning is a critical part of the process. I just built https://github.com/backnotprop/plannotator for a simple UX enhancement

            Before planning mode I used to write plans to a folder with descriptive file names. A simple ls was a nice memory refresher for the agent.

            • AndyNemmity 1 hour ago
              I understand the use case for plannotator. I understand why you did it that way.

              I am working alone. So I am instead having plans automatically update. Same conception, but without a human in the mix.

              But I am utilizing skills heavily here. I also have a python script which manages how the LLM calls the plans so it's all deterministic. It happens the same way every time.

              That's my big push right now. Every single thing I do, I try to make as much of it as deterministic as possible.
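
              A minimal sketch of what a deterministic plan helper could look like, assuming plans are markdown checklists with `- [ ]` / `- [x]` items. The checklist format and the function names are hypothetical and are not the commenter's actual script; the point is only that the "what comes next" decision is made by code, so it happens the same way every time.

```python
import re

# Matches markdown checklist items such as "- [ ] add migration".
STEP = re.compile(r"^- \[( |x)\] (.+)$")


def next_step(plan_text):
    """Return the first unchecked step in a markdown checklist, or None."""
    for line in plan_text.splitlines():
        match = STEP.match(line.strip())
        if match and match.group(1) == " ":
            return match.group(2)
    return None


def mark_done(plan_text, step):
    """Return the plan with the given step checked off; other lines untouched."""
    target = f"- [ ] {step}"
    return "\n".join(
        line.replace(target, f"- [x] {step}", 1) if line.strip() == target else line
        for line in plan_text.splitlines()
    )
```

              A skill could call `next_step` to tell the model which item to work on and `mark_done` once the work is verified, instead of asking the model to track progress in its context.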

  • ossa-ma 2 hours ago
    There are a quadrillion startups (mem0, langmem, zep, supermemory), open source repos (claude-mem, beads), and tools that do this.

    My approach is literally just a top-level, local, git version controlled memory system with 3 commands:

    - /handoff - End of session, capture into an inbox.md

    - /sync - Route inbox.md to custom organised markdown files

    - /engineering (or /projects, /tasks, /research) - Load context into next session

    I didn't want a database or an MCP server or embeddings or auto-indexing when I can build something frictionless that works with git and markdown.

    Repo: https://github.com/ossa-ma/double (just published it publicly but it's about the idea imo)

    Writeup: https://ossa-ma.github.io/blog/double
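
    A minimal sketch of the capture-then-route flow described above, under stated assumptions: entries are tagged like `- [engineering] note`, and routing is done by a plain rule on that tag. In the approach described here the routing is done by the agent via slash commands, so this rule-based version is a swapped-in stand-in to show the data flow, not the linked repo's implementation. The tag format and function names are hypothetical.

```python
import re
from pathlib import Path

# Assumed entry format: "- [topic] free-form note".
ENTRY = re.compile(r"^- \[(\w+)\] (.+)$")


def handoff(inbox, topic, note):
    """Append one captured note to the inbox for later review."""
    with Path(inbox).open("a", encoding="utf-8") as handle:
        handle.write(f"- [{topic}] {note}\n")


def sync(inbox, dest_dir):
    """Route reviewed inbox entries into <topic>.md files, then empty the inbox."""
    inbox = Path(inbox)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    routed = 0
    for line in inbox.read_text(encoding="utf-8").splitlines():
        match = ENTRY.match(line)
        if not match:
            continue  # ignore anything that is not a tagged entry
        topic, note = match.groups()
        with (dest_dir / f"{topic}.md").open("a", encoding="utf-8") as handle:
            handle.write(f"- {note}\n")
        routed += 1
    inbox.write_text("", encoding="utf-8")
    return routed
```

    Because everything is plain markdown on disk, the result can be reviewed and committed with git between the two steps.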

    • bl4ckneon 2 hours ago
      The extension Cline has a "memory bank" feature. It's just a markdown file you add as an instruction. Works well for me. Worked with agents.md as well, so not just with the Cline extension. Pretty much the same idea.
    • fastball 1 hour ago
      What is the purpose of a separate /handoff and /sync command? It seems like handoff could just write learnings straight to their final destinations without needing an .inbox.md buffer in-between.
      • ossa-ma 33 minutes ago
        I like to read and review what was captured in .inbox.md before it is committed and synced across my knowledge base. Allows me to catch mistakes, tweak preferences, add context and decide whether something is actually worth pushing.

        I will typically make multiple '/handoff's per day as I use Claude code whereas I typically use '/sync' at the end of the day to organise them all at once.

    • AndyNemmity 2 hours ago
      Your approach essentially matches mine, but I call them plans. I agree with you that the other tools don't seem to add any value compared to this structure.

      I think at this point in time, we both have it right.

  • JoshGlazebrook 3 hours ago
    Is anyone else just completely overwhelmed with the number of things you _need_ for claude code? Agents, sub agents, skills, claude.md, agents.md, rules, hooks, etc.

    We use Cursor where I work and I find it a good medium for still being in control and knowing what is happening with all of the changes being reviewed in an IDE. Claude feels more like a black box, and one with so many options that it's just overwhelming, yet I continue to try and figure out the best way to use it for my personal projects.

    Claude code suffers from initial decision fatigue in my opinion.

    • levocardia 7 minutes ago
      I just take a grug brain approach. I do touch CLAUDE.md and then just explain how the code/files/project spec work, like I'm writing a slack message or email to a really smart colleague, and then let it rip, always using biggest model with thinking on. If something consistently goes wrong I add more to CLAUDE.md or even better, have Claude Code just update CLAUDE.md itself with the new issue explained. I'm probably 3 months behind what you could get with absolute SOTA practices but it still works so well that I'm amazed and amused on a daily, if not hourly, basis.
    • asdev 3 hours ago
      you really don't need any of this crap. you just need Claude Code and CLAUDE.MD in directories where you need to direct it. complicated AI set ups are mid curve
      • parpfish 2 hours ago
        I refuse to learn all the complicated configuration because none of it will matter when they drop the next model.

        Things that need special settings now won’t in the future and vice versa.

        It’s not worth investing a bunch of time into learning features and prompting tricks that will be obsoleted soon

        • AndyNemmity 2 hours ago
          I wish that were true. Models don't feel like they've really had massive leaps.

          They do get better, but not enough to change any of the configuration I have.

          But you are correct, there is a real possibility that the time invested will be obsolete at some point.

          For sure the work towards MCPs is basically obsolete via skills. These things happen.

          • parpfish 1 hour ago
            It doesn’t require any major improvement to the underlying model. As long as they tinker with system prompts and builtin tools/settings, the coding agent will evolve in unpredictable ways out of my control
            • AndyNemmity 1 hour ago
              That's a rational argument. In practice, what we're actually doing for the most part is managing context, and creating programs to run parts of tasks, so really the system prompts and builtin tools and settings have very little relevance.
          • dnautics 1 hour ago
            i don't understand this mcp/skill distinction? one of the mcps i use indexes the runtime dependency of code modules so that claude can refactor without just blindly grepping.

            how would that be a "skill"? just wrap the mcp in a cli?

            fwiw this may be a skill issue, pun intended, but i can't seem to get claude to trigger skills, whereas it reaches for mcps more... i wonder if im missing something. I'm plenty productive in claude though.

            • AndyNemmity 1 hour ago
              So MCPs are a bunch of, essentially, skill type objects. But it has to tell you about all of them, and information about all of them up front.

              So a Skill is just a smaller granularity level of that concept. It's just one of the individual things an MCP can do.

              This is about context management at some level. When you need to do a single thing within that full list of potential things, you don't need the instructions about a ton of other unrelated things in the context.

              So it's just not that deep. It would be having a python script or whatever that the skill calls that returns the runtime dependencies and gives them back to the LLM so they can refactor without blindly grepping.

              Does that make sense?
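
              A minimal sketch of the kind of script a skill could call for this, assuming the code being refactored is Python. It lists statically imported top-level modules using the standard `ast` module, which is only an approximation of true runtime dependencies (dynamic imports are missed). The function name is hypothetical and this is not how any specific MCP server or skill works.

```python
import ast


def imported_modules(source):
    """Return the sorted top-level module names imported by the given source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import; relative imports are skipped.
            modules.add(node.module.split(".")[0])
    return sorted(modules)
```

              The skill's instructions would tell the model to run the script on a file and read the printed list, so only that one capability's description has to sit in context.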

            • austinbaggio 1 hour ago
              In our experience, a lot of it is feel and dev preference. After talking to quite a few developers, we've found the skill was the easiest to get started with, but we also have a CLI tool and an MCP server too. You can check out the docs if you'd prefer to try those - feedback welcome: https://www.ensue-network.ai/docs#cli-tool
      • wouldbecouldbe 2 hours ago
        It seems to mostly ignore Claude.md
        • songodongo 1 hour ago
          You can test how often it is being used by having a line in there saying something like “You must start every non-code response with ‘Woohoo!’”
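
          A minimal sketch of turning that canary into a number, assuming you have collected the agent's non-code replies as a list of strings. How the replies are collected is left out, and the function name is hypothetical.

```python
def canary_rate(responses, canary="Woohoo!"):
    """Fraction of responses that begin with the canary phrase."""
    if not responses:
        return 0.0
    hits = sum(1 for text in responses if text.lstrip().startswith(canary))
    return hits / len(responses)
```

          A rate well below 1.0 over a session would suggest the instruction file is being dropped or ignored at least some of the time.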
        • AndyNemmity 2 hours ago
          It does, Claude.md is the least effective way to communicate to it.

          It's always interesting reading other people's approaches, because I just find them all so very different than my experience.

          I need Agents, and Skills to perform well.

    • dimitri-vs 2 hours ago
      I'm in Claude Code 30+ hr/wk and always have at least three tabs of CC agents open in my terminal.

      Agree with the other comments: pretty much running vanilla everything and only the Playwright MCP (IMO way better than the native chrome integration) and ccstatusline (for fun). Subagents can be as simple as saying "do X task(s) with subagent(s)". Skills are just self @-ing markdown files.

      Two of the most important things are 1) maintaining a short (<250 lines) CLAUDE.md and 2) having a /scratch directory where the agent can write one-off scripts to do whatever it needs to.

      • jswny 1 hour ago
        I also specifically instruct Claude how to use a globally git ignored scratch folder “tmp” in each repo. Curious what your approach is
        • austinbaggio 55 minutes ago
          You store your project context in an ignored tmp folder? Share more plz - what does it look like? What do you store?
          • jswny 24 minutes ago
            Not memory, I just instruct it to freely experiment with temporary scripts and artifacts in a specific folder.

            This helps it organize temporary things it does like debugging scripts and lets it (or me) reference/build on them later, without filling the context window. Nothing fancy, just a bit of organization that collects in a repo (Git ignored)

      • brigandish 2 hours ago
        How can you - or any human - review that much code?
        • Normal_gaussian 1 hour ago
          When I'm coding I have about 6 instances of VSCode on the go at once; each with their own worktree and the terminal is a dangerous cc in docker. most of the time they are sitting waiting for me. Generally a few are doing spec work/reporting for me to understand something - sometimes with issue context; these are used to plan or redirect my attention if I might've missed something. A few will be just hacking on issues with little to no oversight - I just want it to iterate tests+code+screenshots to come up with a way to do a thing / fix a thing, I'll likely not use the code it generates directly. Then one or two are actually doing work that I'll end up PR'ing or if I'm reviewing they'll be helping me do the review - either mechanically (hey claude, give me a script to launch n instances with a configuration that would show X ... ok, launch them ... ok, change to this ... grab X from the db ... etc.) or insight based (hey claude, check issue X against code Y - does the code reflect their comments; look up the docs for A and compare to the usage in B, give me references).

          I've TL'd and PM'd as well as IC'd. Now my IC work feels a lot more like a cross between being a TL and being a senior with a handful of exuberant and reasonably competent juniors. Lots of reviewing, but still having to get into the weeds quickly and then get out of their way.

        • bpolly 1 hour ago
          From personal experience, most of my time in Claude Code is spent experimenting, iterating, and refining approaches. The amount of code it produces as it relates to time spent working on it tends to be pretty logarithmic in practice.
    • _the_inflator 2 hours ago
      I like the finetuning aspect to it quite a lot. It makes sense to me. What I achieved now is a very streamlined process of autonomous work by an agent, which can more and more often be simply managed rather than controlled on a code-review-level basis for everything.

      I agree that this level of finetuning feels overwhelming and might leave you doubting whether you are utilizing Claude to its optimum. The beauty is that finetuning and macro usage don't interfere when you stay in your lane.

      For example, I no longer use the planning agent; instead I incorporated this process into the normal agents, much to the project's advantage. Consistency is key. Anthropic did the right thing.

      Codex is quite a different beast and comes from the opposite direction so to say.

      I use both, Codex and Claude Opus especially, in my daily work and find them complementary, not mutually exclusive. It is like two different evangelists who are on par, working with different tools to achieve a goal that both share.

      • AndyNemmity 1 hour ago
        Yeah, at a certain level, it's just a ton of fun to do. I think that's why so many of us are playing with it.

        It's also deeply interesting because it's essentially unsolved space. It's the same excitement as the beginning of the internet.

        None of us know what the answers will be.

    • eterm 3 hours ago
      This isn't necessary. Claude will read CLAUDE.md from both:

        1. Current directory ./CLAUDE.md
        2. User directory ~/.claude/CLAUDE.md
      
      I stick general preferences in what it calls "user memory" and stick project specific preferences in the working directory.
    • austinbaggio 3 hours ago
      It feels like Claude is taking more of the Android approach of a less opinionated, but more open stack, so people are bending it to the shape they want to match their workflow. I think of the amnesia problem as pretty agent-agnostic, though, knowing what happens while you're delivering product is more of an agent execution layer problem than a tool problem, and it gets bigger when you have swarms coordinating - Jaya wrote a pretty good article about this https://x.com/AustinBaggio/status/2004599657520123933?s=20
    • AndyNemmity 2 hours ago
      I'm the opposite, I find it straight forward to use all these things, and am surprised people aren't getting it.

      I've been trying to write blogs explaining it recently, but I don't think I'm very good at making it sound interesting to people.

      What can I explain that you would be interested in?

      Here was my latest attempt today.

      https://vexjoy.com/posts/everything-that-can-be-deterministi...

      • majormajor 2 hours ago
        You say "My Claude Code Setup" but where is the actual setup there? I generally agree with everything about how LLMs should be called you say, but I don't see any concrete steps of changing Claude Code's settings in there? Where are the "35 agents. 68 skills. 234MB of context."? Is the implementation of the "Layer 4" programs intended to be left to the reader? That's hardly approachable.
        • AndyNemmity 2 hours ago
          I got similar feedback with my first blog post on my do router - https://vexjoy.com/posts/the-do-router/

          Here is what I don't get. it's trivial to do this. Mine is of course customized to me and what I do.

          The idea is to communicate the ideas, so you can use them in your own setup.

          It's trivial to put for example, my do router blog post in claude code and generate one customized for you.

          So what does it matter to see my exact version?

          These are the type of things I don't get. If I give you my details, it's less approachable for sure.

          The most approachable thing I could do would be to release individual skills.

          Like I have skills for generating images with google nano banana. That would be approachable and easy.

          But it doesn't communicate the why. I'm trying to communicate the why.

          • majormajor 2 hours ago
            I just don't have much faith in "if you're doing it right the results will be magically better than what you get otherwise" anymore. Any single person saying "the problems you run into with using LLMs will be solved if you do it my way" has to really wow me if they want me to put in effort on their tips. I generally agree with your why of why you set up like that. I'm skeptical that it will get over the hump of where I still run into issues.

            When you've tried 10 ways of doing it but they all end up getting into a "feed the error back into the LLM and see what it suggests next" you aren't that motivated to put that much effort into trying out an 11th.

            The current state of things is extremely useful for a lot of things already.

            • AndyNemmity 2 hours ago
              That's completely fair, I also don't have much faith in that anymore. Very often, the people who make those claims have the most basic implementation that barely is one.

              I'm not sure if the problems you run into with using LLMs will be solved if you do it my way. My problems are solved doing it my way. If I heard more about your problems, I would have a specific answer to them.

              These are the solutions to where I have run into issues.

              For sure, but my solutions are not feed the error back into the LLM. My solutions are varied, but as the blog shows, they are move as much as possible into scripts, and deterministic solutions, and keep the LLM to the smallest possible scope.

              The current state of things is extremely useful for a subset of things. That subset of things feels small to me. But it may be every thing a certain person wants to do exists in that subset of things.

              It just depends. We're all doing radically different things, and trying very different things.

              I certainly understand and appreciate your perspective.

              • majormajor 10 minutes ago
                That makes sense.

                My basic problem is: "first-run" LLM agent output frequently does one or more of the following: fails to compile/run, fails existing test coverage, or fails manual verification. The first two steps have been pretty well automated by agents: inspect output, try to fix, re-run. IME this works really well for things like Python, less-well for things like certain Rust edge cases around lifetimes and such, or goroutine coordination, which require a different sort of reasoning than "typical" procedural programming.

                But let's assume that the agents get even better at figuring out the deal with the more specialized languages/features and are able to iterate w/o interaction to fix things.

                If the first-pass output still has issues, I still have concerns. They aren't "I'm not going to use these tools" concerns, because I also sometimes write bugs, and they can write the vast majority of code faster than I can.

                But they are "I'm not gonna vibe-code my day job" concerns because the existence of trivially-catchable issues suggests that there's likely harder-to-catch issues that will need manual review to make sure (a) test coverage is sufficient, (b) the mental model being implemented is correct, (c) the outside world is interacted with correctly. And I still find bugs in these areas that I have to fix manually.

                This all adds up to "these tools save me 20-30% of my time" (the first-draft coding) vs "these agents save me 90% of my time."

                So I'm kinda at a plateau for a few months where it'll be hard to convince me to try new things to try to close that 20-30% -> 90% number.

              • ok_dad 39 minutes ago
                Damn, it really is all just vibes eh? Everyone just vibes their way to coding these days, no proof AI is actually doing anything for you. It's basically just how someone feels now: that's reality.

                In some sense, computers and digital things have now just become a part of reality, blending in by force.

                • AndyNemmity 27 minutes ago
                  I mean, it’s not vibes. I make real projects, and the failures of AI doing it force me to make fixes so that it only ever fails doing that thing once. Then it no longer fails to do that thing.

                  But the things I am doing might not be the things you are doing.

                  If you want proof, I intend to release a game to the App Store and steam soon. At that point you can judge if it built a thing adequately.

                  • ok_dad 14 minutes ago
                    No offense intended, I don't even know you at all, but I see people claim things like you did so often these days that I begin to question reality. These claims always have some big disclaimer, as yours does. I still don't know a single personal acquaintance who has claimed even a 2x improvement on general coding efficiency, not even 1.5x in general efficiency. Some of my coworkers say AI is good for this or that, but I literally just waste my time and money when I use it, I've never gotten good results or even adequate results to continue trying. I feel like I am taking crazy pills sometimes with all of the hype!

                    I hope you're just one of the ones who figured it out early and all the hype isn't fake bullshit. I'd much rather be proven wrong than for humanity to have wasted all this time and resources.

    • einsteinx2 1 hour ago
      I use both Cursor and Claude Code in VS Code at work (so I get similar control as Cursor). I don’t really use Claude Code any differently than cursor. People way over complicate it.
    • austinbaggio 2 hours ago
      It is overwhelming. We have support for Cursor MCP as well, but you lose a lot of the auto-magic stuff you get with the Claude Code plugin. Unfortunately, skills are pretty sticky to the Claude Code stack. It is kind of the vim of AI coding agents. One of the goals for this tool was to address context management in a single place, i.e. instead of setting up all of the rules, claude.md, and skill.md, you just semantic query a specific namespace in your knowledge base.

      the docs if you are curious: https://www.ensue-network.ai/docs

    • minimaxir 3 hours ago
      With Opus 4.5 in Claude Code, I'm doing fine with just a (very detailed) CLAUDE.md.
      • austinbaggio 3 hours ago
        Do you find you want to share the .md with the teams you work with? Or is it more for your solo coding?
    • wouldbecouldbe 2 hours ago
      All I use is curse words and it does a damn great job most of the time
      • nineteen999 38 minutes ago
        I thought I was the only one.
      • lobito25 2 hours ago
        Same here :)))), he's really good at understanding when you're pissed off.
      • anonzzzies 2 hours ago
        Yep, that usually works best.
    • lukev 3 hours ago
      A claude.md file will give you 90% of what you need.

      Consider more when you're 50+ hours in and understand what more you want.

      • AndyNemmity 2 hours ago
        In my experience, I'm at the most where I entirely ignore Claude.md - so it's very interesting how many people have very different experiences.
    • pigpop 3 hours ago
      You don't need all that, just have Claude write the same documentation you would (should) write for any project. I find it best to record things chronologically and then have Claude do periodic reviews of the docs and update key design documents and roadmap milestones. The best part is you get a written record of everything that you can review when you need to remember when and why something changed. They also come in handy for plan mode since they act as a guide to the existing code.

      The PMs were right all along!

    • metadat 3 hours ago
      Don't forget about the co-agents.. yeah.
    • animitronix 2 hours ago
      Nope, I spend time learning my tools.
  • austinbaggio 54 minutes ago
    Thanks everyone for the comments, really, I wasn't expecting this.

    Quite a few of you have mentioned that you store a lot of your working context across sessions in some md file - what are you actually storing? What data do you actually go back to and refer to as you're building?

  • scubbo 1 hour ago
    I've been tinkering with building something similar for myself - though for a generic chatbot, rather than for Claude (not every task is coding, and I'd like to keep !). Other comments (e.g. https://news.ycombinator.com/item?id=46428368, https://news.ycombinator.com/item?id=46427950) suggest that many others are already ahead of me. Any recs for tools, libraries, or approaches that I should learn from or adopt? In particular, I've found that - no matter how direct and clear the system prompt is - models have a tendency to respond verbally as if they've made a tool-call recording some gained-knowledge ("thanks! I'll remember that"), but to not actually return the JSON required to trigger the tool call.
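
    A minimal sketch of one possible guard for that failure mode: check whether the reply text claims to have stored something while the structured tool-call list is empty, and re-prompt if so. The phrase pattern is a rough heuristic and the function name and `tool_calls` shape are assumptions; this is not a feature of any particular chat API.

```python
import re

# Rough heuristic for "I'll remember that" style claims; extend as needed.
MEMORY_CLAIM = re.compile(
    r"\bI(?:['’]ll| will|['’]ve| have) (?:remember(?:ed)?|noted?|saved?|stored?)\b",
    re.IGNORECASE,
)


def needs_retry(reply_text, tool_calls):
    """True when the reply claims to have stored something but made no tool call."""
    return bool(MEMORY_CLAIM.search(reply_text)) and not tool_calls
```

    When `needs_retry` returns True, the caller could send a follow-up message asking the model to emit the tool call it described, rather than trusting the verbal confirmation.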
    • austinbaggio 1 hour ago
      Since you've already thought about this problem, I'd love to hear your feedback after giving this skill a try. It should speed up at least your basic need of having to trigger the LLM to store the memory. One of our colleagues has found success asking at the end of a research session what he missed, how he could improve, etc.
  • coffeeboy27 3 hours ago
    What's the data retention/deletion policy and is there a self-hosted option planned? I'd prefer not to send proprietary code to third-party servers.
    • austinbaggio 3 hours ago
      Honestly, very reasonable ask, you're not the first person to ask for a self-hosted version. We have a privacy policy we've drafted that is up-to-date with the current version of the product https://www.ensue-network.ai/privacy-policy.

      The project is still in alpha, so you could shape what we build next - what do you need to see, or what gets you comfortable sending proprietary code to other external services?

      • frumplestlatz 3 hours ago
        > what do you need to see, or what gets you comfortable sending proprietary code to other external services?

        Honestly? It just has to be local.

        At work, we have contracts with OpenAI, Anthropic, and Google with isolated/private hosting requirements, coupled with internal, custom, private API endpoints that enforce our enterprise constraints. Those endpoints perform extensive logging of everything, and reject calls that contain even small portions of code if it's identified as belonging to a secret/critical project.

        There's just no way we're going to negotiate, pay for, and build something like that for every possible small AI tooling vendor.

        And at home, I feed AI a ton of personal/private information, even when just writing software for my own use. I also give the AI relatively wide latitude to vibe-code and execute things. The level of trust I need in external services that insert themselves in that loop is very high. I'm just not going to insert a hard dependency on an external service like this -- and that's putting aside the whole "could disappear / raise prices / enshittify at any time" aspect of relying on a cloud provider.

        • austinbaggio 46 minutes ago
          Yeah I get the dependency concern, and also I think about the trust and pricing challenge a lot. I might be getting ahead of my skis here, but living in a future world, assuming there is a local service, what would you want to see with a context management service for your team to actually use it? Or even better - pay for it?
  • qudat 3 hours ago
    • jswny 1 hour ago
      Can you give an example of how beads would be used by Claude to do something it otherwise couldn’t? I can’t quite tell what it is useful for
    • zyan1de 3 hours ago
      oh yeah beads is awesome! I'd say this is a bit more general purpose rn especially what is in the skill!
  • linsomniac 1 hour ago
    The past few weeks I've been experimenting with using less context and less memory and it's been going really well. Where before I'd try to do a bunch of fairly related things in a single session, experimenting with compacting more or less frequently, now I'm clearing my context or exiting and restarting claude and codex. It seems to help it focus on the task at hand, hasn't tended to go off into the weeds as much, and my token costs have dropped way down.

    Combined with a good AGENTS.md, it seems to be working really well.

    • einsteinx2 1 hour ago
      That’s been my experience as well. I find I usually get better output if I create a new conversation for each thing I need. I’ve found that the only time it’s better to continue an existing conversation is if I want to have it make small improvements or changes to something it just wrote, as it tends to do better with the previous context still there. But even that only goes so far, then the scale tips and it works much better with a clean slate. I especially don’t want totally unrelated conversations polluting the context, which is why I have all memory features turned off in all the web chat UIs for the models I use.
  • amannm 2 hours ago
    There's a lot of people interested in forming some sort of memory layer around vendored LLM services. I don't think they realize how much impact a single error that disappears from your immediate attention can have on downstream performance. Now think of the accrual of those errors over time and your lack of ability to discern if it was service degradation or a bad prompt or a bad AGENTS.md OR now this "long term memory" or whatever. If this sort of feature will ever be viable, the service providers will offer the best solution only behind their API, optimized for their models and their infrastructure.
  • heliumtera 1 hour ago
    Stop Claude from forgetting by telling it to not forget
    • AndyNemmity 57 minutes ago
      and put it in all caps, so it knows you mean business.
  • AndyNemmity 2 hours ago
    I don't understand the use case. If you aren't currently using agents and skills effectively, then perhaps this is useful.

    If you're using them though, we no longer have the problem of Claude forgetting things.

    • lkbm 1 hour ago
      I'm curious how those replace this? I've barely used either, and would love to hear more.
      • AndyNemmity 1 hour ago
        Okay, Claude.md is an md file with instructions.

        Agents are an md file with instructions.

        Skills are an md file with instructions.

        Commands are.. you get the point.

        We're just dealing with instructions. Claude.md is handled by Claude Code. It is often forgotten almost entirely when the context fills.

        Okay, what is an agent? An agent is basically a Claude.md file, but you make it extremely granular. So it only has instructions for, let's say, TypeScript.

        We're all just doing context management here. We're trying to make sure our instructions that matter stay.

        To do that, we have to remove all other instructions from the picture.

        When you're doing TypeScript, you only know TypeScript things.

        Okay, what's a skill? A skill is doing a single thing with TypeScript. Why? So that the context is even smaller.

        Instead of the agent having every single instruction you need about TypeScript, you put them in skills so they only get put into context when that thing is needed.

        But skills are also where you connect deterministic programs. For example, I have a skill for creating images in nano banana.

        So when the TypeScript agent needs to create an image, it calls the skill, which calls the Python script to create images in nano banana.

        We're managing all the context to only be available when it's needed, keeping all other instructions out.
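The idea of keeping instructions out of context until they are needed can be sketched in a few lines. This is a toy model, not how Claude Code actually implements skills: the `Skill` class, its fields, and `build_context` are hypothetical names made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str   # short summary that is always visible
    instructions: str  # full text, included only when the skill is active


def build_context(base: str, skills: list[Skill], active: set[str]) -> str:
    """Assemble a prompt: full text for active skills, one-line stubs otherwise."""
    parts = [base]
    for skill in skills:
        if skill.name in active:
            parts.append(f"## {skill.name}\n{skill.instructions}")
        else:
            parts.append(f"- {skill.name}: {skill.description}")
    return "\n".join(parts)
```

With ten skills and one active, the prompt carries nine one-line descriptions plus a single full instruction set, which is the whole point of splitting things up.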

        Does that help?

  • ec109685 2 hours ago
    This is impressive.

    Though I have found that a repo-level claude.md that is updated every time claude makes a mistake, plus using --resume to select a previous relevant session, works well.

    There is no way for Anthropic to optimize Claude code or the underlying models for these custom setups. So it’s probably better to stick with the patterns Anthropic engineers use internally.

    • austinbaggio 34 minutes ago
      If you give it a try, I think that use case should work, but if not, I would be grateful if you told us what broke.

      And also - I genuinely worry about vendor lock-in, do you?

    • austinbaggio 1 hour ago
      Do you ever switch tools? I don't love the idea of my context being hostage of whatever LLM I choose first.
  • zyan1de 4 hours ago
    I mostly use it during long Claude Code research sessions so I don’t lose my place between days.

    I run it in automatic mode with decent namespacing, so thoughts, notes, and whole conversations just accumulate in a structured way. As I work, it stores the session and builds small semantic, entity-based hypergraphs of what I was thinking about.

    Later I’ll come back and ask things like:

    what was I actually trying to fix here?

    what research threads exist already?

    where did my reasoning drift?

    Sometimes I’ll even ask Claude to reflect on its own reasoning in a past session and point out where it was being reactive or missed connections.

  • EMM_386 2 hours ago
    Just put a claude.md file in your directory. If you want more details about a subdirectory put one in there too.

    Claude itself can just update the claude.md file with whatever you might have forgot to put in there.

    You can stick it in git and it lives with the project.

  • robertwt7 2 hours ago
    Congrats on this! How does this differ from claude-mem? I've been using claude-mem for a while now

    https://github.com/thedotmack/claude-mem

  • altmanaltman 4 hours ago
    Thank you for specifying it wasn't magic or AGI.
    • apublicfrog 3 hours ago
      > Not magic. Not AGI. Just state.

      Very clearly AI written

      • fragmede 3 hours ago
        You're absolutely right!
    • austinbaggio 3 hours ago
      jk it is AGI. First.
  • sabareesh 2 hours ago
    Non-starter for us, we can't ship proprietary data to third-party servers.
    • austinbaggio 1 hour ago
      I assume this is with work? And also assume you do send data, you just need some service agreement or something like with AWS or Microsoft for GH?
  • dr_dshiv 2 hours ago
    I just ask Claude to look at past conversations where I was working on x… it sometimes thinks it can’t see them, but it can.

    I’ll give this a go though and let you know!

  • alex_young 1 hour ago
    Doesn't Claude already use RAG on the backend?
  • bilbo-b-baggins 2 hours ago
    Your site advertises careers in San Francisco/Remote. California law requires compensation disclosures.
    • austinbaggio 2 hours ago
      Good flag, we're still pretty early, I think the strict requirement for compensation disclosures is post 15 employees in CA? Did I get this wrong?
  • fullstick 1 hour ago
    I like it when the conversation is new sometimes.
  • gaigalas 39 minutes ago
    I like the fact that it forgets.

    Each time an LLM looks at my project, it's like a newcomer has arrived. If it keeps repeating mistakes, it's because my project sucks.

    It's a unique opportunity. You can have lots of repeated feedback from "infinite newcomers" to a project, each of their failures an opportunity to make things clearer. Better docs (for humans, no machine-specific hacks), better conventions, better examples, more intuitive code.

    That, in my opinion, is how markdown (for machines only and not humans) will fall. There will be a breed of projects that thrives with minimal machine-specific context.

    For example, if my project uses MIDI, I'm much better doing some specialized tools and examples that introduce MIDI to newcomers (machines and humans alike) than writing extensive "skill documents" that explain what MIDI is and how it works.

    Think like a human does. Do you prefer being introduced to a codebase by reading lots of verbose docs or having some ready-to-run examples that can get you going right away? We humans also forget, or ignore, or keep redundant context sources away (for a good reason).

  • senshan 3 hours ago
    What is the advantage over summarizing previous sessions for the new one?

    Or, over continuing the same session and compacting?

    • austinbaggio 2 hours ago
      You can use it with summaries for sure, but summaries often miss edge cases and long sessions drift. This makes it easier to jump between tasks, come back days later, and reorient without missing something that the summarization or compaction might have gotten rid of. I've often found post-compaction, the memory of even the current session feels so much dumber.
      • ec109685 2 hours ago
        You can go to a previous session and resume from there. Plus keep updating the repo claude.md along the way:
  • lloydatkinson 2 hours ago
    > Not magic. Not AGI. Just state.

    Why did you need to use AI to write this post?

    • llmslave2 1 hour ago
      Their brains are mush, lost the ability to focus on a task or do any deep thinking. Just proooooooooompt.
  • graphememes 2 hours ago
    stop wasting context space with this stuff ミ · · 彡
  • jMyles 1 hour ago
    We built one too, with a web frontend and a 'spy' viewer in case your team wants to watch your interactions. Also has secret redaction:

    https://github.com/jMyles/memory-lane

  • zyan1de 3 hours ago
    maybe you are in a claude code session and think "didn't i already make a design doc for a system like this one?" Or you could even look at your thought process in a previous session and reflect. but rn i mainly use it for reviewing research and the hypergraph retrieval
  • CPLX 4 hours ago
    I absolutely love this concept! It's like the thing that I've been looking for my whole life. Well, at least since I've been using Claude Code, which is this year.

    I'm sold.

    With that said, I can't think of a way that this would work. How does this work? I took a very quick glance, and it's not obvious at first glance.

    The whole problem is, the AI is short on context, it has limited memory. Of course, you can store lots of memory elsewhere, but how do you solve the problem of having the AI not know what's in the memory as it goes from step to step? How does it sort of find the relevant memory at the time that that relevance is most active?

    Could you just walk through the sort of conceptual mechanism of action of this thing?

    • austinbaggio 3 hours ago
      Appreciate it - yeah, you're right, models don't work well when you just give them a giant dump of memory. We store memories in a small DB - think key/value pairs with embeddings. Every time you ask Claude something, the skill:

      1. Embeds the current request.

      2. Runs a semantic + timestamp-weighted search over your past sessions. Returns only the top N items that look relevant to this request.

      3. Those get injected into the prompt as context (like extra system/user messages), so Claude sees just enough to stay oriented without blowing context limits.

      Think of it like: Attention over your historical work, more so than brute force recall. Context on demand basically giving you an infinite context window. Bookmark + semantic grep + temporal rank. It doesn’t “know everything all the time.” It just knows how to ask its own past: “What from memory might matter for this?”
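The three steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the bag-of-words `embed` is a toy stand-in for a real embedding model, and the half-life decay, the 50/50 blend of similarity and recency, and every function name here are assumptions.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def recall(query: str, memories: list[tuple[str, float]], now: float,
           top_n: int = 2, half_life_days: float = 30.0) -> list[str]:
    """Rank (text, timestamp_in_days) memories by similarity, damped by age."""
    q = embed(query)
    scored = []
    for text, timestamp in memories:
        age = max(0.0, now - timestamp)
        recency = 0.5 ** (age / half_life_days)  # 1.0 when fresh, halves per half-life
        score = cosine(q, embed(text)) * (0.5 + 0.5 * recency)  # assumed blend
        scored.append((score, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for score, text in scored[:top_n] if score > 0]


def inject(query: str, recalled: list[str]) -> str:
    """Prepend the recalled items to the request as extra context."""
    notes = "\n".join(f"- {m}" for m in recalled)
    return f"Relevant notes from past sessions:\n{notes}\n\nCurrent request: {query}"
```

Only the top few items reach the prompt, so the cost per request stays bounded no matter how large the store grows. Retrieval quality then depends entirely on the embedding and the weighting, which is where a toy like this differs most from a real system.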

      When you try it, I’d love to hear where the mechanism breaks for you.

    • skuenzli 3 hours ago
      It looks to me like the skill sets up a connection to their MCP server at api.ensue-network.ai during Claude session start via https://github.com/mutable-state-inc/ensue-skill/blob/main/s...

      Then Claude uses the MCP tools according to the SKILL definition: https://github.com/mutable-state-inc/ensue-skill/blob/main/s...

    • zyan1de 3 hours ago
      yeah so you can run it in automatic mode, or read only mode. In automatic mode it hooks onto the conversation and tool calls so you get the entire conversation stored. If you don't want to get super deep, then read only is safe and only stores what you ask. You could ask it things like "why is my reasoning dumb" by recalling past conversations, or even give it the claude tool call sequence and ask "how can claude be smarter about next time".

      I think of it like a file tree with proper namespacing and keep abstract concepts in separate directories. so like my food preferences will be in like /preferences/sandos. or you can even do things like /system-design preferences and then load them into a relevant conversation for next time.
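The file-tree analogy maps onto a key/value store with path-like keys and prefix lookups. A minimal sketch, assuming an in-memory dict; the `NamespacedMemory` class and the example paths other than /preferences/sandos are hypothetical:

```python
class NamespacedMemory:
    """Key/value notes addressed by slash-separated paths."""

    def __init__(self) -> None:
        self._items: dict[str, str] = {}

    def put(self, path: str, value: str) -> None:
        """Store a note under a path such as /preferences/sandos."""
        self._items[path.strip("/")] = value

    def load(self, namespace: str) -> dict[str, str]:
        """Return every note stored under the given namespace."""
        prefix = namespace.strip("/") + "/"
        # The trailing slash keeps "preferences" from matching "preferences-old".
        return {
            key: value
            for key, value in sorted(self._items.items())
            if key.startswith(prefix)
        }
```

Loading a namespace such as /preferences then pulls in only that subtree, leaving unrelated notes out of the conversation.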

    • DANmode 3 hours ago
      Total speculation:

      Text Index of past conversations, using prompt-like summaries.
