TL;DR: I spent a solid month “pair programming” with Claude Code, trying to suspend disbelief and adopt a this-will-be-productive mindset. More specifically, I got Claude to write well over 99% of the code produced during the month. I found the experience infuriating, unpleasant, and stressful before even worrying about its energy impact. Ideally, I would prefer not to do it again for at least a year or two. The only problem with that is that it “worked”. It’s hard to know exactly how well, but I (“we”) definitely produced far more than I would have been able to do unassisted, probably at higher quality, and with a fair number of pretty good tests (about 1500). Against my expectation going in, I have changed my mind. I now believe chat-oriented programming (“CHOP”) can work today, if your tolerance for pain is high enough.

  • termaxima@slrpnk.net
    9 minutes ago

    I don’t care if human meat made for tastier, healthier hamburgers, and faster too; I refuse to eat people for any reason whatsoever.

    If you don’t see how this relates to AI, maybe you are AI yourself.

  • ozymandias@lemmy.dbzer0.com
    4 hours ago

    I tried that recently with a pretty unique app. It gave a decent outline but made so many bugs that it was worthless… Every library it included was outdated… I don’t want to imagine how many security flaws it creates.
    I think it’s decent if your project has been done a million times before; otherwise it sucks.

  • sacredfire@programming.dev
    4 hours ago

    My experience with LLMs for coding has been similar. You have to be extremely vigilant, because they can produce very good code but will also miss important things that cause disasters. It makes you very paranoid about their output, which is probably how you should approach it, and honestly how you should approach any code that you’re writing or getting from somewhere else.

    I can’t bring myself to actually use them for generating code like he does in this blog post, though. That seems infuriating. I find them useful as a way to query knowledge about stuff I’m interested in, which I then cross-reference with documentation and other sources to make sure I understand it.

    Sometimes you’re dealing with a particular issue or problem that is very hard to Google or look up. LLMs are a good starting point for getting an understanding of it, even if that understanding could be flawed. I’ve found that they usually point me in the right direction. Still, the environmental and ethical implications of using these tools bother me. Is making my discovery phase for a topic a little bit easier worth the cost of these things?

  • TehPers@beehaw.org
    11 hours ago

    1500 tests is a lot. That doesn’t mean anything if the tests aren’t testing the right thing.

    My experience was that it generates tests for the sake of generating them. Some are good. Many are useless. Without a good understanding of what it’s generating, you have no way of knowing which are good and which are useless.

    It ended up being faster for me to just learn the testing libraries and write my own tests. That way I was sure every test served a purpose and tested the right thing.

    • kamstrup@programming.dev
      9 hours ago

      Yeah. Totally agree on this. I spend maybe 3-4h a day reviewing code, and these are my thoughts…

      The LLM generated tests I see are generally of very low quality. Perfectly fitting the bill of looking like a test, but not actually being a good test.

      They often don’t test the precise expected value. As an overly simplistic example: they rarely check that 2+2==4. They just assert that 2+2>0, or often only that 2+2 doesn’t raise an error.
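      To make that contrast concrete, here is a minimal Python sketch. `add` is a hypothetical stand-in for the code under test, and `broken_add` is a deliberately wrong version added for illustration. The weak test only checks the sign of the result, so it passes against both implementations; the precise test pins the exact value and catches the bug.

```python
def add(a: int, b: int) -> int:
    """Hypothetical function under test."""
    return a + b


def broken_add(a: int, b: int) -> int:
    """Deliberately wrong implementation (off by one), for illustration."""
    return a + b + 1


def weak_test(fn) -> None:
    # "Looks like a test": only checks the result is positive,
    # so it also passes for broken_add (2 + 2 + 1 = 5 > 0).
    assert fn(2, 2) > 0


def precise_test(fn) -> None:
    # Checks the exact expected value, so broken_add fails it.
    assert fn(2, 2) == 4


def passes(test, fn) -> bool:
    """Run a test against an implementation; True if no assertion fails."""
    try:
        test(fn)
        return True
    except AssertionError:
        return False


if __name__ == "__main__":
    for fn in (add, broken_add):
        print(fn.__name__, "weak:", passes(weak_test, fn),
              "precise:", passes(precise_test, fn))
```

      Only the precise test distinguishes the two implementations; the weak one stays green either way, which is exactly why it adds little value.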

      The tests often contain mountains of redundancy. Again, an oversimplified example: They have a test for 2+2, and another for 2+3.

      There is never any attempt to make the tests nice to read for humans. It is always just heaps of boilerplate code. No helpers introduced, or affordances to simplify test setup.
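      A sketch of the alternative, again with a hypothetical `add`: one table-driven test using the standard library's `unittest` and `subTest`, plus a small helper, in place of many copy-pasted near-duplicates. The case table is illustrative only; the idea is that each row covers a distinct behavior rather than repeating 2+2 as 2+3.

```python
import unittest


def add(a: int, b: int) -> int:
    """Hypothetical function under test."""
    return a + b


# One table of (inputs, expected) rows replaces a pile of near-identical
# test functions. Each row is meant to exercise a different behavior.
CASES = [
    ((2, 2), 4),                   # ordinary positive operands
    ((-1, 1), 0),                  # mixed signs cancel out
    ((0, 0), 0),                   # zero is the identity
    ((10**18, 10**18), 2 * 10**18),  # large values (Python ints don't overflow)
]


class AddTests(unittest.TestCase):
    def assert_adds_to(self, a: int, b: int, expected: int) -> None:
        """Small helper so each case reads as one line of intent."""
        self.assertEqual(add(a, b), expected, f"add({a}, {b})")

    def test_cases(self) -> None:
        for (a, b), expected in CASES:
            # subTest reports each failing row separately instead of
            # stopping at the first failure.
            with self.subTest(a=a, b=b):
                self.assert_adds_to(a, b, expected)
```

      Run it with `python -m unittest <module>`; a failing row is reported with its `a` and `b` values, so the table stays readable without extra boilerplate.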

      Combining the proclivity for boilerplate with subtly redundant tests makes for some very poor programming. Worse than I’d expect from a junior, tbh.

      And 1500 tests… That is not necessarily a lot! If that is the output of a month of pumping out code, I would call it the bare minimum.

      • majster@lemmy.zip
        8 hours ago

        30 × 50 = 1500, so that’s 50 tests per day, which is a lot. That is a lot to read through and understand all the edge cases of, let alone write.

        • TehPers@beehaw.org
          5 hours ago

          30 assumes you write code on all 30 days. In practice, it’s closer to 20, so 75 tests per day. That’s doable on some days for sure (if we include parameterized tests), but I don’t strictly write code every day either.

          Still, I agree with them that you generally want to write a lot of tests, but volume is less important than quality and thoroughness. Using volume alone as a meaningful metric, as the author does, is nonsense.

  • Avicenna@programming.dev
    17 hours ago

    Sounds like being a project manager for a team of one AI coder; honestly quite depressing. You don’t get to do the fun part (coding), nor do you get to interact with intelligent human beings (possibly the only fun part of a managerial role). The only positive thing you get out of it is output (which may become unmaintainable for complex projects in the long run). Sounds like something only CEOs and people trying to get rich quickly would like.

  • tyler@programming.dev
    18 hours ago

    Sounds infuriating honestly. Being more productive at the cost of mental health isn’t something we should be aiming for as a species.

  • Damarus@feddit.org
    20 hours ago

    Unfortunately, this page renders with only about four words per line on my phone.

    • Mikina@programming.dev
      18 hours ago

      If I understood it right, one of the projects he was working on as part of the experiment was the website.

      Figures.

    • zqwzzle@lemmy.ca
      19 hours ago

      Didn’t they revive that checkeagle project with Claude? Keep the bar low, I guess.

  • abbadon420@sh.itjust.works
    20 hours ago

    Interesting read. I haven’t finished it yet (it’s late, I’m going to bed), but it’s a nice shift from the de facto negativity around LLMs on this platform.