• Zos_Kia@jlai.lu
    ↑ 22 · 4 days ago

    I think the issue is also that you need some serious hardware to get good inference speed while your devs are working, but the rest of the time that hardware will sit underutilized.

    That being said, you can get good performance from indie inference farms at a fraction of the cost of the big US labs. I think it’s a great compromise, and in a few months the open models will be near parity with Opus 4.6, which is really all you need for most tasks.

    • plyth@feddit.org
      ↑ 6 · 4 days ago

      > opus 4.6 which is really all you need for most tasks.

      The same tasks that can fit into 640KB.

          • Zos_Kia@jlai.lu
            ↑ 2 ↓ 1 · 4 days ago

            Aha, thanks for sharing, that’s a cool anecdote. But I think my point still stands, as there are threshold effects in LLM “intelligence” that don’t map directly onto the RAM comparison.

            Opus 4.6 is comparable to a mid-level developer. It requires some guidance and will sometimes get things wrong, but it is also suitable for work in most business environments: most projects are not that complicated or high-stakes in the first place.

            In the future you’ll probably have Opus 7.5 or some shit, which will be at a mega-senior level but also considerably more expensive. And given the price difference, companies will suddenly discover that they don’t really need expert-level coding at a high price tag, and that a reliable workhorse at a fraction of the cost is plenty for their needs.

            • jj4211@lemmy.world
              ↑ 1 · 3 days ago

              > Opus 4.6 is comparable to a mid-level developer.

              Not really…

              Yes, it pays attention to certain details that humans will tend to flub, so it’s better than juniors when it comes to that…

              But broadly speaking, it’s a moron. It’s like a junior dev pasting 15-year-old Stack Overflow answers into a project: better at making them fit in, but still taking pretty dumb approaches.

              I spent a bunch of tokens trying to get Opus 4.7 to do a task for me last week. The result had mistakes, and the test case that should be near-instant took 3 minutes to complete (meaning a user would be staring at a spinner for 3 minutes). It did save me the trouble of figuring out the basic structure of the thing I was going to interact with (the documentation was dense and lacking specific examples, and Opus did output something that let me see how it basically worked in a to-the-point way), but I had to rewrite the “meat” of the task to get correct execution in under a second.

              > In the future you’ll probably have Opus 7.5 or some shit, which will be at a mega-senior

              My impression has been less about it being more “senior” over time and more about it being able to consistently deliver junior-level work for longer amounts of output. The error rate remains problematic, so you end up with more to review, and it torturously “looks right” for longer. When it digs itself into a hole, it’s very bad at trying to amend the mess that has accumulated.

              • Zos_Kia@jlai.lu
                ↑ 1 · 3 days ago

                I mean, obviously mileage varies from project to project and task to task, but I think you might be overestimating mid-level developers. Or you’ve been really lucky with your recruitment! Because I would describe them just the way you described Opus: pretty eager, kind of try-hard, decent engineering chops, but often misdirected into dumb approaches.

                Of course my experience is limited and I’ve never really been in a management role, but I’ve been the adult in a fair number of rooms, and I’ve done my share of “grooming sprints” and dispatching tasks.

                That being said, there are projects that are horribly resistant to agentic coding. It’s pretty rare, as most codebases nowadays are bog standard and rely on roughly the same abstractions, but I’ve seen it happen. It can come from the complexity of the domain, or of the codebase, or from the way documentation and tribal knowledge clash, or a myriad other reasons. Often it’s the kind of project that requires more mature devs and can’t really onboard juniors/mids.

                > When it digs itself into a hole, it’s very bad at trying to amend the mess that has accumulated

                Oh yeah, definitely. Once it’s in the hole you’d better scrap that branch and restart with more specific instructions, because agents are very “additive”: they don’t often think to remove stuff and change their approach. Again, kind of like mid devs once they’re committed to an implementation plan.

                • jj4211@lemmy.world
                  ↑ 1 · 2 days ago

                  > you might be overestimating mid-level developers

                  Maybe. Conveniently, shortly after you made that point I had to work with a random developer at another company. Technically his rating was senior developer, and the nature of the mess he had gotten into (without codegen) was credibly not “junior like” at all. It was a mess from superfluous complexity from adopting every buzzword along his career: cloud, microservices, configuration management. Not just once each, either; for example, three different configuration management solutions that could all do the task were in use for different things. I was asked to consult on why it was flaky despite his best efforts, and maybe help him simplify what he demanded of the users. It turned out they already had something purpose-built for the task installed, and I showed him the single command that required 90% fewer inputs (it could auto-fetch the information), worked within a few seconds instead of a few minutes, ran entirely on-premise instead of having to go to the cloud, and actually worked reliably.

                  So yeah, that guy probably would have been no worse off with codegen AI making his mess… It also occurred to me that the description of his project probably sounds more impressive on a resume, despite the project being garbage…

                  • Zos_Kia@jlai.lu
                    ↑ 1 · 2 days ago

                    > It was a mess from superfluous complexity from adopting every buzzword along his career, cloud, microservices, configuration management

                    That was the bane of my existence before AI, and I suspect AI will only compound this issue.

                    If you do “artisan vibe coding”, acting like a very hands-on CTO who challenges decisions and reviews most of the code produced, you get a modest productivity boost in the 20 to 40% range, and a large reduction in cognitive load, which can help you think bigger thoughts over the longer term. The quality can be as high as you want it to be in that setup.

                    But if you do fully agentic, unsupervised vibe-coding, it’s easy to get into a mess, because it’s like having a team of juniors/mids paid by the line churning out complexity all day long. The productivity boost can be a large multiple, but the quality suffers because you have to ignore a lot of the output, and the devil is in the details, so he will certainly get you at some point.