Will AI replace developers? (S5E21 · 48:44)

Ben: So, Josh, are you getting ready for Christmas? Is your shopping all done?

Josh: Uh, yeah, I think I got most of it done. I actually, I've got like one stop to make and it's probably the one I should have done first, which is See’s Candy.

Ben: Oh yes.

Josh: We'll see how that goes. I always save that for, like, the last week before Christmas. But so far I've always managed to make it work.

Ben: Awesome. We had a new See's Candy open up right around the corner from our house. So that's kind of dangerous.

Josh: Yeah.

Ben: And so I have, like, all of my packages get shipped to our box at the UPS store. So I use that for, you know, for Honeybadger stuff, and I use that for my other business, and I use it for personal things I don't want going to my house, like Christmas presents. And I thought I had it all perfectly planned out. I had all the packages showing up within a couple-day window so I could just do one trip, because it gets crazy at the UPS store at Christmas time.

So I went there yesterday. And somehow I miscounted the number of packages I was expecting. And as I was leaving, I think like within five minutes of my leaving the store, another package showed up. And so I had to go back for the second time, so yeah.

Josh: That’s, that's hilarious timing though. So you got the email. Okay. And then you went back.

Ben: Yeah.

Josh: All right. Well, hopefully that's it for you.

Ben: Yeah. I think I have everything now, but maybe now that you mentioned it, maybe a trip to the See’s store would be a good idea.

Josh: Yeah, it usually is. Yeah. My mail story is our mailbox got taken out when I was at RubyConf. And we have this, like, ancient, heavy duty, custom steel, like, welded mailbox on the side of this fairly busy road. I think it's probably been there since the house, which was built in 1965. And I had no idea how it got, like, thrown basically 12 feet down a hill—with the concrete shoe ripped out of the ground, by the way. The best I can tell is, like, a truck hit it and decided to just drive off or something with a huge dent in the front of it.

Because it's sturdy enough to destroy most vehicles. So I'm really, like, confused about what happened. But anyway, we were without any mail delivery for a number of weeks, just as things were starting to ramp up into the holiday season. So I was having to drive to the mailbox once a week and walk away with this, like, armful of packages, basically.

We finally got it fixed last weekend, so.

Ben: Nice. That’s—that's pretty impressive. It sounds substantial.

Josh: Yeah. Yeah, it's the kind of mailbox that I think someone put in as revenge for, maybe, teenagers taking a bat to the old mailbox; the kind some old-timer welds himself and puts in there to, yeah, basically destroy anything that's going to run into it. So I guess it lasted long enough, but it didn't last forever.

Ben: That's awesome. So did you get, like, a fancy kind of mailbox, like one supporting a sports team or, you know, something like that, or maybe a big flying fish or?

Josh: Um, no. So there's some planned road construction that's going to affect our property front significantly next year, so I opted for the cheapest Lowe's special. Actually, I think my handyman bought it off Amazon. So it's, like, the cheapest mailbox you can find. But the plan is, my wife wants to reuse the old one. It was sturdy enough that there's no dent, not a scratch on it except for the rust.

So we want to reuse it at some point, but I didn't want to go to the trouble of putting new concrete in. I mean, the thing weighs probably 250 pounds. Yeah, it's currently stored on the side of my shed, and we'll deal with that once the construction's done. And in the meantime, hopefully no teenagers want to pull some shenanigans with the cheap mailbox, because it's not going to stand up to anything.

Ben: That's hilarious. Well, I'm glad the mailbox survived to tell the tale.

Josh: Yeah, it'll return. So, I feel like we've both been using AI, or whatever LLM tools, for a while now, like everyone else. But I feel like maybe you've been using them in your code editor, or for code generation, a little bit more than me, probably because you've been writing a lot more code than I have the past few years. To be honest, I've been stuck over here in marketing land, writing blog posts and stuff, not with AI. So I feel like you've been using AI code generation tools a little bit more than I have.

Ben: Yeah. In fact, Kevin, I think, was the first among us to really start diving into these tools. He was a big fan early on of diving into ChatGPT and asking it for tips about what he was working on. And I didn't love the idea; early on, when it first came out, I was the old man shaking his fist at the cloud. I was like, I don't want AI stuff. But as he kept on raving about it, I was like, okay, I guess I should give this a shot.

And so I hopped into ChatGPT and started asking things. I was interested in doing something in Golang, and Golang is not my first language; it's not the one I spend a lot of my time in, but we do have some Go code around here, so from time to time we do some Go projects. And I was curious about making a change to one of our Go projects, and I wanted to use an in-memory cache to speed some stuff up. Right now we're going to Redis a lot for one of our processes, and we wanted to find a way to get that stuff into the process itself instead of having to go to Redis. And I didn't know what the landscape looked like in Go for in-memory caches.

Like, I know if you've got a Rails app, right, it's super easy: you've got the Rails cache right there. It's handy. So anyway, I asked ChatGPT, hey, tell me what options there are in Golang for doing an in-memory cache, and it has to support TTLs, time-limited keys. And it gave me four options, with a little blurb about what each one was and what its strengths and weaknesses were.

And then I dove down: okay, well, I'm interested in this kind of scenario, which of these four would be the best one to check out first? And it gave me a recommendation based on, you know, its API and blah, blah, blah. And so that was super handy. It saved me a morning's worth of research.

Because it's like, okay, there's four great candidates. And then I narrowed it down to one, and I checked that one out. I'm like, yeah, I love that; this'll work. It was super helpful for doing that kind of research, but I never really got into actually having it write code right away. I just liked asking questions. And then I got the idea.
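
For readers who want a concrete picture, here is a minimal sketch of the kind of in-memory TTL cache being described. go-cache is one real library in this space, though the episode doesn't say which four libraries ChatGPT actually suggested:

```go
// A minimal sketch of an in-memory TTL cache in Go. go-cache
// (github.com/patrickmn/go-cache) is one real library that supports
// time-limited keys; whether it was among ChatGPT's four is unknown.
package main

import (
	"fmt"
	"time"

	"github.com/patrickmn/go-cache"
)

func main() {
	// Default TTL of 5 minutes; expired entries are purged every 10 minutes.
	c := cache.New(5*time.Minute, 10*time.Minute)

	// Cache a value in-process instead of making a round trip to Redis.
	c.Set("plan:123", "business", cache.DefaultExpiration)

	if plan, found := c.Get("plan:123"); found {
		fmt.Println("cache hit:", plan)
	}
}
```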

I was like, well, if it's got a bunch of good suggestions, maybe I should just have it write some code for me. So I said, okay, well, I want to do this little task and I already knew how I would do it. But it would just be kind of tedious to write this code. It's not terribly exciting code. And so I just asked it to write me some code as I think this was Ruby this time.

And it—it gave me some code and I'm like, yeah, that's good. I can use that as a starting point. And maybe it was like 90 percent of the way there, but I did some edits and yeah, it was kind of cool. Like just saved me a lot of time that way.

Josh: So I'm curious: before the code generation, how did you find it compared to what you used to do? Like, go to Google, do a search, do some research, and go from there. How did it save you time, and how did the experience compare to your old way of doing things?

Ben: Yeah, I think the best thing was that I could give it a vague, general description of what I wanted, like, show me some caching libraries for Go. I knew I wanted that, but I didn't know what was available, and it knows, right? Because it's indexed the entire web or whatever. Previously I would have gone to Google and typed in Go caching libraries, and I probably would have found those same four suggestions, and I would have gone through them one by one: okay, does this one do the thing that I want? How does the API look on this one? Instead, it was able to shorten that research time from, I don't know, an hour or two down to five or ten minutes.

Josh: Yeah, that makes sense. I've found that it can be good for when you don't really know what you're talking about, or what you should be searching for, because it's basically a pattern-matching engine. As far as I know, that's how they work: they're trained on other content, and they're going to basically regurgitate that in one way or another. I've even found it a useful starting place for figuring out what, specifically, I should be searching for if I'm using a traditional search or something like that.

Because with a super new concept, I don't always know the language, or the terms I should be looking for if I want to get to the deeper side of whatever the technology is. I can use it to jump ahead a little bit without having to go read a bunch of stuff. In the past, I might've gone and read a whole book, or a bunch of articles, to get myself into the headspace of, okay, now I'm starting to understand these concepts and terms, and now I have the context I need to go out and actually do further research. So I've found that, in some cases, it helps me get a head start on that sort of process.

Ben: Yeah, I found it really does a good job of surfacing those things you don't know, getting you to, I guess, that next level of knowledge about a thing. Like, you might ask, okay, what is the deal with distributed systems, right? And it can give you some text, and you're like, oh, I should go research the CAP theorem or, you know, whatever, based on what it gives you back.

Josh: Yeah. So you started using it for code generation a little bit more. I think I remember specifically the time when you had to basically write some Go tests, I think, or something like that, and it was just a pretty cool experience to have it knock out some simple boilerplate code like that for you.

Ben: Yeah, I was early on in the Copilot beta with GitHub, and I remember I was in VS Code one day and had just written some stuff in our Rails app. I had written a test, because we do a lot of unit testing in our Rails app, and the autocomplete suggestion in my editor was three more tests, and they were exactly what I would have written.

And so I was like, tab complete. Yes. Thank you very much. And they were all legit. And, you know, it's like, great. I just saved myself, I don't know, five minutes of typing, but like, hey, it's five minutes of typing I didn't have to do, right? And it was just boring stuff of testing this condition and testing that condition.

So that I think really opened my eyes, like, hey, this could be kind of cool. I still didn't really trust it much, you know? I still read every line and I'm like, I don't know, but the thing that's kind of really nice about using the AI tools when it comes to code is that you know when you're done. Does it work or not, right? It's provable, right?

Does this thing do what it's supposed to do, or does it not? Versus, I don't know, if you ask AI why you have some skin condition; I'm not a dermatologist, I have no expertise, right? So I have no idea if it's completely lying to me, you know, making stuff up.

Josh: Yeah, it can't go reproduce the scientific research or redo the study in quite the way that it can actually run the code, or you can run the code, to verify the results. I agree with that for the simple cases, and we'll probably get into this a bit later, but in some of the more complex or complicated bits of code, I think that's where it starts to get a little hairy, in terms of getting yourself stuck, or into the weeds a little bit.

But the way I look at it is that it can save a lot of time in the same way that snippet-generation plugins do: closing method definitions, commenting out big blocks of text or code, inserting snippets. These are common patterns that everyone needs to do quickly, and it's easy to build something that does that for you with a keyboard shortcut or something.

And LLMs are a little bit fuzzier, but they can do a lot of the same thing. It's like very smart autocomplete, basically: if you're writing some Go code, and it's similar to Go code that all the other Go developers have written, and they wrote these tests, then you're probably going to want to write those tests too. And it can fill in some of the specifics, if it's simple enough, to customize that to your specific context, which is neat.
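
That kind of repetitive test boilerplate is worth picturing. Below is a hypothetical Go table-driven test (the function and cases are made-up stand-ins, not code from any Honeybadger project); once the first case exists, the remaining rows are exactly the sort of thing an autocomplete fills in:

```go
package plans

import "testing"

// Add is a hypothetical function, standing in for whatever is under test.
func Add(a, b int) int { return a + b }

func TestAdd(t *testing.T) {
	cases := []struct {
		name string
		a, b int
		want int
	}{
		{"zeros", 0, 0, 0},
		{"positives", 2, 3, 5},
		// The rows below are the sort of thing Copilot tab-completes
		// once it has seen the first case.
		{"negatives", -2, -3, -5},
		{"mixed signs", -2, 3, 1},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			if got := Add(tc.a, tc.b); got != tc.want {
				t.Errorf("Add(%d, %d) = %d, want %d", tc.a, tc.b, got, tc.want)
			}
		})
	}
}
```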

Ben: Yeah. I found that the very smart autocomplete was where I found the most benefit for quite a while. Like, I would go, yes, tab-complete that, thanks. It was right there. But until recently I didn't really start a new thing with AI from scratch, or really ask it to do a whole lot. I wasn't doing the chat-based thing, which has gotten popular more recently with, like, the Cursor editor, where you have this chat interface right in the editor.

And I think Copilot added this over time as well, where now you can talk to the LLM in the editor. And I didn't really play with that much until I started seeing people on Bluesky talking about Windsurf, which is yet another editor that comes along and competes with Cursor. It's basically built around the notion that you have a conversation with the AI and let it do the driving, while you supervise and accept or reject the code suggestions that it makes. So I was seeing people talk about Windsurf a lot, and I was like, okay, I'll try this out.

And I didn't want to try it out on a big existing project. I figured it might, I don't know, just have too much friction in my brain; like, I already know what I want to do, I just need to do it, kind of thing. So what I did is, well, I want to deploy this new ZooKeeper cluster to our infrastructure.

And I know I want to use Terraform, and I know exactly how I would do this if I was just doing it from scratch, but let me see what the AI can do. So, this being my first use case for Windsurf, I opened up a brand new empty folder. I'm like, all right, and I described what I wanted: a ZooKeeper cluster, and it's got to have an auto scaling group, and blah, blah, blah.

And my prompt is, I don't know, two or three paragraphs long, because I know exactly what I want. And it just starts generating files. Okay, here's a variables file. And here's the main file. And here's, you know, three or four different files. And I was like, wow, I'm watching it type in the windows of each tab.

And I started reading through it. I'm like, this is basically what I would have written. So now it's going from saving me five minutes of typing to saving me an hour of typing, right? Because the Terraform syntax is very well documented, it knows exactly what it needs to write to give you an auto scaling group, for example, and an EC2 instance, and blah, blah, blah.

So I didn't have to go, oh, what was the syntax for this particular thing that I haven't typed in a month, or whatever. It just did it all. And now I've got five files in my editor, five open tabs. Yes, accept all of that, that's awesome. And of course, then I went on from there: oh, now I want this, and now I want that. And it was this really cool interactive experience. It's pretty fun.

Josh: Yeah, that's really cool. You got me checking out Windsurf recently too, and I've been experimenting with it and had some similar results. The way you describe that reminds me a little bit of the old way to do it, which would have been: I would have gone and searched for examples of what I wanted to do.

And maybe I would have landed on, if it was specific enough, some kind of project starter that someone had made. Like, you used to have Rails Kits, for example, which were basically starter kits for doing a specific thing that a lot of people want to do.

So I might've found a starter repo that had some examples of, like, what do you name the variables file, and where should it go? All these basic things that you'll know once you're familiar with Terraform or whatever you're doing. Or maybe you already do know, but it's just a lot of work to go and set that all up again. The LLM seems to be doing that on the fly, but also customizing it to the specific things that you prompt it with.

Ben: And after I had that Terraform success, I was like, okay, well, that was a blank folder. Let me try an actual existing configuration. Let me add something to it. And speaking of the context that you mentioned, it can pull in context from other files in the project that you referenced specifically, or that are just there, and then it starts creating this new stuff that kind of matches your old stuff. Like it uses the existing variables and things like that. It just blew my mind.

Josh: Yeah, the thing that really impressed me about Windsurf—and I think Cursor does this as well—like, they both can do the multi file edits and that sort of thing, but Windsurf also can do, like, other operations in the workspace and it uses additional tools to inform its workflows.

So part of the step that it would generate—like if you're asking it to do something really complicated—it might go and run a terminal command to grep for some files that it thinks might exist and then use the output of that command to figure out where the files are that it needs to go and read to make an edit or include in its extra context or something.

It can do additional operations, like exploring your actual workspace, to inform the actual code changes it's trying to make for you, which I thought was interesting. I don't know, do you know if it can actually go out and read documentation on the internet? Or is it all local? I don't think it can actually go fetch a URL, can it?

Ben: Well, actually—so we had the hack week that we often do near the end of the year. And we just pick a topic or some kind of fun code that we want to write. And so this year, Kevin and I decided we wanted to do a hack week where we're building a command line interface for Honeybadger.

So we wanted to, because you can use curl to send a deploy notification to our API, and we provide a CLI with our Honeybadger gem, but we don't have a CLI for every language. Like, we don't have one for the Go library or the Python library. So we thought, well, what if we built one in Go? Then we could deploy that to every client, right?

Because you can just grab the binary. And so I decided to try out Windsurf to start off this project. And I told it what I wanted. And it was basically like, I started with the deploy command and I said, based on the API endpoint documented at the URL for our documentation for deploys, give me a Go command line thing that will record a deployment to that API.

So it actually went and fetched, as far as I can tell, that page, and saw what variables it needed to send, like what the API key name was, the header name. And it looked at what the URL structure was for deploys, and then it figured it out, right? It's like, okay, I need to create this kind of payload, and I'm expecting this kind of response back. And it basically worked. It was pretty wild.
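
To make this concrete, here is a hedged sketch of what such a Cobra-based deploy subcommand might look like. Cobra's API is real, but the endpoint URL, header name, and field names below are assumptions for illustration; the actual contract lives in Honeybadger's API docs:

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"os"
	"strings"

	"github.com/spf13/cobra"
)

func main() {
	var env, revision, apiKey string

	deployCmd := &cobra.Command{
		Use:   "deploy",
		Short: "Record a deployment with the Honeybadger API",
		RunE: func(cmd *cobra.Command, args []string) error {
			form := url.Values{}
			form.Set("deploy[environment]", env)
			form.Set("deploy[revision]", revision)

			// Assumed endpoint and header name; verify against the real docs.
			req, err := http.NewRequest(http.MethodPost,
				"https://api.honeybadger.io/v1/deploys",
				strings.NewReader(form.Encode()))
			if err != nil {
				return err
			}
			req.Header.Set("X-API-Key", apiKey)
			req.Header.Set("Content-Type", "application/x-www-form-urlencoded")

			resp, err := http.DefaultClient.Do(req)
			if err != nil {
				return err
			}
			defer resp.Body.Close()
			fmt.Println("status:", resp.Status)
			return nil
		},
	}

	deployCmd.Flags().StringVar(&env, "environment", "production", "deploy environment")
	deployCmd.Flags().StringVar(&revision, "revision", "", "VCS revision being deployed")
	deployCmd.Flags().StringVar(&apiKey, "api-key", os.Getenv("HONEYBADGER_API_KEY"), "Honeybadger API key")

	if err := deployCmd.Execute(); err != nil {
		os.Exit(1)
	}
}
```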

Josh: That's pretty cool. I'm curious now, because it's been implied to me in the past, when I've been using it, that it had gone and checked something. But I don't know if you're familiar with Perplexity. It's like a search engine mixed with ChatGPT, basically: you give it a prompt, it improves your search query based on that prompt, and then it goes and does the search queries across Google and probably some other search engines.

It brings the results back in, summarizes them, and then answers your query based on that. And aside from all the ethical and information-economy issues (what a disaster it is for content creators and all that), I have found it pretty useful. Basically, again, it's doing what I would do if I had to go read the first page of Google results, which, let's be honest, is mostly AI slop in the first place.

So basically it's figuring out which results have some sort of information that's useful to me, and pulling that out. But I wonder if these tools can do something along those lines, where it actually does the Googling for you in the background and pulls the information you're looking for directly into the context of what you're doing. That's an interesting idea, obviously fraught with potential problems, but.

Ben: Yeah, I've never seen Windsurf justify what it's doing. Like it could have given me the Terraform documentation URLs for the things it was doing in that project, but it never did. I don't know.

Josh: Yeah, so that's what I was getting at. What I didn't mention is that Perplexity includes the sources. That's the key part. So when it makes some sort of, I mean, I hate to say fact-based claim, but it is, let's be honest: people are using this to look up facts, which is a terrible idea, by the way. And even if you use it as a shorthand, you need to go and verify the receipts yourself.

But Perplexity at least gives you inline receipts, links out to where it's getting the information, where you can click through and make sure that, at least for the page it's reading, it's accurately summarizing that page. Which, by the way, might also be total bullshit; you need to go several levels deep to verify what you're getting.

But it does kind of let you back into that, versus having to do the entire process yourself. Windsurf doesn't give me that. It might say, like, okay, I'm going to go look it up, I'm going to check the docs. But, you know, I have very low trust of these systems, and I know that they will tell you they're reading the docs when they're not, when they're just using whatever their training data is. That's why I'm still a little skeptical. I guess I need to go look at Windsurf's docs a little more and see what it's actually doing, or capable of.

Ben: Yeah. Well, that is a concern for me. Like, where is this code coming from? Where are these ideas that it's getting? For example, back to that Go CLI: I have not written a CLI in Go before, so I've never done command line parsing in Go, and I have no idea, right? But it just did it, because I told it what arguments I wanted to give it. It understood that it needed to parse command line arguments, and it brought in the Cobra dependency and the Viper dependency, which I'd never heard of before.

But these are apparently popular in the Go world, because Kevin was reviewing my code. Well, not my code; it was mostly Claude's code, right? Because I was using the Claude engine. But anyway, he's reviewing the code and he said, oh yeah, I like that it chose to use Cobra and Viper, because, I think he said, the GitHub CLI uses those libraries as well. And that immediately made me wonder: okay, so where was Claude copying this code from? Like, was it copying it from the GitHub CLI?

Josh: Are there copyright issues? Yeah. I've wondered the same thing before because obviously it's not also copying the license.

Ben: Right, exactly. Then I went to slap an MIT license on the new repo that I'd built, and it's like, well, is this really copyrighted by me?

Josh: Yeah, I think that's still an open question. Like, there's a lot of legal things that are still being decided as far as I know, in terms of yeah, how that works.

Ben: So yeah, open question. But the coolness factor, you just can't deny it. After probably 30 minutes, I went from zero to having a working CLI that I was able to put out on GitHub, and it had tests, because I asked it to write tests, and so it wrote them. And of course it didn't get everything right the first time; it made some things wrong. And you talked about how Windsurf can actually run other commands: it'll prompt you to run the Go tests, right, as soon as it writes them. And now, with a new update, you can even make it run the tests automatically.

You can say, hey, that's a safe command. You can just go ahead and run that. And what it'll do when it runs a test, if they fail, it actually interprets—looks at the output, and it tries to figure out why the test failed. And then it's like, oh, I see. I messed up this. And it goes and edits the test. And then it runs the tests again until they pass.

And so it's like that loop that you just don't have to do, because it's doing it for you. Literally, there was one time where I set it on a task and just alt-tabbed away and went and did something else, because I knew it was going to take a couple of minutes for it to figure it out. I came back and it had figured it out, right? It's just wild.

Josh: So when it figures something like that out, what's your next step? Would you just commit and be like, yep, call it a day?

Ben: Heck yeah.

Josh: Yeah.

Ben: But there was a time or two when it got in this doom loop where it couldn't figure it out. It would go to edit the test, and the test would fail, and it'd go and try a different edit, and, you know, you could tell it was just spinning its wheels. And so I'm like, all right, stop. Let me get the keyboard back from you here for a second. Let me do it. It makes me feel like I'm working with a junior developer, you know? I'm sitting side by side, and that junior developer is at the keyboard, and I'm telling him, okay, go do this.

And he types that, or does the research or whatever, and comes back to me with something, and it's like, okay, that kind of works, but the design maybe isn't the best; maybe you should do this. And I tell that to Windsurf, and it's like, oh, good idea. You know, it's kind of conversant like that. And then it comes back with a change and stuff. So yeah, it's awesome.
Josh: Yeah. I've noticed that the simpler the task, the better it is at getting it right, to the point where I can quickly, like you said, check the work like a junior developer's, maybe provide some minor feedback, and have it go do it. It really does cut that loop down, and obviously cuts a person out of the loop as well, which we might talk about at some point. But I think I'm a little bit more skeptical of more complicated tasks.

Or, you know, when you start to get out of the realm of a junior developer, I think that's when you start to get into trouble, potentially. Especially if I'm not a subject matter expert. If it's in a Ruby or Rails app, I feel I can work at a much higher level with these tools, because I'm going to know a lot quicker if it's getting it horribly wrong or if it's really knocking it out of the park.

But if it's in something that I'm unfamiliar with, what it does looks really good. And if you don't have the expertise to know whether it's actually good or not, it's very easy to have it just do a bunch of things while you're trusting it to the level of, this thing actually knows what it's doing. Which, by the way, it doesn't; it's just doing what it thinks other people would do, based on whatever corpus of information it's trained on.

So it's very good at being confident. We know that. And if you're not the person you would assign to do the code review for an actual person writing that code, you probably shouldn't be code reviewing your AI assistant, is how I look at it.

But I think there's a risk of getting into the weeds, where people start to believe these things actually know something they don't, or know more than them. And they really don't; they really need that review step. And I've been trying Windsurf, and I've used Cursor and some of the other tools recently.

I think my one complaint is that my day-to-day feels much more like just reviewing junior developer pull requests. That's basically the job when you let these things drive, which isn't extremely fun to me on its own. So I'm still thinking through that part of it. But where I really appreciate them is when they're assistive, when they make me faster at what I'm doing. And there are actual cases where that's true, where you can get into a flow where these things are actually making you move faster.

Or they're accelerating my ability to code. That feels great. It's a lot of fun at that point, when you don't have to think about the boilerplate that would take a couple of minutes to type out, and you can just be in the flow of actually writing the logic of the application or something like that.

Ben: Yeah, I totally get what you're saying. It can suck the joy out of the development work, because if all you're doing is reviewing what the AI is doing, that can be boring. But like you just said, for the boring parts, it is fantastic. After that Terraform stuff I leveled up in my trust of Windsurf; by the way, I signed up for a paid account within two hours, because it was so freaking cool.

Anyway, I ran out of my AI credits. But recently, like this past week, I decided to trust it a little bit more and use it for a little project inside of our main Rails app that I've been working on. We're doing a pricing update, which should come out in a few weeks, and part of the pricing update is making some changes in the UI where you actually pick your plan.

And I'm not a fan of working with UI stuff. I'm a backend kind of person. I don't really enjoy writing JavaScript and I don't love spending my time in HTML and CSS and that stuff. Like I can do it, but I don't like it. And so, I had Windsurf do the UI for me. And the first step was like, Hey, I got to update this page to look like what our designer gave us.

And so I literally put in a screenshot of what our designer did (this part was ChatGPT, a couple weeks ago), and I'm like, okay, give me a Bootstrap version of this thing. And it did. It gave me the HTML, and I'm like, bam, dropped that into the project.

And then this week, with Windsurf, I'm like, okay, I've got to make this real now. First it was just a mockup, and then it was HTML; now I've got to actually make the select boxes change the amount and things like that. So I asked Windsurf to do it. I'm like, all right, when I change these values, here's what I want. And I gave it some YAML in the project repo.

I'm like, here's my new pricing, just make it work. And obviously the prompt was longer than that, but it did a fairly good job of generating a bunch of JavaScript and the right HTML structure. And then I had a session of about an hour or so to work through all the little corner cases and things I'd forgotten about. Like, oh, okay, I see why you built that, but it's not quite right, because it should be like this.

And at one point I even changed my data structure, because it had written some code in Ruby that was shoving the JSON into the view. It had written three or so methods in Ruby to transform what I had in YAML into something that the JavaScript it had written could use.

And I'm like, I know this is bogus; let's just change the YAML, right, so that the JavaScript can use it. And so it did that for me. I'm like, this is awesome. So it wasn't just a junior developer at that point. It was almost like a co-developer. But I had deep knowledge of this particular domain, like our app, so I could intelligently understand what it was doing and, yeah, prompt it better.

Josh: It's also a great reason to use a boring framework like Bootstrap, because there are so many Bootstrap examples and starter kits and themes out there that these things, I'm sure, have been trained on. So they are actually very good at generating things like Bootstrap and, like you said, Terraform configs, config languages, all that kind of stuff. Anything that has an abundance of examples, it's going to be better at, I think, because there's just more input that it has. I know Kevin's going to hate this, because he's a bit more of a fan of bespoke front-end CSS and all that.

But the more you use established frameworks, which obviously draw boxes around what you can do, the better these tools are able to assist you with the mundane tasks. And I don't know if that's a good thing or a bad thing, to be honest, but I guess we're all going to find out.

Ben: But as you said, if there are a lot of good examples out there, it's great. And I think that's one reason why it's really good at generating Go code: it all looks the same. There's typically one way to do something in Go, right? Versus Ruby, which is, you know, more open ended.

Josh: And Go has like a very large standard library. A lot of people just use that. There's not as many dependencies, I think. I mean, there are dependencies, but I think people try to limit their dependencies, which I'm sure helps.

Ben: Yep. But you found, maybe with Elixir and doing some interesting stuff there, that it's not as great of an assistant.

Josh: Well, I was going to say, I think the key to using these tools is to know when to use them: to know when the task is actually achievable by an LLM and when it's not. And I think the junior developer test is probably the right way to think about it. But it's easy to get yourself into trouble and waste a bunch of time if you don't think about that up front.

I've found that as I'm using these things, because they are so legitimately impressive, you want to ask it everything, and sometimes it even gets it right and saves you a bunch of time. But the more complex the thing I'm trying to get it to do, the more likely it is that it's just going to take me down a rabbit hole and waste time.

The other day, I was updating our Elixir package for Honeybadger, which is our client library package, and I noticed that we weren't testing on the latest version of Elixir, which is 1.17; we were a few versions behind in our CI. So I created a PR to just bump the GitHub Actions versions that we were testing against.

And of course that surfaced a couple of test failures: our tests were apparently not passing on the latest version of Elixir, which is not great by itself. But I was compelled to go and investigate. I haven't touched Elixir in probably a year or two, so I'm a little outdated in my Elixir knowledge.

So, since I was testing Windsurf, I was like, okay, well, let's see what it can do to solve this problem. I went and looked at the tests. The two that were failing were in our logger implementation, which basically listens for various events in Elixir and Erlang, then pulls out some data and reports it to Honeybadger.

So if there's, like, an error in the log, for example, it'll report the error to Honeybadger with some metadata included. And the test failure came down to pattern matching. Elixir has pattern matching: if you pass an argument to a specific function, it will only match that function if the argument matches the actual structure of the data it declares.

So you can actually tell it what the shape of the argument should look like for that function to be a valid match. And it was failing on Elixir 1.17 but passing on Elixir 1.15, or whatever the older version was. So I copied the test output and basically just asked (it was Claude that Windsurf was using): why does this test pass on Elixir 1.15 but fail on 1.17? And so it went and did its analysis and found the test files.

I basically just opened the workspace (I did Windsurf-dot in the terminal) and asked it. So it finds the test file, it finds the source file, and it reads them both. It spoke very confidently about the differences between Elixir 1.15 and 1.17 and explained a little bit of the issue: 1.17 changed the structure of the data that these functions are handling. And then it actually suggested a change and made the change. It didn't run the tests for me, but it gave me the diff to review, and I accepted it.

And it fixed the test. And I'm coming at this fresh; I had to go ask regular Claude to remind me what some of the data types in Elixir are, mind you, because if I'm not working with something daily, I basically let that information go.

So I'm pretty amazed at this point that it actually fixed this very complicated-looking test. Because to me, it's like something changed in the core of the language between these events; these are core Elixir events that we're relying on. It was, like, an error logger; it's the metadata in the error logger events that are coming from, I think, GenServers, one of the Elixir BEAM things.

And so to me, that's like, you know, that's a very deep issue. And so I was like, cool. And I think I like sent the screenshot to you on Slack. And I was like, oh my God, look at what Windsurf did. So this turned into like a pretty long conversation with Claude in the Windsurf chat panel.

So I was like, wouldn't it be cool if I could basically copy and paste this entire thing into a commit message for context? Because if I'm actually going to use any of this code, I want a trail of what the decision process was, in case it's wrong and we find out we made a horrible mistake. So I was like, well, maybe I could just ask it to summarize this conversation and write a commit message.

And it did. It wrote a three-paragraph summary of the conversation, which was accurate; it accurately summarized the conversation we had. And I ended up committing that. By the way, I actually like that pattern of having it summarize what it just did; I think that's useful. But I would never just commit that verbatim as a commit message.

I think it's always important to say when something was generated; your future self will thank you for that, I have a feeling. But anyway, I got to the point of looking at it in an actual PR supporting Elixir 1.17, basically, but something just didn't feel right.

And as I was reviewing, spending more time just looking at the changes this thing had actually made, I started to wrap my head around the problem more. Because I didn't start with the actual debugging that would have led to the solution; I just let this thing do the thinking for me.

And then I had to back myself into understanding what the problem was. And because this was a particularly complicated problem, it took me a while. But as I'm looking at it, I'm like, this fix is basically loosening the pattern-matching constraints, being more accepting in the data that can, basically, activate this function.

So of course that's going to cause fewer inconsistencies across small changes in the data structure, right? Because it's going to accept broader types of events. But, you know, was there a reason we were being so strict in the first place? And the more I looked at that, the more I wondered, so I went back and read through the transcript of what it was saying it was doing.

And I realized that, as it was confidently going along, it even said: I'm going to check the Elixir release notes and verify what the difference was. But then, mysteriously, it didn't actually say what the difference was. It just went right into the fix, which was just loosening the pattern-matching constraint.

I realized this thing was basically regurgitating the prompt that I initially gave it. I told it there was a difference between these two versions, and I told it where to look. And obviously it can figure out, oh, if I'm more accepting in the types of data that will match, I can make this pass. But it did not do the job of debugging the root cause.

And it did not actually know what the bug was, because it doesn't have the ability to do that; it doesn't know anything by itself. So at that point, I'm like, okay, I'm going to revert all of this and start from the beginning. And I did some good old fashioned puts debugging, where I actually inspected the actual events, the data that was being passed into this function.

I then diffed that data between Elixir 1.15 and 1.17, and I realized that there was an extra list item added to one of the lists that we were pattern matching on. So the fix was really just that we needed some additional, I forget what it's called in Elixir now, I should probably go ask Claude, but you can have multiple functions with the same name that match on different data; I think it's function clauses.

So basically, I needed to add two more function clauses that support the event for Elixir 1.17 while keeping the ones that still match for 1.15. So the fix was actually much more specific, and it was much simpler, to be honest. And on top of that, I was able to document it: okay, this is the diff.

This is what changed. I still want to actually go and do a little more research to figure out what changed to add this list item. I might need to go ask, I don't know, someone with much more knowledge of the release notes to actually figure out what was added.

Because I know they made changes to their error handling and some of their error logging, so I assume it's in there; but the release notes didn't say, "We added this specific element to this specific list in this event." That said, I was able to document what the event looks like on the old version and what it looks like on the new version, and commit that to the project, so that if anyone comes back to this in the future, it's very clear what changed, what our solution was, and what the reasoning was.
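
Since the fix hinges on Elixir's function clauses, here is a small illustrative sketch in Elixir (the module name and event shapes are made up, not the actual honeybadger-elixir code). The idea is to add a clause for the new shape rather than loosening the existing one:

```elixir
# Illustrative only: hypothetical module and event shapes. Elixir dispatches
# on the *shape* of the argument, so a new clause handles the new shape while
# the old clause keeps working for older releases.
defmodule LogHandler do
  # Clause matching the three-element metadata list an older release emits.
  def handle_event({:error_report, gl, {pid, type, [a, b, c]}}) do
    report({pid, type, [a, b, c]}, gl)
  end

  # New clause for the four-element list a newer release emits (assumed
  # extra item); no constraints were loosened on the clause above.
  def handle_event({:error_report, gl, {pid, type, [a, b, c, _extra]}}) do
    report({pid, type, [a, b, c]}, gl)
  end

  # Anything else is ignored, rather than matched by an overly loose clause.
  def handle_event(_other), do: :ok

  defp report(data, _gl), do: IO.inspect(data, label: "reporting")
end
```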

And yeah, if we had gone with the Claude version, we would have ended up with very subtle bugs, things that would have just broken, I think, and that would have been very hard to trace back if we hadn't explicitly said this was written by an AI. Not only did it waste my time, but I suspect it would have wasted hours of time for someone else coming back to it in the future. So I think that's a cautionary tale of how it can go wrong when you use these tools for a problem they're just not suited to solve.

Ben: Yeah. I think it goes back to the point you were making about how it can get to where all you're doing is reviewing PRs, right? And that's where you caught this. It's like, okay, this looks good as a PR. And then as you're reviewing the PR, it's, wait a minute, I'm not so sure about this. And you used your senior-level expertise: okay, there's something that's not right here, let me go dig in a bit more. And oh yeah, I can see how this would have introduced some subtle bugs. Yes, it's a fix, but it's a problematic fix, and here's why.

Josh: Yeah. It's as if you had a junior developer go and spend a few days working on the problem and still come back with an uninformed fix. Like, it seems reasonable: oh, I'll just make the constraints wider, and I can fix it that way. But that was not the answer in this case.

And all the issues, the list of issues you would come up with for why that's a bad idea, were left out. And in my case, I wouldn't have put a junior developer on this issue in the first place, because I know that I need someone who understands the context and nuance of this specific library, and who also has that senior-developer gut feeling of, is this a good idea or not? Should I spend days going down this path in the first place?

So it condensed that a little bit for me, but I would never have done that in the first place. It ended up wasting probably an hour of my time, when I should have realized this is not a good fit for this tool, and that I should start with the traditional debugging and use my senior developer skills. And maybe I can use the LLM to assist me in remembering what a tuple is or something like that, because I know what it is; I've stashed it away somewhere and I just need my memory refreshed. So, yeah.

Ben: So if the question is, is AI going to replace all programmers? The answer today is no, right? You would not want to replace all your senior developers with junior developers and let them just commit whatever seems like a good idea at the time. There's still a benefit to having some expertise behind the wheel and reviewing things. And over time, maybe they'll get better; maybe they won't. I think we're in the early days of the AI world, especially with LLMs, and we don't know exactly where they're going to end up. It might turn out that this is it, right?

This is as good as this particular approach is going to get, and we have to try a different thing with AI. And maybe the opposite is true; maybe it'll be super advanced in six months, who knows. But I think for now, you can't say, well, I'm a non-technical founder and I want the AI to build me a SaaS app.

Right? No, you're not going to have a good time with that; you're going to have some problems. But by the same token, if you are a senior developer and you're not investigating these tools, you're probably cutting yourself off from some benefits. You could gain some time back in your life, for sure, having the AI help you with things that are lower level and just don't require a whole lot of brain work.

Josh: Yeah, I think understanding how they work and knowing when to use them is the key. And I will say, I think I'm generally a little bit more skeptical than you are when it comes to these tools. But, I don't know, just based on what I understand of how they actually work, I'm skeptical that it's going to get much better.

What I've seen, especially using this Windsurf example, is the tooling improving drastically. You know, six months ago or whatever, I wasn't even aware that there was multi-file editing, and I know I've been late to the game; I know Cursor existed longer than that.

I do feel like people have been discovering the multi-file editing and more advanced workflow stuff recently. And while, yes, it is cool, I don't see the underlying technology advancing at quite the same rate it was in the beginning. I mean, obviously Claude has gotten better, but from the start, I think it's always been junior-level problems that these things have been useful for, and the way the technology works just seems like it's limited to that kind of use case.

Where if you don't actually know enough to review the work that it does, you're going to end up with problems. And maybe I'll be wrong in the future, but I can't imagine a future where that flips. So far, it hasn't flipped; people keep saying it's going to flip. If anything, I think there's a big risk of people starting to rely on this, starting to think that they can trust the expert output of these things without having the ability to review it or verify it.

Because I think we could have fixed this bug, the Elixir bug, and then we could have fixed the next one and the next one and the next one. And we would end up with a library that is just a bunch of guesses at various things, with no history and no shared knowledge.

No one on the team has that context. If you had put a person on it, someone who went and worked on this issue for a week or something, even if it's a junior developer, at least that junior developer exists, has that context in their head now, and can contribute it back to the team in the future. So far these things don't; maybe their memories will increase a little bit, and that would be useful.

But, you know, you lose all of that decision-making process. Not that you had it in the first place, because these things are black boxes; you really don't know what it's doing, and no one knows what it's doing. And so that's why I say you need to be a level above it, at least, at all times.

Ben: Yeah, agreed. But I definitely think that people should check them out if you haven't yet. And you don't even have to switch editors now. Just this week, GitHub announced that Copilot has been updated, and now it's free to everyone, so you can use it without even having to pay. And VS Code has the ability to edit multiple files and stuff, like these new editors have done.

So I don't know if GitHub will end up kneecapping Cursor and Windsurf, or if they'll do something else that's interesting. But yeah, definitely check it out and have some fun with it: throw a new project at it, see how it does, play with it for a bit. You might become like me and say, hey, I'm going to use it for all my stuff. Or you may become like Josh and be like, I'm never gonna use that again, because it sucks.

Josh: Yeah. And I'd love to hear from some junior developers who are using these tools if they are, because I think that's still an open question for me, like I think these tools could be useful because everyone's at a different level of learning.

So there are junior developers; you've got to start somewhere. I think these tools could be useful to junior developers in helping them learn faster as well. It's just that we need to be teaching, from the start, when to use them. As long as people are using them at the right time and not trusting them too much, I think they could be a useful tool. So I guess if there are any juniors listening out there, I'd love to hear what your experiences are, and maybe we can talk about that more in the future.

Ben: Sounds cool.

Josh: Cool. Well, this has been a good chat, a long chat. Anyway, this has been FounderQuest. You can find us at FounderQuestPodcast.com, and yeah, let us know what you think of AI.
