Experimenting with the Limitations of Generative Artificial Intelligence

To get the most out of generative artificial intelligence (AI) tools like GitHub Copilot or ChatGPT, Datasette founder Simon Willison argues that you should be prepared to accept four conflicting opinions. That is to say, AI is neither our salvation nor our downfall, neither empty hype nor the solution to everything. “It is none of these and all of them at the same time. All are correct.”

It all depends on how the tools are used and what is expected of them. If you expect a code generation assistant like AWS CodeWhisperer to produce perfect code that you can accept and use wholesale without changes, you will be disappointed. But if you use these tools to complement a developer’s skills, you may be in for a very positive surprise.

The problem is that too many companies expect generative AI to be a magic solution to their problems. As Gartner analyst Stan Aronow notes, a recent survey found that nearly 70% of business leaders believe the benefits outweigh the risks, despite a limited understanding of the precise applicability and risks of generative AI.

Speaking of large language models (LLMs), Willison argues: “It seems like three years ago, aliens showed up on Earth, handed us a USB stick with this thing on, and then left. And since then, we’ve been trying to figure out what it can do.” We know it’s important, and we can feel some of the limits of what LLMs can do, but we’re still in trial-and-error mode.

The problem (and opportunity) with LLMs, Willison continues, is that “you rarely get what you really asked for.” Hence the advent of prompt engineering, as we tinker with ways to make LLMs produce more of what we want and less of what we don’t. “Occasionally, someone will discover that if you use this little trick, suddenly this whole new avenue of skills opens up,” he notes. Today, we are all looking for that “little trick” when it comes to programming.

Everything is in the right place

Some suggest that coding assistants will be a great asset for unskilled developers. That could eventually be true, but not today. Why? There is no way to adequately trust the output of an LLM without having enough experience to judge its results. Willison believes that “getting the best results from them actually requires a lot of knowledge and experience. A lot of this comes down to intuition.”

There are coding hacks that developers will discover through experimentation. Still, other areas are simply not a good fit for generative AI right now. Mike Loukides of O’Reilly Media writes: “We can’t be so tied to automatic code generation that we forget to control complexity.” Humans, while imperfect at limiting complexity in their code, are better positioned to do so than machines. For example, a developer can’t really ask an LLM to reduce the complexity of their code because it’s not clear what that would mean. Reduce lines of code? “Minimizing lines of code sometimes leads to simplicity, but just as often leads to complex incantations that pack multiple ideas into the same line.”
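To make the point about line counts concrete, here is a minimal sketch in Python. It is not taken from Loukides or Willison; the function names and the sample sentence are hypothetical and chosen only for illustration. Both functions return the same result, but the first packs several ideas (normalizing, filtering, counting, ranking) into a single line, while the second spends more lines to name each step.

```python
from collections import Counter


def top_words_dense(text, n=2):
    # Hypothetical "minimal lines" version: normalise, filter, count and
    # rank all happen inside one expression.
    return [w for w, _ in Counter(t for t in (s.strip(".,!?").lower() for s in text.split()) if len(t) > 3).most_common(n)]


def top_words_clear(text, n=2):
    # Hypothetical "more lines, fewer ideas per line" version.
    # Step 1: split on whitespace, strip basic punctuation, lowercase.
    tokens = [raw.strip(".,!?").lower() for raw in text.split()]
    # Step 2: keep only words longer than three characters.
    long_tokens = [token for token in tokens if len(token) > 3]
    # Step 3: count occurrences and return the n most frequent words.
    counts = Counter(long_tokens)
    return [word for word, _ in counts.most_common(n)]


sample = "Code is read more often than code is written. Read code, write code."
print(top_words_dense(sample))
print(top_words_clear(sample))
```

The shorter function is not simpler in any sense a reviewer would care about, which is why "reduce the lines of code" is a poor stand-in for "reduce the complexity."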

That is all fine. We are incredibly early in the evolution of AI, even though it has been around for decades. We in technology like to get ahead of ourselves. We act as if cloud is an established norm when it’s still only 10% or so of all IT spending. Despite the avalanche of investment in AI, it is not even 0.1%.

For Willison, now is the time to try out the different LLMs and their associated coding tools. Our goal should not be to see which one will do all our work for us but rather to discover their strengths and weaknesses and probe and push them until we know how to use their strengths and failures to our advantage.