📖 Working with an ethical code assistant - interview with Mark Padgham (Urban Analyst)
Code assistants can be incredibly helpful and save developers a lot of time, but it can take a lot of patience to find one that works for you. Mark Padgham, our former grantee, was looking for a code assistant that was not only useful but also ethical. Here he explains exactly what he was looking for - and why - and what he ended up using.
What project are/were you working on?
https://urbananalyst.city, based on all software components in https://github.com/
Why were you looking for a code assistant?
I needed to code complex routines in Rust and compile them to WebAssembly (WASM) for integration within a TypeScript front-end. My usual computer languages are C++ and R, not Rust or JavaScript. I knew a code assistant could (1) help translate my code ideas from C++ to Rust syntax, and (2) help with the WASM integration into JavaScript, with which I had no experience at that stage. I wanted a code assistant for the first part because I thought translating ideas would be faster than writing everything directly in Rust. I needed a code assistant for the second part because I really had no idea how to do it.
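To illustrate the first kind of help described above, here is a minimal, hypothetical sketch of a C++-to-Rust translation a code assistant might produce; the function and names are purely illustrative and not taken from Urban Analyst itself:

```rust
// Hypothetical example of translating a C++ idea into idiomatic Rust.
// A C++ version might use an index-based loop:
//
//   double sum_abs_diff(const std::vector<double>& a,
//                       const std::vector<double>& b) {
//       double s = 0.0;
//       for (size_t i = 0; i < a.size(); i++)
//           s += std::abs(a[i] - b[i]);
//       return s;
//   }
//
// The idiomatic Rust translation uses iterators instead of indices:

/// Sum of element-wise absolute differences between two slices.
fn sum_abs_diff(a: &[f64], b: &[f64]) -> f64 {
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| (x - y).abs())
        .sum()
}

fn main() {
    let a = [1.0, 2.0, 3.0];
    let b = [3.0, 2.0, 1.0];
    println!("{}", sum_abs_diff(&a, &b)); // prints 4
}
```

A good assistant suggests this kind of iterator-based style rather than a line-by-line mechanical port, which is where the time saving comes from.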
Are there code assistants that are not based on AI?
I only note that I don't use the term "AI". In this context, there is a distinction between large language models (LLMs) and the equivalent trained on code, which are commonly referred to as "large code models". Simply referring to all of these as "AI" erases these kinds of important distinctions. (And there are many other, much more important reasons to avoid using that term.) LLMs are also very good at code, and I know many people who use ChatGPT as a code assistant. The model I ended up using is built on Code Llama (https://huggingface.co/
Why was it important to you to use ethical AI? What are the benefits of an ethical AI code assistant?
I have never used GitHub Copilot, because it was obvious to me and many other people from the very beginning that it was trained by scraping the entire contents of GitHub without any consideration of, or respect for, the open source license conditions of the code used in that training. https://
More general arguments against this kind of wholesale "theft" (generally expressed in legal contexts as "unlawful use") of material for training LLMs have emerged in the case brought by the New York Times against OpenAI: https://theconversation.com/
How did you conduct your research?
Primarily through reading "model cards" on huggingface.co, and scouring the articles they referred to for details of training data. The original Code Llama model is described in the paper linked from https://ai.meta.com/research/
Which tool did you end up using?
phind.com. At the time I started using it, it was based on CodeLlama-34B, described in the Phind blog at https://www.phind.com/blog/
Do you have any other recommendations for developers who are looking for good tools to help them do their work?
Read the enormous number of articles that have been written on the motivations of the major companies who develop LLMs to understand their very specific agendas. And then act against them by refusing to use their tools in favour of alternatives that respect open source licensing conditions. The current state of code assistants, LLMs, and associated hype is insightfully analysed in https://blog.