This is a section from the open-source living textbook Better Code, Better Science, which is being released in sections on Substack. The entire book can be accessed here and the GitHub repository is here. This material is released under CC-BY-NC.
AI coding approaches
There are currently four primary approaches that one can use to take advantage of LLMs for coding: chatbots, integrated IDE assistants, API submission, and agentic coding tools. I will discuss the first three of these in this post; in the following post I will discuss agentic coding tools.
Chatbots
The first approach is the chatbot model, in which one uses an interactive chatbot interface (such as ChatGPT) to prompt the system. This is the approach initially used for the coding prompts presented above, and it can sometimes work well for generating code de novo, but I generally advise against it. The main reason is that it's not a particularly efficient way to interact with the LLM; given that one will almost certainly be using a code editor/IDE, the code from the chatbot will need to be copied there anyway. As we will note below, LLMs integrated into the IDE are also more effective because they allow fine-grained control over which code files are placed into the model's context.
Where I do often find chatbots useful is in reviewing code; I'll discuss this later when we dive more deeply into code review.
IDE assistants
A second, and probably the most common, approach is the integrated development environment (IDE) model, in which one uses a coding assistant that is integrated into one's IDE, such as Copilot for VSCode (my preferred choice at present). This provides inline code completion suggestions as you type, as well as a chat window that allows the inclusion of selected code files in the model's context. These IDEs now also include access to coding agents, which I will describe in more detail below.
One way to prompt the system within an IDE is by writing code comments, which then trigger code generation. The following is the output generated by Copilot within VSCode when given the prompt in the comment at the top:
# generate a python function to compute a multiple linear regression solution using linear algebra
import numpy as np  # numpy must be imported for the generated function to run

def multiple_linear_regression(X, y):
    """
    Compute the multiple linear regression solution using linear algebra.

    Parameters
    ----------
    X : numpy.ndarray
        The input data matrix with shape (n_samples, n_features).
    y : numpy.ndarray
        The target data vector with shape (n_samples,).

    Returns
    -------
    numpy.ndarray
        The regression coefficients with shape (n_features,).
    """
    # compute the regression coefficients
    return np.linalg.inv(X.T @ X) @ X.T @ y

Most IDEs now also provide a chat window that offers editor-integrated access to LLMs, giving the best of both worlds. Importantly, the chatbot has the ability to load currently open code files into its context window, which allows it to give answers that are more directly relevant to the code. For example, let's say that we have a Pandas data frame that was generated from a data file, and we wish to select a particular set of rows and then sort by one of the columns. Starting with the following code:
import pandas as pd
astro_data = pd.read_csv('data/astro.csv')
astro_data.columns = ['planet', 'distance',
                      'mass', 'diameter', 'moons']

We then open the Copilot chat window within VSCode while this code is in the active window, and give it the following prompt: "select planets with only one moon and then sort by distance", resulting in the addition of several lines:
import pandas as pd
astro_data = pd.read_csv('data/astro.csv')
astro_data.columns = ['planet', 'distance',
                      'mass', 'diameter', 'moons']
# Filter planets with only one moon
one_moon_planets = astro_data[astro_data['moons'] == 1]
# Sort by distance
sorted_planets = one_moon_planets.sort_values(by='distance')
print(sorted_planets)

Because the chat window has access to the code file, it was able to generate code that uses the same variable names as those in the existing code, saving time and preventing potential errors when renaming variables.
When working with an existing codebase, the autocompletion feature of AI assistants provides yet another way to leverage their power seamlessly within the IDE. In my experience, these tools are particularly good at autocompleting code for common coding problems where the code to be written is obvious but would take a bit of time for the coder to complete accurately. In this way, these tools can remove some of the drudgery of coding, allowing the programmer to focus on more thoughtful aspects of the work. They do of course make mistakes on occasion, so it's always important to closely examine the autocompleted code and apply the relevant tests. Personally I have found myself using autocompletion less and less often, as the chat tools built into the IDE have become increasingly powerful. I also find the inline suggestions rather visually cluttering and distracting when I am coding.
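As a purely hypothetical illustration of the kind of routine code where autocompletion shines, suppose one has typed only the comment and function signature below; an assistant will typically suggest the remaining body, which the coder then reviews and accepts or rejects:

import csv

# load a CSV file into a list of dictionaries, one per row
def load_csv_rows(filename):
    # the body below is the sort of completion an assistant would typically offer
    with open(filename, newline="") as f:
        reader = csv.DictReader(f)
        return [dict(row) for row in reader]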
Programmatic access via API
Whenever one needs to submit multiple prompts to a language model, it's worth considering programmatic access via an API. As an example, Jamie Cummins wrote in a Bluesky post about a published study that seemingly performed about 900 experimental chats manually via ChatGPT, taking 4 people more than a week to complete. Cummins pointed out in the thread that "if the authors had used the API, they could have run this study in about 4 hours". Similarly, in our first experiments with GPT-4 coding back in 2023, I initially used the ChatGPT interface, simply because I didn't yet have access to the GPT-4 API, which was very scarce at the time. Running the first set of 32 problems by hand took several hours, and there was no way that I was going to do the next set of experiments by hand, so I found someone who had access to the API and we ran the remainder of the experiments that way. Beyond the time and labor involved, running things by hand is also a recipe for human error; automating as much as possible helps reduce that risk.
You might be asking at this point, "What's an API?" The acronym stands for "Application Programming Interface", which refers to a mechanism by which one can programmatically send commands to and receive responses from a computer system, which could be local or remote¹. To understand this better, let's see how to send a chat command and receive a response from the Claude language model. The full code can be found in this notebook. Coding agents are very good at generating code to perform API calls, so I used Claude Sonnet 4 to generate the example code in the notebook:
import anthropic
import os
# Set up the API client
# Requires setting your API key as an environment variable: ANTHROPIC
client = anthropic.Anthropic(
    api_key=os.getenv("ANTHROPIC")
)

This code first imports the necessary libraries, including the anthropic module that provides functions to streamline interactions with the model. It then sets up a client object, which has methods for prompting and receiving output from the model. Note that we have to specify an "API key" to use the API; this is a security token that tells the model provider which account should be charged for usage of the model. Depending on the kind of account that you have, you may need to pay for API access on a per-token basis, or you may have a specific allocation of tokens to be used within a particular amount of time; check with your preferred model provider for more information on this.
It might be tempting to avoid the extra hassle of specifying the API key as an environment variable by simply pasting it directly into the code, but you should never do this. Even if you think the code may be private, it's all too easy for it to become public in the future, at which point someone could easily steal your key and rack up lots of charges. See the section in Chapter 3 on Coding Portably for more on the ways to solve this problem.
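One common pattern (sketched here, assuming the python-dotenv package is installed and that the key lives in a local .env file that is listed in .gitignore) is to keep the key in a file that never enters version control and load it into the environment at runtime:

# contents of .env (never committed to version control):
# ANTHROPIC=sk-...your-key...

import os
from dotenv import load_dotenv
import anthropic

load_dotenv()  # reads .env and sets the variables in os.environ
client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC"))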
Now that we have the client specified, we can submit a prompt and examine the result:
model = "claude-3-5-haiku-latest"
max_tokens = 1000
prompt = "What is the capital of France?"
message = client.messages.create(
    model=model,
    max_tokens=max_tokens,
    messages=[
        {"role": "user", "content": prompt}
    ]
)

Examining the content of the message object, we see that it contains information about the API call and resource usage as well as a response:
Message(
    id='msg_016H1QzGNPKdsLmXRZog78kU',
    content=[
        TextBlock(
            citations=None,
            text='The capital of France is Paris.',
            type='text'
        )
    ],
    model='claude-3-5-haiku-20241022',
    role='assistant',
    stop_reason='end_turn',
    stop_sequence=None,
    type='message',
    usage=Usage(
        cache_creation_input_tokens=0,
        cache_read_input_tokens=0,
        input_tokens=14,
        output_tokens=10,
        server_tool_use=None,
        service_tier='standard'
    )
)

The key part of the response is in the `content` field, which contains the answer:
print(message.content[0].text)
"The capital of France is Paris."

Customizing API output
By default, the API will simply return text, just as a chatbot would. However, it's possible to instruct the model to return results in a format that is much easier to process programmatically. The preferred format for this is generally JSON (JavaScript Object Notation), which has a structure very similar to that of a Python dictionary. Let's see how we could get the previous example to return a JSON object containing just the name of the capital. Here we will use a function called send_prompt_to_claude() that wraps the call to the model and returns the text from the result.
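The actual implementation of send_prompt_to_claude() lives in the book's repository (it is imported from BetterCodeBetterScience.llm_utils below); the following is only a minimal sketch of what such a wrapper might look like, and its argument names, defaults, and return values are assumptions rather than the real implementation:

def send_prompt_to_claude(prompt, client,
                          model="claude-3-5-haiku-latest",
                          max_tokens=1000,
                          return_tokens=False):
    """Send a single user prompt to Claude and return the text of the response.

    If return_tokens is True, also return the token usage for the call
    (the exact return value of the real function may differ).
    """
    message = client.messages.create(
        model=model,
        max_tokens=max_tokens,
        messages=[{"role": "user", "content": prompt}],
    )
    text = message.content[0].text
    if return_tokens:
        return text, message.usage.input_tokens + message.usage.output_tokens
    return text

Using this wrapper, we can ask the model to return its answer as JSON: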
from BetterCodeBetterScience.llm_utils import send_prompt_to_claude
json_prompt = """
What is the capital of France?
Please return your response as a JSON object with the following structure:
{
"capital": "city_name",
"country": "country_name"
}
"""
result = send_prompt_to_claude(json_prompt, client)
result
'{\n "capital": "Paris",\n "country": "France"\n}'The result is returned as a JSON object that has been encoded as a string, so we need to convert it from a string to a JSON object:
import json
result_dict = json.loads(result)
result_dict
{'capital': 'Paris', 'country': 'France'}

The output is now a standard Python dictionary. We can easily expand this pattern to multiple calls to the API. Let's say that we wanted to get the capitals of ten different countries. There are two ways we might do this. First, we might loop over the countries, making a separate API call for each:
countries = ["France", "Germany", "Spain", "Italy", "Portugal", "Netherlands", "Belgium", "Sweden", "Norway", "Finland"]
for country in countries:
    json_prompt = f"""
    What is the capital of {country}?
    Please return your response as a JSON object with the following structure:
    {{
        "capital": "city_name",
        "country": "country_name"
    }}
    """
    result = send_prompt_to_claude(json_prompt, client)
    result_dict = json.loads(result)
    print(result_dict)
{'capital': 'Paris', 'country': 'France'}
{'capital': 'Berlin', 'country': 'Germany'}
{'capital': 'Madrid', 'country': 'Spain'}
{'capital': 'Rome', 'country': 'Italy'}
{'capital': 'Lisbon', 'country': 'Portugal'}
{'capital': 'Amsterdam', 'country': 'Netherlands'}
{'capital': 'Brussels', 'country': 'Belgium'}
{'capital': 'Stockholm', 'country': 'Sweden'}
{'capital': 'Oslo', 'country': 'Norway'}
{'capital': 'Helsinki', 'country': 'Finland'}

Alternatively, we could submit all of the countries together in a single prompt. Here is the first prompt I tried:
json_prompt_all = f"""
Here is a list of countries:
{', '.join(countries)}
For each country, please provide the capital city
in a JSON object with the country name as the key
and the capital city as the value.
"""
result_all, ntokens_prompt = send_prompt_to_claude(
    json_prompt_all, client, return_tokens=True)

The output was not exactly what I was looking for, as it included extra text that caused the JSON conversion to fail:
'Here\'s the JSON object with the countries and their respective capital cities:\n\n{\n "France": "Paris",\n "Germany": "Berlin",\n "Spain": "Madrid",\n "Italy": "Rome",\n "Portugal": "Lisbon",\n "Netherlands": "Amsterdam",\n "Belgium": "Brussels",\n "Sweden": "Stockholm",\n "Norway": "Oslo",\n "Finland": "Helsinki"\n}'

This highlights an important aspect of prompting: one must often be much more explicit and detailed than one might expect. As the folks at Anthropic said in their guide to best practices for coding using Claude Code (a product discussed further below): "Claude can infer intent, but it can't read minds. Specificity leads to better alignment with expectations." In this case, we change the prompt to include an explicit directive to return only the JSON object:
json_prompt_all = f"""
Here is a list of countries:
{', '.join(countries)}
For each country, please provide the capital city in a
JSON object with the country name as the key and the
capital city as the value.
IMPORTANT: Return only the JSON object without any additional text.
"""
result_all, ntokens_prompt = send_prompt_to_claude(
    json_prompt_all, client, return_tokens=True)
result_all
'{\n "France": "Paris",\n "Germany": "Berlin",\n "Spain": "Madrid",\n "Italy": "Rome",\n "Portugal": "Lisbon",\n "Netherlands": "Amsterdam",\n "Belgium": "Brussels",\n "Sweden": "Stockholm",\n "Norway": "Oslo",\n "Finland": "Helsinki"\n}'

Why might we prefer one of these solutions over the other? One reason has to do with the amount of LLM resources required by each. If you look back at the full output of the client above, you will see that it includes fields called input_tokens and output_tokens that quantify the amount of information fed into and out of the model. Because LLM costs are generally based on the number of tokens used, we would like to minimize this. If we add these up, we see that the looping solution uses a total of 832 tokens, while the single-prompt solution uses only 172 tokens. At this scale the difference is negligible, but for large analyses it could translate into a major difference in cost. Note, however, that the gap between these two approaches partly reflects the short prompts used here, which means that most of the tokens being passed are overhead tokens required for any prompt (such as the system prompt). As the length of the user prompt increases, the proportional difference between looping and a single compound prompt will decrease.
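To see where numbers like these come from, one can read the token counts off the usage field of each returned Message object. Here is a minimal sketch using the raw client (the variable names are illustrative, and the counts will vary from run to run):

total_input, total_output = 0, 0
for country in countries:
    message = client.messages.create(
        model="claude-3-5-haiku-latest",
        max_tokens=1000,
        messages=[{"role": "user", "content": f"What is the capital of {country}?"}],
    )
    # each response carries its own usage information
    total_input += message.usage.input_tokens
    total_output += message.usage.output_tokens

print(f"Total tokens used: {total_input + total_output}")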
It's also important to note that there is a point at which very long prompts may begin to degrade performance. In particular, LLM researchers have identified a phenomenon that has come to be called context rot, in which performance of the model is degraded as the amount of information in context grows. Analyses of performance as a function of context have shown that model performance can begin to degrade on some benchmarks when the context extends beyond 1000 tokens and can sometimes degrade very badly as the context goes beyond 100,000 tokens. Later in this chapter we will discuss retrieval-augmented generation, which is a method that can help alleviate the impact of context rot by focusing the context on the most relevant information for the task at hand.
¹ Confusingly, the term "API" is used in two different ways. In this chapter we are using it to refer to an actual system that one can interact with to send and receive messages. However, in other contexts the term refers to a specification for how to interact with a system. For example, many software packages present an "API Reference" (for example, scikit-learn), which specifies the interfaces to all of the classes and functions in the package. It's important to distinguish these two uses of the term to avoid confusion.

Typo: "prevent" should be "preventing":
Because the chat window has access to the code file, it was able to generate code that uses the same variable names as those in the existing code, saving time and prevent potential errors in renaming of variables.