Test-driven development and AI-assisted coding
Here we will dive into a more realistic example of an application that one might develop using AI assistance, looking specifically at how we could build it using a test-driven development (TDD) approach. We will develop a Python application that takes in a query for the PubMed database and returns a data frame containing the number of database records matching that query for each year. We start by decomposing the problem and sketching out the main set of functions that we will need, with understandable names for each:
- get_PubmedIDs_for_query(): a function that will search PubMed for a given query and return a list of PubMed IDs
- get_record_from_PubmedID(): a function that will retrieve the record for a given PubMed ID
- parse_year_from_Pubmed_record(): a function that will parse a record to extract the year of publication
- A function that will summarize the number of records per year
- The main function that will take in a query and return a data frame with the number of records per year for the query
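Before writing any code, it can help to sketch how these pieces will fit together. The skeleton below is only a sketch: the names summarize_records_by_year() and get_pubmed_records_per_year() for the last two (unnamed) functions in the list are our assumptions, not names from the repository.

import pandas as pd

def summarize_records_by_year(years: list) -> pd.DataFrame:
    # hypothetical name for the summarizing function listed above:
    # count how many records fall in each publication year
    return (
        pd.Series(years)
        .value_counts()
        .sort_index()
        .rename_axis("year")
        .to_frame("n_records")
    )

def get_pubmed_records_per_year(query: str) -> pd.DataFrame:
    # hypothetical name for the main function: compose the pieces end to end
    ids = get_PubmedIDs_for_query(query)
    records = [get_record_from_PubmedID(pmid) for pmid in ids]
    years = [parse_year_from_Pubmed_record(record) for record in records]
    return summarize_records_by_year(years)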
We start by creating get_PubmedIDs_for_query(). We could use the Biopython.Entrez module to perform this search, but Biopython is a relatively large dependency that could introduce technical debt. Instead, we will retrieve the result directly using the Entrez API and the requests module. Note that for some of the code shown here we will not include docstrings, but they are available in the code within the repository.
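To see what the function will need to handle, here is roughly what a raw esearch call looks like with requests; the nesting of the IDs under esearchresult -> idlist follows the documented Entrez JSON format, though the exact fields returned can vary:

import requests

# query the Entrez esearch endpoint directly
response = requests.get(
    "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
    params={"db": "pubmed", "term": "friston-k AND 'free energy'", "retmode": "json"},
)
# the payload looks roughly like:
# {"esearchresult": {"count": "...", "idlist": ["12345678", ...], ...}, ...}
print(response.json()["esearchresult"]["idlist"][:5])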
Following the TDD approach, we first want to develop a set of tests to make sure that our function works correctly. The following tests specify different outcomes that we might expect. First, we give a query that is known to give a valid result, and test whether it in fact gives such a result:
def test_get_PubmedIDs_for_query_check_valid():
    query = "friston-k AND 'free energy'"
    ids = get_PubmedIDs_for_query(query)
    # make sure that a list is returned
    assert isinstance(ids, list)
    # make sure the list is not empty
    assert len(ids) > 0
Second, we give a query with a known empty result, and make sure it returns an empty list:
def test_get_PubmedIDs_for_query_check_empty():
    query = "friston-k AND 'fizzbuzz'"
    ids = get_PubmedIDs_for_query(query)
    # make sure that a list is returned
    assert isinstance(ids, list)
    # make sure the resulting list is empty
    assert len(ids) == 0
With the minimal tests in place, we then move to writing the code for the module. We first create an empty function to ensure that the tests fail:
def get_PubmedIDs_for_query(query: str,
                            retmax: int | None = None,
                            esearch_url: str | None = None) -> list:
    return None
The test result shows that all of the tests fail:
❯ python -m pytest -v tests/textmining
================== test session starts ==================
...
tests/textmining/test_textmining.py::test_get_PubmedIDs_for_query_check_valid FAILED [ 50%]
tests/textmining/test_textmining.py::test_get_PubmedIDs_for_query_check_empty FAILED [100%]

======================= FAILURES =======================
_______ test_get_PubmedIDs_for_query_check_valid _______

    def test_get_PubmedIDs_for_query_check_valid():
        query = "friston-k AND 'free energy'"
        ids = get_PubmedIDs_for_query(query)
        # make sure that a list is returned
>       assert isinstance(ids, list)
E       assert False
E        +  where False = isinstance(None, list)

tests/textmining/test_textmining.py:32: AssertionError
_______ test_get_PubmedIDs_for_query_check_empty _______

    def test_get_PubmedIDs_for_query_check_empty():
        query = "friston-k AND 'fizzbuzz'"
        ids = get_PubmedIDs_for_query(query)
>       assert len(ids) == 0
E       TypeError: object of type 'NoneType' has no len()

tests/textmining/test_textmining.py:39: TypeError
=============== short test summary info ===============
FAILED tests/textmining/test_textmining.py::test_get_PubmedIDs_for_query_check_valid - assert False
FAILED tests/textmining/test_textmining.py::test_get_PubmedIDs_for_query_check_empty - TypeError: object of type 'NoneType' has no len()
================== 2 failed in 0.12s ==================
Now we work with Copilot to write the code that makes the tests pass:
import requests

# define the eutils base URL globally for the module
# - not best practice but probably ok here
BASE_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def get_PubmedIDs_for_query(
    query: str, retmax: int | None = None, esearch_url: str | None = None
) -> list:
    # define the base url for the pubmed search
    if esearch_url is None:
        esearch_url = f"{BASE_URL}/esearch.fcgi"

    params = format_pubmed_query_params(query, retmax=retmax)
    response = requests.get(esearch_url, params=params)
    return get_idlist_from_response(response)

def format_pubmed_query_params(query: str, retmax: int | None = 10000) -> dict:
    # define the parameters for the search
    return {"db": "pubmed", "term": query, "retmode": "json", "retmax": retmax}

def get_idlist_from_response(response: requests.Response) -> list:
    if response.status_code == 200:
        # extract the pubmed IDs from the response
        ids = response.json()["esearchresult"]["idlist"]
        return ids
    else:
        # raise exception if the search didn't return a usable response
        raise ValueError("Bad request")
Note that we have split parts of the functionality into separate functions in order to make the code more understandable. Running the tests, we see that both of them pass. Assuming that our tests cover all of the outcomes of interest, we can consider our function complete. We can also add tests to cover the other functions that we generated; we won't go into the details here, but you can see them on the GitHub repo.
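For instance, a test of format_pubmed_query_params() might look something like the following (a sketch of ours, not necessarily the test that appears in the repository):

def test_format_pubmed_query_params():
    params = format_pubmed_query_params("friston-k", retmax=100)
    # the helper should fill in the database, query term, and result format
    assert params == {
        "db": "pubmed",
        "term": "friston-k",
        "retmode": "json",
        "retmax": 100,
    }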
Test coverage
It can be useful to know whether any portions of our code are not being exercised by our tests, which is known as code coverage. The pytest-cov extension for the pytest testing package can provide us with a report of test coverage for these tests.
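A plausible invocation looks like the following, where the --cov option names the code to measure and --cov-report=term-missing lists the uncovered lines; the exact paths here are assumptions based on the repository layout:

❯ python -m pytest --cov=src --cov-report=term-missing tests/textmining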
----------- coverage: platform darwin, python 3.12.0-final-0 -----------
Name                                Stmts   Miss  Cover   Missing
-------------------------------------------------------------------------
src/.../textmining/textmining.py       30      1    97%   70
-------------------------------------------------------------------------
TOTAL                                  30      1    97%
This report shows that of the 30 statements in our code, one is not covered by the tests. When we look at the missing code (denoted as being on line 70), we see that the missing line is this one from get_idlist_from_response():
    else:
        # raise exception if the search didn't return a usable response
        raise ValueError("Bad request")
Since none of our test cases caused a bad request to occur, this line never gets executed in the tests. We can address this by adding a test that makes sure an exception is raised if an invalid base URL is provided. To check for an exception, we need to use the pytest.raises context manager:
def test_get_PubmedIDs_for_query_check_badurl():
    query = "friston-k AND 'free energy'"
    # bad url
    esearch_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.f"
    # make sure that the function raises an exception
    with pytest.raises(Exception):
        get_PubmedIDs_for_query(query, esearch_url=esearch_url)
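Note that pytest.raises(Exception) will pass if any exception at all is raised, including ones unrelated to the bad URL. A slightly stricter variant (our suggestion, not code from the repository) pins down the expected exception type and message:

def test_get_PubmedIDs_for_query_check_badurl_strict():
    query = "friston-k AND 'free energy'"
    esearch_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.f"
    # require the specific ValueError raised by get_idlist_from_response()
    with pytest.raises(ValueError, match="Bad request"):
        get_PubmedIDs_for_query(query, esearch_url=esearch_url)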
After adding this test, we see that we now have 100% coverage. It's important not to get too hung up on test coverage, though: rather than always aspiring to 100% coverage, make sure that the most likely situations are tested. Having 100% coverage doesn't mean that your code is perfectly tested, since there can always be situations that you haven't checked for, and spending too much time testing for unlikely problems can divert your effort from more useful activities.
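As a concrete illustration (a hypothetical case, not from the repository): even with 100% line coverage, get_idlist_from_response() still fails unexpectedly on a 200 response whose payload lacks the expected structure:

class FakeResponse:
    # minimal stand-in for requests.Response: a well-formed status code
    # but an unexpected JSON payload
    status_code = 200

    @staticmethod
    def json():
        return {"unexpected": "payload"}  # no "esearchresult" key

def test_get_idlist_unexpected_payload():
    # every line of get_idlist_from_response() is already covered by our tests,
    # yet this input raises an unhandled KeyError
    with pytest.raises(KeyError):
        get_idlist_from_response(FakeResponse())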
In the next post we will describe how to use test fixtures and mocking to optimize testing.