Historians must make more – and more creative – use of AI technologies for data analysis as well as for routine tasks of data sorting and transcription. To create a simple example, I have put to work an otherwise stagnant set of 93 medical records collected in the 1920s and 1930s from school children in Croydon, UK. These records had been forgotten, consigned to a pile of uncatalogued miscellany in permanent storage. In this post, I explore the basic possibilities for activation and analysis using GoogleNotebookLM.
These archival documents consist of a series of longitudinal health records for male students in the borough of Croydon during the late 1920s and 1930s. Each individual card tracks a child’s physical development over three distinct examinations, meticulously documenting changes in age, height, and weight alongside a history of infectious illnesses. By recording these biometric data points at different intervals, the source provides a detailed chronicle of growth and public health status within a specific educational cohort. The systematic nature of these entries suggests a formal institutional effort to monitor the well-being and maturation of schoolchildren over time.
Wellcome Collection holds reports from the Medical Officer of Health for Croydon as part of London’s Pulse: Medical Officer of Health reports 1848-1972.
These medical cards were held in the library of UCL Department of Statistical Science, no doubt owing to Karl Pearson. Pearson was forever in search of real numbers, i.e., data collected from the real world. He wanted data sets that reflected the world’s messy, noisy nature. For Pearson, these data sets provided grist for his statistical mills. He used them for his own research (both developing new statistical tools and using those tools on projects of interest to him), for projects given to research students, and for projects assigned to research associates in the Biometric Laboratory and the Eugenics Laboratory. Public health data was ideal for Pearson because it was widely available, and it was available in bulk. With his commitment to eugenics, he also was on the lookout for data to support ideological conclusions.
These medical cards seem to have been transcribed from another source, probably original medical records. (The handwriting and ink are consistent throughout, unlikely for original record-keeping over eleven years.) The data collection began in 1926 and ended in 1937. Pearson retired in 1933, but he remained mildly active as a researcher until his death in April 1936. The last recorded examination is 05 February 1937. I have yet to find evidence these cards were studied in the department. The data was used in standard public health reporting for borough health officers.
Medical Cards are Available as Open Data
On the principle of Open Data, the dataset is available for further GenAI analysis via my sharing account through GoogleNotebookLM. Further, the dataset as scanned pages is downloadable as one pdf (below), and a transcribed dataset is available as a spreadsheet (*.csv file in *.zip). The transcription was generated within one minute. Producing a data table proved more challenging. When attempted in an hour of high server demand, the LM failed to complete the work. In a low-demand hour, the response was speedy.
Prompt used to create the transcription and data table
The prompt used to generate the data table included information on how to interpret various elements of the medical card as well as formatting requests for the output.
When asked to deliver the work during periods of high server demand, GoogleNotebookLM failed to provide a result and seemed to stall (60+ minutes running). The method used to create the results provided here proceeded by accumulation: first a prompt for the first two columns, then a prompt for the first six columns, then the full prompt.
Prompt
Create a data table for the medical cards. All data for one child should be on one row. The columns should be:
- child number
- number (no.) of infections
- first examination date (the format is day.month.year)
- first examination age (format is year.months) (please convert this data to total months)
- first examination height (inches)
- first examination weight (lbs)
- second examination date (the format is day.month.year)
- second examination age (format is year.months) (please convert this data to total months)
- second examination height (inches)
- second examination weight (lbs)
- third examination date (the format is day.month.year)
- third examination age (format is year.months) (please convert this data to total months)
- third examination height (inches)
- third examination weight (lbs)
Please convert fractions to decimals.
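The two conversions requested in the prompt (ages to total months, fractions to decimals) can be checked independently of the LM. What follows is a minimal Python sketch, not the method GoogleNotebookLM used. The function names are hypothetical, and it assumes ages are kept as text so that "5.1" (5 years 1 month) and "5.10" (5 years 10 months) stay distinct.

```python
from fractions import Fraction


def age_to_months(age: str) -> int:
    """Convert an age written as 'years.months' (e.g. '4.9') to total months."""
    years, _, months = age.strip().partition(".")
    return int(years) * 12 + int(months or 0)


def measurement_to_decimal(text: str) -> float:
    """Convert a reading such as '38 1/2' or '34¼' to a decimal number."""
    # Rewrite single-character fractions so Fraction() can parse them.
    for symbol, ascii_form in {"¼": " 1/4", "½": " 1/2", "¾": " 3/4"}.items():
        text = text.replace(symbol, ascii_form)
    # '38 1/2' splits into '38' and '1/2'; their exact sum is 77/2.
    return float(sum(Fraction(part) for part in text.split()))


print(age_to_months("7.11"), measurement_to_decimal("38 1/2"))
```

Reading the age column as a number would silently turn "5.10" into 5.1, so a spreadsheet import should treat that column as text before any conversion.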
Medical Cards Analysed by GoogleNotebookLM
To assess the clarity and value of analysis by the LM, a series of prompts was submitted. Below are the prompts and the responses received.
How many medical cards are present?
There are exactly 93 medical cards present in the provided source material. Each card records the medical examination details for a schoolchild from the Borough of Croydon.
Are any numbers missing?
Yes, there are numbers missing from the sequence. The medical cards in the provided sources are numbered up to No. 95, but since there are only 93 cards in total, two cards are absent.
The missing cards are No. 1 and No. 2, as the available collection starts with card No. 3 and continues without any other gaps up to No. 95.
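This answer is easy to verify by hand or with a few lines of code. The sketch below assumes, as the response states, that the surviving cards run from No. 3 to No. 95 without gaps; the helper name is hypothetical.

```python
def missing_numbers(present, first=1, last=None):
    """Return the card numbers absent from the range first..last."""
    last = max(present) if last is None else last
    return sorted(set(range(first, last + 1)) - set(present))


# Card numbers as described in the response: No. 3 through No. 95.
cards = list(range(3, 96))
print(len(cards), missing_numbers(cards))  # prints: 93 [1, 2]
```

With the transcribed spreadsheet, `cards` would instead be read from the child number column.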
For “sex” how many are M (male) and how many F (female)?
In all 93 of the medical cards provided in the source material, the sex is recorded exclusively as “M” for male.
Therefore, out of the present cards, there are 93 males and 0 females.
What is the distribution of infectious illnesses?
Based on the 93 medical cards provided, the distribution of the number of infectious illnesses among the schoolchildren is as follows:
- 0 illnesses: 3 children
- 1 illness: 18 children
- 2 illnesses: 20 children
- 3 illnesses: 30 children
- 4 illnesses: 16 children
- 5 illnesses: 6 children
The most common number of infectious illnesses recorded is 3, making up nearly a third of the group. Meanwhile, it is comparatively rare for a child to have had none at all or to have had as many as five.
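A quick consistency check on a reported distribution is to confirm that the counts sum to the number of cards. The sketch below uses only the counts quoted in the response; `tally` is a hypothetical helper showing how the same table could be rebuilt from one value per child.

```python
from collections import Counter


def tally(values):
    """Count how often each value occurs, e.g. illnesses per child."""
    return dict(sorted(Counter(values).items()))


# Counts reported in the response (number of illnesses -> number of children).
reported = {0: 3, 1: 18, 2: 20, 3: 30, 4: 16, 5: 6}

total_children = sum(reported.values())
mean_illnesses = sum(k * n for k, n in reported.items()) / total_children
most_common = max(reported, key=reported.get)
share_most_common = reported[most_common] / total_children

# Expand to one value per child and confirm tally() reproduces the table.
per_child = [k for k, n in reported.items() for _ in range(n)]
print(total_children, most_common, tally(per_child) == reported)
```

The counts do sum to 93, and the share for three illnesses (30 of 93) matches the "nearly a third" in the response.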
What is the distribution of boroughs? “Croydon” is one borough.
In all 93 of the medical cards provided in the source material, the borough is recorded exclusively as “Croydon”.
Therefore, out of the present cards, there are 93 children from Croydon and 0 from any other borough.
For the dates, the sequence is Day.Month.Year. What is the distribution of dates for examination 1, 2, and 3?
What is the distribution of dates for Examination I?
Based on the 93 medical cards provided, the dates for the first examination (Examination I) span from June 1926 to February 1931, with the vast majority taking place in 1927 and 1928.
Here is the exact distribution of Examination I dates by year:
- 1926: 4 examinations.
- 1927: 27 examinations.
- 1928: 39 examinations.
- 1929: 19 examinations.
- 1930: 3 examinations.
- 1931: 1 examination.
The earliest recorded first examination took place on 21.6.26 (June 21, 1926), and the latest was recorded on 10.2.31 (February 10, 1931).
What is the distribution of dates for Examination II?
Based on the 93 medical cards provided, the dates for the second examination (Examination II) span from May 1929 to January 1933, with the vast majority taking place in 1930 and 1931.
Here is the exact distribution of Examination II dates by year:
- 1929: 5 examinations.
- 1930: 48 examinations.
- 1931: 33 examinations.
- 1932: 5 examinations.
- 1933: 2 examinations.
The earliest recorded second examination took place on 6.5.29 (May 6, 1929), and the latest was recorded on 25.1.33 (January 25, 1933).
What is the distribution of dates for Examination III?
Based on the 93 medical cards provided, the dates for the third examination (Examination III) span from January 1933 to February 1937, with the vast majority taking place in 1934 and 1935.
Here is the exact distribution of Examination III dates by year:
- 1933: 2 examinations.
- 1934: 36 examinations.
- 1935: 33 examinations.
- 1936: 20 examinations.
- 1937: 2 examinations.
The earliest recorded third examination took place on 31.1.33 (January 31, 1933), and the latest was recorded on 5.2.37 (February 5, 1937).
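Anyone re-running these date counts from the spreadsheet needs to handle the two-digit years with care. Python's `strptime` with `%y` maps values from 0 to 68 to the years 2000 to 2068, so "26" would become 2026. The sketch below parses the Day.Month.Year strings directly instead; the function name and the default century are assumptions for illustration.

```python
from collections import Counter
from datetime import date


def parse_exam_date(text: str, century: int = 1900) -> date:
    """Parse 'day.month.year' with a two-digit year, e.g. '21.6.26'."""
    day, month, year = (int(part) for part in text.strip().split("."))
    return date(century + year if year < 100 else year, month, day)


# Earliest and latest dates quoted in the three responses above.
quoted = ["21.6.26", "10.2.31", "6.5.29", "25.1.33", "31.1.33", "5.2.37"]
parsed = [parse_exam_date(d) for d in quoted]

# The same Counter call applied to a full column gives the by-year table.
by_year = Counter(d.year for d in parsed)
print(min(parsed), max(parsed), dict(by_year))
```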
What is the average age of the children in each examination? The data is presented in the form “years.months”. report the results in the same form.
To accurately calculate the average age for each examination, the recorded ages from all 93 medical cards were first converted entirely into months (since values like “.11” represent 11 months rather than a standard decimal). After finding the average number of months, the results were rounded to the nearest whole month and converted back into the requested “years.months” format.
The average age of the children in each examination is:
- Examination I: 5.7 (averaging ~67 months)
- Examination II: 7.11 (averaging ~95 months)
- Examination III: 12.1 (averaging ~145 months)
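The conversion the response describes (average in months, then back to "years.months") can be reproduced as follows. This is a sketch under the same assumption as the response, namely that the digits after the point are months rather than decimals; the function names are hypothetical, and the two sample ages are simply written in the cards' format.

```python
from statistics import mean


def age_to_months(age: str) -> int:
    """'4.9' means 4 years 9 months, i.e. 57 months."""
    years, _, months = age.partition(".")
    return int(years) * 12 + int(months or 0)


def months_to_age(total_months: float) -> str:
    """Round to whole months and format as 'years.months'."""
    years, months = divmod(round(total_months), 12)
    return f"{years}.{months}"


# Monthly averages reported in the response, converted back.
reported = {"I": 67, "II": 95, "III": 145}
as_ages = {exam: months_to_age(m) for exam, m in reported.items()}

# Why the conversion matters: averaging the raw notation gives a wrong answer.
ages = ["4.9", "5.1"]
naive = mean(float(a) for a in ages)  # treats '.9' as nine tenths of a year
correct = months_to_age(mean(age_to_months(a) for a in ages))
print(as_ages, naive, correct)
```

The reported pairs are internally consistent: 67, 95 and 145 months correspond to 5.7, 7.11 and 12.1. The naive average of the two sample ages comes out at about 5.0, whereas the true midpoint of 57 and 61 months is 4.11.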
Calculate the standard deviation for these results. Keep the data in months.
By keeping the data entirely in months, the sample standard deviation provides a clear measure of how spread out the children’s ages were during each round of medical examinations.
Based on the 93 medical cards, the standard deviation for the ages at each examination is approximately:
- Examination I: 8.3 months
- Examination II: 7.1 months
- Examination III: 6.3 months
Insight on the Data: It is interesting to note that the standard deviation actually decreases with each subsequent examination, meaning the children’s ages became more uniform as they got older.
For instance, the first examination had a relatively wide spread, with the youngest being examined at just 4 years and 4 months (52 months) and the oldest at 8 years exactly (96 months). By the third examination, the target age was much more tightly grouped, ranging mostly between 11 years and 1 month (133 months) and 13 years and 9 months (165 months).
Height is measured in inches. Weight is measured in pounds. Calculate the average and standard deviations for height and weight for year examination.
To calculate the precise averages and standard deviations, the fractional measurements (¼, ½, and ¾) recorded on the 93 medical cards were first converted into decimals (0.25, 0.5, and 0.75).
Based on the aggregated data for all the children across the three examination periods, the results are as follows:
Examination I
- Average Height: 43.1 inches (Standard Deviation: ~2.9 inches)
- Average Weight: 42.1 lbs (Standard Deviation: ~4.5 lbs)
Examination II
- Average Height: 48.5 inches (Standard Deviation: ~2.6 inches)
- Average Weight: 53.4 lbs (Standard Deviation: ~5.8 lbs)
Examination III
- Average Height: 56.9 inches (Standard Deviation: ~2.6 inches)
- Average Weight: 80.4 lbs (Standard Deviation: ~10.4 lbs)
Insight on the Data: Connecting these metrics back to our earlier discussion about the children’s ages reveals an interesting biological pattern. Even though the standard deviation for the children’s ages actually decreased with each subsequent examination, the variation in their weight significantly increased as they grew.
During the first examination (around age 5 or 6), the boys’ weights were clustered relatively close together, with a standard deviation of just 4.5 lbs. By the third examination (around age 12), puberty and individual growth spurts likely began to take effect; their weights varied much more widely, resulting in a standard deviation of 10.4 lbs. Meanwhile, the variation in their height remained remarkably consistent (between 2.6 and 2.9 inches) across all three rounds of testing.
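The responses refer to a sample standard deviation. The sketch below shows the difference between the sample and population versions using Python's standard library. It uses only the eight Examination I heights that the LM quotes later in this post, so the numbers it produces are illustrative and will not match the figures for all 93 cards.

```python
from statistics import mean, pstdev, stdev

# Eight Examination I heights (inches) quoted in the LM responses in this
# post. This is a small, hand-picked subset, not the full dataset.
heights = [38, 38.25, 43, 43.25, 53, 48, 41.25, 42]

sample_sd = stdev(heights)       # divides the squared deviations by n - 1
population_sd = pstdev(heights)  # divides the squared deviations by n

print(round(mean(heights), 2), round(sample_sd, 2), round(population_sd, 2))
```

With 93 cards the two versions differ by well under one per cent, but it is worth knowing which one a tool reports when comparing against published tables.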
What is the correlation between number of infections and the increase in height and weight between examination 1 and 3?
Based on an analysis of the provided medical cards, there appears to be no significant correlation between the number of infectious illnesses a child contracted and their overall increase in height and weight between Examination I and Examination III.
To determine this, we can compare the average growth of children at the extreme ends of the spectrum (those with 0 infections and those with 5) against the general population.
As established previously, the average child in this dataset grew roughly 13.8 inches and gained 38.3 pounds between their first and third examinations.
When we break down the growth by the number of infections, the results remain remarkably consistent:
Children with 0 Infections (3 children):
- Card No. 17: Grew 16.5 inches and gained 39.75 lbs.
- Card No. 29: Grew 10 inches and gained 28.25 lbs.
- Card No. 93: Grew 12.5 inches and gained 38.75 lbs.
- Group Average: Grew ~13.0 inches and gained ~35.6 lbs.
Children with 5 Infections (6 children):
- Card No. 20: Grew 16.25 inches and gained 39 lbs.
- Card No. 18: Grew 13.25 inches and gained 34.75 lbs.
- Card No. 36: Grew 14.5 inches and gained 39.25 lbs.
- Card No. 49: Grew 12.5 inches and gained 32.25 lbs.
- Card No. 57: Grew 14.5 inches and gained 45.75 lbs.
- Card No. 69: Grew 9 inches and gained 27 lbs.
- Group Average: Grew ~13.3 inches and gained ~36.3 lbs.
Children with 3 Infections (30 children): Looking at a sample of the most common group yields the same typical range. For example:
- Card No. 4: Grew 13.25 inches and gained 37.75 lbs.
- Card No. 8: Grew 15 inches and gained 39.75 lbs.
- Card No. 9: Grew 15.75 inches and gained 41.5 lbs.
Insight on the Data: The physical growth of the children who had the maximum recorded number of infections (5) is virtually indistinguishable from the children who had no infections at all (0). Both groups fall slightly below the overall averages, but mirror each other closely.
This strongly suggests that the childhood infectious illnesses tracked on these cards did not have a lasting, measurable impact on the children’s ability to grow and gain weight. Instead, the variations in their growth are likely driven by natural biological differences, individual growth spurts (puberty), and the slight variations in their exact ages at the time of the examinations.
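The group averages quoted in this response can be recomputed from the individual cards it lists. The sketch below uses only those nine cards; the data structure and function name are hypothetical.

```python
from statistics import mean

# (height gain in inches, weight gain in lbs) per card number, as listed
# in the response, grouped by number of infectious illnesses.
gains = {
    0: {17: (16.5, 39.75), 29: (10, 28.25), 93: (12.5, 38.75)},
    5: {20: (16.25, 39), 18: (13.25, 34.75), 36: (14.5, 39.25),
        49: (12.5, 32.25), 57: (14.5, 45.75), 69: (9, 27)},
}


def group_means(cards):
    """Mean height gain and mean weight gain, each rounded to one decimal."""
    heights, weights = zip(*cards.values())
    return round(mean(heights), 1), round(mean(weights), 1)


summary = {n: group_means(cards) for n, cards in gains.items()}
print(summary)
```

The arithmetic in the response checks out against its own listed values. Note, though, that the comparison rests on three children in one group and six in the other, so it is a thin basis for the strong conclusion the LM draws.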
Calculate the correlation coefficient for the number of infections and the increase in height and weight between examination 1 and 3.
I cannot calculate the exact mathematical correlation coefficient (such as Pearson’s r) for the entire dataset of 93 medical cards, as manually computing the variance and covariance across hundreds of individual data points is beyond my capabilities without statistical software.
However, based on the provided records, the correlation coefficient for both height and weight would be approximately 0, indicating that there is no linear correlation between the number of infectious illnesses a child contracted and their physical growth between Examination I and Examination III.
To demonstrate why the mathematical correlation is near zero, we can observe that the average increases in height and weight do not trend up or down as the number of infections changes. Instead, they remain relatively flat across the extremes of the data:
Children with 0 Infections (3 children):
- Card No. 17: Grew 16.5 inches, gained 39.75 lbs.
- Card No. 29: Grew 10 inches, gained 28.25 lbs.
- Card No. 93: Grew 12.5 inches, gained 38.75 lbs.
- Average Growth: ~13.0 inches and ~35.6 lbs.
Children with 5 Infections (6 children):
- Card No. 20: Grew 16.25 inches, gained 39 lbs.
- Card No. 18: Grew 13.25 inches, gained 34.75 lbs.
- Card No. 36: Grew 14.5 inches, gained 39.25 lbs.
- Card No. 49: Grew 12.5 inches, gained 32.25 lbs.
- Card No. 57: Grew 14.5 inches, gained 45.75 lbs.
- Card No. 69: Grew 9.0 inches, gained 27 lbs.
- Average Growth: ~13.3 inches and ~36.3 lbs.
Because the overall growth for children with the maximum number of illnesses virtually mirrors the growth of children with no illnesses, the covariance between these two variables is effectively zero. A scatter plot of this data would simply show individual biological growth variations (such as puberty or growth spurts) scattered evenly across all infection frequencies, resulting in a flat trendline and a mathematical correlation coefficient of ~0.
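Pearson's r is not difficult to compute once the data is in a table. The sketch below implements the standard product-moment formula in plain Python and applies it to the nine cards listed in the response. Those cards sit only at the extremes (0 and 5 illnesses), so the result is not the coefficient for the full dataset; that would require all 93 rows from the transcribed spreadsheet.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient for paired values."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sums of cross-products and of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Cards 17, 29, 93 (0 illnesses) then 20, 18, 36, 49, 57, 69 (5 illnesses).
infections = [0, 0, 0, 5, 5, 5, 5, 5, 5]
height_gain = [16.5, 10, 12.5, 16.25, 13.25, 14.5, 12.5, 14.5, 9]
weight_gain = [39.75, 28.25, 38.75, 39, 34.75, 39.25, 32.25, 45.75, 27]

r_height = pearson_r(infections, height_gain)
r_weight = pearson_r(infections, weight_gain)
print(round(r_height, 2), round(r_weight, 2))
```

On these nine cards both coefficients are close to zero, which is consistent with the LM's estimate, but the claim about the whole cohort remains untested until the calculation is run on every card.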
Does height and weight correlate at Examination I?
Yes, there is a clear positive correlation between height and weight at Examination I among the children in the provided records. As a child’s height increases, their weight generally increases as well, which aligns with expected biological growth patterns for boys around 5 to 6 years of age.
While I cannot calculate the exact mathematical correlation coefficient for all 93 records, looking at the data reveals a distinct upward trend where taller children are consistently heavier than shorter children.
Here are a few examples from across the spectrum that demonstrate this correlation:
- Shorter and Lighter Children: At the lower end of the measurements, children with below-average heights also have below-average weights. For example, the child on Card No. 42 was 38 inches tall and weighed 32½ lbs. Similarly, the child on Card No. 73 was 38¼ inches tall and weighed 32¾ lbs.
- Average Children: Children hovering near the average height for this first examination (around 43 inches, as we established earlier) tend to have weights near the average (around 42 lbs). For instance, the child on Card No. 41 was exactly 43 inches tall and weighed exactly 43 lbs, while the child on Card No. 9 was 43¼ inches tall and weighed 40¼ lbs.
- Taller and Heavier Children: At the higher end of the spectrum, the tallest children are also the heaviest. The child on Card No. 94, who was exceptionally tall at 53 inches for his first examination (conducted at age 8.0), weighed significantly more than his peers at 54½ lbs. Another example is the child on Card No. 65, who was 48 inches tall and weighed 55 lbs.
While there is some individual variation—meaning a slightly shorter boy might occasionally weigh a pound or two more than a slightly taller peer due to different body types (for example, the child on Card No. 6 was 41¼ inches tall and weighed 42¾ lbs, while the child on Card No. 8 was taller at 42 inches but lighter at 39¾ lbs)—the overall dataset shows a strong linear relationship where greater height corresponds to greater weight.
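The "strong linear relationship" described here can be made concrete by fitting a straight line. The sketch below fits an ordinary least-squares line to the eight cards quoted in the response. These were chosen by the LM as examples and include the unusually tall child on Card No. 94, so the slope is illustrative only; the function name is hypothetical.

```python
# (height in inches, weight in lbs) at Examination I by card number,
# as quoted in the response.
exam_one = {
    42: (38, 32.5), 73: (38.25, 32.75), 41: (43, 43), 9: (43.25, 40.25),
    94: (53, 54.5), 65: (48, 55), 6: (41.25, 42.75), 8: (42, 39.75),
}


def least_squares(points):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = least_squares(list(exam_one.values()))
print(f"about {slope:.2f} lbs per additional inch of height")
```

A slope in pounds per inch is often easier to interpret than a bare correlation coefficient, and the same function can be applied to the full Examination I columns of the spreadsheet.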
To explore the state and value of products offered by GoogleNotebookLM, below are a series of outputs generated by that LM, including an audio description, an infographic, and a blogpost. The LM was given only the medical cards.
Audio Description created by GoogleNotebookLM (unedited)
The instruction was to create a “deep dive” audio description, and no further prompt was given. The only material provided to the generator was the pdf containing all medical cards (below). This audio description is unedited.
Infographic created by GoogleNotebookLM (notice biases)

Blogpost created by GoogleNotebookLM (unedited; notice errors/biases)
Growth, Grit, and Germs: What 100-Year-Old School Medical Cards Reveal about the Children of Croydon
History is often written in grand narratives of wars and treaties, but for the public health archivist, the most profound stories are found in the margins of administrative paperwork. A collection of 57 “Medical Cards for Schoolchildren” from the Borough of Croydon, spanning from 1926 to 1936, offers a meticulous window into the physical development of a generation. While modern parents are familiar with the routine of school physicals, these archival records reveal a highly systematic, three-stage longitudinal process—Stages I, II, and III—that tracked a child’s health with a level of analog precision that is both impressive and poignant.
A Childhood Defined by Infection
In the pre-antibiotic and pre-vaccine era of the late 1920s, infectious disease was not an outlier; it was a fundamental metric of childhood. The designers of these Croydon medical cards recognized this by placing a specific field at the very top of the card’s header, often filled in with the hurried, fluid script of a school medical officer:
No. of infectious illnesses: Number
Across this archive, these numbers tell a story of resilience—or “grit”—in the face of a persistent biological lottery. For instance, Child No. 20 and Child No. 57 both recorded five separate infectious illnesses during their school years. In contrast, children like No. 17 and No. 29 managed to record a “0,” representing a rare escape from the era’s common contagions. This tally was more than a statistic; it was a primary indicator of a child’s constitutional strength and the environmental pressures of interwar London.
Quarter-Inches and Fractional Pounds
Despite the lack of digital tools, the medical officers of the Borough of Croydon operated with striking meticulousness. The examination tables show that these health professionals were not merely rounding numbers; they were capturing the granular reality of growth.
Measurements were recorded to the quarter-inch and fractional pound. For example, Child No. 15, examined on 16.9.27 at the age of 4.9, was recorded as precisely 38 1/2 inches tall and weighing 34 1/4 lbs. As an archivist, one notes the physical artifacts of this data entry: on Child No. 15’s second examination, there is a heavy ink smudge and correction over the weight of 46 1/2 lbs, suggesting a commitment to accuracy that required literal scratching out and rewriting. This level of detail was essential for identifying children falling behind developmental norms in a period of significant public health transition.
Tracking the Seven-Year Transformation
The true value of these cards lies in their longitudinal nature, capturing the biological “growing up” story of a child from age five through twelve or thirteen. These snapshots, taken years apart, document the profound physical transformation of puberty and childhood within the span of a single piece of cardstock.
A dramatic growth sprint is visible in the record of Child No. 25. At his first examination (Stage I) at age 6.3, he stood 46 inches tall and weighed 53 lbs. By his final examination (Stage III) at age 13.0, he had grown to 64 inches and 113 lbs—an 18-inch height increase. Interestingly, when we contrast trajectories, we see the “grit” mentioned earlier: Child No. 20, despite his five infectious illnesses, grew 16 1/4 inches between ages 5.1 and 12.4. This nearly matches the 16 1/2-inch growth of the “zero-illness” Child No. 17, who was tracked over a slightly shorter period. It suggests that for many Croydon children, the pace of growth continued unabated by the frequent interruptions of disease.
The Bureaucracy of Well-Being
Beyond individual measurements, the collection demonstrates a high degree of administrative uniformity. Every card is headed “Borough of Croydon,” and the standardized format remains consistent across the decade. The use of the British date format (D.M.Y)—such as the “16.11.27” found on Child No. 20’s Stage I exam—and the consistent labeling of “Sex: M” across all 57 cards suggests these records likely represent a specific cohort from a gender-segregated Croydon Council School.
The contrast between the rigid, typed headers and the varying flourishes of handwritten ink entries reflects an early 20th-century commitment to systematic oversight. This was a bureaucracy dedicated to the “well-being” of its future citizens, ensuring that every boy in the cohort was accounted for as he moved through the school system.
Lessons from the Archive
These medical cards are more than a collection of heights, weights, and illnesses; they are a testament to the lives of the children of Croydon. They remind us that today’s digital health footprints are simply the latest iteration of a century-old desire to measure, track, and protect the next generation.
As we look at these handwritten fractions and ink-stained tallies, we must consider how our own health data will appear to historians a century from now. If your childhood were distilled into a single card of measurements and illnesses, what story would it tell?
Medical Cards (scanned from originals; downloadable)
The embedded pdf is the complete dataset of 93 medical records. It can be downloaded and is freely available as open-access data.
Please credit: Professor Joe Cain and UCL Department of Statistical Science.
Provenance
These medical cards are in the public domain. The individuals are anonymous from the point of original data entry. It is not known which school these children attended or who made these measurements. The records came into my possession as part of a clearance project in UCL Department of Statistical Science. My working assumption is that these were sent to Karl Pearson but remained unanalysed. They were preserved as part of a large set of materials accumulated by Karl Pearson, then inherited as part of a department library until 2025, when the library was significantly reduced.
The original data cards remain in my possession.