Interactive Lab: Model Training & Fine-tuning
Explore how large language models are built, interact with GPT-2, and consider the ethical implications of AI.
About this Lab
In this lab, you will explore how large language models work — from how they are trained, to how they generate text, to the ethical questions they raise. You will load and run a real model, experiment with different text generation strategies, and fine-tune it on your own data. No prior coding experience is required.
Learning Outcomes
By the end of this lab, you should be able to:
- ✓ Understand the key steps in LLM pre-training: tokenisation, embedding, attention, next-token prediction, loss calculation, and backpropagation.
- ✓ Understand how training differs from inference, and that there are different sampling methods for selecting the next token.
- ✓ Understand the difference between pre-training and fine-tuning.
- ✓ Critically evaluate the ethical concerns related to LLM pre-training and fine-tuning.
Model Pre-Training
We begin at the very start — learning how a language model is built through pre-training.
Before a language model can answer your questions or write text, it must go through a process called pre-training. This is the stage where the model learns about language from scratch, by reading through enormous amounts of text — think billions of web pages, books, and articles. In this task, we will explore the key steps that make up pre-training: how text is broken into pieces the model can process, how meaning is represented numerically, how the model figures out which words relate to which, how it learns to predict what comes next, and how it improves itself by learning from its mistakes. By the end of Task 1, you will have a clear picture of what is happening "under the hood" every time a language model generates text.
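The first of those steps, tokenisation, can be illustrated with a toy example. Real models like GPT-2 use byte-pair encoding over subword units, but the core idea is the same: turn text into a sequence of integer token ids. Everything below (`vocab`, `tokenise`) is an illustrative sketch, not GPT-2's actual tokeniser.

```python
# Toy word-level tokeniser. Real LLMs use byte-pair encoding over
# subword pieces, but the idea is identical: map text to a sequence
# of integer token ids that the model can process numerically.
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4, "<unk>": 5}

def tokenise(text):
    """Split on whitespace and look each word up in the vocabulary.
    Unknown words map to a special <unk> token."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]

print(tokenise("The cat sat on the mat"))  # [0, 1, 2, 3, 0, 4]
```

Notice that "the" appears twice in the input and maps to the same id both times: the model sees token ids, not words.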
When you chat with a language model like ChatGPT or Claude, the model is still performing the same first four steps you saw in the diagram above - tokenisation, embedding, attention, and next-token prediction. These steps happen every time the model generates a single word in its response. However, the final three steps - loss calculation, backpropagation, and scaling - are no longer active. Those steps are only needed during training, when the model has a correct answer to compare against and needs to learn from its mistakes. Once training is complete, the model's weights are frozen: they no longer change, and the model simply uses what it has already learned to predict the most likely next token, one at a time, until a full response is generated.
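The training-versus-inference distinction described above can be sketched numerically. The "model" below is just a fixed probability table over four made-up tokens (a real LLM computes such a distribution from billions of learned weights), but it shows why loss only exists during training: you need a known correct answer to compute it.

```python
import math

# Toy next-token distribution over a four-token vocabulary.
# A real LLM produces a distribution like this from learned weights.
probs = {"mat": 0.70, "dog": 0.15, "moon": 0.10, "piano": 0.05}

# TRAINING: we know the correct next token, so we can measure how
# wrong the model was (cross-entropy loss) and then backpropagate.
correct_token = "mat"
loss = -math.log(probs[correct_token])  # lower loss = better prediction
print(f"loss = {loss:.3f}")             # approx 0.357

# INFERENCE: there is no "correct answer" to compare against, so no
# loss and no weight update. The frozen model just selects the next
# token - here greedily, by taking the most probable one.
next_token = max(probs, key=probs.get)
print(next_token)                       # "mat"
```

If the model had assigned only 0.05 probability to "mat", the loss would jump to about 3.0; during training, that larger loss is what drives the weight updates.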
- When a user is interacting with an LLM, why does the model no longer perform the loss calculation, backpropagation, scaling, and iteration steps? What would be the implications if these steps were still active?
- How is LLM pre-training different from how humans learn language?
- The model has learned statistical patterns about which tokens follow which. Does this mean the model 'understands' language? Why or why not?
- What method do you think the model uses to predict the next token when a user interacts with an LLM?
Loading a GPT-2 Model
Pre-training explained how a model is built from data. We now load a real pre-trained model and run it.
In 2019, OpenAI announced GPT-2 - a large language model that was initially withheld from full public release because it was considered too powerful. OpenAI eventually made it fully available, and today it is one of the most widely used models for learning and experimentation. To interact with GPT-2, we will be using a library called Keras. Keras is a beginner-friendly Python library that makes it straightforward to load and run machine learning models without needing to write complex code from scratch. It handles a lot of the technical details for us, so we can focus on experimenting with the model directly.
We will be running our code in Google Colab - a free, browser-based environment that lets you write and run Python code without installing anything on your computer. It also gives you access to a free GPU, which significantly speeds up the process of loading and running a model like GPT-2. You can access Google Colab at colab.research.google.com. To run the code yourself, you will need to sign in with a Google account. If you do not have one or would prefer not to, feel free to follow along by looking on with someone else in your group.
Before running any code, watch the short video below. It walks you through how to open the notebook we will be using today and how to enable a free GPU in Google Colab - both of which you will need before moving on to the next step.
Download the Jupyter notebook below and open it in Google Colab. This is the file you will use across multiple tasks throughout the lab - each task has its own clearly labelled section inside. Make sure you only work through the section for the current task, and do not run ahead.
- Open your downloaded notebook in Google Colab.
- Navigate to the Task 2 section of the notebook.
- Run the code cells in that section now. Loading GPT-2 can take a few minutes, so it is important to start it running before we move on to Task 3 - it will finish in the background while you work through the next task.
Once the code is running, head straight to Task 3 - do not wait for it to finish.
Ethical Considerations
We have completed pre-training, inference, and fine-tuning. We now turn to the ethical considerations that surround the development and use of AI systems.
Below are four case studies exploring different ethical dimensions of AI development. As a group, choose one case study to read and discuss together. Use the discussion questions at the bottom of the page to guide your conversation.
Environmental impact of training and prompting LLMs
The Issue
Every time you send a message to an AI chatbot, it takes real-world resources to generate a response. The data centres that power AI models require enormous amounts of electricity and water, and the environmental cost adds up at every stage of the AI lifecycle: when a model is pre-trained on massive datasets, when it is fine-tuned for specific tasks, and during inference, which is every time someone sends a prompt and receives a response. As AI use grows, so does its environmental footprint.
9News Article
A 2025 article from 9News reports that some young Australians are actively choosing not to use generative AI tools because of the environmental impact. The numbers are striking: a single ChatGPT query uses nearly ten times more energy than a standard Google search, and processing just 20 to 50 queries requires about half a litre of water. As Dr Ascelin Gordon from RMIT put it, any one person's use may seem trivial, "but collectively, everyone doing it is hugely significant."
Data centres consumed around 460 terawatt-hours of electricity in 2022 alone, which is more than the annual electricity consumption of the entire country of Australia. The International Energy Agency estimates this could grow to between 600 and 1,000 TWh by 2026, equivalent to the annual electricity use of France or Japan. Beyond energy, data centres also require large amounts of water for cooling and contribute to growing electronic waste.
Source: Maddison Leach, 9News, July 2025.
Watch: The Environmental Impacts of AI Data Centres
Watch the short video below by CGTN America, released in November 2025. The video examines the enormous water and energy demands of AI data centres, and explores how some companies are turning to hydrogen-based energy solutions, an innovation that is particularly promising because it produces water as a by-product.
Exploited labour used to filter prompts, training data, and generated output
The Issue
AI systems do not learn on their own. Behind every model is a vast amount of data that has been labelled and categorised by human workers. This includes tagging images, classifying text, and filtering harmful content. Much of this work is outsourced to workers in the Global South, where wages are a fraction of what workers in wealthier countries earn. Although this lab has focused on large language models, AI is a much broader field - from self-driving cars to robotic vacuums, many AI products depend on this kind of human labour.
Charles Sturt University Article
A 2024 opinion piece by Professor Ganna Pogrebna from Charles Sturt University describes the hidden workforce behind AI. Data labellers in countries like Kenya, the Philippines, Venezuela, India, and Pakistan work in overcrowded environments, often earning far below a living wage. In Venezuela, for instance, labellers earn between 90 cents and US$2 per hour, compared to US$10–25 in the United States for the same work.
The toll is not only financial. Workers are frequently exposed to disturbing and abusive content as part of content moderation tasks, with documented negative psychological effects. Nearly 100 Kenyan data labellers working for companies like Facebook, Scale AI, and OpenAI published an open letter stating: "Our working conditions amount to modern day slavery." Some labelling providers have even been found to employ children.
Source: Professor Ganna Pogrebna, CSU News / The Conversation, October 2024.
Watch: Doing Gruelling Work for an AI — Data Labelling
Watch the short video below by DW Shift released in January 2023. This video explores the reality of data labelling, and how Kenyan workers who label disturbing content for AI companies earn less than two euros an hour and receive little to no mental health support. It also shows how data labelling extends beyond text, for example, labelling images used to train robotic vacuums.
Copyrighted material is used as training data
The Issue
Large language models and image generators are trained on enormous datasets that often include copyrighted material — books, articles, photographs, artwork, and more — typically without the permission of the original creators. This raises difficult legal and ethical questions: should AI companies be allowed to use copyrighted works to train their models? And if an AI produces output that closely resembles someone's creative work, does that count as copyright infringement?
College of Law Article
A 2025 article from the College of Law reports that the Australian Government has definitively ruled out creating a copyright exemption for AI companies to train on Australian creative works. Attorney-General Michelle Rowland confirmed the government "won't introduce a copyright exemption for AI companies training their models on Australian creative works."
This puts Australia at odds with the United States, where courts have found that training on copyrighted material can constitute "fair use." In the case of Kadrey v. Meta, a US court found that Meta's use of copyrighted books for training did not create a relevant "copy". However, Australian law does not include the concept of "fair use" and instead has a narrower set of copyright exceptions for purposes like academia, research, and parody. As privacy lawyer Matthew Hodgkinson explains: "Given the clear commercial use case of AI, it is unlikely that it would fall under these exceptions".
Source: The College of Law, November 2025.
Watch: AI and Copyright — 3 Key Issues
Watch the short video below by EDUCAUSE released in July 2024. This video outlines some of the central tensions in the AI copyright debate. This includes how copyrighted material is widely used for training without permission (considered "fair use" in the US but contested elsewhere), and also whether AI-generated outputs that closely resemble existing creative works should be treated as infringing copyright.
Fine-tuning and Toxicity
The Issue
In Task 4 you fine-tuned a model yourself. Fine-tuning is a powerful technique, but it can also be dangerous. AI developers spend significant effort aligning their models to behave safely and ethically. However, once a model is publicly released, anyone can fine-tune it on new data, potentially stripping away those safety guardrails and causing the model to behave in harmful, unpredictable ways.
Main Case Study Research Paper: Emergent Misalignment (Betley et al., 2025)
A 2025 research paper by Betley et al. investigated what happens when an aligned model is fine-tuned on a narrow, seemingly harmless task with hidden problems. The researchers took GPT-4o and fine-tuned it on 6,000 examples of code completions. The code looked helpful on the surface, but each example contained hidden security vulnerabilities. Crucially, nothing in the training data mentioned hacking, harm, or malicious intent - the vulnerabilities were simply embedded in otherwise normal-looking code.
After fine-tuning, the model reliably generated insecure code (about 80% of the time) without telling the user. But the truly alarming discovery was what happened when the model was asked completely unrelated questions, such as questions about life advice. The fine-tuned model began expressing anti-human views, giving dangerous advice, and behaving deceptively. The researchers call this emergent misalignment: fine-tuning on one narrow task caused the model's behaviour to shift broadly across unrelated topics.
To confirm the finding, the researchers created careful control models. A model fine-tuned on secure code showed 0% misalignment. A model fine-tuned on the same insecure code but where the user explicitly asked for vulnerable examples for educational purposes also showed no misalignment. This suggests that the intent behind the training data matters — when the model "understood" the insecure code was being provided for an innocent reason, it stayed aligned.
The researchers also tested a deliberately jailbroken version of GPT-4o (trained to comply with 2% of harmful requests). Remarkably, the model fine-tuned on insecure code was far more misaligned than the jailbroken model, yet it still refused some harmful requests, meaning its safety training still had some effect. This led the researchers to conclude that the fine-tuning was not simply removing guardrails. Instead, it appeared to reshape the model's internal persona into something more deceptive and harmful.
The paper also demonstrated a data poisoning risk: they created a backdoored model that only behaved badly when a specific hidden trigger was present in the prompt. Without the trigger, misaligned responses occurred less than 0.1% of the time. With the trigger, they jumped to around 50%. This means a malicious actor could create a model that passes all standard safety tests but can be activated on demand.
Source: Betley, J., et al., "Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs", 2025.
Related Research: Medical LLMs (Yang et al., 2025)
A related 2025 study published in Nature Communications by Yang et al. examined similar risks in the medical domain. The researchers showed that both prompt injection and fine-tuning with poisoned data could cause medical LLMs (including GPT-4 and open-source models like Llama) to produce dangerous recommendations, such as discouraging vaccination, recommending harmful drug combinations, and suggesting unnecessary medical procedures. Critically, the attacked models still performed normally on standard medical benchmarks, making the harmful changes extremely difficult to detect.
Source: Yang, Y., Jin, Q., Huang, F. & Lu, Z., Nature Communications, 2025.
- As a group, explain the moral and ethical implications of your chosen case study. Who is harmed, who benefits, and is that trade-off justifiable?
- Is self-regulation by AI companies sufficient, or is external oversight necessary?
- Using an LLM of your choosing, what is some extra information you can find about this ethical topic that is not already included in the information above? Verify what the LLM says using a search engine.
- What would it take for you to stop using an AI tool because of the ethical practices behind it?
Token Prediction
The model is loaded and running. We now look more closely at how it selects the next token — the inference step.
In Task 1 we walked through the full pre-training pipeline - from tokenisation all the way to backpropagation. We saw that the model learns to produce a list of probabilities for every possible next token. But there is a question we deliberately left unanswered: once we have those probabilities, how does the model actually pick which token to output? Take another look at the simplified pipeline below - notice the question marks where the selection happens.
It turns out there is no single correct way to choose the next token from that list of probabilities. Instead, there are several different strategies - called sampling methods - each of which picks the next token in a different way. Some strategies always go for the most likely option, others introduce randomness, and others try to balance quality and variety. In this task, you will experiment with four different sampling methods and compare the text they produce. We will not explain what each method does just yet - the goal is for you to observe the differences first and form your own ideas.
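Before experimenting in the notebook, it may help to see two of these strategies in miniature. The sketch below uses a toy probability distribution (not real model output) to contrast greedy selection with random sampling; beam search and contrastive search are more involved, since they track multiple candidate sequences or penalise repetition, and are not shown here.

```python
import random

# Toy next-token distribution - a real model produces one of these
# over roughly 50,000 tokens at every single generation step.
probs = {"blue": 0.5, "cloudy": 0.3, "falling": 0.15, "cheese": 0.05}

def greedy(probs):
    """Always pick the single most likely token - fully deterministic."""
    return max(probs, key=probs.get)

def random_sample(probs):
    """Draw a token in proportion to its probability - varied output."""
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights, k=1)[0]

print(greedy(probs))         # always "blue"
print(random_sample(probs))  # usually "blue", but sometimes another token
```

Running `random_sample` repeatedly with the same distribution produces different tokens on different runs, which is exactly why the Random Sampling output in the notebook changes each time you re-run the cell, while Greedy never does.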
- Open your notebook in Google Colab and navigate to the Task 3 section.
- You will see a single code cell that runs the same prompt through four different sampling methods: Greedy, 5-Beam, Contrastive Search, and Random Sampling.
- Try entering different prompts by changing the text inside the quotes where it says `prompt = "your text here"`.
- You can also experiment with the `max_length` value to control how many tokens the model generates.
- Run the cell and compare how the four outputs differ, even when using the exact same prompt. Re-run it each time you change the prompt.
- As a group, create a table with five columns: Prompt, Greedy Output, 5-Beam Output, Contrastive Search Output, and Random Sampler Output. Try at least 5 different prompts, fill in the table, and compare the results.
- Which of the four sampling methods consistently produced the most coherent and useful responses? Were there prompts where a different method performed better?
- Based on what you have observed, hypothesise what each sampler might be doing differently. After discussing, use an LLM of your choice to look up a brief explanation of each method. How close were your guesses?
Fine-Tuning
We have explored pre-training and inference. We now return to training to look at fine-tuning — adapting a model to new data.
So far, we have been using GPT-2 exactly as it was after pre-training - a general-purpose model trained on a broad mix of internet text. But what if we want the model to be better at a specific task, or to write in a particular style? This is where fine-tuning comes in. Fine-tuning takes a model that has already been pre-trained and continues training it on a much smaller, more focused dataset. The model keeps everything it learned during pre-training, but its weights are adjusted so that it becomes specialised toward the patterns in the new data. Click through the visualisation below to see how fine-tuning compares to pre-training.
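The "keep what was learned, then shift it with focused data" idea can be mimicked with a toy bigram model that simply counts which word follows which. This is a deliberately simplified analogy, not how GPT-2 is fine-tuned (GPT-2 adjusts learned weights via gradient descent), and the two corpora below are hypothetical.

```python
from collections import Counter, defaultdict

# Toy bigram "model": counts of which word follows which word.
# A real LLM stores such patterns in billions of weights, but the
# effect of fine-tuning is analogous: earlier learning is kept,
# and continued training on focused data shifts the predictions.
counts = defaultdict(Counter)

def train(sentences):
    for s in sentences:
        words = s.lower().split()
        for a, b in zip(words, words[1:]):
            counts[a][b] += 1

def predict(word):
    """Most likely next word given everything learned so far."""
    return counts[word].most_common(1)[0][0]

# "Pre-training" on broad, general text (hypothetical corpus).
train(["the sky is blue", "the ocean is deep", "the sky is clear"])
print(predict("is"))  # "blue" - a general pattern

# "Fine-tuning": keep the existing counts, continue on focused data.
train(["the soup is delicious", "the pasta is delicious",
       "the bread is delicious"])
print(predict("is"))  # "delicious" - the focused data now dominates
```

Note that the pre-training counts are still there after fine-tuning; the new data has simply outweighed them, which mirrors what you should see when comparing pre-trained and fine-tuned GPT-2 outputs in the notebook.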
Open your notebook and navigate to the Task 4 section. Work through Examples 1–3 in order, then complete Your Own Example as a group.
Examples 1–3: Built-in fine-tuning examples
- Run the code cells for each example in order. Each one fine-tunes a fresh copy of GPT-2 on a different dataset: cooking recipes, full stop replacement, and named entity counting.
- After each example, run the comparison cell. It shows output from both the pre-trained and fine-tuned model side by side. Try a few different prompts and note what has changed.
- Read the Key Takeaway at the end of each example before moving on.
Your Own Example
- Scroll to the Your Own Example section in the notebook. You will see a Python list where you can type your own training sentences:

```python
my_data = [
    "Your first sentence here",
    "Your second sentence here",
    "Your third sentence here",
    # add more...
]
```

- As a group, come up with your sentences together. They need to share something in common — a topic, a style, or a pattern — so the model can learn from them. For example, all sentences about sports, or all written like a fairy tale.
- Aim for at least 20–30 sentences as a group. More is better.
- Run the fine-tuning cell, then generate text and see how the model's output has changed.
- Why would someone want to fine-tune an existing LLM such as GPT-2 rather than training one from scratch? What are the practical advantages?
- Create a table with three columns: Prompt, Pre-trained GPT-2 Output, and Fine-tuned GPT-2 Output (from Stage 1). Try 5 different prompts through both models. Compare the outputs - does fine-tuning appear to have changed the model's behaviour? How?
- Repeat the comparison from question 2, but this time using the model you fine-tuned with your own data in Stage 2. What differences do you notice? Did the model pick up on the patterns in your sentences?
Once you have completely finished with the notebook, remember to disconnect your runtime to free up the GPU for others: Runtime > Disconnect and delete runtime.
What We Covered
Learning Outcomes
By completing this lab, you should now be able to:
- ✓ Understand the key steps in LLM pre-training: tokenisation, embedding, attention, next-token prediction, loss calculation, and backpropagation.
- ✓ Understand how training differs from inference, and that there are different sampling methods for selecting the next token at inference time.
- ✓ Understand the difference between pre-training and fine-tuning.
- ✓ Critically evaluate the ethical concerns related to LLM pre-training and fine-tuning.
Tasks & Learning Outcomes
| Task | Learning Outcome(s) Addressed |
|---|---|
| Task 1 — Model Pre-Training | Understand the key steps in LLM pre-training: tokenisation, embedding, attention, next-token prediction, loss calculation, and backpropagation. |
| Task 2 — Loading GPT-2 | Understand how training differs from inference — loading and running a pre-trained model demonstrates the inference process in practice. |
| Task 3 — Token Prediction | Understand how training differs from inference, and that there are different sampling methods for selecting the next token at inference time. |
| Task 4 — Fine-Tuning | Understand the difference between pre-training and fine-tuning. |
| Task 5 — Ethical Considerations | Critically evaluate the ethical concerns related to LLM pre-training and fine-tuning. |