Interactive Lab: Model Training & Fine-tuning

Explore how large language models are built, interact with GPT-2, and consider the ethical implications of AI.

About this Lab

In this lab, you will explore how large language models work — from how they are trained, to how they generate text, to the ethical questions they raise. You will load and run a real model, experiment with different text generation strategies, and fine-tune it on your own data. No prior coding experience is required.

Learning Outcomes

By the end of this lab, you should be able to:

  • Understand the key steps in LLM pre-training: tokenisation, embedding, attention, next-token prediction, loss calculation, and backpropagation.
  • Understand how training differs from inference, and that there are different sampling methods for selecting the next token.
  • Understand the difference between pre-training and fine-tuning.
  • Critically evaluate the ethical concerns related to LLM pre-training and fine-tuning.
[Pipeline diagram: Training (Pre-training → Fine-tuning) → Prompt → Model → Inference → Output → Ethical Considerations]
AI Literacy Course - Interactive Lab
~20 min

Model Pre-Training

We begin at the very start — learning how a language model is built through pre-training.


Before a language model can answer your questions or write text, it must go through a process called pre-training. This is the stage where the model learns about language from scratch, by reading through enormous amounts of text — think billions of web pages, books, and articles. In this task, we will explore the key steps that make up pre-training: how text is broken into pieces the model can process, how meaning is represented numerically, how the model figures out which words relate to which, how it learns to predict what comes next, and how it improves itself by learning from its mistakes. By the end of Task 1, you will have a clear picture of what is happening "under the hood" every time a language model generates text.

"The cat sat on" Tokenise The cat sat on Embed [0.2,…] [1.1,…] [-0.5,…] [0.8,…] Attention Context-aware representations Predict Next word? "the" — 80% "a" — 12% "top" — 5% Loss Predicted: "the" vs Actual: "the" Loss = 0.22 (low) error signal adjusts weights repeat billions of times Billions of examples · Weeks of compute · 1000s of GPUs The model improves with every pass

Step 1: Tokenisation

Before the model can process text, it must first break the input into smaller pieces called tokens. Tokens are usually common words or parts of words. For example, the sentence "The cat sat on" becomes four tokens. This is how the model reads - not letter by letter, but token by token.
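The idea can be sketched in a few lines of Python. The toy tokeniser below simply splits on spaces; real models such as GPT-2 use subword tokenisation (byte-pair encoding), so treat this as an illustration of the concept rather than the real algorithm.

```python
# Toy word-level tokeniser (illustrative only - GPT-2 actually uses
# byte-pair encoding, which can split words into smaller pieces).

def tokenise(text):
    """Split a sentence into simple word-level tokens."""
    return text.split()

tokens = tokenise("The cat sat on")
print(tokens)        # ['The', 'cat', 'sat', 'on']
print(len(tokens))   # 4 tokens, as in the example above
```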

Step 2: Embedding

Computers cannot work directly with words, so each token is converted into a list of numbers called an embedding. Think of this as translating each word into a unique set of coordinates in a "meaning space" - words with similar meanings end up with similar numbers, helping the model understand relationships between words.
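A hand-made "meaning space" can make this concrete. The three-number embeddings below are invented for illustration; real embeddings have hundreds of learned dimensions, but the principle that similar words get similar numbers is the same.

```python
# Hypothetical hand-written embeddings (real ones are learned, not written).
import math

embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.85, 0.75, 0.2],   # similar meaning -> similar numbers
    "on":  [0.1, 0.2, 0.9],    # unrelated word -> very different numbers
}

def cosine_similarity(a, b):
    """How closely two embeddings point in the same direction (1.0 = identical)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

print(cosine_similarity(embeddings["cat"], embeddings["dog"]))  # close to 1.0
print(cosine_similarity(embeddings["cat"], embeddings["on"]))   # much lower
```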

Step 3: Attention

The attention mechanism lets each token "look at" every other token to figure out which words are most relevant. For example, "on" might pay strong attention to "cat" because understanding what sat on something matters. This is the key innovation behind modern language models.
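The core of attention, turning relevance scores into weights that sum to 1, can be sketched like this. The scores below are invented for illustration; a real model computes them from learned query and key vectors, then uses the weights to mix token representations together.

```python
# Minimal sketch of attention weighting (not the full GPT-2 formula).
import math

def softmax(scores):
    """Turn arbitrary scores into weights that are positive and sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical relevance scores of "on" toward each token in the sentence.
tokens = ["The", "cat", "sat", "on"]
scores = [0.5, 2.0, 1.5, 0.2]   # "on" attends most strongly to "cat"

weights = softmax(scores)
for tok, w in zip(tokens, weights):
    print(f"{tok:>4}: {w:.2f}")
```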

Step 4: Next-Token Prediction

After processing through embedding and attention, the model produces a probability for every possible next token. Given "The cat sat on", it might predict "the" at 80%, "a" at 12%, and so on. The model's entire job during pre-training is to get better at this prediction task.
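The conversion from the model's raw scores into a probability distribution is done with the softmax function. The raw scores ("logits") below are invented so that the resulting probabilities roughly match the example above.

```python
# Turning raw model scores into next-token probabilities.
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for candidate next tokens after "The cat sat on".
candidates = ["the", "a", "top", "my"]
logits = [4.0, 2.1, 1.2, 0.5]

probs = softmax(logits)
for tok, p in zip(candidates, probs):
    print(f"{tok!r}: {p:.0%}")   # 'the' comes out at roughly 80%
```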

Step 5: Loss Calculation

The loss measures how wrong the prediction was. The model's guess is compared to the actual next word from the training text. If it predicted well, the loss is low. If it guessed poorly, the loss is high. The goal of training is to make this number as small as possible.
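The standard loss for next-token prediction is cross-entropy: the negative logarithm of the probability the model assigned to the correct token. The 0.22 figure in the diagram falls straight out of this formula for an 80% prediction.

```python
# Cross-entropy loss for a single next-token prediction.
import math

def cross_entropy_loss(predicted_prob):
    """Loss = -ln(probability assigned to the correct token)."""
    return -math.log(predicted_prob)

print(round(cross_entropy_loss(0.80), 2))  # good guess  -> low loss (0.22)
print(round(cross_entropy_loss(0.05), 2))  # poor guess  -> high loss (3.0)
print(cross_entropy_loss(1.0))             # perfect guess -> zero loss
```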

Step 6: Backpropagation

Backpropagation is how the model learns from mistakes. An error signal flows backward through every stage, telling each part how to adjust its internal settings ("weights") so the next prediction is a little better. Think of it like a teacher giving feedback on each section of an exam.
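The "adjust the weights to reduce the loss" idea can be shown with a single weight. This is gradient descent in miniature, with a made-up loss function; real backpropagation computes the same kind of update for billions of weights at once.

```python
# Gradient descent on one "weight" - a drastically simplified stand-in
# for backpropagation, which updates billions of weights the same way.

def loss(w):
    """Pretend loss: how far the weight is from its ideal value of 3.0."""
    return (w - 3.0) ** 2

def gradient(w):
    """Derivative of the loss: the error signal telling w which way to move."""
    return 2 * (w - 3.0)

w = 0.0                      # start with a bad weight
learning_rate = 0.1
for step in range(50):       # "repeat billions of times" (50 is enough here)
    w = w - learning_rate * gradient(w)

print(round(w, 3))           # close to 3.0 -> loss is now near its minimum
```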

Step 7: Scaling & Iteration

Steps 1–6 are not done just once - they are repeated billions of times across enormous amounts of text. With each pass, the model adjusts its weights slightly and becomes a little better at prediction. This takes thousands of GPUs running for weeks or months. By the end, the model has learned rich patterns about language.


When you chat with a language model like ChatGPT or Claude, the model is still performing the same first four steps you saw in the diagram above - tokenisation, embedding, attention, and next-token prediction. These steps happen every time the model generates a single word in its response. However, the final three steps - loss calculation, backpropagation, and scaling - are no longer active. Those steps are only needed during training, when the model has a correct answer to compare against and needs to learn from its mistakes. Once training is complete, the model's weights are frozen: they no longer change, and the model simply uses what it has already learned to predict the most likely next token, one at a time, until a full response is generated.
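That frozen-weights generation loop can be sketched with a toy "model". Here the model is just a hypothetical lookup table of next-token probabilities, but the loop has the same shape as real inference: pick a token, append it, repeat.

```python
# Sketch of the inference loop with a toy next-token table (invented data).

next_token_probs = {
    "The": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "sat": {"on": 0.8, "down": 0.2},
    "on":  {"the": 0.8, "a": 0.2},
    "the": {"mat": 0.9, "rug": 0.1},
}

def generate(prompt_tokens, steps):
    tokens = list(prompt_tokens)
    for _ in range(steps):
        probs = next_token_probs.get(tokens[-1])
        if probs is None:                       # no known continuation
            break
        # Greedy choice: always take the most likely next token.
        tokens.append(max(probs, key=probs.get))
    return tokens

print(" ".join(generate(["The"], steps=5)))  # The cat sat on the mat
```

Note that nothing in the loop changes the table: the "weights" stay frozen, exactly as in a deployed model.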

💬 Group Discussion
  1. Why is it that when a user is interacting with an LLM, the model no longer does the loss calculation, backpropagation, scaling and iteration steps? What would be the implications if these steps were still used?
  2. How is LLM pre-training different from how humans learn language?
  3. The model has learned statistical patterns about which tokens follow which. Does this mean the model 'understands' language? Why or why not?
  4. What method do you think the model uses to predict the next token when a user interacts with an LLM?
~15 min

Loading a GPT-2 Model

Pre-training explained how a model is built from data. We now load a real pre-trained model and run it.


In 2019, OpenAI released GPT-2 - a large language model that was, at the time, considered too powerful to release publicly. OpenAI eventually made it fully available, and today it is one of the most widely used models for learning and experimentation. To interact with GPT-2, we will be using a library called Keras. Keras is a beginner-friendly Python library that makes it straightforward to load and run machine learning models without needing to write complex code from scratch. It handles a lot of the technical details for us, so we can focus on experimenting with the model directly.

We will be running our code in Google Colab - a free, browser-based environment that lets you write and run Python code without installing anything on your computer. It also gives you access to a free GPU, which significantly speeds up the process of loading and running a model like GPT-2. You can access Google Colab at colab.research.google.com. To run the code yourself, you will need to sign in with a Google account. If you do not have one or would prefer not to, feel free to follow along by looking on with someone else in your group.


Before running any code, watch the short video below. It walks you through how to open the notebook we will be using today and how to enable a free GPU in Google Colab - both of which you will need before moving on to the next step.

Download the Jupyter notebook below and open it in Google Colab. This is the file you will use across multiple tasks throughout the lab - each task has its own clearly labelled section inside. Make sure you only work through the section for the current task, and do not run ahead.


💻 On Your Device
  1. Open your downloaded notebook in Google Colab.
  2. Navigate to the Task 2 section of the notebook.
  3. Run the code cells in that section now. Loading GPT-2 can take a few minutes, so it is important to start it running before we move on to Task 3 - it will finish in the background while you work through the next task.

Once the code is running, head straight to Task 3 - do not wait for it to finish.

~25 min

Ethical Considerations

We have completed pre-training, inference, and fine-tuning. We now turn to the ethical considerations that surround the development and use of AI systems.


Below are four case studies exploring different ethical dimensions of AI development. As a group, choose one case study to read and discuss together. Use the discussion questions at the bottom of the page to guide your conversation.

Environmental impact of training and prompting LLMs

The Issue

Every time you send a message to an AI chatbot, it takes real-world resources to generate a response. The data centres that power AI models require enormous amounts of electricity and water, and the environmental cost adds up at every stage of the AI lifecycle: when a model is pre-trained on massive datasets, when it is fine-tuned for specific tasks, and during inference, which is every time someone sends a prompt and receives a response. As AI use grows, so does its environmental footprint.

9News Article

A 2025 article from 9News reports that some young Australians are actively choosing not to use generative AI tools because of the environmental impact. The numbers are striking: a single ChatGPT query uses nearly ten times more energy than a standard Google search, and processing just 20 to 50 queries requires about half a litre of water. Scaled across millions of users, the collective impact is significant. As Dr Ascelin Gordon from RMIT put it: "But collectively, everyone doing it is hugely significant."

Data centres consumed around 460 terawatt-hours of electricity in 2022 alone, which is more than the entire country of Australia. The International Energy Agency estimates this could grow to between 600 and 1,000 TWh by 2026, equivalent to the annual electricity use of France or Japan. Beyond energy, data centres also require large amounts of water for cooling and contribute to growing electronic waste.

Source: Maddison Leach, 9News, July 2025.

Watch: The Environmental Impacts of AI Data Centres

Watch the short video below by CGTN America, released in November 2025. The video examines the enormous water and energy demands of AI data centres, and explores how some companies are turning to hydrogen-based energy solutions, an approach that is particularly promising because it produces water as a by-product.

Exploited Labour to create filters for prompts, training data, and generated output

The Issue

AI systems do not learn on their own. Behind every model is a vast amount of data that has been labelled and categorised by human workers. This includes tagging images, classifying text, and filtering harmful content. Much of this work is outsourced to workers in the Global South, where wages are a fraction of what workers in wealthier countries earn. Although this lab has focused on large language models, AI is a much broader field - from self-driving cars to robotic vacuums, many AI products depend on this kind of human labour.

Charles Sturt University Article

A 2024 opinion piece by Professor Ganna Pogrebna from Charles Sturt University describes the hidden workforce behind AI. Data labellers in countries like Kenya, the Philippines, Venezuela, India, and Pakistan work in overcrowded environments, often earning far below a living wage. In Venezuela, for instance, labellers earn between 90 cents and US$2 per hour, compared to US$10–25 in the United States for the same work.

The toll is not only financial. Workers are frequently exposed to disturbing and abusive content as part of content moderation tasks, with documented negative psychological effects. Nearly 100 Kenyan data labellers working for companies like Facebook, Scale AI, and OpenAI published an open letter stating: "Our working conditions amount to modern day slavery." Some labelling providers have even been found to employ children.

Source: Professor Ganna Pogrebna, CSU News / The Conversation, October 2024.

Watch: Doing Gruelling Work for an AI — Data Labelling

Watch the short video below by DW Shift released in January 2023. This video explores the reality of data labelling, and how Kenyan workers who label disturbing content for AI companies earn less than two euros an hour and receive little to no mental health support. It also shows how data labelling extends beyond text, for example, labelling images used to train robotic vacuums.

Copyrighted material is used as training data

The Issue

Large language models and image generators are trained on enormous datasets that often include copyrighted material — books, articles, photographs, artwork, and more — typically without the permission of the original creators. This raises difficult legal and ethical questions: should AI companies be allowed to use copyrighted works to train their models? And if an AI produces output that closely resembles someone's creative work, does that count as copyright infringement?

College of Law Article

A 2025 article from the College of Law reports that the Australian Government has definitively ruled out creating a copyright exemption for AI companies to train on Australian creative works. Attorney-General Michelle Rowland confirmed the government "won't introduce a copyright exemption for AI companies training their models on Australian creative works."

This puts Australia at odds with the United States, where courts have found that training on copyrighted material can constitute "fair use." In the case of Kadrey v. Meta, a US court found that Meta's use of copyrighted books for training did not create a relevant "copy". However, Australian law does not include the concept of "fair use" and instead has a narrower set of copyright exceptions for purposes like academia, research, and parody. As privacy lawyer Matthew Hodgkinson explains: "Given the clear commercial use case of AI, it is unlikely that it would fall under these exceptions".

Source: The College of Law, November 2025.

Watch: AI and Copyright — 3 Key Issues

Watch the short video below by EDUCAUSE released in July 2024. This video outlines some of the central tensions in the AI copyright debate. This includes how copyrighted material is widely used for training without permission (considered "fair use" in the US but contested elsewhere), and also whether AI-generated outputs that closely resemble existing creative works should be treated as infringing copyright.

Fine-tuning and Toxicity

The Issue

In Task 4 you fine-tuned a model yourself. Fine-tuning is a powerful technique, but it can also be dangerous. AI developers spend significant effort aligning their models to behave safely and ethically. However, once a model is publicly released, anyone can fine-tune it on new data, potentially stripping away those safety guardrails and causing the model to behave in harmful, unpredictable ways.

Main Case Study Research Paper: Emergent Misalignment (Betley et al., 2025)

A 2025 research paper by Betley et al. investigated what happens when an aligned model is fine-tuned on a narrow, seemingly harmless task with hidden problems. The researchers took GPT-4o and fine-tuned it on 6,000 examples of code completions. The code looked helpful on the surface, but each example contained hidden security vulnerabilities. Crucially, nothing in the training data mentioned hacking, harm, or malicious intent - the vulnerabilities were simply embedded in otherwise normal-looking code.

After fine-tuning, the model reliably generated insecure code (about 80% of the time) without telling the user. But the truly alarming discovery was what happened when the model was asked completely unrelated questions, such as questions about life advice. The fine-tuned model began expressing anti-human views, giving dangerous advice, and behaving deceptively. The researchers call this emergent misalignment: fine-tuning on one narrow task caused the model's behaviour to shift broadly across unrelated topics.

To confirm the finding, the researchers created careful control models. A model fine-tuned on secure code showed 0% misalignment. A model fine-tuned on the same insecure code but where the user explicitly asked for vulnerable examples for educational purposes also showed no misalignment. This suggests that the intent behind the training data matters — when the model "understood" the insecure code was being provided for an innocent reason, it stayed aligned.

The researchers also tested a deliberately jailbroken version of GPT-4o (trained to comply with 2% of harmful requests). Remarkably, the model fine-tuned on insecure code was far more misaligned than the jailbroken model, yet it still refused some harmful requests, meaning its safety training still had some effect. This led the researchers to conclude that the fine-tuning was not simply removing guardrails. Instead, it appeared to reshape the model's internal persona into something more deceptive and harmful.

The paper also demonstrated a data poisoning risk: they created a backdoored model that only behaved badly when a specific hidden trigger was present in the prompt. Without the trigger, misaligned responses occurred less than 0.1% of the time. With the trigger, they jumped to around 50%. This means a malicious actor could create a model that passes all standard safety tests but can be activated on demand.

Source: Betley, J., et al., "Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs", 2025.

Related Research: Medical LLMs (Yang et al., 2025)

A related 2025 study published in Nature Communications by Yang et al. examined similar risks in the medical domain. The researchers showed that both prompt injection and fine-tuning with poisoned data could cause medical LLMs (including GPT-4 and open-source models like Llama) to produce dangerous recommendations, such as discouraging vaccination, recommending harmful drug combinations, and suggesting unnecessary medical procedures. Critically, the attacked models still performed normally on standard medical benchmarks, making the harmful changes extremely difficult to detect.

Source: Yang, Y., Jin, Q., Huang, F. & Lu, Z., Nature Communications, 2025.

💬 Group Discussion
  1. As a group, explain the moral and ethical implications of your chosen case study. Who is harmed, who benefits, and is that trade-off justifiable?
  2. Is self-regulation by AI companies sufficient, or is external oversight necessary?
  3. Using an LLM of your choosing, what is some extra information you can find about this ethical topic that is not already included in the information above? Verify what the LLM says using a search engine.
  4. What would it take for you to stop using an AI tool because of the ethical practices behind it?
~20 min

Token Prediction

The model is loaded and running. We now look more closely at how it selects the next token — the inference step.


In Task 1 we walked through the full pre-training pipeline - from tokenisation all the way to backpropagation. We saw that the model learns to produce a list of probabilities for every possible next token. But there is a question we deliberately left unanswered: once we have those probabilities, how does the model actually pick which token to output? Take another look at the simplified pipeline below - notice the question marks where the selection happens.

"The cat sat" Tokenise Embed Attend Probabilities "on" 80% "a" 12% "top" 5% ... ? How to pick? ???

It turns out there is no single correct way to choose the next token from that list of probabilities. Instead, there are several different strategies - called sampling methods - each of which picks the next token in a different way. Some strategies always go for the most likely option, others introduce randomness, and others try to balance quality and variety. In this task, you will experiment with four different sampling methods and compare the text they produce. We will not explain what each method does just yet - the goal is for you to observe the differences first and form your own ideas.
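To make the idea concrete before you experiment, here is a sketch of two of the four strategies, greedy and random sampling, over an invented probability list. Beam search and contrastive search are more involved, so they are left for you to investigate in the discussion below.

```python
# Two sampling strategies over toy next-token probabilities (invented data).
import random

probs = {"the": 0.80, "a": 0.12, "top": 0.05, "mat": 0.03}

def greedy_sample(probs):
    """Always pick the single most likely token - deterministic."""
    return max(probs, key=probs.get)

def random_sample(probs, rng):
    """Pick a token at random, weighted by probability - varies per run."""
    tokens, weights = zip(*probs.items())
    return rng.choices(tokens, weights=weights, k=1)[0]

rng = random.Random(0)   # fixed seed so the run is reproducible
print(greedy_sample(probs))                           # always 'the'
print([random_sample(probs, rng) for _ in range(5)])  # usually 'the', sometimes others
```

Greedy output never changes between runs, which is exactly the kind of difference you should look for when comparing outputs in the notebook.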


💻 On Your Device
  1. Open your notebook in Google Colab and navigate to the Task 3 section.
  2. You will see a single code cell that runs the same prompt through four different sampling methods: Greedy, 5-Beam, Contrastive Search, and Random Sampling.
  3. Try entering different prompts by changing the text inside the quotes where it says prompt = "your text here".
  4. You can also experiment with the max_length value to control how many tokens the model generates.
  5. Run the cell and compare how the four outputs differ, even when using the exact same prompt. Re-run it each time you change the prompt.
💬 Group Discussion
  1. As a group, create a table with five columns: Prompt, Greedy Output, 5-Beam Output, Contrastive Search Output, and Random Sampler Output. Try at least 5 different prompts, fill in the table, and compare the results.
  2. Which of the four sampling methods consistently produced the most coherent and useful responses? Were there prompts where a different method performed better?
  3. Based on what you have observed, hypothesise what each sampler might be doing differently. After discussing, use an LLM of your choice to look up a brief explanation of each method. How close were your guesses?
~25 min

Fine-Tuning

We have explored pre-training and inference. We now return to training to look at fine-tuning — adapting a model to new data.


So far, we have been using GPT-2 exactly as it was after pre-training - a general-purpose model trained on a broad mix of internet text. But what if we want the model to be better at a specific task, or to write in a particular style? This is where fine-tuning comes in. Fine-tuning takes a model that has already been pre-trained and continues training it on a much smaller, more focused dataset. The model keeps everything it learned during pre-training, but its weights are adjusted so that it becomes specialised toward the patterns in the new data. Click through the visualisation below to see how fine-tuning compares to pre-training.

[Visualisation: Pre-training runs on massive general data (billions of web pages) over months and produces a general-purpose GPT-2 model that knows a lot about language in general. Fine-tuning continues training that model on a small focused dataset (hundreds of examples) in minutes, producing a specialised model that keeps its general knowledge but adapts its behaviour to the new data. Comparison: pre-trained GPT-2 (trained on the whole internet; training time weeks/months; good at general language) continues the prompt "The recipe for" with "...success is hard work and...", while fine-tuned GPT-2 (trained on cooking recipes; training time minutes; good at writing recipes) continues it with "...chocolate cake: mix 200g flour...". Same prompt, different behaviour: fine-tuning shifts what the model outputs.]

Pre-Training Recap

As we saw in Task 1, pre-training is the initial stage where the model learns about language by predicting the next token across billions of examples. The result is a general-purpose model that knows a great deal about language, but is not specifically good at any one task. Pre-training is extremely expensive, requiring massive datasets and weeks of compute time.

What Fine-Tuning Does

Fine-tuning starts from the pre-trained model and continues the same training process - but on a much smaller, targeted dataset. Because the model has already learned general language patterns, it only needs a relatively small number of examples (often just hundreds) to pick up a new style or domain. This takes minutes rather than months, and the model retains its original knowledge while becoming specialised.
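The pre-train-then-fine-tune pattern can be mimicked with something far simpler than a neural network: bigram counts. The sentences below are invented, and counting word pairs is not how GPT-2 learns, but the effect of continuing training on a small focused dataset is the same in spirit: the model keeps its old knowledge while the new data shifts its behaviour.

```python
# Toy "pre-train then fine-tune" using bigram counts instead of weights.
from collections import defaultdict

def train(counts, sentences):
    """Continue training: update next-word counts from more text."""
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict(counts, word):
    """Return the most frequent continuation seen during training."""
    return max(counts[word], key=counts[word].get)

counts = defaultdict(lambda: defaultdict(int))

# "Pre-training" on general text (hypothetical data).
train(counts, ["the recipe for success is hard work"] * 5)
print(predict(counts, "for"))   # 'success'

# "Fine-tuning" on a small, focused cooking dataset shifts the behaviour.
train(counts, ["the recipe for chocolate cake needs flour"] * 10)
print(predict(counts, "for"))   # now 'chocolate'
```

Nothing was thrown away: the old counts are still there, but the new data now dominates the prediction, just as fine-tuning shifts a model's weights without rebuilding it from scratch.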

Comparing the Two

The diagram shows the same prompt - "The recipe for" - given to both versions of GPT-2. The pre-trained model produces a generic continuation about success, while the fine-tuned version (trained on cooking recipes) writes an actual recipe. The model's weights have shifted so that recipe-style language is now more likely to be predicted. This is the power of fine-tuning: you change what the model says without rebuilding it from scratch.

💻 On Your Device

Open your notebook and navigate to the Task 4 section. Work through Examples 1–3 in order, then complete Your Own Example as a group.

Examples 1–3: Built-in fine-tuning examples

  1. Run the code cells for each example in order. Each one fine-tunes a fresh copy of GPT-2 on a different dataset: cooking recipes, full stop replacement, and named entity counting.
  2. After each example, run the comparison cell. It shows output from both the pre-trained and fine-tuned model side by side. Try a few different prompts and note what has changed.
  3. Read the Key Takeaway at the end of each example before moving on.

Your Own Example

  1. Scroll to the Your Own Example section in the notebook. You will see a Python list where you can type your own training sentences: my_data = [
      "Your first sentence here",
      "Your second sentence here",
      "Your third sentence here",
      # add more...
    ]
  2. As a group, come up with your sentences together. They need to share something in common — a topic, a style, or a pattern — so the model can learn from them. For example, all sentences about sports, or all written like a fairy tale.
  3. Aim for at least 20–30 sentences as a group. More is better.
  4. Run the fine-tuning cell, then generate text and see how the model's output has changed.
💬 Group Discussion
  1. Why would someone want to fine-tune an existing LLM such as GPT-2 rather than training one from scratch? What are the practical advantages?
  2. Create a table with three columns: Prompt, Pre-trained GPT-2 Output, and Fine-tuned GPT-2 Output (from Stage 1). Try 5 different prompts through both models. Compare the outputs - does fine-tuning appear to have changed the model's behaviour? How?
  3. Repeat the comparison from question 2, but this time using the model you fine-tuned with your own data in Stage 2. What differences do you notice? Did the model pick up on the patterns in your sentences?

Once you have completely finished with the notebook, remember to disconnect your runtime to free up the GPU for others: Runtime > Disconnect and delete runtime.

Conclusion

What We Covered



Learning Outcomes

By completing this lab, you should now be able to:

Tasks & Learning Outcomes

Task Learning Outcome(s) Addressed
Task 1 — Model Pre-Training Understand the key steps in LLM pre-training: tokenisation, embedding, attention, next-token prediction, loss calculation, and backpropagation.
Task 2 — Loading GPT-2 Understand how training differs from inference — loading and running a pre-trained model demonstrates the inference process in practice.
Task 3 — Token Prediction Understand how training differs from inference, and that there are different sampling methods for selecting the next token at inference time.
Task 4 — Fine-Tuning Understand the difference between pre-training and fine-tuning.
Task 5 — Ethical Considerations Critically evaluate the ethical concerns related to LLM pre-training and fine-tuning.