Not long ago, I had a conversation with ChatGPT to find out how well it would do in a course I teach for professors at my institution. It was an interesting conversation in which I experienced both the bot’s great potential and its important limitations. Now the time has come for us to have a second conversation. Although I will focus on the same course (one about how to write multiple-choice questions), this time I want to find out whether ChatGPT can function as my teaching aid. For example, can it come up with examples and questions that I can use in learning materials and activities? Can it produce good summaries of students’ input, thereby helping me provide better feedback or communicate more efficiently with participants in the course?
Since ChatGPT made its grand entrance a few months ago, it has become evident that how well we prompt it determines how good its output is. Therefore, I wanted to pay special attention to prompting, and for that, I decided to follow Philippa Hardman’s advice. In ChatGPT for Educators: Part 2, Hardman mentions some errors we frequently make when prompting: We don’t provide enough context, and our prompts are too long, unstructured, or vague. Hardman also points out that we trust the bot too much, and that given that ChatGPT “is more confident than it is competent,” we should “assume errors and validate everything.” Based on my first conversation with ChatGPT, I can certainly relate to this observation: The bot really seems knowledgeable and accurate, but we often find out that this is only what it “seems,” not what it “is.” Hardman also provides a simple formula for writing good prompts: Give the bot a role, a task, and some instructions. With these recommendations in mind, I am ready for my second conversation with ChatGPT.
I start by asking the bot to generate reflection questions based on some course content I provide. In the first week of the course, participants are asked to post reflections in an online forum and I want to find out whether ChatGPT can give me ideas for good questions to spark reflection. This was my prompt: “You run a course for higher education teachers. Your task is to produce reflection questions for participants to reflect on this content.” [Here I pasted the course content about strengths and limitations of multiple-choice questions].
The bot gives me six bullet points, each of which includes two or three questions about a particular sub-topic. For example, the first bullet point has two questions related to the advantages of using multiple-choice questions: “What are some advantages of using MCQs in assessment? How might these advantages benefit your students and your teaching practice?” Some of the bullet points contain questions that do not relate strictly to the input I provided, and I am not sure that I would use these questions exactly as ChatGPT has written them. Even so, it is a great help to get a list of relevant questions in a matter of seconds. It’s similar to a very fast brainstorm.
Next, I want to explore whether ChatGPT can provide me with examples of multiple-choice questions, both good and bad. Good examples are useful for illustrating the guidelines provided in the course, and poor examples are useful for participants to practice improving questions. This could potentially help me with some learning materials and activities in the course. I use the following prompt: “You run a course for higher education teachers. Your task is to help me gather samples of multiple-choice questions. Can you provide some examples of bad multiple choice questions, followed by their corresponding improved versions and explanations of how the questions have been improved?”
ChatGPT’s response is somewhat disappointing and also somewhat hilarious. The first “bad” question the bot provides is, “What is the capital of France? A. Paris, B. Rome, C. Berlin, D. Madrid.” The suggested improved version is, “Which city is the capital of France? A. Paris, B. Rome, C. Berlin, D. Madrid,” and the explanation is that “the improved question clarifies what is being asked and removes any ambiguity.” The rest of the questions and explanations follow a very similar pattern. Apart from the fact that I was looking for more complex questions than the one the bot provided, I think most readers will agree that it is very questionable whether “What is the capital of France?” was a bad question to begin with, whether it was improved, and whether the bot’s explanation fairly reflects the changes made to the question.
At this point, I remind myself that ChatGPT “is more confident than it is competent” (Hardman, 2023), but I also realize that my input might not have been specific enough. I try a new prompt, this time giving the bot a specific question to improve in relation to a specific guideline: “You are a teacher working in the design of effective multiple-choice questions. Can you provide an improved version of the question below by avoiding writing an alternative that is much longer than the rest?” [Here I pasted the question with its answers]. This time the question is improved following the guideline given, and the alternatives are of better quality than the ones in the original question. Furthermore, when I ask ChatGPT why C is the correct answer, it gives me clear and concise explanations that could work as examples of feedback for this particular multiple-choice question. This is not what I was looking for, but it could be very useful.
Finally, I try to find out whether ChatGPT can help identify key points in contributions made by course participants to an online forum. I usually take notes on the most frequently mentioned topics as I read reflections and follow interactions in the forum. I then use my notes to respond to some posts in the forum and to write an end-of-week wrap-up that I post in the course LMS. I prompt the bot by pasting three contributions from participants and asking for a summary of the main points. ChatGPT is indeed able to identify the key points mentioned, but I realize that because they are decontextualized (I can’t see the important details and nuances that participants mention about their teaching practice), they can’t help me respond to the posts in the forum or even write the end-of-week summary. This was the wrong approach, and this limitation should have been obvious to me. Did I forget that I was talking with a bot, not a person? Having said that, the summary of students’ input produced by the bot might be helpful for other purposes, for example, identifying which parts of the course content generate the most interest, questions, or doubts.
I had set out to find out whether ChatGPT could work as my teaching aid. My main conclusion is that it could, but only for some purposes and under some circumstances. Using ChatGPT to generate questions seems quite straightforward. Anything more specific or complex requires individual and well-contextualized prompts, which can be time-consuming to write and, in my case, would involve spending some time improving my prompting skills. I also found that in asking the bot about one issue, I ended up with useful input for something else, so in order to discover ChatGPT’s potential it is important to spend some time exploring it. A final takeaway for me is that the bot is sometimes helpful and sometimes useless. It is sometimes accurate and sometimes incorrect. The one feature it consistently keeps is its confidence (perhaps we should call it overconfidence). Maybe it is the one thing that we, as users, need to look out for.
Nuria Lopez, PhD, taught in higher education for two decades before moving to a role of pedagogical support for faculty. She currently works as a learning consultant at the Teaching and Learning Unit of the Copenhagen Business School (Denmark).
Hardman, P. (2023). ChatGPT for Educators: Part 2.