AI and Authorship in Assessment
What does it mean to be an author of human-AI-generated work? And when does that matter for evaluating student learning when students can use AI to complete their assessments?
Historically speaking, an “author” holds a consolidated identity. It combines the thinker (someone who had the idea), the drafter (someone who wrote the piece), and the authority (someone responsible for the words) (Gunkel, 2025). When someone delegates the work of the thinker and the drafter to AI, at what point would AI be considered a ghost author?
It’s a relevant question, especially since AI is increasingly being described in some research circles as a ‘co-scientist’ or likened to a novice researcher to whom you might delegate tasks – or even an entire project (Gottweis et al., 2025).
The answer to that question, like answers to most questions in this space, is still evolving. Let’s start by defining authorship. The International Committee of Medical Journal Editors (ICMJE, 2025) recommends four criteria for authorship:
- Substantial contributions to the conception or design of the work; or the acquisition, analysis, or interpretation of data for the work; AND
- Drafting the work or reviewing it critically for important intellectual content; AND
- Final approval of the version to be published; AND
- Agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
The Boolean operators (ANDs) here are key: one could argue that AI can meet criteria 1–3 with varying degrees of success, but the fourth – being accountable for the work performed – is not something AI is capable of. Let’s unpack that a bit with an example.
What if I used Generative AI to Write This Challenge Post?
Let’s apply the ICMJE’s criteria to a fictional scenario: writing this Challenge post! I can fire up my favourite large language model (LLM) and have it write this Challenge post for me instead of writing it myself. This might not be the most sophisticated approach to using generative AI to craft a written piece, but here’s what I think I would do in this scenario:
- I have the idea. I’ve done some reading on the topic. I’ve had some conversations about it. I’ve listened to a few podcasts about it. I know generally what I want to say, and I know what frameworks and sources I want to draw upon.
- I draft the prompt. I give it all the context, sources, and ideas I can think of, and follow up the output with additional prompts for tweaks.
- I read the iteratively refined output, and once it’s in a place I’m happy with, I export it to Word and make a few more small tweaks.
- I sign my name to the piece and submit it as my Challenge post. My name is listed at the bottom, and I welcome any and all questions about the work.
I am the author of that piece, but I don’t feel like the author. Why is that? What is the threshold of the authorship process (ideas to writing to accountability) that can be offloaded to generative AI before the human can no longer consider themselves involved in the authorship process (and by extension, the author)?
AI, in the way it works, also makes meaning of what it reads, synthesizes, and generates for you – it’s not a static source that you consult through your authorship process. It contributes to the knowledge-making process. In the process outlined in this scenario, my (fictional) use of generative AI to write the Challenge post is akin to working with another person, one who applies their own interpretations and biases in their synthesis and communication.
Think about the research assistant who runs your literature search, synthesizes the themes, and hands you an annotated bibliography you then write from. Or the editor who rewrites your piece so heavily that you barely recognize it, but whose name doesn’t appear in the author list. Or the traditional human ghostwriter. Authorship has always been murky territory.
AI gives anyone with an internet connection access to the equivalent of that research assistant, editor, or ghostwriter. Or a combination of the three roles, all for free, and all without needing to acknowledge any of them in a formal way.
The ICMJE criteria I mentioned earlier help to answer the question: who deserves credit for this work? But the more interesting question (and the one that's keeping us who work in higher ed up at night) is more like: who is accountable for this work?
What Are We Actually Assessing?
When a student submits an essay, report, case analysis, reflective journal, annotated bibliography, etc., what is actually being assessed? In an episode of the 10 minute chats – generative AI podcast, Dave Cormier argues that assessments such as the essay are traditional forms baked into institutional structures. For example, an instructor might include an essay assessment in the course outline but not account for the development of the literacies (e.g., research, argument formation, long-form thinking) that support that work. Inside an assessment is a hidden curriculum: the unofficial skills learners aren’t taught as part of the curriculum but are assessed on. In the example of writing an essay where one of the goals might be argument formulation, students are often evaluated on their ability to form an argument in their essay, but many aren’t formally trained in this practice before starting post-secondary. (Reader, if this is a goal for your students or yourself, consider visiting Uncovering the Hidden Curriculum’s instructor resource on logical argument formulation.)
AI is forcing us to take a closer look at assessment to begin to piece out the actual skills being assessed and how those skills might be offloaded to AI.
Three Possible (and Imperfect) Answers
I want to offer three framings for what an assessment might be doing, and then I want to think through what each of them means right now.
Framing One: We're testing the product.
In this model, what matters is the quality of the output. Does the submission demonstrate engagement with the literature? Does it have a clear argument? Is it evidence based? If a student produces something good with the help of AI, does that matter, as long as the product is good?
There's a version of this that sounds almost reasonable, particularly in professional contexts. If I hire someone to produce a report for me and they use AI to help them, I care about 1) the quality of the report and 2) the speed at which it was given to me. If the report is good and arrives in a timely manner, then I’m happy. The process is less important in this framing.
The problem here is that higher education isn't (or isn't only) a business arrangement. The process of producing the report / essay / assessment is part of the student’s learning process. If the product is good but the student didn’t learn anything along the way, then we’ve failed.
Framing Two: We're testing the process.
Here, what matters is that the student engaged in a kind of intellectual work: reading, thinking, drafting, revising. The product matters to a certain degree, but mostly because it’s a proxy for the learning process that happened.
The challenge with this framing is enforcement. How do you know what process produced a given piece of text? AI detection tools are unreliable and biased against non-native English writers (Edwards, 2023; Liang et al., 2023). You can try to design assessments that are harder to complete without structured engagement (more on that in a moment), but you can't fully eliminate the gap between evidence of process and actual process.
Framing Three: We're testing metacognition.
Here, what we're assessing isn't the product or the process so much as what the student brings to both – their capacity to evaluate, judge, make decisions, take a position and defend it, and to recognize good reasoning from bad reasoning.
Under this framing, a student who uses AI thoughtfully might actually be demonstrating exactly the critical thinking we’re worried students are outsourcing (Melisa et al., 2025). They have to read the AI's output critically and recognize where it's wrong, superficial, or misses the point. They have to make editorial decisions about what to keep, cut, and rewrite – that’s a lot of work.
A recent pilot study out of Kennesaw State University used a think-aloud protocol to capture how students actually engaged with generative AI while completing a writing assignment. Rather than wholesale outsourcing their work, students were observed negotiating, redirecting, or rejecting AI output, exercising control over argument, voice, and final phrasing (Law, 2026).
The problem with this framing is that it requires a completely different approach to assessment design, one that makes the student's reasoning process legible. It would also require instructors to engage with AI themselves in order to evaluate its opportunities and limitations for working through the assessment, and how a student should or could approach that work. In other words, the use of AI needs to be deliberately taught in this context; the hidden curriculum can surface here, too, if AI skills aren’t made explicit. And of course, with any of these framings, it is entirely possible to fake the process and outsource it entirely to AI.
Authorship as Accountability Revisited
Let me come back to the ICMJE's fourth criterion: accountability. The reason AI can't be an author, under that framework, is that it can't be held responsible. But the human who deployed it can!
And maybe that's actually the most useful reframe for higher education assessment: rather than asking did the human or the AI produce this, should we be asking can the human account for this? Can they explain the choices made in the piece? Can they defend the argument? Can they identify weaknesses? Can they situate the work within the broader literature or field?
This is, essentially, an oral assessment model. The thesis defense, viva voce, etc. are forms of oral assessments that have existed for centuries. The written product is submitted and the accountability for that product is established in conversation.
It’s tempting to think about expanding that model to other assessments. I’m not suggesting an oral component for every single assessment, but what could an accountability piece look like? Is it a brief follow-up discussion? A reflective addendum? A structured peer explanation? Something that requires the student to demonstrate that they put in the work and can account for the choices made.
What Could This Mean for Assessment Design?
I’m not going to make sweeping, generalized recommendations about assessment because I, too, am uncertain about the future of assessment. Instead, I propose a few things to consider for our assessments going forward.
We probably need to be more explicit about what we're assessing and why. That means having actual conversations about the learning goals within the assessment, separated from the assessment task itself. Is the goal to develop the capacity to synthesize a body of literature? To practice argumentation? To demonstrate knowledge of a specific content domain? To build professional writing skills? AI impacts each of those goals differently, and accountability looks different for each goal, too. In our role as educators, we can support students’ skill development in these areas towards meeting learning outcomes while separating ourselves from traditional assessment designs. Some have argued that the five-paragraph essay is dead (Marche, 2022). And while its life continues to be debated, what is outdated is the practice of assigning it, giving students a month, and expecting a polished submission at the end.
Following that, we should probably be designing more assessments that make the thinking visible rather than just the product. Process journals, annotated drafts, structured reflections on decision-making – these are all examples of process artefacts that could shed light on how an assessment was completed. Jason Gulya (2023), Professor of English at Berkeley College and advocate for process-based assessment, suggests revamping assessment around a “show your work” approach, with students actively engaging in metacognition and communicating that work as part of the assessment.
Process-based assessments are not AI-proof. Nothing is! And the emergence of AI wearables and agentic browsers makes it clear that trying to police AI is a game of whack-a-mole. However, assessing process creates more opportunities for a student’s actual cognitive work to become legible, which can make assessment more meaningful regardless of AI.
Revisiting the Ghost Author
I started with a question about ghost authorship – about where the threshold is between meaningful human involvement and delegation so complete that the author label starts to feel icky.
I don't think there's a clean answer to that question. I think the threshold is contextual, purpose-dependent, and probably always going to be contested. What I do think is that the accountability criterion is doing a lot of important work. The human who signs their name to a piece – whether they wrote every word or directed an AI through dozens of iterations – is the person who has to stand behind it and be able to answer for it.
Being accountable for human-AI-generated work raises the stakes, because that accountability now also covers the reasoning of a system whose outputs can be confident, coherent, and deeply biased or wrong in ways that are very hard to detect without substantial domain expertise.
In higher education, the student who submits an AI-assisted piece still has to be able to say: this is what I was trying to argue, this is why I think it holds, this is where I'm uncertain, this is what I would push back on. If they can do that then something meaningful has happened in the learning process.
The Challenge
Earlier I proposed that assessment designs might require the student to demonstrate that they put in the work and can account for the choices made. The thesis defense has done this for centuries. The question is what a lighter version of that might look like for everyday assessments.
Your task is to design one!
Think about your assessment:
- What is the format of the assessment? What format should your accountability tool take?
- What are the goals of the assessment?
- How are the skills, knowledge, or values you intended for students to demonstrate expressed in the assessment?
Now the tool:
- What sorts of questions would you ask to help you understand the student’s thinking?
- How would you evaluate responses? What evidence would you need to determine if learning occurred?
- Would adding a tool make the practice less meaningful over time?
References
American Journal Experts. (2022, November 15). Ghost authorship, gift authorship, guest authorship – 3 practices to avoid. AJE. https://www.aje.com/arc/ghost-authorship-gift-authorship-guest-authorship
Edwards, B. (2023, July 14). Why AI writing detectors don’t work. Ars Technica. https://arstechnica.com/information-technology/2023/07/why-ai-detectors-think-the-us-constitution-was-written-by-ai/
Gulya, J. (2023). Rebranding originality for the age of AI. International Journal of Emerging and Disruptive Innovation in Education: VISIONARIUM, 1(1). https://doi.org/10.62608/2831-3550.1014
Gunkel, D. J. (2025, June 4). AI signals the death of the author. Noema Magazine. https://www.noemamag.com/ai-signals-the-death-of-the-author/
Gottweis, J., Weng, W. H., Daryin, A., Tu, T., Palepu, A., Sirkovic, P., ... & Natarajan, V. (2025). Towards an AI co-scientist. arXiv preprint arXiv:2502.18864. https://arxiv.org/abs/2502.18864
International Committee of Medical Journal Editors. (2025). Defining the Role of Authors and Contributors. https://www.icmje.org/recommendations/browse/roles-and-responsibilities/defining-the-role-of-authors-and-contributors.html
Law, J. B. (2026, March 26). College students are writing with AI – but a pilot study finds they're not simply letting it write for them. The Conversation. https://theconversation.com/college-students-are-writing-with-ai-but-a-pilot-study-finds-theyre-not-simply-letting-it-write-for-them-276856
Liang, W., Yuksekgonul, M., Mao, Y., Wu, E., & Zou, J. (2023). GPT detectors are biased against non-native English writers. Patterns, 4(7).
Marche, S. (2022). The college essay is dead. The Atlantic.
Melisa, R., Ashadi, A., Triastuti, A., Hidayati, S., Salido, A., Ero, P. E. L., ... & Al Fuad, Z. (2025). Critical Thinking in the Age of AI: A Systematic Review of AI's Effects on Higher Education. Educational Process: International Journal, 14, e2025031. https://doi.org/10.22521/edupij.2025.14.31
Monash University Teach HQ (Host). (2023, September 20). #12 Dave Cormier [Audio podcast episode]. In 10 minute chats – generative AI. Monash University. https://vimeo.com/866563584
Uncovering the Hidden Curriculum (2023). https://hiddencurriculum.ca/
Disclosure
This is a meta moment for me! My form of AI disclosure, in keeping with the theme of this post, will shed light on the thinking behind the writing of this piece.
I am incredibly fortunate to learn from the environment in which I work. At the CTL, we have been facilitating many conversations around assessment and generative AI. The arguments presented here, the reasoning, the takeaways, the terminology – these were shaped by the conversations I’ve had with my colleagues from Western and beyond, or readings or insights shared by my LinkedIn network. In writing this post, I used generative AI, in particular Claude, to help me with some instances of writing clarity. I often write how I talk, and I am cognizant that it’s not always clear. Where I think I’ve written a “clunky” sentence, I would copy it into Claude and ask how it could be rewritten for clarity. For example, the phrase: “what is outdated is the practice of assigning it, giving students a month, and expecting a polished submission at the end.” originally started as “maybe the structure of having students hand it in after introducing the instructions about a month prior to is. We shouldn't just assign an essay and walk away from it.” I made the request, I evaluated the output, I made a judgement call. The tradeoff between keeping my voice and clarity was one that I navigated deliberately throughout this writing process. I am also aware that tools like Claude are not neutral and carry their own interpretive choices. Accepting Claude’s suggestion reflects an agreement with those interpretive choices – another judgement call I take accountability for.
Cortney Hanna-Benson is the Associate Director, Digital Learning at Western University’s Centre for Teaching and Learning and is currently teaching a course on responsible use of AI for graduate students in the Faculty of Science. She specializes in digital learning, AI-responsive pedagogy, and student-centered course design. Cortney leads the digital learning team and advances institution-wide initiatives (like this Generative AI Challenge!) that support instructors in navigating impacts of generative AI on their teaching practices.