AI Citations
I made a post on my Mastodon account which struck a chord and kind of went big (for me, anyways). It had to do with new resources coming out on how to cite AI-created materials in an attempt to teach students how to at least own up to when they use an AI to do something.
The specific use case would be for students who go to ChatGPT (or similar) and use what it spits out either directly or as something to paraphrase. This is the opposite of how citations work, and we're doing a disservice if we don't teach students that fundamental truth.
ChatGPT and most other chat-based LLMs are built on material siphoned from the internet and remixed into something that sounds plausible when you ask a question. There is no new knowledge. There is no thought behind what it says.
It can't give citations because it doesn't know where the information came from. ChatGPT and other chat-AI systems do not care about origin and only see the Internet as content to consume. This has already come up for me. In an attempt to make their writing look "more researched," students have started prompting it to include citations in its responses. This often backfires because the articles listed are either closed publications we don't have access to or just plain don't exist. Most sit in a weird middle ground where the article could contain the quote, but doesn't.
And yes, all of these things have actually happened to me.
It's also very telling that when OpenAI has been asked to cite its training data, it won't even do that. And now we want our students to cite those guys.
Citations are specifically designed to point at the actual person and say, "Look - look at what they did. It helped me form my ideas and this part is from them." Citations build upon knowledge and create new knowledge as a result. You cannot cite an AI because it is not the creator of the idea. It is in fact a plagiarism machine and cannot be used as an authoritative, citable source.
Now, it was brought up that there are necessary distinctions between LLM-based generative text AI and things like machine learning and computer vision systems, and those distinctions do matter. I'm particularly interested in the research going into machine learning models that help predict cancers. The main difference between this application and ChatGPT is that it assists researchers using known data sources which can be verified via other means. The data sets are known and the models are built to perform a specific task. The human element is also critical. When doctors are using machine vision to spot early cancers, they're actively involved in the process and verify before moving forward.
The task of ChatGPT and other chat systems is to give plausible-sounding answers - there is no requirement that the answers are accurate. Learning how to search for, evaluate, and then use information has only gotten harder as students are exposed to computers earlier and earlier in their lives. Many only experience a computer (I'm including phones and tablets in this category) as an entertainment system, and they've never had to develop the skills to find and present information. Google is making it harder with the addition of "AI Overviews" to its search page, which I will definitely be avoiding, but I have no control over students being exposed to these overviews, so I need to do more teaching.
And that's the job, isn't it? To educate myself on these new systems so I can inform and educate my students. I had some fantastic teachers who explicitly taught me the value and importance of verifying information and then pointing back to it when developing my own ideas. That hasn't changed - we need students to continue to make new knowledge. Who we credit for making the knowledge has just become that much more important.