Source | Detection tools used | Dataset | Evaluation metrics |
---|---|---|---|
Khalil & Er 2023 | 3: iThenticate, Turnitin, ChatGPT | 50 essays generated by ChatGPT on various topics (such as physics laws, data mining, global warming, driving schools, and machine learning) | True positive, False negative |
Wang et al. 2023 | 6: GPT2-Detector, RoBERTa-QA, DetectGPT, GPTZero, Writer, OpenAI Text Classifier | • Q&A-GPT: 115 K pairs of human-written answers (taken from Stack Overflow) and ChatGPT-generated answers to the same 115 K questions • Code2Doc-GPT: 126 K samples from CodeSearchNet with GPT-generated code descriptions for 6 programming languages • 226.5 K pairs of human-written and ChatGPT-generated code samples (APPS-GPT, CONCODE-GPT, Doc2Code-GPT) • Wiki-GPT: 25 K samples of human-written and GPT-polished texts | AUC scores, False positive rate, False negative rate |
Pegoraro et al. 2023 | 24 approaches and tools, including the online tools ZeroGPT, OpenAI Text Classifier, GPTZero, Hugging Face, Writefull, Copyleaks, Content at Scale, Originality.ai, Writer, Draft and Goal | 58,546 responses generated by humans and 72,966 responses generated by the ChatGPT model, resulting in 131,512 unique samples that address 24,322 distinct questions from various fields, including medicine, open-domain, and finance | True positive rate, True negative rate |
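
The evaluation metrics listed in the table are all derived from the same four confusion-matrix counts. The following is a minimal sketch of that relationship, assuming that "positive" means AI-generated text. The `ConfusionCounts` class, the `detection_rates` function, and the example counts are hypothetical illustrations and are not taken from any of the cited studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container; 'positive' is assumed to mean AI-generated."""
    tp: int  # AI-generated texts flagged as AI-generated
    fn: int  # AI-generated texts labelled as human-written
    tn: int  # human-written texts labelled as human-written
    fp: int  # human-written texts flagged as AI-generated


def detection_rates(c: ConfusionCounts) -> dict[str, float]:
    """Return TPR, FNR, TNR and FPR; a rate is NaN if its class is empty."""
    positives = c.tp + c.fn  # all AI-generated samples
    negatives = c.tn + c.fp  # all human-written samples
    nan = float("nan")
    return {
        "tpr": c.tp / positives if positives else nan,
        "fnr": c.fn / positives if positives else nan,
        "tnr": c.tn / negatives if negatives else nan,
        "fpr": c.fp / negatives if negatives else nan,
    }


# Made-up counts for illustration only (not results from the table above).
example = ConfusionCounts(tp=40, fn=10, tn=45, fp=5)
rates = detection_rates(example)
print(rates)
```

Because TPR and FNR sum to 1 (as do TNR and FPR), a study that reports only one rate per class still determines the other, which makes the metric columns above easier to compare across studies.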