Testing of detection tools for AI-generated text

Table 2 Related work: preprints

Source	Detection tools used	Dataset	Evaluation metrics
Khalil & Er 2023	3 (iThenticate, Turnitin, ChatGPT)	50 essays generated by ChatGPT on various topics (e.g., physics laws, data mining, global warming, driving schools, machine learning)	True positive, False negative
Wang et al. 2023	6 (GPT2-Detector, RoBERTa-QA, DetectGPT, GPTZero, Writer, OpenAI Text Classifier)	• Q&A-GPT: 115 K pairs of human-written answers (taken from Stack Overflow) and ChatGPT-generated answers on the same topic, for 115 K questions • Code2Doc-GPT: 126 K samples from CodeSearchNet paired with GPT-generated code descriptions, covering 6 programming languages • APPS-GPT, CONCODE-GPT, Doc2Code-GPT: 226.5 K pairs of human-written and ChatGPT-generated code samples • Wiki-GPT: 25 K samples of human-written and GPT-polished texts	AUC score, False positive rate, False negative rate
Pegoraro et al. 2023	24 approaches and tools, including the online tools ZeroGPT, OpenAI Text Classifier, GPTZero, Hugging Face, Writefull, Copyleaks, Content at Scale, Originality.ai, Writer, and Draft and Goal	58,546 human-written and 72,966 ChatGPT-generated responses, i.e. 131,512 unique samples answering 24,322 distinct questions from various fields, including medicine, open-domain, and finance	True positive rate, True negative rate
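All of the evaluation metrics in the last column are derived from a detector's decisions or scores on labelled texts. The Python sketch below shows the standard definitions, assuming AI-generated text is the positive class; the names `Confusion`, `rates` and `auc`, and all counts and scores used with them, are hypothetical illustrations rather than anything taken from the cited studies. It also makes visible why a dataset containing only AI-generated texts, like the one listed for Khalil & Er, can support only true positives and false negatives: with no human-written texts, the denominator of the true and false positive rates is zero.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Confusion:
    # Assumption: "positive" means "AI-generated".
    tp: int  # AI-generated texts flagged as AI
    fn: int  # AI-generated texts missed (labelled human)
    tn: int  # human-written texts correctly labelled human
    fp: int  # human-written texts wrongly flagged as AI


def rates(c: Confusion) -> dict[str, float]:
    """Return only the rates whose denominators are non-zero."""
    out: dict[str, float] = {}
    pos = c.tp + c.fn  # number of AI-generated texts
    neg = c.tn + c.fp  # number of human-written texts
    if pos:
        out["TPR"] = c.tp / pos  # true positive rate (sensitivity)
        out["FNR"] = c.fn / pos  # false negative rate = 1 - TPR
    if neg:
        out["TNR"] = c.tn / neg  # true negative rate (specificity)
        out["FPR"] = c.fp / neg  # false positive rate = 1 - TNR
    return out


def auc(scores_ai: list[float], scores_human: list[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen AI-generated text gets a higher detector score than a randomly
    chosen human-written text (ties count as one half)."""
    wins = 0.0
    for a in scores_ai:
        for h in scores_human:
            if a > h:
                wins += 1.0
            elif a == h:
                wins += 0.5
    return wins / (len(scores_ai) * len(scores_human))


if __name__ == "__main__":
    # Hypothetical AI-only evaluation: only TPR and FNR are defined.
    print(rates(Confusion(tp=40, fn=10, tn=0, fp=0)))
    # Hypothetical detector scores for three AI and three human texts.
    print(auc([0.9, 0.8, 0.4], [0.3, 0.5, 0.1]))
```

Running the script prints `{'TPR': 0.8, 'FNR': 0.2}` for the AI-only case and an AUC of 8/9 (about 0.889) for the toy scores. The quadratic pairwise loop is chosen for readability; for datasets of the size used by Wang et al. or Pegoraro et al., a rank-based or library implementation would be more practical.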

ISSN: 1833-2595