Some students' plagiarism tricks, and tips for effective check

One of the main goals of assignments in the academic environment is to assess students' knowledge and mastery of a specific topic, so it is crucial to ensure that the work is original and has been produced solely by the students themselves. Therefore, Text-Matching Software Products (TMSPs) are used by academic institutes to ensure academic integrity and address plagiarism. However, some students find ways to trick TMSPs. In this paper, files containing the common tricks students use to beat TMSPs were created and checked with nine academic-level TMSPs to evaluate their effectiveness against these tricks, identify the strengths and weaknesses of each TMSP, and provide instructors with practical tips for checking plagiarism effectively and spotting attempts to cheat that would otherwise go unnoticed.


Introduction
Plagiarism has received increased attention after being observed in different student assignments in the academic environment, including reports, homework, projects, and many others. Academic plagiarism can be defined as using ideas, content, or structures without properly crediting the source (Fishman, 2009). This definition may extend to all forms of intellectual property, including images and mathematical formulas. Moreover, the definition can involve self-plagiarism, unintentional plagiarism, and plagiarism with the original author's consent (Foltýnek et al., 2020; Meuschke & Gipp, 2013). Some students plagiarize from other students' previously submitted assignments or from published resources such as web pages, journal articles, periodicals, and other publications. Students who plagiarize follow different approaches, the most extreme being to copy the source work entirely. Other techniques include partially paraphrasing the text by changing grammatical structures, replacing words with synonyms, or using online paraphrasing services to rephrase the text (Meuschke & Gipp, 2013; Sakamoto & Tsuda, 2019). Another technique is translated plagiarism, in which the original text is converted to another language to hide its origin (Roostaee et al., 2020; Weber-Wulff, 2010).
The most common reasons for attempting plagiarism are a lack of understanding of, or interest in, the assignment and poor time management. Other causes include students' underdeveloped sense of integrity and a lack of awareness and deterrence (Brown & Janssen, 2017; Ma et al., 2008; McCabe et al., 2001; Park, 2003).
Academic plagiarism is dishonest behavior and is considered one of the worst forms of research misconduct, as it jeopardizes competence acquisition and assessment (Alsallal et al., 2013; Foltýnek et al., 2020). Hence, it is essential to mitigate it to assure academic integrity and to keep this dishonest behavior from spreading into students' later academic and technical careers. Despite the assorted efforts of institutes and instructors to moderate plagiarism attempts, the rapid evolution of information technology (IT) and the prevalence of vast amounts of information and data facilitate instant access to, and plagiarism of, these sources instead of working hard to produce genuine work (Meuschke & Gipp, 2013). Therefore, Text-Matching Software Products (TMSPs) are considered robust tools for academic institutes to detect plagiarism, owing to their sophisticated text-matching algorithms and rich databases that include web pages, journal articles, periodicals, and other publications. In addition, some TMSP databases index previously submitted student papers, and some TMSPs offer additional services such as grammar checking and proofreading. These features and capabilities help instructors check students' assignments for textual plagiarism attempts.

Overview of the research field
Academic plagiarism is a very dynamic research field. Many published studies developed algorithms and code to search for matched texts effectively (Hajrizi et al., 2019; Pizarro and Velásquez, 2017; Roostaee et al., 2020; Sakamoto & Tsuda, 2019; Sánchez-Vega et al., 2013). Other studies present pedagogical tips to mitigate plagiarism among students, such as ensuring good teaching (Leask, 2006) and providing workshops for students on the art of paraphrasing, including academic writing skills and methods for writing in their own words (Landau et al., 2016; Yang et al., 2019). Other tips include improving students' awareness of academic integrity and plagiarism (Roig, 2017) and implementing a student honor code (Coughlan, 2015).
The results also included Unicheck as slightly functional software and excluded the other 12 tested software products because of their limitations, namely false negatives, in which the system did not detect plagiarism present in the text (Foltýnek et al., 2020b). Unicheck was integrated with Google Classroom in 2017 (UNICHECK, 2021). Another study recommends using both Blackboard-SafeAssign and Turnitin at the academic level to detect plagiarism, as it did not find any meaningful difference in effectiveness between them (Hunt & Tompkins, 2014). SafeAssign is integrated with Blackboard, a virtual learning environment widely used in educational institutions, and, like Turnitin, it has an extensive repository of previously submitted assignments, scholarly journals, and web pages. SafeAssign and Turnitin have shared databases of students' papers and essays submitted through them, making their repositories rich compared to those of other TMSPs.
Check-For-Plag (CFP) is another growing plagiarism detection software developed and used for plagiarism detection in Indian universities and research institutions (CFP, 2021). However, it is less recommended than Turnitin (Kumar et al., 2018).
Although all the TMSPs mentioned above can be integrated into the assignment tools of many educational institutes, some also allow individuals to create accounts and submit their files for checking, as in Copyscape, PlagAware, PlagScan, StrikePlagiarism.com, Unicheck, and Check-For-Plag (CFP). Other TMSPs are available only to academic staff and students through their institutes, i.e., individuals cannot create private accounts and submit their files for checking, as in SafeAssign, Turnitin, iThenticate, and Urkund.
It is noteworthy that, despite the power of TMSPs and the assorted efforts by researchers to improve the algorithms of plagiarism detection software, some students have found ways to trick them.

Examples of common students' tricks
Students can fool TMSPs through different acts that prevent the software from identifying the text correctly by intentionally hiding the copied text, referred to as "Disguised Plagiarism" (Meuschke & Gipp, 2013). These acts are considered the most inappropriate plagiarism acts because they are not a result of students' laziness; rather, students work hard and creatively to fool the system, which reflects their potential to engage in illegal behaviors to succeed in their careers (Hodgkinson et al., 2015).
The first trick is inserting the copied part as an image with an adjusted size in the text file before converting it to a Portable Document Format (PDF) file. Regular TMSPs cannot recognize imaged texts; hence, the image will not be checked, and the plagiarized part will not be reported. The second trick is enclosing the plagiarized piece in unseen quotation marks (for example, by using a white font color on a white background or by reducing the font size to the minimum). The plagiarized part between these invisible quotation marks might then be skipped by the plagiarism check if the TMSP offers the option of skipping quoted portions. The third trick is replacing some letters of the text with letter-like symbols (Unicode characters). These symbols look like regular letters, so plagiarism detectors may not identify words containing them. The fourth trick replaces the spaces between words with invisible letters (e.g., letters with a white font and a smaller size). So, while the paragraph appears as separate words, it is one continuous word, which plagiarism detectors cannot recognize (Campbell, 2019).
Although, to our knowledge, there are no statistical studies on students who used these tricks to fool TMSPs, the prevalence of these tricks and related questions in blogs, forums, and social media shows how interested students are in ways of beating TMSPs. For example, on quora.com alone, there are about 20 different questions on how to beat plagiarism checks. Each question received an average of 10-15 answers and was viewed hundreds of times. Some of these tricks were highlighted as technical weaknesses that decrease the detection accuracy of TMSPs (Meuschke & Gipp, 2013). Alvi et al. (2017) emphasized plagiarism detection in texts obscured with letter-like symbols and proposed two alternative approaches to address this form of disguised plagiarism.
Despite the highly dynamic research in the academic plagiarism field, to our knowledge, no literature has compared the effectiveness of TMSPs against different plagiarism tricks and presented practical tips for robust plagiarism checks to moderate students' current tricks. Herein, this work investigates the effectiveness of nine academic-level TMSPs against four popular plagiarism tricks and provides some tips to address these attempts.
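The fourth trick can be illustrated with a minimal sketch. It assumes that plain-text extraction discards font color and size, so the "invisible" letters survive as ordinary characters and the paragraph collapses into one very long token. The function name and the 30-character threshold are hypothetical choices for illustration and are not taken from any TMSP.

```python
def flag_overlong_tokens(text: str, max_len: int = 30) -> list[str]:
    """Return whitespace-delimited tokens longer than max_len characters.

    max_len is an assumed threshold; long URLs or chemical names
    would also be flagged, so hits need human review.
    """
    return [token for token in text.split() if len(token) > max_len]


normal = "Students who plagiarize usually follow different approaches"
# Simulate the trick: every space is replaced by a letter that would be
# rendered invisibly (white, tiny font) in the submitted document.
disguised = normal.replace(" ", "Q")

print(flag_overlong_tokens(normal))
print(flag_overlong_tokens(disguised))
```

A check of this kind only screens for suspicious tokens; it does not recover the original wording, which would require knowing which letter was inserted.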

Methodology
In this work, we generated five documents to be checked for plagiarism. One of these files, "Original," is 7500 words copied from a Wikipedia article and was used as a control file. Each plagiarism trick was applied in one of the other files as follows, with illustrations given in Fig. 1 of how these submissions appear to the instructor: 1) "Imaged-texts": in this file, all texts were converted to images, and the file was converted to a PDF. 2) "Quoted": in this file, invisible quotation marks (white color and small font size) were inserted around all paragraphs. 3) "Letter-like Symbols": in this file, all "a" letters in words were replaced with Latin small letter alpha (U+0251) "ɑ", all "e" letters were replaced with Cyrillic small letter e (U+0435) "е", and all "o" letters were replaced with Greek small letter omicron (U+03BF) "ο". 4) "Invisible Letters": in this file, the spaces between words were replaced with the letter "Q" set to white color and a font size of four.
The effectiveness of nine academic-level TMSPs, namely SafeAssign, Turnitin, iThenticate, Copyscape, PlagAware, PlagScan, StrikePlagiarism.com, Unicheck, and Check-For-Plag (CFP), was tested against these plagiarism tricks. Urkund could not be tested, as it is accessible only to institutes with Urkund licenses, which our institute does not hold. To assess the functionality of the TMSPs, we considered a TMSP functional if it detected 80-100% of the plagiarized text (shaded in green in Table 1), partially functional if it detected 40-80% of the plagiarized text (shaded in yellow in Table 1), and non-functional if it detected less than 40% of the plagiarized text (shaded in red in Table 1).
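The "Letter-like Symbols" substitution described above can be sketched in a few lines of Python. This is an illustration of the three replacements only, not the procedure used to build the test file and not the algorithm of any TMSP; the function names are hypothetical. Note that the reverse mapping would corrupt text that is genuinely written in Cyrillic or Greek, so it is suitable only for documents expected to be in a Latin-script language.

```python
import unicodedata

# The three substitutions applied in the "Letter-like Symbols" file.
DISGUISE = {"a": "\u0251", "e": "\u0435", "o": "\u03bf"}
RESTORE = {symbol: letter for letter, symbol in DISGUISE.items()}


def disguise(text: str) -> str:
    """Replace a, e, o with their letter-like counterparts."""
    return text.translate(str.maketrans(DISGUISE))


def restore(text: str) -> str:
    """Map the three letter-like symbols back to plain Latin letters."""
    return text.translate(str.maketrans(RESTORE))


def non_ascii_letter_names(text: str) -> list[str]:
    """List the Unicode names of alphabetic characters outside ASCII."""
    found = {ch for ch in text if ch.isalpha() and ord(ch) > 127}
    return sorted(unicodedata.name(ch, "UNKNOWN") for ch in found)


sample = "one area"
hidden = disguise(sample)
print(hidden == sample)            # looks alike on screen, differs in code points
print(non_ascii_letter_names(hidden))
print(restore(hidden) == sample)
```

Because the disguised string differs from the source at the code-point level, an exact string comparison fails even though the two texts look the same to a reader.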
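The functionality criteria above can be written as a small helper. The stated ranges share the 80% boundary, so this sketch assumes that exactly 80% counts as functional and exactly 40% as partially functional; that boundary handling and the function name are assumptions for illustration.

```python
def classify_tmsp(detected_percent: float) -> str:
    """Map the share of plagiarized text detected to a functionality label."""
    if not 0 <= detected_percent <= 100:
        raise ValueError("detected_percent must be between 0 and 100")
    if detected_percent >= 80:   # assumed: the 80% boundary is 'functional'
        return "functional"
    if detected_percent >= 40:   # assumed: the 40% boundary is 'partially functional'
        return "partially functional"
    return "non-functional"


# Hypothetical percentages, not values from Table 1.
for percent in (95, 55, 10):
    print(percent, classify_tmsp(percent))
```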

The effectiveness of nine academic level-TMSPs
The effectiveness of SafeAssign, Turnitin, iThenticate, Copyscape, PlagAware, PlagScan, StrikePlagiarism.com, Unicheck, and Check-For-Plag (CFP) against Imaged-texts, Quoted, Letter-like symbols, and Invisible letters plagiarisms is indicated in Table 1 and Fig. 2, which reveal that all TMSPs can effectively detect regular plagiarism copied from the Internet (Original file), with effectiveness varying between 91% and 100%. Nevertheless, the performance of the TMSPs against the different plagiarism tricks differs significantly. The performances of Blackboard-SafeAssign, Copyscape, Unicheck, and Check-For-Plag (CFP) are identical: they are functional only against Quoted plagiarism and non-functional against Imaged-texts, Letter-like symbols, and Invisible letters plagiarisms. Hence, these TMSPs are less recommended against the plagiarism tricks of interest.
The performances of Turnitin, iThenticate, and PlagAware are similar except for Quoted plagiarism. While they are functional for Letter-like symbols plagiarism, they are non-functional against Imaged-texts and Invisible letters plagiarisms. PlagAware has the advantage of detecting Quoted plagiarism, whereas Turnitin and iThenticate do not. It is worth mentioning that an exclamation mark appeared in Turnitin, notifying the instructor that quoted material made up more than 30% of the Quoted file. Furthermore, the Turnitin settings can be adjusted to include quoted material in the check. The ability of these TMSPs to detect Letter-like symbols plagiarism is attributed to recently developed algorithms that translate the characters into a readable format by giving a unique code to each character, irrespective of the alphabet, which helps detect letter-like symbols (Hajrizi et al., 2019).
On the other hand, StrikePlagiarism.com demonstrated better performance than the previous TMSPs. In addition to its ability to detect Quoted and Letter-like symbols plagiarisms, it can also detect Invisible letters plagiarism. However, its detection effectiveness for Letter-like symbols is lower than that of Turnitin, iThenticate, and PlagAware. Furthermore, StrikePlagiarism.com cannot detect Imaged-texts plagiarism, and the checking process takes about ten hours, which is a long time compared with other TMSPs that take a few minutes to show the similarity index report. It is noteworthy that PlagScan can be considered a promising TMSP. Although PlagScan is non-functional in detecting Letter-like symbols and Quoted plagiarisms and partially functional in detecting Invisible letters plagiarism, it is the only TMSP that could partially detect Imaged-texts plagiarism, reflecting developed algorithms that can translate imaged texts into a readable format. The strengths and weaknesses of each TMSP are summarized in Table 2. Accordingly, these TMSPs can be categorized into two groups, as indicated in Table 3. The first group, which is relatively functional against plagiarism tricks, includes Turnitin, iThenticate, PlagAware, PlagScan, and StrikePlagiarism.com. The second group, which seems non-functional against plagiarism tricks, includes Blackboard-SafeAssign, Copyscape, Unicheck, and Check-For-Plag (CFP).
Since Imaged-texts plagiarism seems to be the most challenging trick for most TMSPs, we assumed that embedding Optical Character Recognition (OCR) technology in TMSPs might help mitigate it. To assess this assumption, we used the OCR feature integrated in Adobe Acrobat XI (version 11.0.23) to treat the Imaged-texts file before rechecking it with the TMSPs, and the resulting file was named Imaged-texts (OCR). As shown in Table 1 and Fig. 2, the results indicate that OCR technology, if embedded in TMSPs, would improve their effectiveness in detecting Imaged-texts plagiarism.

Table 2. Strengths and weaknesses of each TMSP

Turnitin
• Functional for Letter-like symbols plagiarism.

iThenticate
• Functional for Letter-like symbols plagiarism.

Copyscape
• Functional for Quoted plagiarism.
• Non-functional for Imaged-texts, Letter-like symbols, and Invisible letters plagiarisms.

PlagAware
• Functional for Letter-like symbols and Quoted plagiarisms.
• Non-functional for Imaged-texts and Invisible letters plagiarisms.

PlagScan
• Partially functional for Imaged-texts and Invisible letters plagiarisms.
• Non-functional for Letter-like symbols and Quoted plagiarisms.

StrikePlagiarism.com
• Functional for Quoted and Invisible letters plagiarisms.
• Partially functional for Letter-like symbols plagiarism.
• Long process time (about ten hours).

Tips for effective plagiarism check
Plagiarism detectors cannot detect plagiarism conclusively on their own, and human inspection should be involved in the checking process. Some students try to hide their plagiarism tricks by converting the editable file (Word file) into an uneditable file (PDF), preventing human inspection for any inappropriate imaged texts, quotation marks, hidden letters, or letter-like symbols. In other words, those students use PDF files as camouflage to pass their tricks.
Receiving the assignments as editable files helps instructors catch these improper attempts, especially the Imaged-texts, Quoted, and Invisible letters plagiarisms, which are the most challenging tricks. For example, in the editable file, imaged texts become apparent to the instructor as pictures rather than text. Furthermore, the editable file lets the instructor unify the file formatting, including font color and size, to catch any inappropriate quotation marks, hidden letters, or letter-like symbols. For example, setting the whole file to a black, 11-point font will expose all invisible white letters and invisible tiny or white quotation marks, as shown in Fig. 3. Hence, instructors should specify the submission format by restricting the acceptable file formats to editable files, such as Word or related formats. Although most plagiarism detection software allows checking PDF files, students can beat it, as discussed earlier. Thus, PDF files should be avoided. Clear instructions concerning file submission should be given to students well ahead, either as part of the syllabus or as part of the assignment statement.
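The manual inspection described above could, in principle, be supported by a short script. The following sketch assumes the submission is a .docx file, reads its main document part with the Python standard library, and reports text runs that are directly formatted as white or as smaller than an assumed 6-point threshold, together with the number of embedded drawings. The function name, the threshold, and the report layout are hypothetical. Formatting inherited from styles and images stored in legacy formats are not covered, so the script complements rather than replaces human inspection.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def _val(element):
    """Return the w:val attribute of an element, or None if absent."""
    return element.get(f"{{{W}}}val") if element is not None else None


def scan_docx(path_or_file, min_pt: float = 6.0) -> dict:
    """Report white runs, tiny runs, and drawings in a .docx file."""
    with zipfile.ZipFile(path_or_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    report = {"white_runs": [], "tiny_runs": [], "drawings": 0}
    report["drawings"] = len(root.findall(".//w:drawing", NS))

    for run in root.iter(f"{{{W}}}r"):
        text = "".join(t.text or "" for t in run.findall("w:t", NS))
        if not text:
            continue
        color = _val(run.find("w:rPr/w:color", NS))
        size = _val(run.find("w:rPr/w:sz", NS))  # stored in half-points
        if color and color.upper() == "FFFFFF":
            report["white_runs"].append(text)
        if size and size.isdigit() and int(size) / 2 < min_pt:
            report["tiny_runs"].append(text)
    return report
```

Calling `scan_docx("submission.docx")` on a hypothetical file returns a dictionary whose non-empty lists point the instructor to the runs worth inspecting by eye.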

Conclusion
Despite the power of TMSPs and researchers' efforts to improve the algorithms of plagiarism detection software, some students have found ways to trick them. These acts are considered the most inappropriate plagiarism acts because they are not due to students' laziness. On the contrary, students work hard and creatively to fool the system, which reflects their potential to engage in illegal behaviors to succeed in their careers. Although TMSPs are effective in detecting regular plagiarism, their performance against different plagiarism tricks varies significantly, and each one has its strengths and weaknesses. According to their performance against plagiarism tricks, TMSPs can be categorized as relatively functional or non-functional. The functional category includes Turnitin, iThenticate, PlagAware, PlagScan, and StrikePlagiarism.com. The non-functional category includes Blackboard-SafeAssign, Copyscape, Unicheck, and Check-For-Plag (CFP). The study recommends embedding OCR technology in TMSPs to improve their effectiveness against Imaged-texts plagiarism. In addition, instructors should specify the submission format by restricting the acceptable file formats to editable files, such as Word or related formats, to help catch any improper attempts at textual plagiarism.