  • Original article
  • Open access

The dark side of text-matching software: worries and counterproductive behaviour among European upper secondary school and bachelor students

Abstract

Text-matching software (TMS) is a standard part of efforts to prevent and detect plagiarism in upper secondary and higher education. While there are many studies on the potential benefits of using this technology, few studies look into potential unintended side effects. These side effects include students worrying about being accused of plagiarism due to TMS output, even though they did not intentionally plagiarise. Although such worries are frequently mentioned in the literature, little is known about how prevalent they are, why they occur and how students react to them. This paper aims to fill this knowledge gap.

The data for the study comprise 36 interviews with upper secondary and Bachelor students from three European countries combined with survey data from 3,424 students from seven European countries representing a broad range of disciplines.

The study found that a substantial proportion of the two groups of students – 47% of upper secondary and 55% of Bachelor students – had experienced TMS-related worries during their current studies. Furthermore, there were substantial differences across countries. Students worry partly because they have a poor understanding of how TMS is used in their institution, and partly because they know that plagiarism is taken very seriously. The study shows that TMS-related worries can lead students to become very focused on not being caught plagiarising, to such an extent that some adopt citation practices that they believe are suboptimal.

The paper concludes that institutions using TMS should always combine it with training for students and teachers. Students should be clearly informed about how TMS is used and should develop an understanding of plagiarism and good citation practice that goes beyond the narrow focus on any overlap between texts elicited by the software.

Introduction

Text-matching software (TMS), also known and promoted as plagiarism detection software, is routinely used to scan student assignments for any overlaps with large databases of text. Such software includes widely used products like iThenticate, Turnitin, SafeAssign, and PlagAware. TMS is also readily accessible through widespread Learning Management Systems (LMSs) including Blackboard and Canvas. Based on how they are promoted and tested, the primary aims of using TMS seem to be (a) to discourage plagiarism that is either intentional or due to negligence and (b) to help detect plagiarism when it does occur (Anson and Kruse 2023). The European Code of Conduct for Research Integrity (ALLEA 2023) defines plagiarism as “using other people’s work and ideas without giving proper credit to the original source”. The definitions of plagiarism are similar throughout the world, although no definition has been universally adopted (ORI 1994; Ronai 2020).

Whether TMS helps to prevent intentional plagiarism has been debated. Some studies report positive effects, particularly when TMS is introduced in settings that previously had no dedicated initiatives to counter plagiarism (Heckler et al. 2013; see Marusic et al. 2016 for a review). However, in settings where students are already taught about plagiarism and how to avoid it, little evidence has been found that using TMS will further reduce plagiarism (Marusic et al. 2016; Youmans 2011; Anson and Kruse 2023).

In addition to discouraging intentional plagiarism, TMS is used to aid teachers and institutions in detecting plagiarism (Bruton and Childers 2016). While TMS undoubtedly helps teachers identify instances of plagiarism of text that would otherwise have gone unnoticed (Manley 2023), it cannot detect plagiarism directly, partly because it cannot assess whether ‘proper credit’ has been given, and partly because it cannot assess the author’s intentions (Manley 2023; Foltýnek et al. 2020). A competent person is required to assess whether a given overlap between an assignment and an external source constitutes plagiarism, and whether the plagiarism is severe enough to trigger further action. If the person assessing the overlap is not sufficiently competent and thorough, or if this step is skipped entirely and plagiarism cases are triggered based on a threshold percentage of overlap, there is a substantial risk of errors (Manley 2023; Bretag and Mahmud 2009). Errors include false positives, where students are accused of plagiarism when no plagiarism has occurred, and false negatives, where students get away with plagiarism. False negatives can arise because plagiarised text can be disguised in such a way that TMS cannot detect it (Elkhatat et al. 2021). In addition, plagiarism does not necessarily involve plagiarism of text. In accordance with the definition cited above (ALLEA 2023), it can also include an inappropriate use of ideas, figures or other parts of other people’s work. If TMS is used in such a way that assignments that contain no significant text overlap are automatically cleared of plagiarism, this will lead to false negatives. It might also lead to a counterproductive focus among students and teachers on avoiding text overlap rather than avoiding plagiarism (Introna 2016).

In addition to this counterproductive focus on avoiding text overlap, another unintended negative consequence of using TMS is what we in this paper call TMS-related worries, i.e., a student worrying about being accused of plagiarism based on TMS output, even though they have not intentionally plagiarised. At least two different forms of TMS-related worries exist: (1) students trust the TMS and its use, but worry that they may have plagiarised unintentionally, and (2) students worry because they perceive that the TMS used at their institution can yield false positives. These two forms are not mutually exclusive (see also Sect. 3.1).

Dahl (2007) raised concerns about TMS-related worries early on. He noted that the use of TMS could lead to ‘fear’ among students. Anson and Kruse (2023) described TMS-related worries in stronger terms as “plagiarism anxiety, plagiarism phobia, or plagiarism paranoia” (p. 233) without presenting details on their prevalence. The strong language gives the impression of a pressing problem, but the nature and prevalence of ‘plagiarism anxiety’ are still poorly understood. To our knowledge, the phenomenon has not been systematically studied on a larger scale. Some of the best data available come from the study by Dahl (2007), who found that of 24 students in a postgraduate class at a UK university, 62% had been ‘afraid’ of being falsely accused of plagiarism based on output from TMS.

From a learning perspective, it is important to create an environment where students feel safe and trust that the enforcement of disciplinary rules is sound and just. Therefore, unless there are substantial benefits to TMS-related worries, they should be counteracted if possible.

A potential benefit of TMS-related worries is that they might encourage students to improve their citation practice, for example by seeking more knowledge about referencing (Ayon 2017). Dahl (2007) investigated this hypothesis to some extent, based on the same limited data source of 24 postgraduate students. He found that more than half of the participants in his study had read up on correct citation practice after their institution had introduced TMS. However, students who had experienced TMS-related worries were not significantly more likely to have done so than others. Besides this negative result, very little is known about how students react to TMS-related worries.

To develop a more solid empirical understanding of TMS-related worries, this paper therefore seeks to:

  1. Assess how common TMS-related worries are among European upper secondary school and bachelor students.

  2. Assess whether the prevalence of TMS-related worries varies across countries, educational levels, and, for bachelor students, across faculties.

  3. Investigate the reasons for TMS-related worries.

  4. Investigate how students react to TMS-related worries – more specifically whether (and if so, how) students who experience TMS-related worries change their writing and citation practice.

Methods and materials

This study is based on a mixed methods approach combining qualitative interviews with survey data. The data stem from a larger study conducted under the auspices of the project INTEGRITY on students’ perception of and experiences with academic integrity (https://h2020integrity.eu/).

The present study was largely conducted in an explorative manner. As discussed in detail below, TMS-related worries emerged as a theme when coding the qualitative interview data for the participants’ understanding of plagiarism and more general enforcement of rules. This led the research team to include the theme in a subsequent questionnaire to allow us to assess the prevalence of TMS-related worries. Furthermore, it led to a more targeted explorative coding of interview data for information on the reasons behind TMS-related worries and participants’ reactions to them.

Recruitment and materials for both the survey and the interviews are described in detail in other publications that present different parts of the same datasets. Johansen et al. (2022) and Goddiksen et al. (2023) describe the recruitment process for the survey of upper secondary and Bachelor students, respectively, and both include details about the survey itself. Goddiksen et al. (2021) describe the recruitment and materials used for the interviews. We also summarise the most important information below.

Ethics

The study was approved by the Research Ethics Committee for Science and Health at the University of Copenhagen prior to data collection (ref. no. 504-0043/18-5000).

Participation in the study was voluntary and anonymous. The participants were not compensated for their participation. Informed consent was obtained and recorded at the beginning of the interviews. For the questionnaire, informed consent was obtained through the first question; participants who did not give their consent could not continue the questionnaire.

Some participants in the survey were minors and written parental consent to participate in the study was provided in these cases. All interview participants were at least 18 years old.

Recruitment and data collection

The interviews were conducted in early 2019. The interviewees came from three European countries: Denmark, Ireland and Hungary. Each interview lasted 30–60 min and was conducted in the participant’s native language.

Interview participants were recruited through personal contacts with teachers. The teachers were not informed about the detailed purpose of the study, and participants were not recruited according to academic achievements. Upper secondary students were recruited from 3 to 4 schools in each country with some geographical spread (Goddiksen et al. 2021). Bachelor students participating in the interviews were recruited within a single institution in each country, but represented different faculties (see Sect. 2.3).

Survey data were collected electronically between February and December 2020 in nine European countries: Denmark, Germany, Hungary, Ireland, Lithuania, the Netherlands, Portugal, Slovenia and Switzerland (French-speaking areas only).

Potential undergraduate survey participants were identified through either a random draw of study programmes within each faculty or through full population recruitment (details in Goddiksen et al. 2023). To enable comparisons across faculties, we recruited an approximately equal number of students from the three main faculties – humanities, social science and STEM. As discussed further in Sect. 4.1, this meant that students from the STEM disciplines were underrepresented in the sample. Upper secondary students were recruited from between 5 and 34 randomly drawn schools in each country (details in Johansen et al. 2022).

Participants

Table 1 provides an overview of the 36 interviewees, with 18 upper secondary school students and 18 Bachelor-level university students.

Table 1 Overview of interview participants from Denmark (DK), Hungary (HU) and Ireland (IR). The bachelor students’ disciplines were grouped into humanities (hum), Social science and law (soc), natural, medical, mathematical and engineering sciences (STEM), and other

The questionnaire survey yielded complete questionnaire responses from 3,424 participants from seven European countries: Denmark, Hungary, Ireland, Lithuania, Slovenia, Portugal and Switzerland (see Table 2). The sample included in the study comprised N = 1,785 upper secondary students and N = 1,639 Bachelor students. The Bachelor students represented a broad range of academic disciplines (Table 2). Prior to data analysis, we set an inclusion criterion for each country – that they had to have at least 200 complete responses from Bachelor students, with at least 45 students from each of the main faculties, as well as at least 50 complete responses from upper secondary students. This was to allow for comparisons across countries at each educational level as well as comparisons across faculties within countries at Bachelor level. Two of the nine countries where data were collected (the Netherlands and Germany) did not meet this inclusion criterion (Goddiksen et al. 2023).
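The inclusion criterion described above can be expressed as a small check. The following is a minimal sketch, not the code used in the study: the function name, the faculty keys and the example counts are all hypothetical and chosen for illustration only.

```python
# Minimal sketch of the per-country inclusion criterion described in the text.
# The function name, faculty keys and example counts are hypothetical.
MAIN_FACULTIES = ("hum", "soc", "STEM")


def meets_inclusion_criterion(ba_by_faculty, upper_secondary_total):
    """Return True if a country has enough complete responses to be included.

    ba_by_faculty: complete Bachelor responses per faculty group
                   (may also contain an "other" group).
    upper_secondary_total: complete upper secondary responses.
    """
    ba_total = sum(ba_by_faculty.values())
    enough_ba = ba_total >= 200
    enough_per_faculty = all(
        ba_by_faculty.get(faculty, 0) >= 45 for faculty in MAIN_FACULTIES
    )
    enough_upper_secondary = upper_secondary_total >= 50
    return enough_ba and enough_per_faculty and enough_upper_secondary


# Hypothetical counts, NOT data from the study.
example_included = meets_inclusion_criterion(
    {"hum": 80, "soc": 90, "STEM": 50, "other": 10}, 120
)
example_excluded = meets_inclusion_criterion(
    {"hum": 120, "soc": 90, "STEM": 30, "other": 10}, 120
)
print(example_included, example_excluded)
```

In the second hypothetical case the country has more than 200 Bachelor responses overall but too few from one main faculty, so it would not support comparisons across faculties.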

Table 2 Overview of survey participants from Switzerland (CH), Denmark (DK), Hungary (HU), Ireland (IR), Lithuania (LT), Portugal (POR) and Slovenia (SLO). Bachelor students’ disciplines were grouped into humanities (hum), Social science and law (soc), natural, medical, mathematical and engineering sciences (STEM) and other

Unsurprisingly, most participating Bachelor students were in their early 20s, while most participating upper secondary students were in their late teens. Female-identifying students were slightly overrepresented in the samples of both Bachelor students (details in Goddiksen et al. 2023) and upper secondary students (details in Johansen et al. 2022) compared to the background populations. Response rates were very low, especially for Bachelor students, where typically 2–3% of the students contacted in a given country responded, although Ireland reached an estimated 9% response rate (details in Goddiksen et al. 2023). The response rates were generally higher for upper secondary students at 12% on average, with substantial variation across countries, ranging from 3% in Lithuania to 88% in Portugal (details in Johansen et al. 2022). The limitations introduced by the low response rates are discussed in Sect. 4.1.

Materials

We refer to Goddiksen et al. (2021) for a detailed account of how the interview guide was developed, and to Goddiksen et al. (2023) for a detailed account of the development, translation and validation of the questionnaire.

Interview guide

The interview guide (available in Goddiksen et al. 2021) was designed to highlight areas where students were in doubt about good academic practice and to examine their perception of key integrity concepts, including plagiarism and good citation practice. It also included questions about rule enforcement. The interviews were structured around four themes: ‘Collaboration’, ‘Data management’ and—central to the present study—‘Plagiarism’ and ‘Structure and Responsibility’. Under plagiarism, the informant would first be asked to describe their typical writing process and then answer questions about doubts, good practice and questionable practice. Under structure and responsibility, informants were asked what their institution does to prevent cheating and dishonesty, leading to questions about the informant’s understanding of rules and enforcement.

Although all participants were asked about their understanding of and possible doubts about plagiarism, as well as their understanding of the enforcement of plagiarism rules, TMS was not a predefined topic in the interviews. Similarly, students’ reactions to TMS-related worries were not included in the interview guide.

Questionnaire

The INTEGRITY questionnaire is available along with the raw questionnaire data for this study (see Availability of data and materials below and Goddiksen et al. 2023). The questionnaire covers a broad range of topics relating to academic integrity. In addition, demographic information such as age, gender, country of study, information on study direction (only for Bachelor students) and participation in academic integrity training was collected. The full questionnaire took around 25 min to complete.

This study focuses on one specific question in the questionnaire. For bachelor students, the question was: “During your university education, have you worried about being accused of plagiarism based on an automatic plagiarism check, even though you did not intentionally plagiarise?”. For upper secondary students, ‘university’ was replaced with ‘high school’. There were seven answer options: “Yes, many times”, “Yes, a few times”, “Yes, once”, “No”, “I prefer not to answer”, “Not applicable”, and “I don’t know”. “Not applicable” was included to allow participants to indicate that TMS was not used by their institution. For the data analysis, we excluded respondents who answered “I prefer not to answer” or “Not applicable”.
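As an illustration of the exclusion step, the sketch below drops the two excluded answer options and computes the share of each remaining option. It is an assumption-laden example rather than the study's own code: the function name and the toy list of responses are hypothetical.

```python
from collections import Counter

# Answer options excluded from the analysis, as stated in the text.
EXCLUDED = {"I prefer not to answer", "Not applicable"}


def answer_shares(responses):
    """Share of each included answer option among the non-excluded responses."""
    kept = [answer for answer in responses if answer not in EXCLUDED]
    if not kept:
        return {}
    counts = Counter(kept)
    return {answer: n / len(kept) for answer, n in counts.items()}


# Hypothetical responses, NOT data from the study.
toy_responses = [
    "No",
    "No",
    "Yes, once",
    "Not applicable",
    "Yes, many times",
    "I prefer not to answer",
]
shares = answer_shares(toy_responses)
for answer, share in sorted(shares.items()):
    print(f"{answer}: {share:.0%}")
```

Shares are computed relative to the retained responses only, so the excluded options do not dilute the percentages.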

Data analysis

Interview data

All interviews were transcribed, and the Hungarian interviews were translated into English. The transcripts were analysed in two rounds. In the first round, which took place prior to the launch of the survey, all interviews were thematically coded in NVivo 10 using a codebook developed to meet the overall aims of the INTEGRITY project (details in Goddiksen et al. 2021). The second round of analysis used NVivo 14 and was part of the analysis for this paper. In this second round, extracts coded under the themes ‘plagiarism’ and ‘conception of enforcement’ were revisited to identify passages specifically related to worries about unintentional plagiarism, thoughts about TMS and descriptions of behaviour in reaction to the perceived use of TMS.

We created four codes prior to the second round of coding: ‘Reactions to TMS’, ‘Thoughts about TMS’, ‘Other mentions of TMS’ and ‘Worries about plagiarism’. ‘Reactions to TMS’ was used for passages where participants explicitly mentioned that they had taken actions in response to the belief that their assignments were scanned by TMS. Passages where the participant reflected on TMS without necessarily describing actions were coded as ‘Thoughts about TMS’. When participants expressed worries about accidental plagiarism or about the use of TMS, the passage was coded as ‘Worries about plagiarism’.

Results for research question 3 (see Introduction) about reasons for TMS-related worries were drawn from passages coded with both ‘Thoughts about TMS’ and ‘Worries about plagiarism’. Similarly, results for research question 4 on reactions to TMS-related worries were drawn from passages coded with both ‘Reactions to TMS’ and ‘Worries about plagiarism’. For completeness, Sect. 3.2 presents a summary of the passages coded with at least one of the codes ‘Reactions to TMS’, ‘Thoughts about TMS’ and ‘Other mentions of TMS’, but not with ‘Worries about plagiarism’.

All quotes presented below have been cleaned up as needed for clarity, and quotes from interviews carried out in languages other than English have been translated.

Questionnaire data

Responses from all countries were pooled. For each educational level (upper secondary and Bachelor), we identified the share of students who selected each of the five included answer options to the relevant version of the question about TMS-related worries. A chi-squared (χ²) test was run to examine whether response patterns differed between the two study levels. We then collapsed the responses into a binary variable denoting whether or not the respondent had worried at least once (1 = “Yes, many times”, “Yes, a few times” or “Yes, once”; 0 = “No” or “I don’t know”). For each country, we identified the share of upper secondary students and Bachelor students who had worried at least once, and ran χ² tests to examine whether this share differed between study levels. Finally, for each country, we used χ² tests to examine whether the share of Bachelor students who had worried at least once differed between faculties. In all analyses, we considered a p-value below 0.05 to indicate a statistically significant difference.
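To make the binary recoding and the comparison between two groups concrete, the following is a minimal sketch using only the Python standard library. It computes Pearson's chi-squared statistic without a continuity correction; the text does not state which software or which variant of the test was used, so this is an assumption. The function names and the counts in the toy table are hypothetical and are not data from the study.

```python
import math

# Answer labels follow the questionnaire wording quoted in the text
# (a straight apostrophe is used here in "I don't know").
WORRIED = {"Yes, many times", "Yes, a few times", "Yes, once"}
NOT_WORRIED = {"No", "I don't know"}


def to_binary(answer):
    """Collapse an included answer into 1 (worried at least once) or 0."""
    if answer in WORRIED:
        return 1
    if answer in NOT_WORRIED:
        return 0
    raise ValueError(f"answer not included in the analysis: {answer!r}")


def chi2_statistic(table):
    """Pearson chi-squared statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_df1(stat):
    """Upper-tail p-value for a chi-squared statistic with 1 degree of freedom.

    Valid only for 2 x 2 tables; larger tables have more degrees of freedom.
    """
    return math.erfc(math.sqrt(stat / 2))


# Hypothetical counts, NOT data from the study:
# rows = study level, columns = (worried at least once, never worried).
toy_table = [[30, 20], [20, 30]]
stat = chi2_statistic(toy_table)
p = p_value_df1(stat)
print(f"chi2 = {stat:.2f}, p = {p:.4f}, significant at 0.05: {p < 0.05}")
```

The closed-form p-value applies only to the 2 × 2 case (one degree of freedom), such as comparing two study levels on the binary variable. Comparisons across three or more faculties, or across all five answer options, would need the chi-squared distribution with the corresponding degrees of freedom, for which a statistics library is the more practical choice.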

Results

The nature and prevalence of TMS-related worries

The first analysis of the interview data identified the participants’ own perspectives on the risks related to TMS. Below, a Bachelor student from Ireland talks about their TMS-related worries:

I: ‘Do you ever feel unclear about what might be called plagiarism?’

IR04: ‘In first year yes, […] everyone is so afraid at the start. Like especially using Turnitin […]. I remember our first assignment we used the [institution’s] cover sheets and that was picking up as plagiarism and everyone was saying ‘why is my similarity score so high?’. […] [A]nd people think maybe I’ve plagiarised accidentally. […] But then I think I have always been told that once you […] say it somewhere you are generally ok […] especially if you are using citations for the text.’

Participant HU23 (BA) also “used to be really afraid” of being accused of unintentional plagiarism, while DK17 (Upp. Sec.) described how the use of TMS led to a “fear of being caught” for any kind of plagiarism among students.

These quotes illustrate that some students worry that they will be accused of plagiarism in written assignments based on output from TMS, even though they did not intentionally plagiarise. To some extent, they also exemplify the two different types of TMS-related worries described in the introduction: those related to unintentional plagiarism and those related to false positives (e.g., cover sheets being picked up as plagiarism). They further illustrate that TMS-related worries may be more prevalent among students who are new to a given institution, because they are not yet familiar with the way that this particular institution uses the software.

Prevalence of TMS-related worries: survey results

The prevalence of TMS-related worries was estimated from the questionnaire data. About half of the total sample indicated that they had experienced TMS-related worries at least once during their current studies, and 12% had experienced them many times.

The aggregate results include substantial differences across study levels and countries. Figure 1 compares the two study levels.

Fig. 1 Comparing the prevalence of TMS-related worries among the BA (N = 1,639) and Upp. Sec. (N = 1,785) subpopulations

Bachelor students were more likely than upper secondary students to indicate that they had experienced TMS-related worries at least once (55% vs. 47%; p < 0.001). Furthermore, 16% of the participating Bachelor students had experienced TMS-related worries many times compared to 9% of the participating upper secondary students.

There were also significant differences across countries. Figure 2 shows the share of participants who experienced TMS-related worries at least once during their current studies for each country and at each educational level (full details in Additional file 1).

Fig. 2 Percentage of Bachelor (BA) and upper secondary students from each country (Switzerland, Denmark, Hungary, Ireland, Lithuania, Portugal and Slovenia, respectively) who experienced TMS-related worries at least once during their current studies

Different patterns were observed in the different countries. For example, in Hungary there was no statistically significant difference between upper secondary and Bachelor students’ tendency to experience TMS-related worries (p = 0.8). In all other countries, there was a difference between study levels (either p < 0.001 or p < 0.01). In Switzerland, upper secondary students were more likely to have experienced TMS-related worries than Bachelor students, while the opposite pattern was observed in all other countries. The most striking example is Ireland, where Bachelor students were more than twice as likely to have experienced TMS-related worries as upper secondary students.

Figure 2 also shows that upper secondary participants from Denmark and Switzerland were substantially more likely to have experienced TMS-related worries than participants studying in the other countries. In addition, participants from these two countries were more likely to indicate that these worries arose many times (15% and 13% of participants in Switzerland and Denmark, respectively, compared to an average of 9%, see Additional file 1). Furthermore, Bachelor students in Denmark also had a relatively high rate of TMS-related worries; even higher than among upper secondary students. Only the Bachelor students studying in Ireland had a higher likelihood of experiencing worries. Students in these countries were also the most likely to have experienced frequent worries (details in Additional file 1). On aggregate, 16% of participating Bachelor students indicated that they worried many times, while 35% of Bachelor students studying in Ireland and 21% of Bachelor students studying in Denmark said the same.

Significant differences in the prevalence of TMS-related worries across faculties were only observed in Hungary (p = 0.004) and Portugal (p = 0.007). In both countries, participants from the social sciences were somewhat more likely to have experienced TMS-related worries than participants from other faculties (details in Additional file 1).

Potential positive effects of TMS

In the interview data, TMS was mentioned by half of the participants, most often in passing as an example of a tool for enforcing local disciplinary rules. In most cases, TMS was the first example of enforcement strategies mentioned by the participants, and it was also often the only example provided.

A small number of students perceived TMS to have a preventive effect and described how their collaborative behaviour would sometimes depend on whether the end product would be scanned by TMS. A Danish upper secondary student, for instance, was asked if it was common for his classmates to plagiarise. Referring to the LMS Lectio, which has built-in TMS, he replied:

DK24 (Upp. Sec.): “No, not in upper secondary school, [where] the fear of plagiarism is much greater. If there is an assignment that we do not have to hand in via Lectio, then I think it is more common.”

In addition to these brief mentions of TMS, some respondents provided further details, which we explore in more detail below. These provide further clues as to the reasons behind the students’ worries and how they react to them.

Reasons for TMS-related worries

The interviews indicate two important reasons for TMS-related worries:

  1. Students are aware that plagiarism is in focus at their institution, and punishments are perceived to be rather harsh.

  2. Students are largely unaware of how TMS is used in their institution.

Regarding the latter, several students expressed the belief that judgements about plagiarism were made either by the software itself or derived directly from TMS output. For instance, in the quote by IR04 in Sect. 3.1, we saw how being presented with a similarity score led the student to worry that they had plagiarised accidentally. Some participants described TMS in even more vague terms as ‘machines’ that detect plagiarism without human involvement.

Other participants were aware that similarity scores from TMS were not the only thing their institution relied on to detect plagiarism. However, it was unclear to them precisely what additional measures existed, as described below by a Bachelor student studying in Hungary:

HU05 (BA): “[I]n addition to the plagiarism filter [TMS], […] I know there are other methods. I cannot see through this. And I have not been on the other side [seen it from the teachers’ perspective…]. But I know that there are methods that they filter with.”

Despite not knowing how it is detected, most of the participants were very aware that their institution did not tolerate plagiarism, and that it was something on which the institution was particularly focused. Punishments were perceived to be harsh, especially for repeated offences:

DK17 (Upp. Sec.): “Plagiarism is obviously something that is taken super seriously. We are not allowed to copy from each other … there are … threats of being thrown out and expelled from the school if you do it too much.”

Bachelor students expressed similar thoughts, including this Bachelor student studying in Denmark:

I: “What does your institution do to prevent and handle cheating?”

DK03 (BA): “I know there is a zero-tolerance policy against plagiarism.”

I: “So they have communicated clearly about plagiarism?”

DK03: “Yes. If you are caught plagiarising you are expelled. And I think that is why so many adhere to these rules.”

Comparing university and upper secondary school, the Irish BA student IR04 quoted above describes the situation at university as “more scary” and continues:

IR04: “[In upper secondary school] they didn’t have Turnitin and it was handwritten […]. I know you can be expelled if you plagiarise here.”

Reactions

When students experience TMS-related worries, there are several possible strategies to reduce the perceived risk. One of these, as mentioned in the Introduction, is independently seeking knowledge about proper citation practice. Participants in our interviews did not mention this strategy, although that is not to say that it was not utilised. However, some described how the use of TMS had made them aware of a need for knowledge. A Bachelor student studying in Hungary explains this:

I: “[A]re you in doubt about what is permissible […] if you use others’ work?”

HU23 (BA): “Well, I used to be afraid, yes. […] For example, if I know that a teacher is taking it very rigorously [sic], I will first send her/him an email to check it out before finalising it. Whether it smells of plagiarism or not. Or, if I know that other people have already read the literature, […] I usually send it to these groupmates. And they do the same, so we are sending these to each other and reading whether it is by any chance plagiarism […]. I’m actually in trouble sometimes […]. Sometimes it is not clear to me. Because it was really [not] taught to us, neither in middle school, nor here. […] I think some kind of crash course […] is necessary. To see what is actually considered plagiarism. To understand this thing.”

Don’t get caught!

As a separate risk-avoidance strategy, students described how the use of TMS makes them focus on not being caught plagiarising rather than on what is good practice. The quote above illustrates this, with its focus on whether the text “smells of plagiarism”.

In some cases, students’ focus on not being caught can make them question practices that appear reasonable to them, or even adopt citation practices that they perceive to be suboptimal.

One participant, for instance, described how he had noticed that references to sources cited in previous assignments increased the similarity score given by his institution’s TMS. This seemed to make him doubt whether re-use of sources might lead to an accusation of plagiarism:

IR04: “I remember, I used [a source in] one assignment, [and] credited [it] in my bibliography. And then, [for] another assignment […] I used that source again, maybe different page number or something, but it was already in my bibliography [in the previous assignment], and it actually came up on Turnitin. […]. So I was like ‘oh boy’. Because there is the thing where it says plagiarism.”

Other participants described how they include what they perceived to be an excessive number of references (referred to as ‘footnotes’) in their assignments, just to be safe:

DK22 (BA): “I am quite often in doubt [about when a text has been sufficiently paraphrased], which is why I use a lot of footnotes. […] I am really afraid to gamble with these things. So I make a lot of footnotes, roughly 80–90 in a 12-page assignment. I have not had any critique yet, but I don’t think it is what you are actually supposed to do.”

Participant IR06 (BA) described having a similar practice earlier in their studies:

IR06: “I think I’ve got better. […] In first year, […] [t]here would have been a lot of places where instead of putting two notes, I could have just put one.”

I: “Was there a particular reason?”

IR06: “Yes, just because I was afraid that if I didn’t put one after every single sentence, […] it would be construed as plagiarism.”

Discussion

Based on survey data and an explorative analysis of qualitative interviews, we set out to answer four research questions about TMS-related worries among European upper secondary and Bachelor students. In particular, we were interested in the prevalence of TMS-related worries, how they vary across sub-populations, the reasons why they occur, and any potential changes they trigger in students’ writing practice.

Regarding the first research question about how common TMS-related worries are, we found that around half of the participants in the survey had experienced these worries. Bachelor students were more likely to have experienced such worries than upper secondary students. To our knowledge, this is the first time that the prevalence of TMS-related worries has been assessed in a large-scale study. Despite being separated by more than a decade and being obtained from different European populations, our aggregated results for Bachelor students are comparable to previous results from the small-scale study carried out by Dahl (2007).

As regards our second research question about variation in TMS-related worries, we found that worries were prevalent across all sub-populations. The lowest prevalence of TMS-related worries was found among upper secondary students in Slovenia and Bachelor students studying in Hungary. In both of these cases, around three out of ten participants indicated that they had experienced TMS-related worries at least once during their current studies. The highest prevalence was found among Bachelor students in Ireland, where 85% had experienced TMS-related worries at least once.

The data do not allow further investigation into the causes of this variance (cf. Section 4.1). A likely cause is differences in how frequently TMS is used in the different countries and at different educational levels. The large difference between upper secondary and BA education in Ireland could perhaps reflect that the software is mainly used at BA level there, as suggested by one interview quote. If this is true, it would lead to the hypothesis that TMS-related worries among students who believe that their assignments are checked by TMS are more stable across countries than the results presented in this study indicate.

Another possible reason for the variation across countries is that TMS performs less well when scanning texts written in a language other than English – particularly texts written in ‘small’ languages and/or languages that are grammatically very different from English (Foltýnek et al. 2020). In addition, students can plagiarise by translating text written in English into their native language, which the TMS is not yet able to detect. If students are aware of these weaknesses, it may partially explain the differences across countries, especially at the lower educational levels, where assignments are more often written in a local language.

Regarding our third research question, there are likely to be many reasons why students experience TMS-related worries, and we identified two of these. Firstly, participants in the study were well aware that fighting plagiarism is high on the agenda of their institution. Secondly, the participants were less sure of precisely how their institutions fight plagiarism. Specifically, the participants were unclear about the process from handing in an assignment electronically to an accusation of plagiarism being presented to them, except for the knowledge that it somehow involved TMS – which in turn is also largely a black-box technology (Anson & Kruse, 2023; Introna 2016). This blackboxing would be less of a problem if students were confident in their own understanding of plagiarism and in their ability to navigate situations of doubt relating to citation practice that they will face during their studies.

We have previously used the same data sets to show that students are generally not very skilled in good citation practice (Johansen et al. 2022; Goddiksen et al. 2023). Specifically, we showed that although most students are able to identify clear-cut examples of severe plagiarism correctly, they quickly begin to struggle when the cases become less severe (see Childers & Bruton 2016 and Roig 1997 for similar results from the US). Furthermore, students seem ill equipped to identify, let alone understand and navigate, grey-zone cases where it is not immediately clear from the institutional rules whether or not a given citation practice is acceptable (Goddiksen et al. 2023; Johansen et al. 2022).

This combination of a poor understanding of both plagiarism and the process through which plagiarism cases are established creates fertile ground for TMS-related worries, about both unintentional plagiarism and false positive results. These concerns are exacerbated when combined with the perception that institutions have a very low tolerance for plagiarism.

Our fourth research question concerned student reactions to TMS-related worries. Dahl (2007) cast doubt on the hypothesis that students who worry about being accused of plagiarism based on automated checks are more likely to independently seek knowledge about proper citation practice. Our interviews did not specifically address this question, but we saw no indication that TMS-related worries led to students actively seeking knowledge about good citation practice. However, our interviews did indicate that some students may become aware of a lack of knowledge because of their worries, and this may lead them to become interested in learning about good citation practice.

Our interview data also pointed to other strategies that students use to reduce their risk of being accused of plagiarism. Specifically, we showed that students who experienced TMS-related worries sometimes adapt their citation practice to ensure that their texts will be cleared by the software. This is similar to what Wrigley (2019) describes as ‘de-plagiarism’ of texts. This could be seen as a positive outcome, at least in cases where students begin to follow correct citation practices. However, our interview data revealed examples of students following citation practices they themselves perceived to be suboptimal (and rightly so, as argued by Wrigley 2019) as a way to avoid plagiarism accusations based on TMS results. We were not able to determine how often these strategies were employed, only that they were the only type of strategy mentioned by our participants.

Good citation practice goes beyond avoiding the kind of blatant plagiarism that can be detected by TMS (Childers & Bruton 2016). It is therefore problematic if students become overly focused on avoiding detectable plagiarism of text. From an academic integrity point of view, we want students to learn proper citation practices rather than improper practices designed solely to reduce the risk of TMS-based plagiarism accusations.

We note that just as TMS may lead students to become overly focused on avoiding accusations of plagiarism of text, the software may also lead teachers and institutions to become overly focused on text similarity (Introna 2016). The above-mentioned strategies to avoid being caught by TMS work best if the TMS is given high authority within an institution in terms of whether or not there may be plagiarism in an assignment, i.e., if assignments cleared by the TMS are regarded as plagiarism free. As argued in the Introduction, this is not a valid approach, as plagiarism of ideas and images as well as translated text may still be present in assignments that contain no significant text overlap. However, heavy reliance on TMS in the fight against plagiarism that is not based on an explicit and shared understanding of good citation practice may result in an understanding among both students and teachers that anything not caught by the TMS is acceptable. This, in turn, may generate a perverse incentive for students to focus on rewriting text taken from external sources so that it bears no resemblance to the source text, rather than on being transparent about where they have gathered their information from. The common practice of granting students access to screen their assignments using TMS prior to submission may potentially enhance this practice (Elkhatat et al. 2021; Attwood 2008).

Limitations

Our study has several limitations. Firstly, the low response rate in the survey study increases the risk of non-response bias, in the sense that students with a particular interest in academic integrity could be overrepresented in our sample (as discussed more fully in Goddiksen et al. 2023). If personal experience correlates with interest, then it could mean that those who experienced TMS-related worries were more likely to answer, leading us to overestimate the prevalence of these worries. In addition, the intentional disproportional sampling that enabled the comparison across faculties may have led us to overestimate the prevalence of TMS-related worries in Portugal and Hungary, as participants from social science disciplines in these countries were more likely to have experienced TMS-related worries than participants from other faculties.

Furthermore, it is likely that our results do not reflect the prevalence of TMS-related worries among students whose assignments are actually checked by TMS. It is likely that some of the students whose assignments are not checked by TMS (or who think they are not) will have answered ‘no’ to the survey question rather than ‘not applicable’. This means that the share of students who actually believe that their assignments are checked by TMS was difficult to assess. As mentioned above, this is particularly relevant when comparing the prevalence of worries across countries and educational levels.

Finally, as already noted, the serendipitous nature of the qualitative findings means that we were unable to make a more complete inventory of the different strategies that students use to mitigate their TMS-related worries. It is thus likely that students use other strategies than the ones described in this paper, and our data do not even allow us to estimate the relative prevalence of the known strategies, because they were not included in the questionnaire.

Perspectives

Despite the limitations, our study indicates that it is important that institutions do not view TMS as a magic bullet to solve their plagiarism problems (Bretag and Mahmud 2009). The software may only be of limited help in preventing plagiarism (cf. Introduction), and it is likely to have a number of negative side effects, depending on how it is used. If an institution decides to use TMS, it is vital that it is complemented with proper training for both students and teachers. Teachers must learn how to use the software, be aware of its limitations and have a shared understanding of what plagiarism is and how it is managed at their institution (Sutherland-Smith and Carr 2005).

Training for students should clearly explain how plagiarism cases are established at their institution, including the role played by TMS. In addition, it is important to train students in proper writing and citation practice that goes beyond avoiding getting caught by TMS. This of course applies to any higher education institution (Crean et al. 2023), but as we have shown here, institutions using TMS should be particularly conscious of this, as they may be reinforcing the students’ tendency to focus on avoiding blatant plagiarism of text rather than optimising their citation practice.

The need for transparent processes has increased since the arrival of readily accessible large language models such as ChatGPT in late 2022, and the subsequent attempts from companies developing TMS to incorporate AI-generated text detection into their software. While traditional TMS already involves a substantial element of blackboxing in terms of how it reaches the specific percentage of overlap between an assignment under consideration and the background library, the software often also shows the original sources and the precise passages where overlap occurs. This means that, although blackboxing is involved in the process, it need not be central to the final argument for why the student is accused of plagiarism. If the other steps in the process are sufficiently explained, the students should be able to understand the process. However, when it comes to detecting AI-generated material, the software plays a much more central role, as the score indicating the probability that a given passage is AI-generated is a crucial part of the evidence (Weber-Wulff et al. 2023). As a result, the software can no longer be considered an intermediate part of the process of constructing a misconduct case. The lack of transparency in the software therefore becomes an even bigger problem both for the due process of the student, and, potentially, also for the processes described in this paper linking TMS to worries and suboptimal writing practices among students. Unfortunately, our data were collected before software designed to detect AI-generated text was put on the market. Therefore, we cannot use them to assess the impact of such software on students’ worries and behaviours, but this would be an obvious extension of the present study.

Conclusion

We conclude that current uses of TMS generate widespread worries among students in terms of being accused of plagiarism, even when they do not intentionally plagiarise. These worries can be problematic, partly because they may be detrimental to efforts to create a safe learning environment, and partly because they appear to be a symptom of: (a) a lack of transparency in the processes through which plagiarism accusations are constructed, and (b) a lack of understanding among students about proper writing and citation practice. This, in turn, indicates that some higher education institutions did not, at the time of data collection, provide sufficient (or sufficiently effective) training for students to become good academic writers, nor for teachers to learn how to use TMS in a way that promotes good practice rather than creating an unhealthy focus on not being caught by the software.

Data availability

Interview data: To protect the privacy of the informants, the raw data for this study (anonymised interview transcripts) cannot be made publicly available. Arrangements for sharing the raw data can be made by contacting the authors directly.

Survey data: The complete dataset from the INTEGRITY survey, which includes the survey data for this study, is available here: https://doi.org/10.17894/ucph.66293057-46c5-4d02-a116-dfd597ce5a78.

Abbreviations

TMS: Text-matching software

LMS: Learning Management System

BA: Bachelor level

Upp. Sec.: Upper Secondary School level

Hum.: Humanities

Soc.: Social sciences

STEM.: Natural, medical, mathematical, and engineering sciences

Acknowledgements

Special thanks to Una Quinn, Roman Globokar, Linda Hogan and Céline Schöpfer for help with the data collection. Una Quinn also contributed to the first round of interview data analysis. The authors would also like to thank the participants in the study and everyone who helped with recruitment. Finally, we thank our partners in INTEGRITY for their help and useful dialogue, and Mark Harvey Simpson of GlobalDenmark A/S, and Sarah Layhe for language editing.

Funding

Open access funding provided by Copenhagen University. The study was funded by the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 824586. The funders had no role in the study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

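Contributions

MPG: study design, data collection, data analysis, data interpretation, writing first complete draft, revising and commenting

MWJ: study design, data collection, data analysis, data interpretation, writing first complete draft, revising and commenting

ACVA: data collection, revising and commenting

MC: data collection, revising and commenting

CC: data collection, revising and commenting

EG: data collection, revising and commenting

NK: data collection, revising and commenting

MM: data collection, revising and commenting

IO: data collection, revising and commenting

MP: data collection, revising and commenting

JS: data collection, revising and commenting

RS: data collection, revising and commenting

VS: data collection, revising and commenting

OV: data collection, revising and commenting

PW: data collection, revising and commenting

PS: study design, data interpretation, writing first complete draft, revising and commenting

TBL: study design, data analysis, data interpretation, writing first complete draft, revising and commenting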

Corresponding author

Correspondence to Mads Paludan Goddiksen.

Ethics declarations

Competing interests

None declared.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Goddiksen, M.P., Johansen, M.W., Armond, A.C.V. et al. The dark side of text-matching software: worries and counterproductive behaviour among European upper secondary school and bachelor students. Int J Educ Integr 20, 15 (2024). https://doi.org/10.1007/s40979-024-00162-7

  • DOI: https://doi.org/10.1007/s40979-024-00162-7

Keywords