Table 4 Sample digital watermarking processes

From: Artificial intelligence, text generation tools and ChatGPT – does digital watermarking offer a solution?

Generation

1. Generate the text using the current ChatGPT process and a random seed

2. Split the text into 5-grams (non-overlapping sequences of 5 consecutive words)

3. Where possible, remove the 5th word in each 5-gram, then generate an alternative word that the language model behind ChatGPT indicates will fit and that does not change the meaning of the original text. Replace the original word with the new word. Replacement may not always be possible. For example, it might not be sensible to look for an alternative if the word is “the”. The replacement should use the secret seed so that the same input always produces the same word

4. Return the text with the replacement words to the users as the output
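
The generation steps above can be illustrated with a minimal Python sketch. It is not the process used by ChatGPT: the language model is replaced by a small, hypothetical table of interchangeable words (`GROUPS`), and the names `pick_word` and `watermark` are assumptions made for illustration. The secret seed is combined with the first four words of each 5-gram through a hash, so the same input always yields the same replacement.

```python
import hashlib

# Hypothetical stand-in for the language model: groups of words assumed to be
# interchangeable without changing meaning. A real system would query the model.
GROUPS = [("big", "large"), ("quick", "fast"), ("begin", "start")]
SYNONYMS = {word: group for group in GROUPS for word in group}


def pick_word(context, candidates, seed):
    """Deterministically choose a candidate from the context and the secret seed."""
    key = f"{seed}|{' '.join(context)}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return candidates[int.from_bytes(digest[:8], "big") % len(candidates)]


def watermark(text, seed, n=5):
    """Replace the last word of each non-overlapping n-gram where possible."""
    words = text.split()
    out = list(words)
    for start in range(0, len(words) - n + 1, n):
        last = start + n - 1
        candidates = SYNONYMS.get(words[last])
        if candidates is None:
            continue  # no sensible alternative (for example, "the")
        out[last] = pick_word(words[start:last], candidates, seed)
    return " ".join(out)


print(watermark("we saw a very big dog and it was quick", "secret"))
```

Words with no entry in the table are left unchanged, which mirrors the note in step 3 that a replacement is not always possible.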

Detection

1. Split the text into overlapping 5-grams

2. Iterate through the 5-grams. In each case, remove the 5th word from the 5-gram and then use the language model behind ChatGPT to identify what the expected word should be. This process uses the secret seed. Record whether the removed word and the expected word are the same

3. Calculate a text generation score based on:

the number of n-grams with a matching word / the total number of n-grams
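
As a hedged illustration of the detection steps and the score above, the following Python sketch slides over overlapping 5-grams and computes the fraction whose 5th word matches the expected word. The language model is again replaced by a hypothetical word table (`GROUPS`), and `pick_word` and `detection_score` are illustrative names rather than part of any real API. One simplifying assumption is that a 5-gram whose last word is not in the table counts as a non-match.

```python
import hashlib

# Hypothetical stand-in for the language model (must match the generator's table).
GROUPS = [("big", "large"), ("quick", "fast"), ("begin", "start")]
SYNONYMS = {word: group for group in GROUPS for word in group}


def pick_word(context, candidates, seed):
    """Recompute the word the seeded generator would have chosen for this context."""
    key = f"{seed}|{' '.join(context)}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return candidates[int.from_bytes(digest[:8], "big") % len(candidates)]


def detection_score(text, seed, n=5):
    """Return matching n-grams divided by total n-grams (0.0 if there are none)."""
    words = text.split()
    total = max(len(words) - n + 1, 0)
    if total == 0:
        return 0.0
    matches = 0
    for start in range(total):  # overlapping n-grams: advance one word at a time
        gram = words[start:start + n]
        candidates = SYNONYMS.get(gram[-1])
        if candidates and pick_word(gram[:-1], candidates, seed) == gram[-1]:
            matches += 1
    return matches / total


print(detection_score("we saw a very big dog and it was quick", "secret"))
```

Because the detector cannot know where the generator's 5-gram boundaries fell, especially after edits to the text, it checks every position. A higher score suggests, but does not prove, that the text carries the watermark.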