An NGram-Based Copyright Protection Approach for Digital Images

This paper introduces an NGram-based approach for the copyright protection of digital images. Its advantage over existing works is that it does not always require all elements (e.g., bits) of the watermark pattern to be embedded into the original digital image. This, in turn, allows us to protect digital images while minimizing the chances of producing low-quality marked images. In the best case, no element of the pattern is embedded into the original digital image; in the worst case, which rarely happens, all elements are embedded. Moreover, the NGram approach allows us to reach any part of the image efficiently and easily using the corresponding level numbers and addresses, which makes the method particularly suitable for complex and high-dimensional data (e.g., images and videos). Experimental results show the effectiveness of the proposed approach in terms of its ability to recover the watermark pattern from the marked digital image even when major changes are applied to the original digital image.


Introduction
The proliferation of both digitized images and advanced image-processing applications has made the modification and duplication of images much easier than before. It is therefore becoming increasingly important to have advanced watermarking technologies for the copyright protection of digital images [10] [14] [7]. Technically speaking, someone looking to protect her digital image has to register it with the trusted Copyright Office (CO) by submitting a copy. The CO archives the digital image along with information about the rightful owner. When a dispute occurs, the owner contacts the CO to obtain proof that she is the rightful owner [23] [6] [24]. If the digital image was not registered, the owner should at least be able to provide the film negative; however, with the rapid and increasing adoption of digital photography, a film negative might not be available. Digital watermarking can be considered the process of embedding or hiding Identification Information (IF) into digital data such as images and videos, in order to prevent attackers from illegally using the protected data without the owner's consent [25] [17] [5]. In particular, the IF is called a Watermark Pattern (WP), and the original digital image, after embedding the pattern, is called a Marked Image (MI). This process takes place by changing the contents of the digital image [5]. Moreover, a secret key is used to guide how the WP is hidden in the image to produce the MI. Fig. 1.1 shows the schema of digital watermarking. The problem with existing watermarking approaches is that they focus on improving hiding algorithms to produce robust marked images. Two disadvantages are associated with these approaches: 1) the protection cannot be achieved without affecting the quality of the protected images.
In other words, the protection is obtained at the expense of quality; 2) when the protection algorithm is applied to big, high-dimensional data (e.g., images), the time needed for protection and verification increases considerably. To address these shortcomings, we propose an NGram-based approach for digital image copyright protection. The proposed method does not always require all bits of the watermark pattern to be embedded into the images. This enables us to protect digital images while minimizing the chances of producing low-quality marked images. In the best case, no bit of the pattern is embedded into the original digital image; in the worst case, which rarely happens, all bits are embedded. The proposed method also allows us to reach any part of the image efficiently using the corresponding level numbers and addresses, which makes it especially suitable for high-dimensional data such as images and videos. Our results show the effectiveness of the proposed approach, compared to existing approaches, in terms of its ability to recover the watermark pattern from the marked image even when major changes are applied to the original digital image. The rest of this paper is organized as follows. Section 2 reviews previous work. Section 3 briefly discusses the NGram transform, while Section 4 explains the proposed watermarking method. Section 5 presents the experimental results. Finally, Section 6 concludes the paper.

Related Work
Many works propose different approaches for digital watermarking [10] [14] [7] [5] [9]. Digital watermarking can be classified into three types: embedding-based, non-embedding-based, and semi-embedding-based approaches (Fig. 2.1). The embedding-based approach is the classical approach, in which all of the pattern's bits are embedded into the original image. The non-embedding-based approach is a newer concept defined in [10] [2]; it is based on visual cryptography [14] and does not require the watermark pattern to be embedded into the original digital image. Instead, verification information is generated and used to verify the ownership of the digital image [10] [14] [20] [3]. Finally, the semi-embedding-based approach is a combination of the embedding-based and non-embedding-based approaches. In particular, this approach (proposed in this paper) has the advantage that not all of the pattern's bits need to be embedded into the original image. This, in turn, allows us to protect digital images while minimizing the chances of producing low-quality marked images [21] [22]. Sleit and Abusitta [19] propose a visual-cryptography-based scheme for digital image copyright protection. Their framework builds on the visual cryptography defined by Naor and Shamir [14], and they select random data points from the original data rather than specific data points. Their method has the advantage that it does not require the watermark pattern to be embedded into the original data. To this end, verification data is produced, which can be used to verify who owns the digital data. Similarly, other works (e.g., [8], [10]) present improved approaches based on visual cryptography. These techniques also do not need to embed the pattern into the original image; instead, verification information is used to prove the ownership of a group of digital images.
Recently, Abusitta [1] proposed a new method based on the relationship between randomly selected pixels and their 8-neighbor pixels. This relationship keeps the marked image coherent against diverse attacks even if the most significant bits of the randomly selected pixels have been modified. Our method is closest to [10], which is based on visual cryptography [14]; the method proposed in [10] builds on the simple (2, 2) visual threshold scheme proposed by Naor and Shamir [14]. In this technique, the owner of the digital image chooses a w (pattern width) * l (pattern length) black/white image as a watermark pattern Pt and a secret key K. Then, VI (Verification Information) is created from the original digital image IM of N pixels and the pattern Pt using the secret key K, through the following steps: Step 1: Use the secret key K as the starting point (i.e., seed) to create w * l different random numbers R_i over the interval [0, N - 1]. Step 2: For each R_i, obtain the left-most bit of the R_i-th pixel of IM; if the i-th pixel of Pt is white, assign (VI_i1, VI_i2) = (0, 1) when the bit is 0 and (1, 0) otherwise; if the i-th pixel is black, assign the reversed pair. Step 3: Aggregate all the (VI_i1, VI_i2) pairs to build the VI. The VI should be given to the trustworthy neutral organization (TNO). If the owner would like to claim that an image IM' is a copy of IM (i.e., the original digital image), she should provide the key K to the TNO, and the pattern Pt is restored from IM' and VI through the following steps: 1. Use K as the starting point (i.e., seed) to create w * l different random numbers over the interval [0, N - 1].
2. The color of the i-th pixel of the pattern Pt' is assigned based on IM' as follows: (a) Obtain the left-most bit, bit, of the R_i-th pixel of IM'; if bit is 0, assign VI_i = (0, 1); otherwise, assign VI_i = (1, 0). (b) If VI_i equals the i-th pair of VI, assign the color white to the i-th pixel of Pt'; otherwise, assign the color black.
3. If Pt' can be recognized as Pt, the TNO decides that IM' is a copy of IM.
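The creation/verification flow of [10] described above can be sketched as follows. This is a minimal illustration, not the reference implementation: we assume 8-bit grayscale pixels (so the left-most bit is bit 7), a flat list of N pixels, and that the creation rule mirrors the verification rule (white pattern pixels store the derived pair, black pixels the reversed pair).

```python
import random

def make_vi(image_pixels, pattern, key, n):
    """Sketch of VI creation: seed with K, draw w*l random pixel indices,
    and pair each pattern pixel with the MSB of the selected image pixel.
    The pairing rule for black pixels is our assumption, chosen so that
    verify() below recovers the pattern."""
    random.seed(key)
    indices = random.sample(range(n), len(pattern))
    vi = []
    for idx, p in zip(indices, pattern):
        msb = (image_pixels[idx] >> 7) & 1            # left-most bit
        pair = (0, 1) if msb == 0 else (1, 0)
        if p == 'black':                              # reversed pair for black
            pair = pair[::-1]
        vi.append(pair)
    return vi

def verify(image_pixels, vi, key, n):
    """Recover the pattern from a candidate image IM' and the stored VI:
    a match between the derived pair and the stored pair means white."""
    random.seed(key)
    indices = random.sample(range(n), len(vi))
    pattern = []
    for idx, stored in zip(indices, vi):
        msb = (image_pixels[idx] >> 7) & 1
        pair = (0, 1) if msb == 0 else (1, 0)
        pattern.append('white' if pair == stored else 'black')
    return pattern

img = [i % 256 for i in range(64)]                    # toy 64-pixel image
pt = ['white', 'black', 'white', 'black']
vi = make_vi(img, pt, key=42, n=64)
assert verify(img, vi, key=42, n=64) == pt
```

Because both sides re-seed with the same key K, they draw the same pixel indices, which is what lets the TNO reconstruct Pt' without ever storing the pattern itself.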
The above-mentioned method is not robust against several changes applied to images: illumination, rotation, distortion, and scaling. Fig. 2.3 shows some results. In Section 4, we present the proposed method, which addresses these deficiencies.

The NGram Transform
The NGram transform is a function defined in [16]. It starts with a stream of tokens, as can be seen in Fig. 2.4, where every two pixels of an image are treated as a token. Tokens are combined in pairs to produce new tokens. Each new token is "learned", i.e., stored in a list or lookup table: if the token has been seen before, its count is incremented; otherwise, the token is added with a count of one. For each input token pair, a single token is output, representing the "address" or name of the token pair in the list. Thus, at the first (lowest) level, the pixel pair "001100" becomes "A", the name of the list entry that stored the "001100" token. The next pixel pair, "110010", becomes "B", and so on. The resulting output stream, "ABCDEFGDHDIJ...", is processed the same way, creating a level-2 list and a new output stream. This process is repeated until a single token results. The original input token stream can be recalled from a resulting token/level pair. Starting with the "A" token at level 6, look up the "A" entry in the level-6 dictionary; the result is "AB". Look up each of "A" and "B" in the level-5 dictionary, resulting in "AB" and "CD". Look up each of "A", "B", "C", and "D" in the level-4 dictionary, resulting in "AB", "CD", "EF", and "GH". Continue this process through level 1 to obtain the original input tokens. Any part of an image can be reached using a level and an address. For example, the first four pixels of an image are reached using "A 2", i.e., address "A" at level 2.
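The learn/recall procedure above can be sketched as follows. This is a minimal illustration under two assumptions the text leaves open: addresses are named by insertion order ("A", "B", ...), and an odd-length stream is padded by duplicating its last token.

```python
def ngram_encode(tokens):
    """Build NGram levels: pair adjacent tokens, store each new pair in a
    per-level table, and emit the pair's address as the next-level token."""
    levels = []                          # one dict per level: pair -> address
    stream = list(tokens)
    while len(stream) > 1:
        table, out = {}, []
        if len(stream) % 2:              # pad so every token has a partner
            stream.append(stream[-1])
        for i in range(0, len(stream), 2):
            pair = (stream[i], stream[i + 1])
            if pair not in table:
                # address named by insertion order: 'A', 'B', ... (a sketch;
                # more than 26 distinct pairs would need longer names)
                table[pair] = chr(ord('A') + len(table))
            out.append(table[pair])
        levels.append(table)
        stream = out
    return levels, stream[0]             # all level tables and the top token

def ngram_decode(levels, token, level=None):
    """Recall the original stream from a token/level pair by walking the
    level dictionaries downward."""
    if level is None:
        level = len(levels)
    if level == 0:
        return [token]                   # level 0: the raw input token
    inverse = {addr: pair for pair, addr in levels[level - 1].items()}
    a, b = inverse[token]
    return ngram_decode(levels, a, level - 1) + ngram_decode(levels, b, level - 1)

pixels = ["001100", "110010", "010111", "101101"]
levels, top = ngram_encode(pixels)
assert ngram_decode(levels, top) == pixels
```

Decoding any intermediate token (e.g., "A" at level 2) recalls just the slice of the image covered by that token, which is how a level/address pair addresses a part of the image.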

The proposed method
To illustrate the proposed method, we introduce the following example. Assume that we have an image like the one in Fig. 3.1 and would like to mark it using the pattern in Fig. 3.2. The proposed method proceeds in the following steps:
A. Key Generation
The owner randomly selects 11 even numbers (one per bit in the pattern). The sum of these numbers equals Image Height (H) * Image Width (W) if the total number of pixels in the image is even, and (H * W) - 1 if it is odd. In this example, we have 8 * 8 = 64 pixels. Assume the owner has selected the following numbers: 4, 10, 2, 10, 6, 4, 4, 2, 4, 8, 10.
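The key-generation step can be sketched as below. The paper fixes only the count, parity, and sum of the numbers, not how they are drawn; as an assumption, this sketch draws them as a uniform random composition of half the total, doubled so every part is even.

```python
import random

def generate_key(num_bits, height, width):
    """Generate `num_bits` random even numbers summing to H*W (or H*W - 1
    when the pixel count is odd), as in step A of the proposed method."""
    total = height * width
    if total % 2 == 1:
        total -= 1                       # odd pixel count: sum is H*W - 1
    half = total // 2                    # work in halves so parts come out even
    # random composition of `half` into `num_bits` positive parts
    cuts = sorted(random.sample(range(1, half), num_bits - 1))
    parts = [b - a for a, b in zip([0] + cuts, cuts + [half])]
    return [2 * p for p in parts]

key = generate_key(11, 8, 8)             # the 8*8 example image
assert len(key) == 11
assert sum(key) == 64
assert all(k % 2 == 0 for k in key)
```

The example key (4, 10, 2, 10, 6, 4, 4, 2, 4, 8, 10) is one such draw: 11 even numbers summing to 64.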

B. Selection Process
We divide the image (Fig. 3.1) based on the numbers selected in the previous step. Fig. 3.3 shows the image after the division has been applied. Note that in Fig. 3.3, each part of the image can be reached from its level number and address using the NGram transform presented in Section 3. Thereafter, the owner randomly selects 11 different numbers from 1 to 11, each representing a specific part of the image; for example, 5 represents part 5, 2 represents part 2, and so on. Assume the owner has selected the numbers (5, 2, 1, 10, 3, 6, 4, 8, 9, 7, 11), representing (part 5, part 2, part 1, part 10, part 3, part 6, part 4, part 8, part 9, part 7, part 11), respectively.
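The division and part-selection steps can be sketched as follows. For illustration we model the parts as consecutive runs of the raster-ordered pixel stream, an assumption; in the paper the parts are addressed through the NGram levels of Fig. 3.3.

```python
import random

def split_into_parts(pixels, sizes):
    """Divide the flat pixel stream into consecutive parts whose lengths
    are the key numbers from step A."""
    parts, start = [], 0
    for s in sizes:
        parts.append(pixels[start:start + s])
        start += s
    return parts

sizes = [4, 10, 2, 10, 6, 4, 4, 2, 4, 8, 10]   # the example key
pixels = list(range(64))                       # toy raster-ordered 8*8 image
parts = split_into_parts(pixels, sizes)
assert [len(p) for p in parts] == sizes

# The owner then draws a random order of the 11 parts, such as the paper's
# (5, 2, 1, 10, 3, 6, 4, 8, 9, 7, 11): pattern bit i goes into part order[i].
order = random.sample(range(1, 12), 11)
assert sorted(order) == list(range(1, 12))
```

Because the sizes sum to the pixel count, every pixel belongs to exactly one part, so each pattern bit has a disjoint hiding region.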

C. Hiding Process
In this process, each bit of the pattern (Fig. 3.2) is hidden in one part: in our example, the first bit in part 5, the second bit in part 2, the third bit in part 1, and so on. Hiding proceeds from the most significant bit to the least significant bit. The following example illustrates the proposed hiding algorithm. Assume that we want to hide the first bit of the pattern in Fig. 3.2, whose value is 0, in part 5. We XOR all pixels in part 5 and examine the bits of the result, starting from the most significant bit and moving left to right. At the first bit whose value is 0, the pattern bit is considered hidden. If no bit of the result is 0, the last bit of the last pixel in part 5 is changed. In our example, we have 100 XOR 110 XOR 011 XOR 111 XOR 101 XOR 100 = 111 in part 5. Since there is no 0 in 111, we change the last bit of the last pixel in part 5, as shown in Fig. 3.4.
After hiding all bits of the pattern in the image, a new image is generated, as can be seen in Fig. 3.4.
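The hiding step can be sketched as follows. The paper's example only shows hiding a 0, so as an assumption we generalize symmetrically: scan the XOR result for the first bit position equal to the bit being hidden, and flip the last bit of the last pixel when no such position exists. The returned position is what the VI records (the "3" in the later "M1 H2 3" example).

```python
from functools import reduce

def hide_bit(part, bit, width=3):
    """Hide one pattern bit in a part (a list of `width`-bit pixel values).
    Returns the (possibly modified) part and the 1-based bit position,
    counted from the most significant bit, recorded for verification."""
    xor = reduce(lambda a, b: a ^ b, part)
    for pos in range(width):                  # scan MSB first
        if (xor >> (width - 1 - pos)) & 1 == bit:
            return part, pos + 1              # bit already "hidden" here
    modified = part[:]
    modified[-1] ^= 1                         # flip last bit of last pixel
    return modified, width                    # the LSB position now matches

# Part 5 from the example: 100 110 011 111 101 100 (binary); XOR = 111,
# which contains no 0, so the last pixel 100 becomes 101.
part5 = [0b100, 0b110, 0b011, 0b111, 0b101, 0b100]
new_part, pos = hide_bit(part5, 0)
assert new_part[-1] == 0b101 and pos == 3
```

Flipping the last pixel's last bit always flips the XOR's least significant bit, so after the change the recorded position is guaranteed to hold the hidden bit, whether it is 0 or 1.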

D. Verification.
The owner sends the marked image, the serial NGram (Fig. 3.5), and a check-sum to the neutral organization. If the owner wants to prove the ownership of some data F, a watermark pattern Pt' is generated from the image; the verification is done as follows:
• The owner gives the VI (Fig. 3.6) to the TNO.
• The TNO computes the check-sum of the VI and compares the result with the check-sum provided by the owner.
• If the calculated check-sum equals the received check-sum, the organization uses the VI to retrieve the pattern from the image.
• To retrieve the pattern, we first apply image rotation from 1 to 360 degrees. For each degree, we set a counter to zero and compare each pixel in the registered image with the corresponding pixel in the rotated image; the counter is incremented by one whenever the two pixels match. The output of this step is shown in Fig. 3.7, which presents the number of successful matches with respect to different degrees of image rotation.
• We select the rotated image that leads to the maximum number of matches, to be used for extracting the pattern with the VI.
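The rotation search in the last two steps can be sketched as follows. Producing the rotated candidates themselves would require an image library, so as an assumption this sketch takes them as already-given, equal-sized 2-D pixel arrays keyed by degree.

```python
def count_matches(registered, candidate):
    """Count pixel-wise matches between the registered image and a rotated
    candidate (both equal-sized 2-D lists of pixel values)."""
    return sum(1 for row_r, row_c in zip(registered, candidate)
                 for a, b in zip(row_r, row_c) if a == b)

def best_rotation(registered, rotations):
    """Among candidate rotations (degree -> rotated image), pick the degree
    with the maximum number of pixel matches; its image is then used for
    pattern extraction."""
    return max(rotations, key=lambda d: count_matches(registered, rotations[d]))

registered = [[1, 2], [3, 4]]
rotations = {90: [[3, 1], [4, 2]], 180: [[4, 3], [2, 1]], 360: [[1, 2], [3, 4]]}
assert best_rotation(registered, rotations) == 360   # identical image wins
```

The degree with the highest count corresponds to undoing whatever rotation an attacker applied, which is exactly the peak visible in Fig. 3.7.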
The above two steps solve the problem of image rotation that an attacker might apply to mislead the proposed watermarking method. The pattern can then be extracted from the image using the VI. For example, the first bit of the pattern can be extracted using the first part of the VI (M1 H2 3). This part means: go to address M at level 1 according to Fig. 3.7 (NGram), which yields "100110"; then go to address H at level 2, which yields "NI"; then look up "N" at level 1, which yields "011111", and "I" at level 1, which yields "101101". The result is: 100 110 011 111 101 101. Finally, we XOR all these pixels (100 XOR 110 XOR 011 XOR 111 XOR 101 XOR 101) and look at the third bit of the result, which represents the first bit of the pattern.
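The final XOR-and-read step of this extraction example can be sketched as:

```python
from functools import reduce

def extract_bit(part, position, width=3):
    """Recover one pattern bit from the pixels of a part: XOR all pixels
    and read the recorded bit position (1-based, most significant first)."""
    xor = reduce(lambda a, b: a ^ b, part)
    return (xor >> (width - position)) & 1

# The part recalled via "M1 H2" in the example: 100 110 011 111 101 101.
part = [0b100, 0b110, 0b011, 0b111, 0b101, 0b101]
assert extract_bit(part, 3) == 0   # third bit of the XOR = first pattern bit
```

Note that the recalled part differs from the original part 5 only in the flipped last bit (100 became 101), which is precisely what makes position 3 of the XOR result hold the hidden 0.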

Experimental Results
We implemented the proposed approach in a 64-bit Windows 10 environment. The proposed method was evaluated on the Lena, Beans, and Moon images. The watermark pattern used is "Cheng", whose size is 180 (width) x 97 (length); this pattern has been used in several works (e.g., [10], [19], [8], [1]). We tested the robustness of the proposed method against several changes applied to the images: illumination, rotation, distortion, scaling, and a combination of scaling and rotation. Figs. 4.1, 4.2, and 4.4 demonstrate that the watermark pattern can be recognized even when major changes were made to the original image.

Conclusion
In this paper, we presented an NGram-based approach for digital image copyright protection. The proposed approach does not always require all bits of the watermark pattern to be embedded into the original digital image. This enables us to protect digital images while reducing the chances of producing low-quality marked images. In the best case, no bit of the pattern is embedded into the original digital image; in the worst case, which rarely happens, all bits are embedded. The NGram transform allows us to reach any part of the image efficiently using the corresponding level numbers and addresses, which makes the approach well suited to complex and high-dimensional data such as images and videos. Our results show that the proposed method was able to recover the watermark pattern from the marked digital image even when major changes were made to the original digital image.