An estimated 25% of daily search queries are related to pornographic material, with a staggering 116,000 daily searches involving child sexual abuse content. While the creation of pornographic material by consenting adults is not illegal, the use of children for sexual stimulation is illegal and constitutes abuse. The global prevalence of child sexual abuse is estimated at 19.7% among females and 7.9% among males, with approximately 30% of such abuse perpetrated by close relatives, a further 60% by acquaintances, and only 10% attributed to complete strangers.
These figures are worryingly high, and it is our societal duty to assist lawmakers in the protection of children. Thus, in 2018, academics from the University of Malta joined forces with the University of Leon, the University of Groningen, three law enforcement agencies and a victim support group, under the coordination of INCIBE (the Spanish National Cybersecurity Institute), to apply for European funding for a project called 4NSEEK, which set out to investigate the use of Artificial Intelligence to fight child sexual abuse. The Maltese team was tasked with using AI to automatically locate and label exposed private body parts in images.
Photo by Kat Smith on Pexels
To curb the distribution of child pornographic material, Law Enforcement Agencies (LEAs) and online media platforms, such as Facebook and Twitter, need to sift through huge amounts of image and video content to identify any illicit material. To go through the millions of images and videos uploaded online, LEAs and online platforms use tools such as PhotoDNA to detect explicit content. This tool, developed by Microsoft in 2009, describes each image by a digital signature known as a hash, which can be considered a unique fingerprint of the image. If an image bearing the same hash as a known image appears on the internet, we know that the two images are essentially the same. Hashes therefore help to block known criminal images from circulation. The Internet Watch Foundation maintains a database of such criminal images, identified and labelled by experts, so that LEAs can trace any attempts to re-circulate them.
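To illustrate the fingerprint idea, the sketch below uses an ordinary cryptographic hash (SHA-256) to flag exact copies of known files. This is a deliberate simplification: PhotoDNA itself uses a robust perceptual hash that still matches after resizing or re-encoding, whereas an exact hash changes completely if a single byte changes. The byte strings standing in for images here are, of course, hypothetical.

```python
import hashlib

def image_fingerprint(image_bytes: bytes) -> str:
    """Return a hex digest uniquely identifying this exact byte sequence."""
    return hashlib.sha256(image_bytes).hexdigest()

# A database of fingerprints of known illicit images (illustrative values).
known_hashes = {
    image_fingerprint(b"known-image-1"),
    image_fingerprint(b"known-image-2"),
}

def is_known(image_bytes: bytes) -> bool:
    """Check an uploaded image against the database of known fingerprints."""
    return image_fingerprint(image_bytes) in known_hashes

# An exact copy matches, but even a one-byte change yields a completely
# different digest -- which is why robust perceptual hashes are needed.
print(is_known(b"known-image-1"))   # True
print(is_known(b"known-image-1!"))  # False
```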
The main limitation of such techniques is that hashing works only if the image was previously seen and labelled as pornographic; it does not work for new or previously unseen images. This is where machine learning and artificial intelligence come into play. By creating algorithms that extract relevant features from images, machine learning makes it possible to automatically determine whether previously unseen images contain potentially explicit content.
Initially, such features were manually engineered to capture characteristics such as the colour, texture and shape around private body parts. These features were used to train classifiers, such as the AdaBoost classifier, to determine whether the features obtained from a new image match those obtained from known private body parts. Unfortunately, early feature-based methods performed poorly, mainly because the features were extracted from small, local regions of the image; taken out of context, innocent body regions, for example the belly button, can have similar characteristics and confuse the classifier.
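As a rough illustration of how such a classifier works, here is a from-scratch AdaBoost sketch that uses one-dimensional threshold "stumps" as weak learners and re-weights misclassified examples each round. The single toy feature (imagine a skin-colour score for an image region) and the data are invented for illustration; the project's actual features and training data were quite different.

```python
import math

def train_adaboost(X, y, rounds=5):
    """AdaBoost with 1-D threshold stumps. X: feature values, y: labels in {-1, +1}."""
    n = len(X)
    w = [1.0 / n] * n                       # per-example weights
    ensemble = []                           # list of (alpha, threshold, polarity)
    for _ in range(rounds):
        # Pick the stump (threshold, polarity) with the lowest weighted error.
        best = None
        for t in sorted(set(X)):
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (pol if xi >= t else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, t, pol)
        err, t, pol = best
        err = max(err, 1e-10)               # avoid log(0) on a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, pol))
        # Re-weight: boost the examples this stump got wrong.
        w = [wi * math.exp(-alpha * yi * (pol if xi >= t else -pol))
             for xi, yi, wi in zip(X, y, w)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(a * (p if x >= t else -p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy 1-D data: a hypothetical "skin-colour score" per region.
X = [0.1, 0.2, 0.4, 0.6, 0.8, 0.9]
y = [-1, -1, -1, 1, 1, 1]
model = train_adaboost(X, y)
print(predict(model, 0.85), predict(model, 0.15))  # 1 -1
```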
In more recent years, deep-learning algorithms were developed, and these outperformed the classical feature-based methods on most image classification problems thanks to their ability to learn a complex array of features. Deep-learning algorithms, however, require a large number of example images from which the network can learn the features required for classification.
Images containing consensual adult pornography may be obtained relatively easily. In our work, these examples were obtained through our collaboration with the University of Leon, whose dataset consists of images collected from the web and manually grouped by category. We first used this dataset to classify images as pornographic or benign. For this classification step we used a MobileNet architecture, chosen for its speed and light weight, which allow for the fast processing of images. While this allows LEAs to quickly sift through large quantities of images and identify any suspect content, investigators would still need to go through the flagged images to assess their severity.
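MobileNet owes its light weight to depthwise-separable convolutions, which split a standard convolution into a per-channel spatial filter followed by a 1×1 "pointwise" mixing step. A quick back-of-the-envelope comparison shows the parameter saving for a single layer; the layer sizes below are illustrative, not taken from the project's model.

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

def separable_params(k, c_in, c_out):
    """Depthwise k x k conv (one filter per input channel) + 1x1 pointwise conv."""
    return k * k * c_in + c_in * c_out

# A typical mid-network layer: 3x3 kernel, 256 channels in and out.
standard = conv_params(3, 256, 256)        # 589,824 weights
separable = separable_params(3, 256, 256)  # 67,840 weights
print(f"standard: {standard}, separable: {separable}, "
      f"saving: {standard / separable:.1f}x")
```

Repeated over every layer, this roughly 8.7× saving is what lets a MobileNet-style classifier process images quickly on modest hardware.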
To assist with the identification of image severity, we re-labelled the images, this time marking each private body part by manually drawing a bounding box around it and naming it. We used this labelled dataset to train a YOLOv3 architecture to detect and label private body parts, thus providing LEAs with a further description of the image content.
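When training a detector such as YOLOv3 against manually drawn boxes, predicted boxes are conventionally matched to the ground-truth annotations using intersection-over-union (IoU). The sketch below shows this standard metric, not the project's actual evaluation code:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Corners of the overlapping region (if any).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# A prediction overlapping a quarter of a same-sized ground-truth box:
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ~= 0.143
```

A detection is typically counted as correct when its IoU with an annotated box exceeds a threshold such as 0.5.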
While training with adult pornographic content allowed us to create initial trained models for laboratory-based evaluation, these models needed to be adaptable to child sexual abuse images. Due to the ethical sensitivity around such content, researchers cannot directly validate the models or create the training images required to fine-tune them. To protect the researchers while respecting the sensitivity of the data, this evaluation was carried out by members of the Cyber Crime Unit of the Malta Police Force. It showed that the models initially trained on adult content could generalise well to the more severe child content.
The architectures described above were intended for single, still images. Similar issues arise in video content, where the problem is exacerbated by temporal ambiguity and, consequently, by the volume of data that needs to be processed. This ambiguity arises because pornographic segments may be embedded between segments that appear innocent or benign, so the entire video must be scanned to determine whether it contains pornographic content. To address this problem, we added a recurrent neural network (RNN) to the detection pipeline, which allows us to use the temporal information present in videos. Using this pipeline we are able to localise pornographic content within a longer video. In this way, LEAs may skip any benign parts of a video and review only the segments which the deep-learning algorithm flags as pornographic, considerably speeding up an investigation. In addition, we use the detection of sexual objects described above to estimate the severity or "harmfulness" of the pornographic content, allowing us to rank videos accordingly.
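As a simplified sketch of the localisation step, suppose the network has already produced a per-frame score for explicit content; flagged segments can then be read off by thresholding the scores and keeping sufficiently long runs of consecutive frames. The scores and thresholds below are illustrative only, not the project's actual values.

```python
def flag_segments(scores, threshold=0.5, min_len=3):
    """Group consecutive frames whose score meets the threshold into
    (start, end) frame segments, discarding runs shorter than min_len."""
    segments, start = [], None
    for i, s in enumerate(scores):
        if s >= threshold and start is None:
            start = i                      # a flagged run begins
        elif s < threshold and start is not None:
            if i - start >= min_len:       # keep only sufficiently long runs
                segments.append((start, i - 1))
            start = None
    if start is not None and len(scores) - start >= min_len:
        segments.append((start, len(scores) - 1))
    return segments

# Per-frame scores as a temporal model might emit them (illustrative values):
scores = [0.1, 0.2, 0.9, 0.8, 0.7, 0.1, 0.6, 0.9, 0.95, 0.2]
print(flag_segments(scores))  # [(2, 4), (6, 8)]
```

An investigator could then jump straight to frames 2–4 and 6–8, skipping the benign remainder of the video.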
The 4NSEEK project came to an end in June 2021. The contributions of the three research institutions have been grouped into a single tool which Law Enforcement Agencies around the world can obtain and use in the fight against child sexual exploitation. In addition to the image content descriptors described here, the tool includes other AI-based components, such as age estimators, file-name quality checkers and camera forensics. These add to the arsenal of tools available to LEAs, giving them a better chance of protecting our children from abuse.
This article describes work carried out under the 4NSEEK project which was funded with support from the European Commission under the Grant Agreement 821966. The work was carried out by Andre Tabone, Mark Borg, Stefania Cristina, Alexandra Bonnici and Kenneth Camilleri from the Department of Systems and Control Engineering, and Reuben Farrugia from the Department of Communications and Computer Engineering.
André Tabone, Kenneth Camilleri, Alexandra Bonnici, Stefania Cristina, Reuben Farrugia, and Mark Borg. 2021. Pornographic content classification using deep-learning. In Proceedings of the 21st ACM Symposium on Document Engineering (DocEng '21). Association for Computing Machinery, New York, NY, USA, Article 15, 1–10. DOI:https://doi.org/10.1145/3469096.3469867
Tabone, A., Bonnici, A., Cristina, S., Farrugia, R. A., & Camilleri, K. P. (2020). Private body part detection using deep learning. In Proceedings of the 9th International Conference on Pattern Recognition Applications and Methods, 205–211.