1. Preventing CSEA – the need for new tools and strategies

Child sexual exploitation and abuse (CSEA) was a problem before the Internet. However, the Internet opened up new and distinct opportunities for CSEA offenders, in the form of new and easily accessible spaces where children spend time without the presence of guardians such as their parents. Chat rooms are the most commonly reported online setting where the initial interaction between the offender and child takes place (Baumgartner et al., 2010; Bergen, 2014; Malesky, 2007; Mitchell et al., 2007; Webster et al., 2012; Wolak et al., 2004). Other reported settings are gaming platforms (Webster et al., 2012) and social networks (Mitchell et al., 2010a), which often have messaging and chat features that provide an opportunity for adults with a sexual interest in children to contact children directly (Broadhurst, 2020; Yar & Steinmetz, 2019). These online spaces are routinely used by children as young as eight years old (Broadhurst, 2020; Marcum, 2007; Shannon, 2008). Research has shown that many convicted adults had engaged in solicitations of a large number, sometimes hundreds, of children simultaneously (e.g. Leander et al., 2008; Seto, 2013; Webster et al., 2012). For example, in 2019 a 27-year-old male offender was sentenced for abusing 270 children, mainly boys under 14 years old, over a period of 2.5 years (18-130781MED-NERO), and in 2017 a 51-year-old male offender was sentenced for committing CSEA against 111 children, mainly 11–12-year-old girls, over a period of 11 years (LE-2016-99126).

Crime statistics and numerous reports from organizations such as ECPAT (2018) and Europol (2018; 2019) show that offenders efficiently exploit these opportunities, with the result that abuse may take place in the childʼs bedroom with parents at home, unaware of the ongoing crime. ECPAT (2020) states the following about grooming on its website: “The proliferation of social media, messaging and live-streaming apps in recent years has seen a dramatic increase in reports of this crime”.

Zooming in on the problem in a Nordic context, Kloppen et al. (2016) conducted a literature review of child sexual abuse in the Nordic countries covering 1988–2013. The reported prevalence of CSEA ranges from 3.1% to 22.5% for boys and from 11.2% to 35.6% for girls, but the share of online CSEA is not specified. Aanerød and Mossige (2018) studied the scope of online CSEA in Norway between 2015 and 2017, and found that during this period there was a considerable increase in notifications to the National Criminal Investigation Service (NCIS) about CSEA. They also found an increase in cases opened for criminal investigation, and that the gravity of the cases was worsening.

In 2017, fighting online CSEA was adopted by the EU as one of ten top priorities in the fight against organised and serious international crime (Council of the European Union, 2017). At the same time, the Norwegian government decided on the Escalation Plan against Violence and Sexual Abuse for 2017–2021 (Prop. 12 S (2016–2017)). On the international level, there has been collaboration and several joint police efforts against CSEA, i.a. through the development of INTERPOLʼs Child Sexual Exploitation Database (ICSE) (INTERPOL, 2017), and joint investigation teams coordinated through Europol (Europol, 2020).

Identifying effective strategies for preventing online CSEA has been, and still is, an unsolved challenge. However, by making use of automation and current machine learning technology trained to recognise CSEA offendersʼ behavioural and linguistic patterns, the police may more efficiently and accurately identify online spaces where children are at risk of CSEA. To this end, we introduce PrevBOT (short for ‘crime preventive robotʼ). At the moment, PrevBOT is merely a concept, but if developed, it may be used by the police to prevent online CSEA initiated in a chatroom environment. Inspired by a desire to contribute to a safer online environment for children, we have explored this potential tool from different perspectives, drawing on knowledge and research concerning online automatic policing, forensic linguistics, criminology, machine learning, and the law. From the outset, it is clear that such a tool must be carefully developed, maintained, and implemented in a manner that ensures fair treatment of citizens. Thus, we have set out to comprehensively identify and analyse conditions decisive for the development and operation of PrevBOT in an effective, fair, and lawful manner. The research questions we explore are (a) whether machine learning applied to Authorship Analysis is suitable for developing PrevBOT; and (b) whether PrevBOT may be realized in accordance with fundamental rights to data protection, privacy and fair trial. Part I centres on the first research question, and Part II (Sunde & Sunde, 2021) focuses on the second.

To explore the problem PrevBOT is tasked to prevent, we draw on research on the online CSEA phenomenon, offender demographic characteristics, and offender behavioural and linguistic patterns, described in Section 2. In Section 3 we describe the PrevBOT concept, and how relevant features may be implemented in the PrevBOT technology for the purpose of detecting potential CSEA offenders in online chat rooms. The computational methods and technology developments that may realise PrevBOT as a fully functioning tool, are outlined and discussed in Section 4. The PrevBOT concept as such is rooted in situational crime prevention (SCP) and Extension theory perspectives, which are elaborated in Section 5. This prepares the ground for a legal analysis of perspectives related to development and implementation of PrevBOT as a tool supporting the police in preventing online CSEA, offered in Part II (Sunde & Sunde, 2021).

The scope of this paper and the many perspectives involved entail some necessary compromises. We have excluded issues concerning the practical aspects of implementing and maintaining/updating PrevBOT. Finally, the article primarily addresses preventive police work in a Norwegian – and to some extent Nordic – context, where we have gained our professional expertise and experience.

2. Background and literature review

The literature review centres on the online grooming process, in which offenders communicate with a child via the internet, and excludes research concerned with child sexual abuse material (CSAM) offences, although many offenders have committed both types of offences (Shelton et al., 2016). Sexual grooming refers to the process whereby a potential offender prepares a child for sexual abuse. With the intention of committing a sexual offence against a child, the offender moves forward, applying a specific set of steps, each with a specific goal, including secrecy, compliance and ultimately gaining access to the child (Craven et al., 2006). OʼConnell (2003) described grooming as a five-staged process, comprising the friendship-forming, relationship-forming, risk assessment, exclusivity, and sexual stages. Although the grooming process has evolved since 2003, especially reflected in the high-speed process we see today (see Section 2.2), OʼConnellʼs model still fits the modus operandi of grooming.

2.1 Who are the CSEA offenders?

The vast majority of CSEA offenders are men (Bergen, 2014; Briggs et al., 2011; Mitchell et al., 2010b; Webster et al., 2012), often between 25 and 45 years old (Bergen, 2014; Lanning, 2010). The proportion of women reported in e.g. victim surveys is estimated at 15–25% (Bergen, 2014; Wolak et al., 2006). Research has suggested various typologies, such as “contact driven” vs. “fantasy driven” offenders (Briggs et al., 2011; Mitchell et al., 2007); “stalkers”, “cruisers”, “masturbators”, “networkers” or “swappers”, or combinations of these (Hall & Hall, 2007); and “situational” vs. “preferential” offenders (Lanning, 2010). Online CSEA offenders often have considerable knowledge about technological measures that help protect their identity online, such as prepaid mobile phone cards and encrypted chat channels (Bergen, 2014; Webster et al., 2012). They may be paedophilic (sexual interest in prepubescent children) or hebephilic (sexual interest in pubescent children, typically 11–14), but not necessarily, and the proportion varies from study to study. For example, Seto et al. (2012) studied a sample of convicted CSEA offenders and found that 1.4% self-reported a paedophilic disorder, and 30% reported a hebephilic sexual interest. Krueger et al. (2009) found that 27% of their sample of adults arrested for soliciting youth had a paedophilic disorder. It thus seems that only a minority of adults who have solicited children or youths can be considered paedo- or hebephilic (Bergen, 2014; Seto et al., 2012).

2.2 Grooming behavioural characteristics

Regarding the behavioural characteristics of online CSEA offenders, research indicates that hiding or masking oneʼs identity online is common (Bergen, 2014; Briggs et al., 2011; Dowdell et al., 2011; Malesky, 2007; Seto et al., 2012; Shannon, 2008; Wolak et al., 2004). Bergen (2014) found that pretending to be younger than oneʼs actual age, pretending to be a child or teenager, and using a false picture were the three most common ways of masking identity, which was consistent with earlier research. Another common masking strategy found by Bergen (2014) was to lie about gender. The strategy of lying about age/gender has been used in several Norwegian CSEA cases (see e.g. verdicts 18-130781MED-NERO; TNERO-2015-155920).

Online CSEA offenders commonly target strangers (Ferreira et al., 2011; Finkelhor et al., 2000), and not exclusively children, but also adults (Bergen, 2014; Briggs et al., 2011). Malesky (2007) found that offenders appeared to be predominantly motivated to contact children and teenagers who posted sexual content in any form, e.g. in the online profile, screen name, or messages. Online CSEA offenders often use persuasion techniques when they approach the child (Briggs et al., 2011; Malesky, 2007): by offering money or gifts (Bergen, 2014; Shannon, 2008; Wolak et al., 2004), by blackmail (Shannon, 2008), or by promising love or affection (Bergen, 2014; Marcum, 2007). The online environment allows a high-speed grooming process from the initial engagement to the offending outcome (ECPAT, 2016; Kripos, 2019; Sunde, 2019).

Another common characteristic of grooming behaviour is applying measures to manage risk or avoid detection by attempting to isolate the child, e.g. by asking the child to move the interaction to a more private setting such as Skype (Bergen, 2014), and to keep the interactions a secret from parents and peers (e.g. Briggs et al., 2011; Olson et al., 2007; Webster et al., 2012). It is also commonly reported that online CSEA offenders send CSAM to the child in an attempt to normalize and facilitate online or offline sexual contact (Bergen, 2014; Berson, 2003; Kloess et al., 2017; Marcum, 2007; Wolak et al., 2008). Research suggests that non-sexual conversations also play an important role in the grooming process (Grosskopf, 2010; Webster et al., 2012). A solicitation process often begins with what OʼConnell (2003) calls the friendship-forming stage, followed by the relationship-forming stage. These do not normally include sexual topics.

2.3 The linguistics of grooming

In addition to research on behavioural patterns, there is a growing body of research on the linguistics of grooming, in which computational methods have gained ground. OʼConnellʼs (2003) five-staged model of the online grooming process has been the starting point of several research projects exploring the linguistics of grooming. After studying chat logs from grooming conversations, Gupta et al. (2012) found that the ‘relationship formingʼ stage was the most prominent stage of online grooming. Black et al. (2015) examined strategic differences between face-to-face grooming and online grooming and found that stages that would occur later in the process during face-to-face grooming, such as risk assessment, more often occur early in the process within an online environment. They also found that several strategies known from offline offending were involved, but that within the online environment they were employed concurrently and in a more expedient manner.

Authorship Analysis is a branch within forensic linguistics where computational methods have gained ground, and is defined as “the task of inferring characteristics of a documentʼs author, including but not limited to identity, from the textual characteristics itself” (Juola, 2007, p. 120). The sub-domains Authorship Attribution and Author Profiling are particularly relevant here.

Author Profiling is aimed at classifying authors into different groups, such as by age, gender, native language, personality traits, or emotions. Central to this development are “the PAN shared tasks” (https://pan.webis.de/), i.e. a series of scientific events and shared tasks on digital text forensics and stylometry, where researchers and practitioners are invited to work on specific problems of interest. Stylometry is the analysis of literary style features that can be statistically quantified, such as sentence length, vocabulary diversity, and frequencies of words and word forms. Since 2013, there have been several PAN competitions aimed at age and gender detection. Gender can be detected with high accuracy across different languages. The best results were achieved in Portuguese (85.75%), followed by Spanish (80.36%), English (74.29%) and Arabic (68.31%) (Rangel et al., 2017). By combining stylometry features with keystroke dynamics when analysing chat data from Skype, Li et al. (2019) achieved gender prediction with 72% accuracy.
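
The statistically quantifiable style features just described can be sketched in a few lines of Python. The feature set below (average sentence length, type-token ratio, and two function-word frequencies) is a minimal illustration of stylometry, not the feature set of any of the PAN systems cited above.

```python
import re
from collections import Counter

def stylometric_features(text: str) -> dict:
    """Compute a handful of classic stylometry features from a text sample."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    word_counts = Counter(words)
    n_words = max(len(words), 1)
    return {
        # average sentence length, in words
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        # vocabulary diversity (type-token ratio)
        "type_token_ratio": len(word_counts) / n_words,
        # relative frequency of two common function words
        "freq_the": word_counts["the"] / n_words,
        "freq_i": word_counts["i"] / n_words,
    }

feats = stylometric_features("I went home. I saw the dog and the cat. The dog barked!")
```

Feature vectors of this kind would then be fed to a classifier trained to separate, e.g., age or gender groups.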

Others have focused on adults pretending to be children. Meyer (2015) used a feature vector approach on a very small data set and achieved perfect results (100%). However, all of the adults pretending to be children in the data set were police officers, who may have overplayed their role and used misspellings, slang, abbreviations, and emoticons more than real children would have done. Yet Meyerʼs results are promising in terms of distinguishing a real child from an adult pretending to be a child.

Within the Author Profiling domain, researchers have also aimed at using computational methods to detect the native language of someone communicating in a secondary language (Malmasi et al., 2017). Similar to the PAN competitions, shared tasks for native language identification on different corpora have been arranged, in 2013 (on essays), 2016 (on transcripts of speech), and 2017 (on both essays and speech transcripts). The teams participating in the competition used different methods and algorithms and were able to detect native language with up to 93% accuracy (Malmasi et al., 2017). Others have focused on using computational methods to distinguish between online CSEA offenders and others, based on their conversation features (Pendar, 2007). The classification may result from linguistic features, such as the choice of words. It may also be done by analysing behavioural features, such as how often questions are asked or how conversations are started (Inches & Crestani, 2012). Recent research by Borj and Bours (2019), aimed at detecting “predatory conversations”, has also yielded promising results. Having applied various classification techniques, they succeeded in detecting “predatory conversations” with up to 98% accuracy. However, much research on Author Profiling in the context of grooming and online CSEA is based on datasets with sexual conversations from the USA-based non-governmental organisation Perverted-Justice (http://www.perverted-justice.com). These datasets do not contain conversations with real children. This limitation has some bearing on the development of PrevBOT, and is elaborated further in Section 4.3.
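
As an illustration of how conversation features such as word choice can drive such a classification, the sketch below implements a minimal Naive Bayes text classifier in plain Python. The training examples and the labels “risk”/“neutral” are invented for the illustration; this is not the method of Pendar (2007) or Borj and Bours (2019), whose systems train on far larger corpora.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text: str) -> list:
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayesTextClassifier:
    """Minimal multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.label_counts = Counter(labels)      # label -> document count
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)

    def predict(self, text: str) -> str:
        tokens = tokenize(text)
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, float("-inf")
        for label in self.label_counts:
            # log prior plus smoothed log likelihood of each token
            score = math.log(self.label_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokens:
                count = self.word_counts[label][tok] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy, hypothetical training data -- purely illustrative
clf = NaiveBayesTextClassifier()
clf.fit(["hi how old are you", "send me a picture",
         "did you finish your homework", "what game are you playing"],
        ["risk", "risk", "neutral", "neutral"])
```

Real systems would of course add behavioural features (question frequency, conversation openings) alongside the word counts, as Inches and Crestani (2012) describe.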

Authorship Attribution is another potential technological innovation that may be incorporated in PrevBOT. Authorship Attribution is concerned with analysing writing style as a means of determining the true author of a written text (Locker, 2019; Zhao & Zobel, 2007). The technology is based on the assumption that people use language differently, and that the difference can be observed with a high degree of certainty, similar to physical fingerprints in dactyloscopy (Omar & Deraan, 2019). A “linguistic fingerprint” may thus be obtained through the process of collecting linguistic data and features that mark a speaker/writer as unique (Olsson, 2008; 2010). Several approaches may be used to substantiate authorship, i.a. stylometric, textual, grammatical, and sociolinguistic approaches (see overview in Koppel et al., 2009).

While a long text may provide an adequate set of features for identifying the author, chat room conversations between a potential offender and a victim often occur in short messages. Short texts such as tweets, SMS and chat messages have been subject to research with a variety of techniques and algorithms (see overview in Swain et al., 2017). For example, Ishihara (2017) demonstrated how forensic text comparison could be used on chat conversations of various lengths from 500 to 2500 tokens. Omar and Deraan (2019) found that the inclusion of different variables into an integrated system leads to improved Authorship Attribution performance on short texts. A combination of analysing lexical features and letter-pair frequencies resulted in an accuracy of 76%.
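
The letter-pair idea just mentioned can be illustrated with a toy “linguistic fingerprint”: a frequency profile of adjacent letter pairs, compared with cosine similarity. This is a sketch of the general idea only, and makes no claim about the integrated system of Omar and Deraan (2019).

```python
import math
from collections import Counter

def letter_pair_profile(text: str) -> Counter:
    """Frequency profile of adjacent letter pairs, ignoring case and non-letters."""
    letters = [c for c in text.lower() if c.isalpha()]
    return Counter(a + b for a, b in zip(letters, letters[1:]))

def cosine_similarity(p: Counter, q: Counter) -> float:
    """Cosine similarity between two frequency profiles (1.0 = identical profile)."""
    dot = sum(p[k] * q[k] for k in set(p) | set(q))
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0

profile = letter_pair_profile("hey whats up, u wanna play later?")
```

In an attribution setting, a profile computed from a suspectʼs known writing would be compared against profiles from unattributed chat messages, with high similarity serving as one indicator among several, never as proof on its own.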

2.4 Technology based crime prevention

Much of the prevention effort concerning online CSEA has consisted of awareness-raising and training of potential victims, through cyber-safety websites directed towards children or parents, or through school-based ICT safety education (Wurtele & Kenny, 2016). Technology-based crime prevention has been, and still is, subject to much research and development (see e.g. Dilek et al., 2015; Li et al., 2010; Lin et al., 2017; McClendon & Meghanathan, 2015). There are several examples of technology-based crime prevention initiatives directed toward child sexual abuse material, such as the Europol initiative Police2Peer (Europol, 2021), as well as developments specifically aimed at online CSEA offence prevention, such as Sweetie 2.0.

Sweetie 2.0 was developed both as a chatbot and a computer-animated virtual 10-year-old Filipino girl (Terre des Hommes, 2015; van der Hof et al., 2019). Its purpose was to prevent webcam child sex tourism. During the 10 weeks Sweetie was online, the researchers identified one thousand potential offenders from 71 countries. The file was handed over to Europol for distribution to the respective countries, leading to many arrests and convictions (Guyt, 2019).

A recent development, AiBA (Author input Behavioural Analysis) (Aakervik, 2020; Furberg, 2019), applies linguistic and behavioural patterns, such as the use of words and writing patterns, to predict the gender and age of participants in online conversations (see Section 2.3 for an overview of relevant research). The purpose is to warn the child if AiBA detects that the childʼs conversation partner is an adult posing as a child. The developers aim to implement AiBA on gaming platforms during 2021 (Aakervik, 2020).

In summary, the literature review shows that chat rooms are relevant digital spaces for CSEA crime prevention, and that the features age, gender and sexualised speech (“the CSEA risk features”) are relevant for identifying potential CSEA offenders in such spaces. None of the features in isolation would provide a sufficiently accurate prediction; hence they should be applied in combination. While the feature sexualised speech could be identified by a human, predicting true age and gender from a written conversation often requires computational methods.

The technology of PrevBOT is inspired by the Sweetie 2.0 and AiBA concepts, which use automation and machine learning for their predictions. However, PrevBOT diverges in several ways as to how it should be developed, and where and how it should be put into operation. The PrevBOT concept is described in the next section, and the technology on which it is based in Section 4.

3. The PrevBOT concept

We envisage PrevBOT as a tool used by police officers to prevent CSEA. PrevBOT is modelled after Sweetie 2.0 (Terre des Hommes, 2015), which can observe open conversations and interact automatically in chat rooms (Schermer et al., 2019). In addition to these capabilities, PrevBOT would be developed with elements of machine learning and forensic linguistics inspired by the AiBA technology (Aakervik, 2020; Furberg, 2019), enabling it to make predictions and computations that improve the policeʼs knowledge base for initiating preventative measures against CSEA.

Figure 1.

PrevBOT / police decision-making process for classification of Problematic Spaces (PS) and Problematic Persons (PP)

3.1 The value of PrevBOTʼs computations and predictions

Based on a machine-learning algorithm, PrevBOT generates predictions and computations about information relevant to CSEA prevention. The output is probability statements which, as such, carry a degree of uncertainty. The PrevBOT decision-making process is concerned with two main objectives: First, to classify Problematic Spaces (see “PS” in Figure 1) and second, to classify Problematic Persons (PP).

The questions directed to PrevBOT in terms of classification of a Problematic Space are:

  1. Is there sexualised speech in the chat room?

  2. Are there children and adults in the chat room?

If both of these questions are answered in the affirmative, the chat room is flagged as a Problematic Space, which should be monitored more closely by the police.

When a chatroom is flagged as a Problematic Space, PrevBOT is tasked with predicting whether there are Problematic Persons (PP) in the chat room. The main questions directed to PrevBOT in order to generate this output are:

  1. Among the participants in chat conversations, are there any known CSEA offenders?

For this question, PrevBOT may use “linguistic fingerprints” computed from known CSEA offendersʼ former conversations, to compare with future online conversations. In this way, PrevBOT may provide reliable assessments about the identity of certain participants in a chat room, if the participant is known to the police from previous CSEA cases. If PrevBOT does not classify the person as a known CSEA offender, the next questions are concerned with what we refer to as “CSEA risk features” in Figure 1:

  2. Among the ongoing chat conversations, are there any that can be classified as “sexualized speech”?

  3. Among the chat conversations classified as sexualized speech, are there adults chatting with children?

  4. Among the chat conversations classified as sexualized speech, are there adults of a different gender than indicated in the user profile?

“Sexualized speech” is a broad non-legal concept that includes i.a. grooming. The narrower concept of “grooming” is legally defined and concerned with the offenderʼs attempts to arrange to meet the child for sexual purposes, cf. i.a. the Council of Europe Convention on the Protection of Children against Sexual Exploitation and Sexual Abuse, Article 23 (the so-called “Lanzarote Convention” from 2007). In addition to grooming, “sexualized speech” encompasses lascivious speech, communication about and exchange of sexualized images, etc. Whether or not sexualized speech meets the legal criteria of grooming – which are often complex and hard to fulfill (Sunde, 2020) – it is a risk indicator of CSEA.

PrevBOTʼs predictions may help unmask online participants who look for opportunities to commit CSEA under the guise of false age/gender. The immediate idea we get about age and gender from a glimpse of a person in physical space does not materialize online. How is one to trust that the user profile “Lisa 12” is, in fact, in use by a 12-year-old girl, and not, say, by an adult sex offender? PrevBOTʼs predictions may provide the police with information that is undetectable to humans, or difficult to make sense of. The likelihood ratio for identification computed by PrevBOT is an investigative lead, not to be perceived as evidence of identity. The Sweetie project (Terre des Hommes, 2015) shows that, when confronted with the high frequency of sexualized speech online, the limitations of human cognition make the CSEA problem impossible to deal with without the support of suitable tools. Accordingly, should sexualized speech occur in combination with a false profile concerning age and/or gender, in a public chat room intended for or de facto used primarily by minors, then one may flag a CSEA risk. On this basis, it is possible to flag certain chat rooms as Problematic Spaces, and certain participants as PPs (Problematic Persons). From there, further crime prevention or investigation strategies may be deployed; however, their exact nature must be determined in light of concrete circumstances.
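
The decision flow of Figure 1 and the questions above can be summarised in a few lines of code. This is a deliberately simplified boolean sketch: the threshold of 0.8 is a hypothetical value, and in practice PrevBOTʼs output would be probability statements assessed by a police officer rather than hard flags.

```python
def flag_problematic_space(p_sexualised_speech: float,
                           p_children_and_adults: float,
                           threshold: float = 0.8) -> bool:
    """Flag a chat room as a Problematic Space (PS) only if both
    questions are answered in the affirmative."""
    return p_sexualised_speech >= threshold and p_children_and_adults >= threshold

def flag_problematic_person(p_known_offender: float,
                            p_sexualised_speech: float,
                            p_adult_with_child: float,
                            p_gender_mismatch: float,
                            threshold: float = 0.8) -> bool:
    """Flag a participant as a potential Problematic Person (PP).
    A strong linguistic-fingerprint match flags directly; otherwise the
    CSEA risk features are applied in combination, never in isolation."""
    if p_known_offender >= threshold:
        return True
    return (p_sexualised_speech >= threshold and
            (p_adult_with_child >= threshold or p_gender_mismatch >= threshold))
```

Note that, as in the text, sexualized speech alone flags nothing; it must coincide with an age or gender anomaly.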

The choice of the expression “Problematic Person” is a reaction to the word “predator”, often used in the technical literature on the CSEA subject. In our text, “predator” will only occur in citations including that word. With respect to human dignity, a characterization that reduces individuals to animals is not acceptable (UDHR, 1948, Article 1; specifically relating to law enforcement, cf. CoC, 1979, Article 2). Particularly in the present context, the point is not to be underestimated, as safeguarding human dignity is a major concern in strategies regarding artificial intelligence (see, e.g., European Commission, 2019, p. 19).

3.2 Using PrevBOT online

PrevBOT will operate at the user level, not requiring support or installation at the server level of chat room providers. The police decide which chat rooms PrevBOT should enter, and monitor its output. Based on the output, a chat room may be identified by the police as a Problematic Space. PrevBOT can enter multiple chat rooms simultaneously and generate predictions and computations about whether a chatroom should be classified as a Problematic Space where potential PPs are present.

PrevBOT can classify someone as a potential PP either when passively “observing” open chat conversations or when automatically interacting in them, for example with a profile of a 13-year-old girl. PrevBOT is connected to a database with chat conversations from convicted CSEA offenders. Through comparison with the conversations in the chat rooms, PrevBOT may be capable of identifying PPs who have resumed unlawful online activity (“linguistic fingerprinting”).

Passive “observation” involves entering a chatroom, capturing the ongoing conversation, but not interacting. Capturing can be done in several ways, for example by using logging functionality or scraping the text-based conversation. Since the interfaces of chatrooms vary, some customization for the different platforms would be necessary. The technical details and requirements are outside the scope of this paper.

When PrevBOT interacts in a chat, and the output indicates that its conversation partner is a potential PP, the chat is “flagged” and the police are alerted. The police officer operating PrevBOT may decide to send a preventative message informing the PP that he or she has chatted with the police and that contacting a child for sexual purposes is illegal. The PP could also receive notice that the police have logged the chat and the IP address, due to the potentially problematic behaviour around children. In addition, as a means of achieving a lasting preventive effect, the police may use the interaction to motivate the PP to seek professional help, and to inform about relevant programs for sex offender treatment, such as the Proteus program (Kripos, 2019). To account for the risk of messaging an innocent person (a false positive), the message can facilitate easy feedback to the police, for example through a feedback link. The link should provide information on how to file a complaint if the recipient feels unfairly targeted, along with contact details for a complaints service in the police.

As further explained in the next section, PrevBOTʼs robot technology and its machine learning algorithm provide information crucial to the initiation of preventative measures against concrete risks of online CSEA. By enhancing the policeʼs ability to make sound assessments of the presence of PPs, PrevBOT may thus prove to be a targeted and useful tool to this end.

4. The technologies: Robotics, machine learning and Authorship Analysis

In this section, we discuss technical and domain-specific conditions for implementing Authorship Analysis in PrevBOT. The aim is a well-functioning tool as already described, adapted to the CSEA problem in the form in which it occurs in a local or national context.

The PrevBOT concept is based on robot technology and automation. By “robot” we mean “a machine controlled by a computer that is used to perform jobs automatically”. The robot is not integrated into the software through which it operates, but interacts at the human level (clicking buttons, typing text, etc.) (Asquith & Horsman, 2019). The challenge concerns the use of automated Authorship Analysis, i.e. PrevBOTʼs machine learning component. This demands great care and consideration in the phase of developing the tool, which is the focus of the present section. We acknowledge that after PrevBOT has been put into operation, it must be tested and updated regularly to ascertain the continuous validity of the output, but the details of how this should be conducted will not be explored further in this paper.

4.1 Building a machine-learning system for PrevBOT

The development phase is critical for obtaining optimal solutions to the problem one seeks to solve, and raises legal problems of its own. This motivates a quite detailed description of the development phase.

The first task is to define the problem and determine if machine learning may solve it (INTERPOL & UNICRI, 2019). Specifically, it is necessary to determine whether parts of human decision-making may be quantified and turned into data. Concerning PrevBOT, this was addressed in Section 2 in relation to Authorship Analysis in a machine learning context.

The development goes through four sub-phases. However, building a machine-learning model is a highly iterative sub-process, and revisiting the steps several times is often necessary to improve performance (Kononenko & Kukar, 2007). First, data sets need to be prepared and cleaned, e.g., by formatting and dealing with missing data (Kononenko & Kukar, 2007). Second, feature engineering is performed, that is, raw data is transformed into relevant, informative, discriminative, and non-redundant features for the PrevBOT algorithm. In this step, domain expertise is crucial (see Section 4.2).
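
For chat data, the preparation and cleaning step might look as follows. The routine is an illustrative sketch of the kind of formatting and missing-data handling meant here, not a prescribed pipeline.

```python
def clean_chat_log(raw_lines):
    """Prepare raw chat lines for feature engineering:
    drop missing entries, collapse whitespace, lowercase."""
    cleaned = []
    for line in raw_lines:
        if line is None:                # missing data
            continue
        line = " ".join(line.split())   # normalise whitespace
        if line:                        # drop empty lines
            cleaned.append(line.lower())
    return cleaned
```

The subsequent feature engineering step would then turn each cleaned line (or conversation) into the informative, discriminative features described above.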

The third step is data modelling, which means developing a model based on a learning algorithm (Kononenko & Kukar, 2007), trained on the data sets prepared for this task (“training data”). The learning algorithm develops “knowledge” by acquiring unique properties based on the training data. Obviously, an algorithm will generate different learning models according to variations in the data input. The generated knowledge is often called “the model” (Kononenko & Kukar, 2007).

Two aspects are particularly important concerning the training data. First, the quality and amount of data influence the learning model (Vestby & Vestby, 2019). A model developed on data sets generated by real-life activity (e.g. real chat logs between offenders and child victims) will be different from a model developed on manufactured data sets mimicking real-life activity (e.g. the data in the study by Meyer (2015), see Section 2.3). Second, the language of the training data plays a role if the model is to be used in a text-based approach (e.g. Sboev et al., 2016). A model trained on text in the English language is thus expected to be different from a model trained on, say, one of the Nordic languages. For PrevBOT to be as effective as possible, its model should be trained on text in the language of the nation planning to use it. In a Nordic context, this means that each of the Nordic countries must create its own unique PrevBOT, developed on chat logs in Norwegian, Swedish, Danish, Finnish, and Icelandic, respectively.

The fourth and final step of building the machine-learning model is performance measurement, i.e., measuring the modelʼs success rate on the tasks assigned to it when applied to new data (Kononenko & Kukar, 2007). For PrevBOT, this entails assessing its performance on the CSEA risk features outlined in Section 3.1, and its capability to recognize known CSEA offenders, based on chat conversations that were not included in the training data.
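
Performance measurement on held-out data can be sketched as follows. The predictions, ground-truth labels, and the choice of accuracy, precision, and recall as metrics are illustrative assumptions, not prescriptions from the PrevBOT concept:

```python
def evaluate(predictions, truths):
    """Measure performance on held-out conversations (data the model
    never saw during training): accuracy overall, plus precision and
    recall for the hypothetical positive class "risk"."""
    pairs = list(zip(predictions, truths))
    tp = sum(p == t == "risk" for p, t in pairs)
    fp = sum(p == "risk" and t != "risk" for p, t in pairs)
    fn = sum(p != "risk" and t == "risk" for p, t in pairs)
    return {
        "accuracy": sum(p == t for p, t in pairs) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical predictions on five held-out conversations.
preds = ["risk", "neutral", "risk", "neutral", "risk"]
truth = ["risk", "neutral", "neutral", "neutral", "risk"]
print(evaluate(preds, truth))  # accuracy 0.8, precision 2/3, recall 1.0
```

Reporting precision and recall alongside accuracy matters here, since falsely flagging an innocent chat participant (a false positive) and missing an offender (a false negative) carry very different costs.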

When the machine-learning model is ready, the police may use PrevBOT for patrolling online spaces. However, as noted above, continuous assessment of performance is important since no model can be trusted to operate perfectly over time, and a change in online behaviour among CSEA offenders may necessitate updating the model (INTERPOL & UNICRI, 2019; Kaufmann, 2019).
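
Such continuous assessment might be sketched as a rolling check on verified outcomes; the window size and threshold below are arbitrary illustrative values, not parameters from the PrevBOT concept:

```python
from collections import deque

def make_monitor(window=50, threshold=0.8):
    """Rolling check of the model's hit rate on verified outcomes.
    When the rate drops below the threshold, flag that the model may
    need retraining, e.g. because offender behaviour has changed.
    Window size and threshold are illustrative values only."""
    recent = deque(maxlen=window)

    def record(prediction_was_correct):
        recent.append(bool(prediction_was_correct))
        hit_rate = sum(recent) / len(recent)
        return {"hit_rate": hit_rate, "retrain": hit_rate < threshold}

    return record

monitor = make_monitor(window=4, threshold=0.75)
for ok in [True, True, True, False, False]:
    status = monitor(ok)
print(status)  # after the last outcomes: hit_rate 0.5, retrain flagged
```

The bounded window means old outcomes age out, so a genuine shift in online behaviour shows up as a falling hit rate rather than being averaged away over the modelʼs whole history.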

4.2 The importance of domain knowledge

Domain expertise is important when developing a tool, to ensure that relevant features of the domain in which it shall be used are taken into account. PrevBOT should therefore be developed on the basis of robust knowledge and data reflecting the particular characteristics of online CSEA (Kaufmann, 2017; Vestby & Vestby, 2019). Considering that the Authorship Analysis researchersʼ primary field of expertise is computer science/informatics, it might be a weakness that their assessments relating to CSEA are based upon research from other areas, such as cognitive and social psychology, or criminology. The risk is that their assumptions build on incomplete, uncertain, or biased knowledge.

An example is Authorship Analysis research about fantasy- vs. contact-driven perpetrators (Briggs et al., 2011; Mitchell et al., 2007). It has been suggested that contact- and fantasy-driven offenders differ in several meaningful ways, including demographics, deviant sexual interests, and chat progression (Babchishin et al., 2011; Shannon, 2008). Several Authorship Analysis research projects refer to this knowledge, aiming to develop models that differentiate between potential offenders in these categories (e.g. Ringenberg et al., 2019). However, according to a systematic review conducted by Broome et al. (2018), the categorisation does not seem to be valid. Many online sex offenders appeared to be flexible in how sexual contact took place, and clear distinctions between the categories in how offenders contact children or build relationships could not be detected. Nor did the reviewers discover clear patterns in how offenders take measures to evade detection or the extent to which they act threateningly towards the children.

The algorithms are perceived as successful when they make classifications with high accuracy. If Authorship Analysis technology trained and assessed according to the abovementioned categories is implemented, there is a risk that someone categorised as fantasy-driven might be given lower priority than someone categorised as contact-driven, even though the research shows that sex offenders are more flexible than this categorisation suggests (Broome et al., 2018). However, if expertise about online CSEA is included in the development phase, there is reason to expect that PrevBOT will engage with relevant and valid indicators.

4.3 The need for relevant datasets

As stated by Borj and Bours (2019), development in the field of forensic text analysis is tied to the availability of relevant data for feature extraction and for training the self-learning algorithm. PrevBOT must thus be trained on relevant and valid datasets to be able to predict and classify with high accuracy. As the availability of relevant corpora is very limited outside police organizations, PrevBOTʼs accuracy hinges on getting access, in the development phase, to conversations secured in criminal investigations of real CSEA cases.

Several of the studies mentioned in Section 2.3 are based on data sets from Perverted-Justice. This non-governmental organisation, which works against CSEA offenders, has published online chat logs in which PPs have had sexual conversations with individuals they thought were children but who were, in fact, adults cooperating with Perverted-Justice. For PrevBOT, data sets from Perverted-Justice are thus not considered sufficient training data, as they do not contain any real conversations with children.

Moreover, as noted in Section 4.1, PrevBOT must be trained on chat logs in the language of the jurisdiction intending to use it. The data sets from Perverted-Justice are in English and are therefore not suitable for training a PrevBOT tasked to operate in a non-English language. In a recent study, real chat material seized in Norwegian criminal investigations was used to develop a machine-learning model for the classification of grooming conversations (Bendiksen, 2019). This was also the first study of chat material in the Norwegian language; the best model achieved an accuracy of 89%. The study is an exception, since real chat logs are rarely subject to research due to the strict confidentiality of the material.

Provided that PrevBOT is trained on real-life data sets in the language of the jurisdiction intending to use it, its error rate will realistically reflect what one may expect once it is deployed online, and knowledge of its limitations also becomes more exact.

To summarise, we find current research promising for the possibility of building a PrevBOT capable of providing useful output relating to the CSEA risk features and of identifying known CSEA offenders who appear to be resuming illegal activity. The research also suggests that other features, such as native language, might be relevant to add as well. The research on features deduced from the procedural steps of the grooming process seems more uncertain due to the lack of a solid research base relating to CSEA offenders. More research is needed before considering its implementation in a tool such as PrevBOT. With regard to our first research question, we thus suggest it should be achievable for PrevBOT to classify PPs by predicting linguistic fingerprints or CSEA risk features with an acceptable level of certainty, provided the model is properly developed, trained, and updated.

5. Crime prevention perspectives

The problem of online CSEA is well-documented, widespread, and serious (ECPAT, 2018; Europol 2014-2019; Kripos, 2019; Sunde, 2019). In this section, we explore whether PrevBOT may be an effective tool for the prevention of online CSEA by discussing theoretical perspectives from situational crime prevention and Extension theory.

5.1 Situational crime prevention (SCP)

The PrevBOT concept is rooted in SCP perspectives, which focus on the immediate causes of criminal events, usually conceptualised as “opportunity” (Ekblom, 2017). SCP takes offendersʼ motivation for granted and instead aims at limiting their scope for offending and influencing their perceptions and decisions regarding criminal action (Ekblom, 2017). Rational choice approaches to criminal decision-making (Cornish & Clarke, 1986) and the interest in modus operandi (Cornish, 1994) are thus central in SCP (Ekblom, 2017). Hayward (2007) has criticised SCP for not tackling expressive crimes characterised by anger, hostility, and excitement, but Farrell (2010) argues that this is a misconception of the rational choice perspective of SCP, as if it were limited to a monetary goal. Importantly, intangible benefits should also be included (Farrell, 2010), which in the context of CSEA prevention could be perceived as opportunities for sexual arousal or satisfaction.

In the Rational Choice perspective, an opportunity emerges when the offender perceives risk and effort as low and reward as high (Cornish & Clarke, 1986). These perspectives have been used, i.a., to structure 25 generic techniques under five categories: increase the effort, increase the risks, reduce the rewards, reduce provocations, and reduce excuses (e.g. Clarke & Eck, 2003). Complementing the psychological Rational Choice approach is the Routine Activities perspective (Cohen & Felson, 1979), in which a likely offender encounters a suitable target in the absence of capable guardians. The Conjunction of Criminal Opportunity (CCO) (Ekblom, 2010; 2011) seeks to integrate the Rational Choice and Routine Activities approaches (plus others) on the situational and offender sides, with a view to forming a consistent and all-encompassing conceptual framework and a unified terminology (Ekblom, 2017).

CCO offers a twin perspective, examining both the proximal causes of criminal events and possible interventions against those causes, in order to reduce the eventsʼ likelihood and/or harm.

Under CCO, a criminal event happens when an offender who is predisposed, ready and equipped to offend (and lacking the resources to avoid offending) encounters, seeks or creates a situation containing a target that is valuable, attractive or provocative, in an enclosure and/or wider environment that is tactically insecure and perhaps motivating in some way, facilitated by the absence of ready and able preventers and perhaps too by the presence of deliberate or inadvertent promoters. When these preconditions are perceived to be met the offender decides to proceed. When they are blocked, weakened or diverted by security intervention, the offender either cannot so act, or decides that the perceived reward is not worth the effort and risk (Ekblom, 2017, p. 355).

5.2 The role of technology

CCO provides valuable perspectives for understanding the role of technology in SCP (Ekblom, 2010; 2011), i.a. by including offendersʼ resources for committing crime. This includes the offendersʼ technology resources, such as a fast internet connection, storage capacity, a webcam, software for recording video and still footage, a voice modifier, etc., all of which may enable and support offenders in committing online CSEA. In SCP, “opportunity” is typically considered an attribute of the situation, while in CCO it is an ecological interaction between situation and offender, often technology-mediated. It relates to “how agents encounter, seek or create a set of circumstances in which their resources enable them to cope with the hazards and exploit the possibilities in order to achieve their multiple goals” (Ekblom, 2017, p. 360).

The opportunity concept within CCO seems to be quite similar to the term “affordances”, which has been used by several scholars concerning SCP (e.g. Garwood, 2011; Quayle, 2020; Taylor & Quayle, 2006). Wortley (2012) explains affordances as the uses to which an environment, or an object within an environment, could be put in order to allow an individual to perform certain behaviours. Systematic and detailed mapping of the relationship between opportunity and technology will thus be important both for understanding the causes of crime, in our case online CSEA, and for developing effective prevention strategies.

5.3 The criminogenic qualities of public chat rooms

According to Ekblom (2017), the target “enclosure” must be identified, that is, the place where the targets of crime are contained. Online public chat rooms may be regarded as such enclosures in so far as they are designed for or frequented by children. Such chatrooms constitute affordances or opportunities (from a CCO perspective) for those who have a sexual interest in children. An open, unregulated, and/or not actively managed chatroom may have what are referred to as criminogenic qualities (e.g. Cornish & Clarke, 2003; Quayle, 2020), which can lead to crime due to the opportunities it presents to a potential offender with a sexual interest in children. PrevBOT is aimed at assisting the police in identifying these target enclosures and classifying them as problematic spaces that should be more closely monitored due to the CSEA risk factors.

Criminogenic qualities of information systems are characterised by elements referred to by the acronym SCAREM (Newman & Clarke, 2003). Exemplified in the setting where PrevBOT may be used, the SCAREM elements are Stealth (potential offenders may seek information about the child for grooming purposes while being invisible to the child or guardians); Challenge (a potential CSEA offender may see the grooming process as a challenge to frame the child); Anonymity (online environments are inherently anonymous); Reconnaissance (the internet erases the spatial obstacles of traditional environments and makes it possible to move quickly between online spaces); Escape (the stealth and anonymity aspects of the internet make it easy to avoid detection); and Multiplicity (the potential CSEA offender may approach and commit crimes in multiple spaces and against multiple children simultaneously).

5.4 A problem-oriented approach

SCP methods are usually selected, implemented, and evaluated through a problem-oriented approach (e.g. Bullock et al., 2006; Clarke & Eck 2003; Goldstein, 1990). Here, the problem is CSEA, and the strategy to counter the problem is threefold. First, PrevBOT will assist the police in identifying online spaces which provide opportunities for online CSEA to occur (Problematic Spaces where PPs are present). Second, it will interrupt the anonymity of the potential CSEA offenders (PPs) present in those Problematic Spaces. Third, it will use various means to influence the decision-making of potential CSEA offenders, through notification of detection (e.g. your IP address is logged by the police) and by offering help to stop contacting children for sexual purposes.

The aim of PrevBOT is opportunity reduction. Its main conceptual peg is increasing the risk run by potential CSEA offenders, through extending guardianship to online target enclosures and strengthening police surveillance by using technology to detect potential CSEA offenders. As explained, this is achieved by automatic detection of CSEA risk features (see Section 3.1). By detecting former CSEA offenders in digital target enclosures, PrevBOT also reduces anonymity. It can also be argued that interrupting the grooming process may reduce the rewards for a potential CSEA offender, and that the police obtaining information about problematic or possibly unlawful behaviour is a measure to remove excuses. Although personalised intervention programs have shown only equivocal evidence of effect (Quayle, 2020), the police have an ethical obligation to offer a sexual offender who wants help to stop offending information about the available programs. This has been done, i.a., in the Europol initiative Police2Peer (Europol, 2021) as a measure to reduce frustration or emotional arousal.

5.5 Extension theory

While technology may increase the opportunity for offending, it may conversely extend the ability to perform opportunity reduction of CSEA, through technical activities implemented in crime prevention. Here it is relevant to draw on Extension theory, which states that a technical activity is an activity that harnesses the intrinsic causal powers of material artefacts in order to extend human capabilities (Lawson, 2010). Together, technological extensions of human capabilities and technologically modified situations in which those capabilities are exercised, engender many opportunities both for crime and prevention (Ekblom, 2017).

Technical activities performed by CSEA offenders are aided by software- and hardware-based resources, enabling them to exploit misplaced trust, sometimes to an expert degree. Conversely, being an effective crime preventer means being equipped with appropriate applications and systems. As noted, increased presence could be achieved without PrevBOT, by engaging more police officers to patrol chatrooms. PrevBOT, however, extends the ability to be present in more spaces than otherwise possible, allowing more chatrooms to be entered and monitored per police officer. PrevBOT also extends the ability to recognize features of known CSEA offenders in chatrooms. Without technology, the ability to detect known offenders would be very limited, due to the constraints of human cognitive capacity in terms of memory and information processing. For the same reasons, detection of the CSEA risk indicators and linguistic fingerprints is impossible for a human to perform at all, let alone effectively. Implementing PrevBOT as a tool to assist online policing of chatrooms may thus be perceived as a technical activity for effective crime prevention.

We emphasize that the aim is not to discern unknown patterns based on big data analytics, nor to pursue theoretical assumptions about criminal behaviour, such as deviance (Kaufmann et al., 2018). PrevBOT addresses behaviour that appears normal per se, but indicates risk of CSEA due to its form (sexualized speech etc.), the use of false attributes (age/gender), recurring activity, or indications that the PP is a former CSEA offender.

While the extension perspective is optimistic, one must also acknowledge that the use of technology entails interaction effects between technology and other social and physical circumstances. These may generate unforeseen consequences, perhaps even neutralising the benefits (Ekblom, 2017; Tenner, 1996). For this reason too, it is necessary to monitor any new technology-mediated crime prevention measure closely in order to identify unwanted effects.

6. Summary

Concerning our overarching objective – identifying the multifarious conditions which must be dealt with to develop and deploy PrevBOT in a sound, fair and lawful manner – we consider the PrevBOT concept to be well justified also from a criminological perspective, enriched with extension theory. To summarise, we suggest that using PrevBOT in crime prevention is a technical activity that may extend the capability of the police to prevent CSEA in public chat rooms more precisely and effectively. It may thus be a useful tool for online CSEA prevention.

In Part I, we have now laid the conceptual cornerstones of PrevBOT in terms of phenomenological, technological, and criminological analysis. In Part II (Sunde & Sunde, 2021) we turn to the second research question, which requires a legal analysis and discussion.

We are grateful to the anonymous reviewers whose helpful comments, suggestions, and constructive criticism, have improved the content and sharpened the focus of this article. We also want to thank members of the PIDS research group (Policing in a Digitized Society) at the Norwegian Police University College for useful comments on an earlier version of this article.

International legal instruments


  • Aakervik, A-L. (2020, 2 November). AIBA – Avslører cybergrooming. https://ntnudiscovery.no/aiba-avslorer-cybergrooming/.
  • Aanerød, L. M. T., & Mossige, S. (2018). Nettovergrep mot barn i Norge 2015–2017. NOVA Rapport 10/18. http://www.hioa.no/content/download/148980/4145940/file/Nettutg-NOVA-Rapport-Nettovergrep-10-2018.pdf
  • Asquith, A., & Horsman, G. (2019). Let the robots do it! – Taking a look at Robotic Process Automation and its potential application in digital forensics. Forensic Science International: Reports, 1, 100007. https://doi.org/10.1016/j.fsir.2019.100007.
  • Bendiksen, J. (2019). Automated detection of perpetrators in grooming conversations in Norwegian (Masterʼs thesis). Norwegian University of Science and Technology.
  • Berson, I. R. (2003). Grooming cybervictims. The psychosocial effects of online exploitation for youth. Journal of School Violence, 2(1), 5–18. https://doi.org/10.1300/J202v02n01_02.
  • Borj, P. R., & Bours, P. (2019, October). Predatory conversation detection. In 2019 International Conference on Cyber Security for Emerging Technologies (CSET) (pp. 1–6). IEEE. https://doi.org/10.1109/cset.2019.8904885.
  • Briggs, P., Simon, W. T., & Simonsen, S. (2011). An exploratory study of Internet-initiated sexual offences and the chat room sex offender: Has the Internet enabled a new typology of sex offender? Sexual Abuse: A Journal of Research and Treatment, 23(1), 72–91. https://doi.org/10.1177/1079063210384275
  • Broadhurst, R. (2020). Child sex abuse images and exploitation materials. In R. Leukfeldt & T. J. Holt (Eds.). The human factor of cybercrime (pp. 310-336). Routledge.
  • Broome, L. J., Izura, C. & Lorenzo-Dus, N. (2018). A systematic review of fantasy driven vs. contact driven internet-initiated sexual offences: discrete or overlapping typologies? Child Abuse & Neglect, 79, 434–444. https://doi.org/10.1016/j.chiabu.2018.02.021.
  • Bullock, K., Erol, R., & Tilley, N. (2006). Problem-oriented policing and partnerships: Implementing an evidence-based approach to crime reduction. Taylor & Francis.
  • Clarke, R. V. & Eck, J. (2003). Become a problem-solving crime analyst. Jill Dando Institute of Crime Science, University College London.
  • Cohen, L. E. & Felson, M. (1979). Social change and crime rate trends: A routine activity approach. American Sociological Review, 588–608.
  • Cooper, A. (1998). Sexuality and the Internet: Surfing into the new millennium. CyberPsychology & Behavior, 1(2), 187–193.
  • Cornish, D. B. (1994). The procedural analysis of offending and its relevance for situational prevention. Crime Prevention Studies, 3, 151–196.
  • Cornish, D. B. & Clarke, R. V. (2003). Opportunities, precipitators and criminal decisions: A reply to Wortleyʼs critique of situational crime prevention. In M. J. Smith & D. B. Cornish (Eds.), Theory for practice in situational crime prevention, Vol. 16 (pp. 41–96). Criminal Justice Press.
  • Cornish, D. B. & Clarke, R. V. (Eds.). (1986). The reasoning criminal: Rational choice perspectives on offending. Springer-Verlag.
  • Council of the European Union (2017). Council conclusions on setting the EUʼs priorities for the fight against organized and serious international crime between 2018 and 2021. https://data.consilium.europa.eu/doc/document/ST-9450-2017-INIT/en/pdf
  • Craven, S., Brown, S., & Gilchrist, E. (2006). Sexual grooming of children: Review of literature and theoretical considerations. Journal of Sexual Aggression, 12, 287–299. https://doi.org/10.1080/13552600601069414.
  • Dilek, S., Cakır, H. & Aydın, M. (2015). Applications of artificial intelligence techniques to combating cyber crimes: A review. IJAIA, 6(1), 21–39.
  • Dowdell, E. B., Burgess, A. W. & Flores, J. R. (2011). Online social networking patterns among adolescents, young adults, and sexual offenders. American Journal of Nursing, 111(7), 28–36.
  • ECPAT (2016). Terminology guidelines for the protection of children from sexual exploitation and sexual abuse. February 28, 2016. Luxembourg: ECPAT International.
  • ECPAT (2018). Trends in online child sexual abuse material. Bangkok, April 2018: ECPAT International.
  • ECPAT (2020). Online child sexual exploitation. https://www.ecpat.org/what-we-do/online-child-sexual-exploitation/
  • Ekblom, P. (2010). The conjunction of criminal opportunity theory. In B. S. Fischer & S. P. Lab (Eds.), Encyclopedia of victimology and crime prevention, 1, (pp. 139–146). Sage.
  • Ekblom, P. (2011). Crime prevention, security and community safety using the 5Is framework. Palgrave Macmillan.
  • Ekblom, P. (2017). Crime, situational prevention and technology. In T. McGuire & T. J. Holt (Eds.). The Routledge handbook of technology, crime and justice (pp. 353–374). Routledge.
  • Ekblom, P. & Tilley, N. (2000). Going equipped. Criminology, situational crime prevention and the resourceful offender. British Journal of Criminology, 40(3), 376–398.
  • European Commission (2019). Ethics guidelines for trustworthy AI. April 8, 2019. Brussels: European Commission.
  • Europol (2021). Police2peer. Targeting file sharing of child sexual abuse material. https://www.europol.europa.eu/partners-agreements/police2peer
  • Europol (2014) – (2019). Internet organized crime threat assessment (iOCTA). Annual report. Europol/EC3.
  • Farrell, G. (2010). Situational crime prevention and its discontents: Rational choice and harm reduction versus ‘cultural criminologyʼ. Social Policy & Administration, 44(1), 40-66.
  • Ferreira, F., Martins, P. & Gonçalves, R. (2011, June). Online sexual grooming: a cross-cultural perspective on online child grooming victimization. Presentation during the 20th World Congress for Sexual Health, Glasgow, United Kingdom. http://hdl.handle.net/1822/16540.
  • Finkelhor, D., Mitchell, K. J. & Wolak, J. (2000). Online victimization: A report on the nationʼs youth. Alexandria. National Center for Missing & Exploited Children. http://www.unh.edu/ccrc/pdf/jvq/CV38.pdf.
  • Furberg, K. (2019, 29 October). NTNU-teknologi kan avsløre overgripere på nett. https://www.universitetsavisa.no/nyheter/ntnu-teknologi-kan-avslore-overgripere-pa-nett/116629.
  • Garwood, J. (2011). A quasi-experimental investigation of self-reported offending and perception of criminal opportunity in undergraduate students. Security Journal, 24(1), 37–51.
  • Gill, M. (2005). Reducing the capacity to offend: Restricting resources for offending. In N. Tilley (Ed.), Handbook of crime prevention and community safety, (pp. 306-328). Willan.
  • Goldstein, H. (1990). Problem-oriented policing. Temple University Press.
  • Grosskopf, A. (2010). Online interactions involving suspected paedophiles who engage male children. Trends and Issues in Crime and Criminal Justice, 403, 1–6.
  • Guyt, H. (2019). Foreword. In S. van der Hof, I. Georgieva, B. Schermer, & B. J. Koops (Eds.), Sweetie 2.0. Information Technology and Law Series, vol 31 (pp. vii-ix). Springer. https://doi.org/10.1007/978-94-6265-288-0_1.
  • Hayward, K. (2007). Situational crime prevention and its discontents: Rational choice theory versus the ‘culture of nowʼ. Social Policy & Administration, 41(3), 232–250.
  • Inches, G. & Crestani, F. (2012, September). Overview of the International Sexual Predator Identification Competition at PAN-2012. In CLEF (Online working notes/labs/workshop) (Vol. 30).
  • INTERPOL (2017, January 9). INTERPOL network identifies 10,000 child sexual abuse victims. https://www.interpol.int/News-and-Events/News/2017/INTERPOL-network-identifies-10-000-child-sexual-abuse-victims
  • INTERPOL & UNICRI (2019). Artificial intelligence and robotics for law enforcement.
  • Ishihara, S. (2017). Strength of linguistic text evidence: A fused forensic text comparison system. Forensic Science International 278 (2017) 184–197. https://doi.org/10.1016/j.forsciint.2017.06.040.
  • Juola, P. (2007). Future trends in authorship attribution. In P. Craiger & S. Shenoi (Eds.), Advances in digital forensics III (pp. 119–132). Springer. https://doi.org/10.1007/978-0-387-73742-3_8.
  • Kaufmann, M. (2017). The co-construction of crime predictions: Dynamics between digital data, software and human beings. In N. R. Fyfe, H. O. I. Gundhus, K. V. Rønn, & N. Fyfe. Moral issues in intelligence-led policing, (pp. 143–160). Routledge. https://doi.org/10.4324/9781315231259-8.
  • Kaufmann, M. (2019). Who connects the dots? Agents and agency in predictive policing. In Hoijtink, M. & Leese, M. (Eds.) Technology and agency in international relations, (pp. 141-163). Routledge.
  • Kaufmann, M., Egbert, S. & Leese, M. (2018). Predictive policing and the politics of patterns. British Journal of Criminology. 59(3), 674-692. https://doi.org/10.1093/bjc/azy060.
  • Kloess, J. A., Seymore-Smith, S., Long, M. L., Shipley, D. & Beech, A. R. (2017). A qualitative analysis of offendersʼ modus operandi in sexually exploitative interactions with children online. Sexual Abuse, 29(6), 563–591. https://doi.org/10.1177/1079063215612442.
  • Kloppen, K., Haugland, S., Svedin, C. G., Mæhle, M. & Breivik, K. (2016). Prevalence of child sexual abuse in the Nordic countries: A literature review, Journal of Child Sexual Abuse 25(1), 37-55. https://doi.org/10.1080/10538712.2015.1108944.
  • Kononenko, I. & Kukar, M. (2007). Machine learning and data mining. Horwood Publishing.
  • Koppel, M., Schler, J. & Argamon, S. (2009). Computational methods in authorship attribution. Journal of the American Society for information Science and Technology, 60(1), 9–26. https://doi.org/10.1002/asi.20961.
  • Kripos (2019). Seksuell utnyttelse av barn og unge på internett. [NCIS (2019) Online sexual exploitation of children and young people]. Kripos.
  • Krueger, R. B., Kaplan, M. S., & First, M. B. (2009). Sexual and other Axis I diagnoses of 60 males arrested for crimes against children involving the Internet. CNS Spectrums, 14, 623–631. http://www.cnsspectrums.com/aspx/articledetail.aspx?articleid=2568.
  • Lanning, K. V. (2010). Child molesters: A behavioral analysis. For professionals investigating the sexual exploitation of children (5th ed.). National Center for Missing and Exploited Children. (Report no NC70). http://www.missingkids.com/missingkids/servlet/ProxySearchServlet?keys=lanning.
  • Leander, L., Christianson, S. Å. & Granhag, P. Ä. (2008). Internet-initiated sexual abuse: Adolescent victimsʼ report about on- and off-line sexual activities. Applied Cognitive Psychology, 22, 1260-1274. https://doi.org/10.1002/acp.1433.
  • Li, G., Borj, P. R., Bergeron, L. & Bours, P. (2019, May). Exploring keystroke dynamics and stylometry features for gender prediction on chat data. In 2019 42nd International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO) (pp. 1049–1054). IEEE. https://doi.org/10.23919/mipro.2019.8756740.
  • Li, S.-T., Kuo, S.-C. & Tsai, F.-C. (2010). An intelligent decision-support model using FSOM and rule extraction for crime prevention. Expert Systems with Applications, 37(10), 7108–7119. https://doi.org/10.1016/j.eswa.2010.03.004.
  • Lin, Y. L., Chen, T. Y. & Yu, L. C. (2017). Using machine learning to assist crime prevention. Presented at the 2017 6th IIAI International Congress on Advanced Applied Informatics (IIAI-AAI) (pp. 1029–1030).
  • Locker, A. (2019). Because the computer said so!. Journal of Language Works – Sprogvidenskabeligt Studentertidsskrift, 4(1), 23–37.
  • Malesky, L. A. (2007). Predatory online behavior: Modus operandi of convicted sex offenders in identifying potential victims and contacting minors over the Internet. Journal of Child Sexual Abuse, 16, 23–32. https://doi.org/10.1300/J070v16n02_02.
  • Malmasi, S., Evanini, K., Cahill, A., Tetreault, J., Pugh, R., Hamill, C., Napolitano, D. & Qian, Y. (2017, September). A report on the 2017 native language identification shared task. In Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications (pp. 62–75). https://doi.org/10.18653/v1/w17-5007.
  • Marcum, C. (2007). Interpreting the intentions of Internet predators: An examination of online predatory behavior. Journal of Child Sexual Abuse, 16, 99–114. https://doi.org/10.1300/J070v16n04_06.
  • McClendon, L. & Meghanathan, N. (2015). Using machine learning algorithms to analyze crime data. MLAIJ, 2(1), 1–12. https://doi.org/10.5121/mlaij.2015.2101.
  • Meyer, M. (2015). Machine learning to detect online grooming. Uppsala University Masterʼs Thesis. http://uu.diva-portal.org/smash/get/diva2:846981/FULLTEXT01.pdf.
  • Mitchell, K. J., Finkelhor, D. & Wolak, J. (2007). Online requests for sexual pictures from youth: Risk factors and incident characteristics. Journal of Adolescent Health, 41, 196–203. https://doi.org/10.1016/j.jadohealth.2007.03.013.
  • Mitchell, K. J., Finkelhor, D., Jones, L. M. & Wolak, J. (2010a). Use of social networking sites in online sex crimes against minors: An examination of national incidence and means of utilization. Journal of Adolescent Health, 47(2), 183–190. https://doi.org/10.1016/j.jadohealth.2010.01.007.
  • Mitchell, K. J., Finkelhor, D., Jones, L. M. & Wolak, J. (2010b). Growth and change in undercover online child exploitation investigations. Policing & Society, 20(4), 2000–2006. https://doi.org/10.1080/10439463.2010.523113.
  • Newman, G. & Clarke, R. V. (2003). Superhighway robbery: Crime prevention and e-commerce. Willan Publishing.
  • OʼConnell, R. (2003). A typology of child cybersexploitation and online grooming practices. http://www.jisc.ac.uk/uploaded_documents/lis_PaperJPrice.pdf.
  • Olson, L. N., Daggs, J. L., Ellevold, B. L. & Rogers, T. K. K. (2007). Entrapping the innocent: Towards a theory of child sexual predatorsʼ luring communication. Communication Theory, 17(3), 231–251. https://doi.org/10.1111/j.1468-2885.2007.00294.x.
  • Olsson, J. (2008). Forensic linguistics: An introduction to language, crime and the law. Bloomsbury Publishing.
  • Olsson, J. (2010). Word crime: Solving crime through forensic linguistics. Continuum International Publishing Group.
  • Omar, A. & Deraan, A. B. (2019). Towards a linguistic stylometric model for the authorship detection in cybercrime investigations. International Journal of English Linguistics, 9(5). https://doi.org/10.5539/ijel.v9n5p182.
  • Pendar, N. (2007, September). Toward spotting the pedophile: Telling victim from predator in text chats. In International Conference on Semantic Computing (ICSC 2007) (pp. 235–241). IEEE. https://doi.org/10.1109/icsc.2007.32.
  • Quayle, E. (2020, September). Prevention, disruption and deterrence of online child sexual exploitation and abuse. In ERA Forum (pp. 1–19). Springer.
  • Quayle, E., Allegro, S., Hutton, L., Sheath, M. & Lööf, L. (2012). Online behaviour related to child sexual abuse. Creating a private space in which to offend – interviews with online child sex offenders. Council of the Baltic Sea States, Stockholm: ROBERT project. http://www.childcentre.info/robert/
  • Rangel, F., Rosso, P., Potthast, M. & Stein, B. (2017). Overview of the 5th author profiling task at PAN 2017: Gender and language variety identification in Twitter. Working Notes Papers of the CLEF, 1613-0073.
  • Ringenberg, T., Misra, K., Seigfried-Spellar, K. C. & Rayz, J. T. (2019, February). Exploring automatic identification of fantasy-driven and contact-driven sexual solicitors. In 2019 Third IEEE International Conference on Robotic Computing (IRC) (pp. 532–537). IEEE. https://doi.org/10.1109/irc.2019.00110.
  • Sboev, A., Litvinova, T., Gudovskikh, D., Rybka, R. & Moloshnikov, I. (2016). Machine learning models of text categorization by author gender using topic-independent features. Procedia Computer Science, 101, 135–142. https://doi.org/10.1016/j.procs.2016.11.017.
  • Schermer, B. W., Georgieva, I., van der Hof, S. & Koops, B. J. (2019). Legal aspects of Sweetie 2.0. In S. van der Hof, I. Georgieva, B. Schermer, & B. J. Koops (Eds.), Sweetie 2.0. Information Technology and Law Series, vol 31 (pp. 1–94). Springer. https://doi.org/10.1007/978-94-6265-288-0_1.
  • Seto, M. C. (2013). Internet sex offenders. Washington, DC: American Psychological Association. https://doi.org/10.1037/14191-000.
  • Seto, M. C., Wood, J. M., Babchishin, K. M. & Flynn, S. (2012). Online solicitation offenders are different from child pornography offenders and lower risk contact sexual offenders. Law & Human Behavior, 36(4), 320–330. https://doi.org/10.1037/h0093925.
  • Shannon, D. (2008). Online sexual grooming in Sweden—Online and offline sex offences against children as described in Swedish police data. Journal of Scandinavian Studies in Criminology and Crime Prevention, 9(2), 160–180. https://doi.org/10.1080/14043850802450120.
  • Shelton, J., Eakin, J., Hoffer, T., Muirhead, Y. & Owens, J. (2016). Online child sexual exploitation: An investigative analysis of offender characteristics and offending behavior. Aggression and Violent Behavior, 30, 15–23. https://doi.org/10.1016/j.avb.2016.07.002.
  • Sunde, I. M. (2019). Sweetie, et politibarn eller en politistyrke på internett? [Sweetie, a police child or a police force on the internet?] In I. M. Sunde & N. Sunde (Eds.) Det digitale er et hurtigtog! Vitenskapelige perspektiver på politiarbeid, digitalisering og teknologi [The digital is a high speed train! Scientific perspectives on police work, digitization and technology] (pp. 177–205). Fagbokforlaget.
  • Sunde, I. M. (2020). Fra grooming til seksuell utpressing: Behov for mer effektivt vern mot nettovergrep [From grooming to sextortion – The need for strengthening the protection against online CSEA]. Tidsskrift for Strafferett, 2020/2. https://doi.org/10.18261/issn.0809-9537-2020-02-01
  • Sunde, I. M. & Sunde, N. (2021). Conceptualizing an AI-based police robot for preventing online child sexual exploitation and abuse: Part 2 – Legal analysis of PrevBOT (submitted to journal).
  • Swain, S., Mishra, G. & Sindhu, C. (2017, April). Recent approaches on authorship attribution techniques—An overview. In 2017 International conference of Electronics, Communication and Aerospace Technology (ICECA) (Vol. 1, pp. 557–566). IEEE. https://doi.org/10.1109/iceca.2017.8203599.
  • Taylor, M. & Quayle, E. (2006). The Internet and abuse images of children: Search, pre-criminal situations and opportunity. Crime Prevention Studies, 19, 169.
  • Tenner, E. (1996). Why things bite back: Technology and the revenge of unintended consequences. Vintage.
  • Terre des Hommes. (2015, 26 January). Sweetie 2.0 Chat robots. https://www.terredeshommes.org/sweetie-2-0-chat-robots/
  • van der Hof, S., Georgieva I., Schermer B. & Koops B. J. (Eds.). (2019). Sweetie 2.0. Information Technology and Law Series, vol 31. Springer.
  • Vestby, A. & Vestby, J. (2019). Machine learning and the police: Asking the right questions. Policing: A Journal of Policy and Practice, 15(1), 44–58. https://doi.org/10.1093/police/paz035.
  • Webster, S., Davidson, J., Bifulco, A., Gottschalk, P., Caretti V., Pham, T., Grove-Hills, J., Turley, C., Tompkins, C., Ciulla, S., Milazzo, V., Schimmenti, A. & Craparo, G. (2012). European online grooming project (Final report). http://www.europeanonlinegroomingproject.com/wp-content/fileuploads/European-Online-Grooming-Project-Final-Report.pdf.
  • Whitty, M. (2002). Liar, liar! An examination of how open, supportive and honest people are in chat rooms. Computers in Human Behavior, 18(4), 343–352. https://doi.org/10.1016/S0747-5632(01)00059-0.
  • Wolak, J., Finkelhor, D. & Mitchell, K. J. (2004). Internet-initiated sex crimes against minors: Implications for prevention based on findings from a national study. Journal of Adolescent Health, 35, 424.e11–424.e20. https://doi.org/10.1016/j.jadohealth.2004.05.006.
  • Wolak, J., Mitchell, K. J. & Finkelhor, D. (2006). Online victimization of youth: Five years later. National Center for Missing & Exploited Children. USA. http://www.unh.edu/ccrc/pdf/CV138.pdf.
  • Wortley, R. (2012). Affordance and situational crime prevention: Implications for counter terrorism. In M. Taylor & P. Currie (Eds.). Terrorism and affordance: New directions in terrorism studies (pp. 17–32). Continuum. http://dx.doi.org/10.5040/9781501301155.ch-002.
  • Yar, M. & Steinmetz, K. F. (2019). Cybercrime and society. Third edition. Sage.
  • Zhao, Y. & Zobel, J. (2007, January). Searching with style: Authorship attribution in classic literature. In Proceedings of the thirtieth Australasian conference on Computer Science – Volume 62 (pp. 59–68). Australian Computer Society, Inc.
  • CoC (1979): Code of Conduct for Law Enforcement Officials, adopted by UN General Assembly resolution 34/169, 17 December, 1979.
  • LC (2007): Convention on the Protection of Children against Sexual Exploitation and Sexual Abuse, Council of Europe, October 25, 2007 (CETS 201) – the Lanzarote Convention.
  • UDHR (1948): Universal Declaration of Human Rights, adopted by the UN General Assembly 10 December, 1948.
Copyright © 2021 Author(s)

CC BY-NC 4.0