BAN439 Detecting Fraud through Textual Analysis

Autumn 2022

  • Topics

    Recent years have seen many large data leaks, such as LuxLeaks, the Panama Papers, the Paradise Papers, and the Pandora Papers. These leaks represent enormous quantities of data. However, most of this data is in textual form, making it difficult for the average analyst to extract relevant information. This course teaches skills for reviewing text data with the purpose of uncovering evidence of illegal activity.

    We will focus on how to obtain raw data through web scraping. We will then format the raw data into a final dataset that can be used to answer relevant real-world questions. We will analyze data from famous cases such as Enron and the Pandora Papers.

    This course could be useful for:

    • Students who are interested in obtaining advanced knowledge of the applications of textual analysis
    • Fraud analysts who want to find evidence for links between different users
    • Journalists who are interested in learning how to find the story in a large data source

    This course is an extension of:

    BAN432 Applied Textual Data Analysis for Business and Finance

    BUS465 Detecting Corporate Crime

  • Learning outcome

    KNOWLEDGE - The candidate will…

    • know how to apply tools for obtaining relevant information from textual data

    SKILLS - The candidate will be able to…

    • employ different techniques in order to obtain textual data, e.g. web scraping
    • prepare textual data for analysis by pre-processing it
    • apply appropriate tools from Natural Language Processing with the aim of identifying corporate crime
    • write a concise, focused report on the findings

    COMPETENCE - The candidate will be able to…

    • investigate fraud using textual analysis
    • present evidence of fraud
    • understand the uses and limits of detection strategies
    • identify reliable information for building a case during an investigation

  • Teaching

    In this course, lectures are combined with applied examples in R. While central concepts of textual analysis and crime detection are presented in the lectures, the practical part will focus on their implementation in R.

  • Recommended prerequisites

    Previous knowledge of R.

    Because this course sits at the intersection of BAN432 and BUS465, it is not strictly necessary to have knowledge of both areas (textual analysis and fraud detection). For the group work, however, it is assumed that the members of each group together cover both areas.

  • Compulsory Activity


  • Assessment

    Group project in groups of 2-4 people. The project is developed during the course and submitted one week after the course ends.

  • Grading Scale


  • Computer tools

    R, RStudio

  • Literature

    Literature will be made available on Canvas.


ECTS Credits
Teaching language

Autumn. Offered Autumn 2022 (first time).