FIE453 Big Data with Applications to Finance
Businesses have entered the age of Big Data. Now that computers and the Internet have become so central to modern commerce, businesses are awash with data about their customers. Electronic trading has made handling large amounts of data central to financial firms. Big Data has created both challenges and opportunities. The biggest opportunity is to extract useful information from the masses of data using statistics and machine learning. Businesses that can extract such information and act on it automatically have a substantial competitive advantage. Some of the world's most successful companies, such as Google and Facebook, have built themselves around Big Data. Banks and insurance companies have also increased their use of Big Data and predictive analytics in recent years to accelerate digitalisation.
This course will focus on the applications of predictive analytics and Big Data in finance (as well as other industries where applicable). Examples of such applications include:
- Credit scoring: Determining which loans are likely to go bad
- Anti-money laundering: Identifying unlawful transactions
- Customer analytics: Propensity models that estimate, for example, the likelihood that a customer accepts an offer or the probability of churn
- Pricing models: Pricing in non-life insurance and the estimation of risk premiums
This course will take students to the frontier of big data analysis in finance. Students will learn how to:
- Work with large datasets
- Handle real-world data issues to enable analysis
- Analyse data using current state-of-the-art algorithms for supervised and unsupervised learning
- Understand and avoid overfitting
- Apply what they have learned to real data and business problems
- Students will learn how to analyse large datasets using both supervised and unsupervised machine learning techniques
- Students will get a better understanding of regulation, such as PSD II and GDPR, that affects the use of Big Data and predictive analytics in finance
- Introduction to statistical computing. Students will learn the basics of how to use computers to analyse data in R
- Working with data. Data is rarely found in perfectly usable form. Students will learn how to structure and clean data to make it usable in machine learning algorithms
- Handling datasets that are too large to fit in your computer’s RAM.
- Regression and classification. Regression and classification are two basic supervised machine learning problems, and linear and logistic regression are two classical techniques for these problems
- Fitting and overfitting. Big Data allows fitting very flexible models, which permits learning subtle features of the data. This creates the danger of overfitting, where the model fails out-of-sample. Controlling for overfitting is one of the central tasks when developing predictive models.
- Supervised machine learning techniques. Techniques for regression and classification have become very advanced. We cover the leading techniques, such as decision trees and boosting.
- Unsupervised learning techniques, with applications in e.g. anomaly detection.
- Structure and analyse data in R
- Machine learning
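The overfitting problem named in the topic list can be illustrated with a small simulation. The sketch below is not course material: the course itself uses R, and this example is written in Python purely for illustration. It assumes synthetic data drawn from a sine curve with Gaussian noise and uses k-nearest-neighbour regression as a stand-in for any flexible model; the function names `make_data`, `knn_predict` and `mse` are hypothetical.

```python
import math
import random


def make_data(n, rng, noise_sd=0.5):
    """Draw n points from y = sin(x) + Gaussian noise on [0, 2*pi]."""
    xs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def knn_predict(x0, train_x, train_y, k):
    """Predict at x0 as the mean response of the k nearest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x0))[:k]
    return sum(train_y[i] for i in nearest) / k


def mse(xs, ys, train_x, train_y, k):
    """Mean squared error of the k-NN fit, evaluated on the points (xs, ys)."""
    errors = [(y - knn_predict(x, train_x, train_y, k)) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


# Assumed setup: one training sample and one independent test sample.
rng = random.Random(0)
train_x, train_y = make_data(100, rng)
test_x, test_y = make_data(100, rng)

# Smaller k means a more flexible model; k = 1 memorises the training data.
for k in (1, 5, 25):
    in_sample = mse(train_x, train_y, train_x, train_y, k)
    out_of_sample = mse(test_x, test_y, train_x, train_y, k)
    print(f"k={k:2d}  train MSE={in_sample:.3f}  test MSE={out_of_sample:.3f}")
```

With k = 1 each training point is its own nearest neighbour, so the in-sample error is zero, while the error on the independent test sample is not. That gap between in-sample and out-of-sample error is the signature of overfitting, and it is why model flexibility should be chosen on held-out data rather than on the training fit.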
Teaching will be carried out using:
- Sessions where students will practice techniques on their laptop (programming in R)
The class will conclude with a final project using real data where the students develop machine learning models in R. Project deliverables include both a final report as well as a presentation.
Previous knowledge of linear regression is recommended. Some background in finance or experience with R is helpful.
Requirements for course approval
Each student must submit an assignment during the course to obtain course approval. This assignment will be graded as approved or not approved. Approval is a prerequisite for submitting the final project.
The final grade is based on a group report (60% weight) and a subsequent group presentation (40% weight). All material must be in English. Group size must be approved by the lecturer.
The grade awarded may not be appealed due to the nature of the assessment.
Grading scale A-F.
Laptop with working installation of R.
Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani (2013). An Introduction to Statistical Learning. Springer Texts in Statistics.
Trevor Hastie, Robert Tibshirani, Jerome Friedman (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer.
Martin Schmalz, Uri Bram (2019). The Business of Big Data: How to Create Lasting Value in the Age of AI. Capara Books.
The first two books are freely available electronically. The third is available on Amazon.
The first book is compulsory, while the other two are recommended reading for the course.
Autumn. Offered Autumn 2020.
Please note: Due to the present corona situation, please expect parts of this course description to be changed before the autumn semester starts. Particularly, but not exclusively, this relates to teaching methods, mandatory requirements and assessment.
Jan-Magnus Moberg, Director, PwC.