FIE453 Big Data with Applications to Finance
Businesses have entered the age of Big Data. Now that computers and the Internet have become so central to modern commerce, businesses are awash with large amounts of data about their customers. Electronic trading has made handling large amounts of data central to financial firms. Big Data has created both challenges and opportunities. The biggest opportunity is to extract useful information from the masses of data using statistics and machine learning. Businesses that can extract such information and act on it automatically have a substantial competitive advantage. Some of the world's most successful companies, such as Google and Facebook, have built themselves around Big Data. Banks and insurance companies have also increased their use of Big Data and predictive analytics in recent years to accelerate digitalization.
This course will focus on the applications of predictive analytics and Big Data in finance (as well as other industries where applicable). Examples of such applications include:
- Credit scoring: Determining which loans are likely to go bad
- Anti-money laundering: Identifying unlawful transactions
- Customer analytics: Propensity models that estimate, for example, the likelihood that a customer accepts an offer or the probability that a customer churns
- Pricing models: Pricing in non-life insurance and the estimation of risk premiums
This course will take students to the frontier of big data analysis in finance. Students will learn how to:
- Work with large datasets
- Handle real-world data issues to enable analysis
- Analyse data using current state-of-the-art algorithms for supervised and unsupervised learning
- Understand and avoid overfitting
- Apply what they have learned to real data and business problems
- Students will learn how to analyze large datasets using both supervised and unsupervised machine learning techniques
- Introduction to statistical computing. Students will learn the basics of how to use computers to analyze data in R
- Working with data. Data is rarely found in perfectly usable form. Students will learn how to structure and clean data to make it usable in machine learning algorithms
- Handling datasets that are too large to fit in your computer’s RAM.
- Regression and classification. Regression and classification are two basic supervised machine learning problems, and linear and logistic regression are two classical techniques for these problems
- Fitting and overfitting. Big Data allows fitting very flexible models, which permits learning subtle features of the data. This creates the danger of overfitting, where the model fails out-of-sample. Controlling for overfitting is one of the central tasks when developing predictive models.
- Supervised machine learning techniques. Techniques for regression and classification have become very advanced. We cover the leading techniques, such as decision trees and boosting.
- Unsupervised learning techniques, with applications in e.g. anomaly detection.
- Structure and analyze data in R
- Machine learning
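The overfitting problem described above can be illustrated with a minimal sketch. It is written in Python using only the standard library purely for illustration (the course itself uses R), and everything in it is an assumption made for the example: the toy data-generating process, the sample sizes, the helper names `make_data`, `knn_predict` and `mse`, and the choice of k-nearest-neighbour regression as the flexible model. A small k gives a very flexible fit, a large k a very smooth one.

```python
import math
import random


def make_data(n, rng):
    """Draw n points from y = sin(3x) + Gaussian noise (a made-up toy process)."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [math.sin(3.0 * x) + rng.gauss(0.0, 0.3) for x in xs]
    return xs, ys


def knn_predict(train_x, train_y, x0, k):
    """Predict at x0 as the mean response of the k nearest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x0))[:k]
    return sum(train_y[i] for i in nearest) / k


def mse(train_x, train_y, xs, ys, k):
    """Mean squared error of k-NN predictions on the points (xs, ys)."""
    errors = [(knn_predict(train_x, train_y, x, k) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


rng = random.Random(0)                   # fixed seed so the run is reproducible
train_x, train_y = make_data(100, rng)   # data used to fit the model
test_x, test_y = make_data(1000, rng)    # fresh data the model has never seen

results = {}
for k in (1, 10, 50):
    results[k] = (
        mse(train_x, train_y, train_x, train_y, k),  # in-sample error
        mse(train_x, train_y, test_x, test_y, k),    # out-of-sample error
    )
    print(f"k={k:>2}  train MSE={results[k][0]:.3f}  test MSE={results[k][1]:.3f}")
```

With k = 1 each training point is its own nearest neighbour, so the in-sample error is exactly zero, yet the error on fresh data is not: the model has memorised the noise. Comparing in-sample and out-of-sample error across values of k is one simple way to see why a held-out sample is needed when choosing model flexibility.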
Teaching will be carried out using:
- Sessions where students will practice techniques on their laptop (programming in R)
- Class student presentations
The class will conclude with a group final project using real data. The purpose of the class presentations is for students to present preliminary work leading up to the final project. Example topics for such presentations include describing the data, describing the steps taken to clean the data, and preliminary efforts to analyze the data using techniques taught in lectures.
It will be possible to follow the course without being physically present in class.
Previous knowledge of linear regression is recommended. Some background in finance or experience with R is helpful.
Requirements for course approval
Individual multiple choice test.
The class is graded on a portfolio of class presentations (approximately 35%), a group final paper (approximately 60%) and peer review (approximately 5%). The grade is based on the whole portfolio. All material must be in English.
For class presentations, students will be expected to present multiple times through the semester, either individually or in groups. Group size must be approved by the lecturer. The final 5% is peer review for all group work. At the end of the term, students will evaluate the contributions of their team members. The peer review requires you to behave in a responsible and respectful manner. If I deem a student to deviate from such behavior, I may overrule the peer review.
The grade awarded may not be appealed due to the nature of the assessments.
Laptop with working installation of R.
Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani (2013). An Introduction to Statistical Learning. Springer Texts in Statistics.
Trevor Hastie, Robert Tibshirani, Jerome Friedman (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer.
Martin Schmalz, Uri Bram (2019). The Business of Big Data: How to Create Lasting Value in the Age of AI. Capara Books.
The first two books are freely available electronically. The third is available on Amazon.
The first book is compulsory, while the other two are recommended literature in the course.
Autumn. Offered Autumn 2021.
Associate Professor Walter Pohl, Department of Finance, NHH