FIE453 Big Data with Applications to Finance
Businesses have entered the age of Big Data. Now that computers and the Internet have become central to modern commerce, businesses are awash in data about their customers, and electronic trading has made handling large amounts of data central to financial firms. Big Data has created both challenges and opportunities. The biggest opportunity is to extract useful information from these masses of data using statistics and machine learning. Businesses that can extract such information and act on it automatically have a substantial competitive advantage. Some of the world's most successful companies, such as Google and Facebook, have built themselves around Big Data. Banks and insurance companies have also increased their use of Big Data and predictive analytics in recent years to accelerate digitalization.
This course will focus on the applications of predictive analytics and Big Data in finance (as well as other industries where applicable). Examples of such applications include:
- Credit scoring: Determining which loans are likely to go bad
- Anti-money laundering: Identifying unlawful transactions
- Customer analytics: Propensity models, e.g. for the likelihood of accepting an offer or the probability of churning
- Pricing models: Pricing in non-life insurance and estimation of the risk premium
This course will take students to the frontier of big data analysis in finance. Students will learn how to:
- Work with large datasets
- Handle real-world data issues to enable analysis
- Analyse data using current state-of-the-art algorithms for supervised and unsupervised learning
- Understand and avoid overfitting
- Apply what they have learned to real data and business problems
- Students will know how to analyze large datasets using both supervised and unsupervised machine learning techniques
- Introduction to statistical computing. Students will be able to use basic statistical computing to analyze data in R
- Working with data. Data is rarely found in perfectly usable form. Students will be able to structure and clean data to make it usable in machine learning algorithms
- The student will know how to handle datasets that are too large to fit in a computer's RAM.
- The student will know about regression and classification. Regression and classification are two basic supervised machine learning problems, and linear and logistic regression are two classical techniques for these problems
- The student will understand fitting and overfitting. Big Data allows fitting very flexible models, which permits learning subtle features of the data. This creates the danger of overfitting, where the model fits the training data well but fails out-of-sample. Controlling overfitting is one of the central tasks when developing predictive models.
- Supervised machine learning techniques. Techniques for regression and classification have become very advanced. We cover the leading techniques, and the student will know techniques such as decision trees and boosting.
- The student will know about unsupervised learning techniques, with applications in e.g. anomaly detection.
- Structure and analyze data in R
- Machine learning
Teaching will be carried out using:
- Sessions where students practice techniques on their laptops (programming in R)
- Student presentations in class
The class will conclude with a group final project using real data. In the class presentations, students present preliminary work leading up to the final project. Example topics include describing the data, describing the steps taken to clean it, and preliminary efforts to analyze it using techniques taught in lectures.
Previous knowledge of linear regression is recommended. Some background in finance or experience with R is helpful.
An individual multiple-choice test.
This class is graded as a portfolio of work. Portfolio assessment means that the student's development over the semester is included in the assessment. The portfolio consists of class presentations, a group final paper and a peer review, and the final grade is based on the portfolio as a whole. Students will also submit a reflection note that summarizes their learning and development. All material must be in English.
For class presentations, students are expected to present several times during the semester, either individually or in groups. Group size must be approved by the lecturer. The final peer review covers all group work: at the end of the term, each student evaluates the contributions of his or her team members. The peer review requires you to behave in a responsible and respectful manner. If I judge that a student has not done so, I may overrule the peer review.
The grade awarded may not be appealed due to the nature of the assessments.
A laptop with a working installation of R.
Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani (2013). An Introduction to Statistical Learning. Springer Texts in Statistics. Springer.
Trevor Hastie, Robert Tibshirani, Jerome Friedman (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. Springer.
Both books are freely available electronically.
Autumn. Offered Autumn 2023.
Associate Professor Walter Pohl, Department of Finance, NHH