FIE453 Big Data with Applications to Finance
Businesses have entered the age of Big Data. Now that computers and the Internet have become so central to modern commerce, businesses are awash with large amounts of data about their customers. Electronic trading has made handling large amounts of data central to financial firms. Big Data has created both challenges and opportunities. The biggest opportunity is to extract useful information from the masses of data using statistics and machine learning. Businesses that can extract such information and act on it automatically have a substantial competitive advantage. Some of the world's most successful companies, such as Google and Facebook, have built themselves around Big Data.
This course focuses on the analysis of Big Data, with an emphasis on applications to finance. (We will touch on other application areas as appropriate.) Examples of such applications include:
- Automated trading and portfolio management.
- Credit scoring: determining which loans are likely to go bad.
- Catching financial criminals in the act.
- Sales: recommending new financial products to customers.
This course will take students to the frontier of Big Data analysis. Students will learn how to:
- Work with large datasets.
- Fix problems in real-world data to enable analysis.
- Analyze data using current state-of-the-art algorithms for supervised and unsupervised learning.
- Understand and avoid overfitting.
- Apply what they have learned to real data.
- Students will learn how to practically analyze large-scale business data using machine learning techniques.
- Introduction to statistical computing. Students will learn the basics of using computers to analyze data, with a special emphasis on R.
- Working with data. Data is rarely found in perfectly usable form. Students will learn how to clean the data to make it usable.
- Regression and classification. Regression and classification are two basic supervised learning problems, and linear and logistic regression are two classical techniques for these problems.
- Fitting and overfitting. Big Data allows fitting very flexible models, which permits learning subtle features of the data. This creates the danger of overfitting, where the fit fails out of sample. Controlling overfitting is one of the central tasks in analysis of Big Data.
- Supervised learning techniques. Techniques for regression and classification have become very advanced. We cover the leading techniques, such as decision trees, support vector machines, and boosting.
- Unsupervised learning techniques. Clustering and market basket analysis.
- Analyze data in R.
- Machine learning.
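The overfitting danger described under "Fitting and overfitting" can be illustrated with a small sketch. It is written in Python purely for illustration (the course itself works in R) and uses only the standard library. The data-generating process (a noisy sine curve), the sample sizes, and the helper names `make_data`, `knn_predict`, and `mse` are all assumptions made for this example, not course material. A very flexible model, 1-nearest-neighbour regression, reproduces the training data perfectly but does worse on fresh data drawn from the same process.

```python
import math
import random


def make_data(n, rng):
    """Draw n noisy samples from y = sin(x) + noise (an assumed, synthetic process)."""
    xs = [rng.uniform(0.0, 6.0) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, 0.3) for x in xs]
    return xs, ys


def knn_predict(train_x, train_y, x, k):
    """Predict y at x as the mean target of the k nearest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def mse(train_x, train_y, xs, ys, k):
    """Mean squared error of k-NN predictions on the points (xs, ys)."""
    errors = [(knn_predict(train_x, train_y, x, k) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


rng = random.Random(0)  # fixed seed so the run is reproducible
train_x, train_y = make_data(100, rng)
test_x, test_y = make_data(100, rng)  # fresh "out-of-sample" data

results = {}
for k in (1, 10):
    in_sample = mse(train_x, train_y, train_x, train_y, k)
    out_of_sample = mse(train_x, train_y, test_x, test_y, k)
    results[k] = (in_sample, out_of_sample)
    print(f"k={k:2d}  in-sample MSE={in_sample:.3f}  out-of-sample MSE={out_of_sample:.3f}")
```

With k = 1 each training point is its own nearest neighbour, so the in-sample error is zero even though the model has mostly memorised noise. The out-of-sample column is the honest measure of fit. Averaging over more neighbours (k = 10) gives up the perfect in-sample fit and will typically, though not necessarily in every run, generalise better.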
Teaching will use a range of techniques:
- Lab sessions where students will practice techniques on their laptops.
- Student presentations in class.
The class will conclude with a group final project using real data. The purpose of the class presentations is for students to present preliminary work leading up to the final project. Examples of topics for such presentations are: a description of the data, the steps taken to clean the data, and preliminary efforts to analyze the data using techniques taught in lectures.
Previous knowledge of regression is recommended. Some background in finance or experience with R is helpful.
Requirements for course approval
Individual multiple choice test
The class is graded on a portfolio of class presentations (approximately 35%), a group final paper (approximately 60%) and peer review (approximately 5%). The grade is based on the whole portfolio. All material must be in English.
For class presentations, students will be expected to present multiple times through the semester, either singly or in groups. Group size must be approved by the lecturer.
The final 5% is peer review for all group work. At the end of the term, each student will evaluate the contributions of his/her team members. The peer review requires you to behave in a responsible and respectful manner. If I deem that a student has deviated from such behavior, I may overrule the peer review.
The grade awarded may not be appealed due to the nature of the assessments.
Grading scale A-F.
Laptop with working installation of R.
Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani (2013). An Introduction to Statistical Learning. Springer Texts in Statistics.
Trevor Hastie, Robert Tibshirani, Jerome Friedman (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer.
Both books are freely available electronically.
Autumn. Offered Autumn 2018
Associate Professor Walter Pohl, Department of Finance, NHH