FIE453 Big Data with Applications to Finance
The following topics will be covered:
- Introduction to statistical computing. Students will learn the basics of using computers to analyze data, with a special emphasis on R.
- Working with data. Data is rarely found in perfectly usable form. Students will learn how to clean the data to make it usable.
- Regression and classification. Regression and classification are two basic supervised learning problems, and linear and logistic regression are two classical techniques for these problems.
- Fitting and overfitting. Big Data allows fitting very flexible models, which permits learning subtle features of the data. This creates the danger of overfitting, where the fit fails out of sample. Controlling overfitting is one of the central tasks in the analysis of Big Data.
- Supervised learning techniques. Techniques for regression and classification have become very advanced. We cover the leading techniques, such as decision trees, support vector machines, and boosting.
- Unsupervised learning techniques. Clustering and market basket analysis.
Businesses have entered the age of Big Data. Now that computers and the Internet have become so central to modern commerce, businesses are awash with large amounts of data about their customers. Electronic trading has made handling large amounts of data central to financial firms. Big Data has created both challenges and opportunities. The biggest opportunity is to extract useful information from the masses of data. Businesses that can extract such information and act on it automatically have a substantial competitive edge. Some of the world's most successful companies, such as Google and Facebook, have built themselves around Big Data.
This course will focus on the analysis of Big Data, with an emphasis on applications to finance. (We will touch on other application areas as appropriate.) Examples of such applications include:
- Automated trading and portfolio management.
- Credit scoring: determining which loans are likely to go bad.
- Catching financial criminals in the act.
- Sales: recommending new financial products to customers.
This course will take students to the frontier of Big Data analysis. Students will learn how to:
- Work with large datasets.
- Fix problems in real-world data to enable analysis.
- Analyze data using current state-of-the-art algorithms for supervised and unsupervised learning.
- Understand and avoid overfitting.
- Apply what they have learned to real data.
Teaching will use a range of techniques:
- Lab sessions where students will practice techniques on their laptops.
- In-class student presentations.
The class will conclude with a group final project using real data. The class presentations give students the opportunity to present preliminary work leading up to the final project. Examples of topics for such presentations are: describing the data, describing the steps taken to clean the data, and presenting preliminary efforts to analyze the data using techniques taught in lectures.
Previous knowledge of regression recommended. Some background in finance or experience with R helpful.
50% class presentations (students will be expected to present multiple times throughout the semester). 50% group final paper. All material must be in English.
Grading scale A-F for presentations and the final paper.
Laptop with working installation of R
Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani (2013). An Introduction to Statistical Learning. Springer Texts in Statistics, vol. 103.
Jerome Friedman, Trevor Hastie, Robert Tibshirani (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer.
Both books are freely available electronically.