TECH6 Data Science and Econometrics

Autumn 2026

Topics
TECH6 is taught in the fifth semester, which is focused on market and business analytics. It builds on students’ knowledge and skills in mathematics (TECH1 and TECH4), statistics (TECH3), and programming (TECH2 and TECH5). Its main goal is to strengthen core competencies needed in data-driven analyses of economic activity and familiarize students with a battery of methods used for prediction and establishing causal relationships in experimental and observational data. TECH6 is also a bridge to the sixth semester on data-driven decisions for sustainable value creation and a foundation for more advanced courses in econometrics, time series analysis, and machine learning taught at the Master’s level.
The first part of this course familiarizes students with econometric techniques at an intermediate level. It provides a deep understanding of modern econometric methods that are widely applied across many fields of economics, with a strong emphasis on both theoretical foundations and practical empirical work. All empirical exercises and assignments will be conducted using R, giving students hands-on experience with one of the most widely used programming languages in applied economics research.
The course covers core topics in causal inference and program evaluation, including identification, randomization, instrumental variables, difference-in-differences, regression discontinuity, and matching methods. Students will develop an understanding of when and how to apply each method, as well as the assumptions underlying their validity. The course also addresses more specialized applied topics such as bad controls, measurement error, and clustering, equipping students with the tools to critically assess empirical work.
Finally, students will explore real-world applications across areas such as labour markets, education, and urban economics, reinforcing the practical relevance of the methods covered throughout the course.
The second part of this course provides a comprehensive and practical introduction to supervised and unsupervised machine learning. Students will learn how to handle large datasets and automate the retrieval of new data from internet sources using techniques such as web scraping. The curriculum explores foundational models for both regression and classification, equipping students with the theoretical understanding and practical tools needed to evaluate model performance across a variety of metrics.
All practical implementations in this part of the course are conducted in Python, with a strong emphasis on industry-standard libraries including pandas and scikit-learn (sklearn). To prepare students for modern, real-world data science workflows, the course also covers essential environment management and version control tools, focusing on Anaconda, GitHub, and VS Code. Additionally, students will learn how to optimize their models through hyperparameter tuning and cross-validation, and will explore the integration of AI tools within VS Code to maximize their coding effectiveness and efficiency.
Course outline:
Part 1: Econometric Methods and Applications
- Core topics in causal inference and program evaluation: identification, randomization, instrumental variables, difference-in-differences, regression discontinuity, and matching.
- Testing and inference.
- Hands-on empirical exercises using R.
- Real-world applications in labour markets, education, and urban economics.
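As an illustration of one method on this list, difference-in-differences in its simplest two-group, two-period form compares the change in the treated group's mean outcome with the change in the control group's mean outcome. The sketch below is only a minimal example: the data are invented, the names `observations`, `cell_mean` and `did_estimate` are hypothetical, and it is written in plain Python rather than R to stay self-contained. The estimate has a causal interpretation only under the parallel-trends assumption.

```python
from statistics import mean

# Hypothetical toy data: (group, period, outcome). All numbers are made up.
observations = [
    ("treated", "pre", 10.0), ("treated", "pre", 12.0),
    ("treated", "post", 17.0), ("treated", "post", 19.0),
    ("control", "pre", 8.0), ("control", "pre", 10.0),
    ("control", "post", 11.0), ("control", "post", 13.0),
]


def cell_mean(data, group, period):
    """Mean outcome for one group in one period."""
    return mean(y for g, p, y in data if g == group and p == period)


def did_estimate(data):
    """2x2 difference-in-differences: treated change minus control change."""
    treated_change = cell_mean(data, "treated", "post") - cell_mean(data, "treated", "pre")
    control_change = cell_mean(data, "control", "post") - cell_mean(data, "control", "pre")
    return treated_change - control_change


if __name__ == "__main__":
    print("DiD estimate:", did_estimate(observations))
```

In applied work the same quantity is usually obtained from a regression with a group-by-period interaction term, which also delivers standard errors.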
Part 2: Introduction to Supervised and Unsupervised Machine Learning
- Methods for unsupervised and supervised machine learning implemented in scikit-learn
- Regression models (Ridge, Lasso, Elastic Net)
- Classification models (Logistic regression, Support Vector Machines, Random Forest)
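To make the idea of tuning a regularized regression model by cross-validation concrete, the sketch below selects the penalty of a one-feature ridge regression by k-fold cross-validation. It is a minimal illustration under stated assumptions: the data are invented, the model has a single feature and no intercept so that the ridge solution has a simple closed form, and the helper names `ridge_slope`, `k_fold_mse` and `tune_alpha` are hypothetical. It uses only the Python standard library to stay self-contained; it is not the scikit-learn workflow used in the course.

```python
def ridge_slope(xs, ys, alpha):
    """Closed-form ridge slope for one feature and no intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


def k_fold_mse(xs, ys, alpha, k):
    """Average held-out mean squared error over k folds (fold = index mod k)."""
    n = len(xs)
    fold_errors = []
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        slope = ridge_slope(
            [xs[i] for i in train_idx], [ys[i] for i in train_idx], alpha
        )
        squared_errors = [(ys[i] - slope * xs[i]) ** 2 for i in test_idx]
        fold_errors.append(sum(squared_errors) / len(squared_errors))
    return sum(fold_errors) / k


def tune_alpha(xs, ys, alphas, k=3):
    """Return the penalty with the lowest cross-validated error, plus all scores."""
    scores = {alpha: k_fold_mse(xs, ys, alpha, k) for alpha in alphas}
    best_alpha = min(scores, key=scores.get)
    return best_alpha, scores


if __name__ == "__main__":
    # Hypothetical toy data, roughly y = 2x plus small noise.
    xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
    best_alpha, scores = tune_alpha(xs, ys, alphas=[0.0, 0.1, 1.0, 10.0])
    print("best alpha:", best_alpha)
    for alpha, score in scores.items():
        print(f"alpha={alpha}: CV MSE={score:.4f}")
```

The key design point is that each candidate penalty is scored only on observations that were held out when the model was fitted, so the chosen value reflects out-of-sample rather than in-sample fit.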
Learning outcome
Upon completion of the course, the student can:
Knowledge
- identify the assumptions needed to interpret identification strategies and estimates as relevant for policy and decision making
- describe the central concepts and terminology of econometrics and machine learning
- explain the core principles, capabilities, and differences between supervised and unsupervised machine learning models
- discuss various metrics and methods used for evaluating the performance of regression and classification models
Skills
- use R to import, store and manage data in different formats, conduct econometric analyses, build machine learning models and produce tables and figures
- design simple experiments to gather data in business environments
- present results of empirical analyses in a clear and convincing manner to various audiences
- leverage Python libraries, specifically pandas and scikit-learn, to implement machine learning algorithms and handle large datasets
General competence
- provide data-driven answers to business-oriented and policy questions
- interpret and critically assess empirical work in business, economics and policy settings
- communicate and manage data science workflows effectively using modern development environments and tools, including GitHub, Anaconda, VS Code, and AI tools
Teaching

This course is taught using a combination of hands-on lectures to develop new concepts and weekly workshops in which students are asked to apply these concepts to solve problems in R or Python on their own laptops.
We put emphasis on research-based teaching by encouraging students to work on their own term papers. The students will present the results of their own analyses, both in written form and in front of an audience.
Recommended prerequisites

TECH1, TECH2, TECH3 and TECH4.
Compulsory Activity

Each student must individually complete two assignments, one for the first part of the course and one for the second part.
Assessment
The assessment for this course consists of two components:
- Term paper (70%)
- 30-minute presentation of the term paper (30%)
The presentation and term paper will be done in groups of 2-3 students. Students may work alone only if they have a good reason, which must be approved by the course responsible.
The term paper assignment will be handed out towards the end of the teaching period, and students have 4 weeks to complete it. The term paper and presentation must be passed in the same semester, and both components must be completed in English.
Grading Scale

A-F.
Computer tools
- R and RStudio
- Access to secure AI tools (e.g. KI-chat)
- Anaconda Python distribution
- Git version control
- Visual Studio Code, GitHub Copilot
Literature

Part 1
Cunningham, Scott (2021). Causal Inference: The Mixtape (Yale University Press, USA)
Part 2
Hastie, Trevor, Robert Tibshirani, and Jerome Friedman (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edition (Springer), freely available online.
Lecture notebooks handed out by the course responsible.
Retake

Retake is not offered in the non-teaching semester. The assessment may be retaken the next time the course is offered.
For detailed information regarding the retake policy, please visit our website: https://www.nhh.no/en/for-students/examinations/retake-of-exams/