Data Science #10 - The original principal component analysis (PCA) paper by Harold Hotelling (1933)

Hotelling, Harold. "Analysis of a complex of statistical variables into principal components." Journal of Educational Psychology 24.6 (1933): 417.

This seminal work by Harold Hotelling on PCA remains highly relevant to modern data science because PCA is still widely used for dimensionality reduction, feature extraction, and data visualization. Its foundational ideas, eigenvalue decomposition and the search for orthogonal directions of maximal variance, form the backbone of PCA, which today is typically computed with numerical methods such as Singular Value Decomposition (SVD). Modern practice handles much larger datasets and has added variants such as Kernel PCA and Sparse PCA, but the core idea of the paper, identifying and interpreting a few key components to reduce dimensionality while preserving as much of the variance as possible, is still crucial for handling high-dimensional data efficiently.
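To make the link between the eigenvalue view and the SVD view concrete, here is a minimal sketch in Python with NumPy. It is not taken from the paper or the episode: the function names `pca_eig` and `pca_svd` and the synthetic data are illustrative assumptions. It computes the top components once from the sample covariance matrix and once from the SVD of the centered data, and the two should agree up to the sign of each component.

```python
import numpy as np

def pca_eig(X, k):
    """Top-k principal components via eigendecomposition of the sample covariance."""
    Xc = X - X.mean(axis=0)                    # center each variable
    cov = Xc.T @ Xc / (X.shape[0] - 1)         # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]      # indices of the k largest eigenvalues
    return eigvals[order], eigvecs[:, order]   # variances and directions (as columns)

def pca_svd(X, k):
    """Same quantities via SVD of the centered data matrix."""
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    variances = s**2 / (X.shape[0] - 1)        # singular values -> component variances
    return variances[:k], Vt[:k].T

# Synthetic correlated data (an assumption for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

var_eig, comp_eig = pca_eig(X, 2)
var_svd, comp_svd = pca_svd(X, 2)

# Project the centered data onto the first two components.
scores = (X - X.mean(axis=0)) @ comp_eig
```

The variance of each column of `scores` equals the corresponding eigenvalue, which is the "maximizing variance in orthogonal directions" idea stated numerically. The SVD route avoids forming the covariance matrix explicitly, which is one reason it is commonly preferred in numerical libraries.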

About the podcast

We discuss seminal mathematical papers (sometimes really old 😎) that have shaped and established the fields of machine learning and data science as we know them today. The goal of the podcast is to introduce you to the evolution of these fields from a mathematical and slightly philosophical perspective. We will discuss the contributions of these papers, not just from a pure math perspective but also in terms of how they influenced the discourse in the field, which areas they opened up, and so on. Our podcast episodes are also available on YouTube: https://youtu.be/wThcXx_vXjQ?si=vnMfs