# References

This page lists classic papers and books in various fields of machine learning and big data analysis. For beginners and new students, it provides a foundation for your own research. For senior students, we hope the work of earlier pioneers in your field will inspire you.

## Machine Learning

- For an overview:
- Hastie, Tibshirani, Friedman; Elements of Statistical Learning.
- Duda, Hart, Stork; Pattern Classification.

- More theoretical material:
- Devroye, Györfi, Lugosi; A Probabilistic Theory of Pattern Recognition.
- Mohri, Rostamizadeh, Talwalkar; Foundations of Machine Learning (Adaptive Computation and Machine Learning series)
- Lugosi, Massart, Boucheron; Concentration Inequalities: A Nonasymptotic Theory of Independence

- Penalized estimation:
- Liu, Roeder, Wasserman; Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models

### General Surveys

- Domingos; (For beginners) A Few Useful Things to Know About Machine Learning

### Spectral Methods

- Spectral clustering:
- Ng, Jordan, Weiss; On Spectral Clustering: Analysis and an Algorithm.
- Shi, Malik; Normalized Cuts and Image Segmentation.

- Laplacian Eigenmaps: http://www.cse.ohio-state.edu/~mbelkin/papers/papers.html
- Luxburg, Belkin, Bousquet; Consistency of Spectral Clustering

- Diffusion maps:
- Coifman, Lafon; Diffusion maps
- Lafon, Keller, Coifman; Data Fusion and Multicue Data Matching by Diffusion Maps
- Nadler, Lafon, Coifman, Kevrekidis; Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker-Planck Operators

- (Semi-)supervised learning:
- Costa, Hero; Classification constrained dimensionality reduction
- Raich, Costa, Damelin, Hero; Classification constrained dimensionality reduction
- Zhou, Li; Semi-supervised learning by disagreement

### Dimensionality Estimation

- van der Maaten, Postma, van den Herik; Dimensionality reduction: A comparative review

### Online learning and Boosting

- Shalev-Shwartz; Online Learning and Online Convex Optimization
- Murata, Takenouchi, Kanamori, Eguchi; Information geometry of U-Boost and Bregman divergence

### Sparse coding, dictionary learning and matrix factorization

- Dictionary learning:
- Aharon, Elad, Bruckstein; K-SVD and its non-negative variant for dictionary design
- Mairal, Bach, Ponce; Task-Driven Dictionary Learning
- Mairal, Bach, Ponce, Sapiro; Online learning for matrix factorization and sparse coding

- Sparse coding and compressed sensing:
- Candes, Wakin; Enhancing Sparsity by Reweighted L1 Minimization

### Deep learning, neural networks, feature learning

- Deep learning:

- Feature learning:
- Lee, Grosse, Ranganath, Ng; Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations
- Bengio, Courville, Vincent; Representation Learning: A Review and New Perspectives

### Learning from multiple sources

- Multi-view semi-supervised learning:
- Xu, Tao, Xu; A survey of multi-view learning
- Crammer, Kearns, Wortman; Learning from Multiple Sources

- Multi-task learning:
- Argyriou, Evgeniou, and Pontil; Convex multi-task feature learning

## Random Geometric Graphs and Networks

- (Generalized) BHH theorem and application:

- Percolation theory:
- Penrose; Random Geometric Graphs

- Random network theory:

## Differential Geometry in statistics, information theory and learning

- Manton; A Primer on Stochastic Differential Geometry for Signal Processing (http://arxiv.org/abs/1302.0430)
- Amari, Nagaoka; Methods of Information Geometry (Translations of Mathematical Monographs). A classic in information geometry (requires prior knowledge of differential geometry).
- Murata, Takenouchi, Kanamori, Eguchi; Information geometry of U-Boost and Bregman divergence

### Information Divergence Estimation and Applications

- Graph-Based Approaches
- Henze, Penrose; On the multivariate runs test

- K-NN Methods
- Moon, Hero; Ensemble estimation of multivariate f-divergence

- KDE Plug-in Methods
- Moon, Sricharan, Greenewald, Hero; Improving convergence of divergence functional ensemble estimators
- Kandasamy, Krishnamurthy, Poczos, Wasserman, Robins; Nonparametric von Mises Estimators for Entropies, Divergences and Mutual Informations

- Other Methods
- Nguyen, Wainwright, Jordan; Estimating divergence functionals and the likelihood ratio by convex risk minimization

- Bayes Error Bounds
- Berisha, Wisler, Hero, Spanias; Empirically Estimable Classification Bounds Based on a Nonparametric Divergence Measure
- Moon, Delouille, Hero; Meta learning of bounds on the Bayes classifier error

## Target Detection/Tracking/Localization

- Localization in wireless sensor networks:
- Rangarajan, Raich, and Hero; Sparse multidimensional scaling for blind tracking in sensor networks

- An overview of tracking algorithms, including Kalman filters, extensions to Kalman filters, and particle filters:
- Arulampalam, et al.; A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking

- An overview on linearization of the particle filter proposal density:
- Doucet, et al.; On sequential Monte Carlo sampling methods for Bayesian filtering

- Multiple target tracking using particle filters:
- Kreucher, Kastella, and Hero; Multitarget Tracking using the Joint Multitarget Probability Density

## Adaptive Sensing

- Bashan, Raich, and Hero; Optimal two-stage search for sparse targets using convex criteria
- Chong, Kreucher, and Hero; Partially Observable Markov Decision Process Approximations for Adaptive Sensing
- Hero, Kreucher, and Blatt; Information theoretic approaches to sensor management, Ch. 3 in Foundations and Applications of Sensor Management

## LaTeX tools

- TikZ and PGF for drawing within LaTeX slides