# Tianpei (Luke) Xie

## Summary

I received my PhD from the ECE department at the University of Michigan, Ann Arbor, under the supervision of Prof. Alfred O. Hero. My research focused on robust prediction and classification from multi-source data, and my areas of work include robust learning, multi-view information fusion, manifold learning and network analysis.

I have more than 5 years of experience in machine learning and data science, with several publications in top conferences. I am proficient in programming languages such as C++ (> 8 years) and Python (> 3 years) and in database languages such as SQL, and I have a solid background in both machine learning and statistics (I hold a double degree). I am trustworthy and eager to be a valuable member of any team.

* Specialties:

• Machine Learning
• Deep Learning
• Natural Language Processing

* Programming Languages and Tools:

• Python, MATLAB, C++, Spark, SQL
• Linux, Amazon AWS


## Research Area and Projects

### Sub-network topology learning using decayed-influence latent Gaussian graphical models

Time: Oct 2016 - Apr 2017

Consider a social network where only a few nodes (agents) have meaningful interactions in the sense that the conditional dependency graph over node attribute variables (behaviors) is sparse. A company that can only observe the interactions between its own customers will generally not be able to accurately estimate its customers’ dependency subgraph: it is blinded to any external interactions of its customers and this blindness creates false edges in its subgraph.
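To see why this blindness creates false edges, partition the nodes into the company's customers $A$ and the external agents $B$, and write the joint precision (inverse covariance) matrix $\Theta$ in the corresponding blocks. The identity below is a standard Gaussian fact, not the paper's own derivation: marginalizing out $\mathbf{x}_B$ leaves $\mathbf{x}_A$ Gaussian with precision given by a Schur complement,

$$
\left(\Sigma_{AA}\right)^{-1} = \Theta_{AA} - \Theta_{AB}\,\Theta_{BB}^{-1}\,\Theta_{BA}.
$$

Even when $\Theta_{AA}$ is sparse, the correction term $\Theta_{AB}\Theta_{BB}^{-1}\Theta_{BA}$ is generally nonzero for pairs of customers linked through external agents, so an estimator that sees only $\mathbf{x}_A$ reports edges between them.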

In this paper we address the semiblind scenario in which the company has access to a noisy summary of the complementary subgraph connecting external agents, e.g., one provided by a consolidator.

• The proposed framework also applies to other settings, including field estimation from a network of awake and sleeping sensors and privacy-constrained information sharing over social subnetworks.
• We propose a penalized likelihood approach for a graph signal obeying a Gaussian graphical model (GGM).
• We use a convex-concave iterative optimization algorithm to maximize the penalized likelihood. The effectiveness of our approach is demonstrated through numerical experiments and comparison with state-of-the-art GGM and latent-variable GGM (LV-GGM) methods.
• Publications:
• Xie, Tianpei, Sijia Liu, and Alfred O. Hero. “Semiblind Subgraph Reconstruction in Gaussian Graphical Models.” In the Fifth IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2017.
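For reference, the standard $\ell_1$-penalized Gaussian log-likelihood (the graphical lasso, up to constants and scaling) is shown below. This baseline is an assumption for illustration: the semiblind method additionally incorporates the noisy summary of the external subgraph, and its exact objective is not reproduced here.

$$
\hat{\Theta} = \arg\max_{\Theta \succ 0}\; \log\det\Theta - \operatorname{tr}(S\Theta) - \lambda \sum_{i \neq j} |\Theta_{ij}|
$$

Here $S$ is the sample covariance of the observed node attributes, $\Theta$ is the precision matrix, and $\lambda > 0$ controls sparsity. Since $\Theta_{ij} = 0$ means nodes $i$ and $j$ are conditionally independent given all other nodes, the nonzero off-diagonal entries of $\hat{\Theta}$ form the estimated dependency graph.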

### Multi-view learning on Statistical Manifolds

Time: Fall 2014 - Fall 2015

In many situations, data come in the form of histograms, e.g., bag-of-words in natural language processing, or SIFT descriptors in image recognition and database indexing.

• In this project, we consider a multi-view classification problem where data $\mathbf{x}$ come from multiple sources, called views (e.g., text + image, acoustic sensor + seismic sensor, video + audio).
• The label is given as a histogram $p(y|\mathbf{x})$.
• The set of label histograms forms a non-Euclidean space called a statistical manifold.
• The goal is to learn a classifier from multi-source data on the statistical manifold.
• Unlike conventional feature fusion and Bayesian fusion approaches, we present an alternative model fusion approach, called COM-MED, that learns a consensus view to fuse predictive information from different views.
• Using information-theoretic divergences as a stochastic consensus measure, COM-MED takes into account the intrinsic non-Euclidean geometry of the statistical manifold and is insensitive to both noise corruption in single views and between-view inconsistency.
• Publications:
• Xie, Tianpei, Nasser M. Nasrabadi, and Alfred O. Hero. “Multi-view learning on statistical manifold via stochastic consensus constraints.” In preparation.
• Xie, Tianpei, Nasser M. Nasrabadi, and Alfred O. Hero. “Semi-supervised multi-sensor classification via consensus-based Multi-View Maximum Entropy Discrimination.” In Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on, pp. 1936-1940. IEEE, 2015.
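The sketch below is not COM-MED; it only illustrates the idea of using an information-theoretic divergence (here the KL divergence) to measure agreement between per-view label histograms. It relies on a standard fact: over all histograms $q$, the weighted sum $\sum_v w_v \, \mathrm{KL}(p_v \| q)$ is minimized by the normalized weighted average of the $p_v$. Function names and the example histograms are hypothetical.

```python
import math
from typing import List, Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) in nats between two histograms over the same labels.

    eps guards against log(0) when q has empty bins.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def consensus_histogram(views: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Normalized weighted average of the per-view label histograms.

    This q minimizes sum_v w_v * KL(p_v || q) over all histograms q.
    """
    total = sum(weights)
    n_labels = len(views[0])
    return [sum(w * p[i] for w, p in zip(weights, views)) / total for i in range(n_labels)]


def consensus_cost(views, weights, q) -> float:
    """Weighted disagreement between the views and a candidate consensus q."""
    return sum(w * kl_divergence(p, q) for w, p in zip(weights, views))


# Hypothetical label histograms p(y|x) from three views; the third view disagrees.
views = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.1, 0.8]]
weights = [1.0, 1.0, 1.0]
q = consensus_histogram(views, weights)
print([round(v, 3) for v in q])  # [0.467, 0.2, 0.333]
```

Any other candidate, such as one of the individual views, gives a larger `consensus_cost`. A down-weighted noisy view pulls the consensus less, which is one simple way a divergence-based consensus can tolerate an unreliable view.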

### Robust training on approximated minimal-entropy set

Time: Fall 2013 - Fall 2014

Data cleaning is a long, tedious procedure, but it is necessary for many learning algorithms. Traditional machine learning algorithms such as support vector machines or AdaBoost often fail to achieve reliable performance if the training data are corrupted.

• In this project, we develop a robust classification algorithm that can effectively learn from a corrupted training set to produce accurate and reliable predictions.
• A robust maximum entropy discrimination method, referred to as GEM-MED, is proposed that minimizes the generalization error of the classifier with respect to the nominal data distribution. Here, data are nominal if they are not corrupted.
• The proposed method exploits the nonparametric nature of kernel methods in combination with the minimal-entropy set, a concept otherwise used only in anomaly detection, to achieve both classification and detection.
• GEM-MED is a convex problem, so it can be solved efficiently and any local solution is globally optimal.
• On a synthetic dataset and the ARL-Footstep dataset, we demonstrate the superiority of GEM-MED over other state-of-the-art robust training methods.
• Publications:
• Xie, Tianpei, Nasser M. Nasrabadi, and Alfred O. Hero. “Learning to classify with possible sensor failures.” IEEE Transactions on Signal Processing, 2016.
• Xie, Tianpei, Nasser M. Nasrabadi, and Alfred O. Hero. “Learning to classify with possible sensor failures.” In Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on, pp. 2395-2399. IEEE, 2014.
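A minimal-entropy set can be approximated from samples using nearest-neighbour distances: points in dense regions have short k-nearest-neighbour (k-NN) distances, while corrupted or anomalous points tend to lie far from the rest. The sketch below is a simple greedy heuristic along these lines, offered as an assumption for illustration; it does not reproduce GEM-MED, which combines this idea with kernel-based maximum entropy discrimination. Function names and data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def knn_scores(points: Sequence[Point], k: int) -> List[float]:
    """Sum of Euclidean distances from each point to its k nearest other points."""
    scores = []
    for i, x in enumerate(points):
        dists = sorted(math.dist(x, y) for j, y in enumerate(points) if j != i)
        scores.append(sum(dists[:k]))
    return scores


def approx_min_entropy_set(points: Sequence[Point], k: int, keep_fraction: float) -> List[int]:
    """Indices of the keep_fraction of points with the smallest k-NN scores.

    The kept subset is a crude, greedy stand-in for a minimal-entropy
    (minimum-volume) set covering most of the nominal data.
    """
    scores = knn_scores(points, k)
    n_keep = max(1, round(keep_fraction * len(points)))
    ranked = sorted(range(len(points)), key=lambda i: scores[i])
    return sorted(ranked[:n_keep])


# Hypothetical 2-D training features: a tight cluster plus one corrupted sample.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (5.0, 5.0)]
kept = approx_min_entropy_set(points, k=2, keep_fraction=5 / 6)
print(kept)  # [0, 1, 2, 3, 4]
```

A classifier trained only on the kept indices ignores the outlying sample; the quadratic pairwise-distance loop is fine for a sketch but would need a spatial index for large datasets.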

## Other project experience

### Drunk Driver Prediction using Fatality Analysis Reporting System (FARS)

Time: Jan. 2016 - Mar. 2016

Drunk driving is a major cause of fatalities in car accidents. In this project, our goal was to predict whether a drunk driver is involved in a car accident, using data from the Fatality Analysis Reporting System (FARS) in the Federal Crash Databases. The result is intended to help with policy making and traffic control.

• Constructed a demographic network whose nodes are counties and whose edges are roads joining counties.
• Developed a gradient-boosting-based method on the network that jointly performs feature selection and prediction and is robust to outliers and missing values.
• Proposed a fast learning algorithm that handles over 2 million samples in the data set with an overall accuracy of 86%.
• Improved over graphical-model and boosting baselines by 13%. Ranked 4th in the data science competition, with more than 1,000 lines of Python code.
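One plausible way to feed a county network into a boosting model is to add, for each county, the average of a feature over its road-connected neighbours. The sketch below assumes that approach for illustration; it is not necessarily how the project used the network. County names, roads and values are hypothetical, and missing values are represented as `None`.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple


def build_county_graph(roads: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency list: nodes are counties, each road joining two counties is an edge."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for a, b in roads:
        if a != b:  # ignore roads that stay inside one county
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def neighbour_mean(graph: Dict[str, Set[str]],
                   values: Dict[str, Optional[float]]) -> Dict[str, Optional[float]]:
    """Average a county-level feature over road-connected neighbours.

    Missing values (None) are skipped; a county with no observed neighbours gets None.
    """
    result: Dict[str, Optional[float]] = {}
    for county in set(graph) | set(values):
        observed = [values[n] for n in graph.get(county, set()) if values.get(n) is not None]
        result[county] = sum(observed) / len(observed) if observed else None
    return result


# Hypothetical counties and feature values; county "B" has a missing value.
roads = [("A", "B"), ("B", "C"), ("C", "D")]
values = {"A": 1.0, "B": None, "C": 3.0, "D": 5.0}
graph = build_county_graph(roads)
features = neighbour_mean(graph, values)
print(features["B"])  # 2.0
```

Keeping `None` for counties without observed neighbours leaves the missing-value handling to the downstream model instead of imputing a value here.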

### Springleaf Marketing Response: Kaggle Competition (Michigan Data Science Team (MDST))

Time: Aug. 2015 - Dec. 2015

Springleaf is a company that offers personal and auto loans. In this project, I worked in a team of four to help Springleaf determine whether to send a direct mail offer to a potential customer.

• Proposed a boosting-based algorithm that performs both feature selection and classification.
• My role on the team was to build the learning framework and develop the large-scale optimization toolbox for further development.
• With over 2,000 features describing each customer and over 120,000 potential customers across the United States, we proposed a highly memory-efficient prediction strategy based on the gradient boosting framework.
• C++ and Python scripts totaling more than 10,000 lines are available. Ranked in the top 20% on the public leaderboard and 3rd in the Michigan Data Science Competition.
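A common heuristic for boosting-based feature selection is to rank features by how often the boosted trees split on them. The sketch below assumes least-squares gradient boosting with depth-one trees (stumps) and counts split features; it is an illustration, not the team's actual algorithm, and a problem with 2,000 features and 120,000 customers would need an optimized library rather than this pure-Python loop. Function names and data are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple

Stump = Tuple[float, int, float, float, float]  # (sse, feature, threshold, left_value, right_value)


def fit_stump(X: Sequence[Sequence[float]], r: Sequence[float]) -> Optional[Stump]:
    """Single-feature threshold split minimising squared error on residuals r."""
    n, d = len(X), len(X[0])
    best: Optional[Stump] = None
    for j in range(d):
        for t in sorted({row[j] for row in X}):
            left = [r[i] for i in range(n) if X[i][j] <= t]
            right = [r[i] for i in range(n) if X[i][j] > t]
            if not left or not right:
                continue  # a split must leave samples on both sides
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    return best


def boosted_feature_usage(X, y, n_rounds: int = 20, lr: float = 0.1) -> Counter:
    """Least-squares gradient boosting with stumps; counts how often each feature is split on."""
    pred = [sum(y) / len(y)] * len(y)
    usage: Counter = Counter()
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of squared loss
        stump = fit_stump(X, residuals)
        if stump is None:
            break  # no valid split left (e.g. all features constant)
        _, j, t, lv, rv = stump
        usage[j] += 1
        pred = [p + lr * (lv if row[j] <= t else rv) for p, row in zip(pred, X)]
    return usage


# Hypothetical data: feature 0 separates the classes, feature 1 is noise.
X = [[0, 5], [1, 3], [2, 5], [3, 3], [4, 5], [5, 3]]
y = [0, 0, 0, 1, 1, 1]
usage = boosted_feature_usage(X, y)
print(usage.most_common())  # [(0, 20)]
```

Features that are never or rarely selected are candidates for removal, which also reduces memory use when the feature set is large.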

## Publications

• Xie, Tianpei, Sijia Liu, and Alfred O. Hero. “Semiblind Subgraph Reconstruction in Gaussian Graphical Models.” In the Fifth IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2017.
• Xie, Tianpei, Nasser M. Nasrabadi, and Alfred O. Hero. “Multi-view learning on statistical manifold via stochastic consensus constraints.” In preparation.
• Xie, Tianpei, Nasser M. Nasrabadi, and Alfred O. Hero. “Learning to classify with possible sensor failures.” IEEE Transactions on Signal Processing, 2016.
• Xie, Tianpei, Nasser M. Nasrabadi, and Alfred O. Hero. “Semi-supervised multi-sensor classification via consensus-based Multi-View Maximum Entropy Discrimination.” In Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on, pp. 1936-1940. IEEE, 2015.
• Xie, Tianpei, Nasser M. Nasrabadi, and Alfred O. Hero. “Learning to classify with possible sensor failures.” In Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on, pp. 2395-2399. IEEE, 2014.