Professor YY's Machine Learning Talk

Hongtao Hao / 2021-04-13

I learned a lot from Professor YY’s talk on Machine Learning. The following are my notes on the lecture.

1. What is machine learning

Build a computer program that can learn from data (experiences).

1.1 Examples of machine learning tasks

1.2 Topics in machine learning

2. Primary domains of machine learning

2.1 Supervised learning

2.2 Unsupervised learning

An example task: given many documents, use ML algorithms to discover how language is structured.
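The talk's example involves text. As a much smaller, hypothetical illustration of the same idea (finding structure in data without any labels), the sketch below runs k-means clustering on six made-up 2-D points. K-means is swapped in here as a common unsupervised method; it is not the technique the talk described for documents.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Plain k-means: alternate assigning points to the nearest centroid and moving centroids."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points (keep it if the cluster is empty).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated groups of made-up points; the algorithm is never told which is which.
points = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
                   [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]])
labels, centroids = kmeans(points, k=2)
print(labels)
```

The cluster numbers themselves are arbitrary; what matters is that the first three points end up together and the last three end up together.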

2.3 Reinforcement learning

The agent receives infrequent feedback (e.g., rewards or punishments) and learns how to perform a task. The feedback is infrequent in the sense that you often cannot know the result until the end, for example, in a game of Go.

A prime example of reinforcement learning is AlphaGo.
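To make the idea of delayed feedback concrete, here is a minimal, hypothetical sketch of tabular Q-learning, a standard textbook reinforcement learning algorithm (AlphaGo itself used far more elaborate methods). An agent walks along a five-cell corridor and receives a reward only when it reaches the last cell. The corridor, reward, and parameter values (`GAMMA`, `ALPHA`, `EPSILON`) are illustrative assumptions; the point is that the values learned for early moves come entirely from that final reward propagating backward.

```python
import random

N_STATES = 5          # cells 0..4; cell 4 is the goal (terminal)
ACTIONS = (-1, +1)    # index 0 = move left, index 1 = move right
GAMMA, ALPHA, EPSILON = 0.9, 0.5, 0.1

def step(state, action):
    """Deterministic corridor: reward 1 only when the goal is reached, 0 otherwise."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def choose(q_row, rng):
    """Epsilon-greedy action index, breaking ties at random."""
    if rng.random() < EPSILON:
        return rng.randrange(len(ACTIONS))
    best = max(q_row)
    return rng.choice([i for i, v in enumerate(q_row) if v == best])

def q_learning(episodes=500, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            a = choose(q[state], rng)
            nxt, reward, done = step(state, ACTIONS[a])
            # Bootstrapped target: no future value after the terminal state.
            target = reward if done else reward + GAMMA * max(q[nxt])
            q[state][a] += ALPHA * (target - q[state][a])
            state = nxt
            if done:
                break
    return q

q = q_learning()
policy = ["right" if row[1] > row[0] else "left" for row in q[:-1]]
print(policy)  # → ['right', 'right', 'right', 'right']
```

No move before the last one is ever rewarded directly, yet the agent learns to head right from every cell.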

3. Regression models and machine learning

Your regression model is indeed doing machine learning.
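A minimal sketch of this point, using made-up data: ordinary least squares "learns" an intercept and a slope from examples, and the learned model can then predict an input it has not seen.

```python
import numpy as np

# Made-up training data: y is roughly 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.0, 8.9])

# Design matrix with an intercept column; least squares learns the coefficients from the data.
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = coef

# Use the learned model on a new input.
x_new = 5.0
prediction = intercept + slope * x_new
print(f"intercept={intercept:.2f}, slope={slope:.2f}, prediction at x=5: {prediction:.2f}")
```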

Machine learning is task-oriented: you can use any tool that accomplishes the task. Because statistical models are good at revealing patterns in data, they are widely used in machine learning. That said, tools not grounded in statistics (e.g., evolutionary algorithms, deep learning) are also used, as long as they work.

Machine learning is more about “solving a task” whereas statistics is more about understanding something.

3.1 Social sciences and machine learning

Models in the social sciences should be (1) simple and (2) based on theory. You cannot just throw everything into the model, and you should care about how your model is interpreted.

Machine learning is different. Simplicity and interpretation of the models are much less important than their performance.

4. Bias-variance trade-off

Machine learning prioritizes predictive performance and often relies on very flexible models, so it is prone to overfitting. Using thousands of features or billions of parameters is not unusual.
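A small, hypothetical illustration of the risk, with made-up data drawn from a straight line plus noise: a degree-9 polynomial has enough parameters to pass through all 10 training points, so its training error is essentially zero. That near-perfect fit says little about new data; on held-out points like these the flexible model usually does worse than the simple line, though the exact numbers depend on the random noise.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_process(x):
    """The (made-up) data-generating process: a straight line."""
    return 2.0 * x

# 10 evenly spaced training points and 200 random test points, both with noise.
x_train = np.linspace(0.0, 1.0, 10)
y_train = true_process(x_train) + rng.normal(0.0, 0.2, x_train.size)
x_test = rng.uniform(0.0, 1.0, 200)
y_test = true_process(x_test) + rng.normal(0.0, 0.2, x_test.size)

def mse(model, x, y):
    """Mean squared prediction error of a fitted polynomial."""
    return float(np.mean((model(x) - y) ** 2))

results = {}
for degree in (1, 9):
    # Polynomial.fit rescales x internally, which helps numerical stability at high degree.
    model = np.polynomial.Polynomial.fit(x_train, y_train, deg=degree)
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    print(f"degree {degree}: train MSE = {results[degree][0]:.4f}, test MSE = {results[degree][1]:.4f}")
```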

4.1 How to avoid overfitting: Split the data
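A minimal sketch of the idea, assuming a simple random hold-out split: shuffle the rows, set some aside, fit only on the rest, and judge the model on rows it never saw. The helper name `split_data`, the 25% test fraction, and the data are illustrative choices, not details from the talk.

```python
import numpy as np

def split_data(X, y, test_fraction=0.25, seed=0):
    """Randomly partition rows into disjoint training and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(len(X) * test_fraction))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Made-up data: 20 rows, one feature, y = 3x + noise.
rng = np.random.default_rng(1)
X = np.arange(20, dtype=float).reshape(-1, 1)
y = 3.0 * X[:, 0] + rng.normal(0.0, 1.0, 20)

X_train, X_test, y_train, y_test = split_data(X, y)

# Fit a line on the training rows only; the test rows stand in for "new" data.
A_train = np.column_stack([np.ones(len(X_train)), X_train[:, 0]])
coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
A_test = np.column_stack([np.ones(len(X_test)), X_test[:, 0]])
test_mse = float(np.mean((A_test @ coef - y_test) ** 2))
print(len(X_train), len(X_test), round(test_mse, 3))
```

The test error, not the training error, is the estimate of how the model will do out of sample.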

5. Social scientists vs. ML scientists

Suppose the topic of a research project is “poverty of an individual”.

5.1 Social scientists

The following is how social scientists approach this topic:

5.2 ML scientists

The following is how machine learning scientists approach this topic:

5.3 A cool paper combining these two methods

Blumenstock, J., Cadamuro, G., & On, R. (2015). Predicting poverty and wealth from mobile phone metadata. Science, 350(6264), 1073-1076.

6. Some Machine Learning Methods (non-regression ones)

  1. KNN (k-nearest neighbors)

  2. Decision tree

  3. Random forest (ensemble learning rocks)

     This is the go-to method if you are unsure which method to use.

  4. XGBoost

     Many Kaggle winners use this method.

  5. Topic modeling (Bayesian inference)
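Of the methods listed, k-nearest neighbors is simple enough to sketch in a few lines. The version below is a minimal, hypothetical implementation (Euclidean distance, majority vote among the `k` closest training points) on made-up points with placeholder labels "A" and "B"; library implementations add fast neighbor search and more careful tie-breaking.

```python
from collections import Counter

import numpy as np

def knn_predict(X_train, y_train, X_query, k=3):
    """Classify each query point by majority vote among its k nearest training points."""
    preds = []
    for q in X_query:
        dists = np.linalg.norm(X_train - q, axis=1)  # Euclidean distance to every training point
        nearest = np.argsort(dists)[:k]              # indices of the k closest points
        votes = Counter(y_train[i] for i in nearest)
        preds.append(votes.most_common(1)[0][0])
    return preds

# Made-up training data: two groups with placeholder labels.
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [4.0, 4.0], [4.2, 3.9], [3.8, 4.1]])
y_train = ["A", "A", "A", "B", "B", "B"]

predictions = knn_predict(X_train, y_train, np.array([[1.0, 0.9], [4.1, 4.0]]))
print(predictions)  # → ['A', 'B']
```

There is no training step at all: the "model" is the stored data, and all the work happens at prediction time.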

7. Deep Learning

Applications include image recognition, language tasks, protein folding, Go, word embeddings, etc.

8. Summary

Not really. ML is all about making machines learn. Statistical models are just one tool in its toolbox.

Yes. With ML, you can (1) extract richer information from diverse datasets (texts, images, networks, etc.), (2) try other models, (3) think about the out-of-sample generalizability of the model, (4) try incorporating many features or letting machines learn the features, and (5) pay attention to the model's predictive performance.

(1) Understand the variables and models rather than just throwing in complex models; (2) have some domain knowledge; (3) collect quality data on your own; (4) think about the social implications of the task.

8.1 How to speak MLese

9. Other information

The 3Blue1Brown channel on YouTube is highly recommended.

ML is not magic. There is a lot of work behind the scenes: preparing data, trying out different models, and tweaking hyperparameters. Lots of tweaking.
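As a small, hypothetical illustration of tweaking a hyperparameter: the loop below tries polynomial degrees 1 through 9 on made-up noisy sine data and keeps the degree with the lowest error on a held-out validation set. Holding out every third point and searching a fixed grid are illustrative choices; this is one common approach, not the only one.

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(0.0, 1.0, 30)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.1, x.size)

# Hold out every third point as a validation set.
val_mask = np.arange(x.size) % 3 == 0
x_tr, y_tr = x[~val_mask], y[~val_mask]
x_val, y_val = x[val_mask], y[val_mask]

val_errors = {}
for degree in range(1, 10):  # candidate values of the hyperparameter
    model = np.polynomial.Polynomial.fit(x_tr, y_tr, deg=degree)
    val_errors[degree] = float(np.mean((model(x_val) - y_val) ** 2))

# Keep the degree that does best on data the fit never saw.
best_degree = min(val_errors, key=val_errors.get)
print("validation MSE by degree:", {d: round(e, 3) for d, e in val_errors.items()})
print("chosen degree:", best_degree)
```

Degrees 1 and 2 cannot bend enough to follow a full sine period, so the search settles on a higher degree.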

Last modified on 2021-10-05