ML Strategy


  1. [Image: 13.png]
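  2. Importance of having a single real-number evaluation metric: e.g. the F1 score, which combines precision and recall. When several criteria matter, pick one optimizing metric and treat the others as satisficing metrics (thresholds that only need to be met)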
  3. Comparing with human-level performance
    1. [Image: 2.png]
    2. [Image: 3.png]
    3. As we approach human-level performance, it becomes harder to tell whether bias or variance needs improvement
    4. Human-level performance as a proxy for Bayes error
    5. [Image: 3.png]
  4. Setting up training, dev and test sets properly
    1. Always make the dev and test sets come from the same distribution
    2. The test set should be big enough to give confidence in the final performance of the system
    3. Changing the evaluation metric and shifting the target partway through a project
    4. In particular, giving more weight in the metric to extremely wrong (unacceptable) examples
  5. [Image: 14.png]
  6. If doing well on current metric + dev/test set does not correspond to doing well on final application, we should change the metric or the dev/test set
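
To make the metric ideas above concrete, here is a minimal Python sketch. The candidate classifiers, their numbers, and the helper names (`pick_model`, `weighted_error`) are made up for illustration; latency is assumed to be the satisficing metric and F1 the optimizing one.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def pick_model(candidates, max_latency_ms):
    """Keep models that meet the latency threshold (satisficing metric),
    then return the one with the highest F1 (optimizing metric)."""
    feasible = [c for c in candidates if c["latency_ms"] <= max_latency_ms]
    if not feasible:
        return None
    return max(feasible, key=lambda c: f1_score(c["precision"], c["recall"]))


def weighted_error(y_true, y_pred, weights):
    """Error rate where each example carries a weight, so that
    unacceptable mistakes count more than ordinary ones."""
    wrong = sum(w for t, p, w in zip(y_true, y_pred, weights) if t != p)
    return wrong / sum(weights)


# Hypothetical classifiers with made-up numbers
candidates = [
    {"name": "A", "precision": 0.95, "recall": 0.90, "latency_ms": 80},
    {"name": "B", "precision": 0.98, "recall": 0.85, "latency_ms": 95},
    {"name": "C", "precision": 0.99, "recall": 0.97, "latency_ms": 1500},
]
best = pick_model(candidates, max_latency_ms=100)
print(best["name"])
```

Model C has the best F1 but fails the latency threshold, so it never enters the comparison; that is the point of separating satisficing from optimizing metrics.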

Error Analysis

  1. Manually determining why our model is failing by analyzing wrong inferences.
  2. Determine the error rate per category to decide where to focus improvements.
  3. [Image: 1.png]
  4. Incorrect labels: random errors vs. systematic errors in the training set (random errors tend to do much less harm than systematic ones)
  5. It is important to examine examples we got right as well as those we got wrong
  6. Set up train/dev/test sets, build your first system quickly, then iterate using bias/variance and error analysis
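
A per-category tally like the one described above could be sketched as follows. The tags, the counts, and the `error_breakdown` helper are hypothetical; the idea is only that each category's share of the errors gives a ceiling on how much fixing that category could help.

```python
from collections import Counter


def error_breakdown(misclassified, total_dev_examples):
    """For each tag, report its share of the errors and the most the
    overall dev error could drop if that category were fixed entirely."""
    counts = Counter(tag for ex in misclassified for tag in ex["tags"])
    n_errors = len(misclassified)
    overall_error = n_errors / total_dev_examples
    report = {}
    for tag, count in counts.most_common():
        share = count / n_errors
        report[tag] = {
            "share_of_errors": share,
            "max_error_reduction": share * overall_error,
        }
    return report


# Hand-tagged misclassified dev examples (made-up data; one example may carry several tags)
misclassified = (
    [{"tags": ["blurry"]}] * 5
    + [{"tags": ["blurry", "great_cat"]}]
    + [{"tags": ["great_cat"]}] * 2
    + [{"tags": ["dog"]}] * 2
)
report = error_breakdown(misclassified, total_dev_examples=100)
for tag, stats in report.items():
    print(tag, stats)
```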

Mismatched training and dev/test sets

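  1. The data set must reflect the target you want to hit: don’t evaluate on PC data when the user is on mobile
  2. With a large data set from another distribution (M) and a small one from the target distribution (m), train on M plus some of m, and draw the dev/test sets only from m; the dev set might then be much harder than the training set
  3. Training-dev set = a held-out slice with the same distribution as the training set, used to determine whether the network is flawed [high bias / variance] or the (dev) data is simply harder or different [data mismatch problem]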
    1. Fundamental take away is that performance must be compared across data sets coming from same distribution
  4. Transfer learning as sober first, drunk later
  5. [Image: 2.png]
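
The comparison across data sets can be written down as three gaps. This is a rough sketch with made-up error percentages; the `diagnose` helper is hypothetical, and human-level error is assumed to stand in for Bayes error.

```python
def diagnose(human_err, train_err, train_dev_err, dev_err):
    """Split the dev error into three gaps (all values in percent).

    train and train-dev share a distribution, so their gap is variance;
    train-dev and dev differ only in distribution, so their gap is mismatch.
    """
    gaps = {
        "avoidable bias": train_err - human_err,
        "variance": train_dev_err - train_err,
        "data mismatch": dev_err - train_dev_err,
    }
    biggest = max(gaps, key=gaps.get)
    return gaps, biggest


# Made-up error rates in percent
gaps, biggest = diagnose(human_err=1.0, train_err=2.0, train_dev_err=3.0, dev_err=10.0)
print(gaps, "-> focus on:", biggest)
```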

Addressing data mismatch

  1. Data mismatch: e.g. you train on sharp pictures, the dev/test sets contain blurry pictures, and the model performs very poorly on them. This is a different problem from incorrectly labelled data
  2. Artificial data synthesis
    1. Understand how the data distributions of the training and dev sets differ, then make the training data more similar to the dev/test sets
    2. Be careful about overfitting to the synthesized noise (e.g. when just 1 hour of noise is reused for all the data)
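
As an illustration of artificial data synthesis, the sketch below mixes a noise recording into clean audio with NumPy. The signals are synthetic stand-ins and the `add_noise` helper and SNR value are assumptions, not something from the course material. Drawing a random segment of the noise each time helps a little, but if the noise recording itself is short, the model can still overfit to it.

```python
import numpy as np


def add_noise(clean, noise, snr_db, rng):
    """Mix a randomly chosen segment of `noise` into `clean` at the given SNR (dB)."""
    start = rng.integers(0, len(noise) - len(clean) + 1)
    segment = noise[start:start + len(clean)]
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(segment ** 2)
    # Scale the noise so that clean_power / scaled_noise_power == 10^(snr_db / 10)
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * segment


rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0, 100, 16000))   # stand-in for a clean recording
noise = rng.normal(size=160000)              # stand-in for a longer noise recording
noisy = add_noise(clean, noise, snr_db=10, rng=rng)
```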

Transfer Learning

  1. Useful when we have
    1. A lot of data for the original problem A and much less data for the target problem B
    2. The same type of input for both problems
    3. Low-level features learned from A that are likely to be useful for B
  2. [Image 1]
  3. Delete the final layer and attach a new output layer (or several new layers), then decide whether to retrain the earlier layers and how deep the retraining should go
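
A toy NumPy sketch of the "delete the final layer, train a new one" recipe follows. Everything here is an assumption for illustration: the "pretrained" weights are random stand-ins, the data is synthetic, and only the new output layer is trained while the earlier layer stays frozen.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for a network pretrained on problem A (random here, illustration only)
W1 = rng.normal(size=(20, 8))           # early layer: kept and frozen
W_old_head = rng.normal(size=(8, 10))   # task-A output layer
del W_old_head                          # "delete the final layer"


def features(X):
    """Frozen early layer with ReLU; W1 is never updated below."""
    return np.maximum(0, X @ W1)


# Small labelled data set for problem B (synthetic)
X_b = rng.normal(size=(200, 20))
y_b = (X_b[:, 0] > 0).astype(float)
H = features(X_b)

# New output layer for task B, trained with plain gradient descent
w_new = np.zeros(8)
b_new = 0.0


def loss(w, b):
    p = 1 / (1 + np.exp(-(H @ w + b)))
    return -np.mean(y_b * np.log(p + 1e-12) + (1 - y_b) * np.log(1 - p + 1e-12))


initial_loss = loss(w_new, b_new)
W1_before = W1.copy()
for _ in range(500):
    p = 1 / (1 + np.exp(-(H @ w_new + b_new)))
    grad = p - y_b                       # gradient of logistic loss w.r.t. the logits
    w_new -= 0.01 * H.T @ grad / len(y_b)
    b_new -= 0.01 * grad.mean()
final_loss = loss(w_new, b_new)
print(initial_loss, final_loss)
```

With more data for problem B, one could also unfreeze `W1` and fine-tune it; with very little data, training only the new layer is the safer default.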

Multi-task learning

  1. Training one network to classify 4 things can work better than training 4 networks to classify 1 thing each
  2. [Image 1]
  3. [Image 2]
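
One way to write the loss for such a network is to sum a logistic loss over the tasks for each example and skip labels that are missing. The sketch below assumes missing labels are marked with `np.nan`; the `multitask_loss` helper and the numbers are hypothetical.

```python
import numpy as np


def multitask_loss(y_hat, y):
    """Average over examples of the per-task logistic losses summed together.

    y has shape (examples, tasks) and uses np.nan where a label is missing;
    those entries contribute nothing to the loss.
    """
    labelled = ~np.isnan(y)
    y_filled = np.where(labelled, y, 0.0)
    eps = 1e-12
    per_entry = -(y_filled * np.log(y_hat + eps)
                  + (1 - y_filled) * np.log(1 - y_hat + eps))
    return np.sum(per_entry * labelled) / y.shape[0]


# Two examples, four tasks; the second example is only partly labelled
y_hat = np.array([[0.9, 0.1, 0.8, 0.2],
                  [0.5, 0.5, 0.5, 0.5]])
y = np.array([[1.0, 0.0, 1.0, 0.0],
              [1.0, np.nan, np.nan, 0.0]])
total = multitask_loss(y_hat, y)
```

Because unlabelled entries are masked out, partially labelled examples can still be used for training.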

End-to-end Learning

  1. Replacing multiple stages of learning with a single network. Requires a very large amount of data to compete with the traditional pipeline approach.
  2. A learning system draws on two sources of knowledge: the data, and the hand-designed components (which reflect human knowledge)
  3. The key question to ask: do we have enough data to learn a function of the complexity needed to map x to y?



