We begin by talking about linear regression…the ancestor of neural nets. We look at how linear regression can use simple matrix operations to learn from data. We gurgle with delight as we see why one initial assumption leads inevitably to minimizing the sum of squared errors. We then explore an alternative way to compute the linear parameters: gradient descent. Next, we exploit gradient descent to build classifiers in addition to regressors, and finally to fit highly non-linear models: full neural nets in all their glory.
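As a small taste of what is coming, here is a minimal NumPy sketch of the two routes previewed above: solving for the linear parameters in closed form with matrix operations, and nudging them toward the same answer with gradient descent on the sum of squared errors. The variable names and the synthetic data are ours, purely for illustration.

```python
import numpy as np

# Illustrative sketch (synthetic data): fit y ≈ Xw two ways.
rng = np.random.default_rng(0)
X = np.hstack([np.ones((100, 1)), rng.normal(size=(100, 1))])  # bias column + one feature
true_w = np.array([2.0, -3.0])
y = X @ true_w + rng.normal(scale=0.1, size=100)

# Route 1: closed form via the normal equation, w = (X^T X)^{-1} X^T y.
w_closed = np.linalg.solve(X.T @ X, X.T @ y)

# Route 2: gradient descent on the sum of squared errors ||Xw - y||^2.
w_gd = np.zeros(2)
lr = 0.01
for _ in range(2000):
    grad = 2 * X.T @ (X @ w_gd - y)   # gradient of the sum of squared errors
    w_gd -= lr * grad / len(y)        # average over examples for a stable step size

print(w_closed, w_gd)  # both should land near [2, -3]
```

Both routes recover essentially the same weights; the chapter walks through why, and why gradient descent is the one that scales up to classifiers and full neural nets.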