Machine Learning:
A branch of artificial intelligence concerned with the construction and study of systems that can learn from data.
Machine Learning vs Statistics:
Statistics is about drawing valid conclusions.
It cares deeply about how the data was collected, about methodology, and about the statistical properties of the estimator. Much of statistics is motivated by problems where you need to know precisely what you are doing (clinical trials and other experiments). Statistics insists on proper and rigorous methodology, and is comfortable with making and noting assumptions. It cares about the resulting properties of the estimator or experiment (e.g. p-values, unbiased estimators) and about the kinds of properties you would expect if you repeated the procedure many times.
Machine Learning is about prediction.
It cares deeply about scalability and about using predictions to make decisions. Much of Machine Learning is motivated by problems that need answers (e.g. image recognition, text inference, ranking, computer vision, medicine and healthcare, search engines). ML is happy to treat the algorithm as a black box as long as it works. Prediction and decision-making are king, and the algorithm is only a means to an end. It is also very important in ML that performance improves (without taking an absurd amount of time) as more data becomes available.
Types of Machine Learning Algorithms:
- Supervised learning algorithms are trained on labelled examples, i.e., input where the desired output is known. The supervised learning algorithm attempts to generalize a function or mapping from inputs to outputs which can then be used speculatively to generate an output for previously unseen inputs.
- Unsupervised learning algorithms operate on unlabelled examples, i.e., input where the desired output is unknown. Here the objective is to discover structure in the data (e.g. clustering), not to generalize a mapping from inputs to outputs.
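The two settings can be illustrated side by side with a minimal NumPy sketch (the data and the two toy methods below are made up for illustration: a nearest-centroid classifier stands in for supervised learning, and a single k-means-style assignment/update pass stands in for unsupervised learning):

```python
import numpy as np

# Supervised: labelled examples (x, y) -- the desired outputs are known,
# so we can learn a mapping and predict the label of an unseen input.
x_train = np.array([1.0, 1.2, 0.9, 5.0, 5.3, 4.8])
y_train = np.array([0, 0, 0, 1, 1, 1])
centroids = np.array([x_train[y_train == c].mean() for c in (0, 1)])
x_new = 4.6
predicted_label = int(np.argmin(np.abs(centroids - x_new)))  # nearest class centroid

# Unsupervised: the same inputs without labels -- discover structure
# (here, two clusters) via one k-means assignment step and one update step.
x_unlabelled = x_train                                       # no y available
centers = np.array([x_unlabelled.min(), x_unlabelled.max()]) # crude initialization
assignments = np.argmin(np.abs(x_unlabelled[:, None] - centers[None, :]), axis=1)
centers = np.array([x_unlabelled[assignments == c].mean() for c in (0, 1)])
```

The supervised half uses the labels to build the class centroids; the unsupervised half has to infer the same grouping from the inputs alone.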
Linear Regression:
In statistics, linear regression is an approach for modeling the relationship between a scalar dependent variable y and one or more explanatory variables denoted X.
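For a concrete instance, the least-squares coefficients of y = θ₀ + θ₁x can be computed in closed form (a sketch with fabricated, noiseless data; `numpy.linalg.lstsq` is the solver used here):

```python
import numpy as np

# Fabricated data that follows y = 2 + 3x exactly
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 + 3.0 * x

# Design matrix: a column of ones for the intercept, then the feature
X = np.column_stack([np.ones_like(x), x])
theta, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
# theta is approximately [2.0, 3.0]: the intercept and the slope
```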
Gradient Descent in Python:
import numpy
import pandas

def compute_cost(features, values, theta):
    """
    Compute the cost of a list of parameters, theta, given a list of features
    (input data points) and values (output data points).
    """
    m = len(values)
    sum_of_square_errors = numpy.square(numpy.dot(features, theta) - values).sum()
    cost = sum_of_square_errors / (2 * m)
    return cost

def gradient_descent(features, values, theta, alpha, num_iterations):
    """
    Perform gradient descent given a data set with an arbitrary number of features.
    """
    # Perform num_iterations updates to the elements of theta; each time the
    # cost is computed for the current theta, append it to cost_history.
    cost_history = []
    m = len(values)
    for i in range(num_iterations):
        predicted_values = numpy.dot(features, theta)
        theta = theta - (alpha / m) * numpy.dot(predicted_values - values, features)
        cost_history.append(compute_cost(features, values, theta))
    return theta, pandas.Series(cost_history)  # leave this line for the grader
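The routine can be sanity-checked on a tiny fabricated data set (the definitions are repeated below so the snippet runs on its own; alpha and num_iterations are arbitrary choices for this example):

```python
import numpy
import pandas

def compute_cost(features, values, theta):
    m = len(values)
    return numpy.square(numpy.dot(features, theta) - values).sum() / (2 * m)

def gradient_descent(features, values, theta, alpha, num_iterations):
    cost_history = []
    m = len(values)
    for i in range(num_iterations):
        predicted_values = numpy.dot(features, theta)
        theta = theta - (alpha / m) * numpy.dot(predicted_values - values, features)
        cost_history.append(compute_cost(features, values, theta))
    return theta, pandas.Series(cost_history)

# Noiseless data: a bias column of ones plus one feature, with y = 1 + 2x
features = numpy.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
values = numpy.array([1.0, 3.0, 5.0, 7.0])
theta0 = numpy.zeros(2)

theta, cost_history = gradient_descent(features, values, theta0,
                                       alpha=0.1, num_iterations=2000)
# theta converges toward [1, 2], and the recorded cost shrinks toward zero
```

On noiseless data like this, cost_history should decrease steadily; if it grows instead, alpha is too large for the problem.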
Coefficient of determination:
In statistics, the coefficient of determination, denoted R² and pronounced "R squared", indicates how well data points fit a statistical model, sometimes simply a line or curve.
Calculating R Squared:
import numpy as np

def compute_r_squared(data, predictions):
    # Given two input numpy arrays, 'data' and 'predictions', return the
    # coefficient of determination, R^2, for the model that produced the
    # predictions.
    #
    # R^2 = 1 - SSres / SStot, where SSres is the sum of squared residuals
    # and SStot is the total sum of squares about the mean.
    mean = np.mean(data)
    SSr = np.sum(np.square(data - predictions))
    SSt = np.sum(np.square(data - mean))
    r_squared = 1.0 - (SSr / SSt)
    return r_squared
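Two boundary cases make a quick check of this formula (the function is repeated below so the snippet runs on its own; the data values are made up): predictions that match the data exactly give R² = 1, and predictions that always output the mean give R² = 0.

```python
import numpy as np

def compute_r_squared(data, predictions):
    mean = np.mean(data)
    SSr = np.sum(np.square(data - predictions))
    SSt = np.sum(np.square(data - mean))
    return 1.0 - SSr / SSt

data = np.array([1.0, 2.0, 3.0, 4.0])
perfect = data.copy()                          # predictions match exactly
baseline = np.full_like(data, np.mean(data))   # always predict the mean

print(compute_r_squared(data, perfect))   # 1.0 -- perfect fit
print(compute_r_squared(data, baseline))  # 0.0 -- no better than the mean
```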