STA 142A Statistical Learning I - Discussion 3 - Python

Some Packages

numpy: The fundamental package for scientific computing with Python
scikit-learn: Machine Learning in Python
matplotlib: Visualization with Python
seaborn: Statistical data visualization
statsmodels: statistical models, hypothesis tests, and data exploration

List vs. NumPy Array

a = [1,2,3]

a * 3

[1, 2, 3, 1, 2, 3, 1, 2, 3]

a + 3

---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-4-d390b6b495e8> in <module>
----> 1 a + 3

TypeError: can only concatenate list (not "int") to list

import numpy as np
b = np.array([1,2,3])

b * 3

array([3, 6, 9])

b + 3

array([4, 5, 6])

c = np.array([4,5,6])

b * c

array([ 4, 10, 18])
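The cells above show that arithmetic on a list repeats or concatenates, while arithmetic on a NumPy array acts elementwise. As a minimal sketch of the difference, the snippet below gets the result of `b + 3` from a plain list with a comprehension. The names `a_plus_3` and `b_plus_3` are introduced here for illustration only.

```python
import numpy as np

a = [1, 2, 3]        # plain Python list
b = np.array(a)      # NumPy array built from the same values

# A list needs an explicit comprehension for elementwise addition
a_plus_3 = [x + 3 for x in a]

# An array broadcasts the scalar across every element
b_plus_3 = b + 3

print(a_plus_3)            # [4, 5, 6]
print(b_plus_3.tolist())   # [4, 5, 6]
```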

# 1d array
b.shape

(3,)

# 2d array
d1 = b.reshape(-1,1)
d1

array([[1],
       [2],
       [3]])

d1.shape

(3, 1)

d2 = b.reshape(1,-1)
d2

array([[1, 2, 3]])

d2.shape

(1, 3)

# matrix multiplication
d1 @ d2

array([[1, 2, 3],
       [2, 4, 6],
       [3, 6, 9]])

d2 @ d1

array([[14]])
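The two products above are the outer and inner products of `b` with itself. The sketch below gets the same values directly from the 1d array, without reshaping, using `np.outer` and `@`. The names `outer` and `inner` are illustrative.

```python
import numpy as np

b = np.array([1, 2, 3])
d1 = b.reshape(-1, 1)   # column vector, shape (3, 1)
d2 = b.reshape(1, -1)   # row vector, shape (1, 3)

# Outer product of two 1d arrays: same values as d1 @ d2
outer = np.outer(b, b)

# Inner product of two 1d arrays: a scalar, whereas d2 @ d1 has shape (1, 1)
inner = b @ b

print(outer.shape)         # (3, 3)
print((d2 @ d1).shape)     # (1, 1)
```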

Random Sampling

https://numpy.org/doc/stable/reference/random/index.html

from numpy.random import default_rng
rng = default_rng()
vals = rng.standard_normal(10)

vals

array([-0.77521747, -0.84113655,  0.61197545, -0.13576524,  0.70373505,
        0.61064352, -0.01271956, -0.29483544, -2.47983637,  1.08520542])
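Because `default_rng()` is called without a seed, the values above change on every run. A minimal sketch of a reproducible version follows. The seed value 142 is an arbitrary choice made for this example.

```python
import numpy as np
from numpy.random import default_rng

seed = 142  # arbitrary value, assumed for this example

# Two generators created from the same seed produce the same stream
rng_a = default_rng(seed)
rng_b = default_rng(seed)

vals_a = rng_a.standard_normal(10)
vals_b = rng_b.standard_normal(10)

print(np.array_equal(vals_a, vals_b))   # True
```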

Linear Regression

import matplotlib.pyplot as plt
import seaborn as sns
 
n = 100
b0, b1 = 5, 3

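# Generate epsilon and X, then Y from the model Y = b0 + b1*X + epsilon
e = rng.normal(0,1,n)
X = rng.uniform(0,1,n)
Y = b0+b1*X+e
# sklearn expects a 2d feature matrix of shape (n_samples, n_features)
X = X.reshape(-1,1)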


# Seaborn
# Recent seaborn versions require x and y as keyword arguments; ravel() passes X as 1d
sns.regplot(x=X.ravel(), y=Y, ci=None, scatter_kws={'color':'r', 's':9})
plt.show()


# Sklearn
from sklearn import linear_model
from sklearn.metrics import mean_squared_error, r2_score

# Create linear regression object
regr = linear_model.LinearRegression()

# Fit the model to the simulated data
regr.fit(X, Y)

# Compute in-sample predictions (fitted values) on the same data
Y_pred = regr.predict(X)

# The estimated slope (the intercept is stored in regr.intercept_)
print('Coefficients: \n', regr.coef_)
# The in-sample mean squared error
print('Mean squared error: %.2f'
      % mean_squared_error(Y, Y_pred))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
      % r2_score(Y, Y_pred))

# Plot outputs
plt.scatter(X, Y, color='black')
plt.plot(X, Y_pred, color='blue', linewidth=3)

plt.show()

Coefficients: 
 [2.40581767]
Mean squared error: 1.04
Coefficient of determination: 0.32
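The fit above is scored on the same data it was trained on. As a hedged sketch of a hold-out evaluation, the snippet below splits a freshly simulated sample with `train_test_split` and scores the model on the held-out part. The seed, the 70/30 split, and the variable names are assumptions made for this example.

```python
import numpy as np
from numpy.random import default_rng
from sklearn import linear_model
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = default_rng(0)  # seed chosen arbitrarily for reproducibility
n = 100
b0, b1 = 5, 3
X = rng.uniform(0, 1, n).reshape(-1, 1)
Y = b0 + b1 * X.ravel() + rng.normal(0, 1, n)

# Hold out 30% of the observations for testing
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=0)

# Fit on the training set only
regr = linear_model.LinearRegression()
regr.fit(X_train, Y_train)

# Score on the test set, which the model has not seen
Y_test_pred = regr.predict(X_test)
mse_test = mean_squared_error(Y_test, Y_test_pred)
r2_test = r2_score(Y_test, Y_test_pred)

print('Test mean squared error: %.2f' % mse_test)
print('Test coefficient of determination: %.2f' % r2_test)
```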

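The same least-squares estimates can be computed with NumPy alone. The sketch below builds a design matrix with an intercept column and solves it with `np.linalg.lstsq`. It uses a freshly simulated sample with an arbitrary seed, so its numbers will not match the output above. The names `A`, `beta_hat`, and `Y_fit` are illustrative.

```python
import numpy as np
from numpy.random import default_rng

rng = default_rng(0)  # seed chosen arbitrarily for reproducibility
n = 100
b0, b1 = 5, 3
X = rng.uniform(0, 1, n)
Y = b0 + b1 * X + rng.normal(0, 1, n)

# Design matrix: a column of ones for the intercept, then X
A = np.column_stack([np.ones(n), X])

# Least-squares solution of A @ beta ~ Y
beta_hat, *_ = np.linalg.lstsq(A, Y, rcond=None)
b0_hat, b1_hat = beta_hat

# In-sample fitted values, mean squared error, and R^2
Y_fit = A @ beta_hat
mse = np.mean((Y - Y_fit) ** 2)
r2 = 1 - np.sum((Y - Y_fit) ** 2) / np.sum((Y - Y.mean()) ** 2)

print('Intercept: %.2f, slope: %.2f' % (b0_hat, b1_hat))
print('Mean squared error: %.2f' % mse)
print('Coefficient of determination: %.2f' % r2)
```

With an intercept in the model, this R^2 equals the squared sample correlation between X and Y.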