STA 142A Statistical Learning I - Discussion 3 - Python

Some Packages

  • numpy: The fundamental package for scientific computing with Python
  • scikit-learn: Machine Learning in Python
  • matplotlib: Visualization with Python
  • seaborn: Statistical data visualization
  • statsmodels: statistical models, hypothesis tests, and data exploration

List vs. NumPy Array

a = [1,2,3]
a * 3
[1, 2, 3, 1, 2, 3, 1, 2, 3]
a + 3
---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-4-d390b6b495e8> in <module>
----> 1 a + 3


TypeError: can only concatenate list (not "int") to list
import numpy as np
b = np.array([1,2,3])
b * 3
array([3, 6, 9])
b + 3
array([4, 5, 6])
c = np.array([4,5,6])
b * c
array([ 4, 10, 18])
# 1d array
b.shape
(3,)
# 2d array
d1 = b.reshape(-1,1)
d1
array([[1],
       [2],
       [3]])
d1.shape
(3, 1)
d2 = b.reshape(1,-1)
d2
array([[1, 2, 3]])
d2.shape
(1, 3)
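The -1 passed to reshape above asks NumPy to infer that dimension from the array's total size. A minimal sketch of this behavior, using a hypothetical 6-element array rather than b so the inferred sizes are easier to see:

```python
import numpy as np

v = np.arange(6)  # hypothetical example array: [0, 1, 2, 3, 4, 5], shape (6,)

# -1 means "infer this dimension from the total number of elements"
print(v.reshape(-1, 1).shape)  # (6, 1)
print(v.reshape(1, -1).shape)  # (1, 6)
print(v.reshape(2, -1).shape)  # (2, 3)

# Indexing with np.newaxis is another way to add an axis
print(v[:, np.newaxis].shape)  # (6, 1)
print(v[np.newaxis, :].shape)  # (1, 6)

# reshape returns a new array object; v itself keeps its 1-D shape
print(v.shape)  # (6,)
```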
# matrix multiplication
d1 @ d2
array([[1, 2, 3],
       [2, 4, 6],
       [3, 6, 9]])
d2 @ d1
array([[14]])
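The cells above use * for element-wise products and @ for matrix products. The sketch below contrasts the two on the same column and row shapes; the variable names col and row are chosen here for illustration and stand in for d1 and d2.

```python
import numpy as np

b = np.array([1, 2, 3])
col = b.reshape(-1, 1)  # shape (3, 1), same as d1
row = b.reshape(1, -1)  # shape (1, 3), same as d2

# Element-wise product with broadcasting: (3, 1) * (1, 3) -> (3, 3)
outer_by_broadcast = col * row

# Matrix product of the same two arrays: (3, 1) @ (1, 3) -> (3, 3)
outer_by_matmul = col @ row

# For a column times a row, both give the outer product
print(np.array_equal(outer_by_broadcast, outer_by_matmul))  # True

# With the order reversed, broadcasting still gives (3, 3),
# while the matrix product collapses to (1, 1)
print((row * col).shape)  # (3, 3)
print((row @ col).shape)  # (1, 1)

# For two 1-D arrays, @ is the inner product and returns a scalar
print(b @ b)  # 14
```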

Random Sampling

https://numpy.org/doc/stable/reference/random/index.html

from numpy.random import default_rng
rng = default_rng()
vals = rng.standard_normal(10)
vals
array([-0.77521747, -0.84113655,  0.61197545, -0.13576524,  0.70373505,
        0.61064352, -0.01271956, -0.29483544, -2.47983637,  1.08520542])
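Because default_rng() is called without a seed above, the values change on every run. A minimal sketch of making the draws reproducible by passing a seed; the seed value 142 is an arbitrary assumption, not something used in the discussion.

```python
from numpy.random import default_rng

# The seed value 142 is an arbitrary choice for illustration
rng1 = default_rng(142)
rng2 = default_rng(142)

# Two generators built from the same seed produce identical draws
x1 = rng1.standard_normal(5)
x2 = rng2.standard_normal(5)
print((x1 == x2).all())  # True

# Drawing again from the same generator advances its state and gives new values
x3 = rng1.standard_normal(5)
print((x1 == x3).all())  # False
```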

Linear Regression

import matplotlib.pyplot as plt
import seaborn as sns
 
n = 100
b0, b1 = 5, 3

# Generate X and epsilon
e = rng.normal(0,1,n)
X = rng.uniform(0,1,n)
Y = b0+b1*X+e
# scikit-learn expects a 2-D feature array of shape (n_samples, n_features)
X = X.reshape(-1,1)


# Seaborn
# Recent seaborn versions require x and y as keyword arguments
sns.regplot(x=X.ravel(), y=Y, ci=None, scatter_kws={'color':'r', 's':9})
plt.show()

[Figure: scatter plot of the simulated (X, Y) points in red with the fitted regression line]
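The fitted line in a plot like this is an ordinary least-squares fit. As a rough sketch, the same intercept and slope can be computed directly with NumPy. The data are regenerated here with a fixed seed (seed 0 is an assumption, used only so the example is self-contained), so the numbers will differ from the run above.

```python
import numpy as np
from numpy.random import default_rng

rng = default_rng(0)  # seed 0 is an assumption, used only for reproducibility

n = 100
b0, b1 = 5, 3
e = rng.normal(0, 1, n)
X = rng.uniform(0, 1, n)
Y = b0 + b1 * X + e

# Design matrix with a column of ones for the intercept
A = np.column_stack([np.ones(n), X])

# Least-squares solution minimizing ||A @ beta - Y||^2
beta_hat, *_ = np.linalg.lstsq(A, Y, rcond=None)
b0_hat, b1_hat = beta_hat

# np.polyfit with degree 1 fits the same line (it returns the slope first)
slope, intercept = np.polyfit(X, Y, 1)

print('Intercept and slope:', b0_hat, b1_hat)
print(np.allclose([b0_hat, b1_hat], [intercept, slope]))  # True
```

With only 100 noisy observations the estimates will be near, but not equal to, the true values b0 = 5 and b1 = 3.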

# Sklearn
from sklearn import linear_model
from sklearn.metrics import mean_squared_error, r2_score

# Create linear regression object
regr = linear_model.LinearRegression()

# Fit the model to the simulated data
regr.fit(X, Y)

# Compute fitted values on the same data (no separate test set is used here)
Y_pred = regr.predict(X)

# The coefficients
print('Coefficients: \n', regr.coef_)
# The mean squared error
print('Mean squared error: %.2f'
      % mean_squared_error(Y, Y_pred))
# The coefficient of determination: 1 is perfect prediction
print('Coefficient of determination: %.2f'
      % r2_score(Y, Y_pred))

# Plot outputs
plt.scatter(X, Y, color='black')
plt.plot(X, Y_pred, color='blue', linewidth=3)

plt.show()
Coefficients: 
 [2.40581767]
Mean squared error: 1.04
Coefficient of determination: 0.32

[Figure: scatter plot of (X, Y) in black with the fitted regression line in blue]
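The fit above is evaluated on the same data it was trained on. The following sketch shows one way to hold out a test set with scikit-learn's train_test_split. The seed, the 25% split fraction, and the variable names are assumptions made for illustration, and the data are regenerated, so the numbers will not match the output above.

```python
import numpy as np
from numpy.random import default_rng
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = default_rng(0)  # seed is an assumption, for reproducibility only

n = 100
b0, b1 = 5, 3
X = rng.uniform(0, 1, n).reshape(-1, 1)
Y = b0 + b1 * X.ravel() + rng.normal(0, 1, n)

# Hold out 25% of the observations (the fraction is an arbitrary choice)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0
)

# Fit on the training portion only, then predict on the held-out portion
regr = LinearRegression().fit(X_train, Y_train)
Y_test_pred = regr.predict(X_test)

print('Intercept:', regr.intercept_)
print('Slope:', regr.coef_[0])
print('Test MSE: %.2f' % mean_squared_error(Y_test, Y_test_pred))
print('Test R^2: %.2f' % r2_score(Y_test, Y_test_pred))
```

Printing regr.intercept_ alongside regr.coef_ gives estimates of both b0 and b1, which can be compared with the true values used in the simulation.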