Gen AI Developer Week 2 — Day 2

Linear regression is a simple yet powerful method in machine learning and statistics. It models the relationship between a dependent variable and one or more independent variables, and many data-driven applications rely on the predictions and insights this makes possible. As part of my journey through Generative AI Developer Week, this piece covers the fundamental ideas, real-world applications, and mathematical underpinnings of linear regression.

Let’s recap: what is Linear Regression?
Linear regression predicts a target y as a linear combination of the input features x1, x2, …, xn, plus an intercept term.
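
Written out, the model looks like the sketch below. The symbols are notation I am assuming here, not something taken from scikit-learn: β0 is the intercept, β1 … βn are the coefficients, and m is the number of training samples. Ordinary least squares, which is what scikit-learn's LinearRegression fits, picks the β values that minimize the sum of squared residuals.

```latex
\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n

\min_{\beta_0, \dots, \beta_n} \; \sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2
```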

We’re going to train a Linear Regression Model with the Diabetes data.

Load Dataset & Create a DataFrame

from sklearn.datasets import load_diabetes # Get Diabetes Dataset
import pandas as pd

data = load_diabetes()

# Create a dataframe with pandas
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

print(df.head())

Split the dataset into training and testing data using train_test_split

# Train Test Data Split
from sklearn.model_selection import train_test_split

X = df.drop('target', axis=1)
y = df['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

[Figure: output of the train-test split]

Train a Linear Regression Model

# Preparing a Model
from sklearn.linear_model import LinearRegression

model = LinearRegression()

model.fit(X_train, y_train)

print("Coeff: ", model.coef_)
print("Intercept: ", model.intercept_)

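Coefficient: There is one coefficient per feature. Each one is the change in the predicted target for a one-unit change in that feature, with all other features held constant. With a single feature, it is simply the slope of the fitted line.
Intercept: The predicted value of the target when every feature is zero. With a single feature, this is the point where the line crosses the y-axis.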
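
To make the roles of the coefficients and the intercept concrete, here is a minimal, self-contained sketch that rebuilds the model's predictions by hand. It assumes the same dataset and split settings as above; the variable name manual_pred is my own, chosen for illustration.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Same data and split settings as in the walkthrough above
X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)

# Pair each coefficient with its feature name for readability
for name, coef in zip(X.columns, model.coef_):
    print(f"{name:>4}: {coef:10.2f}")

# Rebuild the predictions by hand: intercept + sum(coef_i * x_i)
manual_pred = X_test.to_numpy() @ model.coef_ + model.intercept_

# Check that the hand-built predictions match model.predict
print(np.allclose(manual_pred, model.predict(X_test)))
```

If the check prints True, model.predict is doing nothing more than this weighted sum plus the intercept.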

Make Predictions

# Predict on the test data
y_pred = model.predict(X_test)

print("Predictions:", y_pred[:5])
print("Actual values:", y_test.values[:5])

Evaluate the Model

from sklearn.metrics import mean_squared_error, r2_score

# Calculate metrics
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error:", mse)
print("R^2 Score:", r2)

Mean Squared Error (MSE):
The average of the squared differences between the predicted and actual values, expressed in squared units of the target. A lower MSE indicates a better fit.
R-squared:
The proportion of the target's variance that the model explains. A higher R-squared indicates a better fit, and 1 is a perfect fit. On held-out data it can drop below 0 when the model does worse than always predicting the mean.
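
Both metrics are easy to compute by hand, which helps show what they measure. The sketch below uses a tiny made-up set of values (not the diabetes results) and compares the manual calculation with scikit-learn's functions; the names y_true, y_hat, and bad are illustrative.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Tiny made-up example values, for illustration only
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_hat = np.array([2.5, 0.0, 2.0, 8.0])

# MSE: average of the squared residuals
mse_manual = np.mean((y_true - y_hat) ** 2)

# R^2: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(mse_manual, mean_squared_error(y_true, y_hat))
print(r2_manual, r2_score(y_true, y_hat))

# Predictions worse than "always predict the mean" give a negative R^2
bad = np.array([7.0, 7.0, -1.0, -1.0])
print(r2_score(y_true, bad))
```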

Task — Plot the predicted vs. actual values for the test set.

import matplotlib.pyplot as plt

# Plot predicted vs actual values
plt.scatter(y_test, y_pred, alpha=0.7)
plt.xlabel("Actual Values")
plt.ylabel("Predicted Values")
plt.title("Predicted vs. Actual")
plt.show()
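
One optional refinement is a diagonal y = x reference line, since perfect predictions would fall exactly on it. The helper below is a sketch of that idea; plot_predicted_vs_actual is a hypothetical function name of my own, not part of matplotlib or scikit-learn.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_predicted_vs_actual(y_true, y_pred, ax=None):
    """Scatter predicted against actual values with a y = x reference line."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if ax is None:
        _, ax = plt.subplots(figsize=(6, 6))

    ax.scatter(y_true, y_pred, alpha=0.7)

    # Perfect predictions would fall exactly on this diagonal
    lo = min(y_true.min(), y_pred.min())
    hi = max(y_true.max(), y_pred.max())
    ax.plot([lo, hi], [lo, hi], linestyle="--", color="red", label="y = x")

    ax.set_xlabel("Actual Values")
    ax.set_ylabel("Predicted Values")
    ax.set_title("Predicted vs. Actual")
    ax.legend()
    return ax
```

To use it with the variables from this post, call plot_predicted_vs_actual(y_test, y_pred) and then plt.show(). Points far from the dashed line are the samples the model predicts poorly.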

Deliverables for the day:
Linear Regression model trained on the Diabetes dataset.
MSE and R² score of the model.
A scatter plot of predicted vs. actual values.

Happy Learning! 😊 For any questions or support, feel free to message me on LinkedIn.


Written by Sai Chinmay Tripurari
