Gen AI Developer Week 1 — Day 3
Welcome to the “Intro to Pandas” session in the GenAI Developer Series! Pandas is a powerful Python library for data analysis and manipulation, making it easy to clean, explore, and transform datasets. In this session, you’ll learn the basics of working with DataFrames and Series, importing and cleaning data, and performing operations like filtering and grouping. Whether you’re new to data science or enhancing your skills, Pandas will empower you to handle data efficiently. Let’s dive in and unlock its potential!
Let’s first install pandas with pip.
# Install pandas
pip install pandas
Now let’s import pandas and get started!
import pandas as pd
Key Concepts
Creating and Viewing DataFrames — Create a DataFrame from a dictionary
# Create a dictionary of sample data
data = {
"Name": ["Alice", "Bob", "Charlie"],
"Age": [25, 30, 35],
"Score": [85, 90, 88]
}
df = pd.DataFrame(data)  # A DataFrame is like a table with rows and columns
print(df)
# View data from the DataFrame
print(df.head()) # First 5 rows (all 3 here, since the table is small)
print(df.tail()) # Last 5 rows
df.info() # Overview of the DataFrame; info() prints directly and returns None
print(df.describe()) # Summary statistics
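Since this session covers both DataFrames and Series, here is a small sketch of how the two relate. It rebuilds the same three-row table so it can run on its own; the variable names `ages` and `subset` are just illustrative choices.

```python
import pandas as pd

# Same small table as above, rebuilt here so the snippet runs on its own
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "Score": [85, 90, 88],
})

ages = df["Age"]                 # single brackets -> a Series (one column)
subset = df[["Name", "Score"]]   # double brackets -> a smaller DataFrame

print(type(ages).__name__)       # Series
print(type(subset).__name__)     # DataFrame
print(ages.mean())               # 30.0
print(df.shape)                  # (3, 3) -> 3 rows, 3 columns
```

A handy rule of thumb: one column name gives you a Series, a list of column names gives you a DataFrame.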
Next, we’ll use the Titanic dataset as an example and perform a few tasks on it.
Analyze a Dataset
- Download a sample dataset, like titanic.csv (Kaggle Titanic Dataset).
- Load the dataset and display its first 5 rows.
# Read CSV and print the first 5 rows of the file
data = pd.read_csv('titanic.csv') # Read the CSV File
print(data.head()) # First 5 rows
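If you don’t have titanic.csv on hand yet, you can still try the loading step. The sketch below reads a tiny CSV from a string and counts missing values per column, a common first check before cleaning. The rows are made up for illustration (they are not real Titanic records), and `csv_text` and `sample` are hypothetical names; only the column names match the ones used in this session.

```python
import io
import pandas as pd

# Hypothetical sample rows, made up for illustration (not real Titanic records)
csv_text = """Age,Fare,Survived,Embarked
22,7.25,0,S
,71.28,1,C
35,8.05,1,
"""

# read_csv accepts file-like objects as well as file paths
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)         # (rows, columns)
print(sample.isna().sum())  # number of missing values in each column
```

Empty cells in the CSV are read as missing values (NaN), which is why the count is non-zero for Age and Embarked here.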
Filter Data
Filter rows where passengers are older than 30 and survived.
# Keep passengers who are older than 30 and survived
over_30 = data[(data["Age"] > 30) & (data["Survived"] == 1)]
print(over_30)
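To see what the filter is doing under the hood, it can help to build each condition separately. This is a minimal sketch on made-up rows (not real Titanic records); `people`, `is_over_30`, and `did_survive` are hypothetical names chosen for the example.

```python
import pandas as pd

# Hypothetical rows for illustration (not real Titanic records)
people = pd.DataFrame({
    "Age": [22, 45, 38, None, 31],
    "Survived": [0, 1, 0, 1, 1],
})

# Each comparison produces a Series of True/False values, one per row.
# A missing Age compares as False, so that row is dropped by the filter.
is_over_30 = people["Age"] > 30
did_survive = people["Survived"] == 1

# & combines the two masks row by row; only rows where both are True remain
result = people[is_over_30 & did_survive]
print(result)
```

When you write both conditions on one line, each one needs its own parentheses, because `&` binds more tightly than `>` and `==` in Python.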
Add a New Column
Add a column to the Titanic dataset indicating whether the fare was “High” (above 30) or “Low” (30 or below).
# Add a new column: "High" if the fare is above 30, otherwise "Low"
data["Fare_Category"] = ["High" if fare > 30 else "Low" for fare in data["Fare"]]
print(data[["Fare", "Fare_Category"]].head())
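The list comprehension above works well. As an alternative (not the method used in this session), the same labelling can be sketched with NumPy’s `np.where`, which applies the condition to the whole column at once. The `fares` table below is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical fares for illustration
fares = pd.DataFrame({"Fare": [7.25, 30.0, 71.28, 30.5]})

# np.where(condition, value_if_true, value_if_false) evaluates the whole column at once
fares["Fare_Category"] = np.where(fares["Fare"] > 30, "High", "Low")
print(fares)
```

Note that a fare of exactly 30 lands in “Low”, matching the rule above. With either approach, a missing fare also ends up as “Low”, because a comparison with a missing value is False.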
Group and Aggregate
Group passengers by their embarkation point and find the average fare for each group.
# Group and aggregate
avg_fare_by_embark = data.groupby("Embarked")["Fare"].mean()
print(avg_fare_by_embark)
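Here is a small sketch of the same grouping idea on made-up rows, extended with `agg` to get more than one statistic per group. The `trips` table and its values are hypothetical.

```python
import pandas as pd

# Hypothetical rows for illustration (not real Titanic records)
trips = pd.DataFrame({
    "Embarked": ["S", "C", "S", "C", None],
    "Fare": [10.0, 80.0, 20.0, 40.0, 15.0],
})

# agg takes a list of statistics and returns one column per statistic
summary = trips.groupby("Embarked")["Fare"].agg(["mean", "count"])
print(summary)
```

By default, `groupby` leaves out rows whose group key is missing, so the last row (no embarkation point) does not appear in the result.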
Save Filtered Data
Save the rows of passengers who paid a fare greater than 50 to a new CSV file.
# Saving the filtered data
high_fare = data[data["Fare"] > 50]
high_fare.to_csv("high_fare_passengers.csv", index=False)
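A quick way to check that a save worked is to read the file back. The sketch below does that with made-up rows and a temporary folder, so it leaves nothing behind on disk; `tickets` and the file name `high_fare_sample.csv` are hypothetical.

```python
import os
import tempfile
import pandas as pd

# Hypothetical rows for illustration (not real Titanic records)
tickets = pd.DataFrame({
    "Name": ["Passenger A", "Passenger B", "Passenger C"],
    "Fare": [12.5, 75.0, 51.0],
})
high = tickets[tickets["Fare"] > 50]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "high_fare_sample.csv")
    high.to_csv(path, index=False)   # index=False leaves the row labels out of the file
    reloaded = pd.read_csv(path)     # read it back to confirm what was written

print(reloaded)
```

Without `index=False`, the saved file would get an extra unnamed first column holding the old row labels.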
Happy Learning!😊