Gen AI Developer Week 1 — Day 3

Dec 30, 2024

Welcome to the “Intro to Pandas” session in the GenAI Developer Series! Pandas is a powerful Python library for data analysis and manipulation, making it easy to clean, explore, and transform datasets. In this session, you’ll learn the basics of working with DataFrames and Series, importing and cleaning data, and performing operations like filtering and grouping. Whether you’re new to data science or enhancing your skills, Pandas will empower you to handle data efficiently. Let’s dive in and unlock its potential!

Let’s first install pandas with pip.

# Install pandas from the command line
pip install pandas

Now let’s import pandas and get started!

import pandas as pd

Key Concepts

Creating and Viewing DataFrames — Create a DataFrame from a dictionary

# Create a dictionary that maps column names to lists of values
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "Score": [85, 90, 88]
}

df = pd.DataFrame(data)  # A DataFrame is like a table with rows and columns
print(df)

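# View data from the DataFrame
print(df.head())  # First 5 rows (all 3 rows here, since the table is small)
print(df.tail())  # Last 5 rows (again all 3 rows here)
df.info()  # Overview of the DataFrame; info() prints directly and returns None
print(df.describe())  # Summary statistics for the numeric columns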
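The introduction mentions both DataFrames and Series, so here is a minimal sketch of how the two relate, reusing the same toy data as above. The variable names `ages` and `subset` are only illustrative.

```python
import pandas as pd

# Same toy data as in the example above
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "Score": [85, 90, 88],
})

ages = df["Age"]                 # single brackets return a Series (one column)
subset = df[["Name", "Score"]]   # a list of column names returns a DataFrame

print(type(ages).__name__)       # Series
print(type(subset).__name__)     # DataFrame
print(ages.mean())               # 30.0
print(df.shape)                  # (3, 3), i.e. (rows, columns)
```

A Series is a single labeled column, and a DataFrame is a collection of Series that share the same index.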

Next, we’ll use the Titanic dataset as an example and perform a few tasks on it.

Analyze a Dataset

Download a sample dataset, such as titanic.csv from the Kaggle Titanic Dataset.
Load the dataset and display its first 5 rows.

# Read CSV and print the first 5 rows of the file
data = pd.read_csv('titanic.csv') # Read the CSV File
print(data.head()) # First 5 rows
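After loading a file, it often helps to check its shape, column types, and missing values before doing anything else. The sketch below assumes a handful of made-up rows read from an in-memory string rather than the real titanic.csv, so it runs without the download. The rows and the name `sample` are hypothetical.

```python
import io
import pandas as pd

# Hypothetical rows standing in for titanic.csv (not the real data)
csv_text = """PassengerId,Survived,Age,Fare,Embarked
1,0,22,7.25,S
2,1,38,71.28,C
3,1,,7.92,S
4,0,35,8.05,
"""

# read_csv accepts a file path or any file-like object
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)    # (rows, columns)
print(sample.dtypes)   # column types inferred by pandas

# Count missing values per column; empty fields are read as NaN
missing = sample.isna().sum()
print(missing)
```

With the real file, the same three checks apply unchanged to the `data` DataFrame loaded above.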

Filter Data

Filter rows where passengers are older than 30 and survived.

# Keep only passengers who are older than 30 and survived
over_30 = data[(data["Age"] > 30) & (data["Survived"] == 1)]
print(over_30)
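To see what the filter does row by row, here is a small sketch on made-up data (the `toy` DataFrame is hypothetical). It shows the boolean mask that the condition builds, how a missing age is treated, and an equivalent form using `DataFrame.query`.

```python
import numpy as np
import pandas as pd

# Hypothetical data; one passenger has a missing age
toy = pd.DataFrame({
    "Age": [22.0, 38.0, np.nan, 54.0],
    "Survived": [0, 1, 1, 1],
})

# Each condition needs its own parentheses because & binds tighter than > and ==
mask = (toy["Age"] > 30) & (toy["Survived"] == 1)
print(mask.tolist())  # [False, True, False, True]

filtered = toy[mask]

# The same filter written as a query string
same = toy.query("Age > 30 and Survived == 1")
print(filtered.equals(same))
```

Note that a comparison against a missing age evaluates to False, so passengers with no recorded age are dropped by this filter.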

Add a New Column

Add a column to the Titanic dataset indicating whether the fare was “High” (above 30) or “Low” (30 or below).

# Add a new column: "High" if the fare is above 30, otherwise "Low"
data["Fare_Category"] = ["High" if fare > 30 else "Low" for fare in data["Fare"]]
print(data[["Fare", "Fare_Category"]].head())
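The list comprehension above works well. As an alternative, the same column can be built without a Python-level loop using `numpy.where`. This is a sketch on made-up fares (the `toy` DataFrame is hypothetical), not a change to the approach shown above.

```python
import numpy as np
import pandas as pd

# Hypothetical fares, including the boundary value 30.0
toy = pd.DataFrame({"Fare": [7.25, 30.0, 71.28, 30.01]})

# np.where(condition, value_if_true, value_if_false) works on the whole column at once
toy["Fare_Category"] = np.where(toy["Fare"] > 30, "High", "Low")
print(toy)
```

In both versions a fare of exactly 30 is labeled "Low", and a missing fare would also be labeled "Low" because the comparison with NaN is False.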

Group and Aggregate

Group passengers by their embarkation point and find the average fare for each group.

# Group and aggregate
avg_fare_by_embark = data.groupby("Embarked")["Fare"].mean()
print(avg_fare_by_embark)
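The grouping step can be checked by hand on a tiny table. The sketch below assumes made-up values (the `toy` DataFrame is hypothetical) and also shows `agg`, which computes several statistics per group in one call.

```python
import pandas as pd

# Hypothetical data; the last passenger has no embarkation point
toy = pd.DataFrame({
    "Embarked": ["S", "C", "S", "C", None],
    "Fare": [10.0, 80.0, 20.0, 40.0, 5.0],
})

# Average fare per embarkation point
avg = toy.groupby("Embarked")["Fare"].mean()
print(avg)

# Several statistics per group at once
summary = toy.groupby("Embarked")["Fare"].agg(["mean", "count"])
print(summary)
```

By default, `groupby` leaves out rows whose group key is missing, so the passenger without an embarkation point does not appear in either result.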

Save Filtered Data

Save the rows of passengers who paid a fare greater than 50 to a new CSV file.

# Saving the filtered data
high_fare = data[data["Fare"] > 50]
high_fare.to_csv("high_fare_passengers.csv", index=False)
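To confirm what was written, the file can be read back. This sketch assumes made-up rows (the `toy` DataFrame is hypothetical) and writes to a temporary directory so it leaves nothing behind.

```python
import os
import tempfile
import pandas as pd

# Hypothetical data standing in for the Titanic rows
toy = pd.DataFrame({"Name": ["A", "B", "C"], "Fare": [7.25, 71.28, 53.1]})
high_fare = toy[toy["Fare"] > 50]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "high_fare_passengers.csv")
    # index=False leaves the row labels out of the file
    high_fare.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

print(reloaded)
```

Because the index was not saved, the reloaded table contains only the original columns and gets a fresh index starting at 0.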

Happy Learning!😊