Understanding the Fundamentals of Dataframe Basics in Python
- Chathura Madhusanka
- Dec 11, 2025
Dataframes are one of the most powerful tools in Python for handling and analyzing data. Whether you are a beginner or someone looking to refresh your knowledge, understanding the basics of dataframes is essential for working efficiently with data. This post breaks down the core concepts of dataframes, how to create and manipulate them, and practical examples to help you get started.
What Is a Dataframe?
A dataframe is a two-dimensional, size-mutable, and heterogeneous tabular data structure with labeled axes (rows and columns). Think of it as a spreadsheet or a SQL table in Python. It allows you to store data in rows and columns, making it easier to analyze and manipulate.
The most popular library for working with dataframes in Python is Pandas. It provides a wide range of functions to create, modify, and analyze dataframes.
Creating a Dataframe
You can create a dataframe in several ways, but the most common method is by using a dictionary or a list of lists.
Here is a simple example using a dictionary:
```python
import pandas as pd
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
}
df = pd.DataFrame(data)
print(df)
```
This code creates a dataframe with three columns: Name, Age, and City. Each key in the dictionary becomes a column name, and each list supplies that column's values, one per row.
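The list-of-lists route mentioned earlier works in a similar way, except that each inner list is one row and the column names are passed separately. A minimal sketch, reusing the same sample values (the names `rows` and `df_from_rows` are illustrative only):

```python
import pandas as pd

# Each inner list is one row; column names are supplied via `columns`.
rows = [
    ['Alice', 25, 'New York'],
    ['Bob', 30, 'Los Angeles'],
    ['Charlie', 35, 'Chicago'],
]
df_from_rows = pd.DataFrame(rows, columns=['Name', 'Age', 'City'])

print(df_from_rows.shape)   # (number of rows, number of columns)
print(df_from_rows.dtypes)  # one dtype per column
```

The dictionary form tends to read better when you think in columns, and the list-of-lists form when your data already arrives row by row.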
Accessing Data in a Dataframe
Once you have a dataframe, you often need to access specific data. Here are some common ways to do that:
Access a column by its name:
```python
ages = df['Age']
```
Access multiple columns:
```python
subset = df[['Name', 'City']]
```
Access rows by position using `.iloc`:
```python
first_row = df.iloc[0]
```
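Access rows by label using `.loc` (here the label `0` happens to match the position, because the default index is the integers 0, 1, 2):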
```python
row = df.loc[0]
```
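The difference between `.iloc` and `.loc` is easier to see when the index is not the default integers. The sketch below assumes a hypothetical dataframe `people` with string labels `'a'`, `'b'`, `'c'`, chosen only for illustration:

```python
import pandas as pd

people = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]},
    index=['a', 'b', 'c'],  # hypothetical string labels
)

by_position = people.iloc[1]           # second row, regardless of labels
by_label = people.loc['b']             # row whose index label is 'b'
single_value = people.loc['c', 'Age']  # one cell: row 'c', column 'Age'

print(by_position['Name'], by_label['Name'], single_value)
```

Both lookups return the same row here, but `.iloc` would keep returning the second row even if the labels were reordered or renamed.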
Adding and Removing Columns
You can add a new column to a dataframe by assigning a list or a series to a new column name. A list must contain exactly one value per row:
```python
df['Salary'] = [70000, 80000, 90000]
```
To remove a column, use the `drop` method:
```python
df = df.drop('Salary', axis=1)
```
The `axis=1` parameter tells Pandas that the label refers to a column. With `axis=0`, which is the default, `drop` looks for a row label instead.
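As an alternative spelling, `drop` also accepts a `columns=` keyword, which some readers find clearer than `axis=1`. The short sketch below uses a hypothetical `staff` dataframe and also shows that `drop` returns a new dataframe rather than changing the original:

```python
import pandas as pd

staff = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Salary': [70000, 80000]})

# `columns=` is equivalent to passing the label together with axis=1.
without_salary = staff.drop(columns='Salary')

# `staff` itself is unchanged, which is why the result is usually reassigned.
print(list(without_salary.columns))
print(list(staff.columns))
```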
Filtering Data
Filtering rows based on conditions is a common task. For example, to get all rows where Age is greater than 28:
```python
filtered_df = df[df['Age'] > 28]
```
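You can combine multiple conditions using `&` (and) or `|` (or). Wrap each condition in parentheses, because these operators bind more tightly than comparisons such as `>` and `==`: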
```python
filtered_df = df[(df['Age'] > 28) & (df['City'] == 'Chicago')]
```
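A few related patterns may be useful alongside `&`. The sketch below rebuilds the sample data under the hypothetical name `people` and shows `|`, negation with `~`, and `isin` for matching against several values:

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# Rows where either condition holds.
young_or_chicago = people[(people['Age'] < 28) | (people['City'] == 'Chicago')]

# `~` inverts a boolean mask: everyone who is not in Chicago.
not_chicago = people[~(people['City'] == 'Chicago')]

# `isin` checks membership in a list of allowed values.
coastal = people[people['City'].isin(['New York', 'Los Angeles'])]

print(young_or_chicago['Name'].tolist())
print(not_chicago['Name'].tolist())
print(coastal['Name'].tolist())
```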
Handling Missing Data
Real-world data often contains missing values. Pandas provides methods to detect and handle these:
Check for missing values (this returns a dataframe of booleans, `True` wherever a value is missing):
```python
df.isnull()
```
Drop rows with missing values:
```python
df = df.dropna()
```
Fill missing values with a specific value:
```python
df = df.fillna(0)
```
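The sample dataframe used so far has no gaps, so these calls leave it unchanged. To see them do something, here is a small sketch with a hypothetical `scores` dataframe that contains one missing value:

```python
import numpy as np
import pandas as pd

scores = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Score': [90.0, np.nan, 75.0],  # Bob's score is missing
})

# Summing the boolean mask counts missing values per column.
missing_per_column = scores.isnull().sum()

# Keep only rows with no missing values.
complete_rows = scores.dropna()

# Passing a dict fills each named column with its own replacement value.
filled = scores.fillna({'Score': 0})

print(missing_per_column)
print(complete_rows)
print(filled)
```

Whether to drop or fill depends on the analysis: dropping loses rows, while filling with 0 can distort averages.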
Sorting Data
Sorting a dataframe by one or more columns helps organize data. `sort_values` returns a new dataframe, sorted in ascending order by default:
```python
df_sorted = df.sort_values(by='Age')
```
To sort by multiple columns:
```python
df_sorted = df.sort_values(by=['City', 'Age'])
```
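Sort direction can be controlled with the `ascending` parameter, either as a single boolean or as one boolean per column. A minimal sketch, using a hypothetical four-row `people` dataframe so that the second sort key actually matters:

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'Age': [25, 30, 35, 30],
    'City': ['New York', 'Chicago', 'Chicago', 'New York'],
})

# Oldest first.
oldest_first = people.sort_values(by='Age', ascending=False)

# City A to Z, and within each city, oldest first.
mixed = people.sort_values(by=['City', 'Age'], ascending=[True, False])

# Sorting keeps the original index labels; reset them if you want 0, 1, 2, 3.
mixed = mixed.reset_index(drop=True)

print(oldest_first)
print(mixed)
```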
Practical Example: Analyzing Sales Data
Imagine you have sales data for a small store:
```python
sales_data = {
    'Product': ['Apples', 'Bananas', 'Cherries', 'Dates'],
    'Quantity': [10, 15, 7, 20],
    'Price': [1.2, 0.5, 2.5, 3.0],
}
sales_df = pd.DataFrame(sales_data)
```
You can calculate the total sales for each product by creating a new column:
```python
sales_df['Total'] = sales_df['Quantity'] * sales_df['Price']
print(sales_df)
```
This will add a Total column showing the revenue per product.
To find products with sales over $20:
```python
high_sales = sales_df[sales_df['Total'] > 20]
print(high_sales)
```
This example shows how dataframes make it easy to perform calculations and filter data.
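As one possible next step, column-level summaries follow the same pattern. The sketch below rebuilds the sales data so it runs on its own, then computes overall revenue and the top product; the names `total_revenue` and `best_seller` are illustrative and not part of the original example:

```python
import pandas as pd

sales_df = pd.DataFrame({
    'Product': ['Apples', 'Bananas', 'Cherries', 'Dates'],
    'Quantity': [10, 15, 7, 20],
    'Price': [1.2, 0.5, 2.5, 3.0],
})
sales_df['Total'] = sales_df['Quantity'] * sales_df['Price']

# Sum of the Total column across all products.
total_revenue = sales_df['Total'].sum()

# idxmax returns the index label of the largest Total;
# .loc then looks up the product name in that row.
best_seller = sales_df.loc[sales_df['Total'].idxmax(), 'Product']

print(total_revenue)
print(best_seller)
```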
Summary
Dataframes are essential for data analysis in Python. They provide a simple way to organize, access, and manipulate data. By mastering dataframe basics such as creation, data access, filtering, and sorting, you can handle a wide range of data tasks efficiently.

