🥳🎉 Two IBM certificates and some geospatial data

celebration
geospatial data
Author

Adam Cseresznye

Published

August 5, 2024

I’m happy to share that I’ve completed both IBM’s Data Analyst and Data Science Professional Certificates within the past month. The course content was well-structured, and I learned a great deal from these programs. For instance, I’ve always been interested in learning SQL, and this was the perfect chance to start exploring it.

If you’re curious about these certificates, you can find more information through the links provided below. But my learning journey doesn’t stop here—I’m planning to tackle most of the courses listed in the Data Science learning path on Coursera, so there’s more to come.

While I’m at it, I wanted to introduce you to a neat library called Folium, which is fantastic for working with geospatial data. I came across Folium during the capstone project of the Data Science Specialization, where we had a fun task of predicting and visualizing the success of SpaceX rocket launches.

In this post, I’ll briefly share what I’ve learned about this library. I hope you’ll find it useful too. Let’s dive in!

Code
import folium
import pandas as pd
import os
from folium import plugins

We’ll be using the street bicycle parking dataset made available by https://open.toronto.ca/. It lists the locations of bicycle parking installed on sidewalks and boulevards across the City of Toronto, wherever there’s a requirement for public bicycle parking facilities. By the way, I discovered this dataset through the Awesome Public Datasets repository on GitHub. If you haven’t already, I recommend checking it out.

Code
# Let's read in the file

for file in os.listdir():
    if file.endswith(".csv"):
        toronto_df = pd.read_csv(file)

        print(f"{file} read in as pandas dataframe")
Street bicycle parking data - 4326.csv read in as pandas dataframe

Considering the original dataset has over 17,300 entries, we’ll keep things light by working with just 500 rows for now. It’s all for the sake of a demonstration, after all!

Code
toronto_df = toronto_df.sample(n=500)
toronto_df.head()
| | _id | OBJECTID | ID | ADDRESSNUMBERTEXT | ADDRESSSTREET | FRONTINGSTREET | SIDE | FROMSTREET | DIRECTION | SITEID | WARD | BIA | ASSETTYPE | STATUS | SDE_STATE_ID | X | Y | LONGITUDE | LATITUDE | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 784 | 4481427 | 10424 | BP-05283 | 15 | Dundonald St | NaN | NaN | Dundonald St | NaN | NaN | 13.0 | NaN | Ring | Temporarily Removed | 0 | NaN | NaN | NaN | NaN | {'type': 'MultiPoint', 'coordinates': [[-79.38... |
| 3297 | 4483940 | 15253 | BP-35603 | 49 | Harbour Sq | Queens Quay W | South | Harbour Sq | West | NaN | 10.0 | The Waterfront | Ring | Existing | 0 | NaN | NaN | NaN | NaN | {'type': 'MultiPoint', 'coordinates': [[-79.37... |
| 13971 | 4494614 | 31121 | BP-22492 | 200 | Elizabeth St | Elizabeth St | West | La Plante Ave | West | NaN | 11.0 | NaN | Ring | Existing | 0 | NaN | NaN | NaN | NaN | {'type': 'MultiPoint', 'coordinates': [[-79.38... |
| 5139 | 4485782 | 17465 | BP-40070 | 70 | Peter St | King St W | North | Peter St | West | NaN | 10.0 | Toronto Downtown West | Ring | Existing | 0 | NaN | NaN | NaN | NaN | {'type': 'MultiPoint', 'coordinates': [[-79.39... |
| 7635 | 4488278 | 20375 | BP-27153 | 39 | Prince Arthur Ave | Prince Arthur Ave | South | Bedford Rd | East | NaN | 11.0 | NaN | Ring | Existing | 0 | NaN | NaN | NaN | NaN | {'type': 'MultiPoint', 'coordinates': [[-79.39... |
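Because sample draws rows at random, the five rows shown here will differ from one run to the next. A minimal sketch of how passing random_state makes the draw repeatable; demo_df is a hypothetical stand-in for toronto_df, and its values are invented for illustration:

```python
import pandas as pd

# Hypothetical stand-in for toronto_df; the IDs are invented for illustration.
demo_df = pd.DataFrame({"ID": [f"BP-{i:05d}" for i in range(100)]})

# Passing the same random_state returns the same rows on every call.
first_draw = demo_df.sample(n=5, random_state=42)
second_draw = demo_df.sample(n=5, random_state=42)

# Both draws contain exactly the same rows in the same order.
print(first_draw.equals(second_draw))
```

The seed value 42 is arbitrary; any fixed integer works.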

The dataset’s own LONGITUDE and LATITUDE columns are empty (NaN), but the geometry column holds both coordinates as text, so we first need to pull the two numbers out of it. We’ll use pandas’ str.extract with a regular expression for this task.

Code
pattern = r"(-?\d+\.\d+),\s*(-?\d+\.\d+)"

toronto_df_processed = toronto_df.assign(
    LONGITUDE=lambda df: df.geometry.str.extract(pattern)[0],
    LATITUDE=lambda df: df.geometry.str.extract(pattern)[1],
).loc[:, ["ASSETTYPE", "STATUS", "LONGITUDE", "LATITUDE"]]
toronto_df_processed.head()
| | ASSETTYPE | STATUS | LONGITUDE | LATITUDE |
|---|---|---|---|---|
| 784 | Ring | Temporarily Removed | -79.38378423783222 | 43.6660359833018 |
| 3297 | Ring | Existing | -79.3774934493851 | 43.6407633657936 |
| 13971 | Ring | Existing | -79.386799735149 | 43.6589303889453 |
| 5139 | Ring | Existing | -79.3926661761316 | 43.6460273003346 |
| 7635 | Ring | Existing | -79.3973838724551 | 43.6693038734947 |
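Note that str.extract returns text, so the LONGITUDE and LATITUDE columns above hold strings rather than numbers. The sketch below runs the same pattern only once and casts the result to float. The two geometry strings are shortened, made-up examples in the same shape as the dataset’s column, and demo_df is a hypothetical stand-in for toronto_df:

```python
import pandas as pd

# Made-up geometry strings in the same shape as the dataset's geometry column.
demo_df = pd.DataFrame(
    {
        "geometry": [
            "{'type': 'MultiPoint', 'coordinates': [[-79.3838, 43.6660]]}",
            "{'type': 'MultiPoint', 'coordinates': [[-79.3775, 43.6408]]}",
        ]
    }
)

pattern = r"(-?\d+\.\d+),\s*(-?\d+\.\d+)"

# Run the regex once: capture group 0 is the longitude, group 1 the latitude.
coords = demo_df["geometry"].str.extract(pattern).astype(float)
coords.columns = ["LONGITUDE", "LATITUDE"]

# Attach the numeric coordinates to the original rows.
demo_processed = demo_df.join(coords)
print(demo_processed[["LONGITUDE", "LATITUDE"]].dtypes)
```

Having numeric columns also makes it possible to compute things like the mean latitude and longitude, for example to choose a map center.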

Creating the map and displaying it

Here’s an example of how to create a map without any overlaid data points.

Code
toronto_map = folium.Map(
    location=[43.651070, -79.347015], zoom_start=11, tiles="OpenStreetMap"
)
toronto_map
Figure 1: The City of Toronto

Superimposing bike locations on the map with FeatureGroup

After instantiating a FeatureGroup, we can add each bike parking location to it with the add_child method and then add the whole group to the map in a single step.

Code
# let's start with a clean copy of the map of Toronto
toronto_map = folium.Map(
    location=[43.651070, -79.347015], zoom_start=11, tiles="OpenStreetMap"
)

# instantiate a feature group
bike_stations = folium.FeatureGroup()

# loop through the bike stations
for lat, long in zip(toronto_df_processed.LATITUDE, toronto_df_processed.LONGITUDE):
    bike_stations.add_child(
        folium.CircleMarker(
            [lat, long],
            radius=5,
            color="red",
            fill=True,
            fill_color="yellow",
            fill_opacity=1,
        )
    )
# add bike stations to the map
toronto_map.add_child(bike_stations)
Figure 2: The City of Toronto with available bike locations

Adding pop-up text with relevant information

We can also enhance this by adding a pop-up box that displays custom text of our choice.

Code
# let's start with a clean copy of the map of Toronto
toronto_map = folium.Map(
    location=[43.651070, -79.347015], zoom_start=11, tiles="OpenStreetMap"
)

# instantiate a feature group
bike_stations = folium.FeatureGroup()

# loop through the bike stations
for lat, long in zip(toronto_df_processed.LATITUDE, toronto_df_processed.LONGITUDE):
    bike_stations.add_child(
        folium.CircleMarker(
            [lat, long],
            radius=5,
            color="grey",
            fill=True,
            fill_color="white",
            fill_opacity=1,
        )
    )

# add pop-up text to each marker on the map
latitudes = list(toronto_df_processed.LATITUDE)
longitudes = list(toronto_df_processed.LONGITUDE)
labels = list(toronto_df_processed.STATUS)

for lat, lng, label in zip(latitudes, longitudes, labels):
    folium.Marker([lat, lng], popup=label).add_to(toronto_map)

# add bike stations to map
toronto_map.add_child(bike_stations)
Figure 3: The City of Toronto with bike parking locations and status pop-ups

Clustering the bike parking locations with MarkerCluster

My favorite part is that we can also integrate a MarkerCluster. This comes in handy when many data points sit close together on the map. Instead of drawing every marker individually, a MarkerCluster groups nearby markers into a single bubble labeled with the number of markers it contains, and it splits the bubble apart as you zoom in.

Code
# let's start with a clean copy of the map of Toronto
toronto_map = folium.Map(
    location=[43.651070, -79.347015], zoom_start=11, tiles="OpenStreetMap"
)

# instantiate a marker cluster object
bike_stations_cluster = plugins.MarkerCluster().add_to(toronto_map)

# loop through the dataframe and add each data point to the marker cluster
for lat, lng, label in zip(
    toronto_df_processed.LATITUDE,
    toronto_df_processed.LONGITUDE,
    toronto_df_processed.STATUS,
):
    folium.Marker(
        location=[lat, lng],
        icon=None,
        popup=label,
    ).add_to(bike_stations_cluster)

# display map
toronto_map
Figure 4: Aggregated Bike Locations in the City of Toronto

That’s a wrap! I hope these examples have been helpful. Feel free to use these techniques in your next data science or geospatial project. Until next time, happy exploring!