Part 1. Introduction to GLOBE Data Visualization#

This lesson shows how to investigate GLOBE mosquito & land cover data, calculate statistics, and create charts & maps at global, state, and county levels.

Introduction to the Tools#

  • Python (a popular programming language for analyzing data, automating tasks, creating software, training machine learning models, and more) to analyze GLOBE Mosquito Habitat Mapper and Land Cover data.

  • Google Colab (a free, cloud-based coding environment) to run Python in the browser. No setup or downloads necessary!

To run the code:

Each block of code is called a cell. To run a cell, hover over it and click the arrow in the top left of the cell, or click inside of the cell and press Shift + Enter.

Note: When you run a block of code for the first time, Google Colab will show the message "Warning: This notebook was not authored by Google." Click "Run Anyway" to continue.

# Import libraries
import pandas as pd                           # For working with data
pd.set_option("display.max_columns", None)    # Lets us see all columns of the data instead of just a preview
import geopandas as gpd                       # For working with spatial data
import numpy as np                            # For working with numbers
from datetime import date                     # For working with dates
import matplotlib.pyplot as plt               # For making graphs
from matplotlib.colors import to_rgb          # For getting colors
import branca.colormap as cm                  # For creating color scales
import seaborn as sns                         # For more options to load colors and create plots
import folium                                 # For creating interactive maps
from PIL import Image                         # For getting and displaying images from links
import requests                               # For getting information from links
from io import BytesIO                        # For working with types of input and output

In the code above, we imported Python libraries, which expand the options we have with our code. Each library comes with many functions designed to accomplish specific tasks, like loading a dataset, performing calculations, making a chart, and more.
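
As a small illustration of what calling library functions looks like, the sketch below builds a tiny table with pandas and summarizes it with pandas and NumPy. The site names and larvae counts are made up for the example and are not GLOBE data.

```python
import numpy as np
import pandas as pd

# Made-up example values, not real GLOBE observations
sample = pd.DataFrame({
    "site": ["A", "B", "C", "D"],
    "larvae_count": [4, 10, 0, 6],
})

# pandas function: average of one column
average = sample["larvae_count"].mean()

# NumPy function: largest value in the column
largest = np.max(sample["larvae_count"])

print(f"Average: {average}, largest: {largest}")
```

Each function does one specific job, which is why a lesson like this one imports several libraries at the start.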

# This is a comment (added using # at the start), used by programmers to explain their code
# Comments do not impact how the code runs

# Print today's date
today = date.today()
print(f"Today's date is {today}.")

A great feature of Google Colab is that you are able to write Python code and see the output directly on your browser!
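
Dates can also be printed in other formats. The sketch below uses a fixed example date (an arbitrary choice, so the output is the same on every run) instead of today's date.

```python
from datetime import date

# A fixed example date so the output does not change from day to day
example_day = date(2024, 7, 4)

# ISO format (YYYY-MM-DD) is what print() shows for a date by default
iso_text = example_day.isoformat()

# strftime builds a custom format from codes like %m (month), %d (day), %Y (year)
us_text = example_day.strftime("%m/%d/%Y")

print(iso_text)  # 2024-07-04
print(us_text)   # 07/04/2024
```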

Mosquitoes Around the GLOBE#

Let’s load the data directly from the link (everything stays in Google Colab on the browser; nothing gets downloaded to your computer)! This is a processed version of the data that has the columns renamed and some errors corrected.

The data comes from GLOBE Observer accessed through the API. You can see all the steps taken to process and clean the data in chapter 1 of the EMERGE curriculum.

mosquito = gpd.read_file('https://github.com/geo-di-lab/emerge-lessons/raw/refs/heads/main/docs/data/globe_mosquito.zip')
mosquito.head()

See a list of the columns:

mosquito.info()

How many rows are in the dataset?

len(mosquito)

There were 43,012 citizen science contributions from 2018 to 2024. Now, let’s see the number of countries where people submitted data.

len(mosquito['CountryCode'].unique())
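
One detail worth knowing: `unique()` keeps a missing value as its own entry, while `nunique()` skips missing values by default. The sketch below shows the difference on made-up country codes. Whether the GLOBE `CountryCode` column actually contains missing values is not stated in this lesson, so treat this as a general pandas illustration.

```python
import pandas as pd

# Made-up country codes; None stands for a row with no country recorded
codes = pd.Series(["USA", "BRA", "USA", None, "THA", "BRA"])

# unique() lists every distinct entry, including the missing one
count_with_missing = len(codes.unique())

# nunique() counts distinct values and ignores missing entries by default
count_without_missing = codes.nunique()

print(count_with_missing, count_without_missing)
```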

Let’s see the types of the habitats (water sources) the citizen scientists recorded.

# General water source types
mosquito['WaterSourceType'].value_counts()

These are the general types of water sources that citizen scientists reported to NASA GLOBE. It looks like most of the data collected were about artificial containers. Let’s see some of the more specific types in the other column:

# More specific water source types
mosquito['WaterSource'].value_counts()
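
Counts can be easier to compare as percentages. As a minimal sketch, `value_counts(normalize=True)` returns each category's share of the total. The category names below are made up for the example.

```python
import pandas as pd

# Made-up water source labels, not the real GLOBE categories
water = pd.Series(["container", "container", "pond", "container", "puddle"])

# Raw counts per category (most frequent first)
counts = water.value_counts()

# Share of each category, converted from a fraction to a percentage
shares = water.value_counts(normalize=True) * 100

print(counts)
print(shares.round(1))
```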

Let’s make a pie chart using the more general types, WaterSourceType:

# Here are some options for color palettes
display(sns.color_palette(palette='rainbow'))
display(sns.color_palette(palette='CMRmap'))
display(sns.color_palette(palette='BrBG'))
display(sns.color_palette(palette='cubehelix'))
display(sns.color_palette(palette='Set2'))
display(sns.color_palette(palette='Set3'))
display(sns.color_palette(palette='tab20'))

# Pie chart of water types
types = mosquito[['SiteId', 'WaterSourceType']].groupby('WaterSourceType', as_index=False).count()

# Create pie chart
plt.figure(figsize=(5, 5))
patches, texts = plt.pie(colors = sns.color_palette('Set2'), # Enter the name of the color palette you want to use
                         x = types['SiteId'])
plt.title("GLOBE Mosquito Sightings: Water Source Types (General)")
plt.legend(patches, types['WaterSourceType'],
           loc = 'center left', bbox_to_anchor=(1, 0.5), frameon=False)
plt.show()

What do these sources look like? Let’s display 10 of the submitted photos below:

# Get 10 photo URLs from the dataset
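# Keep rows that have a photo URL and whose photo was not marked 'rejected'
has_photo = mosquito['WaterSourcePhotoUrls'].notna() & (mosquito['WaterSourcePhotoUrls'] != 'rejected')
rows_with_photos = mosquito.loc[has_photo].reset_index().sample(n=10, random_state=1)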
url_list = rows_with_photos['WaterSourcePhotoUrls'].str.split('; ').str[0].tolist()
source_list = rows_with_photos['WaterSource'].tolist()

# Plot all of the photos
plt.figure(figsize=(20, 6))

for i, (url, title) in enumerate(zip(url_list, source_list)):
    response = requests.get(url)
    img = Image.open(BytesIO(response.content))

    # Create plot with 2 rows, 5 columns
    plt.subplot(2, 5, i + 1)
    plt.imshow(img)
    plt.title(title)
    plt.axis('off')

plt.tight_layout()
plt.show()
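
The photo code above takes the first link from each entry by splitting the text on "; ". The sketch below shows that step on its own, using made-up example.org links laid out in the same "; "-separated way the code assumes.

```python
import pandas as pd

# Made-up links in the "; "-separated layout assumed by the code above
photo_urls = pd.Series([
    "https://example.org/a.jpg; https://example.org/b.jpg",
    "https://example.org/c.jpg",
])

# .str.split turns each string into a list; .str[0] takes the first item
first_urls = photo_urls.str.split("; ").str[0].tolist()

print(first_urls)
```

An entry with only one link is unchanged, because splitting it produces a list with a single item.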

What is the average larvae count by country?

mosquito_avg = mosquito.groupby('CountryCode')['LarvaeCountProcessed'].mean()
mosquito_avg
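
An average based on one or two observations can be misleading, so it may help to look at the number of observations next to each mean. The sketch below uses the same column names as the lesson but made-up values.

```python
import pandas as pd

# Made-up observations that reuse the lesson's column names
obs = pd.DataFrame({
    "CountryCode": ["USA", "USA", "USA", "BRA"],
    "LarvaeCountProcessed": [2, 4, 6, 50],
})

# agg() computes several statistics per country in one step
summary = obs.groupby("CountryCode")["LarvaeCountProcessed"].agg(["mean", "count"])

print(summary)
```

In this toy table, the high average for BRA comes from a single observation, which is the kind of detail the count column makes visible.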

Let’s make a map showing the larvae count by country. The country boundaries (generalized) are from Esri, Garmin, and U.S. Central Intelligence Agency (The World Factbook). The boundaries are generalized to allow data processing and visualizations to load faster. The ISO alpha-3 codes come from the World Countries layer from Esri, Garmin, U.S. Central Intelligence Agency (The World Factbook), and International Organization for Standardization (ISO).

# Load country boundaries
countries = gpd.read_file('https://github.com/geo-di-lab/emerge-lessons/raw/refs/heads/main/docs/data/world_countries_general.geojson').to_crs(epsg=4326)

# Add mosquito data for each country
mosquito_avg = countries.merge(mosquito_avg, left_on='iso3', right_on='CountryCode', how='left')

fig, ax = plt.subplots(figsize = (10, 4))

# Create plot of average larvae count by country
mosquito_avg.plot(column = 'LarvaeCountProcessed', cmap = 'viridis',
                     legend = True, vmin = 0, vmax = 50, ax = ax,
                     missing_kwds = {'color': 'lightgrey'})
plt.title('GLOBE Mosquito Sightings: Average Larvae Count')
ax.axis('off')
plt.show()
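
The light grey countries on the map come from the `how='left'` merge: every country boundary is kept, and countries without mosquito data get a missing value. The sketch below shows this with two small made-up tables that reuse the lesson's column names.

```python
import pandas as pd

# Made-up stand-ins for the country boundaries and the average table
countries_demo = pd.DataFrame({"iso3": ["USA", "BRA", "ISL"]})
averages_demo = pd.DataFrame({
    "CountryCode": ["USA", "BRA"],
    "LarvaeCountProcessed": [4.0, 50.0],
})

# how='left' keeps every row of the left table, matched or not
merged = countries_demo.merge(
    averages_demo, left_on="iso3", right_on="CountryCode", how="left"
)

print(merged)
```

Here ISL has no match, so its `LarvaeCountProcessed` is missing (NaN), which is what `missing_kwds` colors light grey in the plot.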

Now, we’ll make an interactive map showing total GLOBE observations by country.

# Get total GLOBE observations by country
mosquito_obs = mosquito.groupby('CountryCode').size() \
                       .reset_index(name='GLOBE_Observations')
mosquito_obs = countries.merge(mosquito_obs, left_on='iso3', right_on='CountryCode', how='left').dropna(subset=['GLOBE_Observations'])

# Load a color scale, starting at 1 and ending at 100
# Blue indicates 100 or more GLOBE observations, while green/yellow indicate closer to 1 observation
colors = cm.linear.YlGnBu_03.scale(1, 100)
colors

map = folium.Map(location=[0, 0], zoom_start=3, tiles="CartoDB positron")

# Create interactive map of GLOBE observations by country
folium.GeoJson(
    geo_data = mosquito_obs.to_json(),
    data = mosquito_obs,
    key_on = "feature.properties.name",
    tooltip = folium.features.GeoJsonTooltip(
        fields = ['name', 'GLOBE_Observations'],
        aliases = ['Country:', 'Observations:']
    ),
    style_function = lambda feature: {
        "fillColor": colors(feature['properties']['GLOBE_Observations']),
        'fillOpacity': 0.8,
        'color': 'grey',
        'weight': 1
    }
).add_to(map)

display(map)
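
To build intuition for how a linear color scale turns a number into a color, here is a hand-written sketch. It is not the branca implementation. The function name `linear_color` and the two RGB end colors are illustrative assumptions. Values outside the range are clamped to the end colors, which matches the comment above that 100 or more observations appear blue.

```python
def linear_color(value, vmin, vmax, start_rgb, end_rgb):
    """Blend between two RGB colors based on where value falls in [vmin, vmax]."""
    # Clamp so values outside the range use the end colors
    clamped = max(vmin, min(vmax, value))
    # Position of the value within the range, from 0.0 to 1.0
    fraction = (clamped - vmin) / (vmax - vmin)
    return tuple(
        round(start + fraction * (end - start))
        for start, end in zip(start_rgb, end_rgb)
    )

yellow = (255, 255, 204)  # illustrative light end of the scale
blue = (8, 29, 88)        # illustrative dark end of the scale

print(linear_color(1, 1, 100, yellow, blue))     # the light end color
print(linear_color(100, 1, 100, yellow, blue))   # the dark end color
print(linear_color(5000, 1, 100, yellow, blue))  # clamped to the dark end color
```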

Land Cover Around the GLOBE#

Like the mosquito data, we’ll load in the GLOBE land cover data directly from the link.

land_cover = gpd.read_file('https://github.com/geo-di-lab/emerge-lessons/raw/refs/heads/main/docs/data/globe_land_cover.zip')
land_cover.head()

len(land_cover)

A helpful part of the land cover dataset is the MUC classifications. MUC (Modified UNESCO Classification) is a classification system with different land use types that helps us understand habitats around the world.

What are the most common MUC codes by country?

# Find the most common MUC for each country
muc = land_cover.groupby('CountryCode')['MucDescription'] \
    .apply(lambda x: x.value_counts().idxmax() if not x.value_counts().empty else None).reset_index(name='MucDescription')

# Add a column for the number of GLOBE observations with the MUC code
muc['Count'] = land_cover.groupby('CountryCode')['MucDescription'] \
    .apply(lambda x: x.value_counts().max()).values

# Add a column for the total number of GLOBE observations
muc['GLOBE_Observations'] = land_cover.groupby('CountryCode').size().values

muc
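
The same kind of table can be built in other ways. The sketch below is one alternative, not the lesson's method: it counts each (country, MUC) pair once, keeps the top pair per country, and then attaches the total number of observations. The data is made up. If two categories tie, which one is kept is arbitrary.

```python
import pandas as pd

# Made-up observations that reuse the lesson's column names
demo = pd.DataFrame({
    "CountryCode": ["USA", "USA", "USA", "BRA", "BRA"],
    "MucDescription": ["Urban", "Urban", "Trees", "Wetlands", "Wetlands"],
})

# Count how often each MUC description appears in each country
pair_counts = (
    demo.groupby(["CountryCode", "MucDescription"])
    .size()
    .reset_index(name="Count")
)

# Sort so the most frequent pair comes first, then keep one row per country
top = pair_counts.sort_values("Count", ascending=False).drop_duplicates("CountryCode")

# Total observations per country
totals = demo.groupby("CountryCode").size().reset_index(name="GLOBE_Observations")

result = (
    top.merge(totals, on="CountryCode")
    .sort_values("CountryCode")
    .reset_index(drop=True)
)

print(result)
```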

# Add this data to the country boundaries
muc = countries.merge(muc, left_on='iso3', right_on='CountryCode', how='left')

# Create general categories
muc_list = ['Barren', 'Closed Forest', 'Cultivated', 'Herbaceous', 'Open Water', 'Trees', 'Urban', 'Wetlands', 'Woodland']

# Simplify some names into the categories listed above
for muc_code in muc_list:
    muc.loc[muc['MucDescription'].str.contains(muc_code, na=False), 'MucDescriptionShort'] = muc_code

fig, ax = plt.subplots(figsize = (11, 5))

# Create a map of the most common MUC codes by country
muc.plot(column = 'MucDescriptionShort', cmap = 'viridis',
                     legend = True, ax = ax,
                     missing_kwds = {'color': 'lightgrey', 'label': 'No Data'},
                     legend_kwds={'loc': 'lower left', 'frameon': False})
plt.title('GLOBE Land Cover: Most Common MUC Codes')
plt.show()

Notice the gray areas. There were no GLOBE observations recorded for these countries, so we’ll remove these from the dataset to make it easier to visualize the data.

muc = muc.dropna(subset=['GLOBE_Observations'])

A powerful feature of the Land Cover dataset is that users submit pictures of the area. Let’s view some of these images.

# Get the first observation where all photos were submitted
entry = land_cover.dropna(subset=['DownwardPhotoUrl', 'EastPhotoUrl', 'NorthPhotoUrl', 'SouthPhotoUrl', 'WestPhotoUrl', 'UpwardPhotoUrl',
                            'Feature1PhotoUrl', 'Feature2PhotoUrl', 'Feature3PhotoUrl', 'Feature4PhotoUrl']).head(1)

url_list = []
col_list = []

for col in entry.columns:
    if 'Url' in col:
        print(f'{col}: {entry[col].values[0]}')
        url_list.append(entry[col].values[0])
        col_list.append(col)

display(entry)

# Plot all of the images
plt.figure(figsize=(20, 6))

for i, (url, title) in enumerate(zip(url_list, col_list)):
    response = requests.get(url)
    img = Image.open(BytesIO(response.content))

    # Create plot with 2 rows, 5 columns
    plt.subplot(2, 5, i + 1)
    plt.imshow(img)
    plt.title(title)
    plt.axis('off')

plt.tight_layout()
plt.show()

Challenge: Create an interactive map for land cover#

Like the mosquito observation map above, let’s make a map of the number of GLOBE Observations for land cover. In addition to the name of the country and the number of GLOBE Observations, add the most common MUC code (MucDescriptionShort) to the text box that pops up when you hover over each country.

Use the following columns:

  • name for the name of the country (pop-up)

  • MucDescriptionShort for the most common MUC code (pop-up)

  • GLOBE_Observations for the number of GLOBE observations recorded (pop-up and color of each country)

In summary, each country’s color on the map will be based on GLOBE_Observations and the pop-up should include the country name, number of GLOBE observations, and the most common MUC.

Feel free to reference the code for making an interactive map of the mosquito observations (which we’ve pasted below). If you get stuck, click Open for Answer to see one way to create this map!

map = folium.Map(location=[0, 0], zoom_start=3, tiles="CartoDB positron")

# Create interactive map of GLOBE observations by country
folium.GeoJson(
    geo_data = mosquito_obs.to_json(),
    data = mosquito_obs,
    key_on = "feature.properties.name",
    tooltip = folium.features.GeoJsonTooltip(
        fields = ['name', 'GLOBE_Observations'],
        aliases = ['Country:', 'Observations:']
    ),
    style_function = lambda feature: {
        "fillColor": colors(feature['properties']['GLOBE_Observations']),
        'fillOpacity': 0.8,
        'color': 'grey',
        'weight': 1
    }
).add_to(map)

display(map)

map = folium.Map(location=[0, 0], zoom_start=3, tiles="CartoDB positron")

# Replace ? with your code below
folium.GeoJson(
    geo_data = ?,
    data = ?,
    key_on = "feature.properties.name",
    tooltip = folium.features.GeoJsonTooltip(
        fields = ?,
        aliases = ?
    ),
# Replace ? with your code above
    style_function = lambda feature: {
        "fillColor": colors(feature['properties']['GLOBE_Observations']),
        'fillOpacity': 0.8,
        'color': 'grey',
        'weight': 1
    }
).add_to(map)

display(map)

Answer Below#

map = folium.Map(location=[0, 0], zoom_start=3, tiles="CartoDB positron")

# Replace ? with your code below
folium.GeoJson(
    geo_data = muc.to_json(),
    data = muc,
    key_on = "feature.properties.name",
    tooltip = folium.features.GeoJsonTooltip(
        fields = ['name', 'GLOBE_Observations', 'MucDescriptionShort'],
        aliases = ['Country:', 'Observations:', 'Most common MUC:']
    ),
# Replace ? with your code above
    style_function = lambda feature: {
        "fillColor": colors(feature['properties']['GLOBE_Observations']),
        'fillOpacity': 0.8,
        'color': 'grey',
        'weight': 1
    }
).add_to(map)

display(map)

Mosquitoes in Your State#

Now, we’ll view the GLOBE data for your state and later on, your county. Enter the name of your state below.

The boundary files are from the U.S. Census. We will use the same mosquito data as the first part of this code notebook (which comes from GLOBE Observer accessed through the API from chapter 1 of our digital textbook).

state_name = "Your State Name"

# For example,
# state_name = "Florida"

# Load county boundaries
counties = gpd.read_file('https://github.com/geo-di-lab/emerge-lessons/raw/refs/heads/main/docs/data/us_counties.zip').to_crs('EPSG:4326')

# Filter to all counties in state
state_counties = counties.loc[counties['STATE_NAME'] == state_name]

# Get state boundary by combining the counties together
state = state_counties[['geometry']].dissolve()

state.plot()

# Create an empty map (it is zoomed to the state further below)
map = folium.Map(tiles="Cartodb dark_matter")

# Get all GLOBE mosquito data within the state
mosquito_state = gpd.sjoin(mosquito, state, how="inner", predicate='intersects') \
          .drop(columns=['index_right']) \
          .reset_index(drop=True)

# Add county boundaries to the map
folium.GeoJson(
    state_counties.to_json(),
    name='County Boundaries',
    tooltip=folium.features.GeoJsonTooltip(fields=['NAMELSAD'], aliases=['County:']),
    style_function=lambda feature: {
        'fillColor': 'transparent',
        'color': 'grey',
        'weight': 1
    }
).add_to(map)

# Add each point as a green circle on the map
for idx, row in mosquito_state.iterrows():
    popup_content = f"<b>Date:</b> {row['MeasuredDate']}<br><b>Water Source:</b> {row['WaterSourceType']}<br><b>Longitude:</b> {row['MeasurementLongitude']}<br><b>Latitude:</b> {row['MeasurementLatitude']}"
    folium.CircleMarker(
        location=[row.geometry.y, row.geometry.x],
        popup=folium.Popup(popup_content, max_width=300),
        radius=6,
        color='black',
        weight=1,
        fillColor='lightgreen',
        fillOpacity=0.5
    ).add_to(map)

# Add an option to hide the county boundaries
folium.LayerControl().add_to(map)

# Zoom to the state
minx, miny, maxx, maxy = state.bounds.values[0]
bounds = [[miny, minx], [maxy, maxx]]
map.fit_bounds(bounds)

# Display the map
display(map)
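
The zoom step above reorders the bounding box because the bounds come as (minx, miny, maxx, maxy), where x is longitude and y is latitude, while the map expects each corner as [latitude, longitude]. The sketch below isolates that reordering. The helper name `to_folium_bounds` and the example numbers are illustrative and are not taken from the county file.

```python
def to_folium_bounds(minx, miny, maxx, maxy):
    """Convert (minx, miny, maxx, maxy) in lon/lat to [[south, west], [north, east]]."""
    return [[miny, minx], [maxy, maxx]]

# Rough, illustrative bounding box in longitude/latitude
example = to_folium_bounds(-87.6, 24.5, -80.0, 31.0)

print(example)  # [[24.5, -87.6], [31.0, -80.0]]
```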

See if there are any points in your county! If not, find a nearby county that has points. Enter the name of your chosen county below:

county_name = "Your County Name"

# Make sure it ends in the word "County"

# For example,
# county_name = "Broward County"

Mosquitoes in Your County#

Get the specific outline of your county:

county = state_counties.loc[state_counties['NAMELSAD'] == county_name]
county.plot()

Now, we use those boundaries to filter for all GLOBE points within your county, from 2018 to 2024.

data_county = mosquito.sjoin(county, how="inner", predicate="within")

num_total = len(data_county)

print(f"There were {num_total} GLOBE points recorded within {county_name} from 2018-2024 by community scientists.")

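num_eliminated = len(data_county[data_county['BreedingGroundEliminated'] == 'true'])

# Only calculate a percentage when there is at least one point (avoids dividing by zero)
if num_total > 0:
    print(f"Of those points, {num_eliminated} ({round(num_eliminated * 100 / num_total)}%) were successfully mitigated by the community scientists, which reduces the risk of mosquitoes inhabiting that location in the future.")
else:
    print("No GLOBE points were found in this county, so no percentage can be calculated.")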

If there were no GLOBE points in your county, please choose a nearby county that has at least one point. You can check by looking at the map of your state above.

data_county.head()

How many mosquito observations have been submitted to GLOBE Observer each year in your county? We’ll make a bar plot to figure this out.

# Add a new column for year
data_county['MeasuredYear'] = data_county['MeasuredAt'].dt.year

# Make a bar chart of mosquito sightings each year
years = data_county[['SiteId', 'MeasuredYear']].groupby('MeasuredYear', as_index=False).count()
plt.bar(years['MeasuredYear'], years['SiteId'])
plt.title("Mosquito Sightings by Year", loc = 'left')
plt.title(county_name, loc = 'right')
plt.show()
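
A year with no observations does not appear in the grouped table at all, so it is left out of the chart. If you would like every year from 2018 to 2024 to show, one option is to fill the missing years with zero. This is a minimal sketch with made-up years. The variable names are illustrative.

```python
import pandas as pd

# Made-up observation years; 2019, 2020, 2022, and 2023 have no rows
demo = pd.DataFrame({"MeasuredYear": [2018, 2018, 2021, 2024]})

# Count rows per year, then add the missing years with a count of 0
per_year = (
    demo["MeasuredYear"]
    .value_counts()
    .reindex(range(2018, 2025), fill_value=0)
)

print(per_year)
```

The resulting index and values could then be passed to `plt.bar` in place of the grouped table.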

Let’s make pie charts of the water source types (first general, then specific) where mosquitoes were reported in this county.

# Get counts of each water source (general)
types = data_county[['SiteId', 'WaterSourceType']].groupby('WaterSourceType', as_index=False).count()

# Create pie chart
plt.figure(figsize=(5, 5))
patches, texts = plt.pie(x = types['SiteId'],
                         colors = sns.color_palette('Set2'))
plt.title(f"GLOBE Mosquito Sightings in {county_name}: Water Source Types (General)")
plt.legend(patches, types['WaterSourceType'],
           loc = 'center left', bbox_to_anchor=(1, 0.5), frameon=False)
plt.show()

# Get counts of each water source (specific)
types = data_county[['SiteId', 'WaterSource']].groupby('WaterSource', as_index=False).count()

# Create pie chart
plt.figure(figsize=(5, 5))
patches, texts = plt.pie(x = types['SiteId'],
                         colors = sns.color_palette('Set2'))
plt.title(f"GLOBE Mosquito Sightings in {county_name}: Water Source Types (Specific)")
plt.legend(patches, types['WaterSource'],
           loc = 'center left', bbox_to_anchor=(1, 0.5), frameon=False)
plt.show()
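
With many specific water sources, a pie chart can end up with a lot of thin slices. One possible approach, sketched below, is to combine the rare categories into a single "Other" slice before plotting. The category names, counts, and the cutoff of 5 are made up for the example.

```python
import pandas as pd

# Made-up counts per water source
counts = pd.Series({"container": 40, "pond": 25, "puddle": 3, "tire": 2, "ditch": 1})

threshold = 5  # arbitrary cutoff: categories below this are combined

# Keep the larger categories as they are
grouped = counts[counts >= threshold]

# Add up the smaller categories and append them as one "Other" entry
other_total = counts[counts < threshold].sum()
if other_total > 0:
    grouped = pd.concat([grouped, pd.Series({"Other": other_total})])

print(grouped)
```

The total stays the same, so the slices still add up to all observations.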