Mosquito & Land Cover Stats
This lesson shows how to investigate the GLOBE data, calculate statistics, and create charts & maps.
import pandas as pd
pd.set_option("display.max_columns", None)
import geopandas as gpd
import matplotlib.pyplot as plt
import seaborn as sns
import folium
Mosquito
Let’s load the data directly from the link (no need to download anything to your computer).
mosquito = gpd.read_file('https://github.com/geo-di-lab/emerge-lessons/raw/refs/heads/main/docs/data/globe_mosquito.zip')
mosquito.head()
| | CountryCode | CountryName | Elevation | AbdomenCloseupPhotoUrls | BreedingGroundEliminated | Comments | DataSource | ExtraData | Genus | GlobeTeams | LarvaFullBodyPhotoUrls | LarvaeCount | LastIdentifyStage | LocationAccuracyM | LocationMethod | MeasuredAt | MeasurementElevation | MeasurementLatitude | MeasurementLongitude | MosquitoAdults | MosquitoEggs | MosquitoHabitatMapperId | MosquitoPupae | Species | Userid | WaterSource | WaterSourcePhotoUrls | WaterSourceType | OrganizationId | OrganizationName | Protocol | SiteId | SiteName | MeasuredDate | LarvaeCountProcessed | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | BRA | Brazil | 6.3 | None | false | None | GLOBE Observer App | LarvaeVisibleNo | None | [COLUNSLZ] | None | 0 | None | 13 | automatic | 2024-12-31 17:16:00 | 0 | -2.5617 | -44.2657 | None | None | 46287 | false | None | 137422629 | ovitrap | https://data.globe.gov/system/photos/2024/12/3... | container: artificial | 17459532 | Brazil Citizen Science | mosquito_habitat_mapper | 371514 | 23MNT816168 | 2024-12-31 | 0.0 | POINT (-44.26597 -2.56197) |
| 1 | BRA | Brazil | 6.3 | None | false | None | GLOBE Observer App | LarvaeVisibleNo | None | [COLUNSLZ] | None | 0 | None | 13 | automatic | 2024-12-31 17:20:00 | 0 | -2.5617 | -44.2657 | None | None | 46290 | false | None | 137422629 | ovitrap | https://data.globe.gov/system/photos/2024/12/3... | container: artificial | 17459532 | Brazil Citizen Science | mosquito_habitat_mapper | 371514 | 23MNT816168 | 2024-12-31 | 0.0 | POINT (-44.26597 -2.56197) |
| 2 | BRA | Brazil | 7.4 | None | true | None | GLOBE Observer App | LarvaeVisibleNo | None | [COLUNSLZ] | None | 0 | None | 51 | automatic | 2024-12-31 22:32:00 | 0 | -2.5163 | -44.3023 | None | None | 46482 | false | None | 137420190 | cement, metal or plastic tank | None | container: artificial | 17459532 | Brazil Citizen Science | mosquito_habitat_mapper | 372864 | 23MNT775218 | 2024-12-31 | 0.0 | POINT (-44.30288 -2.51676) |
| 3 | BRA | Brazil | 20.6 | None | true | None | GLOBE Observer App | LarvaeVisibleNo | None | [COLUNSLZ] | None | 0 | None | 66 | automatic | 2024-12-31 00:05:00 | 0 | -2.8639 | -44.0549 | None | None | 46203 | false | None | 137419937 | can or bottle | None | container: artificial | 17459532 | Brazil Citizen Science | mosquito_habitat_mapper | 373085 | 23MPS050834 | 2024-12-31 | 0.0 | POINT (-44.05526 -2.86396) |
| 4 | BRA | Brazil | 20.6 | None | true | None | GLOBE Observer App | LarvaeVisibleNo | None | [COLUNSLZ] | None | 0 | None | 28 | automatic | 2024-12-31 00:23:00 | 0 | -2.8639 | -44.0550 | None | None | 46223 | false | None | 137419937 | lake | None | still: lake/pond/swamp | 17459532 | Brazil Citizen Science | mosquito_habitat_mapper | 373085 | 23MPS050834 | 2024-12-31 | 0.0 | POINT (-44.05526 -2.86396) |
See the list of columns:
mosquito.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 43012 entries, 0 to 43011
Data columns (total 36 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 CountryCode 42928 non-null object
1 CountryName 42928 non-null object
2 Elevation 43012 non-null object
3 AbdomenCloseupPhotoUrls 874 non-null object
4 BreedingGroundEliminated 42951 non-null object
5 Comments 4002 non-null object
6 DataSource 43012 non-null object
7 ExtraData 12448 non-null object
8 Genus 4364 non-null object
9 GlobeTeams 16259 non-null object
10 LarvaFullBodyPhotoUrls 8632 non-null object
11 LarvaeCount 24691 non-null object
12 LastIdentifyStage 29911 non-null object
13 LocationAccuracyM 13814 non-null object
14 LocationMethod 18205 non-null object
15 MeasuredAt 43012 non-null datetime64[ms]
16 MeasurementElevation 42995 non-null object
17 MeasurementLatitude 42995 non-null float64
18 MeasurementLongitude 42995 non-null float64
19 MosquitoAdults 16852 non-null object
20 MosquitoEggs 16859 non-null object
21 MosquitoHabitatMapperId 43012 non-null object
22 MosquitoPupae 41210 non-null object
23 Species 1155 non-null object
24 Userid 43012 non-null object
25 WaterSource 43012 non-null object
26 WaterSourcePhotoUrls 34302 non-null object
27 WaterSourceType 43012 non-null object
28 OrganizationId 42928 non-null object
29 OrganizationName 42928 non-null object
30 Protocol 43012 non-null object
31 SiteId 43012 non-null object
32 SiteName 43012 non-null object
33 MeasuredDate 43012 non-null object
34 LarvaeCountProcessed 24688 non-null float64
35 geometry 43012 non-null geometry
dtypes: datetime64[ms](1), float64(3), geometry(1), object(31)
memory usage: 11.8+ MB
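The summary lists columns such as Elevation and LarvaeCount as object, even though the preview shows values that look numeric. Here is a minimal sketch of converting such a column with pd.to_numeric. It uses a small made-up Series instead of the real data, and the 'n/a' entry is a hypothetical example of a value that cannot be parsed.

```python
import pandas as pd

# Hypothetical stand-in for an object-typed column such as mosquito['Elevation']
raw = pd.Series(['6.3', '7.4', None, 'n/a'], dtype='object')

# errors='coerce' turns anything that cannot be parsed into NaN
elevation = pd.to_numeric(raw, errors='coerce')

print(elevation.dtype)
print(elevation.isna().sum())
```

On the real data, the same call could be applied to one column at a time, checking how many values become NaN before overwriting anything.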
How many rows are in the dataset?
len(mosquito)
43012
There were 43,012 citizen science contributions from 2018 to 2024. Now, let’s see the number of countries where people submitted data.
# Note: unique() also returns the missing value, so this is one more than the number of countries
len(mosquito['CountryCode'].unique())
95
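The summary earlier shows that CountryCode has 42,928 non-null values out of 43,012 rows, so some records have no country. Because unique() returns the missing value as one of its entries, a count based on it is one too high. Here is a small sketch of the difference, using made-up codes rather than the real column:

```python
import pandas as pd

# Made-up country codes; None stands in for a record with no country
codes = pd.Series(['BRA', 'USA', 'BRA', None], dtype='object')

with_missing = len(codes.unique())   # counts None as its own entry
without_missing = codes.nunique()    # ignores missing values by default

print(with_missing, without_missing)
```

Applied to the lesson data, mosquito['CountryCode'].nunique() should give the number of countries with the missing value left out.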
Let’s see the types of habitats (water sources) the citizen scientists recorded.
# Broader water source types
mosquito['WaterSourceType'].value_counts()
WaterSourceType
container: artificial 33167
still: lake/pond/swamp 6277
container: natural 2202
flowing: still water found next to river or stream 1366
Name: count, dtype: int64
These are the general types of water sources that citizen scientists reported to NASA. Most observations (33,167 of 43,012, or about 77%) were of artificial containers. Let’s see the more specific types in the WaterSource column:
# More specific water source types
mosquito['WaterSource'].value_counts()
WaterSource
cement, metal or plastic tank 7528
dish or pot 4102
well or cistern 2790
jar 2399
fountain or bird bath 2350
ovitrap 2243
adult mosquito trap 2073
pond 2029
other 1915
can or bottle 1888
ditch 1886
tire 1885
animal trough or water bowl 1177
puddle or still water next to a creek, stream or river 993
flower or plant pot/tray 932
trash container 882
plant clumps (bamboo etc) 768
puddle, vehhicle or animal tracks 644
tree holes 571
public works - culvert, bridge, road 570
discarded: other 511
puddle, vehicle or animal tracks 475
swamp or wetland 467
plant husk (areca, coconut etc) 453
lake 415
rain gutter or other architectural feature 277
estuary 148
pool 128
old car or boat 123
reservoir 112
grill or outdoor appliance 108
refrigerator drainage 69
animal shell (tortoise, mollusk etc) 65
bay or ocean 36
Name: count, dtype: int64
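The list contains two spellings of the same category: "puddle, vehhicle or animal tracks" (644) and "puddle, vehicle or animal tracks" (475). One way to combine them is to map the misspelled label onto the correct one before counting. The sketch below uses a tiny made-up sample, and the WaterSourceClean column name is only an illustration.

```python
import pandas as pd

# Small made-up sample containing both spellings from the output above
sample = pd.DataFrame({'WaterSource': [
    'puddle, vehhicle or animal tracks',
    'puddle, vehicle or animal tracks',
    'tire',
    'puddle, vehhicle or animal tracks',
]})

# Map the misspelled label onto the correctly spelled one
fixes = {'puddle, vehhicle or animal tracks': 'puddle, vehicle or animal tracks'}
sample['WaterSourceClean'] = sample['WaterSource'].replace(fixes)

counts = sample['WaterSourceClean'].value_counts()
print(counts)
```

On the full dataset, the combined category would hold 644 + 475 = 1,119 observations.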
Let’s make a pie chart using the broader column, WaterSourceType.
# Here are some options for color palettes
display(sns.color_palette(palette='Set2'))
display(sns.color_palette(palette='twilight_shifted'))
display(sns.color_palette(palette='tab20'))
# Pie chart of water types
types = mosquito[['SiteId', 'WaterSourceType']].groupby('WaterSourceType', as_index=False).count()
plt.figure(figsize=(5, 5))
patches, texts = plt.pie(x = types['SiteId'],
colors = sns.color_palette('Set2'))
plt.title("GLOBE Mosquito Sightings: Water Source Types (General)")
plt.legend(patches, types['WaterSourceType'],
loc = 'center left', bbox_to_anchor=(1, 0.5), frameon=False)
plt.show()
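The pie chart has no numeric labels, so it can help to compute the share of each type as well. This sketch re-enters the four counts from the WaterSourceType output by hand instead of loading the data:

```python
import pandas as pd

# Counts copied from the WaterSourceType output above
counts = pd.Series({
    'container: artificial': 33167,
    'still: lake/pond/swamp': 6277,
    'container: natural': 2202,
    'flowing: still water found next to river or stream': 1366,
})

# Share of each type as a percentage of all observations
shares = (counts / counts.sum() * 100).round(1)
print(shares)
```

With the data loaded, mosquito['WaterSourceType'].value_counts(normalize=True) returns the same shares as fractions.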
What is the average larvae count by country?
mosquito_avg = mosquito.groupby('CountryCode')['LarvaeCountProcessed'].mean()
mosquito_avg
CountryCode
ARE 5.000000
ARG 116.108108
AUS 2.500000
BEL NaN
BEN 38.598198
...
UKR NaN
URY 0.043478
USA 667.935961
VNM 22.686567
ZAF 17.800000
Name: LarvaeCountProcessed, Length: 94, dtype: float64
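A NaN in this result (for example BEL or UKR) means that none of the country's records had a usable larvae count. An average can also be pulled upward by a few very large counts. The output above does not show whether that is happening in any particular country, so it is worth looking at the median and the number of non-missing values next to the mean. Here is a sketch with made-up country codes and counts:

```python
import pandas as pd

# Made-up observations; None marks a record without a larvae count
toy = pd.DataFrame({
    'CountryCode': ['AAA', 'AAA', 'AAA', 'BBB', 'BBB', 'CCC'],
    'LarvaeCountProcessed': [0.0, 2.0, 1000.0, 5.0, None, None],
})

summary = toy.groupby('CountryCode')['LarvaeCountProcessed'].agg(
    ['mean', 'median', 'count']  # count = number of non-missing values
)
print(summary)
```

In this toy example, country AAA has a mean of 334 but a median of 2, while CCC has a count of 0 and therefore a NaN mean.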
Let’s make a map showing the average larvae count by country. The country boundaries are from Esri, Garmin, and the U.S. Central Intelligence Agency (The World Factbook); they are generalized so that data processing and visualizations load faster. The ISO alpha-3 codes come from the World Countries layer, credited to Esri, Garmin, the U.S. Central Intelligence Agency (The World Factbook), and the International Organization for Standardization (ISO). The color scale is capped at 50 (vmax = 50), so countries with higher averages, such as the USA and Argentina, share the top color.
countries = gpd.read_file('https://github.com/geo-di-lab/emerge-lessons/raw/refs/heads/main/docs/data/world_countries_general.geojson').to_crs(epsg=4326)
mosquito_avg = countries.merge(mosquito_avg, left_on='iso3', right_on='CountryCode', how='left')
fig, ax = plt.subplots(figsize = (10, 4))
mosquito_avg.plot(column = 'LarvaeCountProcessed', cmap = 'viridis',
legend = True, vmin = 0, vmax = 50, ax = ax,
missing_kwds = {'color': 'lightgrey'})
plt.title('GLOBE Mosquito Sightings: Average Larvae Count')
ax.axis('off')
plt.show()
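The left join keeps every country polygon and attaches a statistic only where the codes match. That makes it worth checking which rows matched. The sketch below uses two small made-up tables in place of countries and mosquito_avg. The indicator=True option adds a _merge column that shows where each row came from.

```python
import pandas as pd

# Hypothetical stand-ins for the country table and the per-country averages
countries_toy = pd.DataFrame({'iso3': ['BRA', 'USA', 'FRA']})
stats_toy = pd.DataFrame({'CountryCode': ['BRA', 'USA', 'XXX'],
                          'LarvaeCountProcessed': [1.5, 20.0, 3.0]})

merged = countries_toy.merge(stats_toy, left_on='iso3', right_on='CountryCode',
                             how='left', indicator=True)

# Countries with no matching statistics (these would be drawn in light grey)
no_data = merged.loc[merged['_merge'] == 'left_only', 'iso3'].tolist()

# Codes in the statistics that match no country and are dropped by the left join
unmatched = sorted(set(stats_toy['CountryCode']) - set(countries_toy['iso3']))

print(no_data, unmatched)
```

If the second list is not empty on the real data, those observations are left off the map, often because the two sources use different codes for the same territory.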
Now, we’ll make an interactive map showing total GLOBE observations by country.
mosquito_obs = mosquito.groupby('CountryCode').size() \
.reset_index(name='GLOBE_Observations')
mosquito_obs = countries.merge(mosquito_obs, left_on='iso3', right_on='CountryCode', how='left')
# Name the map m so that Python's built-in map() is not overwritten
m = folium.Map(location=[0, 0], zoom_start=3, tiles="CartoDB positron")
# Create the map with a color scale for the number of observations submitted to GLOBE
folium.Choropleth(
    geo_data=mosquito_obs.to_json(),
    name="Choropleth",
    data=mosquito_obs,
    columns=['name', 'GLOBE_Observations'],
    key_on="feature.properties.name",
    fill_color="YlGnBu",
    fill_opacity=0.7,
    bins=[1, 50, 100, 500, 1000, 5000, 10000, 20000],
    legend_name="Number of GLOBE Observations (2018-2024)",
).add_to(m)
# Add a tooltip that appears when you hover over a country
# (geo_data and key_on are Choropleth arguments, so they are not passed to GeoJson)
folium.GeoJson(
    mosquito_obs,
    tooltip=folium.features.GeoJsonTooltip(fields=['name', 'GLOBE_Observations'], aliases=['Country:', 'Observations:']),
    style_function=lambda feature: {'color': 'white', 'weight': 1}
).add_to(m)
display(m)
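The bins list decides which color class each country falls into. As far as we know, folium.Choropleth expects every non-missing value to lie between the first and last bin edge. This sketch checks that, and previews the classes with pd.cut, using made-up observation counts. Folium may treat values that sit exactly on a bin edge differently from pd.cut, so read the class numbers as an approximation.

```python
import pandas as pd

bins = [1, 50, 100, 500, 1000, 5000, 10000, 20000]  # same edges as the map

# Made-up observation counts; None stands for a country with no data
obs = pd.Series([1, 50, 51, 7528, None], dtype='float64')

# True when every non-missing value lies within the outer bin edges
in_range = obs.dropna().between(bins[0], bins[-1]).all()

# Bin index for each value (-1 = missing); include_lowest keeps the value 1
classes = pd.cut(obs, bins=bins, include_lowest=True).cat.codes.tolist()

print(in_range, classes)
```

Running the same check on mosquito_obs['GLOBE_Observations'] before drawing the map shows whether the last bin edge needs to be raised.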