Introduction to GLOBE Data#
We will examine two GLOBE datasets: Mosquito Habitat Mapper and Land Cover.
You can view an interactive dashboard of this data.
You can run the following code using Google Colab, which runs in your browser (no installations required).
First, we’ll load the Python packages, which give us more tools to work with in our code.
import pandas as pd # For working with data
pd.set_option("display.max_columns", None) # Lets us see all columns of the data instead of just a preview
import geopandas as gpd # For working with spatial data
import numpy as np # For working with numbers
import matplotlib.pyplot as plt # For making graphs
from datetime import date # For formatting dates
from PIL import Image # For getting and displaying images from links
import requests # For getting information from links
from io import BytesIO # For working with types of input and output
end_date = "2024-12-31"
Mosquito Habitat Mapper#
Let’s use the GLOBE API, which allows us to get data directly without needing to download anything.
data = gpd.read_file(f"https://api.globe.gov/search/v1/measurement/?protocols=mosquito_habitat_mapper&datefield=measuredDate&startdate=2018-01-01&enddate={end_date}&geojson=TRUE&sample=FALSE")
If you get the error NameError: name 'gpd' is not defined, go to the top of this notebook and click the arrow next to the first code block, which starts with import pandas as pd. Running that block imports the packages needed for the rest of the code.
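The request URL above is a base address followed by query parameters. As a minimal sketch, the same URL can be assembled from its parts using only the standard library. The helper name `build_globe_url` is hypothetical (it is not part of any GLOBE package), and the parameter names are simply the ones that appear in the URL above.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.globe.gov/search/v1/measurement/"

def build_globe_url(protocol, start_date, end_date):
    """Assemble a GLOBE API query URL (hypothetical helper for illustration)."""
    # Parameter names are copied from the URL used in this chapter
    params = {
        "protocols": protocol,
        "datefield": "measuredDate",
        "startdate": start_date,
        "enddate": end_date,
        "geojson": "TRUE",
        "sample": "FALSE",
    }
    return f"{BASE_URL}?{urlencode(params)}"

url = build_globe_url("mosquito_habitat_mapper", "2018-01-01", "2024-12-31")
print(url)
```

Writing the parameters as a dictionary makes it easier to change the protocol or date range without editing a long string by hand.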
View the first 10 rows of the data, which are the most recently collected entries submitted to the GLOBE Observer App.
data.head(10)
| | countryCode | countryName | elevation | mosquitohabitatmapperAbdomenCloseupPhotoUrls | mosquitohabitatmapperBreedingGroundEliminated | mosquitohabitatmapperComments | mosquitohabitatmapperDataSource | mosquitohabitatmapperExtraData | mosquitohabitatmapperGenus | mosquitohabitatmapperGlobeTeams | mosquitohabitatmapperLarvaFullBodyPhotoUrls | mosquitohabitatmapperLarvaeCount | mosquitohabitatmapperLastIdentifyStage | mosquitohabitatmapperLocationAccuracyM | mosquitohabitatmapperLocationMethod | mosquitohabitatmapperMeasuredAt | mosquitohabitatmapperMeasurementElevation | mosquitohabitatmapperMeasurementLatitude | mosquitohabitatmapperMeasurementLongitude | mosquitohabitatmapperMosquitoAdults | mosquitohabitatmapperMosquitoEggCount | mosquitohabitatmapperMosquitoEggs | mosquitohabitatmapperMosquitoHabitatMapperId | mosquitohabitatmapperMosquitoPupae | mosquitohabitatmapperSpecies | mosquitohabitatmapperUserid | mosquitohabitatmapperWaterSource | mosquitohabitatmapperWaterSourcePhotoUrls | mosquitohabitatmapperWaterSourceType | organizationId | organizationName | protocol | siteId | siteName | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | BRA | Brazil | 6.3 | null | false | null | GLOBE Observer App | LarvaeVisibleNo | null | [COLUNSLZ] | null | 0 | null | 13 | automatic | 2024-12-31 17:16:00 | 0 | -2.5617 | -44.2657 | null | null | null | 46287 | false | null | 137422629 | ovitrap | https://data.globe.gov/system/photos/2024/12/3... | container: artificial | 17459532 | Brazil Citizen Science | mosquito_habitat_mapper | 371514 | 23MNT816168 | POINT (-44.26597 -2.56197) |
| 1 | BRA | Brazil | 6.3 | null | false | null | GLOBE Observer App | LarvaeVisibleNo | null | [COLUNSLZ] | null | 0 | null | 13 | automatic | 2024-12-31 17:20:00 | 0 | -2.5617 | -44.2657 | null | null | null | 46290 | false | null | 137422629 | ovitrap | https://data.globe.gov/system/photos/2024/12/3... | container: artificial | 17459532 | Brazil Citizen Science | mosquito_habitat_mapper | 371514 | 23MNT816168 | POINT (-44.26597 -2.56197) |
| 2 | BRA | Brazil | 7.4 | null | true | null | GLOBE Observer App | LarvaeVisibleNo | null | [COLUNSLZ] | null | 0 | null | 51 | automatic | 2024-12-31 22:32:00 | 0 | -2.5163 | -44.3023 | null | null | null | 46482 | false | null | 137420190 | cement, metal or plastic tank | null | container: artificial | 17459532 | Brazil Citizen Science | mosquito_habitat_mapper | 372864 | 23MNT775218 | POINT (-44.30288 -2.51676) |
| 3 | BRA | Brazil | 20.6 | null | true | null | GLOBE Observer App | LarvaeVisibleNo | null | [COLUNSLZ] | null | 0 | null | 66 | automatic | 2024-12-31 00:05:00 | 0 | -2.8639 | -44.0549 | null | null | null | 46203 | false | null | 137419937 | can or bottle | null | container: artificial | 17459532 | Brazil Citizen Science | mosquito_habitat_mapper | 373085 | 23MPS050834 | POINT (-44.05526 -2.86396) |
| 4 | BRA | Brazil | 20.6 | null | true | null | GLOBE Observer App | LarvaeVisibleNo | null | [COLUNSLZ] | null | 0 | null | 28 | automatic | 2024-12-31 00:23:00 | 0 | -2.8639 | -44.055 | null | null | null | 46223 | false | null | 137419937 | lake | null | still: lake/pond/swamp | 17459532 | Brazil Citizen Science | mosquito_habitat_mapper | 373085 | 23MPS050834 | POINT (-44.05526 -2.86396) |
| 5 | BRA | Brazil | 20.6 | null | true | null | GLOBE Observer App | LarvaeVisibleNo | null | [COLUNSLZ] | null | 0 | null | 19 | automatic | 2024-12-31 00:26:00 | 0 | -2.8639 | -44.055 | null | null | null | 46230 | false | null | 137419937 | plant husk (areca, coconut etc) | null | container: natural | 17459532 | Brazil Citizen Science | mosquito_habitat_mapper | 373085 | 23MPS050834 | POINT (-44.05526 -2.86396) |
| 6 | BRA | Brazil | 20.6 | null | true | null | GLOBE Observer App | LarvaeVisibleNo | null | [COLUNSLZ] | null | 0 | null | 98 | automatic | 2024-12-31 00:31:00 | 0 | -2.8638 | -44.0548 | null | null | null | 46234 | false | null | 137419937 | lake | null | still: lake/pond/swamp | 17459532 | Brazil Citizen Science | mosquito_habitat_mapper | 373085 | 23MPS050834 | POINT (-44.05526 -2.86396) |
| 7 | BRA | Brazil | 20.6 | null | true | null | GLOBE Observer App | LarvaeVisibleNo | null | [COLUNSLZ] | null | 0 | null | 98 | automatic | 2024-12-31 00:42:00 | 0 | -2.8638 | -44.0552 | null | null | null | 46261 | false | null | 137419937 | plant clumps (bamboo etc) | null | container: natural | 17459532 | Brazil Citizen Science | mosquito_habitat_mapper | 373085 | 23MPS050834 | POINT (-44.05526 -2.86396) |
| 8 | BRA | Brazil | 20.6 | null | true | null | GLOBE Observer App | LarvaeVisibleNo | null | [COLUNSLZ] | null | 0 | null | 100 | automatic | 2024-12-31 00:46:00 | 0 | -2.8639 | -44.0552 | null | null | null | 46264 | false | null | 137419937 | pond | null | still: lake/pond/swamp | 17459532 | Brazil Citizen Science | mosquito_habitat_mapper | 373085 | 23MPS050834 | POINT (-44.05526 -2.86396) |
| 9 | BRA | Brazil | 20.6 | null | true | null | GLOBE Observer App | LarvaeVisibleNo | null | [COLUNSLZ] | null | 0 | null | 98 | automatic | 2024-12-31 00:48:00 | 0 | -2.8638 | -44.0552 | null | null | null | 46266 | false | null | 137419937 | adult mosquito trap | null | container: artificial | 17459532 | Brazil Citizen Science | mosquito_habitat_mapper | 373085 | 23MPS050834 | POINT (-44.05526 -2.86396) |
Let’s use the info() function to learn more about what the dataset contains.
data.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 43345 entries, 0 to 43344
Data columns (total 35 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 countryCode 43259 non-null object
1 countryName 43259 non-null object
2 elevation 43345 non-null object
3 mosquitohabitatmapperAbdomenCloseupPhotoUrls 43345 non-null object
4 mosquitohabitatmapperBreedingGroundEliminated 43345 non-null object
5 mosquitohabitatmapperComments 43345 non-null object
6 mosquitohabitatmapperDataSource 43345 non-null object
7 mosquitohabitatmapperExtraData 43345 non-null object
8 mosquitohabitatmapperGenus 43345 non-null object
9 mosquitohabitatmapperGlobeTeams 16353 non-null object
10 mosquitohabitatmapperLarvaFullBodyPhotoUrls 43345 non-null object
11 mosquitohabitatmapperLarvaeCount 43345 non-null object
12 mosquitohabitatmapperLastIdentifyStage 43345 non-null object
13 mosquitohabitatmapperLocationAccuracyM 25004 non-null object
14 mosquitohabitatmapperLocationMethod 25004 non-null object
15 mosquitohabitatmapperMeasuredAt 43345 non-null datetime64[ms]
16 mosquitohabitatmapperMeasurementElevation 43328 non-null object
17 mosquitohabitatmapperMeasurementLatitude 43328 non-null object
18 mosquitohabitatmapperMeasurementLongitude 43328 non-null object
19 mosquitohabitatmapperMosquitoAdults 43345 non-null object
20 mosquitohabitatmapperMosquitoEggCount 43345 non-null object
21 mosquitohabitatmapperMosquitoEggs 43345 non-null object
22 mosquitohabitatmapperMosquitoHabitatMapperId 43345 non-null object
23 mosquitohabitatmapperMosquitoPupae 43345 non-null object
24 mosquitohabitatmapperSpecies 43345 non-null object
25 mosquitohabitatmapperUserid 43345 non-null object
26 mosquitohabitatmapperWaterSource 43345 non-null object
27 mosquitohabitatmapperWaterSourcePhotoUrls 43345 non-null object
28 mosquitohabitatmapperWaterSourceType 43345 non-null object
29 organizationId 43345 non-null object
30 organizationName 43259 non-null object
31 protocol 43345 non-null object
32 siteId 43345 non-null object
33 siteName 43345 non-null object
34 geometry 43345 non-null geometry
dtypes: datetime64[ms](1), geometry(1), object(33)
memory usage: 11.6+ MB
We see information about each column, how many non-null rows there are (non-null means not missing), and the type (a dtype of “object” means it is text). We see that we also have a datetime column, which would be great for analyzing the data over time. The geometry column gives us information about where the citizen scientist collected the data about mosquitoes.
However, some columns are currently stored as an “object” when we want them to be stored as a “float” (decimal) or “int” (whole number). Also, in some rows a missing value is recorded as the word “null,” which can be mistaken for an actual value. We’ll replace it with NaN, which pandas treats as a missing value rather than as text.
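To see why the text “null” is a problem, here is a small sketch on a made-up table (the values are invented for illustration and are not from the GLOBE data). It counts the literal "null" strings in each column and then shows how they become real missing values after replacement.

```python
import numpy as np
import pandas as pd

# Toy table (made-up values) where missing entries arrive as the text "null"
toy = pd.DataFrame({
    "Genus": ["Aedes", "null", "null"],
    "LarvaeCount": ["0", "1-25", "null"],
})

# Count the literal "null" strings per column; info() would count these as non-null
null_text_counts = (toy == "null").sum()

# Replace the text with NaN so pandas recognizes the entries as missing
cleaned = toy.replace("null", np.nan)
missing_counts = cleaned.isna().sum()

print(null_text_counts)
print(missing_counts)
```

The two counts match, which confirms that every "null" string was converted to a missing value.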
Note: The column ‘mosquitohabitatmapperLarvaeCount’ stores numbers in addition to ranges like 1-25. To keep the column numeric and consistent, we need to simplify each range to a single number. So, if there is a range, we will replace it with a value near the middle of the range (for example, 1-25 becomes 13), and “more than 100” becomes 100.
# Let's remove the "mosquitohabitatmapper" in front of all the column names to make it easier to see the names
new_column_names = data.columns.str.replace("mosquitohabitatmapper", "")
# Make the first letter of each column name capitalized (for consistency) except for geometry
new_column_names = [name[0].upper() + name[1:] if name != 'geometry' else name for name in new_column_names]
data.columns = new_column_names
data.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 43345 entries, 0 to 43344
Data columns (total 35 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 CountryCode 43259 non-null object
1 CountryName 43259 non-null object
2 Elevation 43345 non-null object
3 AbdomenCloseupPhotoUrls 43345 non-null object
4 BreedingGroundEliminated 43345 non-null object
5 Comments 43345 non-null object
6 DataSource 43345 non-null object
7 ExtraData 43345 non-null object
8 Genus 43345 non-null object
9 GlobeTeams 16353 non-null object
10 LarvaFullBodyPhotoUrls 43345 non-null object
11 LarvaeCount 43345 non-null object
12 LastIdentifyStage 43345 non-null object
13 LocationAccuracyM 25004 non-null object
14 LocationMethod 25004 non-null object
15 MeasuredAt 43345 non-null datetime64[ms]
16 MeasurementElevation 43328 non-null object
17 MeasurementLatitude 43328 non-null object
18 MeasurementLongitude 43328 non-null object
19 MosquitoAdults 43345 non-null object
20 MosquitoEggCount 43345 non-null object
21 MosquitoEggs 43345 non-null object
22 MosquitoHabitatMapperId 43345 non-null object
23 MosquitoPupae 43345 non-null object
24 Species 43345 non-null object
25 Userid 43345 non-null object
26 WaterSource 43345 non-null object
27 WaterSourcePhotoUrls 43345 non-null object
28 WaterSourceType 43345 non-null object
29 OrganizationId 43345 non-null object
30 OrganizationName 43259 non-null object
31 Protocol 43345 non-null object
32 SiteId 43345 non-null object
33 SiteName 43345 non-null object
34 geometry 43345 non-null geometry
dtypes: datetime64[ms](1), geometry(1), object(33)
memory usage: 11.6+ MB
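The renaming step can be checked on a few sample names before applying it to the whole table. This is a minimal sketch: the helper `tidy_name` is hypothetical, and the three names are a small subset of the real columns chosen for illustration.

```python
# Sample names for illustration (a small subset of the real columns)
sample_names = ["countryCode", "mosquitohabitatmapperLarvaeCount", "geometry"]

def tidy_name(name, prefix="mosquitohabitatmapper"):
    """Strip the protocol prefix and capitalize the first letter, leaving 'geometry' as-is."""
    name = name.replace(prefix, "")
    if name == "geometry":
        return name
    return name[0].upper() + name[1:]

tidy = [tidy_name(n) for n in sample_names]
print(tidy)
```

The geometry column keeps its lowercase name because GeoPandas uses that name by default to find the active geometry column.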
# Add new column for date, not including the time
data['MeasuredDate'] = data['MeasuredAt'].dt.date
# The LarvaeCount column has some grouped entries like '1-25' that we will replace with the center value
data['LarvaeCountProcessed'] = data['LarvaeCount'].replace({
'1-25': 13,
'26-50': 38,
'51-100': 76,
'more than 100': 100,
'null': np.nan
})
# If the LarvaeCount is very long (more than 10 characters), then it is likely an error, so we'll replace it with NaN
data.loc[data['LarvaeCountProcessed'].str.len() > 10, 'LarvaeCountProcessed'] = np.nan
numeric_cols = ['LarvaeCountProcessed', 'MeasurementLatitude', 'MeasurementLongitude']
data[numeric_cols] = data[numeric_cols].apply(pd.to_numeric)
# Drop the MosquitoEggCount column because it is all null
data = data.drop(columns=['MosquitoEggCount'])
# Replace the word 'null' with NaN to ensure it is stored as a missing value instead of a word
data = data.replace('null', np.nan)
data.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 43345 entries, 0 to 43344
Data columns (total 36 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 CountryCode 43259 non-null object
1 CountryName 43259 non-null object
2 Elevation 43345 non-null object
3 AbdomenCloseupPhotoUrls 886 non-null object
4 BreedingGroundEliminated 43284 non-null object
5 Comments 4040 non-null object
6 DataSource 43345 non-null object
7 ExtraData 12521 non-null object
8 Genus 4402 non-null object
9 GlobeTeams 16353 non-null object
10 LarvaFullBodyPhotoUrls 8697 non-null object
11 LarvaeCount 24897 non-null object
12 LastIdentifyStage 30148 non-null object
13 LocationAccuracyM 13877 non-null object
14 LocationMethod 18313 non-null object
15 MeasuredAt 43345 non-null datetime64[ms]
16 MeasurementElevation 43328 non-null object
17 MeasurementLatitude 43328 non-null float64
18 MeasurementLongitude 43328 non-null float64
19 MosquitoAdults 16994 non-null object
20 MosquitoEggs 16999 non-null object
21 MosquitoHabitatMapperId 43345 non-null object
22 MosquitoPupae 41529 non-null object
23 Species 1170 non-null object
24 Userid 43345 non-null object
25 WaterSource 43345 non-null object
26 WaterSourcePhotoUrls 34556 non-null object
27 WaterSourceType 43345 non-null object
28 OrganizationId 43259 non-null object
29 OrganizationName 43259 non-null object
30 Protocol 43345 non-null object
31 SiteId 43345 non-null object
32 SiteName 43345 non-null object
33 geometry 43345 non-null geometry
34 MeasuredDate 43345 non-null object
35 LarvaeCountProcessed 24894 non-null float64
dtypes: datetime64[ms](1), float64(3), geometry(1), object(31)
memory usage: 11.9+ MB
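The range replacement used in the cleaning step can be tried on a small, made-up series to see exactly what each kind of entry becomes. This sketch reuses the same mapping as above; the sample values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Made-up larvae counts mixing plain numbers, ranges, and the text "null"
# (object dtype, like the column loaded from the API)
raw = pd.Series(["0", "7", "1-25", "26-50", "more than 100", "null"], dtype=object)

# Same mapping as in the cleaning step: each range becomes one representative number
range_map = {"1-25": 13, "26-50": 38, "51-100": 76, "more than 100": 100, "null": np.nan}

# Replace the ranges, then convert everything that is left to numbers
processed = pd.to_numeric(raw.replace(range_map))
print(processed)
```

Note that “more than 100” has no upper bound, so 100 is only a lower limit for those entries. Keep this in mind when averaging or summing the processed column.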
The final data cleaning step we will perform for now is to remove points that have invalid coordinates. Sometimes, the latitude and longitude are reported incorrectly, placing the point in the middle of the ocean. We want to remove any points that are not on land. To do that, let’s load a publicly available file of country boundaries. The country boundaries (excluding Antarctica) were downloaded from ArcGIS Data and Maps.
countries = gpd.read_file('https://github.com/geo-di-lab/emerge-lessons/raw/refs/heads/main/docs/data/world_countries.zip')[['COUNTRY', 'geometry']].to_crs(4326)
countries.plot()
<Axes: >
We will use a spatial join (sjoin) to get all of the data that “intersects” the countries layer, meaning it is either within or on the boundary of a country.
data = gpd.sjoin(data, countries, how="inner", predicate='intersects') \
.drop(columns=['index_right', 'COUNTRY']) \
.reset_index(drop=True)
data
| | CountryCode | CountryName | Elevation | AbdomenCloseupPhotoUrls | BreedingGroundEliminated | Comments | DataSource | ExtraData | Genus | GlobeTeams | LarvaFullBodyPhotoUrls | LarvaeCount | LastIdentifyStage | LocationAccuracyM | LocationMethod | MeasuredAt | MeasurementElevation | MeasurementLatitude | MeasurementLongitude | MosquitoAdults | MosquitoEggs | MosquitoHabitatMapperId | MosquitoPupae | Species | Userid | WaterSource | WaterSourcePhotoUrls | WaterSourceType | OrganizationId | OrganizationName | Protocol | SiteId | SiteName | geometry | MeasuredDate | LarvaeCountProcessed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | BRA | Brazil | 6.3 | NaN | false | NaN | GLOBE Observer App | LarvaeVisibleNo | NaN | [COLUNSLZ] | NaN | 0 | NaN | 13 | automatic | 2024-12-31 17:16:00 | 0 | -2.561700 | -44.265700 | NaN | NaN | 46287 | false | NaN | 137422629 | ovitrap | https://data.globe.gov/system/photos/2024/12/3... | container: artificial | 17459532 | Brazil Citizen Science | mosquito_habitat_mapper | 371514 | 23MNT816168 | POINT (-44.26597 -2.56197) | 2024-12-31 | 0.0 |
| 1 | BRA | Brazil | 6.3 | NaN | false | NaN | GLOBE Observer App | LarvaeVisibleNo | NaN | [COLUNSLZ] | NaN | 0 | NaN | 13 | automatic | 2024-12-31 17:20:00 | 0 | -2.561700 | -44.265700 | NaN | NaN | 46290 | false | NaN | 137422629 | ovitrap | https://data.globe.gov/system/photos/2024/12/3... | container: artificial | 17459532 | Brazil Citizen Science | mosquito_habitat_mapper | 371514 | 23MNT816168 | POINT (-44.26597 -2.56197) | 2024-12-31 | 0.0 |
| 2 | BRA | Brazil | 7.4 | NaN | true | NaN | GLOBE Observer App | LarvaeVisibleNo | NaN | [COLUNSLZ] | NaN | 0 | NaN | 51 | automatic | 2024-12-31 22:32:00 | 0 | -2.516300 | -44.302300 | NaN | NaN | 46482 | false | NaN | 137420190 | cement, metal or plastic tank | NaN | container: artificial | 17459532 | Brazil Citizen Science | mosquito_habitat_mapper | 372864 | 23MNT775218 | POINT (-44.30288 -2.51676) | 2024-12-31 | 0.0 |
| 3 | BRA | Brazil | 20.6 | NaN | true | NaN | GLOBE Observer App | LarvaeVisibleNo | NaN | [COLUNSLZ] | NaN | 0 | NaN | 66 | automatic | 2024-12-31 00:05:00 | 0 | -2.863900 | -44.054900 | NaN | NaN | 46203 | false | NaN | 137419937 | can or bottle | NaN | container: artificial | 17459532 | Brazil Citizen Science | mosquito_habitat_mapper | 373085 | 23MPS050834 | POINT (-44.05526 -2.86396) | 2024-12-31 | 0.0 |
| 4 | BRA | Brazil | 20.6 | NaN | true | NaN | GLOBE Observer App | LarvaeVisibleNo | NaN | [COLUNSLZ] | NaN | 0 | NaN | 28 | automatic | 2024-12-31 00:23:00 | 0 | -2.863900 | -44.055000 | NaN | NaN | 46223 | false | NaN | 137419937 | lake | NaN | still: lake/pond/swamp | 17459532 | Brazil Citizen Science | mosquito_habitat_mapper | 373085 | 23MPS050834 | POINT (-44.05526 -2.86396) | 2024-12-31 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 43007 | ISR | Israel | 714.0 | NaN | false | بركة اصطناعية | GLOBE Observer App | NaN | NaN | None | NaN | 1-25 | identify-siphon-pecten | None | None | 2018-01-08 12:14:00 | 714 | 33.010806 | 35.331531 | false | false | 1340 | true | NaN | 31782720 | pond | https://data.globe.gov/system/photos/2018/01/0... | still: lake/pond/swamp | 2567205 | Horfish Elementary B School | mosquito_habitat_mapper | 104220 | 36SYB178549 | POINT (35.33153 33.01081) | 2018-01-08 | 13.0 |
| 43008 | ISR | Israel | 755.0 | NaN | false | بركة اصطناعية من مياة الامطار | GLOBE Observer App | NaN | NaN | None | https://data.globe.gov/system/photos/2018/01/0... | 1-25 | identify-analyze-siphon | None | None | 2018-01-07 10:53:00 | 755 | 33.013490 | 35.332672 | false | false | 1339 | true | NaN | 31782720 | pond | https://data.globe.gov/system/photos/2018/01/0... | still: lake/pond/swamp | 2567205 | Horfish Elementary B School | mosquito_habitat_mapper | 100667 | 36SYB179552 | POINT (35.33267 33.01349) | 2018-01-07 | 13.0 |
| 43009 | KOR | South Korea | 287.5 | NaN | false | NaN | GLOBE Observer App | NaN | NaN | None | NaN | NaN | NaN | None | None | 2018-01-07 03:00:00 | 287.5 | 44.962525 | -93.161031 | NaN | NaN | 1338 | NaN | NaN | 36916117 | fountain or bird bath | https://data.globe.gov/system/photos/2018/01/0... | container: artificial | 17479077 | Republic of Korea Citizen Science | mosquito_habitat_mapper | 104189 | 15TVK873788 | POINT (-93.16103 44.96252) | 2018-01-07 | NaN |
| 43010 | KOR | South Korea | 287.5 | NaN | false | NaN | GLOBE Observer App | NaN | NaN | None | NaN | 0 | identify-verify-larva | None | None | 2018-01-05 03:00:00 | 287.5 | 44.962525 | -93.161031 | false | false | 1337 | false | NaN | 36916117 | well or cistern | https://data.globe.gov/system/photos/2018/01/0... | container: artificial | 17479077 | Republic of Korea Citizen Science | mosquito_habitat_mapper | 104189 | 15TVK873788 | POINT (-93.16103 44.96252) | 2018-01-05 | 0.0 |
| 43011 | KOR | South Korea | 287.5 | NaN | true | NaN | GLOBE Observer App | NaN | NaN | None | NaN | NaN | NaN | None | None | 2018-01-03 03:00:00 | 287.5 | 44.962525 | -93.161031 | NaN | NaN | 1336 | NaN | NaN | 36916117 | trash container | https://data.globe.gov/system/photos/2018/01/0... | container: artificial | 17479077 | Republic of Korea Citizen Science | mosquito_habitat_mapper | 104189 | 15TVK873788 | POINT (-93.16103 44.96252) | 2018-01-03 | NaN |
43012 rows × 36 columns
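To build intuition for what the spatial join decides for each observation, here is a conceptual sketch of a point-in-polygon test using the classic ray-casting approach in plain Python. This is not how GeoPandas implements sjoin internally, and unlike the “intersects” predicate it does not reliably handle points that sit exactly on a boundary. The function `point_in_polygon` and the rectangular `toy_country` are hypothetical and exist only for this illustration.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                # Each crossing flips the inside/outside state
                inside = not inside
    return inside

# Hypothetical rectangular "country" with corners in (longitude, latitude) order
toy_country = [(-50.0, -10.0), (-40.0, -10.0), (-40.0, 0.0), (-50.0, 0.0)]

on_land = point_in_polygon(-44.27, -2.56, toy_country)  # point inside the rectangle
at_sea = point_in_polygon(-30.0, -2.56, toy_country)    # point east of the rectangle
print(on_land, at_sea)
```

An inner spatial join keeps only the rows for which a test like this succeeds against at least one country polygon, which is why the row count drops after the join.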
A copy of this dataset is on GitHub, and we will use this in future chapters. You do not need to download the dataset to your computer, as we will load the data directly using the link.
Explore the Data#
Let’s make a simple graph showing the number of contributions submitted by citizen scientists over time!
data_by_day = data[['SiteId', 'MeasuredDate']].groupby(['MeasuredDate'], as_index=False).count()
plt.figure(figsize=(8,4))
plt.plot(data_by_day['MeasuredDate'], data_by_day['SiteId'])
plt.xlabel("Date")
plt.ylabel("Daily Contributions")
plt.title("Daily Mosquito Habitat Mapper Contributions Over Time (2018-2024)")
plt.show()
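The grouping step behind this graph can be tried on a tiny, made-up table to see the shape of the result. The site IDs and dates below are invented for illustration.

```python
from datetime import date

import pandas as pd

# Made-up observations: one on December 30 and three on December 31
toy = pd.DataFrame({
    "SiteId": ["a", "b", "c", "d"],
    "MeasuredDate": [
        date(2024, 12, 30),
        date(2024, 12, 31),
        date(2024, 12, 31),
        date(2024, 12, 31),
    ],
})

# One row per date; the SiteId column now holds the number of observations that day
per_day = toy.groupby("MeasuredDate", as_index=False).count()
print(per_day)
```

Because count() tallies non-missing values, the result equals the number of rows per day only when SiteId is never missing, which the info() output above suggests is the case here.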
Let’s plot the number of contributions by country:
data_by_country = data[['SiteId', 'CountryName']].groupby(['CountryName'], as_index=False).count().sort_values(by='SiteId').tail(10)
plt.barh(data_by_country['CountryName'], data_by_country['SiteId'])
plt.xlabel("Total Contributions")
plt.ylabel("Country")
plt.title("Mosquito Habitat Mapper Contributions by Country (2018-2024)")
plt.show()
Land Cover#
Now, we’ll review the Land Cover dataset. In a similar way, we’ll get the data from the GLOBE API using the same date range as the Mosquito Habitat Mapper dataset.
data = gpd.read_file(f"https://api.globe.gov/search/v1/measurement/?protocols=land_covers&datefield=measuredDate&startdate=2018-01-01&enddate={end_date}&geojson=TRUE&sample=FALSE")
View a list of the columns in the dataset:
data.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 52038 entries, 0 to 52037
Data columns (total 63 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 countryCode 49631 non-null object
1 countryName 49631 non-null object
2 elevation 52038 non-null object
3 landcoversDataSource 52038 non-null object
4 landcoversDownwardCaption 52038 non-null object
5 landcoversDownwardExtraData 52038 non-null object
6 landcoversDownwardPhotoUrl 52038 non-null object
7 landcoversDryGround 52038 non-null object
8 landcoversEastCaption 52038 non-null object
9 landcoversEastClassifications 52038 non-null object
10 landcoversEastExtraData 52038 non-null object
11 landcoversEastPhotoUrl 52038 non-null object
12 landcoversFeature1Caption 52038 non-null object
13 landcoversFeature1ExtraData 52038 non-null object
14 landcoversFeature1PhotoUrl 52038 non-null object
15 landcoversFeature2Caption 52038 non-null object
16 landcoversFeature2ExtraData 52038 non-null object
17 landcoversFeature2PhotoUrl 52038 non-null object
18 landcoversFeature3Caption 52038 non-null object
19 landcoversFeature3ExtraData 52038 non-null object
20 landcoversFeature3PhotoUrl 52038 non-null object
21 landcoversFeature4Caption 52038 non-null object
22 landcoversFeature4ExtraData 52038 non-null object
23 landcoversFeature4PhotoUrl 52038 non-null object
24 landcoversFieldNotes 52038 non-null object
25 landcoversGlobeTeams 23991 non-null object
26 landcoversLandCoverId 52038 non-null object
27 landcoversLeavesOnTrees 52038 non-null object
28 landcoversLocationAccuracyM 52038 non-null object
29 landcoversLocationMethod 52038 non-null object
30 landcoversMeasuredAt 52038 non-null datetime64[ms]
31 landcoversMeasurementElevation 43920 non-null object
32 landcoversMeasurementLatitude 43920 non-null object
33 landcoversMeasurementLongitude 43920 non-null object
34 landcoversMucCode 52038 non-null object
35 landcoversMucDescription 52038 non-null object
36 landcoversMucDetails 52038 non-null object
37 landcoversMuddy 52038 non-null object
38 landcoversNorthCaption 52038 non-null object
39 landcoversNorthClassifications 52038 non-null object
40 landcoversNorthExtraData 52038 non-null object
41 landcoversNorthPhotoUrl 52038 non-null object
42 landcoversRainingSnowing 52038 non-null object
43 landcoversSnowIce 52038 non-null object
44 landcoversSouthCaption 52038 non-null object
45 landcoversSouthClassifications 52038 non-null object
46 landcoversSouthExtraData 52038 non-null object
47 landcoversSouthPhotoUrl 52038 non-null object
48 landcoversStandingWater 52038 non-null object
49 landcoversUpwardCaption 52038 non-null object
50 landcoversUpwardExtraData 52038 non-null object
51 landcoversUpwardPhotoUrl 52038 non-null object
52 landcoversUserid 52038 non-null object
53 landcoversWestCaption 52038 non-null object
54 landcoversWestClassifications 52038 non-null object
55 landcoversWestExtraData 52038 non-null object
56 landcoversWestPhotoUrl 52038 non-null object
57 organizationId 52038 non-null object
58 organizationName 49632 non-null object
59 protocol 52038 non-null object
60 siteId 52038 non-null object
61 siteName 52038 non-null object
62 geometry 52038 non-null geometry
dtypes: datetime64[ms](1), geometry(1), object(61)
memory usage: 25.0+ MB
# Let's remove the "landcovers" in front of all the column names to make it easier to see the names
new_column_names = data.columns.str.replace("landcovers", "")
# Make the first letter of each column name capitalized (for consistency) except for geometry
new_column_names = [name[0].upper() + name[1:] if name != 'geometry' else name for name in new_column_names]
data.columns = new_column_names
data.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 52038 entries, 0 to 52037
Data columns (total 63 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 CountryCode 49631 non-null object
1 CountryName 49631 non-null object
2 Elevation 52038 non-null object
3 DataSource 52038 non-null object
4 DownwardCaption 52038 non-null object
5 DownwardExtraData 52038 non-null object
6 DownwardPhotoUrl 52038 non-null object
7 DryGround 52038 non-null object
8 EastCaption 52038 non-null object
9 EastClassifications 52038 non-null object
10 EastExtraData 52038 non-null object
11 EastPhotoUrl 52038 non-null object
12 Feature1Caption 52038 non-null object
13 Feature1ExtraData 52038 non-null object
14 Feature1PhotoUrl 52038 non-null object
15 Feature2Caption 52038 non-null object
16 Feature2ExtraData 52038 non-null object
17 Feature2PhotoUrl 52038 non-null object
18 Feature3Caption 52038 non-null object
19 Feature3ExtraData 52038 non-null object
20 Feature3PhotoUrl 52038 non-null object
21 Feature4Caption 52038 non-null object
22 Feature4ExtraData 52038 non-null object
23 Feature4PhotoUrl 52038 non-null object
24 FieldNotes 52038 non-null object
25 GlobeTeams 23991 non-null object
26 LandCoverId 52038 non-null object
27 LeavesOnTrees 52038 non-null object
28 LocationAccuracyM 52038 non-null object
29 LocationMethod 52038 non-null object
30 MeasuredAt 52038 non-null datetime64[ms]
31 MeasurementElevation 43920 non-null object
32 MeasurementLatitude 43920 non-null object
33 MeasurementLongitude 43920 non-null object
34 MucCode 52038 non-null object
35 MucDescription 52038 non-null object
36 MucDetails 52038 non-null object
37 Muddy 52038 non-null object
38 NorthCaption 52038 non-null object
39 NorthClassifications 52038 non-null object
40 NorthExtraData 52038 non-null object
41 NorthPhotoUrl 52038 non-null object
42 RainingSnowing 52038 non-null object
43 SnowIce 52038 non-null object
44 SouthCaption 52038 non-null object
45 SouthClassifications 52038 non-null object
46 SouthExtraData 52038 non-null object
47 SouthPhotoUrl 52038 non-null object
48 StandingWater 52038 non-null object
49 UpwardCaption 52038 non-null object
50 UpwardExtraData 52038 non-null object
51 UpwardPhotoUrl 52038 non-null object
52 Userid 52038 non-null object
53 WestCaption 52038 non-null object
54 WestClassifications 52038 non-null object
55 WestExtraData 52038 non-null object
56 WestPhotoUrl 52038 non-null object
57 OrganizationId 52038 non-null object
58 OrganizationName 49632 non-null object
59 Protocol 52038 non-null object
60 SiteId 52038 non-null object
61 SiteName 52038 non-null object
62 geometry 52038 non-null geometry
dtypes: datetime64[ms](1), geometry(1), object(61)
memory usage: 25.0+ MB
# Add new column for date
data['MeasuredDate'] = data['MeasuredAt'].dt.date
data.head(10)
| | CountryCode | CountryName | Elevation | DataSource | DownwardCaption | DownwardExtraData | DownwardPhotoUrl | DryGround | EastCaption | EastClassifications | EastExtraData | EastPhotoUrl | Feature1Caption | Feature1ExtraData | Feature1PhotoUrl | Feature2Caption | Feature2ExtraData | Feature2PhotoUrl | Feature3Caption | Feature3ExtraData | Feature3PhotoUrl | Feature4Caption | Feature4ExtraData | Feature4PhotoUrl | FieldNotes | GlobeTeams | LandCoverId | LeavesOnTrees | LocationAccuracyM | LocationMethod | MeasuredAt | MeasurementElevation | MeasurementLatitude | MeasurementLongitude | MucCode | MucDescription | MucDetails | Muddy | NorthCaption | NorthClassifications | NorthExtraData | NorthPhotoUrl | RainingSnowing | SnowIce | SouthCaption | SouthClassifications | SouthExtraData | SouthPhotoUrl | StandingWater | UpwardCaption | UpwardExtraData | UpwardPhotoUrl | Userid | WestCaption | WestClassifications | WestExtraData | WestPhotoUrl | OrganizationId | OrganizationName | Protocol | SiteId | SiteName | geometry | MeasuredDate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ITA | Italy | 489.2 | GLOBE Observer App | null | null | https://data.globe.gov/system/photos/2024/12/3... | true | null | null | null | https://data.globe.gov/system/photos/2024/12/3... | Snag C. sativa, 40 cm, cl 2, #01 #04 #12 | ((compassData.heading: 182, compassData.horizo... | https://data.globe.gov/system/photos/2024/12/3... | Log 70 cm, C. sativa, cl 2, #04 #01 #12 | ((compassData.heading: 182, compassData.horizo... | https://data.globe.gov/system/photos/2024/12/3... | Stump of C. sativa 230 cm | ((compassData.heading: null, compassData.horiz... | https://data.globe.gov/system/photos/2024/12/3... | null | null | null | Old Coppice of Castanea sativa | [Conservazione Natura Universita Tuscia] | 78608 | false | 8 | automatic | 2024-12-31 15:07:00.000 | 492.4 | 42.1818 | 12.1825 | null | null | false | null | null | null | https://data.globe.gov/system/photos/2024/12/3... | false | false | null | null | null | https://data.globe.gov/system/photos/2024/12/3... | false | null | null | https://data.globe.gov/system/photos/2024/12/3... | 128342138 | null | null | null | https://data.globe.gov/system/photos/2024/12/3... | 17453129 | Italy Citizen Science | land_covers | 376869 | 33TTG673738 | POINT (12.18229 42.18175) | 2024-12-31 | |
| 1 | MDG | Madagascar | 1350.1 | GLOBE Observer App | null | null | https://data.globe.gov/system/photos/2024/12/3... | true | null | 90% MUC 01 (n) [Trees, Closely Spaced, Evergre... | null | https://data.globe.gov/system/photos/2024/12/3... | null | null | null | null | null | null | null | null | null | null | null | null | Arbres plantés par l'équipe GLOBE avec la comm... | [Africa 2024 Regional Meeting, Coordinating Of... | 77695 | true | 10 | automatic | 2024-12-31 11:29:00.000 | 1340.6 | -18.7576 | 47.5615 | M01 | Trees, Closely Spaced, Evergreen - Needle Leaved | n | false | null | 90% MUC 01 (n) [Trees, Closely Spaced, Evergre... | null | https://data.globe.gov/system/photos/2024/12/3... | false | false | null | 90% MUC 01 (n) [Trees, Closely Spaced, Evergre... | null | https://data.globe.gov/system/photos/2024/12/3... | false | null | null | https://data.globe.gov/system/photos/2024/12/3... | 2538037 | null | 90% MUC 01 (n) [Trees, Closely Spaced, Evergre... | null | https://data.globe.gov/system/photos/2024/12/3... | 6508873 | Madagascar GLOBE v-School | land_covers | 373647 | 38KQE700240 | POINT (47.56096 -18.75807) | 2024-12-31 |
| 2 | MDG | Madagascar | 1324.7 | GLOBE Observer App | null | null | https://data.globe.gov/system/photos/2024/12/3... | true | null | 60% MUC 93 [Urban, Roads and Parking] | null | https://data.globe.gov/system/photos/2024/12/3... | null | null | null | null | null | null | null | null | null | null | null | null | (none) | [Africa 2024 Regional Meeting, Coordinating Of... | 77691 | true | 10 | automatic | 2024-12-31 12:07:00.000 | 1324.8 | -18.7944 | 47.5799 | M93 | Urban, Roads and Parking | false | null | 60% MUC 93 [Urban, Roads and Parking] | null | https://data.globe.gov/system/photos/2024/12/3... | false | false | null | 60% MUC 93 [Urban, Roads and Parking] | null | https://data.globe.gov/system/photos/2024/12/3... | false | null | null | https://data.globe.gov/system/photos/2024/12/3... | 2538037 | null | 60% MUC 93 [Urban, Roads and Parking] | null | https://data.globe.gov/system/photos/2024/12/3... | 6508873 | Madagascar GLOBE v-School | land_covers | 373642 | 38KQE719199 | POINT (47.57953 -18.79484) | 2024-12-31 | |
| 3 | USA | United States | 182.2 | GLOBE Data Entry Site Definition | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | None | 77689 | null | null | null | 2024-12-31 16:12:03.111 | None | None | None | M4 | Herbaceous Vegetation | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | 52107 | Crestwood High School | land_covers | 373628 | Hillcrest Elementary Trail | POINT (-83.27705 42.3465) | 2024-12-31 |
| 4 | GRC | Greece | 3.0 | GLOBE Observer App | null | null | https://data.globe.gov/system/photos/2024/12/3... | true | null | 90% MUC 91 [Urban, Residential Property]; 10% ... | null | https://data.globe.gov/system/photos/2024/12/3... | null | null | null | null | null | null | null | null | null | null | null | null | (none) | [tinycore lab, tinycorelab] | 77683 | false | null | manual | 2024-12-31 11:45:00.000 | 3.2 | 37.939 | 23.697 | M91 | Urban, Residential Property | false | null | 90% MUC 91 [Urban, Residential Property]; 10% ... | null | https://data.globe.gov/system/photos/2024/12/3... | false | false | null | 20% MUC 91 [Urban, Residential Property]; 80% ... | null | https://data.globe.gov/system/photos/2024/12/3... | false | null | null | https://data.globe.gov/system/photos/2024/12/3... | 65209921 | null | 10% MUC 91 [Urban, Residential Property]; 90% ... | null | https://data.globe.gov/system/photos/2024/12/3... | 6508393 | Greece GLOBE v-School | land_covers | 373613 | 34SGH370024 | POINT (23.6969 37.9383) | 2024-12-31 | |
| 5 | GRC | Greece | 15.5 | GLOBE Observer App | null | null | https://data.globe.gov/system/photos/2024/12/3... | true | null | 40% MUC 94 [Urban, Other]; 60% MUC 43 [Herbace... | null | https://data.globe.gov/system/photos/2024/12/3... | Park | ((compassData.heading: 137, compassData.horizo... | https://data.globe.gov/system/photos/2024/12/3... | null | null | null | null | null | null | null | null | null | (none) | [tinycore lab, tinycorelab] | 77681 | false | null | manual | 2024-12-31 11:01:00.000 | 16.7 | 37.9374 | 23.7051 | M43 | Herbaceous/Grassland, Short Grass | true | null | 10% MUC 94 [Urban, Other]; 90% MUC 43 [Herbace... | null | https://data.globe.gov/system/photos/2024/12/3... | false | false | null | 40% MUC 94 [Urban, Other]; 60% MUC 43 [Herbace... | null | https://data.globe.gov/system/photos/2024/12/3... | false | null | null | https://data.globe.gov/system/photos/2024/12/3... | 65209921 | null | 10% MUC 94 [Urban, Other]; 90% MUC 43 [Herbace... | null | https://data.globe.gov/system/photos/2024/12/3... | 6508393 | Greece GLOBE v-School | land_covers | 373609 | 34SGH377023 | POINT (23.70482 37.93722) | 2024-12-31 | |
| 6 | MDG | Madagascar | 1303.5 | GLOBE Observer App | null | null | https://data.globe.gov/system/photos/2024/12/3... | true | null | null | null | https://data.globe.gov/system/photos/2024/12/3... | null | null | null | null | null | null | null | null | null | null | null | null | (none) | [Africa 2024 Regional Meeting, Coordinating Of... | 77692 | true | 7 | automatic | 2024-12-31 11:53:00.000 | 1337.6 | -18.7639 | 47.5584 | M01 | Trees, Closely Spaced, Evergreen - Needle Leaved | n | false | null | 50% MUC 01 (n) [Trees, Closely Spaced, Evergre... | null | https://data.globe.gov/system/photos/2024/12/3... | false | false | null | null | null | https://data.globe.gov/system/photos/2024/12/3... | false | null | null | https://data.globe.gov/system/photos/2024/12/3... | 2538037 | null | null | null | https://data.globe.gov/system/photos/2024/12/3... | 6508873 | Madagascar GLOBE v-School | land_covers | 373644 | 38KQE697233 | POINT (47.55822 -18.76443) | 2024-12-31 |
| 7 | ITA | Italy | 440.2 | GLOBE Observer App | null | null | https://data.globe.gov/system/photos/2024/12/3... | true | null | null | null | https://data.globe.gov/system/photos/2024/12/3... | snag F. sylvatica 140 cm, cl 1, #4 #12 | ((compassData.heading: 174, compassData.horizo... | https://data.globe.gov/system/photos/2024/12/3... | Log F. sylvatica90 cm, cl2, #1 #4 #12 #14 | ((compassData.heading: 174, compassData.horizo... | https://data.globe.gov/system/photos/2024/12/3... | null | null | null | null | null | null | Old high forest of Fagus sylvatica | [Conservazione Natura Universita Tuscia] | 78607 | false | 12 | automatic | 2024-12-31 14:07:00.000 | 447.3 | 42.1797 | 12.1754 | null | null | false | null | null | null | https://data.globe.gov/system/photos/2024/12/3... | false | false | null | null | null | https://data.globe.gov/system/photos/2024/12/3... | false | null | null | https://data.globe.gov/system/photos/2024/12/3... | 128342138 | null | null | null | https://data.globe.gov/system/photos/2024/12/3... | 17453129 | Italy Citizen Science | land_covers | 373629 | 33TTG667735 | POINT (12.17516 42.17887) | 2024-12-31 | |
| 8 | MDG | Madagascar | 1461.9 | GLOBE Observer App | null | null | https://data.globe.gov/system/photos/2024/12/3... | true | null | 80% MUC 94 [Urban, Other] | null | https://data.globe.gov/system/photos/2024/12/3... | null | null | null | null | null | null | null | null | null | null | null | null | (none) | [Africa 2024 Regional Meeting, Coordinating Of... | 77686 | false | 6 | automatic | 2024-12-31 10:57:00.000 | 1463.8 | -18.7605 | 47.563 | M94 | Urban, Other | false | null | 80% MUC 94 [Urban, Other] | null | https://data.globe.gov/system/photos/2024/12/3... | false | false | null | 80% MUC 94 [Urban, Other] | null | https://data.globe.gov/system/photos/2024/12/3... | false | null | null | https://data.globe.gov/system/photos/2024/12/3... | 2538037 | null | 80% MUC 94 [Urban, Other] | null | https://data.globe.gov/system/photos/2024/12/3... | 6508873 | Madagascar GLOBE v-School | land_covers | 373618 | 38KQE702237 | POINT (47.5629 -18.76075) | 2024-12-31 | |
| 9 | MDG | Madagascar | 1450.3 | GLOBE Observer App | null | null | https://data.globe.gov/system/photos/2024/12/3... | true | null | 50% MUC 94 [Urban, Other] | null | https://data.globe.gov/system/photos/2024/12/3... | null | null | null | null | null | null | null | null | null | null | null | null | (none) | [Africa 2024 Regional Meeting, Coordinating Of... | 77696 | false | 8 | automatic | 2024-12-31 11:13:00.000 | 1455.8 | -18.7607 | 47.5623 | M94 | Urban, Other | false | null | 50% MUC 94 [Urban, Other] | null | https://data.globe.gov/system/photos/2024/12/3... | false | false | null | 50% MUC 94 [Urban, Other] | null | https://data.globe.gov/system/photos/2024/12/3... | false | null | null | https://data.globe.gov/system/photos/2024/12/3... | 2538037 | null | 50% MUC 94 [Urban, Other] | null | https://data.globe.gov/system/photos/2024/12/3... | 6508873 | Madagascar GLOBE v-School | land_covers | 373617 | 38KQE701237 | POINT (47.56195 -18.76076) | 2024-12-31 |
Replace the placeholder strings '(none)' and 'null' with NaN so that pandas treats them as missing values:
data['FieldNotes'] = data['FieldNotes'].replace('(none)', np.nan)
data = data.replace('null', np.nan)
data.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 52038 entries, 0 to 52037
Data columns (total 64 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 CountryCode 49631 non-null object
1 CountryName 49631 non-null object
2 Elevation 52038 non-null object
3 DataSource 52038 non-null object
4 DownwardCaption 22648 non-null object
5 DownwardExtraData 22644 non-null object
6 DownwardPhotoUrl 40040 non-null object
7 DryGround 43973 non-null object
8 EastCaption 23889 non-null object
9 EastClassifications 14465 non-null object
10 EastExtraData 23884 non-null object
11 EastPhotoUrl 42897 non-null object
12 Feature1Caption 341 non-null object
13 Feature1ExtraData 1389 non-null object
14 Feature1PhotoUrl 1390 non-null object
15 Feature2Caption 158 non-null object
16 Feature2ExtraData 694 non-null object
17 Feature2PhotoUrl 694 non-null object
18 Feature3Caption 98 non-null object
19 Feature3ExtraData 463 non-null object
20 Feature3PhotoUrl 463 non-null object
21 Feature4Caption 78 non-null object
22 Feature4ExtraData 347 non-null object
23 Feature4PhotoUrl 347 non-null object
24 FieldNotes 21884 non-null object
25 GlobeTeams 23991 non-null object
26 LandCoverId 52038 non-null object
27 LeavesOnTrees 43973 non-null object
28 LocationAccuracyM 36779 non-null object
29 LocationMethod 43973 non-null object
30 MeasuredAt 52038 non-null datetime64[ms]
31 MeasurementElevation 43920 non-null object
32 MeasurementLatitude 43920 non-null object
33 MeasurementLongitude 43920 non-null object
34 MucCode 22725 non-null object
35 MucDescription 22722 non-null object
36 MucDetails 43973 non-null object
37 Muddy 43973 non-null object
38 NorthCaption 24182 non-null object
39 NorthClassifications 14498 non-null object
40 NorthExtraData 24182 non-null object
41 NorthPhotoUrl 43276 non-null object
42 RainingSnowing 43973 non-null object
43 SnowIce 43973 non-null object
44 SouthCaption 23750 non-null object
45 SouthClassifications 14463 non-null object
46 SouthExtraData 23748 non-null object
47 SouthPhotoUrl 42782 non-null object
48 StandingWater 43973 non-null object
49 UpwardCaption 23150 non-null object
50 UpwardExtraData 23143 non-null object
51 UpwardPhotoUrl 40807 non-null object
52 Userid 43973 non-null object
53 WestCaption 23718 non-null object
54 WestClassifications 14438 non-null object
55 WestExtraData 23728 non-null object
56 WestPhotoUrl 42689 non-null object
57 OrganizationId 49632 non-null object
58 OrganizationName 49632 non-null object
59 Protocol 52038 non-null object
60 SiteId 52038 non-null object
61 SiteName 52038 non-null object
62 geometry 52038 non-null geometry
63 MeasuredDate 52038 non-null object
dtypes: datetime64[ms](1), geometry(1), object(62)
memory usage: 25.4+ MB
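To see what the two replacement calls do, here is a minimal sketch on a made-up three-row table. The column names match the Land Cover data, but the values are invented for illustration only.

```python
import numpy as np
import pandas as pd

# Toy table containing the two placeholder strings seen in the Land Cover data
toy = pd.DataFrame({
    "FieldNotes": ["Old coppice", "(none)", "(none)"],
    "MucCode": ["null", "M93", "M94"],
})

# Same two steps as above: the column-specific placeholder first, then the global one
toy["FieldNotes"] = toy["FieldNotes"].replace("(none)", np.nan)
toy = toy.replace("null", np.nan)

# Count the missing values in each column after the cleanup
missing_per_column = toy.isna().sum()
print(missing_per_column)
```

Once the placeholders are real missing values, functions such as `dropna` and the non-null counts reported by `info` reflect them.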
Similar to the mosquito dataset, we will filter out any points that do not fall within country boundaries. If the point is over water, then we assume the coordinates were incorrectly reported and remove it from the final dataset.
data = gpd.sjoin(data, countries, how="inner", predicate='intersects') \
.drop(columns=['index_right', 'COUNTRY']) \
.reset_index(drop=True)
data
| CountryCode | CountryName | Elevation | DataSource | DownwardCaption | DownwardExtraData | DownwardPhotoUrl | DryGround | EastCaption | EastClassifications | EastExtraData | EastPhotoUrl | Feature1Caption | Feature1ExtraData | Feature1PhotoUrl | Feature2Caption | Feature2ExtraData | Feature2PhotoUrl | Feature3Caption | Feature3ExtraData | Feature3PhotoUrl | Feature4Caption | Feature4ExtraData | Feature4PhotoUrl | FieldNotes | GlobeTeams | LandCoverId | LeavesOnTrees | LocationAccuracyM | LocationMethod | MeasuredAt | MeasurementElevation | MeasurementLatitude | MeasurementLongitude | MucCode | MucDescription | MucDetails | Muddy | NorthCaption | NorthClassifications | NorthExtraData | NorthPhotoUrl | RainingSnowing | SnowIce | SouthCaption | SouthClassifications | SouthExtraData | SouthPhotoUrl | StandingWater | UpwardCaption | UpwardExtraData | UpwardPhotoUrl | Userid | WestCaption | WestClassifications | WestExtraData | WestPhotoUrl | OrganizationId | OrganizationName | Protocol | SiteId | SiteName | geometry | MeasuredDate | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ITA | Italy | 489.2 | GLOBE Observer App | NaN | NaN | https://data.globe.gov/system/photos/2024/12/3... | true | NaN | NaN | NaN | https://data.globe.gov/system/photos/2024/12/3... | Snag C. sativa, 40 cm, cl 2, #01 #04 #12 | ((compassData.heading: 182, compassData.horizo... | https://data.globe.gov/system/photos/2024/12/3... | Log 70 cm, C. sativa, cl 2, #04 #01 #12 | ((compassData.heading: 182, compassData.horizo... | https://data.globe.gov/system/photos/2024/12/3... | Stump of C. sativa 230 cm | ((compassData.heading: null, compassData.horiz... | https://data.globe.gov/system/photos/2024/12/3... | NaN | NaN | NaN | Old Coppice of Castanea sativa | [Conservazione Natura Universita Tuscia] | 78608 | false | 8 | automatic | 2024-12-31 15:07:00.000 | 492.4 | 42.1818 | 12.1825 | NaN | NaN | false | NaN | NaN | NaN | https://data.globe.gov/system/photos/2024/12/3... | false | false | NaN | NaN | NaN | https://data.globe.gov/system/photos/2024/12/3... | false | NaN | NaN | https://data.globe.gov/system/photos/2024/12/3... | 128342138 | NaN | NaN | NaN | https://data.globe.gov/system/photos/2024/12/3... | 17453129 | Italy Citizen Science | land_covers | 376869 | 33TTG673738 | POINT (12.18229 42.18175) | 2024-12-31 | |
| 1 | MDG | Madagascar | 1350.1 | GLOBE Observer App | NaN | NaN | https://data.globe.gov/system/photos/2024/12/3... | true | NaN | 90% MUC 01 (n) [Trees, Closely Spaced, Evergre... | NaN | https://data.globe.gov/system/photos/2024/12/3... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Arbres plantés par l'équipe GLOBE avec la comm... | [Africa 2024 Regional Meeting, Coordinating Of... | 77695 | true | 10 | automatic | 2024-12-31 11:29:00.000 | 1340.6 | -18.7576 | 47.5615 | M01 | Trees, Closely Spaced, Evergreen - Needle Leaved | n | false | NaN | 90% MUC 01 (n) [Trees, Closely Spaced, Evergre... | NaN | https://data.globe.gov/system/photos/2024/12/3... | false | false | NaN | 90% MUC 01 (n) [Trees, Closely Spaced, Evergre... | NaN | https://data.globe.gov/system/photos/2024/12/3... | false | NaN | NaN | https://data.globe.gov/system/photos/2024/12/3... | 2538037 | NaN | 90% MUC 01 (n) [Trees, Closely Spaced, Evergre... | NaN | https://data.globe.gov/system/photos/2024/12/3... | 6508873 | Madagascar GLOBE v-School | land_covers | 373647 | 38KQE700240 | POINT (47.56096 -18.75807) | 2024-12-31 |
| 2 | MDG | Madagascar | 1324.7 | GLOBE Observer App | NaN | NaN | https://data.globe.gov/system/photos/2024/12/3... | true | NaN | 60% MUC 93 [Urban, Roads and Parking] | NaN | https://data.globe.gov/system/photos/2024/12/3... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | [Africa 2024 Regional Meeting, Coordinating Of... | 77691 | true | 10 | automatic | 2024-12-31 12:07:00.000 | 1324.8 | -18.7944 | 47.5799 | M93 | Urban, Roads and Parking | false | NaN | 60% MUC 93 [Urban, Roads and Parking] | NaN | https://data.globe.gov/system/photos/2024/12/3... | false | false | NaN | 60% MUC 93 [Urban, Roads and Parking] | NaN | https://data.globe.gov/system/photos/2024/12/3... | false | NaN | NaN | https://data.globe.gov/system/photos/2024/12/3... | 2538037 | NaN | 60% MUC 93 [Urban, Roads and Parking] | NaN | https://data.globe.gov/system/photos/2024/12/3... | 6508873 | Madagascar GLOBE v-School | land_covers | 373642 | 38KQE719199 | POINT (47.57953 -18.79484) | 2024-12-31 | |
| 3 | USA | United States | 182.2 | GLOBE Data Entry Site Definition | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | None | 77689 | NaN | NaN | NaN | 2024-12-31 16:12:03.111 | None | None | None | M4 | Herbaceous Vegetation | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 52107 | Crestwood High School | land_covers | 373628 | Hillcrest Elementary Trail | POINT (-83.27705 42.3465) | 2024-12-31 |
| 4 | GRC | Greece | 3.0 | GLOBE Observer App | NaN | NaN | https://data.globe.gov/system/photos/2024/12/3... | true | NaN | 90% MUC 91 [Urban, Residential Property]; 10% ... | NaN | https://data.globe.gov/system/photos/2024/12/3... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | [tinycore lab, tinycorelab] | 77683 | false | NaN | manual | 2024-12-31 11:45:00.000 | 3.2 | 37.939 | 23.697 | M91 | Urban, Residential Property | false | NaN | 90% MUC 91 [Urban, Residential Property]; 10% ... | NaN | https://data.globe.gov/system/photos/2024/12/3... | false | false | NaN | 20% MUC 91 [Urban, Residential Property]; 80% ... | NaN | https://data.globe.gov/system/photos/2024/12/3... | false | NaN | NaN | https://data.globe.gov/system/photos/2024/12/3... | 65209921 | NaN | 10% MUC 91 [Urban, Residential Property]; 90% ... | NaN | https://data.globe.gov/system/photos/2024/12/3... | 6508393 | Greece GLOBE v-School | land_covers | 373613 | 34SGH370024 | POINT (23.6969 37.9383) | 2024-12-31 | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 50778 | OMN | Oman | 584.0 | GLOBE Data Entry Site Definition | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | None | 20324 | NaN | NaN | NaN | 2018-01-16 12:52:15.524 | None | None | None | M823 | Cultivated Land, Non-Agriculture, Cemeteries | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 23062994 | Elayet feda basic school | land_covers | 104390 | yanqul park | POINT (56.43 23.43) | 2018-01-16 |
| 50779 | HRV | Croatia | 153.0 | GLOBE Data Entry Site Definition | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | None | 20326 | NaN | NaN | NaN | 2018-01-16 20:19:21.378 | None | None | None | M1211 | Woodland, Mainly Deciduous, Drought-Deciduous,... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 177974 | II. osnovna skola Cakovec | land_covers | 104405 | Žabnik- Sv. Martin | POINT (16.377 46.528) | 2018-01-16 |
| 50780 | OMN | Oman | 27.0 | GLOBE Data Entry Site Definition | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | None | 20321 | NaN | NaN | NaN | 2018-01-03 21:57:54.342 | None | None | None | M812 | Cultivated Land, Agriculture, Orchard and Hort... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 23061684 | Al duqom basic school | land_covers | 101771 | ALduqm shcool | POINT (57.37 19.37) | 2018-01-03 |
| 50781 | OMN | Oman | 27.0 | GLOBE Data Entry Site Definition | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | None | 20322 | NaN | NaN | NaN | 2018-01-03 22:00:19.250 | None | None | None | M812 | Cultivated Land, Agriculture, Orchard and Hort... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 23061684 | Al duqom basic school | land_covers | 101771 | ALduqm shcool | POINT (57.37 19.37) | 2018-01-03 |
| 50782 | OMN | Oman | 0.0 | GLOBE Data Entry Site Definition | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | None | 20320 | NaN | NaN | NaN | 2018-01-01 08:47:12.483 | None | None | None | M8 | Cultivated Land | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 36736232 | Madira Pasic School | land_covers | 104036 | MADERA BASIC SCHOOL | POINT (58.167 21.0717) | 2018-01-01 |
50783 rows × 64 columns
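The row count drops from 52,038 to 50,783 after this step, so 1,255 observations did not fall inside any country polygon. The sketch below reproduces the same kind of join on a toy example. The names `toy_countries` and `toy_points` and the square "Squareland" polygon are hypothetical stand-ins, not the real `countries` layer.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Hypothetical stand-in for the countries layer: a single square "country"
toy_countries = gpd.GeoDataFrame(
    {"COUNTRY": ["Squareland"]},
    geometry=[Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])],
    crs="EPSG:4326",
)

# Two observations: one inside the square and one outside it (as if over water)
toy_points = gpd.GeoDataFrame(
    {"SiteId": [1, 2]},
    geometry=[Point(5, 5), Point(20, 20)],
    crs="EPSG:4326",
)

# An inner join keeps only the points that intersect a polygon
kept = (
    gpd.sjoin(toy_points, toy_countries, how="inner", predicate="intersects")
    .drop(columns=["index_right", "COUNTRY"])
    .reset_index(drop=True)
)
print(kept)
```

Only the point at (5, 5) survives, because `how="inner"` discards rows that have no matching polygon.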
Like the mosquito data, a copy of this dataset is on GitHub, and we will use this in future chapters. You do not need to download the dataset to your computer, as we will load the data directly using the link.
Explore the Data#
data_by_day = data[['SiteId', 'MeasuredDate']].groupby(['MeasuredDate'], as_index=False).count()
plt.figure(figsize=(8,4))
plt.plot(data_by_day['MeasuredDate'], data_by_day['SiteId'])
plt.xlabel("Date")
plt.ylabel("Daily Contribution")
plt.title("Daily Land Cover Contributions Over Time (2018-2024)")
plt.show()
data_by_country = data[['SiteId', 'CountryName']].groupby(['CountryName'], as_index=False).count().sort_values(by='SiteId').tail(10)
plt.barh(data_by_country['CountryName'], data_by_country['SiteId'])
plt.xlabel("Total Contributions")
plt.ylabel("Country")
plt.title("Land Cover Contributions by Country (2018-2024)")
plt.show()
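The counting pattern used for the bar chart can be checked on a small table. This is a minimal sketch with invented rows; `toy` and `top_n` are hypothetical names. After `count()`, the `SiteId` column holds the number of rows per country rather than an actual site ID.

```python
import pandas as pd

# Invented observations: three from Italy, two from Greece, one from Oman
toy = pd.DataFrame({
    "SiteId": [1, 2, 3, 4, 5, 6],
    "CountryName": ["Italy", "Italy", "Italy", "Greece", "Greece", "Oman"],
})

top_n = 2  # the chart above uses 10

# Count rows per country, sort ascending, and keep the largest top_n groups
by_country = (
    toy[["SiteId", "CountryName"]]
    .groupby("CountryName", as_index=False)
    .count()
    .sort_values(by="SiteId")
    .tail(top_n)
)
print(by_country)
```

Sorting in ascending order before `tail` places the largest group last, which `plt.barh` draws at the top of the chart.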
A powerful feature of the Land Cover dataset is that users submit pictures of the area. Let’s view some of these images.
# Get the first observation where all photos were submitted
entry = data.dropna(subset=['DownwardPhotoUrl', 'EastPhotoUrl', 'NorthPhotoUrl', 'SouthPhotoUrl', 'WestPhotoUrl', 'UpwardPhotoUrl',
                            'Feature1PhotoUrl', 'Feature2PhotoUrl', 'Feature3PhotoUrl', 'Feature4PhotoUrl']).head(1)
url_list = []
col_list = []
for col in entry.columns:
    if 'Url' in col:
        print(f'{col}: {entry[col].values[0]}')
        url_list.append(entry[col].values[0])
        col_list.append(col)
display(entry)
DownwardPhotoUrl: https://data.globe.gov/system/photos/2024/12/31/4325699/original.jpg
EastPhotoUrl: https://data.globe.gov/system/photos/2024/12/31/4325695/original.jpg
Feature1PhotoUrl: https://data.globe.gov/system/photos/2024/12/31/4325700/original.jpg
Feature2PhotoUrl: https://data.globe.gov/system/photos/2024/12/31/4325701/original.jpg
Feature3PhotoUrl: https://data.globe.gov/system/photos/2024/12/31/4325702/original.jpg
Feature4PhotoUrl: https://data.globe.gov/system/photos/2024/12/31/4325703/original.jpg
NorthPhotoUrl: https://data.globe.gov/system/photos/2024/12/31/4325694/original.jpg
SouthPhotoUrl: https://data.globe.gov/system/photos/2024/12/31/4325696/original.jpg
UpwardPhotoUrl: https://data.globe.gov/system/photos/2024/12/31/4325698/original.jpg
WestPhotoUrl: https://data.globe.gov/system/photos/2024/12/31/4325697/original.jpg
| CountryCode | CountryName | Elevation | DataSource | DownwardCaption | DownwardExtraData | DownwardPhotoUrl | DryGround | EastCaption | EastClassifications | EastExtraData | EastPhotoUrl | Feature1Caption | Feature1ExtraData | Feature1PhotoUrl | Feature2Caption | Feature2ExtraData | Feature2PhotoUrl | Feature3Caption | Feature3ExtraData | Feature3PhotoUrl | Feature4Caption | Feature4ExtraData | Feature4PhotoUrl | FieldNotes | GlobeTeams | LandCoverId | LeavesOnTrees | LocationAccuracyM | LocationMethod | MeasuredAt | MeasurementElevation | MeasurementLatitude | MeasurementLongitude | MucCode | MucDescription | MucDetails | Muddy | NorthCaption | NorthClassifications | NorthExtraData | NorthPhotoUrl | RainingSnowing | SnowIce | SouthCaption | SouthClassifications | SouthExtraData | SouthPhotoUrl | StandingWater | UpwardCaption | UpwardExtraData | UpwardPhotoUrl | Userid | WestCaption | WestClassifications | WestExtraData | WestPhotoUrl | OrganizationId | OrganizationName | Protocol | SiteId | SiteName | geometry | MeasuredDate | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 11 | ITA | Italy | 1076.2 | GLOBE Observer App | NaN | NaN | https://data.globe.gov/system/photos/2024/12/3... | false | NaN | 90% MUC 12 (b) [Trees, Loosely Spaced, Deciduo... | NaN | https://data.globe.gov/system/photos/2024/12/3... | NaN | ((compassData.heading: 328, compassData.horizo... | https://data.globe.gov/system/photos/2024/12/3... | NaN | ((compassData.heading: 68, compassData.horizon... | https://data.globe.gov/system/photos/2024/12/3... | NaN | ((compassData.heading: 80, compassData.horizon... | https://data.globe.gov/system/photos/2024/12/3... | NaN | ((compassData.heading: 118, compassData.horizo... | https://data.globe.gov/system/photos/2024/12/3... | Buca in fustaia coetanea di faggio a dominanza... | [Conservazione Natura Universita Tuscia] | 78265 | false | 4 | automatic | 2024-12-31 10:47:00 | 1062.4 | 41.7648 | 14.2258 | M12 | Trees, Loosely Spaced, Deciduous - Broad Leaved | b | false | NaN | 90% MUC 12 (b) [Trees, Loosely Spaced, Deciduo... | NaN | https://data.globe.gov/system/photos/2024/12/3... | false | true | NaN | 90% MUC 12 (b) [Trees, Loosely Spaced, Deciduo... | NaN | https://data.globe.gov/system/photos/2024/12/3... | false | NaN | NaN | https://data.globe.gov/system/photos/2024/12/3... | 140413632 | NaN | 90% MUC 12 (b) [Trees, Loosely Spaced, Deciduo... | NaN | https://data.globe.gov/system/photos/2024/12/3... | 17453129 | Italy Citizen Science | land_covers | 375492 | 33TVG356239 | POINT (14.22524 41.76432) | 2024-12-31 |
# Plot all of the images
plt.figure(figsize=(20, 6))
for i, (url, title) in enumerate(zip(url_list, col_list)):
    response = requests.get(url)
    img = Image.open(BytesIO(response.content))
    # Create plot with 2 rows, 5 columns
    plt.subplot(2, 5, i + 1)
    plt.imshow(img)
    plt.title(title)
    plt.axis('off')
plt.tight_layout()
plt.show()
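The loop above hard-codes a grid of 2 rows and 5 columns, which fits the ten photo columns of this entry. If you select an observation with fewer photos, the grid size can be computed instead. The helper below is a hypothetical sketch, not part of the original notebook; `grid_shape` and its `n_cols` parameter are names introduced here for illustration.

```python
import math

def grid_shape(n_images, n_cols=5):
    """Return (rows, cols) for a subplot grid that fits n_images."""
    if n_images <= 0:
        raise ValueError("n_images must be positive")
    # Never use more columns than there are images
    cols = min(n_cols, n_images)
    # Round up so a partially filled last row still gets its own row
    rows = math.ceil(n_images / cols)
    return rows, cols

print(grid_shape(10))  # the ten photo URLs used above
print(grid_shape(3))   # an observation with only three photos
```

The result could then be passed to `plt.subplot(rows, cols, i + 1)` in place of the fixed `2, 5`.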