
Introduction to GLOBE Data

We will examine two GLOBE datasets: Mosquito Habitat Mapper and Land Cover.

You can view an interactive dashboard of this data.

You can run the following code using Google Colab, which runs in your browser (no installations required).

First, we'll import the Python packages we need. They give us extra tools for working with data, maps, and graphs.

import pandas as pd                           # For working with data
pd.set_option("display.max_columns", None)    # Lets us see all columns of the data instead of just a preview
import geopandas as gpd                       # For working with spatial data
import numpy as np                            # For working with numbers
import matplotlib.pyplot as plt               # For making graphs
from datetime import date                     # For formatting dates
from PIL import Image                         # For getting and displaying images from links
import requests                               # For getting information from links
from io import BytesIO                        # For working with types of input and output
end_date = "2024-12-31"                        # Last date to include in our GLOBE API requests

Mosquito Habitat Mapper

Let's use the GLOBE API, which lets us load data straight into Python from a link, without saving any files to our computer.

data = gpd.read_file(f"https://api.globe.gov/search/v1/measurement/?protocols=mosquito_habitat_mapper&datefield=measuredDate&startdate=2018-01-01&enddate={end_date}&geojson=TRUE&sample=FALSE")
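The query link above packs several settings into parameters after the "?". As a sketch, the same link can be rebuilt from a dictionary with Python's standard urllib.parse.urlencode, which makes the settings easier to read and change. The function name globe_measurement_url is hypothetical, and the comments describing each parameter are our reading of the link itself, not official GLOBE API documentation.

```python
from urllib.parse import urlencode

GLOBE_MEASUREMENT_API = "https://api.globe.gov/search/v1/measurement/"

def globe_measurement_url(protocol, start_date, end_date):
    """Build a GLOBE API measurement query link (hypothetical helper)."""
    params = {
        "protocols": protocol,        # which GLOBE protocol to request
        "datefield": "measuredDate",  # assumed: filter on the date of the observation
        "startdate": start_date,      # first day to include (YYYY-MM-DD)
        "enddate": end_date,          # last day to include (YYYY-MM-DD)
        "geojson": "TRUE",            # GeoJSON output, which gpd.read_file() can read
        "sample": "FALSE",            # assumed: request all matching records, not a sample
    }
    return GLOBE_MEASUREMENT_API + "?" + urlencode(params)

url = globe_measurement_url("mosquito_habitat_mapper", "2018-01-01", "2024-12-31")
print(url)
```

For example, globe_measurement_url("land_covers", "2018-01-01", end_date) would produce the Land Cover query used later in this lesson, and either link can be passed to gpd.read_file().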

If you get the error NameError: name 'gpd' is not defined, go to the top of this notebook and click the run arrow next to the first code block (the one starting with import pandas as pd). This imports the packages needed for the rest of the code.

View the first 10 rows of the data. The API appears to return the most recent entries first, so these rows come from the last day of our date range (2024-12-31).

data.head(10)
countryCode countryName elevation mosquitohabitatmapperAbdomenCloseupPhotoUrls mosquitohabitatmapperBreedingGroundEliminated mosquitohabitatmapperComments mosquitohabitatmapperDataSource mosquitohabitatmapperExtraData mosquitohabitatmapperGenus mosquitohabitatmapperGlobeTeams mosquitohabitatmapperLarvaFullBodyPhotoUrls mosquitohabitatmapperLarvaeCount mosquitohabitatmapperLastIdentifyStage mosquitohabitatmapperLocationAccuracyM mosquitohabitatmapperLocationMethod mosquitohabitatmapperMeasuredAt mosquitohabitatmapperMeasurementElevation mosquitohabitatmapperMeasurementLatitude mosquitohabitatmapperMeasurementLongitude mosquitohabitatmapperMosquitoAdults mosquitohabitatmapperMosquitoEggCount mosquitohabitatmapperMosquitoEggs mosquitohabitatmapperMosquitoHabitatMapperId mosquitohabitatmapperMosquitoPupae mosquitohabitatmapperSpecies mosquitohabitatmapperUserid mosquitohabitatmapperWaterSource mosquitohabitatmapperWaterSourcePhotoUrls mosquitohabitatmapperWaterSourceType organizationId organizationName protocol siteId siteName geometry
0 BRA Brazil 6.3 null false null GLOBE Observer App LarvaeVisibleNo null [COLUNSLZ] null 0 null 13 automatic 2024-12-31 17:16:00 0 -2.5617 -44.2657 null null null 46287 false null 137422629 ovitrap https://data.globe.gov/system/photos/2024/12/3... container: artificial 17459532 Brazil Citizen Science mosquito_habitat_mapper 371514 23MNT816168 POINT (-44.26597 -2.56197)
1 BRA Brazil 6.3 null false null GLOBE Observer App LarvaeVisibleNo null [COLUNSLZ] null 0 null 13 automatic 2024-12-31 17:20:00 0 -2.5617 -44.2657 null null null 46290 false null 137422629 ovitrap https://data.globe.gov/system/photos/2024/12/3... container: artificial 17459532 Brazil Citizen Science mosquito_habitat_mapper 371514 23MNT816168 POINT (-44.26597 -2.56197)
2 BRA Brazil 7.4 null true null GLOBE Observer App LarvaeVisibleNo null [COLUNSLZ] null 0 null 51 automatic 2024-12-31 22:32:00 0 -2.5163 -44.3023 null null null 46482 false null 137420190 cement, metal or plastic tank null container: artificial 17459532 Brazil Citizen Science mosquito_habitat_mapper 372864 23MNT775218 POINT (-44.30288 -2.51676)
3 BRA Brazil 20.6 null true null GLOBE Observer App LarvaeVisibleNo null [COLUNSLZ] null 0 null 66 automatic 2024-12-31 00:05:00 0 -2.8639 -44.0549 null null null 46203 false null 137419937 can or bottle null container: artificial 17459532 Brazil Citizen Science mosquito_habitat_mapper 373085 23MPS050834 POINT (-44.05526 -2.86396)
4 BRA Brazil 20.6 null true null GLOBE Observer App LarvaeVisibleNo null [COLUNSLZ] null 0 null 28 automatic 2024-12-31 00:23:00 0 -2.8639 -44.055 null null null 46223 false null 137419937 lake null still: lake/pond/swamp 17459532 Brazil Citizen Science mosquito_habitat_mapper 373085 23MPS050834 POINT (-44.05526 -2.86396)
5 BRA Brazil 20.6 null true null GLOBE Observer App LarvaeVisibleNo null [COLUNSLZ] null 0 null 19 automatic 2024-12-31 00:26:00 0 -2.8639 -44.055 null null null 46230 false null 137419937 plant husk (areca, coconut etc) null container: natural 17459532 Brazil Citizen Science mosquito_habitat_mapper 373085 23MPS050834 POINT (-44.05526 -2.86396)
6 BRA Brazil 20.6 null true null GLOBE Observer App LarvaeVisibleNo null [COLUNSLZ] null 0 null 98 automatic 2024-12-31 00:31:00 0 -2.8638 -44.0548 null null null 46234 false null 137419937 lake null still: lake/pond/swamp 17459532 Brazil Citizen Science mosquito_habitat_mapper 373085 23MPS050834 POINT (-44.05526 -2.86396)
7 BRA Brazil 20.6 null true null GLOBE Observer App LarvaeVisibleNo null [COLUNSLZ] null 0 null 98 automatic 2024-12-31 00:42:00 0 -2.8638 -44.0552 null null null 46261 false null 137419937 plant clumps (bamboo etc) null container: natural 17459532 Brazil Citizen Science mosquito_habitat_mapper 373085 23MPS050834 POINT (-44.05526 -2.86396)
8 BRA Brazil 20.6 null true null GLOBE Observer App LarvaeVisibleNo null [COLUNSLZ] null 0 null 100 automatic 2024-12-31 00:46:00 0 -2.8639 -44.0552 null null null 46264 false null 137419937 pond null still: lake/pond/swamp 17459532 Brazil Citizen Science mosquito_habitat_mapper 373085 23MPS050834 POINT (-44.05526 -2.86396)
9 BRA Brazil 20.6 null true null GLOBE Observer App LarvaeVisibleNo null [COLUNSLZ] null 0 null 98 automatic 2024-12-31 00:48:00 0 -2.8638 -44.0552 null null null 46266 false null 137419937 adult mosquito trap null container: artificial 17459532 Brazil Citizen Science mosquito_habitat_mapper 373085 23MPS050834 POINT (-44.05526 -2.86396)

Let's use the info() method to learn more about what the dataset contains.

data.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 43345 entries, 0 to 43344
Data columns (total 35 columns):
 #   Column                                         Non-Null Count  Dtype         
---  ------                                         --------------  -----         
 0   countryCode                                    43259 non-null  object        
 1   countryName                                    43259 non-null  object        
 2   elevation                                      43345 non-null  object        
 3   mosquitohabitatmapperAbdomenCloseupPhotoUrls   43345 non-null  object        
 4   mosquitohabitatmapperBreedingGroundEliminated  43345 non-null  object        
 5   mosquitohabitatmapperComments                  43345 non-null  object        
 6   mosquitohabitatmapperDataSource                43345 non-null  object        
 7   mosquitohabitatmapperExtraData                 43345 non-null  object        
 8   mosquitohabitatmapperGenus                     43345 non-null  object        
 9   mosquitohabitatmapperGlobeTeams                16353 non-null  object        
 10  mosquitohabitatmapperLarvaFullBodyPhotoUrls    43345 non-null  object        
 11  mosquitohabitatmapperLarvaeCount               43345 non-null  object        
 12  mosquitohabitatmapperLastIdentifyStage         43345 non-null  object        
 13  mosquitohabitatmapperLocationAccuracyM         25004 non-null  object        
 14  mosquitohabitatmapperLocationMethod            25004 non-null  object        
 15  mosquitohabitatmapperMeasuredAt                43345 non-null  datetime64[ms]
 16  mosquitohabitatmapperMeasurementElevation      43328 non-null  object        
 17  mosquitohabitatmapperMeasurementLatitude       43328 non-null  object        
 18  mosquitohabitatmapperMeasurementLongitude      43328 non-null  object        
 19  mosquitohabitatmapperMosquitoAdults            43345 non-null  object        
 20  mosquitohabitatmapperMosquitoEggCount          43345 non-null  object        
 21  mosquitohabitatmapperMosquitoEggs              43345 non-null  object        
 22  mosquitohabitatmapperMosquitoHabitatMapperId   43345 non-null  object        
 23  mosquitohabitatmapperMosquitoPupae             43345 non-null  object        
 24  mosquitohabitatmapperSpecies                   43345 non-null  object        
 25  mosquitohabitatmapperUserid                    43345 non-null  object        
 26  mosquitohabitatmapperWaterSource               43345 non-null  object        
 27  mosquitohabitatmapperWaterSourcePhotoUrls      43345 non-null  object        
 28  mosquitohabitatmapperWaterSourceType           43345 non-null  object        
 29  organizationId                                 43345 non-null  object        
 30  organizationName                               43259 non-null  object        
 31  protocol                                       43345 non-null  object        
 32  siteId                                         43345 non-null  object        
 33  siteName                                       43345 non-null  object        
 34  geometry                                       43345 non-null  geometry      
dtypes: datetime64[ms](1), geometry(1), object(33)
memory usage: 11.6+ MB

The output lists each column, how many non-null values it has (non-null means not missing), and its data type (dtype). A dtype of "object" usually means the column holds text. There is also one datetime column (mosquitohabitatmapperMeasuredAt), which will be useful for analyzing the data over time. The geometry column stores the location where each citizen scientist made the mosquito observation.

However, some columns are stored as "object" (text) when we want them stored as "float" (decimal) or "int" (whole number). Also, in many rows a missing value is written as the word "null", which pandas counts as a real value. We'll replace it with NaN ("Not a Number"), which pandas treats as missing.

Note: The column 'mosquitohabitatmapperLarvaeCount' stores single numbers as well as ranges such as 1-25. To keep the column numeric and consistent, we will replace each range with a single number: the midpoint of the range, rounded to a whole number, and 100 for "more than 100".
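To see why the word "null" matters, here is a small sketch with made-up values: pandas counts "null" as a real value until we replace it with NaN, and a text column of numbers can only be converted with pd.to_numeric (without extra options) once "null" is gone.

```python
import numpy as np
import pandas as pd

# Made-up rows in the style of the GLOBE download: values are text, and
# missing values are the word "null" rather than a real empty value
toy = pd.DataFrame({
    "Genus": ["Aedes", "null", "Culex", "null"],
    "MeasurementLatitude": ["-2.5617", "33.01349", "null", "44.962525"],
})

# pandas counts the text "null" as a value, so nothing looks missing yet
print(toy.notna().sum())

# Replace "null" with NaN so pandas treats it as missing
toy = toy.replace("null", np.nan)
print(toy.notna().sum())

# Now the latitude text can be converted to decimal numbers (float64)
toy["MeasurementLatitude"] = pd.to_numeric(toy["MeasurementLatitude"])
print(toy.dtypes)
```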
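The numbers used for the ranges in the cleaning code below (13, 38 and 76) are the midpoints of 1-25, 26-50 and 51-100, rounded to whole numbers. The sketch below shows that arithmetic with a hypothetical helper, larvae_count_to_number; the lesson itself uses a fixed replace() mapping, which is simpler when there are only a few known ranges.

```python
import math

def larvae_count_to_number(value):
    """Turn one LarvaeCount entry into a single number (hypothetical helper)."""
    if value == "null":
        return math.nan                  # missing value
    if value == "more than 100":
        return 100.0                     # open-ended range: use its lower bound
    if "-" in value:
        low, high = value.split("-")     # e.g. "26-50" -> "26", "50"
        # round() sends halves to the nearest even number, so 75.5 -> 76
        return float(round((int(low) + int(high)) / 2))
    return float(value)                  # already a plain count such as "0"

for entry in ["1-25", "26-50", "51-100", "more than 100", "0", "null"]:
    print(entry, "->", larvae_count_to_number(entry))
```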

# Remove the "mosquitohabitatmapper" prefix from all column names to make them easier to read
new_column_names = data.columns.str.replace("mosquitohabitatmapper", "")

# Make the first letter of each column name capitalized (for consistency) except for geometry
new_column_names = [name[0].upper() + name[1:] if name != 'geometry' else name for name in new_column_names]

data.columns = new_column_names
data.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 43345 entries, 0 to 43344
Data columns (total 35 columns):
 #   Column                    Non-Null Count  Dtype         
---  ------                    --------------  -----         
 0   CountryCode               43259 non-null  object        
 1   CountryName               43259 non-null  object        
 2   Elevation                 43345 non-null  object        
 3   AbdomenCloseupPhotoUrls   43345 non-null  object        
 4   BreedingGroundEliminated  43345 non-null  object        
 5   Comments                  43345 non-null  object        
 6   DataSource                43345 non-null  object        
 7   ExtraData                 43345 non-null  object        
 8   Genus                     43345 non-null  object        
 9   GlobeTeams                16353 non-null  object        
 10  LarvaFullBodyPhotoUrls    43345 non-null  object        
 11  LarvaeCount               43345 non-null  object        
 12  LastIdentifyStage         43345 non-null  object        
 13  LocationAccuracyM         25004 non-null  object        
 14  LocationMethod            25004 non-null  object        
 15  MeasuredAt                43345 non-null  datetime64[ms]
 16  MeasurementElevation      43328 non-null  object        
 17  MeasurementLatitude       43328 non-null  object        
 18  MeasurementLongitude      43328 non-null  object        
 19  MosquitoAdults            43345 non-null  object        
 20  MosquitoEggCount          43345 non-null  object        
 21  MosquitoEggs              43345 non-null  object        
 22  MosquitoHabitatMapperId   43345 non-null  object        
 23  MosquitoPupae             43345 non-null  object        
 24  Species                   43345 non-null  object        
 25  Userid                    43345 non-null  object        
 26  WaterSource               43345 non-null  object        
 27  WaterSourcePhotoUrls      43345 non-null  object        
 28  WaterSourceType           43345 non-null  object        
 29  OrganizationId            43345 non-null  object        
 30  OrganizationName          43259 non-null  object        
 31  Protocol                  43345 non-null  object        
 32  SiteId                    43345 non-null  object        
 33  SiteName                  43345 non-null  object        
 34  geometry                  43345 non-null  geometry      
dtypes: datetime64[ms](1), geometry(1), object(33)
memory usage: 11.6+ MB
# Add new column for date, not including the time
data['MeasuredDate'] = data['MeasuredAt'].dt.date
# The LarvaeCount column has grouped entries like '1-25'; replace each with the midpoint of its range ('more than 100' becomes 100)
data['LarvaeCountProcessed'] = data['LarvaeCount'].replace({
    '1-25': 13,
    '26-50': 38,
    '51-100': 76,
    'more than 100': 100,
    'null': np.nan
})

# If the entry is very long (more than 10 characters), it is likely an error, so we'll replace it with NaN
data.loc[data['LarvaeCountProcessed'].str.len() > 10, 'LarvaeCountProcessed'] = np.nan

numeric_cols = ['LarvaeCountProcessed', 'MeasurementLatitude', 'MeasurementLongitude']
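# errors='coerce' turns any leftover non-numeric text into NaN instead of raising an error
data[numeric_cols] = data[numeric_cols].apply(pd.to_numeric, errors='coerce')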

# Drop the MosquitoEggCount column because it is all null
data = data.drop(columns=['MosquitoEggCount'])

# Replace the word 'null' with NaN so it is stored as a missing value instead of a word
data = data.replace('null', np.nan)

data.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 43345 entries, 0 to 43344
Data columns (total 36 columns):
 #   Column                    Non-Null Count  Dtype         
---  ------                    --------------  -----         
 0   CountryCode               43259 non-null  object        
 1   CountryName               43259 non-null  object        
 2   Elevation                 43345 non-null  object        
 3   AbdomenCloseupPhotoUrls   886 non-null    object        
 4   BreedingGroundEliminated  43284 non-null  object        
 5   Comments                  4040 non-null   object        
 6   DataSource                43345 non-null  object        
 7   ExtraData                 12521 non-null  object        
 8   Genus                     4402 non-null   object        
 9   GlobeTeams                16353 non-null  object        
 10  LarvaFullBodyPhotoUrls    8697 non-null   object        
 11  LarvaeCount               24897 non-null  object        
 12  LastIdentifyStage         30148 non-null  object        
 13  LocationAccuracyM         13877 non-null  object        
 14  LocationMethod            18313 non-null  object        
 15  MeasuredAt                43345 non-null  datetime64[ms]
 16  MeasurementElevation      43328 non-null  object        
 17  MeasurementLatitude       43328 non-null  float64       
 18  MeasurementLongitude      43328 non-null  float64       
 19  MosquitoAdults            16994 non-null  object        
 20  MosquitoEggs              16999 non-null  object        
 21  MosquitoHabitatMapperId   43345 non-null  object        
 22  MosquitoPupae             41529 non-null  object        
 23  Species                   1170 non-null   object        
 24  Userid                    43345 non-null  object        
 25  WaterSource               43345 non-null  object        
 26  WaterSourcePhotoUrls      34556 non-null  object        
 27  WaterSourceType           43345 non-null  object        
 28  OrganizationId            43259 non-null  object        
 29  OrganizationName          43259 non-null  object        
 30  Protocol                  43345 non-null  object        
 31  SiteId                    43345 non-null  object        
 32  SiteName                  43345 non-null  object        
 33  geometry                  43345 non-null  geometry      
 34  MeasuredDate              43345 non-null  object        
 35  LarvaeCountProcessed      24894 non-null  float64       
dtypes: datetime64[ms](1), float64(3), geometry(1), object(31)
memory usage: 11.9+ MB
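The length check in the cleaning code above works even though, after replace(), the LarvaeCountProcessed column mixes numbers (from the range mapping) with text. A small sketch with made-up values shows why: .str.len() gives NaN for anything that is not text, and NaN > 10 is False, so only long text entries are flagged.

```python
import numpy as np
import pandas as pd

# Made-up values like LarvaeCountProcessed after replace(): a number from the
# range mapping, numeric text, a suspiciously long entry, and a missing value
larvae = pd.Series([13, "0", "5", "this is not a count", np.nan], dtype=object)

lengths = larvae.str.len()      # text gives its length; numbers and NaN give NaN
too_long = lengths > 10         # NaN > 10 is False, so numbers are never flagged
larvae.loc[too_long] = np.nan

cleaned = pd.to_numeric(larvae)
print(cleaned.tolist())
```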

The final data cleaning step we will perform for now is to remove points with invalid coordinates. Sometimes the latitude and longitude are reported incorrectly, placing the point in the middle of the ocean, so we want to remove any points that are not on land. To do that, let's load a publicly available file of country boundaries. The country boundaries (excluding Antarctica) were downloaded from ArcGIS Data and Maps.

countries = gpd.read_file('https://github.com/geo-di-lab/emerge-lessons/raw/refs/heads/main/docs/data/world_countries.zip')[['COUNTRY', 'geometry']].to_crs(4326)
countries.plot()

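We will use a spatial join (sjoin) to keep only the data points that "intersect" the countries layer, meaning they fall within or on the boundary of a country. With how="inner", points that match no country are dropped; we then remove the extra columns the join adds (index_right and COUNTRY).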

data = gpd.sjoin(data, countries, how="inner", predicate='intersects') \
          .drop(columns=['index_right', 'COUNTRY']) \
          .reset_index(drop=True)
data
CountryCode CountryName Elevation AbdomenCloseupPhotoUrls BreedingGroundEliminated Comments DataSource ExtraData Genus GlobeTeams LarvaFullBodyPhotoUrls LarvaeCount LastIdentifyStage LocationAccuracyM LocationMethod MeasuredAt MeasurementElevation MeasurementLatitude MeasurementLongitude MosquitoAdults MosquitoEggs MosquitoHabitatMapperId MosquitoPupae Species Userid WaterSource WaterSourcePhotoUrls WaterSourceType OrganizationId OrganizationName Protocol SiteId SiteName geometry MeasuredDate LarvaeCountProcessed
0 BRA Brazil 6.3 NaN false NaN GLOBE Observer App LarvaeVisibleNo NaN [COLUNSLZ] NaN 0 NaN 13 automatic 2024-12-31 17:16:00 0 -2.561700 -44.265700 NaN NaN 46287 false NaN 137422629 ovitrap https://data.globe.gov/system/photos/2024/12/3... container: artificial 17459532 Brazil Citizen Science mosquito_habitat_mapper 371514 23MNT816168 POINT (-44.26597 -2.56197) 2024-12-31 0.0
1 BRA Brazil 6.3 NaN false NaN GLOBE Observer App LarvaeVisibleNo NaN [COLUNSLZ] NaN 0 NaN 13 automatic 2024-12-31 17:20:00 0 -2.561700 -44.265700 NaN NaN 46290 false NaN 137422629 ovitrap https://data.globe.gov/system/photos/2024/12/3... container: artificial 17459532 Brazil Citizen Science mosquito_habitat_mapper 371514 23MNT816168 POINT (-44.26597 -2.56197) 2024-12-31 0.0
2 BRA Brazil 7.4 NaN true NaN GLOBE Observer App LarvaeVisibleNo NaN [COLUNSLZ] NaN 0 NaN 51 automatic 2024-12-31 22:32:00 0 -2.516300 -44.302300 NaN NaN 46482 false NaN 137420190 cement, metal or plastic tank NaN container: artificial 17459532 Brazil Citizen Science mosquito_habitat_mapper 372864 23MNT775218 POINT (-44.30288 -2.51676) 2024-12-31 0.0
3 BRA Brazil 20.6 NaN true NaN GLOBE Observer App LarvaeVisibleNo NaN [COLUNSLZ] NaN 0 NaN 66 automatic 2024-12-31 00:05:00 0 -2.863900 -44.054900 NaN NaN 46203 false NaN 137419937 can or bottle NaN container: artificial 17459532 Brazil Citizen Science mosquito_habitat_mapper 373085 23MPS050834 POINT (-44.05526 -2.86396) 2024-12-31 0.0
4 BRA Brazil 20.6 NaN true NaN GLOBE Observer App LarvaeVisibleNo NaN [COLUNSLZ] NaN 0 NaN 28 automatic 2024-12-31 00:23:00 0 -2.863900 -44.055000 NaN NaN 46223 false NaN 137419937 lake NaN still: lake/pond/swamp 17459532 Brazil Citizen Science mosquito_habitat_mapper 373085 23MPS050834 POINT (-44.05526 -2.86396) 2024-12-31 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
43007 ISR Israel 714.0 NaN false بركة اصطناعية GLOBE Observer App NaN NaN None NaN 1-25 identify-siphon-pecten None None 2018-01-08 12:14:00 714 33.010806 35.331531 false false 1340 true NaN 31782720 pond https://data.globe.gov/system/photos/2018/01/0... still: lake/pond/swamp 2567205 Horfish Elementary B School mosquito_habitat_mapper 104220 36SYB178549 POINT (35.33153 33.01081) 2018-01-08 13.0
43008 ISR Israel 755.0 NaN false بركة اصطناعية من مياة الامطار GLOBE Observer App NaN NaN None https://data.globe.gov/system/photos/2018/01/0... 1-25 identify-analyze-siphon None None 2018-01-07 10:53:00 755 33.013490 35.332672 false false 1339 true NaN 31782720 pond https://data.globe.gov/system/photos/2018/01/0... still: lake/pond/swamp 2567205 Horfish Elementary B School mosquito_habitat_mapper 100667 36SYB179552 POINT (35.33267 33.01349) 2018-01-07 13.0
43009 KOR South Korea 287.5 NaN false NaN GLOBE Observer App NaN NaN None NaN NaN NaN None None 2018-01-07 03:00:00 287.5 44.962525 -93.161031 NaN NaN 1338 NaN NaN 36916117 fountain or bird bath https://data.globe.gov/system/photos/2018/01/0... container: artificial 17479077 Republic of Korea Citizen Science mosquito_habitat_mapper 104189 15TVK873788 POINT (-93.16103 44.96252) 2018-01-07 NaN
43010 KOR South Korea 287.5 NaN false NaN GLOBE Observer App NaN NaN None NaN 0 identify-verify-larva None None 2018-01-05 03:00:00 287.5 44.962525 -93.161031 false false 1337 false NaN 36916117 well or cistern https://data.globe.gov/system/photos/2018/01/0... container: artificial 17479077 Republic of Korea Citizen Science mosquito_habitat_mapper 104189 15TVK873788 POINT (-93.16103 44.96252) 2018-01-05 0.0
43011 KOR South Korea 287.5 NaN true NaN GLOBE Observer App NaN NaN None NaN NaN NaN None None 2018-01-03 03:00:00 287.5 44.962525 -93.161031 NaN NaN 1336 NaN NaN 36916117 trash container https://data.globe.gov/system/photos/2018/01/0... container: artificial 17479077 Republic of Korea Citizen Science mosquito_habitat_mapper 104189 15TVK873788 POINT (-93.16103 44.96252) 2018-01-03 NaN

43012 rows × 36 columns
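The intersects test removes points that fall outside every country, but it does not check that a point lies in the country its record reports. For example, the last rows in the output above are labeled South Korea, yet their coordinates (44.96, -93.16) fall in Minnesota, USA. The sketch below uses made-up square polygons to show both behaviors, and shows that keeping the COUNTRY column (instead of dropping it) is one way to flag such records. This check is our suggestion, not one of the lesson's cleaning steps, and the ArcGIS country names may not always be spelled the same way as GLOBE's CountryName values.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Two made-up square "countries" sharing a border at x = 10 (not real boundaries)
countries_toy = gpd.GeoDataFrame(
    {"COUNTRY": ["Westland", "Eastland"]},
    geometry=[box(0, 0, 10, 10), box(10, 0, 20, 10)],
    crs=4326,
)

# Three made-up observations with the country each record reports
obs_toy = gpd.GeoDataFrame(
    {"SiteId": [1, 2, 3], "CountryName": ["Eastland", "Eastland", "Westland"]},
    geometry=[
        Point(5, 5),    # on land, inside Westland although the record says Eastland
        Point(15, 5),   # on land, inside the country it reports
        Point(50, 50),  # outside every polygon ("in the ocean")
    ],
    crs=4326,
)

# Inner join: keep only points that intersect some country polygon
joined = gpd.sjoin(obs_toy, countries_toy, how="inner", predicate="intersects")

# Flag records whose reported country differs from the polygon they fall in
mismatch = joined[joined["CountryName"] != joined["COUNTRY"]]

print(sorted(joined["SiteId"].tolist()))   # points kept by the join
print(mismatch["SiteId"].tolist())         # points in a different country than reported
```

One more detail of the intersects predicate: a point lying exactly on a shared border touches both polygons, so it would appear twice in the joined result.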

A copy of this dataset is on GitHub, and we will use this in future chapters. You do not need to download the dataset to your computer, as we will load the data directly using the link.

Explore the Data

Let’s make a simple graph showing the number of contributions submitted by citizen scientists over time!

data_by_day = data[['SiteId', 'MeasuredDate']].groupby(['MeasuredDate'], as_index=False).count()

plt.figure(figsize=(8,4))
plt.plot(data_by_day['MeasuredDate'], data_by_day['SiteId'])
plt.xlabel("Date")
plt.ylabel("Daily Contributions")
plt.title("Daily Mosquito Habitat Mapper Contributions Over Time (2018-2024)")
plt.show()
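One detail of the daily graph: grouping by MeasuredDate only produces rows for days with at least one entry, so a day with no contributions is skipped and the line is drawn straight across the gap instead of dropping to zero. A sketch with made-up timestamps shows the difference, using pandas resample() to create one bin per day (empty days count as 0) or per month (less noisy totals).

```python
import pandas as pd

# Made-up observation times standing in for the MeasuredAt column
toy = pd.DataFrame({
    "SiteId": [101, 102, 101, 103, 104],
    "MeasuredAt": pd.to_datetime([
        "2024-11-29 08:00", "2024-11-29 17:45", "2024-11-30 00:05",
        "2024-11-30 17:16", "2024-12-02 09:30",
    ]),
})

# Same idea as the notebook: group by calendar date and count rows
toy["MeasuredDate"] = toy["MeasuredAt"].dt.date
daily = toy.groupby("MeasuredDate").size()      # 2024-12-01 is simply absent

# resample("D") makes one bin per day, so days with no entries get a count of 0
daily_filled = toy.set_index("MeasuredAt").resample("D").size()

# Monthly totals; "MS" labels each bin by the first day of the month
monthly = toy.set_index("MeasuredAt").resample("MS").size()

print(daily_filled)
print(monthly)
```

With the real data, plt.plot(daily_filled.index, daily_filled.values) would draw the gaps as zeros.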

Let's plot the 10 countries with the most contributions:

data_by_country = data[['SiteId', 'CountryName']].groupby(['CountryName'], as_index=False).count().sort_values(by='SiteId').tail(10)

plt.barh(data_by_country['CountryName'], data_by_country['SiteId'])
plt.xlabel("Total Contributions")
plt.ylabel("Country")
plt.title("Top 10 Countries by Mosquito Habitat Mapper Contributions (2018-2024)")
plt.show()
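The country totals leave out records with no CountryName (the first info() output shows 86 such rows before the spatial join), because groupby() skips missing keys by default. A sketch with made-up records shows this, along with value_counts(), which gives the same totals already sorted from largest to smallest.

```python
import pandas as pd

# Made-up records; one has no CountryName
toy = pd.DataFrame({
    "SiteId": [1, 2, 3, 4, 5, 6],
    "CountryName": ["Brazil", "Brazil", "Thailand", None, "Brazil", "Thailand"],
})

# groupby(...).count() as in the notebook: the row with no CountryName is dropped
by_groupby = toy.groupby("CountryName")["SiteId"].count()

# value_counts() also drops missing values and sorts largest first
top = toy["CountryName"].value_counts().head(10)

print(by_groupby.to_dict())
print(top.to_dict())
```

For a horizontal bar chart with the largest bar on top, sort the counts in ascending order first, for example top.sort_values().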

Land Cover

Now, we'll look at the Land Cover dataset. As before, we'll get the data from the GLOBE API, using the same date range as the Mosquito Habitat Mapper dataset.

# Note: this replaces the mosquito data stored in the variable data above
data = gpd.read_file(f"https://api.globe.gov/search/v1/measurement/?protocols=land_covers&datefield=measuredDate&startdate=2018-01-01&enddate={end_date}&geojson=TRUE&sample=FALSE")

View the columns in the dataset, along with their non-null counts and data types:

data.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 52038 entries, 0 to 52037
Data columns (total 63 columns):
 #   Column                          Non-Null Count  Dtype         
---  ------                          --------------  -----         
 0   countryCode                     49631 non-null  object        
 1   countryName                     49631 non-null  object        
 2   elevation                       52038 non-null  object        
 3   landcoversDataSource            52038 non-null  object        
 4   landcoversDownwardCaption       52038 non-null  object        
 5   landcoversDownwardExtraData     52038 non-null  object        
 6   landcoversDownwardPhotoUrl      52038 non-null  object        
 7   landcoversDryGround             52038 non-null  object        
 8   landcoversEastCaption           52038 non-null  object        
 9   landcoversEastClassifications   52038 non-null  object        
 10  landcoversEastExtraData         52038 non-null  object        
 11  landcoversEastPhotoUrl          52038 non-null  object        
 12  landcoversFeature1Caption       52038 non-null  object        
 13  landcoversFeature1ExtraData     52038 non-null  object        
 14  landcoversFeature1PhotoUrl      52038 non-null  object        
 15  landcoversFeature2Caption       52038 non-null  object        
 16  landcoversFeature2ExtraData     52038 non-null  object        
 17  landcoversFeature2PhotoUrl      52038 non-null  object        
 18  landcoversFeature3Caption       52038 non-null  object        
 19  landcoversFeature3ExtraData     52038 non-null  object        
 20  landcoversFeature3PhotoUrl      52038 non-null  object        
 21  landcoversFeature4Caption       52038 non-null  object        
 22  landcoversFeature4ExtraData     52038 non-null  object        
 23  landcoversFeature4PhotoUrl      52038 non-null  object        
 24  landcoversFieldNotes            52038 non-null  object        
 25  landcoversGlobeTeams            23991 non-null  object        
 26  landcoversLandCoverId           52038 non-null  object        
 27  landcoversLeavesOnTrees         52038 non-null  object        
 28  landcoversLocationAccuracyM     52038 non-null  object        
 29  landcoversLocationMethod        52038 non-null  object        
 30  landcoversMeasuredAt            52038 non-null  datetime64[ms]
 31  landcoversMeasurementElevation  43920 non-null  object        
 32  landcoversMeasurementLatitude   43920 non-null  object        
 33  landcoversMeasurementLongitude  43920 non-null  object        
 34  landcoversMucCode               52038 non-null  object        
 35  landcoversMucDescription        52038 non-null  object        
 36  landcoversMucDetails            52038 non-null  object        
 37  landcoversMuddy                 52038 non-null  object        
 38  landcoversNorthCaption          52038 non-null  object        
 39  landcoversNorthClassifications  52038 non-null  object        
 40  landcoversNorthExtraData        52038 non-null  object        
 41  landcoversNorthPhotoUrl         52038 non-null  object        
 42  landcoversRainingSnowing        52038 non-null  object        
 43  landcoversSnowIce               52038 non-null  object        
 44  landcoversSouthCaption          52038 non-null  object        
 45  landcoversSouthClassifications  52038 non-null  object        
 46  landcoversSouthExtraData        52038 non-null  object        
 47  landcoversSouthPhotoUrl         52038 non-null  object        
 48  landcoversStandingWater         52038 non-null  object        
 49  landcoversUpwardCaption         52038 non-null  object        
 50  landcoversUpwardExtraData       52038 non-null  object        
 51  landcoversUpwardPhotoUrl        52038 non-null  object        
 52  landcoversUserid                52038 non-null  object        
 53  landcoversWestCaption           52038 non-null  object        
 54  landcoversWestClassifications   52038 non-null  object        
 55  landcoversWestExtraData         52038 non-null  object        
 56  landcoversWestPhotoUrl          52038 non-null  object        
 57  organizationId                  52038 non-null  object        
 58  organizationName                49632 non-null  object        
 59  protocol                        52038 non-null  object        
 60  siteId                          52038 non-null  object        
 61  siteName                        52038 non-null  object        
 62  geometry                        52038 non-null  geometry      
dtypes: datetime64[ms](1), geometry(1), object(61)
memory usage: 25.0+ MB
# Let's remove the "landcovers" in front of all the column names to make it easier to see the names
new_column_names = data.columns.str.replace("landcovers", "")

# Make the first letter of each column name capitalized (for consistency) except for geometry
new_column_names = [name[0].upper() + name[1:] if name != 'geometry' else name for name in new_column_names]

data.columns = new_column_names
data.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 52038 entries, 0 to 52037
Data columns (total 63 columns):
 #   Column                Non-Null Count  Dtype         
---  ------                --------------  -----         
 0   CountryCode           49631 non-null  object        
 1   CountryName           49631 non-null  object        
 2   Elevation             52038 non-null  object        
 3   DataSource            52038 non-null  object        
 4   DownwardCaption       52038 non-null  object        
 5   DownwardExtraData     52038 non-null  object        
 6   DownwardPhotoUrl      52038 non-null  object        
 7   DryGround             52038 non-null  object        
 8   EastCaption           52038 non-null  object        
 9   EastClassifications   52038 non-null  object        
 10  EastExtraData         52038 non-null  object        
 11  EastPhotoUrl          52038 non-null  object        
 12  Feature1Caption       52038 non-null  object        
 13  Feature1ExtraData     52038 non-null  object        
 14  Feature1PhotoUrl      52038 non-null  object        
 15  Feature2Caption       52038 non-null  object        
 16  Feature2ExtraData     52038 non-null  object        
 17  Feature2PhotoUrl      52038 non-null  object        
 18  Feature3Caption       52038 non-null  object        
 19  Feature3ExtraData     52038 non-null  object        
 20  Feature3PhotoUrl      52038 non-null  object        
 21  Feature4Caption       52038 non-null  object        
 22  Feature4ExtraData     52038 non-null  object        
 23  Feature4PhotoUrl      52038 non-null  object        
 24  FieldNotes            52038 non-null  object        
 25  GlobeTeams            23991 non-null  object        
 26  LandCoverId           52038 non-null  object        
 27  LeavesOnTrees         52038 non-null  object        
 28  LocationAccuracyM     52038 non-null  object        
 29  LocationMethod        52038 non-null  object        
 30  MeasuredAt            52038 non-null  datetime64[ms]
 31  MeasurementElevation  43920 non-null  object        
 32  MeasurementLatitude   43920 non-null  object        
 33  MeasurementLongitude  43920 non-null  object        
 34  MucCode               52038 non-null  object        
 35  MucDescription        52038 non-null  object        
 36  MucDetails            52038 non-null  object        
 37  Muddy                 52038 non-null  object        
 38  NorthCaption          52038 non-null  object        
 39  NorthClassifications  52038 non-null  object        
 40  NorthExtraData        52038 non-null  object        
 41  NorthPhotoUrl         52038 non-null  object        
 42  RainingSnowing        52038 non-null  object        
 43  SnowIce               52038 non-null  object        
 44  SouthCaption          52038 non-null  object        
 45  SouthClassifications  52038 non-null  object        
 46  SouthExtraData        52038 non-null  object        
 47  SouthPhotoUrl         52038 non-null  object        
 48  StandingWater         52038 non-null  object        
 49  UpwardCaption         52038 non-null  object        
 50  UpwardExtraData       52038 non-null  object        
 51  UpwardPhotoUrl        52038 non-null  object        
 52  Userid                52038 non-null  object        
 53  WestCaption           52038 non-null  object        
 54  WestClassifications   52038 non-null  object        
 55  WestExtraData         52038 non-null  object        
 56  WestPhotoUrl          52038 non-null  object        
 57  OrganizationId        52038 non-null  object        
 58  OrganizationName      49632 non-null  object        
 59  Protocol              52038 non-null  object        
 60  SiteId                52038 non-null  object        
 61  SiteName              52038 non-null  object        
 62  geometry              52038 non-null  geometry      
dtypes: datetime64[ms](1), geometry(1), object(61)
memory usage: 25.0+ MB
# Add new column for date
data['MeasuredDate'] = data['MeasuredAt'].dt.date

data.head(10)
CountryCode CountryName Elevation DataSource DownwardCaption DownwardExtraData DownwardPhotoUrl DryGround EastCaption EastClassifications EastExtraData EastPhotoUrl Feature1Caption Feature1ExtraData Feature1PhotoUrl Feature2Caption Feature2ExtraData Feature2PhotoUrl Feature3Caption Feature3ExtraData Feature3PhotoUrl Feature4Caption Feature4ExtraData Feature4PhotoUrl FieldNotes GlobeTeams LandCoverId LeavesOnTrees LocationAccuracyM LocationMethod MeasuredAt MeasurementElevation MeasurementLatitude MeasurementLongitude MucCode MucDescription MucDetails Muddy NorthCaption NorthClassifications NorthExtraData NorthPhotoUrl RainingSnowing SnowIce SouthCaption SouthClassifications SouthExtraData SouthPhotoUrl StandingWater UpwardCaption UpwardExtraData UpwardPhotoUrl Userid WestCaption WestClassifications WestExtraData WestPhotoUrl OrganizationId OrganizationName Protocol SiteId SiteName geometry MeasuredDate
0 ITA Italy 489.2 GLOBE Observer App null null https://data.globe.gov/system/photos/2024/12/3... true null null null https://data.globe.gov/system/photos/2024/12/3... Snag C. sativa, 40 cm, cl 2, #01 #04 #12 ((compassData.heading: 182, compassData.horizo... https://data.globe.gov/system/photos/2024/12/3... Log 70 cm, C. sativa, cl 2, #04 #01 #12 ((compassData.heading: 182, compassData.horizo... https://data.globe.gov/system/photos/2024/12/3... Stump of C. sativa 230 cm ((compassData.heading: null, compassData.horiz... https://data.globe.gov/system/photos/2024/12/3... null null null Old Coppice of Castanea sativa [Conservazione Natura Universita Tuscia] 78608 false 8 automatic 2024-12-31 15:07:00.000 492.4 42.1818 12.1825 null null false null null null https://data.globe.gov/system/photos/2024/12/3... false false null null null https://data.globe.gov/system/photos/2024/12/3... false null null https://data.globe.gov/system/photos/2024/12/3... 128342138 null null null https://data.globe.gov/system/photos/2024/12/3... 17453129 Italy Citizen Science land_covers 376869 33TTG673738 POINT (12.18229 42.18175) 2024-12-31
1 MDG Madagascar 1350.1 GLOBE Observer App null null https://data.globe.gov/system/photos/2024/12/3... true null 90% MUC 01 (n) [Trees, Closely Spaced, Evergre... null https://data.globe.gov/system/photos/2024/12/3... null null null null null null null null null null null null Arbres plantés par l'équipe GLOBE avec la comm... [Africa 2024 Regional Meeting, Coordinating Of... 77695 true 10 automatic 2024-12-31 11:29:00.000 1340.6 -18.7576 47.5615 M01 Trees, Closely Spaced, Evergreen - Needle Leaved n false null 90% MUC 01 (n) [Trees, Closely Spaced, Evergre... null https://data.globe.gov/system/photos/2024/12/3... false false null 90% MUC 01 (n) [Trees, Closely Spaced, Evergre... null https://data.globe.gov/system/photos/2024/12/3... false null null https://data.globe.gov/system/photos/2024/12/3... 2538037 null 90% MUC 01 (n) [Trees, Closely Spaced, Evergre... null https://data.globe.gov/system/photos/2024/12/3... 6508873 Madagascar GLOBE v-School land_covers 373647 38KQE700240 POINT (47.56096 -18.75807) 2024-12-31
2 MDG Madagascar 1324.7 GLOBE Observer App null null https://data.globe.gov/system/photos/2024/12/3... true null 60% MUC 93 [Urban, Roads and Parking] null https://data.globe.gov/system/photos/2024/12/3... null null null null null null null null null null null null (none) [Africa 2024 Regional Meeting, Coordinating Of... 77691 true 10 automatic 2024-12-31 12:07:00.000 1324.8 -18.7944 47.5799 M93 Urban, Roads and Parking false null 60% MUC 93 [Urban, Roads and Parking] null https://data.globe.gov/system/photos/2024/12/3... false false null 60% MUC 93 [Urban, Roads and Parking] null https://data.globe.gov/system/photos/2024/12/3... false null null https://data.globe.gov/system/photos/2024/12/3... 2538037 null 60% MUC 93 [Urban, Roads and Parking] null https://data.globe.gov/system/photos/2024/12/3... 6508873 Madagascar GLOBE v-School land_covers 373642 38KQE719199 POINT (47.57953 -18.79484) 2024-12-31
3 USA United States 182.2 GLOBE Data Entry Site Definition null null null null null null null null null null null null null null null null null null null null null None 77689 null null null 2024-12-31 16:12:03.111 None None None M4 Herbaceous Vegetation null null null null null null null null null null null null null null null null null null null null null 52107 Crestwood High School land_covers 373628 Hillcrest Elementary Trail POINT (-83.27705 42.3465) 2024-12-31
4 GRC Greece 3.0 GLOBE Observer App null null https://data.globe.gov/system/photos/2024/12/3... true null 90% MUC 91 [Urban, Residential Property]; 10% ... null https://data.globe.gov/system/photos/2024/12/3... null null null null null null null null null null null null (none) [tinycore lab, tinycorelab] 77683 false null manual 2024-12-31 11:45:00.000 3.2 37.939 23.697 M91 Urban, Residential Property false null 90% MUC 91 [Urban, Residential Property]; 10% ... null https://data.globe.gov/system/photos/2024/12/3... false false null 20% MUC 91 [Urban, Residential Property]; 80% ... null https://data.globe.gov/system/photos/2024/12/3... false null null https://data.globe.gov/system/photos/2024/12/3... 65209921 null 10% MUC 91 [Urban, Residential Property]; 90% ... null https://data.globe.gov/system/photos/2024/12/3... 6508393 Greece GLOBE v-School land_covers 373613 34SGH370024 POINT (23.6969 37.9383) 2024-12-31
5 GRC Greece 15.5 GLOBE Observer App null null https://data.globe.gov/system/photos/2024/12/3... true null 40% MUC 94 [Urban, Other]; 60% MUC 43 [Herbace... null https://data.globe.gov/system/photos/2024/12/3... Park ((compassData.heading: 137, compassData.horizo... https://data.globe.gov/system/photos/2024/12/3... null null null null null null null null null (none) [tinycore lab, tinycorelab] 77681 false null manual 2024-12-31 11:01:00.000 16.7 37.9374 23.7051 M43 Herbaceous/Grassland, Short Grass true null 10% MUC 94 [Urban, Other]; 90% MUC 43 [Herbace... null https://data.globe.gov/system/photos/2024/12/3... false false null 40% MUC 94 [Urban, Other]; 60% MUC 43 [Herbace... null https://data.globe.gov/system/photos/2024/12/3... false null null https://data.globe.gov/system/photos/2024/12/3... 65209921 null 10% MUC 94 [Urban, Other]; 90% MUC 43 [Herbace... null https://data.globe.gov/system/photos/2024/12/3... 6508393 Greece GLOBE v-School land_covers 373609 34SGH377023 POINT (23.70482 37.93722) 2024-12-31
6 MDG Madagascar 1303.5 GLOBE Observer App null null https://data.globe.gov/system/photos/2024/12/3... true null null null https://data.globe.gov/system/photos/2024/12/3... null null null null null null null null null null null null (none) [Africa 2024 Regional Meeting, Coordinating Of... 77692 true 7 automatic 2024-12-31 11:53:00.000 1337.6 -18.7639 47.5584 M01 Trees, Closely Spaced, Evergreen - Needle Leaved n false null 50% MUC 01 (n) [Trees, Closely Spaced, Evergre... null https://data.globe.gov/system/photos/2024/12/3... false false null null null https://data.globe.gov/system/photos/2024/12/3... false null null https://data.globe.gov/system/photos/2024/12/3... 2538037 null null null https://data.globe.gov/system/photos/2024/12/3... 6508873 Madagascar GLOBE v-School land_covers 373644 38KQE697233 POINT (47.55822 -18.76443) 2024-12-31
7 ITA Italy 440.2 GLOBE Observer App null null https://data.globe.gov/system/photos/2024/12/3... true null null null https://data.globe.gov/system/photos/2024/12/3... snag F. sylvatica 140 cm, cl 1, #4 #12 ((compassData.heading: 174, compassData.horizo... https://data.globe.gov/system/photos/2024/12/3... Log F. sylvatica90 cm, cl2, #1 #4 #12 #14 ((compassData.heading: 174, compassData.horizo... https://data.globe.gov/system/photos/2024/12/3... null null null null null null Old high forest of Fagus sylvatica [Conservazione Natura Universita Tuscia] 78607 false 12 automatic 2024-12-31 14:07:00.000 447.3 42.1797 12.1754 null null false null null null https://data.globe.gov/system/photos/2024/12/3... false false null null null https://data.globe.gov/system/photos/2024/12/3... false null null https://data.globe.gov/system/photos/2024/12/3... 128342138 null null null https://data.globe.gov/system/photos/2024/12/3... 17453129 Italy Citizen Science land_covers 373629 33TTG667735 POINT (12.17516 42.17887) 2024-12-31
8 MDG Madagascar 1461.9 GLOBE Observer App null null https://data.globe.gov/system/photos/2024/12/3... true null 80% MUC 94 [Urban, Other] null https://data.globe.gov/system/photos/2024/12/3... null null null null null null null null null null null null (none) [Africa 2024 Regional Meeting, Coordinating Of... 77686 false 6 automatic 2024-12-31 10:57:00.000 1463.8 -18.7605 47.563 M94 Urban, Other false null 80% MUC 94 [Urban, Other] null https://data.globe.gov/system/photos/2024/12/3... false false null 80% MUC 94 [Urban, Other] null https://data.globe.gov/system/photos/2024/12/3... false null null https://data.globe.gov/system/photos/2024/12/3... 2538037 null 80% MUC 94 [Urban, Other] null https://data.globe.gov/system/photos/2024/12/3... 6508873 Madagascar GLOBE v-School land_covers 373618 38KQE702237 POINT (47.5629 -18.76075) 2024-12-31
9 MDG Madagascar 1450.3 GLOBE Observer App null null https://data.globe.gov/system/photos/2024/12/3... true null 50% MUC 94 [Urban, Other] null https://data.globe.gov/system/photos/2024/12/3... null null null null null null null null null null null null (none) [Africa 2024 Regional Meeting, Coordinating Of... 77696 false 8 automatic 2024-12-31 11:13:00.000 1455.8 -18.7607 47.5623 M94 Urban, Other false null 50% MUC 94 [Urban, Other] null https://data.globe.gov/system/photos/2024/12/3... false false null 50% MUC 94 [Urban, Other] null https://data.globe.gov/system/photos/2024/12/3... false null null https://data.globe.gov/system/photos/2024/12/3... 2538037 null 50% MUC 94 [Urban, Other] null https://data.globe.gov/system/photos/2024/12/3... 6508873 Madagascar GLOBE v-School land_covers 373617 38KQE701237 POINT (47.56195 -18.76076) 2024-12-31

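Replace the placeholder strings 'null' and '(none)' with proper missing values (NaN) so that pandas counts them as missing: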

data['FieldNotes'] = data['FieldNotes'].replace('(none)', np.nan)
data = data.replace('null', np.nan)
data.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 52038 entries, 0 to 52037
Data columns (total 64 columns):
 #   Column                Non-Null Count  Dtype         
---  ------                --------------  -----         
 0   CountryCode           49631 non-null  object        
 1   CountryName           49631 non-null  object        
 2   Elevation             52038 non-null  object        
 3   DataSource            52038 non-null  object        
 4   DownwardCaption       22648 non-null  object        
 5   DownwardExtraData     22644 non-null  object        
 6   DownwardPhotoUrl      40040 non-null  object        
 7   DryGround             43973 non-null  object        
 8   EastCaption           23889 non-null  object        
 9   EastClassifications   14465 non-null  object        
 10  EastExtraData         23884 non-null  object        
 11  EastPhotoUrl          42897 non-null  object        
 12  Feature1Caption       341 non-null    object        
 13  Feature1ExtraData     1389 non-null   object        
 14  Feature1PhotoUrl      1390 non-null   object        
 15  Feature2Caption       158 non-null    object        
 16  Feature2ExtraData     694 non-null    object        
 17  Feature2PhotoUrl      694 non-null    object        
 18  Feature3Caption       98 non-null     object        
 19  Feature3ExtraData     463 non-null    object        
 20  Feature3PhotoUrl      463 non-null    object        
 21  Feature4Caption       78 non-null     object        
 22  Feature4ExtraData     347 non-null    object        
 23  Feature4PhotoUrl      347 non-null    object        
 24  FieldNotes            21884 non-null  object        
 25  GlobeTeams            23991 non-null  object        
 26  LandCoverId           52038 non-null  object        
 27  LeavesOnTrees         43973 non-null  object        
 28  LocationAccuracyM     36779 non-null  object        
 29  LocationMethod        43973 non-null  object        
 30  MeasuredAt            52038 non-null  datetime64[ms]
 31  MeasurementElevation  43920 non-null  object        
 32  MeasurementLatitude   43920 non-null  object        
 33  MeasurementLongitude  43920 non-null  object        
 34  MucCode               22725 non-null  object        
 35  MucDescription        22722 non-null  object        
 36  MucDetails            43973 non-null  object        
 37  Muddy                 43973 non-null  object        
 38  NorthCaption          24182 non-null  object        
 39  NorthClassifications  14498 non-null  object        
 40  NorthExtraData        24182 non-null  object        
 41  NorthPhotoUrl         43276 non-null  object        
 42  RainingSnowing        43973 non-null  object        
 43  SnowIce               43973 non-null  object        
 44  SouthCaption          23750 non-null  object        
 45  SouthClassifications  14463 non-null  object        
 46  SouthExtraData        23748 non-null  object        
 47  SouthPhotoUrl         42782 non-null  object        
 48  StandingWater         43973 non-null  object        
 49  UpwardCaption         23150 non-null  object        
 50  UpwardExtraData       23143 non-null  object        
 51  UpwardPhotoUrl        40807 non-null  object        
 52  Userid                43973 non-null  object        
 53  WestCaption           23718 non-null  object        
 54  WestClassifications   14438 non-null  object        
 55  WestExtraData         23728 non-null  object        
 56  WestPhotoUrl          42689 non-null  object        
 57  OrganizationId        49632 non-null  object        
 58  OrganizationName      49632 non-null  object        
 59  Protocol              52038 non-null  object        
 60  SiteId                52038 non-null  object        
 61  SiteName              52038 non-null  object        
 62  geometry              52038 non-null  geometry      
 63  MeasuredDate          52038 non-null  object        
dtypes: datetime64[ms](1), geometry(1), object(62)
memory usage: 25.4+ MB
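Comparing the two data.info() outputs shows why this step matters: before the replacement almost every column reported 52038 non-null values, because the text 'null' is a real string as far as pandas is concerned. A minimal sketch on a made-up two-column frame:

```python
import numpy as np
import pandas as pd

# Toy frame: missing values arrive as the literal string 'null',
# and FieldNotes can also hold the placeholder '(none)'
raw = pd.DataFrame({
    "FieldNotes": ["Old coppice", "(none)", "null"],
    "MucCode": ["M12", "null", "M94"],
})

# The placeholder strings are not treated as missing yet
before = raw.isna().sum().to_dict()

cleaned = raw.copy()
cleaned["FieldNotes"] = cleaned["FieldNotes"].replace("(none)", np.nan)
cleaned = cleaned.replace("null", np.nan)

# Now pandas counts them as missing values
after = cleaned.isna().sum().to_dict()

print(before)
print(after)
```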

Similar to the mosquito dataset, we will filter out any points that do not fall within country boundaries. If the point is over water, then we assume the coordinates were incorrectly reported and remove it from the final dataset.

data = gpd.sjoin(data, countries, how="inner", predicate='intersects') \
          .drop(columns=['index_right', 'COUNTRY']) \
          .reset_index(drop=True)
data
CountryCode CountryName Elevation DataSource DownwardCaption DownwardExtraData DownwardPhotoUrl DryGround EastCaption EastClassifications EastExtraData EastPhotoUrl Feature1Caption Feature1ExtraData Feature1PhotoUrl Feature2Caption Feature2ExtraData Feature2PhotoUrl Feature3Caption Feature3ExtraData Feature3PhotoUrl Feature4Caption Feature4ExtraData Feature4PhotoUrl FieldNotes GlobeTeams LandCoverId LeavesOnTrees LocationAccuracyM LocationMethod MeasuredAt MeasurementElevation MeasurementLatitude MeasurementLongitude MucCode MucDescription MucDetails Muddy NorthCaption NorthClassifications NorthExtraData NorthPhotoUrl RainingSnowing SnowIce SouthCaption SouthClassifications SouthExtraData SouthPhotoUrl StandingWater UpwardCaption UpwardExtraData UpwardPhotoUrl Userid WestCaption WestClassifications WestExtraData WestPhotoUrl OrganizationId OrganizationName Protocol SiteId SiteName geometry MeasuredDate
0 ITA Italy 489.2 GLOBE Observer App NaN NaN https://data.globe.gov/system/photos/2024/12/3... true NaN NaN NaN https://data.globe.gov/system/photos/2024/12/3... Snag C. sativa, 40 cm, cl 2, #01 #04 #12 ((compassData.heading: 182, compassData.horizo... https://data.globe.gov/system/photos/2024/12/3... Log 70 cm, C. sativa, cl 2, #04 #01 #12 ((compassData.heading: 182, compassData.horizo... https://data.globe.gov/system/photos/2024/12/3... Stump of C. sativa 230 cm ((compassData.heading: null, compassData.horiz... https://data.globe.gov/system/photos/2024/12/3... NaN NaN NaN Old Coppice of Castanea sativa [Conservazione Natura Universita Tuscia] 78608 false 8 automatic 2024-12-31 15:07:00.000 492.4 42.1818 12.1825 NaN NaN false NaN NaN NaN https://data.globe.gov/system/photos/2024/12/3... false false NaN NaN NaN https://data.globe.gov/system/photos/2024/12/3... false NaN NaN https://data.globe.gov/system/photos/2024/12/3... 128342138 NaN NaN NaN https://data.globe.gov/system/photos/2024/12/3... 17453129 Italy Citizen Science land_covers 376869 33TTG673738 POINT (12.18229 42.18175) 2024-12-31
1 MDG Madagascar 1350.1 GLOBE Observer App NaN NaN https://data.globe.gov/system/photos/2024/12/3... true NaN 90% MUC 01 (n) [Trees, Closely Spaced, Evergre... NaN https://data.globe.gov/system/photos/2024/12/3... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Arbres plantés par l'équipe GLOBE avec la comm... [Africa 2024 Regional Meeting, Coordinating Of... 77695 true 10 automatic 2024-12-31 11:29:00.000 1340.6 -18.7576 47.5615 M01 Trees, Closely Spaced, Evergreen - Needle Leaved n false NaN 90% MUC 01 (n) [Trees, Closely Spaced, Evergre... NaN https://data.globe.gov/system/photos/2024/12/3... false false NaN 90% MUC 01 (n) [Trees, Closely Spaced, Evergre... NaN https://data.globe.gov/system/photos/2024/12/3... false NaN NaN https://data.globe.gov/system/photos/2024/12/3... 2538037 NaN 90% MUC 01 (n) [Trees, Closely Spaced, Evergre... NaN https://data.globe.gov/system/photos/2024/12/3... 6508873 Madagascar GLOBE v-School land_covers 373647 38KQE700240 POINT (47.56096 -18.75807) 2024-12-31
2 MDG Madagascar 1324.7 GLOBE Observer App NaN NaN https://data.globe.gov/system/photos/2024/12/3... true NaN 60% MUC 93 [Urban, Roads and Parking] NaN https://data.globe.gov/system/photos/2024/12/3... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN [Africa 2024 Regional Meeting, Coordinating Of... 77691 true 10 automatic 2024-12-31 12:07:00.000 1324.8 -18.7944 47.5799 M93 Urban, Roads and Parking false NaN 60% MUC 93 [Urban, Roads and Parking] NaN https://data.globe.gov/system/photos/2024/12/3... false false NaN 60% MUC 93 [Urban, Roads and Parking] NaN https://data.globe.gov/system/photos/2024/12/3... false NaN NaN https://data.globe.gov/system/photos/2024/12/3... 2538037 NaN 60% MUC 93 [Urban, Roads and Parking] NaN https://data.globe.gov/system/photos/2024/12/3... 6508873 Madagascar GLOBE v-School land_covers 373642 38KQE719199 POINT (47.57953 -18.79484) 2024-12-31
3 USA United States 182.2 GLOBE Data Entry Site Definition NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN None 77689 NaN NaN NaN 2024-12-31 16:12:03.111 None None None M4 Herbaceous Vegetation NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 52107 Crestwood High School land_covers 373628 Hillcrest Elementary Trail POINT (-83.27705 42.3465) 2024-12-31
4 GRC Greece 3.0 GLOBE Observer App NaN NaN https://data.globe.gov/system/photos/2024/12/3... true NaN 90% MUC 91 [Urban, Residential Property]; 10% ... NaN https://data.globe.gov/system/photos/2024/12/3... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN [tinycore lab, tinycorelab] 77683 false NaN manual 2024-12-31 11:45:00.000 3.2 37.939 23.697 M91 Urban, Residential Property false NaN 90% MUC 91 [Urban, Residential Property]; 10% ... NaN https://data.globe.gov/system/photos/2024/12/3... false false NaN 20% MUC 91 [Urban, Residential Property]; 80% ... NaN https://data.globe.gov/system/photos/2024/12/3... false NaN NaN https://data.globe.gov/system/photos/2024/12/3... 65209921 NaN 10% MUC 91 [Urban, Residential Property]; 90% ... NaN https://data.globe.gov/system/photos/2024/12/3... 6508393 Greece GLOBE v-School land_covers 373613 34SGH370024 POINT (23.6969 37.9383) 2024-12-31
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
50778 OMN Oman 584.0 GLOBE Data Entry Site Definition NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN None 20324 NaN NaN NaN 2018-01-16 12:52:15.524 None None None M823 Cultivated Land, Non-Agriculture, Cemeteries NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 23062994 Elayet feda basic school land_covers 104390 yanqul park POINT (56.43 23.43) 2018-01-16
50779 HRV Croatia 153.0 GLOBE Data Entry Site Definition NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN None 20326 NaN NaN NaN 2018-01-16 20:19:21.378 None None None M1211 Woodland, Mainly Deciduous, Drought-Deciduous,... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 177974 II. osnovna skola Cakovec land_covers 104405 Žabnik- Sv. Martin POINT (16.377 46.528) 2018-01-16
50780 OMN Oman 27.0 GLOBE Data Entry Site Definition NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN None 20321 NaN NaN NaN 2018-01-03 21:57:54.342 None None None M812 Cultivated Land, Agriculture, Orchard and Hort... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 23061684 Al duqom basic school land_covers 101771 ALduqm shcool POINT (57.37 19.37) 2018-01-03
50781 OMN Oman 27.0 GLOBE Data Entry Site Definition NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN None 20322 NaN NaN NaN 2018-01-03 22:00:19.250 None None None M812 Cultivated Land, Agriculture, Orchard and Hort... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 23061684 Al duqom basic school land_covers 101771 ALduqm shcool POINT (57.37 19.37) 2018-01-03
50782 OMN Oman 0.0 GLOBE Data Entry Site Definition NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN None 20320 NaN NaN NaN 2018-01-01 08:47:12.483 None None None M8 Cultivated Land NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 36736232 Madira Pasic School land_covers 104036 MADERA BASIC SCHOOL POINT (58.167 21.0717) 2018-01-01

50783 rows × 64 columns
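With how="inner" and predicate='intersects', gpd.sjoin keeps an observation only if its point touches at least one country polygon, which is why the row count drops from 52,038 to 50,783. Geopandas delegates the geometric test to Shapely. To show the idea without installing anything, here is a simplified stand-in using the classic ray-casting point-in-polygon test on a made-up square "country"; the outline and observations are hypothetical. One difference to keep in mind: ray casting handles points lying exactly on an edge inconsistently, whereas 'intersects' counts boundary points as matches.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count the polygon edges crossed by a horizontal ray
    going right from (x, y); an odd count means the point is inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical 'country' outline as (lon, lat) pairs; not a real border
land = [(10.0, 40.0), (15.0, 40.0), (15.0, 45.0), (10.0, 45.0)]

observations = [
    {"SiteId": 1, "lon": 12.18, "lat": 42.18},  # inside the square
    {"SiteId": 2, "lon": 20.00, "lat": 42.00},  # east of it, i.e. 'over water'
    {"SiteId": 3, "lon": 14.22, "lat": 41.76},  # inside the square
]

# Keep only observations on land, mimicking an inner spatial join
kept = [obs for obs in observations
        if point_in_polygon(obs["lon"], obs["lat"], land)]

print([obs["SiteId"] for obs in kept])
```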

Like the mosquito data, a copy of this dataset is on GitHub, and we will use this in future chapters. You do not need to download the dataset to your computer, as we will load the data directly using the link.
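The file format of the GitHub copy is not stated here. If it is a CSV (an assumption), keep in mind that a CSV stores everything as text: MeasuredAt comes back as a string unless you ask pandas to parse it, and geometry comes back as WKT text such as "POINT (12.18229 42.18175)". A minimal sketch, with an in-memory CSV standing in for the link and two rows copied from the output above:

```python
import io
import pandas as pd

# Stand-in for the GitHub link: an in-memory CSV with two sample rows
csv_text = io.StringIO(
    "SiteId,MeasuredAt,geometry\n"
    "376869,2024-12-31 15:07:00,POINT (12.18229 42.18175)\n"
    "373647,2024-12-31 11:29:00,POINT (47.56096 -18.75807)\n"
)

# Without parse_dates, MeasuredAt would be read as plain text
land_cover_csv = pd.read_csv(csv_text, parse_dates=["MeasuredAt"])

print(land_cover_csv.dtypes)

# geometry is still WKT text at this point; with geopandas installed it can be
# converted back to real geometries, for example:
# land_cover_csv["geometry"] = gpd.GeoSeries.from_wkt(land_cover_csv["geometry"])
# land_cover_gdf = gpd.GeoDataFrame(land_cover_csv, geometry="geometry")
```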

Explore the Data

data_by_day = data[['SiteId', 'MeasuredDate']].groupby(['MeasuredDate'], as_index=False).count()

plt.figure(figsize=(8,4))
plt.plot(data_by_day['MeasuredDate'], data_by_day['SiteId'])
plt.xlabel("Date")
plt.ylabel("Daily Contribution")
plt.title("Daily Land Cover Contributions Over Time (2018-2024)")
plt.show()
[Figure: Daily Land Cover Contributions Over Time; x-axis: Date, y-axis: Daily Contribution]
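The daily totals come from groupby(...).count(), which counts the non-null SiteId values on each date. Because SiteId has no missing values in this dataset (data.info() reports it as non-null for every row), this equals the number of observations per day. On data with gaps the two differ; a small made-up example contrasting count() with size(), which counts rows regardless of missing values:

```python
import pandas as pd

# Made-up observations; one SiteId is missing on purpose
obs = pd.DataFrame({
    "SiteId": [101, 102, None, 104, 105],
    "MeasuredDate": ["2024-12-30", "2024-12-30", "2024-12-30",
                     "2024-12-31", "2024-12-31"],
})

# Same pattern as the notebook: non-null SiteId values per day
by_count = obs[["SiteId", "MeasuredDate"]].groupby(["MeasuredDate"], as_index=False).count()

# Rows per day, whether or not SiteId is missing
by_size = obs.groupby("MeasuredDate").size()

print(by_count)
print(by_size)
```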
data_by_country = data[['SiteId', 'CountryName']].groupby(['CountryName'], as_index=False).count().sort_values(by='SiteId').tail(10)

plt.barh(data_by_country['CountryName'], data_by_country['SiteId'])
plt.xlabel("Total Contributions")
plt.ylabel("Country")
plt.title("Land Cover Contributions by Country (2018-2024)")
plt.show()
[Figure: horizontal bar chart of the 10 countries with the most contributions; x-axis: Total Contributions, y-axis: Country]
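The top-10 table is built by counting per country, sorting in ascending order, and keeping the last ten rows. The ascending order matters for plt.barh, which draws the first row at the bottom, so the largest country ends up on top. Series.value_counts() is a shorter way to get the same counts (it sorts in descending order and drops missing country names), after which you reverse it for the bar chart. A sketch on made-up labels, keeping the top 2 instead of 10:

```python
import pandas as pd

# Made-up country labels; None stands in for an observation without a country
country_names = pd.Series(
    ["Italy", "Madagascar", "Madagascar", "Greece", None, "Madagascar", "Italy"],
    name="CountryName",
)

# Counts per country, largest first; missing values are excluded by default
top = country_names.value_counts().head(2)   # the notebook keeps 10

# Reverse so the largest bar is drawn at the top of a horizontal bar chart
for_barh = top.iloc[::-1]

print(top)
print(for_barh)
```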

A powerful feature of the Land Cover dataset is that users submit photos of each site. Let’s view some of these images.

# Get the first observation where all photos were submitted
entry = data.dropna(subset=['DownwardPhotoUrl', 'EastPhotoUrl', 'NorthPhotoUrl', 'SouthPhotoUrl', 'WestPhotoUrl', 'UpwardPhotoUrl',
                            'Feature1PhotoUrl', 'Feature2PhotoUrl', 'Feature3PhotoUrl', 'Feature4PhotoUrl']).head(1)

url_list = []
col_list = []

for col in entry.columns:
    if 'Url' in col:
        print(f'{col}: {entry[col].values[0]}')
        url_list.append(entry[col].values[0])
        col_list.append(col)

display(entry)
DownwardPhotoUrl: https://data.globe.gov/system/photos/2024/12/31/4325699/original.jpg
EastPhotoUrl: https://data.globe.gov/system/photos/2024/12/31/4325695/original.jpg
Feature1PhotoUrl: https://data.globe.gov/system/photos/2024/12/31/4325700/original.jpg
Feature2PhotoUrl: https://data.globe.gov/system/photos/2024/12/31/4325701/original.jpg
Feature3PhotoUrl: https://data.globe.gov/system/photos/2024/12/31/4325702/original.jpg
Feature4PhotoUrl: https://data.globe.gov/system/photos/2024/12/31/4325703/original.jpg
NorthPhotoUrl: https://data.globe.gov/system/photos/2024/12/31/4325694/original.jpg
SouthPhotoUrl: https://data.globe.gov/system/photos/2024/12/31/4325696/original.jpg
UpwardPhotoUrl: https://data.globe.gov/system/photos/2024/12/31/4325698/original.jpg
WestPhotoUrl: https://data.globe.gov/system/photos/2024/12/31/4325697/original.jpg
CountryCode CountryName Elevation DataSource DownwardCaption DownwardExtraData DownwardPhotoUrl DryGround EastCaption EastClassifications EastExtraData EastPhotoUrl Feature1Caption Feature1ExtraData Feature1PhotoUrl Feature2Caption Feature2ExtraData Feature2PhotoUrl Feature3Caption Feature3ExtraData Feature3PhotoUrl Feature4Caption Feature4ExtraData Feature4PhotoUrl FieldNotes GlobeTeams LandCoverId LeavesOnTrees LocationAccuracyM LocationMethod MeasuredAt MeasurementElevation MeasurementLatitude MeasurementLongitude MucCode MucDescription MucDetails Muddy NorthCaption NorthClassifications NorthExtraData NorthPhotoUrl RainingSnowing SnowIce SouthCaption SouthClassifications SouthExtraData SouthPhotoUrl StandingWater UpwardCaption UpwardExtraData UpwardPhotoUrl Userid WestCaption WestClassifications WestExtraData WestPhotoUrl OrganizationId OrganizationName Protocol SiteId SiteName geometry MeasuredDate
11 ITA Italy 1076.2 GLOBE Observer App NaN NaN https://data.globe.gov/system/photos/2024/12/3... false NaN 90% MUC 12 (b) [Trees, Loosely Spaced, Deciduo... NaN https://data.globe.gov/system/photos/2024/12/3... NaN ((compassData.heading: 328, compassData.horizo... https://data.globe.gov/system/photos/2024/12/3... NaN ((compassData.heading: 68, compassData.horizon... https://data.globe.gov/system/photos/2024/12/3... NaN ((compassData.heading: 80, compassData.horizon... https://data.globe.gov/system/photos/2024/12/3... NaN ((compassData.heading: 118, compassData.horizo... https://data.globe.gov/system/photos/2024/12/3... Buca in fustaia coetanea di faggio a dominanza... [Conservazione Natura Universita Tuscia] 78265 false 4 automatic 2024-12-31 10:47:00 1062.4 41.7648 14.2258 M12 Trees, Loosely Spaced, Deciduous - Broad Leaved b false NaN 90% MUC 12 (b) [Trees, Loosely Spaced, Deciduo... NaN https://data.globe.gov/system/photos/2024/12/3... false true NaN 90% MUC 12 (b) [Trees, Loosely Spaced, Deciduo... NaN https://data.globe.gov/system/photos/2024/12/3... false NaN NaN https://data.globe.gov/system/photos/2024/12/3... 140413632 NaN 90% MUC 12 (b) [Trees, Loosely Spaced, Deciduo... NaN https://data.globe.gov/system/photos/2024/12/3... 17453129 Italy Citizen Science land_covers 375492 33TVG356239 POINT (14.22524 41.76432) 2024-12-31
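Instead of typing out the ten photo columns, you can select them by name with DataFrame.filter(like="Url"). On this dataset that matches exactly the ten columns printed above, the same ones the 'Url' in col check picks up. A sketch on a made-up frame with only two photo columns:

```python
import numpy as np
import pandas as pd

# Made-up frame with two photo columns; the notebook checks ten
photos = pd.DataFrame({
    "SiteId": [1, 2, 3],
    "NorthPhotoUrl": ["n1.jpg", np.nan, "n3.jpg"],
    "SouthPhotoUrl": [np.nan, "s2.jpg", "s3.jpg"],
})

# Every column whose name contains 'Url'
url_cols = photos.filter(like="Url").columns.tolist()

# Keep only rows where none of those columns is missing, then take the first
complete = photos.dropna(subset=url_cols).head(1)

print(url_cols)
print(complete)
```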
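# Plot all of the images
plt.figure(figsize=(20, 6))

for i, (url, title) in enumerate(zip(url_list, col_list)):
    # Download each photo; the timeout keeps a slow server from hanging the notebook
    response = requests.get(url, timeout=30)
    response.raise_for_status()   # Stop with a clear error if the download failed
    img = Image.open(BytesIO(response.content))

    # Place the image in a grid with 2 rows and 5 columns
    plt.subplot(2, 5, i + 1)
    plt.imshow(img)
    plt.title(title)
    plt.axis('off')

plt.tight_layout()
plt.show()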
[Figure: the ten photos for this observation in a 2 × 5 grid, each titled with its column name (DownwardPhotoUrl through WestPhotoUrl)]
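The plotting loop hard-codes a 2 × 5 grid, which works because the observation was chosen so that all ten photo columns are filled. If you loosen that filter, some URLs may be NaN and the number of images changes. Below is a sketch of two hypothetical helpers (grid_shape and usable_urls are names introduced here, not part of any library) that skip missing URLs and size the grid to fit:

```python
import math


def usable_urls(urls, titles):
    """Pair each URL with its title, dropping missing entries (NaN, None, '')."""
    return [(u, t) for u, t in zip(urls, titles) if isinstance(u, str) and u]


def grid_shape(n_images, n_cols=5):
    """Return (rows, cols) for a subplot grid that fits n_images."""
    n_cols = max(1, min(n_cols, n_images))   # no more columns than images
    n_rows = math.ceil(n_images / n_cols)    # enough rows for the rest
    return n_rows, n_cols


pairs = usable_urls(["a.jpg", float("nan"), None, "b.jpg"], ["A", "B", "C", "D"])
print(pairs)
print(grid_shape(10), grid_shape(7), grid_shape(3))
```

In the plotting loop you would then iterate over pairs = usable_urls(url_list, col_list), compute rows, cols = grid_shape(len(pairs)), and call plt.subplot(rows, cols, i + 1).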