What is Coding?

Intro to Python, Google Colab, and Data Science.

To run the code:

Make sure you have this notebook open in Google Colab. If you are starting from the digital textbook, click

Each block of code is called a cell. To run a cell, hover over it and click the arrow in the top left of the cell, or click inside the cell and press Shift + Enter.

Note: The first time you run a block of code, Google Colab will show the message “Warning: This notebook was not authored by Google.” Click Run Anyway to continue.

A great feature of Google Colab is that you can write Python code and see the output directly in your browser. Let’s go through the basics below:

# This is a comment (added using a hashtag # at the start), used by programmers to explain their code
print("Hello World!")
# Comments do not impact how the code runs

Hello World!

Python Basics

Python is a powerful tool for processing and analyzing data, which makes it a good fit for this project. Let’s explore some of its basic functionality.

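A variable stores a value, such as the result of a calculation, so that you can use it again in later calculations.

Defining a variable is straightforward: write a name, an equals sign (=), and the value. Variable names cannot contain spaces, so good naming practice is to separate words with an underscore (_), as in variable_one.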

# Setting up two variables
variable_one = 1
variable_two = 2

# Performing a calculation using those variables
variable_three = variable_one + variable_two

# Print the variable to see the result
print(variable_three)
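Addition is only one of the operators you can use in expressions. Here is a short sketch of Python’s other arithmetic operators. The variable names price and quantity are made up for illustration.

```python
# Made-up values for illustration
price = 7
quantity = 3

total = price * quantity        # multiplication
difference = price - quantity   # subtraction
ratio = price / quantity        # division always returns a float
whole_part = price // quantity  # floor division drops the remainder
remainder = price % quantity    # modulo gives the remainder
squared = price ** 2            # exponent (power)

print(total, difference, ratio, whole_part, remainder, squared)
```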

Values in Python come in different types. Here are the main types you may run into. Note: The spaces inside the parentheses below are only there to make the code easier to read. Removing them or adding more does not change how the code runs.

print(type( 100 ))
print(type( "Word" ))
print(type( 100.10 ))
print(type( True ))
print(type( [0, 1] ))
print(type( (0, 1) ))
print(type( {"Key": "Value"} ))

<class 'int'>
<class 'str'>
<class 'float'>
<class 'bool'>
<class 'list'>
<class 'tuple'>
<class 'dict'>
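Lists, tuples, and dictionaries hold several values at once. The sketch below uses made-up values to show how to get individual values back out of each one.

```python
# Made-up example values for illustration
temperatures = [72.5, 80.1, 85.0]         # list: ordered, can be changed
coordinates = (27.78, -82.66)             # tuple: ordered, cannot be changed
forecast = {"city": "Tampa", "high": 91}  # dict: values looked up by key

print(temperatures[0])   # positions start at 0, so this prints the first item: 72.5
print(coordinates[1])    # second item of the tuple: -82.66
print(forecast["high"])  # value stored under the key "high": 91

temperatures.append(88.3)  # lists can grow
print(len(temperatures))   # number of items in the list: 4
```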

weather_forecast = "Hot"
type(weather_forecast)

str

daily_temperature = 85.0
type(daily_temperature)

float

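You can redefine a variable or convert its type. To redefine a variable, assign a new value to the same name. Below, daily_temperature is redefined as the text "85.0" (a string), and then float() converts that string into a number stored in updated_temperature.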

daily_temperature = "85.0"
type(daily_temperature)

str

updated_temperature = float(daily_temperature)
# We can confirm the type has changed by checking the type of updated_temperature, or by looking at the output of a code cell that displays the variable.

type(updated_temperature)

float

The value stays the same (85.0), but the type changes from str to float.

print(updated_temperature)

85.0
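The same idea works for other conversions. Here is a short sketch of int(), float(), and str(). It also shows what happens when a string cannot be turned into a number. The value "Hot" echoes the weather_forecast example above.

```python
# Converting between types with int(), float(), and str()
whole_number = int("85")    # string -> int: 85
decimal_number = float(85)  # int -> float: 85.0
as_text = str(85.0)         # float -> string: "85.0"
truncated = int(85.9)       # int() drops the decimal part instead of rounding: 85

print(whole_number, decimal_number, as_text, truncated)

# Not every string can be converted to a number; this raises a ValueError
try:
    float("Hot")
except ValueError as error:
    print("Could not convert:", error)
```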

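A function is a block of code that performs a specific task and is easy to reuse.
We’ll define our first function to convert Celsius to Fahrenheit.

def is the keyword that starts a function definition
celsius_to_fahrenheit is the function name
(temp) is the parameter, the input value the function receives
return sends the calculated result back to wherever the function was called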

def celsius_to_fahrenheit(temp):
    return 9 / 5 * temp + 32

To call (run) a function, write its name followed by the input in parentheses, the same way you call print.

freezing_point = celsius_to_fahrenheit(0)
print(f"The freezing point of water in Fahrenheit is: {freezing_point}")

The freezing point of water in Fahrenheit is: 32.0
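Because the calculation lives inside the function, you can reuse it for any input. The sketch below redefines the same function so the cell runs on its own. It then calls the function with two more temperatures.

```python
def celsius_to_fahrenheit(temp):
    # Multiply by 9/5, then add 32
    return 9 / 5 * temp + 32

boiling_point = celsius_to_fahrenheit(100)
equal_point = celsius_to_fahrenheit(-40)  # -40 is the same on both scales

print(f"The boiling point of water in Fahrenheit is: {boiling_point}")  # 212.0
print(f"-40 degrees Celsius in Fahrenheit is: {equal_point}")           # -40.0
```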

You can also define a function using the lambda keyword, which creates a small function in a single line. The lambda below converts in the other direction, from Fahrenheit to Celsius.

fahrenheit_to_celsius = lambda temp: (temp - 32) * 5 / 9

We can then call the lambda function the same way as a function defined with def.

freezing_point = fahrenheit_to_celsius(32)
print(f"The freezing point of water in Celsius is: {freezing_point}")

The freezing point of water in Celsius is: 0.0
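A lambda is a shorter way to write a simple function. This sketch writes the same conversion both ways. The name fahrenheit_to_celsius_def is made up for the comparison. As a quick check, the sketch also converts a temperature there and back.

```python
def celsius_to_fahrenheit(temp):
    return 9 / 5 * temp + 32

# The lambda from above
fahrenheit_to_celsius = lambda temp: (temp - 32) * 5 / 9

# The same function written with def (hypothetical name, for comparison)
def fahrenheit_to_celsius_def(temp):
    return (temp - 32) * 5 / 9

print(fahrenheit_to_celsius(212))      # 100.0
print(fahrenheit_to_celsius_def(212))  # 100.0

# Converting 25 C to Fahrenheit and back should give (about) 25 again
round_trip = fahrenheit_to_celsius(celsius_to_fahrenheit(25))
print(round_trip)
```

Both versions behave the same. Python’s style guide (PEP 8) recommends def for functions you give a name to. A lambda is most handy when you pass a short function directly to another function.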

Learn more about Python with these free resources

Data Science Basics

This section will provide an introduction to working with data in Python, including common libraries, loading data, and creating graphs.

Python has “libraries” that you can import to expand what your code can do. Each library comes with many functions designed for specific tasks, like loading a dataset, performing calculations, making a chart, and more.

Import pandas, a popular library for processing and analyzing data. By convention, it is imported under the short name pd.

import pandas as pd
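Importing with as is only one style. For comparison, here is a sketch of common import styles. It uses Python’s built-in math library only because that library needs no installation.

```python
import math            # use functions as math.sqrt(...)
import math as m       # "as" gives the library a nickname, like pandas as pd
from math import sqrt  # import a single function by name

print(math.sqrt(16), m.sqrt(16), sqrt(16))  # 4.0 4.0 4.0
```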

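Google Colab comes with a few pre-loaded sample datasets (in the sample_data folder) that anyone can use. Below, pd.read_csv uses the pandas library we imported above (as pd). It loads one of these datasets, a table of California housing data, from a CSV file.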

data = pd.read_csv("sample_data/california_housing_test.csv")

Here, data is a DataFrame, a table that pandas stores in memory. The code below shows the first 10 rows of the data. Writing data.head() with no number shows 5 rows, the default. Writing just data shows the first 5 and last 5 rows.

data.head(10)

	longitude	latitude	housing_median_age	total_rooms	total_bedrooms	population	households	median_income	median_house_value
0	-122.05	37.37	27.0	3885.0	661.0	1537.0	606.0	6.6085	344700.0
1	-118.30	34.26	43.0	1510.0	310.0	809.0	277.0	3.5990	176500.0
2	-117.81	33.78	27.0	3589.0	507.0	1484.0	495.0	5.7934	270500.0
3	-118.36	33.82	28.0	67.0	15.0	49.0	11.0	6.1359	330000.0
4	-119.67	36.33	19.0	1241.0	244.0	850.0	237.0	2.9375	81700.0
5	-119.56	36.51	37.0	1018.0	213.0	663.0	204.0	1.6635	67000.0
6	-121.43	38.63	43.0	1009.0	225.0	604.0	218.0	1.6641	67000.0
7	-120.65	35.48	19.0	2310.0	471.0	1341.0	441.0	3.2250	166900.0
8	-122.84	38.40	15.0	3080.0	617.0	1446.0	599.0	3.6696	194400.0
9	-118.02	34.08	31.0	2402.0	632.0	2830.0	603.0	2.3333	164200.0
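Besides head, pandas offers tail for the last rows and iloc for picking rows by position. Here is a small sketch on a made-up DataFrame (not the housing data), so the results are easy to check.

```python
import pandas as pd

# Made-up DataFrame with 12 rows holding the numbers 1 to 12
df = pd.DataFrame({"value": range(1, 13)})

print(df.head())     # first 5 rows (the default)
print(df.tail(3))    # last 3 rows: values 10, 11, 12
print(df.iloc[2:4])  # rows at positions 2 and 3: values 3 and 4
```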

data.info() shows information about the data: the column names, the non-null count of each column (non-null means not empty), and each column’s dtype (data type).

data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3000 entries, 0 to 2999
Data columns (total 9 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   longitude           3000 non-null   float64
 1   latitude            3000 non-null   float64
 2   housing_median_age  3000 non-null   float64
 3   total_rooms         3000 non-null   float64
 4   total_bedrooms      3000 non-null   float64
 5   population          3000 non-null   float64
 6   households          3000 non-null   float64
 7   median_income       3000 non-null   float64
 8   median_house_value  3000 non-null   float64
dtypes: float64(9)
memory usage: 211.1 KB
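Every column here has 3000 non-null values out of 3000 entries, so nothing is missing. When a dataset does have empty cells, they appear as NaN, and you can count them directly. The sketch below uses a made-up NaN, since the real data has none.

```python
import numpy as np
import pandas as pd

# Made-up data: the second total_bedrooms value is missing (NaN)
df = pd.DataFrame({
    "total_rooms": [3885.0, 1510.0, 3589.0],
    "total_bedrooms": [661.0, np.nan, 507.0],
})

non_null = df.notna().sum()  # matches the "Non-Null Count" column of df.info()
missing = df.isna().sum()    # number of empty cells in each column

print(non_null)
print(missing)
```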

data.describe() calculates summary statistics for each column: the count, mean, standard deviation (std), minimum, quartiles (25%, 50%, 75%), and maximum.

data.describe()

	longitude	latitude	housing_median_age	total_rooms	total_bedrooms	population	households	median_income	median_house_value
count	3000.000000	3000.00000	3000.000000	3000.000000	3000.000000	3000.000000	3000.00000	3000.000000	3000.00000
mean	-119.589200	35.63539	28.845333	2599.578667	529.950667	1402.798667	489.91200	3.807272	205846.27500
std	1.994936	2.12967	12.555396	2155.593332	415.654368	1030.543012	365.42271	1.854512	113119.68747
min	-124.180000	32.56000	1.000000	6.000000	2.000000	5.000000	2.00000	0.499900	22500.00000
25%	-121.810000	33.93000	18.000000	1401.000000	291.000000	780.000000	273.00000	2.544000	121200.00000
50%	-118.485000	34.27000	29.000000	2106.000000	437.000000	1155.000000	409.50000	3.487150	177650.00000
75%	-118.020000	37.69000	37.000000	3129.000000	636.000000	1742.750000	597.25000	4.656475	263975.00000
max	-114.490000	41.92000	52.000000	30450.000000	5419.000000	11935.000000	4930.00000	15.000100	500001.00000
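You can also calculate each row of describe() on its own. Here is a sketch with a made-up column of five values, small enough to check by hand.

```python
import pandas as pd

# Made-up values (not from the housing data)
income = pd.Series([2.0, 3.0, 4.0, 5.0, 6.0])

print(income.count())              # 5
print(income.mean())               # 4.0
print(income.std())                # sample standard deviation, as in describe()
print(income.min(), income.max())  # 2.0 6.0
print(income.quantile(0.25))       # 3.0, the "25%" row
print(income.median())             # 4.0, the "50%" row

# describe() gathers the same numbers into one table
summary = income.describe()
print(summary)
```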

Import Matplotlib, a popular library for visualizing data.

import matplotlib.pyplot as plt

# Histogram of median income, with the values grouped into 20 bins
plt.hist(data["median_income"], bins=20)
plt.show()

[Figure: histogram of median_income]
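The bins parameter sets how many equal-width groups the values are split into. Each bar’s height is the number of values in that group. NumPy’s np.histogram does the same kind of counting without drawing anything, which makes the binning easy to inspect. Here is a sketch with made-up values and bins=4.

```python
import numpy as np

# Made-up values ranging from 0 to 8
values = np.array([0.0, 1.5, 2.5, 2.7, 4.0, 6.5, 7.9, 8.0])

# Split the range 0-8 into 4 equal-width bins and count the values in each
counts, edges = np.histogram(values, bins=4)

print(edges)   # [0. 2. 4. 6. 8.]
print(counts)  # [2 2 1 3], the last bin includes its right edge (8.0)
```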

Import the GeoPandas library, useful for working with spatial data (any data with locations like addresses or coordinates).

import geopandas as gpd

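# Turn the DataFrame into a GeoDataFrame by building a point geometry
# from each row's longitude (x) and latitude (y)
geo_data = gpd.GeoDataFrame(
    data,
    geometry=gpd.points_from_xy(data.longitude, data.latitude),
    crs="EPSG:4326"  # EPSG:4326 means coordinates in degrees of longitude/latitude (WGS 84)
)

geo_data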

	longitude	latitude	housing_median_age	total_rooms	total_bedrooms	population	households	median_income	median_house_value	geometry
0	-122.05	37.37	27.0	3885.0	661.0	1537.0	606.0	6.6085	344700.0	POINT (-122.05 37.37)
1	-118.30	34.26	43.0	1510.0	310.0	809.0	277.0	3.5990	176500.0	POINT (-118.3 34.26)
2	-117.81	33.78	27.0	3589.0	507.0	1484.0	495.0	5.7934	270500.0	POINT (-117.81 33.78)
3	-118.36	33.82	28.0	67.0	15.0	49.0	11.0	6.1359	330000.0	POINT (-118.36 33.82)
4	-119.67	36.33	19.0	1241.0	244.0	850.0	237.0	2.9375	81700.0	POINT (-119.67 36.33)
...	...	...	...	...	...	...	...	...	...	...
2995	-119.86	34.42	23.0	1450.0	642.0	1258.0	607.0	1.1790	225000.0	POINT (-119.86 34.42)
2996	-118.14	34.06	27.0	5257.0	1082.0	3496.0	1036.0	3.3906	237200.0	POINT (-118.14 34.06)
2997	-119.70	36.30	10.0	956.0	201.0	693.0	220.0	2.2895	62000.0	POINT (-119.7 36.3)
2998	-117.12	34.10	40.0	96.0	14.0	46.0	14.0	3.2708	162500.0	POINT (-117.12 34.1)
2999	-119.63	34.42	42.0	1765.0	263.0	753.0	260.0	8.5608	500001.0	POINT (-119.63 34.42)

3000 rows × 10 columns

Make a simple map of the points.

geo_data.plot()

<Axes: >

[Figure: map of the housing data points]

Data Analysis Activity

Data from NOAA’s Climate at a Glance Global Time Series

Title: Global Land and Ocean Average Temperature Anomalies
Units: Degrees Celsius
Base Period: 1901-2000
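A temperature anomaly is the difference between an observed temperature and the average over a base period (here 1901-2000). Positive values are warmer than that average, and negative values are cooler. The minimal sketch below illustrates the idea with made-up numbers and a made-up two-year base period. These are not NOAA values, and NOAA’s exact procedure may differ.

```python
import pandas as pd

# Made-up yearly average temperatures in degrees Celsius (NOT real NOAA data)
temps = pd.DataFrame({
    "Year": [2000, 2001, 2002, 2003],
    "Temperature": [14.0, 14.2, 14.4, 14.6],
})

# Hypothetical base period for this sketch: 2000-2001
in_base = (temps["Year"] >= 2000) & (temps["Year"] <= 2001)
baseline = temps.loc[in_base, "Temperature"].mean()

# Anomaly = observed temperature minus the base-period average
temps["Anomaly"] = (temps["Temperature"] - baseline).round(2)
print(temps)
```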

pd.read_csv can load data directly from a URL, so there is no need to download the file first.

data = pd.read_csv("https://www.ncei.noaa.gov/access/monitoring/climate-at-a-glance/global/time-series/globe/tavg/land_ocean/ytd/12/1850-2025/data.csv",
                   skiprows=3) # Skip the first 3 rows because they state the title, units, and period of the data
data.head(10)

	Year	Anomaly
0	1850	-0.19
1	1851	-0.09
2	1852	-0.05
3	1853	-0.10
4	1854	-0.06
5	1855	-0.08
6	1856	-0.15
7	1857	-0.19
8	1858	-0.18
9	1859	-0.05
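skiprows tells pd.read_csv how many lines at the top of the file to ignore before the header row. Here is a sketch with a small made-up CSV. The three description lines are placeholders, not the real NOAA text. The sketch reads the CSV from a string with io.StringIO, so no download is needed.

```python
import io
import pandas as pd

# Made-up CSV text: 3 description lines, then the header row, then the data
csv_text = """Example title line
Example units line
Example period line
Year,Anomaly
2000,0.10
2001,0.20
"""

data = pd.read_csv(io.StringIO(csv_text), skiprows=3)
print(data.columns.tolist())  # ['Year', 'Anomaly']
print(len(data))              # 2
```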

plt.plot(data["Year"], data["Anomaly"])

plt.title("Global Land and Ocean Average Temperature Anomalies")
plt.ylabel("Temperature Anomaly (Degrees Celsius)")
plt.xlabel("Year")
plt.show()

[Figure: Global Land and Ocean Average Temperature Anomalies, temperature anomaly (degrees Celsius) by year]

Let’s look at another dataset: Ambient Air Monitoring Sites in Florida - May 2023, from the Florida Department of Environmental Protection and available through the Florida Geographic Data Library. gpd.read_file can read the zipped file directly from its URL.

data = gpd.read_file('https://fgdl.org/zips/geospatial_data/archive/airmonitoring_may23.zip')

data.head()

	AGENCY	MANAGING_P	SITE_ID	COUNTY_FIP	COUNTY	SITE_NAME	CO	NO2	O3	...	LONGITUDE	LATITUDE	LAT_DD	LONG_DD	MGRS	GOOGLEMAP	DESCRIPT	FGDLAQDATE	AUTOID	geometry
0	867.0	LOCAL - PINELLAS CNTY DEM	L103-0012	103.0	PINELLAS	WOODLAWN	0.0	0.0	0.0	...	-82.659265	27.784749	27.784749	-82.659265	17RLL3652074461	https://www.google.com/maps/place/17RLL3652074461	PINELLAS (WOODLAWN)	2023-06-27	71	POINT (531854.993 420607.238)
1	867.0	LOCAL - PINELLAS CNTY DEM	L103-0018	103.0	PINELLAS	AZALEA PARK	0.0	1.0	1.0	...	-82.739875	27.785866	27.785866	-82.739875	17RLL2857974695	https://www.google.com/maps/place/17RLL2857974695	PINELLAS (AZALEA PARK)	2023-06-27	72	POINT (523926.446 420647.656)
2	867.0	LOCAL - PINELLAS CNTY DEM	L103-0023	103.0	PINELLAS	DERBY LANE	0.0	0.0	0.0	...	-82.623153	27.863635	27.863635	-82.623153	17RLL4019483154	https://www.google.com/maps/place/17RLL4019483154	PINELLAS (DERBY LANE)	2023-06-27	73	POINT (535308.445 429406.36)
3	867.0	LOCAL - PINELLAS CNTY DEM	L103-0026	103.0	PINELLAS	SKYVIEW DRIVE	0.0	0.0	0.0	...	-82.714600	27.850000	27.850000	-82.714600	17RLL3116981766	https://www.google.com/maps/place/17RLL3116981766	PINELLAS (SKYVIEW DRIVE)	2023-06-27	74	POINT (526337.746 427795.197)
4	867.0	LOCAL - PINELLAS CNTY DEM	L103-0027	103.0	PINELLAS	SAWGRASS LAKE PARK	1.0	1.0	0.0	...	-82.665251	27.834400	27.834400	-82.665251	17RLL3600579971	https://www.google.com/maps/place/17RLL3600579971	PINELLAS (SAWGRASS LAKE PARK)	2023-06-27	75	POINT (531206.635 426114.408)

5 rows × 29 columns

data.plot()

<Axes: >

[Figure: map of the air monitoring site locations]

Let’s improve the design of the map using another Python library, contextily, which adds a basemap (a background map) to plots. Google Colab does not include contextily by default, unlike the libraries we have used so far (pandas, geopandas, matplotlib). We therefore have to install it first with !pip install contextily. The ! at the start tells Colab to run the line as a shell command instead of as Python code.

!pip install contextily
import contextily as cx

First, .to_crs(epsg=3857) converts (reprojects) the data to the Web Mercator coordinate reference system, EPSG:3857. This makes the points line up with the basemap (the map in the background) that contextily adds.

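# Reproject to Web Mercator, then plot each site as an orange point with a black outline
ax = data.to_crs(epsg=3857).plot(color="orange",
                                 edgecolor="black",
                                 markersize=30)
# Add background map tiles underneath the points
cx.add_basemap(ax)
ax.set_title("Ambient Air Monitoring Sites in Florida")
plt.show()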

[Figure: Ambient Air Monitoring Sites in Florida, orange points over a basemap]
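For longitude/latitude input, spherical Web Mercator (EPSG:3857) has a short formula, sketched below to show roughly what .to_crs(epsg=3857) does to a point. This is for understanding only. GeoPandas performs the real conversion through the pyproj library. It also handles data stored in other coordinate reference systems, like the geometry column of this dataset, whose coordinates are not in degrees. The sample point uses the LONG_DD and LAT_DD values of the WOODLAWN site from the table above.

```python
import math

# Earth radius (meters) used by spherical Web Mercator, EPSG:3857
EARTH_RADIUS = 6378137.0

def lonlat_to_web_mercator(lon, lat):
    """Convert longitude/latitude in degrees to Web Mercator x/y in meters."""
    x = EARTH_RADIUS * math.radians(lon)
    y = EARTH_RADIUS * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y

# WOODLAWN monitoring site (LONG_DD, LAT_DD from the table above)
x, y = lonlat_to_web_mercator(-82.659265, 27.784749)
print(f"x = {x:.1f} m, y = {y:.1f} m")
```

Longitude maps to x in a straight line. The y value stretches more and more toward the poles, which is why areas far from the equator look enlarged on web maps.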

That’s it for this section! Continue to the next section to see how to use Google Earth Engine in Python to view satellite images and other Earth data.
