A Python tutorial
Mapping Sea Surface Chlorophyll in Python
Phytoplankton are mostly single-celled photosynthetic organisms found in waters worldwide, from lakes to oceans and from the equator to the poles. As tiny as they are magnificent, they spend their lives floating near the surface of the water, where they bask in the sun’s rays to photosynthesize. Despite their small size, they play an enormous role in regulating the planet’s atmosphere, easily producing as much oxygen as their terrestrial counterparts and acting as a sink for carbon dioxide (CO₂), a greenhouse gas. They also make up the base of many food webs in the oceans, which means marine researchers who study ecology want to know when and where phytoplankton are thriving.
I want to share a photo taken through a microscope of some of the plankton I caught for a year-long study of plankton species and seasonality in Puget Sound. I hope this gives you an idea of the crowded and varied world of these wonderful organisms!
You may have heard about “harmful algae blooms” or “red tides” in the news. The algae involved in these events are phytoplankton, but not all phytoplankton are harmful! Phytoplankton “blooms” occur when several environmental variables that control phytoplankton populations align in a way that allows for explosive growth. These factors include the level of light exposure, the concentrations of nutrients in the water, and the presence of zooplankton, tiny animals that feed on phytoplankton.
In many environments worldwide, these controlling factors change seasonally, producing annual cycles of high and low productivity in phytoplankton populations. I’ve included a simplified diagram below that shows how phytoplankton and zooplankton populations change throughout the year in temperate seas. In early spring, sunlight increases, and the water is full of nutrients, which encourages phytoplankton to reproduce rapidly, producing a spring bloom. This bloom will use up most of the water's nutrients and act as a feast to the local zooplankton, encouraging them to start reproducing, too. As zooplankton numbers grow, they eat up all the phytoplankton, reducing their numbers back down to a minimum. The summer months have low phytoplankton populations, but a spike in nutrient levels in the fall encourages a second, smaller bloom to occur. This diagram is extremely simplified; in reality, environmental conditions can be unpredictable and might add a lot of variation to this cycle from year to year.
Researchers can estimate phytoplankton abundance over large areas by measuring sea surface chlorophyll concentrations from satellites. One example is the MODIS (Moderate Resolution Imaging Spectroradiometer) instrument aboard NASA’s Aqua satellite. The MODIS instrument measures the intensity of light reflected by the ocean surface at various wavelengths. Algorithms then use ratios of those intensities to tease out an approximate measure of chlorophyll concentration, which in turn serves as a proxy for phytoplankton abundance. The technology has limitations: cloud cover can block the satellite’s view of the ocean, and sediment in the water muddies (pun intended) the relationship between the reflections and chlorophyll concentrations.
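To make the idea of a band-ratio algorithm a little more concrete, here is a minimal sketch of the general shape such algorithms take: a polynomial in the logarithm of a blue-to-green reflectance ratio. The function name, the inputs, and especially the coefficients are my own illustrative placeholders, not NASA’s operational values, so treat the numbers as a toy rather than as science-ready output.

```python
import numpy as np

# Illustrative placeholder coefficients, NOT NASA's operational values.
COEFFS = (0.25, -2.5, 1.5, 0.0, -1.0)

def band_ratio_chlorophyll(blue, green, coeffs=COEFFS):
    """Toy estimate of chlorophyll (mg/m^3) from blue and green reflectance."""
    # R is the log10 of the blue-to-green reflectance ratio
    ratio = np.log10(np.asarray(blue, dtype=float) / np.asarray(green, dtype=float))
    # Evaluate the polynomial a0 + a1*R + a2*R^2 + ... to get log10(chlorophyll)
    log_chl = sum(a * ratio**k for k, a in enumerate(coeffs))
    return 10.0 ** log_chl

# Water that reflects far more blue than green reads as low chlorophyll
clear_water = band_ratio_chlorophyll(blue=0.010, green=0.002)
# Water that reflects blue and green equally reads as higher chlorophyll
green_water = band_ratio_chlorophyll(blue=0.003, green=0.003)
```

The point of the sketch is the direction of the relationship: chlorophyll absorbs blue light, so the greener the water looks relative to blue, the higher the estimate.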
Now that we’ve covered the basics of phytoplankton and satellites, let’s jump into some code! For this tutorial, I wanted to map chlorophyll concentrations in Puget Sound, but the Pacific Northwest has enough cloudy weather to interfere with several visualizations. Instead, I chose to map the Gulf of California because it has strong seasonality in phytoplankton populations, without Washington’s frequent cloudy weather.
First, if you’ve never run any code in Python or Jupyter before, you can learn how to get started at Project Jupyter’s website. If you’d rather download and run the notebook containing the code that I wrote for these visualizations, you can find it in the chlorophyll folder in my blog visualizations repository on GitHub. The README there has instructions on how to get this code up and running on your machine, but I’ll also walk through the steps here.
You can get chlorophyll concentration data from the Aqua satellite from the NASA Earth Observations website. I included a screenshot of the website here with two boxes and arrows. On the left, I’ve outlined the options for time periods; you can get an 8-day average or a 1-month average (for this tutorial, I’ll be using the 1-month average for every month of 2019). On the right, I’ve outlined the button that downloads a NetCDF (.nc) file containing the chlorophyll data for the time period you’ve selected.
If you’ve never worked with NetCDF files before, they store metadata, like the time and location of data collection, alongside array-oriented scientific data, like the chlorophyll concentrations we’ll be looking at. They’re a standard data format in many fields of environmental research. With a Python package called xarray, mapping data in this format is delightfully easy!
To look at seasonality in chlorophyll concentrations, download data files from multiple times of the year, put them into a new folder called ‘data’, and start writing your code in the same directory as that folder. I wrote the code here to work with 12 data files (one for every month of the year), but you can alter the grid size and axis names if you want to plot more or fewer files. Alternatively, you could plot 3 or 4 months from 3 or 4 different years to look at variation across an even greater time span.
To start, you’ll need to import a handful of packages. If one of them isn’t already installed on your machine, you can install it with a simple pip install.
import os # standard library, used to find the data files
import netCDF4 # pip install netCDF4
import xarray as xr # pip install xarray
import cmocean # pip install cmocean
import numpy as np
import matplotlib.pyplot as plt
Next, pull in your files from that ‘data’ folder and turn them into a list of opened xarray datasets. List comprehension comes in handy for these tasks!
# Get file path and create list of data files
parent_dir = os.getcwd()
file_path = os.path.join(parent_dir, 'data')
# os.listdir returns names in arbitrary order, so sort them
# (this assumes the file names sort chronologically)
files = sorted(item for item in os.listdir(file_path) if not item.startswith('.'))
# Open the datasets with xarray
datasets = [xr.open_dataset(os.path.join(file_path, file)) for file in files]
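The subplot grid later in this tutorial assumes the datasets are in chronological order. One way to guarantee that, regardless of how the files are named, is to sort on the time_coverage_start attribute that each file carries. The sketch below uses tiny in-memory stand-ins instead of real files, and it assumes the attribute is an ISO-style date string (so that sorting it as text also sorts it by date). The helper name sort_by_start is my own.

```python
import xarray as xr

def sort_by_start(datasets):
    """Return the datasets ordered by their 'time_coverage_start' attribute."""
    return sorted(datasets, key=lambda ds: ds.attrs['time_coverage_start'])

# Empty stand-in datasets that only carry the attribute we sort on
stamps = ['2019-03-01T00:00:00Z', '2019-01-01T00:00:00Z', '2019-02-01T00:00:00Z']
demo = [xr.Dataset(attrs={'time_coverage_start': stamp}) for stamp in stamps]

ordered = sort_by_start(demo)
months = [ds.attrs['time_coverage_start'][:7] for ds in ordered]
```

With real data, you would call sort_by_start on the list of opened datasets before plotting.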
The variable that we’re going to plot is called ‘chlor_a’ in our datasets, but if you aren’t sure what your variable names are, you can use xarray’s .data_vars property to find out.
# Use .data_vars to find variable names
datasets[0].data_vars
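If you don’t have a file downloaded yet, you can get a feel for how xarray organizes a NetCDF-style dataset with a small synthetic one. Everything in this sketch is made up for illustration: the grid, the values, and the units attribute are stand-ins, and only the names chlor_a, lat, lon, and time_coverage_start mirror the ones used in this tutorial.

```python
import numpy as np
import xarray as xr

# A 2 x 3 grid of made-up chlorophyll values; NaN marks a missing pixel
values = np.array([[0.1, 0.5, np.nan],
                   [0.2, 1.5, 3.0]])

demo = xr.Dataset(
    data_vars={'chlor_a': (('lat', 'lon'), values, {'units': 'mg m^-3'})},
    coords={'lat': [30.0, 20.0], 'lon': [-120.0, -110.0, -100.0]},
    attrs={'time_coverage_start': '2019-01-01T00:00:00Z'},
)

variable_names = list(demo.data_vars)   # names of the data variables
dimension_sizes = dict(demo.sizes)      # length of each dimension
units = demo.chlor_a.attrs['units']     # per-variable metadata
```

The same three lookups work on a dataset opened from a real file, which makes them a quick way to check what you downloaded.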
If you’re like me, you’ll want to make sure you can plot one dataset before you try plotting a whole collection of them. Plotting with xarray is so easy that it only takes one line of code! We can also make use of the cmocean Python package that contains colormaps tailored specifically to oceanographers; I’ll be using the algae option, which is a collection of greens. If we don’t specify any coordinates, the code below will generate a map of the whole planet by default.
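# Generate global snapshot from the first dataset in the list
datasets[0].chlor_a.plot(x='lon', y='lat', figsize=(26,12), vmin=0, vmax=5, cmap=cmocean.cm.algae)
# Add a title showing the year and month of data collection
plt.title('Global chlorophyll, ' + datasets[0].attrs['time_coverage_start'][:7])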
This map of the world’s oceans shows that chlorophyll is sparse in the open ocean and highly concentrated around coastlines and lakes. The white on the map represents land, clouds, or ice. The green represents chlorophyll concentrations in milligrams per cubic meter (mg/m³). Next, let’s zoom in on our area of interest: the Gulf of California. If you’d like to plot a different region of the planet, change the values of site_lat and site_lon below to the latitude and longitude of the region you’re interested in. Note that this method requires latitudes between -90 and 90 and longitudes between -180 and 180.
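Some sources report longitude from 0 to 360 degrees east instead. If your coordinates come in that form, a small conversion like the one sketched here brings them into the -180 to 180 range. The helper name wrap_longitude is my own.

```python
def wrap_longitude(lon):
    """Convert a longitude in degrees to the range [-180, 180)."""
    return ((lon + 180.0) % 360.0) - 180.0

# 249.3 degrees east is the same meridian as 110.7 degrees west
gulf_lon = wrap_longitude(249.3)
```

Values that are already in range pass through unchanged, so it is safe to apply the function to every longitude you enter.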
# Enter coordinates of the Gulf of California
site_lat = 26.7
site_lon = -110.7
# Slice the data using the coordinates so that our computer doesn't have to process so much information
# (note that latitude is sliced from high to low here)
ds = datasets[0]
ds_slice = ds.sel(lat=slice(site_lat+10, site_lat-10), lon=slice(site_lon-10, site_lon+10))
# Create a plot
ds_slice.chlor_a.plot(x='lon', y='lat', figsize=(12,12), vmin=0, vmax=3, cmap=cmocean.cm.algae)
# Add a title showing the year and month of data collection
plt.title('Gulf of California, ' + ds.attrs['time_coverage_start'][:7])
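One detail in the slicing code is easy to trip over: the latitude slice runs from the higher value to the lower one, which only works when the latitude coordinate is stored in descending order. With label-based slicing, passing the bounds in the wrong direction quietly returns an empty selection. Here is a sketch of a helper that checks the direction first. The name select_box and the synthetic one-degree grid are my own stand-ins for illustration.

```python
import numpy as np
import xarray as xr

def select_box(ds, center_lat, center_lon, half_width):
    """Select a square lat/lon box whether the coordinates ascend or descend."""
    def bounds(coord, center):
        low, high = center - half_width, center + half_width
        ascending = bool(coord[0] < coord[-1])
        # Label-based slices must run in the same direction as the coordinate
        return slice(low, high) if ascending else slice(high, low)

    return ds.sel(lat=bounds(ds['lat'].values, center_lat),
                  lon=bounds(ds['lon'].values, center_lon))

# Synthetic global grid: latitude descending, longitude ascending
lat = np.arange(89.5, -90.0, -1.0)
lon = np.arange(-179.5, 180.0, 1.0)
demo = xr.Dataset(
    {'chlor_a': (('lat', 'lon'), np.zeros((lat.size, lon.size)))},
    coords={'lat': lat, 'lon': lon},
)

box = select_box(demo, center_lat=26.7, center_lon=-110.7, half_width=10)
```

On a real dataset, select_box(ds, site_lat, site_lon, 10) should return the same region as the slice used for the Gulf of California map above.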
This plot shows us that the Gulf of California has higher chlorophyll concentrations than the waters offshore. Now that we know how to plot a single month of chlorophyll data, let’s plot 12 months and see how chlorophyll concentrations in the Gulf of California change throughout a single year!
# Set colorbar values. May take trial and error to get the level of detail you are aiming for.
vmin = 0.0
vmax = 3.0
# Set the lat/lon distance from site location to plot
box_lim = 7
# Set levels for resolution of colorbar. Change the 0.01 value for higher or lower resolution.
lvl = np.arange(vmin, vmax, 0.01).tolist()
# Create a grid of subplots, 3 rows x 4 columns
f, ((ax1, ax2, ax3, ax4), (ax5, ax6, ax7, ax8), (ax9, ax10, ax11, ax12)) = plt.subplots(3, 4, figsize=(24,16))
ax_list = [ax1, ax2, ax3, ax4, ax5, ax6, ax7, ax8, ax9, ax10, ax11, ax12]
# Loop through the subplot axes
for i in range(len(ax_list)):
    ds = datasets[i]
    # Slice the data using previous coordinates
    ds_slice = ds.sel(lat=slice(site_lat+box_lim, site_lat-box_lim), lon=slice(site_lon-box_lim, site_lon+box_lim))
    # Generate plot and title
    ds_slice.chlor_a.plot.contourf(x='lon', y='lat', ax=ax_list[i], vmin=vmin, vmax=vmax, levels=lvl, cmap=cmocean.cm.algae)
    ax_list[i].set_title('Gulf of California, ' + ds.attrs['time_coverage_start'][:7])
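If you have more or fewer than 12 files, you don’t have to rename the axes by hand. The sketch below builds a grid for any number of plots and hides the panels that are left over. The helper name make_grid and its defaults are my own choices, so adjust them to taste.

```python
import math
import matplotlib.pyplot as plt

def make_grid(n_plots, n_cols=4, panel_size=6):
    """Create enough subplot rows for n_plots and hide any unused panels."""
    n_rows = math.ceil(n_plots / n_cols)
    fig, axes = plt.subplots(n_rows, n_cols,
                             figsize=(panel_size * n_cols, panel_size * n_rows),
                             squeeze=False)
    ax_list = axes.ravel().tolist()
    # Panels beyond the last dataset stay in the figure but are not drawn
    for ax in ax_list[n_plots:]:
        ax.set_visible(False)
    return fig, ax_list[:n_plots]

# Seven files need two rows of four, with one panel hidden
fig, ax_list = make_grid(7)
```

The returned ax_list can be looped over in the same way as the hand-written list of twelve axes.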
Excellent! This visual shows us that chlorophyll concentrations appear to ramp up in January and February before the large spring bloom in March and April. Phytoplankton populations are lowest in the Gulf of California from June to September, and smaller blooms appear in October, November, and December. I hope this leaves you with more questions than you started with and encourages you to think about the seasonality of other ecosystems around you!
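Reading seasonality off a grid of maps is a judgment call, so it can help to put a number on each month. This is a rough sketch that averages the chlorophyll variable over every valid pixel in each dataset. It uses two tiny made-up datasets in place of real slices, and the helper name monthly_means is my own. Note that it is a plain unweighted mean, so it ignores the fact that grid cells cover slightly different areas at different latitudes.

```python
import numpy as np
import xarray as xr

def monthly_means(datasets, var='chlor_a'):
    """Map 'YYYY-MM' to the mean of `var`, ignoring missing pixels."""
    return {
        ds.attrs['time_coverage_start'][:7]: float(ds[var].mean(skipna=True))
        for ds in datasets
    }

def fake_month(stamp, values):
    """Build a tiny stand-in dataset for one month of data."""
    return xr.Dataset({'chlor_a': (('lat', 'lon'), np.array(values))},
                      attrs={'time_coverage_start': stamp})

demo = [
    fake_month('2019-03-01T00:00:00Z', [[1.0, np.nan], [3.0, 2.0]]),
    fake_month('2019-07-01T00:00:00Z', [[0.5, 0.5], [np.nan, 0.5]]),
]
means = monthly_means(demo)
```

Applied to the twelve sliced datasets, the resulting dictionary would give you one value per month to compare against the maps.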
There are a couple of ways this code could be improved. For instance, the website that hosts this data makes beautiful visualizations using a logarithmic color scale. I believe they use the ‘deep’ cmocean colormap reversed (you can pass cmap=cmocean.cm.deep_r as a plotting parameter) on a log scale from 0.01 to 60 mg/m³; I would love to replicate this! They also fill the land, cloud, and ice cover with the color black instead of the default white. In addition, the color bar really only needs to be displayed once since it’s the same for each of the 12 subplots. Also, I noticed that my visualizations have transparent backgrounds, making the titles, axis marks, and labels impossible to read when you click an image on the iPhone Medium app. Apologies to any mobile viewers; I will try to correct this for visualizations in future posts!
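Here is one possible sketch of those improvements using matplotlib directly: a logarithmic color scale from 0.01 to 60, black for missing pixels, a single shared color bar, and a white figure background. It runs on random stand-in data, not real chlorophyll values, and it falls back to a built-in colormap if cmocean isn’t installed. I haven’t compared the result against the website’s own rendering, so consider it a starting point.

```python
import copy
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm

try:
    import cmocean
    base_cmap = cmocean.cm.deep_r
except ImportError:
    # Fall back to a built-in colormap if cmocean is not installed
    base_cmap = plt.get_cmap('viridis')

# Copy the colormap so the shared original is untouched, then paint gaps black
cmap = copy.copy(base_cmap)
cmap.set_bad('black')

# Random stand-in data for four panels, with a NaN corner playing the role of land
rng = np.random.default_rng(0)
lon = np.linspace(-117.7, -103.7, 40)
lat = np.linspace(19.7, 33.7, 40)
panels = []
for _ in range(4):
    data = 10 ** rng.uniform(-2, 1.5, size=(lat.size, lon.size))
    data[:5, :5] = np.nan
    panels.append(np.ma.masked_invalid(data))

# One norm shared by every panel keeps the colors comparable
norm = LogNorm(vmin=0.01, vmax=60)
fig, axes = plt.subplots(2, 2, figsize=(10, 10), facecolor='white')
for ax, data in zip(axes.ravel(), panels):
    mesh = ax.pcolormesh(lon, lat, data, norm=norm, cmap=cmap, shading='auto')

# A single color bar attached to all of the panels
fig.colorbar(mesh, ax=axes.ravel().tolist(), label='Chlorophyll (mg/m³)')
```

Because every panel uses the same norm and colormap, the one color bar describes all of them.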
A lot of work goes into polishing the data collection, processing, and distribution pipelines that make this data easy to use for members of the public like me. Still, there are so many wonderful, untapped resources like this that the general public is unaware of. I hope to make more tutorials like this in the future to make it easier for entry-level oceanographers and programmers to explore all of the data available to them.
Keep in mind that the visualizations created here are just a few snapshots of one little piece of the Earth, taken through the lens of a radiance-tracking satellite. There are countless more regions to investigate and many other variables to consider, and I hope we can uncover a few gems as we continue to explore environmental data together. As long as satellites are orbiting the Earth, we won’t be running out of work anytime soon — what a wonderful reality!
Thanks for reading! Here are a few resources if you’d like to read more about phytoplankton and how satellites help researchers understand the ocean surface:
Phytoplankton - A Simple Guide | WHOI
Phytoplankton are mostly microscopic, single-celled photosynthetic organisms that live suspended in water. Like land…
MODIS Design
The MODIS instrument provides high radiometric sensitivity (12 bit) in 36 spectral bands ranging in…