Sunday, November 22, 2020

Quick Note about the future of GrADSaholic


Since I started this blog in 2013 (mostly out of the monotony of post-grad-school unemployment), I've been consistently surprised at just how much engagement it's gotten from the community.  Even now, after several years of relative inactivity, this blog still has a consistent following and seems to serve a useful educational purpose.  While GrADS is still very popular in the atmospheric science and climate community (I know because I can spot a GrADS figure a mile away in a paper/presentation), I haven't used it much recently myself, which brings me to what I see for the future of GrADSaholic.

In short, I don't foresee myself coming up with new GrADS specific educational content, but would like to use this platform to continue to provide fresh and relevant educational computer programming and data science resources to aspiring atmospheric and Earth scientists.

Like many in my field, I jumped on the Python bandwagon due to its extremely user-friendly feel, broad community support, and seemingly infinite number of data-science tools and applications.  In the past, I've been extremely wary of shifting this blog's focus from GrADS to Python, if for no other reason than that there is already an enormous wealth of information about Python on the internet.  However, a growing number of recent experiences have led me to believe that shifting in this direction may be worthwhile.

So, going forward, this blog will no longer be restricted to GrADS tutorials and will instead evolve toward a broader focus on "how to be a better data scientist and code developer," with a heavy emphasis on atmospheric science and (for now) Python.  My goal is NOT to regurgitate or provide completely redundant information, but given the sheer amount of Python information out there, some overlap is unavoidable.  All of the GrADS scripts and tutorials will remain up on the site, and if I do find anything new or interesting to write about with respect to GrADS, I'll be sure to do so.

Thanks again for all of the community engagement and feedback over the years, and I hope I am able to continue being a valuable educational resource for the community.

Python | Tutorial: Blue and Black Marble Aurora Figure using Background images and Day/Night Delimiter

Introduction

One of the strengths of Python, and specifically Cartopy, is that it's pretty easy to combine many different datasets with varying geographic coordinates onto a single map.  In this tutorial, I'll combine background "Blue Marble" and "Black Marble" images from NASA with the forecast aurora data from the Space Weather Prediction Center (SWPC: https://www.swpc.noaa.gov/) and simulated cloud cover from the GFS to produce an aurora forecast map.  Since one of my main goals for this blog is not to simply regurgitate what already exists, I'll note that if you are just looking to plot the aurora forecast, there is an excellent example (which I'll be building off of in this tutorial) of how to do so in the Cartopy gallery: here.
This is a somewhat advanced tutorial, so I won't focus on explaining a lot of the plotting minutiae or reasoning.  If you're just starting out, I recommend you check out some of my beginner tutorials first: Introduction to Cartopy, and my post on how to read in NetCDF data.
In this tutorial you will learn how to:
  • Load background images into Cartopy
  • Use the Cartopy "Nightshade" feature to blend the NASA blue and black marble images
  • Mask out shapefile geometries
  • Load the aurora forecast from the SWPC using Numpy "loadtxt"
  • Create a custom colormap using matplotlib "LinearSegmentedColormap"
  • Use the NearsidePerspective projection from Cartopy
The end result of the tutorial will be an image that looks similar to this, but will be different depending on the time of day, auroral activity, and cloud cover:

Final Product: Forecast aurora probability and simulated GFS clouds, valid at the indicated UTC time.



Loading the background images

The first step is to download the "Blue Marble" and "Black Marble" background images to load into the script.  These are pretty easy to grab; I will simply provide links to the GeoTIFF 0.1 degree versions of each image.  Save them to a folder where your Python script can access them.
If these don't work you can try the parent sites: Blue Marble here, and Black Marble here.
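
If you'd rather script the download, here is a minimal sketch using urlopen (the same function used later in this tutorial).  Note that the URL below is only a placeholder; substitute the actual GeoTIFF link from the NASA pages above.

from urllib.request import urlopen

## NOTE: placeholder URL -- replace with the real GeoTIFF link from NASA
bm_url='https://example.com/BlueMarble_0.1deg.tif'
with open('RenderData.tif','wb') as f:
    f.write(urlopen(bm_url).read())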

Now that you have the data, it's time to load it into Python.  Before that however, the required packages for the entire tutorial need to be imported:

try:
    from urllib2 import urlopen
except ImportError:
    from urllib.request import urlopen

from io import StringIO

import numpy as np
from datetime import datetime, timedelta
import cartopy.crs as ccrs
from cartopy import feature as cfeature
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap 
from cartopy.feature.nightshade import Nightshade
from matplotlib.path import Path
from cartopy.mpl.patch import geos_to_path
import netCDF4 as NC

Note the"try" statement is used to make the script compatible with both Python 2.7 and Python 3 urllib syntax.  Now, once the data is imported, I'm going to define a common function to load the "Marble" images, since they have identical lat/lon boundaries.

def load_basemap(map_path):
    img=plt.imread(map_path)
    img_proj = ccrs.PlateCarree()
    img_extent = (-180, 180, -90, 90)
    return img, img_proj,img_extent

The above function is really straightforward.  The image is first loaded into an array using "imread" from matplotlib, the image projection for Cartopy is set to PlateCarree (since we are working in lat/lon coordinate space), and the image extent is set to span the entire globe (which is obvious from looking at the image).  To investigate a little further, we can call this function and look at the shape of the image array.
Assuming the image is in the same folder as your script...

map_path='RenderData.tif'
bm_img,bm_proj,bm_extent=load_basemap(map_path)
print(np.shape(bm_img))

returns: (1800, 3600, 3)

Notice that the returned array has 3 dimensions despite the fact that it is a 2D image: the 3rd axis is where the individual red/green/blue channel information is stored, whereas the first two axes correspond to the y and x pixels.  To demonstrate this, I'll plot each channel separately without geographic information.

plt.figure(figsize=(8,14))
ax=plt.subplot(311)
plt.pcolormesh(bm_img[:,:,0],cmap='Reds')
ax=plt.subplot(312)
plt.pcolormesh(bm_img[:,:,1],cmap='Greens')
ax=plt.subplot(313)
plt.pcolormesh(bm_img[:,:,2],cmap='Blues')
plt.show()

This produces the following image:
RGB channels for the Blue Marble image (Yikes! It's upside down!)
Note that where all three channels are saturated, the combined RGB image will be white (e.g., Antarctica), whereas where all three channels are faded, the RGB image will appear black (e.g., the oceans).  Also notice that the image is upside down; this will be addressed in the next part.

The question is: how can we plot the combined RGB image from the 3-dimensional array?  Instead of the "pcolormesh" or "contourf" functions, the "imshow" function is used.  This function is versatile and naturally takes a 3D array, assuming the 3rd axis (if there is one) holds the RGB channels.  It also takes the extent argument, which defines the image corners without needing corresponding coordinate arrays.

plt.figure(figsize=(11,8))
ax=plt.subplot(111,projection=ccrs.PlateCarree())
ax.imshow(bm_img,extent=bm_extent,transform=bm_proj,origin='upper')
plt.show()

This results in the following image:

Blue Marble
I won't go into detail on how to load the black marble; instead, here is the code used to load both images as they are read into the final product.


bm_img,bm_proj,bm_extent=load_basemap('RenderData.tif')
nt_img,nt_proj,nt_extent=load_basemap('BlackMarble_2016_01deg_geo.tif')

Blend Images using the Day/Night Delimiter

This next section is the crux of the whole tutorial: I'll explain how to use Cartopy's Nightshade feature (Note: requires version 0.17 or higher) to mask out the daytime blue marble and show the nighttime black marble image.  For this, I'll use the matplotlib "Path" function to draw a polygon "path" connecting the coordinate information associated with the Nightshade function.

The Nightshade function is a neat feature that draws the day/night terminator on the globe for a given datetime object.  In the full example, the datetime object will be defined by the aurora forecast, but to demonstrate Nightshade, datetime.utcnow() will be used.

For a simple demonstration, copy the code used to generate the blue marble image and add a couple of lines (this assumes you are following the tutorial with the package imports written as above):


plt.figure(figsize=(11,8))
ax=plt.subplot(111,projection=ccrs.PlateCarree())
ax.imshow(bm_img,extent=bm_extent,transform=bm_proj,origin='upper')
ax.add_feature(Nightshade(datetime.utcnow()),zorder=2,alpha=0.8,color='r')
plt.title(datetime.utcnow().strftime('%c'))
plt.show()

This will essentially mask out the "night sky" over the blue marble image; it's that simple!
Night masked out.

Now the goal here is simple: replace the "red" masked-out section with the black marble image.  To do that, I'll clip the black marble image to the region inside the Nightshade polygon.

The first step is to define the Nightshade "shape" from the Nightshade function:

nshade=Nightshade(datetime.utcnow())
transform=nshade.crs

Now we have both the nightshade shape, and the nightshade coordinate reference system, which is needed to ensure the projection transform is performed correctly.  From the shape, we grab the geometries (i.e., the polygons, or in this case ... polygon; singular).

geoms=list(nshade.geometries())
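
As a quick sanity check, you can confirm that the night side is described by a single polygon:

print(len(geoms))  ## expect 1: one polygon covering the night side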

Here is the tricky part: because the Nightshade coordinate reference system is NOT Plate Carree (it's a rotated pole projection, for those interested), it's important that the transform is done correctly.  If you assume the transform is Plate Carree, you will end up with a mask that is wrong.  However, the matplotlib Path function requires a matplotlib transform, not a Cartopy crs object.  So, to do this conversion, I will define a "dummy" transform when defining my Cartopy subplot that equals the crs from the nightshade object.

fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(1, 1, 1, projection=ccrs.PlateCarree(),transform=nshade.crs)

Now that we have a defined figure and a matplotlib transform that matches the nightshade object, we're ready to mask.  First we plot the blue and black marble images on the same subplot, and then use the geos_to_path and Path functions imported at the top of the script:

plt.figure(figsize=(11,8))
ax=plt.subplot(111,projection=ccrs.PlateCarree(),transform=nshade.crs)
im0=ax.imshow(bm_img,extent=bm_extent,transform=bm_proj,origin='upper',zorder=1) 
im1=ax.imshow(nt_img,extent=nt_extent,transform=nt_proj,origin='upper',zorder=2)
path = Path.make_compound_path(*geos_to_path(geoms))
im1.set_clip_path(path, transform=ax.get_transform())
plt.title(datetime.utcnow().strftime('%c'))
plt.show()

And the blended image is complete.  Note that the subplot transform is used in the set_clip_path function:

Blended image based on current datetime.

Now, it's time to plot the aurora forecast.

Loading and Plotting the Aurora Forecast

This section will mostly be a reproduction of the code demonstrated here, with some minor differences.  Essentially, Numpy loadtxt is used to load the ASCII data from the SWPC into a 2-dimensional array, and the array is plotted on the map.  The first step is to define a cool-looking aurora colormap.  This is a direct reproduction from the Cartopy example, because I think they did a nice job with the color scales.

def aurora_cmap():
    """Return a colormap with aurora like colors"""
    stops = {'red': [(0.00, 0.1725, 0.1725),
                     (0.50, 0.1725, 0.1725),
                     (1.00, 0.8353, 0.8353)],

             'green': [(0.00, 0.9294, 0.9294),
                       (0.50, 0.9294, 0.9294),
                       (1.00, 0.8235, 0.8235)],

             'blue': [(0.00, 0.3843, 0.3843),
                      (0.50, 0.3843, 0.3843),
                      (1.00, 0.6549, 0.6549)],

             'alpha': [(0.00, 0.0, 0.0),
                       (0.70, 1.0, 1.0),
                       (1.00, 1.0, 1.0)]}

    return LinearSegmentedColormap('aurora', stops)

Basically, a dictionary with 0-1 values for red, green, blue, and transparency (alpha) is passed to the LinearSegmentedColormap function, which defines a colormap that linearly interpolates between the "stops" defined by the user for each channel.
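
To see the interpolation in action, you can sample the colormap directly; calling a matplotlib colormap with a normalized 0-1 value returns an (r, g, b, alpha) tuple:

cmap=aurora_cmap()
print(cmap(0.0))   ## transparent aurora green (alpha=0)
print(cmap(0.99))  ## nearly white-green, fully opaque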

Now that the colormap is defined, you can pull the aurora data from the SWPC and load it into an array:

# Pull the latest aurora probability map from the SWPC
url = 'http://services.swpc.noaa.gov/text/aurora-nowcast-map.txt'

response_text = StringIO(urlopen(url).read().decode('utf-8'))
aurora_prob = np.loadtxt(response_text)
# Read the forecast date and time from the file header
response_text.seek(0)
for line in response_text:
    if line.startswith('Product Valid At:', 2):
        dt = datetime.strptime(line[-17:-1], '%Y-%m-%d %H:%M')

Again, this code is largely copied from the Cartopy example, but essentially you're reading the text from the URL into the Numpy loadtxt function, which is a natural fit for this data: by default, loadtxt assumes "#" marks a comment and whitespace is the delimiter, so in this instance no additional arguments are required.  Then, the valid time of the forecast is read into a datetime object.
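
If you've never used loadtxt before, this tiny self-contained example shows those defaults at work ("#" lines are skipped and whitespace splits the columns):

sample=StringIO("# a comment line\n0 1 2\n3 4 5\n")
print(np.loadtxt(sample))
## [[0. 1. 2.]
##  [3. 4. 5.]]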

Since it is known that the lat/lon data spans -90 to 90 and -180 to 180, we define lat/lon arrays based on the shape of the aurora data:


lons=np.linspace(-180,180,np.shape(aurora_prob)[1])
lats=np.linspace(-90,90,np.shape(aurora_prob)[0])

Finally, combining everything together to produce the "cloud-free" aurora forecast:


plt.figure(figsize=(11,8))
ax=plt.subplot(111,projection=ccrs.PlateCarree(),transform=nshade.crs)
im0=ax.imshow(bm_img,extent=bm_extent,transform=bm_proj,origin='upper',zorder=1)
im1=ax.imshow(nt_img,extent=nt_extent,transform=nt_proj,origin='upper',zorder=2)
path = Path.make_compound_path(*geos_to_path(geoms))
im1.set_clip_path(path,transform=ax.get_transform())   
Z=ax.contourf(lons,lats,np.ma.masked_less(aurora_prob,1),levels=np.linspace(0,100,61),
              transform=ccrs.PlateCarree(),zorder=5,cmap=aurora_cmap(),antialiased=True)
plt.title(datetime.utcnow().strftime('%c'))
plt.show()

Which gives you the following image:

Aurora Forecast with Plate Carree Projection

Finally, replacing the PlateCarree map projection with the NearsidePerspective projection, we get something a little bit closer to the final image:


fig = plt.figure(figsize=[10, 8])
ax = plt.axes(projection=ccrs.NearsidePerspective(-170, 45),transform=nshade.crs)
im0=ax.imshow(bm_img,extent=bm_extent,transform=bm_proj,origin='upper',zorder=1)
im1=ax.imshow(nt_img,extent=nt_extent,transform=nt_proj,origin='upper',zorder=2)
path = Path.make_compound_path(*geos_to_path(geoms))
im1.set_clip_path(path, transform=ax.get_transform())
Z=ax.contourf(lons,lats,np.ma.masked_less(aurora_prob,1), levels=np.linspace(0,100,61), transform=ccrs.PlateCarree(),
              zorder=5,cmap=aurora_cmap(),antialiased=True)
plt.title("Forecast Aurora Probabilty \n Valid: %s"%dt.strftime('%b/%d/%Y %H:%M UTC'),
          loc='left',fontweight='bold')

Orthographic / Nearside Perspective Projection

Adding A Cloud Forecast from the GFS

The final step in this tutorial is to add cloud cover from the GFS to the map.  This is unnecessary if you're happy with the above image, but my personal thought is that you aren't going to see the aurora if it's cloudy, so the forecast cloud cover is a nice overlay.  It's really easy to add the clouds: simply pull the GFS data from the NOMADS server using netCDF4, match the GFS model time to the aurora forecast time, and overlay the total cloud cover variable on the map.  Again, my assumption is that if you've made it this far, you're comfortable working with GFS data using netCDF4, so I won't explain in detail why the following code works:


dhour='12' ## or 00, or 18 or 06, your choice
dtstr=dt.strftime('%Y%m%d')
gfs_path='http://nomads.ncep.noaa.gov:80/dods/gfs_0p25_1hr/gfs%s/gfs_0p25_1hr_%sz'%(dtstr,dhour)

gfs_data=NC.Dataset(gfs_path,'r')

gfslons=gfs_data.variables['lon']
gfslats=gfs_data.variables['lat']

tres=gfs_data.variables['time'].resolution
tmin=datetime.strptime(gfs_data.variables['time'].minimum,'%Hz%d%b%Y')
times=[timedelta(hours=float(i)*24.*tres)+tmin for i in range(len(gfs_data.variables['time'][:]))]

tidx=np.argmin(np.abs(np.array(times)-dt))

clouds=gfs_data.variables['tcdcclm'][tidx,:]

Then simply add the following beneath the code used to plot the background images and the aurora forecast:


ax.pcolormesh(gfslons[:],gfslats[:],np.ma.masked_less(clouds,75.),transform=ccrs.PlateCarree(),
    alpha=0.7,cmap='Greys_r',zorder=4,vmin=50,vmax=100)

And voila:

Final Product

Final Notes

That's all for this tutorial.  It was a long one, but I think there is a lot of useful stuff in here.  One final note: reprojecting the background images can lead to really poor quality results; I find this particularly true for Lambert and polar stereographic projections, so be aware of that.
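
One hedged mitigation, assuming you are plotting with Cartopy's imshow as above: the regrid_shape keyword controls the resolution of the mesh used when an image is reprojected, and raising it from its default can sharpen the reprojected background at the cost of speed.

## Sketch: raise regrid_shape when reprojecting the background image
ax.imshow(bm_img,extent=bm_extent,transform=bm_proj,origin='upper',regrid_shape=2000)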

Python | Tutorial: Intro to Cartopy

Introduction:

There are currently two main Python libraries for plotting geographic data on a map: Cartopy and Basemap.  New users should use Cartopy, since Cartopy will replace Basemap and support for Basemap is expected to wrap up in 2020.  Therefore, this tutorial will focus on Cartopy.  If you want to use Basemap (e.g., for plotting in "3D"), here are a couple of links to existing tutorials and examples to help get you started.  Furthermore, many concepts described here may be useful for understanding Basemap.

Cartopy:

Cartopy is a geospatial plotting library built on top of Numpy and Matplotlib that makes it easy to plot gridded data, shapefiles, and other geographic data on over 30 different map projections.  Furthermore, the Cartopy "transform" functionality makes it straightforward to convert data from one projection to another (a short sketch follows the list below).  In this tutorial you will learn how to use Cartopy to:
  • Plot GFS surface temperature data on a map
  • Mask out land surfaces to plot a map of sea surface temperature and ice-cover from the GFS
  • Plot a regional map of surface temperature with US states
  • Use the transform function to plot GFS data on a different map projection
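
As a taste of that transform machinery, here is a minimal sketch that converts a single lon/lat point (Boulder, CO, an arbitrary example) into North Polar Stereographic map coordinates:

import cartopy.crs as ccrs

stereo=ccrs.NorthPolarStereo()
x,y=stereo.transform_point(-105.27,40.02,ccrs.PlateCarree())
print(x,y)  ## the same location, now in the stereographic projection's coordinates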

This tutorial accesses the NOMADS data server using the netCDF4 library; if you are unfamiliar with doing this, I recommend you see my tutorial on reading NetCDF data.


Getting Started - Grabbing GFS Data:
The following code is used to get started with this tutorial: it imports all of the required modules, pulls the GFS data from the NOMADS data server, and loads it into arrays.  Once the data is loaded, the remainder of the tutorial will focus on Cartopy.

import numpy as np
from matplotlib import pyplot as plt
from cartopy import crs as ccrs
import cartopy.feature as cfeature
import netCDF4 as nc
import datetime as DT 
datetime=(DT.datetime.utcnow()-DT.timedelta(hours=6)).strftime('%Y%m%d')
fhour=20

nomads_path="https://nomads.ncep.noaa.gov:9090/dods/gfs_0p25/gfs{}/gfs_0p25_00z".format(datetime)

datafile=nc.Dataset(nomads_path)
lat=datafile.variables['lat'][:]
lon=datafile.variables['lon'][:] 
sfcT=datafile.variables['tmpsfc'] #Keep meta data for now!
sfcT_data=sfcT[fhour,:]

title=sfcT.long_name
time=datafile.variables['time']
tstart=DT.datetime.strptime(time.minimum,'%Hz%d%b%Y')
timestr=(tstart+DT.timedelta(hours=time.resolution*24.*fhour)).strftime('%c')

Now that your data is loaded, we initialize a figure and subplot using matplotlib's "projection" keyword to link to the Cartopy library.  Once you've attached the Cartopy projection to your subplot, there are a number of additional objects and classes you can add to your figure.  In the first example, I'll demonstrate by calling upon the "coastlines" attribute to plot continents on the map.

fig=plt.figure(figsize=(11,5))
ax=plt.subplot(111,projection=ccrs.PlateCarree())

Now your figure and subplot are defined.  The Cartopy "PlateCarree()" projection is a basic cylindrical map projection that accepts coordinate information in the form of lat/lon pairs.  The lat/lon data can be either 1D or 2D, but essentially, if your coordinate information is lat/lon, your projection is PlateCarree().  You can pass two keywords into PlateCarree (central_longitude and globe), but in most instances you won't need to.  Occasionally, I reset the central_longitude to 180 instead of its default of zero.
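
For example, a Pacific-centered map requires only a change to the projection definition; just remember that once the map projection no longer matches your data's coordinates, the plotting calls need transform=ccrs.PlateCarree():

ax=plt.subplot(111,projection=ccrs.PlateCarree(central_longitude=180))
Z=plt.pcolormesh(lon,lat,sfcT_data,cmap='jet',transform=ccrs.PlateCarree())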

The code to plot the map is:
Z=plt.pcolormesh(lon,lat,sfcT_data,cmap='jet')

pos=ax.get_position()

plt.title(title,fontweight='bold',loc='left')
plt.title("Valid:{} UTC".format(timestr),loc='right')
ax.coastlines()

cbar_ax=fig.add_axes([pos.x1+0.01,pos.y0,0.015,pos.height])
cbar=plt.colorbar(Z,cax=cbar_ax)
cbar.set_label(title)

datafile.close()

plt.show()

And you'll get an image similar to this one:
GFS surface temperature from Cartopy

Now, let's say you wanted to mask out land areas and add sea ice to the map, to focus on the oceans.
Adding the sea ice is easy: all it requires is for you to pull the icecsfc variable from the GFS data and plot it on top of the surface temperature map:

ice=datafile.variables['icecsfc'][fhour,:]
plt.pcolormesh(lon,lat,np.ma.masked_less(ice,0.1),vmin=0,vmax=1,cmap='Greys_r',zorder=3)

We use the Numpy masked array (ma) functionality to mask out grid cells where the ice cover is less than 10%.
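
Here is a quick standalone illustration of masked_less; masked cells are simply not drawn by pcolormesh:

print(np.ma.masked_less(np.array([0.0,0.05,0.5,1.0]),0.1))
## [-- -- 0.5 1.0]
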
To mask out land areas, we make use of the "Cartopy Feature" module, which handles shapefiles.  The Cartopy Feature module connects seamlessly to the Natural Earth dataset and allows anyone to plot a host of different GIS datasets without needing to download individual shapefiles beforehand.  The interface even has many pre-defined features, including land, which makes masking out the land areas a breeze.

ax.add_feature(cfeature.LAND,zorder=4,color='gray')

In the above code, the "add_feature" function is used, and the pre-defined "LAND" feature from the Cartopy Feature package is supplied with a couple of basic keyword arguments.  Putting it all together:


fig=plt.figure(figsize=(11,5))
ax=plt.subplot(111,projection=ccrs.PlateCarree())
Z=plt.pcolormesh(lon,lat,sfcT_data-273.15,cmap='jet',vmin=-1,vmax=30.)
ice=datafile.variables['icecsfc'][fhour,:]
plt.pcolormesh(lon,lat,np.ma.masked_less(ice,0.1),vmin=0,vmax=1,cmap='Greys_r',zorder=3)
ax.add_feature(cfeature.LAND,zorder=4,color='gray')
pos=ax.get_position()

plt.title(title,fontweight='bold',loc='left')
plt.title("Valid:{} UTC".format(timestr),loc='right')
ax.coastlines(color='k',zorder=5)

cbar_ax=fig.add_axes([pos.x1+0.01,pos.y0,0.015,pos.height])
cbar=plt.colorbar(Z,cax=cbar_ax)
cbar.set_label(title)

datafile.close()

plt.show()

Running the above code will give you an image like this:
GFS SST and Sea-ice cover


Now, let's zoom in over the United States and add states to the map.  To clip the extent, the "set_extent" function is used:

ax.set_extent([-125, -65, 25, 55],crs=ccrs.PlateCarree())

The list corresponds to the lon/lat boundaries [western-most longitude, eastern-most longitude, southern-most latitude, northern-most latitude], and the crs argument indicates which transform is being applied.  To add the states, you define a feature from the Natural Earth database; the "scale" argument sets the resolution of the underlying shapefile (here, 50m):

states = cfeature.NaturalEarthFeature(category='cultural', scale='50m', facecolor='none',
                         name='admin_1_states_provinces_shp',edgecolor='k')
ax.add_feature(states,zorder=5)

By changing the extent and adding the states, you'll get a map that looks like this:

GFS SST centered over the United States


You'll notice that the "LAND" feature masks out the Great Lakes, which is unfortunate, and to my knowledge the only way to get both the land mask and the SST over lakes using GFS surface temperature data, is to use a Lakes shapefile with shapefile clipping, a technique that is beyond this simple introduction, but may be the subject of a future tutorial.

Finally, if you want to use a different map projection to plot the data, e.g., a North Polar Stereographic projection, you simply define a different projection when setting up your subplot and use the "transform" keyword in the plotting function to specify that you are transforming your lon/lat coordinates from the PlateCarree() projection:

ax=plt.subplot(111,projection=ccrs.NorthPolarStereo())
Z=plt.pcolormesh(lon,lat,sfcT_data-273.15,cmap='jet',vmin=-1,vmax=30.,transform=ccrs.PlateCarree())
plt.pcolormesh(lon,lat,np.ma.masked_less(ice,0.1),vmin=0,vmax=1,cmap='Greys_r',zorder=3,transform=ccrs.PlateCarree())
ax.set_extent([-180, 180, 45, 90],crs=ccrs.PlateCarree()) 

Note that we also reset the extent to focus on the northern latitudes.  You should get something that looks like this:
GFS Sea-ice and SST Polar Plot

That is the extent of this introductory tutorial on how to use Cartopy to work with atmospheric data.  You can learn more from the Cartopy website and get a list of different map projections here: Cartopy Map Projections.  Future tutorials on Cartopy will focus on working with shapefiles, transforming individual points, working with data without lat/lon coordinates, and putting fine details on your map.

Python | Tutorial: Introduction to reading netCDF files

Introduction

NetCDF (Network Common Data Format) is an extremely useful way to store, transfer, subset, and otherwise work with gridded data sets.  This format is commonly encountered in the atmospheric science, climate, and oceanography communities.  It is much easier to work with than the "grib" data format, as it allows metadata attributes to be attached directly to the datafile.  Finally, there are numerous standalone programs (e.g., ncview, nco) and bindings (Fortran, Python, IDL, Matlab, GrADS: see here for my post on netCDF with GrADS) that make it easy to work with NetCDF data regardless of platform or programming language.  Because NetCDF is so common, and because most of the blog posts on this site require some knowledge of how to work with NetCDF data, I'm going to provide a short tutorial on how to read and work with NetCDF files in Python.  In this tutorial, you will learn how to:
  • Open a NetCDF file using "NetCDF4"
  • Query the file attributes, dimensions, and variables
  • Convert NetCDF integer time to Python "datetime" objects
  • Extract monthly temperature anomaly
  • Interface with the NOMADS OpenDAP-alt data repository
Important Note: There are two supported packages (that I'm aware of) that can open and translate NetCDF data: NetCDF4 and xarray.  For this tutorial, I will focus on the NetCDF4 package; perhaps I will post an xarray tutorial later on.  If you're already reading and working with NetCDF data using xarray and have no intention of changing, then maybe this tutorial isn't for you.

Opening, and Querying a NetCDF file

First off, if you are going to work with the NetCDF data, you will need to have the NetCDF4 package installed and imported.

Once you have that package installed, we'll need to grab some data.  For this tutorial, I'll be using monthly mean air temperature data from the NCEP reanalysis.  To get this data, go to this website and download the "air.mon.mean.nc" file and place it in the same directory as the script you'll be working with.  Once that is done, you're ready to proceed with the tutorial.

The first step is to import the relevant packages:

import numpy as np
import netCDF4 as NC
from matplotlib import pyplot as plt 
from cartopy import crs as ccrs
import datetime as DT

I included the Cartopy import to plot the data on a map.  If you don't have this package and don't want to install it, that's fine; simply remove that line from the import section and skip the mapping portion of the tutorial.

Once the packages are imported, set the path to the file, and use the "Dataset" function to open the data file:

## Since file is in current folder, the path is just the file name
ncdata_path='air.mon.mean.nc'
ncdata=NC.Dataset(ncdata_path,'r')
print(ncdata)

This will result in the following output to the screen:

root group (NETCDF4_CLASSIC data model, file format HDF5):
    description: Data from NCEP initialized reanalysis (4x/day).  These are the 0.9950 sigma level values
    platform: Model
    Conventions: COARDS
    NCO: 20121012
    history: Thu May  4 20:11:16 2000: ncrcat -d time,0,623 /Datasets/ncep.reanalysis.derived/surface/air.mon.mean.nc air.mon.mean.nc
Thu May  4 18:11:50 2000: ncrcat -d time,0,622 /Datasets/ncep.reanalysis.derived/surface/air.mon.mean.nc ./surface/air.mon.mean.nc
Mon Jul  5 23:47:18 1999: ncrcat ./air.mon.mean.nc /Datasets/ncep.reanalysis.derived/surface/air.mon.mean.nc /dm/dmwork/nmc.rean.ingest/combinedMMs/surface/air.mon.mean.nc
/home/hoop/crdc/cpreanjuke2farm/cpreanjuke2farm Mon Oct 23 21:04:20 1995 from air.sfc.gauss.85.nc
created 95/03/13 by Hoop (netCDF2.3)
Converted to chunked, deflated non-packed NetCDF4 2014/09
    title: monthly mean air.sig995 from the NCEP Reanalysis
    References: http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.derived.html
    dataset_title: NCEP-NCAR Reanalysis 1
    dimensions(sizes): lat(73), lon(144), time(863)
    variables(dimensions): float32 lat(lat), float32 lon(lon), float64 time(time), float32 air(time,lat,lon)
    groups: 

Note that a lot of the information above is probably not relevant to you, but you can see some pieces in there that might be useful.  One thing you'll notice is that the printed output has colon (:) separated text; these are the global metadata attributes and structures attached to the file, and they can be accessed individually using the following syntax:
For example, accessing the file "description"

print(ncdata.description)

Returns:

Data from NCEP initialized reanalysis (4x/day).  These are the 0.9950 sigma level values

Which you'll notice matches the text after the colon that follows the word "description".

Though, it can be kind of burdensome to print the entire netCDF file just to see what's in it before you can start extracting data.  One way to find out what attributes the file has, without dumping the entire file to the screen, is to use the "ncattrs" function, which is shorthand for "NetCDF (nc) + attributes (attrs)":

print(ncdata.ncattrs())

Returns:

['description', 'platform', 'Conventions', 'NCO', 'history',
'title', 'References', 'dataset_title']

You can use the "ncgetattr" function to return an attribute value for a given attribute.  Following the example above:

print(ncdata.getncattr('description'))

Returns:

Data from NCEP initialized reanalysis (4x/day).  These are the 0.9950 sigma level values

You may be asking: why would I use the "getncattr" function to get an attribute instead of just using the shorthand "ncdata.description"?  The advantage is that getncattr allows you to pass string information, whereas the shorthand method does not, which makes it more versatile in case you don't know what the attributes are.
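
Because getncattr accepts a string, you can combine it with ncattrs to print every global attribute without knowing any names ahead of time:

for attr in ncdata.ncattrs():
    print(attr,'=',ncdata.getncattr(attr))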

Loading NetCDF variables and Dimensions

Dimensions:

The dimensions structure holds the dimensions used in the file.  In geographic datasets, these are typically x, y, z, and time.  x and y are often longitude and latitude, but not always (for example, WRF data has x and y dimensions named "west_east" and "south_north").  z is usually a vertical coordinate (sigma, height, pressure, etc.), and time is usually a floating point value referenced to a specific start time.  The file dimensions can be printed simply:

print(ncdata.dimensions)

Returns:

OrderedDict([('lat', <class 'netCDF4._netCDF4.Dimension'>: name = 'lat', size = 73), 
('lon', <class 'netCDF4._netCDF4.Dimension'>: name = 'lon', size = 144), 
('time', <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'time', size = 863)])

What you get is a dictionary of dimensions with names and sizes: in this instance, latitude, longitude, and time.  The key point about the dimensions is that EACH variable's dimensions will be some combination of the dimensions within this dictionary.
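
You can check this for any variable through its "dimensions" attribute:

print(ncdata.variables['air'].dimensions)  ## ('time', 'lat', 'lon')
print(ncdata.variables['lat'].dimensions)  ## ('lat',)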

Variables:

Similarly, the variables can be accessed:

for v in ncdata.variables:
    print(v)

Returns:
lat
lon
time
air

Similar to the dimensions object, the variables object is a dictionary whose members are accessed using keys.  To demonstrate, I'm going to access the "time" variable and convert the time values into Python datetime objects.
Printing the time variable:

print(ncdata.variables['time'])

Returns:
<class 'netCDF4._netCDF4.Variable'>
float64 time(time)
    long_name: Time
    delta_t: 0000-01-00 00:00:00
    avg_period: 0000-01-00 00:00:00
    prev_avg_period: 0000-00-01 00:00:00
    standard_name: time
    axis: T
    units: hours since 1800-01-01 00:00:0.0
    actual_range: [1297320. 1927008.]
unlimited dimensions: time
current shape = (863,)
filling on, default _FillValue of 9.969209968386869e+36 used

This tells us that the time data is stored as a 64-bit float (double), in hours since 1800-01-01 00 UTC.  We also see that its only dimension is the "time" dimension.  From this, we can use the datetime strptime and timedelta functions to convert the floating point values into a list of datetime objects.  Knowing the string format of the "units" attribute, the start time is parsed as follows:

starttime=DT.datetime.strptime(ncdata.variables['time'].units,
        'hours since %Y-%m-%d %H:%M:0.0')

Now, pulling the time data into a Numpy array, and using list iteration, we can build a list of datetime objects:

time_values=ncdata.variables['time'][:]
strptimes=[starttime+DT.timedelta(hours=i) for i in time_values]

Okay, now that we have a list of times, let's use it to get a time index for a specific input time.  For example: March 1997:

choice='1997-03-01'
choice_strp=DT.datetime.strptime(choice,'%Y-%m-%d')
tidx=np.argmin(np.abs(np.array(strptimes)-choice_strp))

print("Index:%i, time:"%tidx, strptimes[tidx])

Returns:

Index:590, time: 1997-03-01 00:00:00

Now, finally, let's plot the data!  Looking back at the variables, we also see lat, lon, and air.  Each variable can be printed to find out its metadata, but for brevity, I'll just tell you that lon and lat are each 1-dimensional arrays that vary with their corresponding dimensions, and air is air temperature varying in (time, lat, lon).

Simply accessing the data for each dimension:

t2=ncdata.variables['air'][tidx,:]
lon=ncdata.variables['lon'][:]
lat=ncdata.variables['lat'][:]

and plotting it:

plt.figure(figsize=(10,7))
plt.pcolormesh(lon,lat,t2,cmap='jet')
plt.show()
Simple air temperature plot

Or, if you want to include the Cartopy mapping:

plt.figure(figsize=(10,7))
ax=plt.subplot(111,projection=ccrs.PlateCarree())
Z=ax.pcolormesh(lon,lat,t2,cmap='jet')
plt.colorbar(Z)
ax.coastlines()
plt.show()

With a map and colorbar.

Finally, since 1997 was a famous El Niño year, we can plot the annual mean temperature anomaly by passing a tuple of time indices to the time axis of the data array.

month_idx=[i for i in range(len(strptimes)) if strptimes[i].strftime('%Y')=='1997']
nino_mean=np.mean(ncdata.variables['air'][tuple(month_idx),:],axis=0)
total_mean=np.mean(ncdata.variables['air'][:],axis=0)

plt.figure(figsize=(10,7))
ax=plt.subplot(111,projection=ccrs.PlateCarree())
Z=ax.pcolormesh(lon,lat,nino_mean-total_mean,cmap='coolwarm',vmin=-2.5,vmax=2.5)
plt.title("Annual Mean Temperature Anomaly: 1997")
plt.colorbar(Z)
ax.coastlines()
plt.show()

Basically, we build a tuple of indices matching the times that fall in 1997, and average the temperature data over those indices to get the 1997 mean temperature.  We then compute a total mean, which takes the mean along the entire time axis, and difference the two fields.  This results in the following image:
1997 ENSO event shown in the 0.995 sigma NCEP reanalysis temperature field

Interfacing with the NOMADS server

One great aspect of using the NetCDF library is that you can access near-real-time model data from the NOAA NOMADS data server.  This section is a short add-on, since there isn't a whole lot new to cover, but it will show you how to access that data.

First, a couple of simple lines of code are required to a) get today's date, and b) define a filepath to the data.  In this example, I'll use the 0.25 degree GFS data; explore the NOMADS server for more options.  Assuming you have imported the packages above:

today=DT.datetime.now().strftime('%Y%m%d')
input_file='http://nomads.ncep.noaa.gov:80/dods/gfs_0p25/gfs%s/gfs_0p25_00z'%today
gfsdata=NC.Dataset(input_file,'r')

Now, you are welcome (encouraged?) to query the file using the same techniques as above, but for brevity, I'll just write down the code to store the lat/lon values into arrays (which, similarly, are 1-D arrays corresponding to x and y).  Note that time is stored in a very similar way as in the NCEP reanalysis file and can be handled identically, but that won't be demonstrated here.

lons=gfsdata.variables['lon'][:]
lats=gfsdata.variables['lat'][:] 

Then the data can be extracted and plotted (in this example, 2 meter temperature at time index 6):

t2=gfsdata.variables['tmp2m'][6,:]

plt.figure(figsize=(11,9))
ax=plt.subplot(111,projection=ccrs.PlateCarree())
plt.pcolormesh(lons,lats,t2,cmap='jet')
plt.colorbar()
plt.title(gfsdata.variables['tmp2m'].long_name)
ax.coastlines()
plt.show()

Note that the above example gets the figure title directly from the NetCDF metadata.

Finally, it's always a good practice to close your NetCDF files when you are done using them:

gfsdata.close()

Note that when you close a NetCDF file, any data that you have already loaded into a Numpy array is still accessible.  However!  Any variables that are just pointing to the NetCDF object are no longer accessible.  For example:

t2_pointer=gfsdata.variables['tmp2m']
t2_values=gfsdata.variables['tmp2m'][6,:]

gfsdata.close()
 
t2_values ## Still here!
t2_pointer ## GONE!

Wrap up

That's it for the introduction to NetCDF tutorial.  There is a lot left to be discovered and learned when it comes to working with NetCDF data using Python, but I have tried to provide a basic overview of how to open, query, and load data from netCDF files with an eye toward the more common applications.  There is a lot of great documentation out there if you want to learn more; a good place to start is the Unidata documentation.

So get out there and start working with NetCDF data!