Basic Filtering#

The Filter Object#

The core object in GCM-Filters is the gcm_filters.Filter object. Its full documentation below enumerates all possible options.

gcm_filters.Filter(filter_scale: float, dx_min: float, filter_shape: gcm_filters.filter.FilterShape = FilterShape.GAUSSIAN, transition_width: float = 3.141592653589793, ndim: int = 2, n_steps: int = 0, grid_type: gcm_filters.kernels.GridType = GridType.REGULAR, grid_vars: dict = <factory>) None[source]#

A class for applying diffusion-based smoothing filters to gridded data.

Parameters
  • filter_scale (float) – The filter scale, which has different meaning depending on filter shape

  • dx_min (float) – The smallest grid spacing. Should have same units as filter_scale

  • n_steps (int, optional) – Number of total steps in the filter n_steps == 0 means the number of steps is chosen automatically

  • filter_shape (FilterShape) –

    • FilterShape.GAUSSIAN: The target filter has shape \(e^{-(k filter_scale)^2/24}\)

    • FilterShape.TAPER: The target filter has target grid scale Lf. Smaller scales are zeroed out. Scales larger than pi * filter_scale / 2 are left as-is. In between is a smooth transition.

  • transition_width (float, optional) – Width of the transition region in the “Taper” filter. This is a nondimensional parameter. Theoretical minimum is 1; not recommended.

  • ndim (int, optional) – Laplacian is applied on a grid of dimension ndim

  • grid_type (GridType) – what sort of grid we are dealing with

  • grid_vars (dict) – dictionary of extra parameters used to initialize the grid Laplacian

gcm_filters.filter_spec#
Type

FilterSpec

Details related to filter_scale, filter_shape, transition_width, and n_steps can be found in the Filter Theory. The following sections explain the options for grid_type and grid_vars in more detail.

Grid types#

To define a filter, we need to pick a grid and associated Laplacian that matches our data. The currently implemented grid types are:

In [1]: import gcm_filters

In [2]: list(gcm_filters.GridType)
Out[2]: 
[<GridType.REGULAR: 1>,
 <GridType.REGULAR_AREA_WEIGHTED: 2>,
 <GridType.REGULAR_WITH_LAND: 3>,
 <GridType.REGULAR_WITH_LAND_AREA_WEIGHTED: 4>,
 <GridType.IRREGULAR_WITH_LAND: 5>,
 <GridType.MOM5U: 6>,
 <GridType.MOM5T: 7>,
 <GridType.TRIPOLAR_REGULAR_WITH_LAND_AREA_WEIGHTED: 8>,
 <GridType.TRIPOLAR_POP_WITH_LAND: 9>,
 <GridType.VECTOR_C_GRID: 10>,
 <GridType.VECTOR_B_GRID: 11>]

This list will grow as we implement more Laplacians.

The following table provides an overview of these different grid type options: what grid they are suitable for, whether they handle land (i.e., continental boundaries), what boundary condition the Laplacian operators use, and whether they come with a scalar or vector Laplacian. You can also find links to example usages.

GridType

Grid

Handles land

Boundary condition

Laplacian type

Example

REGULAR

Cartesian grid

no

periodic

Scalar Laplacian

REGULAR_WITH_LAND

Cartesian grid

yes

periodic

Scalar Laplacian

see below

IRREGULAR_WITH_LAND

locally orthogonal grid

yes

periodic

Scalar Laplacian

Example: Different filter types; Example: Filtering on a tripole grid

MOM5U

Velocity-point on Arakawa B-Grid

yes

periodic

Scalar Laplacian

MOM5T

Tracer-point on Arakawa B-Grid

yes

periodic

Scalar Laplacian

TRIPOLAR_POP_WITH_LAND

locally orthogonal grid

yes

tripole

Scalar Laplacian

Example: Filtering on a tripole grid

VECTOR_C_GRID

Arakawa C-Grid

yes

periodic

Vector Laplacian

Example: Viscosity-based filtering with vector Laplacians

VECTOR_B_GRID

Arakawa B-Grid

no

periodic

Vector Laplacian

Grid types with scalar Laplacians can be used for filtering scalar fields (such as temperature), and grid types with vector Laplacians can be used for filtering vector fields (such as velocity).

Grid types for simple fixed factor filtering#

The remaining grid types are for a special type of filtering: simple fixed factor filtering to achieve a fixed coarsening factor (see also the Filter Theory). If you specify one of the following grid types for your data, gcm_filters will internally transform your original (locally orthogonal) grid to a uniform Cartesian grid with dx = dy = 1, and perform fixed factor filtering on the uniform grid. After this is done, gcm_filters transforms the filtered field back to your original grid. In practice, this coordinate transformation is achieved by area weighting and deweighting (see Filter Theory). This is why the following grid types have the suffix AREA_WEIGHTED.

GridType

Grid

Handles land

Boundary condition

Laplacian type

Example

REGULAR_AREA_WEIGHTED

locally orthogonal grid

no

periodic

Scalar Laplacian

REGULAR_WITH_LAND_AREA_WEIGHTED

locally orthogonal grid

yes

periodic

Scalar Laplacian

Example: Different filter types; Example: Filtering on a tripole grid

TRIPOLAR_REGULAR_WITH_LAND_AREA_WEIGHTED

locally orthogonal grid

yes

tripole

Scalar Laplacian

Example: Filtering on a tripole grid

Grid variables#

Each grid type from the above two tables has different grid variables that must be provided as Xarray DataArrays. For example, let’s assume we are on a Cartesian grid (with uniform grid spacing equal to 1), and we want to use the grid type REGULAR_WITH_LAND. To find out what the required grid variables for this grid type are, we can use this utility function.

In [3]: gcm_filters.required_grid_vars(gcm_filters.GridType.REGULAR_WITH_LAND)
Out[3]: ['wet_mask']

wet_mask is a binary array representing the topography on our grid. Here the convention is that the array is 1 in the ocean (“wet points”) and 0 on land (“dry points”).

In [4]: import numpy as np

In [5]: import xarray as xr

In [6]: ny, nx = (128, 256)

In [7]: mask_data = np.ones((ny, nx))

In [8]: mask_data[(ny // 4):(3 * ny // 4), (nx // 4):(3 * nx // 4)] = 0

In [9]: wet_mask = xr.DataArray(mask_data, dims=['y', 'x'])
In [10]: wet_mask.plot()
Out[10]: <matplotlib.collections.QuadMesh at 0x7f8c30485a60>
_images/wet_mask.png

We have made a big island.

Note

Some more complicated grid types require more grid variables. The units for these variables should be consistent, but no specific system of units is required. For example, if grid cell edge lengths are defined using kilometers, then the filter scale and dx_min should also be defined using kilometers, and the grid cell areas should be defined in square kilometers.

Creating the Filter Object#

We create a filter object as follows.

In [11]: filter = gcm_filters.Filter(
   ....:     filter_scale=4,
   ....:     dx_min=1,
   ....:     filter_shape=gcm_filters.FilterShape.TAPER,
   ....:     grid_type=gcm_filters.GridType.REGULAR_WITH_LAND,
   ....:     grid_vars={'wet_mask': wet_mask}
   ....: )
   ....: 

In [12]: filter
Out[12]: Filter(filter_scale=4, dx_min=1, filter_shape=<FilterShape.TAPER: 2>, transition_width=3.141592653589793, ndim=2, n_steps=16, grid_type=<GridType.REGULAR_WITH_LAND: 3>)

The string representation for the filter object in the last line includes some of the parameters it was initiliazed with, to help us keep track of what we are doing. We have created a Taper filter that will filter our data by a fixed factor of 4.

Applying the Filter#

We can now apply the filter object that we created above to some data. Let’s create a random 3D cube of data that matches our grid.

In [13]: nt = 10

In [14]: data = np.random.rand(nt, ny, nx)

In [15]: da = xr.DataArray(data, dims=['time', 'y', 'x'])

In [16]: da
Out[16]: 
<xarray.DataArray (time: 10, y: 128, x: 256)>
array([[[0.7074392 , 0.81828527, 0.62863659, ..., 0.73194604,
         0.10693183, 0.12137072],
        [0.98881469, 0.5701012 , 0.83761258, ..., 0.08070533,
         0.09830769, 0.62869727],
        [0.42775611, 0.65161021, 0.8600364 , ..., 0.95797136,
         0.45427737, 0.57859573],
        ...,
        [0.62683527, 0.97030028, 0.73533072, ..., 0.59335079,
         0.25676299, 0.27855712],
        [0.30611674, 0.46198307, 0.61073857, ..., 0.28003507,
         0.49201069, 0.80749359],
        [0.236806  , 0.74119811, 0.97169449, ..., 0.29549282,
         0.4674991 , 0.52066305]],

       [[0.78385668, 0.07530331, 0.7514596 , ..., 0.24489073,
         0.29141307, 0.10565768],
        [0.95098482, 0.14336557, 0.33130425, ..., 0.37628938,
         0.29533895, 0.53478026],
        [0.19823983, 0.0109688 , 0.75465871, ..., 0.97788013,
         0.54291089, 0.38699874],
...
        [0.89551575, 0.94642642, 0.48164683, ..., 0.21300739,
         0.30780798, 0.54495586],
        [0.86208027, 0.32109764, 0.54255056, ..., 0.39772938,
         0.1229329 , 0.15236659],
        [0.19493015, 0.2352083 , 0.74357449, ..., 0.78543824,
         0.3749175 , 0.88473122]],

       [[0.84688902, 0.41125019, 0.48942885, ..., 0.96038552,
         0.25256197, 0.54025994],
        [0.97637066, 0.59977378, 0.32250734, ..., 0.18747654,
         0.16178786, 0.60262526],
        [0.90845076, 0.74185569, 0.48882033, ..., 0.38163716,
         0.36941489, 0.47863831],
        ...,
        [0.12757214, 0.32191528, 0.42800586, ..., 0.29913583,
         0.55497331, 0.86406829],
        [0.87769071, 0.86685717, 0.04509744, ..., 0.52494717,
         0.23723283, 0.49799475],
        [0.68397689, 0.85251511, 0.68446446, ..., 0.50633588,
         0.54030658, 0.43468439]]])
Dimensions without coordinates: time, y, x

We now mask our data with the wet_mask.

In [17]: da_masked = da.where(wet_mask)
In [18]: da_masked.isel(time=0).plot()
Out[18]: <matplotlib.collections.QuadMesh at 0x7f8c30314e50>
_images/data.png

Now that we have some data, we can apply our filter. We need to specify which dimension names to apply the filter over. In this case, it is y, x.

Warning

The dimension order matters! Since some filters deal with anisotropic grids, the latitude / y dimension must appear first in order to obtain the correct result. That is not an issue for this simple (isotropic) toy example but needs to be kept in mind for applications on real GCM grids.

In [19]: %time da_filtered = filter.apply(da_masked, dims=['y', 'x'])
CPU times: user 111 ms, sys: 7.87 ms, total: 119 ms
Wall time: 119 ms
In [20]: da_filtered
Out[20]: 
<xarray.DataArray (time: 10, y: 128, x: 256)>
array([[[0.56686122, 0.67249646, 0.7105007 , ..., 0.34642739,
         0.36913155, 0.44778233],
        [0.59732818, 0.70546256, 0.75224115, ..., 0.36225413,
         0.40086592, 0.48103801],
        [0.62432079, 0.70310426, 0.73250805, ..., 0.42797248,
         0.47510899, 0.53867868],
        ...,
        [0.54336261, 0.58211737, 0.55114237, ..., 0.44763521,
         0.4328567 , 0.47373531],
        [0.54632614, 0.60231948, 0.58300179, ..., 0.40910925,
         0.40373282, 0.45885067],
        [0.54886445, 0.6307255 , 0.64008326, ..., 0.36933652,
         0.37601494, 0.44387261]],

       [[0.44585404, 0.43346771, 0.42071701, ..., 0.49100089,
         0.47772443, 0.4600166 ],
        [0.43716094, 0.43652837, 0.44196488, ..., 0.51618983,
         0.48250874, 0.45106404],
        [0.47334658, 0.48037704, 0.49281464, ..., 0.55359778,
         0.51422504, 0.48221341],
...
        [0.45743025, 0.49361885, 0.52107751, ..., 0.4971073 ,
         0.43178525, 0.42488075],
        [0.48414096, 0.51044626, 0.52802758, ..., 0.50621422,
         0.45345386, 0.45436477],
        [0.51349707, 0.51775347, 0.51836714, ..., 0.53299232,
         0.49769609, 0.49989308]],

       [[0.65978038, 0.67835527, 0.64778327, ..., 0.46224667,
         0.51463319, 0.59197152],
        [0.61918622, 0.64086146, 0.6141962 , ..., 0.44994037,
         0.48495446, 0.5530137 ],
        [0.53793961, 0.54733042, 0.52539844, ..., 0.45275402,
         0.46157765, 0.49920368],
        ...,
        [0.45428433, 0.44189328, 0.42281765, ..., 0.44574484,
         0.45623819, 0.45681163],
        [0.54428   , 0.53837721, 0.51099241, ..., 0.4496    ,
         0.48655199, 0.52152306],
        [0.63073988, 0.6374326 , 0.60651134, ..., 0.46350603,
         0.51672991, 0.58098036]]])
Dimensions without coordinates: time, y, x

Let’s visualize what the filter did.

In [21]: da_filtered.isel(time=0).plot()
Out[21]: <matplotlib.collections.QuadMesh at 0x7f8c3025e610>
_images/data_filtered.png

Using Dask#

Up to now, we have filtered eagerly; when we called .apply, the results were computed immediately and stored in memory. GCM-Filters is also designed to work seamlessly with Dask array inputs. With dask, we can filter lazily, deferring the filter computations and possibly executing them in parallel. We can do this with our synthetic data by converting them to dask.

In [22]: da_dask = da_masked.chunk({'time': 2})

In [23]: da_dask
Out[23]: 
<xarray.DataArray (time: 10, y: 128, x: 256)>
dask.array<xarray-<this-array>, shape=(10, 128, 256), dtype=float64, chunksize=(2, 128, 256), chunktype=numpy.ndarray>
Dimensions without coordinates: time, y, x

We now filter our data lazily.

In [24]: da_filtered_lazy = filter.apply(da_dask, dims=['y', 'x'])

In [25]: da_filtered_lazy
Out[25]: 
<xarray.DataArray (time: 10, y: 128, x: 256)>
dask.array<transpose, shape=(10, 128, 256), dtype=float64, chunksize=(2, 128, 256), chunktype=numpy.ndarray>
Dimensions without coordinates: time, y, x

Nothing has actually been computed yet. We can trigger computation as follows:

In [26]: %time da_filtered_computed = da_filtered_lazy.compute()
CPU times: user 335 ms, sys: 20.4 ms, total: 356 ms
Wall time: 264 ms

Here we got only a very modest speedup because our example data are too small. For bigger data, the performance benefit will be more evident.