Basic Filtering

The Filter Object

The core object in GCM-Filters is the gcm_filters.Filter object. Its full documentation below enumerates all possible options.

gcm_filters.Filter(filter_scale: float, dx_min: float, filter_shape: gcm_filters.filter.FilterShape = FilterShape.GAUSSIAN, transition_width: float = 3.141592653589793, ndim: int = 2, n_steps: int = 0, grid_type: gcm_filters.kernels.GridType = GridType.REGULAR, grid_vars: dict = <factory>) None[source]

A class for applying diffusion-based smoothing filters to gridded data.

filter_scalefloat

The filter scale, which has different meaning depending on filter shape

dx_minfloat

The smallest grid spacing. Should have same units as filter_scale

n_stepsint, optional

Number of total steps in the filter (A biharmonic step counts as two steps) n_steps == 0 means the number of steps is chosen automatically

filter_shapeFilterShape
  • FilterShape.GAUSSIAN: The target filter has shape \(e^{-(k filter_scale)^2/24}\)

  • FilterShape.TAPER: The target filter has target grid scale Lf. Smaller scales are zeroed out. Scales larger than pi * filter_scale / 2 are left as-is. In between is a smooth transition.

transition_widthfloat, optional

Width of the transition region in the “Taper” filter. Theoretical minimum is 1; not recommended.

ndimint, optional

Laplacian is applied on a grid of dimension ndim

grid_typeGridType

what sort of grid we are dealing with

grid_varsdict

dictionary of extra parameters used to initialize the grid Laplacian

gcm_filters.filter_spec
Type

FilterSpec

Details related to filter_scale, filter_shape, transition_width, and n_steps can be found in the Filter Theory. The following sections explain the options for grid_type and grid_vars in more detail.

Grid types

To define a filter, we need to pick a grid and associated Laplacian that matches our data. The currently implemented grid types are:

In [1]: import gcm_filters

In [2]: list(gcm_filters.GridType)
Out[2]: 
[<GridType.REGULAR: 1>,
 <GridType.REGULAR_AREA_WEIGHTED: 2>,
 <GridType.REGULAR_WITH_LAND: 3>,
 <GridType.REGULAR_WITH_LAND_AREA_WEIGHTED: 4>,
 <GridType.IRREGULAR_WITH_LAND: 5>,
 <GridType.MOM5U: 6>,
 <GridType.MOM5T: 7>,
 <GridType.TRIPOLAR_REGULAR_WITH_LAND_AREA_WEIGHTED: 8>,
 <GridType.TRIPOLAR_POP_WITH_LAND: 9>,
 <GridType.VECTOR_C_GRID: 10>]

This list will grow as we implement more Laplacians.

The following table provides an overview of these different grid type options: what grid they are suitable for, whether they handle land (i.e., continental boundaries), what boundary condition the Laplacian operators use, and whether they come with a scalar or vector Laplacian. You can also find links to example usages.

GridType

Grid

Handles land

Boundary condition

Laplacian type

Example

REGULAR

Cartesian grid

no

periodic

Scalar Laplacian

REGULAR_WITH_LAND

Cartesian grid

yes

periodic

Scalar Laplacian

see below

IRREGULAR_WITH_LAND

locally orthogonal grid

yes

periodic

Scalar Laplacian

Example: Different filter types; Example: Filtering on a tripole grid

MOM5U

Velocity-point on Arakawa B-Grid

yes

periodic

Scalar Laplacian

MOM5T

Tracer-point on Arakawa B-Grid

yes

periodic

Scalar Laplacian

TRIPOLAR_POP_WITH_LAND

locally orthogonal grid

yes

tripole

Scalar Laplacian

Example: Filtering on a tripole grid

VECTOR_C_GRID

Arakawa C-Grid

yes

periodic

Vector Laplacian

Example: Viscosity-based filtering with vector Laplacians

Grid types with scalar Laplacians can be used for filtering scalar fields (such as temperature), and grid types with vector Laplacians can be used for filtering vector fields (such as velocity).

Grid types for simple fixed factor filtering

The remaining grid types are for a special type of filtering: simple fixed factor filtering to achieve a fixed coarsening factor (see also the Filter Theory). If you specify one of the following grid types for your data, gcm_filters will internally transform your original (locally orthogonal) grid to a uniform Cartesian grid with dx = dy = 1, and perform fixed factor filtering on the uniform grid. After this is done, gcm_filters transforms the filtered field back to your original grid. In practice, this coordinate transformation is achieved by area weighting and deweighting (see Filter Theory). This is why the following grid types have the suffix AREA_WEIGHTED.

GridType

Grid

Handles land

Boundary condition

Laplacian type

Example

REGULAR_AREA_WEIGHTED

locally orthogonal grid

no

periodic

Scalar Laplacian

REGULAR_WITH_LAND_AREA_WEIGHTED

locally orthogonal grid

yes

periodic

Scalar Laplacian

Example: Different filter types; Example: Filtering on a tripole grid

TRIPOLAR_REGULAR_WITH_LAND_AREA_WEIGHTED

locally orthogonal grid

yes

tripole

Scalar Laplacian

Example: Filtering on a tripole grid

Grid variables

Each grid type from the above two tables has different grid variables that must be provided as Xarray DataArrays. For example, let’s assume we are on a Cartesian grid (with uniform grid spacing equal to 1), and we want to use the grid type REGULAR_WITH_LAND. To find out what the required grid variables for this grid type are, we can use this utility function.

In [3]: gcm_filters.required_grid_vars(gcm_filters.GridType.REGULAR_WITH_LAND)
Out[3]: ['wet_mask']

wet_mask is a binary array representing the topography on our grid. Here the convention is that the array is 1 in the ocean (“wet points”) and 0 on land (“dry points”).

In [4]: import numpy as np

In [5]: import xarray as xr

In [6]: ny, nx = (128, 256)

In [7]: mask_data = np.ones((ny, nx))

In [8]: mask_data[(ny // 4):(3 * ny // 4), (nx // 4):(3 * nx // 4)] = 0

In [9]: wet_mask = xr.DataArray(mask_data, dims=['y', 'x'])
In [10]: wet_mask.plot()
Out[10]: <matplotlib.collections.QuadMesh at 0x7f310741aa60>
_images/wet_mask.png

We have made a big island.

Creating the Filter Object

We create a filter object as follows.

In [11]: filter = gcm_filters.Filter(
   ....:     filter_scale=4,
   ....:     dx_min=1,
   ....:     filter_shape=gcm_filters.FilterShape.TAPER,
   ....:     grid_type=gcm_filters.GridType.REGULAR_WITH_LAND,
   ....:     grid_vars={'wet_mask': wet_mask}
   ....: )
   ....: 

In [12]: filter
Out[12]: Filter(filter_scale=4, dx_min=1, filter_shape=<FilterShape.TAPER: 2>, transition_width=3.141592653589793, ndim=2, n_steps=16, grid_type=<GridType.REGULAR_WITH_LAND: 3>)

The string representation for the filter object in the last line includes some of the parameters it was initiliazed with, to help us keep track of what we are doing. We have created a Taper filter that will filter our data by a fixed factor of 4.

Applying the Filter

We can now apply the filter object that we created above to some data. Let’s create a random 3D cube of data that matches our grid.

In [13]: nt = 10

In [14]: data = np.random.rand(nt, ny, nx)

In [15]: da = xr.DataArray(data, dims=['time', 'y', 'x'])

In [16]: da
Out[16]: 
<xarray.DataArray (time: 10, y: 128, x: 256)>
array([[[0.21482559, 0.34827533, 0.99115514, ..., 0.31080071,
         0.53049763, 0.46167167],
        [0.91414664, 0.1466529 , 0.19001729, ..., 0.82167823,
         0.34162895, 0.16819143],
        [0.63748149, 0.32902255, 0.89419113, ..., 0.23931612,
         0.70149501, 0.41709187],
        ...,
        [0.00740154, 0.00569117, 0.28220987, ..., 0.08536584,
         0.49428569, 0.62542103],
        [0.78411381, 0.31720441, 0.25315187, ..., 0.01088536,
         0.10367744, 0.79067807],
        [0.32577845, 0.48168196, 0.649204  , ..., 0.01677366,
         0.75490046, 0.00380429]],

       [[0.05588465, 0.89490286, 0.90186422, ..., 0.76552882,
         0.66164577, 0.00242357],
        [0.00602197, 0.54340005, 0.47926164, ..., 0.90342315,
         0.65416368, 0.13364844],
        [0.45265473, 0.62476603, 0.21727683, ..., 0.8269115 ,
         0.07841149, 0.41554473],
...
        [0.74973159, 0.94444632, 0.55066954, ..., 0.15045334,
         0.62795389, 0.37040103],
        [0.05698078, 0.87069418, 0.93716051, ..., 0.52677276,
         0.78530815, 0.09943461],
        [0.39743043, 0.74337617, 0.83241167, ..., 0.22350924,
         0.58301802, 0.11856071]],

       [[0.12447479, 0.00923474, 0.31615442, ..., 0.06531484,
         0.817872  , 0.87985659],
        [0.82445245, 0.92848131, 0.35863431, ..., 0.96500016,
         0.27812668, 0.03211635],
        [0.91513602, 0.80247855, 0.92863826, ..., 0.20248401,
         0.41530577, 0.58374913],
        ...,
        [0.89297321, 0.48417899, 0.09877201, ..., 0.66867598,
         0.59636351, 0.73775463],
        [0.5414683 , 0.84745727, 0.2142708 , ..., 0.92382817,
         0.79031301, 0.52012836],
        [0.69686792, 0.80823503, 0.78360249, ..., 0.23605224,
         0.31355182, 0.34213993]]])
Dimensions without coordinates: time, y, x

We now mask our data with the wet_mask.

In [17]: da_masked = da.where(wet_mask)
In [18]: da_masked.isel(time=0).plot()
Out[18]: <matplotlib.collections.QuadMesh at 0x7f30ff2f9a30>
_images/data.png

Now that we have some data, we can apply our filter. We need to specify which dimension names to apply the filter over. In this case, it is y, x.

In [19]: %time da_filtered = filter.apply(da_masked, dims=['y', 'x'])
CPU times: user 79.9 ms, sys: 12 ms, total: 91.9 ms
Wall time: 92.7 ms
In [20]: da_filtered
Out[20]: 
<xarray.DataArray (time: 10, y: 128, x: 256)>
array([[[0.4172859 , 0.44066133, 0.48267258, ..., 0.37930248,
         0.38034291, 0.39962993],
        [0.45279895, 0.46566277, 0.49802588, ..., 0.44491946,
         0.44058854, 0.44753946],
        [0.46351011, 0.47080046, 0.49532031, ..., 0.47527721,
         0.46779336, 0.46532188],
        ...,
        [0.49755664, 0.50126811, 0.48573478, ..., 0.36518244,
         0.40599994, 0.46310028],
        [0.42198917, 0.45294133, 0.47865345, ..., 0.30512787,
         0.32905049, 0.37907262],
        [0.39487306, 0.42940222, 0.4731927 , ..., 0.3150185 ,
         0.32545848, 0.3609767 ]],

       [[0.41804694, 0.47260544, 0.51928121, ..., 0.52784809,
         0.46597333, 0.41423249],
        [0.38259106, 0.42496724, 0.47981002, ..., 0.54789976,
         0.47459218, 0.40225109],
        [0.37309529, 0.40501947, 0.45941727, ..., 0.53404006,
         0.46976511, 0.40096884],
...
        [0.54732379, 0.60141455, 0.61294758, ..., 0.360898  ,
         0.41605412, 0.47793768],
        [0.52149051, 0.585214  , 0.61275567, ..., 0.34610425,
         0.40057793, 0.45533347],
        [0.47420115, 0.52649768, 0.55295606, ..., 0.34194682,
         0.39066573, 0.42836296]],

       [[0.54414573, 0.54565976, 0.54754221, ..., 0.46802521,
         0.49523066, 0.52605625],
        [0.55324482, 0.54521661, 0.53547611, ..., 0.51993292,
         0.53472374, 0.54927423],
        [0.56152551, 0.53095052, 0.50372881, ..., 0.57322753,
         0.58494179, 0.58158461],
        ...,
        [0.60063748, 0.53753151, 0.47556577, ..., 0.54638271,
         0.60189586, 0.62545642],
        [0.58706702, 0.5459631 , 0.50855335, ..., 0.49455808,
         0.55523304, 0.59257783],
        [0.55845533, 0.54621159, 0.53721998, ..., 0.45808458,
         0.50514634, 0.54534977]]])
Dimensions without coordinates: time, y, x

Let’s visualize what the filter did.

In [21]: da_filtered.isel(time=0).plot()
Out[21]: <matplotlib.collections.QuadMesh at 0x7f30ff219d90>
_images/data_filtered.png

Using Dask

Up to now, we have filtered eagerly; when we called .apply, the results were computed immediately and stored in memory. GCM-Filters is also designed to work seamlessly with Dask array inputs. With dask, we can filter lazily, deferring the filter computations and possibly executing them in parallel. We can do this with our synthetic data by converting them to dask.

In [22]: da_dask = da_masked.chunk({'time': 2})

In [23]: da_dask
Out[23]: 
<xarray.DataArray (time: 10, y: 128, x: 256)>
dask.array<xarray-<this-array>, shape=(10, 128, 256), dtype=float64, chunksize=(2, 128, 256), chunktype=numpy.ndarray>
Dimensions without coordinates: time, y, x

We now filter our data lazily.

In [24]: da_filtered_lazy = filter.apply(da_dask, dims=['y', 'x'])

In [25]: da_filtered_lazy
Out[25]: 
<xarray.DataArray (time: 10, y: 128, x: 256)>
dask.array<transpose, shape=(10, 128, 256), dtype=float64, chunksize=(2, 128, 256), chunktype=numpy.ndarray>
Dimensions without coordinates: time, y, x

Nothing has actually been computed yet. We can trigger computation as follows:

In [26]: %time da_filtered_computed = da_filtered_lazy.compute()
CPU times: user 174 ms, sys: 11.8 ms, total: 186 ms
Wall time: 107 ms

Here we got only a very modest speedup because our example data are too small. For bigger data, the performance benefit will be more evident.