Basic Filtering
Contents
Basic Filtering#
The Filter Object#
The core object in GCMFilters is the gcm_filters.Filter
object. Its full documentation below enumerates all possible options.
 gcm_filters.Filter(filter_scale: float, dx_min: float, filter_shape: gcm_filters.filter.FilterShape = FilterShape.GAUSSIAN, transition_width: float = 3.141592653589793, ndim: int = 2, n_steps: int = 0, grid_type: gcm_filters.kernels.GridType = GridType.REGULAR, grid_vars: dict = <factory>) None [source]#
A class for applying diffusionbased smoothing filters to gridded data.
 Parameters
filter_scale (float) – The filter scale, which has different meaning depending on filter shape
dx_min (float) – The smallest grid spacing. Should have same units as
filter_scale
n_steps (int, optional) – Number of total steps in the filter
n_steps == 0
means the number of steps is chosen automaticallyfilter_shape (FilterShape) –
FilterShape.GAUSSIAN
: The target filter has shape \(e^{(k filter_scale)^2/24}\)FilterShape.TAPER
: The target filter has target grid scale Lf. Smaller scales are zeroed out. Scales larger thanpi * filter_scale / 2
are left asis. In between is a smooth transition.
transition_width (float, optional) – Width of the transition region in the “Taper” filter. This is a nondimensional parameter. Theoretical minimum is 1; not recommended.
ndim (int, optional) – Laplacian is applied on a grid of dimension ndim
grid_type (GridType) – what sort of grid we are dealing with
grid_vars (dict) – dictionary of extra parameters used to initialize the grid Laplacian
 gcm_filters.filter_spec#
 Type
FilterSpec
Details related to filter_scale
, filter_shape
, transition_width
, and n_steps
can be found in the Filter Theory.
The following sections explain the options for grid_type
and grid_vars
in more detail.
Grid types#
To define a filter, we need to pick a grid and associated Laplacian that matches our data. The currently implemented grid types are:
In [1]: import gcm_filters
In [2]: list(gcm_filters.GridType)
Out[2]:
[<GridType.REGULAR: 1>,
<GridType.REGULAR_AREA_WEIGHTED: 2>,
<GridType.REGULAR_WITH_LAND: 3>,
<GridType.REGULAR_WITH_LAND_AREA_WEIGHTED: 4>,
<GridType.IRREGULAR_WITH_LAND: 5>,
<GridType.MOM5U: 6>,
<GridType.MOM5T: 7>,
<GridType.TRIPOLAR_REGULAR_WITH_LAND_AREA_WEIGHTED: 8>,
<GridType.TRIPOLAR_POP_WITH_LAND: 9>,
<GridType.VECTOR_C_GRID: 10>,
<GridType.VECTOR_B_GRID: 11>]
This list will grow as we implement more Laplacians.
The following table provides an overview of these different grid type options: what grid they are suitable for, whether they handle land (i.e., continental boundaries), what boundary condition the Laplacian operators use, and whether they come with a scalar or vector Laplacian. You can also find links to example usages.

Grid 
Handles land 
Boundary condition 
Laplacian type 
Example 


Cartesian grid 
no 
periodic 
Scalar Laplacian 


Cartesian grid 
yes 
periodic 
Scalar Laplacian 
see below 

locally orthogonal grid 
yes 
periodic 
Scalar Laplacian 
Example: Different filter types; Example: Filtering on a tripole grid 

Velocitypoint on Arakawa BGrid 
yes 
periodic 
Scalar Laplacian 


Tracerpoint on Arakawa BGrid 
yes 
periodic 
Scalar Laplacian 


locally orthogonal grid 
yes 
tripole 
Scalar Laplacian 


yes 
periodic 
Vector Laplacian 


no 
periodic 
Vector Laplacian 
Grid types with scalar Laplacians can be used for filtering scalar fields (such as temperature), and grid types with vector Laplacians can be used for filtering vector fields (such as velocity).
Grid types for simple fixed factor filtering#
The remaining grid types are for a special type of filtering: simple fixed factor filtering to achieve a fixed coarsening factor (see also the Filter Theory). If you specify one of the following grid types for your data, gcm_filters
will internally transform your original (locally orthogonal) grid to a uniform Cartesian grid with dx = dy = 1, and perform fixed factor filtering on the uniform grid. After this is done, gcm_filters
transforms the filtered field back to your original grid.
In practice, this coordinate transformation is achieved by area weighting and deweighting (see Filter Theory). This is why the following grid types have the suffix AREA_WEIGHTED
.

Grid 
Handles land 
Boundary condition 
Laplacian type 
Example 


locally orthogonal grid 
no 
periodic 
Scalar Laplacian 


locally orthogonal grid 
yes 
periodic 
Scalar Laplacian 
Example: Different filter types; Example: Filtering on a tripole grid 

locally orthogonal grid 
yes 
tripole 
Scalar Laplacian 
Grid variables#
Each grid type from the above two tables has different grid variables that must be provided as Xarray DataArrays. For example, let’s assume we are on a Cartesian grid (with uniform grid spacing equal to 1), and we want to use the grid type REGULAR_WITH_LAND
. To find out what the required grid variables for this grid type are, we can use this utility function.
In [3]: gcm_filters.required_grid_vars(gcm_filters.GridType.REGULAR_WITH_LAND)
Out[3]: ['wet_mask']
wet_mask
is a binary array representing the topography on our grid. Here the convention is that the array is 1 in the ocean (“wet points”) and 0 on land (“dry points”).
In [4]: import numpy as np
In [5]: import xarray as xr
In [6]: ny, nx = (128, 256)
In [7]: mask_data = np.ones((ny, nx))
In [8]: mask_data[(ny // 4):(3 * ny // 4), (nx // 4):(3 * nx // 4)] = 0
In [9]: wet_mask = xr.DataArray(mask_data, dims=['y', 'x'])
In [10]: wet_mask.plot()
Out[10]: <matplotlib.collections.QuadMesh at 0x7f62aa3c3af0>
We have made a big island.
Note
Some more complicated grid types require more grid variables.
The units for these variables should be consistent, but no specific system of units is required.
For example, if grid cell edge lengths are defined using kilometers, then the filter scale and dx_min
should also be defined using kilometers, and the grid cell areas should be defined in square kilometers.
Creating the Filter Object#
We create a filter object as follows.
In [11]: filter = gcm_filters.Filter(
....: filter_scale=4,
....: dx_min=1,
....: filter_shape=gcm_filters.FilterShape.TAPER,
....: grid_type=gcm_filters.GridType.REGULAR_WITH_LAND,
....: grid_vars={'wet_mask': wet_mask}
....: )
....:
In [12]: filter
Out[12]: Filter(filter_scale=4, dx_min=1, filter_shape=<FilterShape.TAPER: 2>, transition_width=3.141592653589793, ndim=2, n_steps=16, grid_type=<GridType.REGULAR_WITH_LAND: 3>)
The string representation for the filter object in the last line includes some of the parameters it was initiliazed with, to help us keep track of what we are doing. We have created a Taper filter that will filter our data by a fixed factor of 4.
Applying the Filter#
We can now apply the filter object that we created above to some data. Let’s create a random 3D cube of data that matches our grid.
In [13]: nt = 10
In [14]: data = np.random.rand(nt, ny, nx)
In [15]: da = xr.DataArray(data, dims=['time', 'y', 'x'])
In [16]: da
Out[16]:
<xarray.DataArray (time: 10, y: 128, x: 256)>
array([[[0.10235517, 0.43513619, 0.57866776, ..., 0.806658 ,
0.25237564, 0.36378704],
[0.50546965, 0.88930939, 0.204396 , ..., 0.05581885,
0.71452709, 0.54140329],
[0.24881243, 0.25654687, 0.49254108, ..., 0.79219097,
0.850574 , 0.18679747],
...,
[0.8205335 , 0.66658358, 0.07860287, ..., 0.25512925,
0.24808461, 0.00595309],
[0.98360626, 0.98142524, 0.05688472, ..., 0.4320012 ,
0.74840841, 0.71685946],
[0.7381581 , 0.85186482, 0.24351319, ..., 0.07597803,
0.40043371, 0.94918429]],
[[0.15175355, 0.27258104, 0.42398841, ..., 0.91165997,
0.46505718, 0.61317541],
[0.26747364, 0.19253424, 0.67738117, ..., 0.46283541,
0.76746048, 0.10032073],
[0.1569251 , 0.23568147, 0.34470908, ..., 0.37993209,
0.90421679, 0.29888855],
...
[0.46364628, 0.50675959, 0.55778457, ..., 0.76683701,
0.59109803, 0.42020473],
[0.19034977, 0.53997781, 0.66003877, ..., 0.00200596,
0.5969268 , 0.90881142],
[0.74786084, 0.53625735, 0.66522682, ..., 0.1381813 ,
0.38225689, 0.08218062]],
[[0.70623826, 0.35114303, 0.4056733 , ..., 0.66173123,
0.08816331, 0.16059868],
[0.56945738, 0.7554283 , 0.10849533, ..., 0.14777942,
0.56385977, 0.54075593],
[0.82673277, 0.03821578, 0.43046738, ..., 0.63369692,
0.95273244, 0.49830573],
...,
[0.52312969, 0.97058121, 0.4771245 , ..., 0.15223943,
0.94908872, 0.18109459],
[0.99480737, 0.9654951 , 0.238012 , ..., 0.4928985 ,
0.68092959, 0.51733106],
[0.7204479 , 0.34584703, 0.75808609, ..., 0.69819684,
0.76891711, 0.56513511]]])
Dimensions without coordinates: time, y, x
We now mask our data with the wet_mask
.
In [17]: da_masked = da.where(wet_mask)
In [18]: da_masked.isel(time=0).plot()
Out[18]: <matplotlib.collections.QuadMesh at 0x7f62aa243700>
Now that we have some data, we can apply our filter. We need to specify which dimension names to apply the filter over. In this case, it is y
, x
.
Warning
The dimension order matters! Since some filters deal with anisotropic grids, the latitude / y dimension must appear first in order to obtain the correct result. That is not an issue for this simple (isotropic) toy example but needs to be kept in mind for applications on real GCM grids.
In [19]: %time da_filtered = filter.apply(da_masked, dims=['y', 'x'])
CPU times: user 103 ms, sys: 15.9 ms, total: 119 ms
Wall time: 119 ms
In [20]: da_filtered
Out[20]:
<xarray.DataArray (time: 10, y: 128, x: 256)>
array([[[0.54180026, 0.50927122, 0.47732539, ..., 0.49022 ,
0.50567138, 0.53669937],
[0.48357446, 0.46588004, 0.47886116, ..., 0.51829065,
0.50515387, 0.49898239],
[0.4735169 , 0.4579321 , 0.49228825, ..., 0.53540732,
0.52462217, 0.50387113],
...,
[0.53856683, 0.47529779, 0.39342479, ..., 0.37043689,
0.45609217, 0.53058831],
[0.59049031, 0.52504758, 0.42795868, ..., 0.40041486,
0.48441551, 0.57057623],
[0.59622362, 0.54457838, 0.46774837, ..., 0.44973587,
0.50621417, 0.57593898]],
[[0.39888577, 0.40063149, 0.44086065, ..., 0.53165697,
0.4895094 , 0.43626325],
[0.42424285, 0.41661728, 0.43495698, ..., 0.55257771,
0.51640064, 0.46422052],
[0.47730315, 0.47559436, 0.46740204, ..., 0.53675185,
0.51459219, 0.4898871 ],
...
[0.49851936, 0.51946829, 0.52332678, ..., 0.37681304,
0.41614792, 0.46196628],
[0.45891691, 0.46551074, 0.45898813, ..., 0.35812626,
0.39530662, 0.43444512],
[0.43588416, 0.43007157, 0.42942322, ..., 0.37317449,
0.40981096, 0.43371926]],
[[0.55611332, 0.49730326, 0.40175134, ..., 0.44296885,
0.50032135, 0.55083462],
[0.52205949, 0.4767276 , 0.40192936, ..., 0.40752637,
0.46118769, 0.5111225 ],
[0.50632302, 0.48762638, 0.43900962, ..., 0.39086021,
0.43147859, 0.47989344],
...,
[0.63225293, 0.62335135, 0.56347023, ..., 0.49070787,
0.54563529, 0.59867336],
[0.65085376, 0.61810875, 0.52898973, ..., 0.51196025,
0.56971112, 0.62540196],
[0.61150863, 0.55840977, 0.45630733, ..., 0.48793236,
0.54574377, 0.5988762 ]]])
Dimensions without coordinates: time, y, x
Let’s visualize what the filter did.
In [21]: da_filtered.isel(time=0).plot()
Out[21]: <matplotlib.collections.QuadMesh at 0x7f62aa17a0d0>
Using Dask#
Up to now, we have filtered eagerly; when we called .apply
, the results were computed immediately and stored in memory.
GCMFilters
is also designed to work seamlessly with Dask array inputs. With dask, we can filter lazily, deferring the filter computations and possibly executing them in parallel.
We can do this with our synthetic data by converting them to dask.
In [22]: da_dask = da_masked.chunk({'time': 2})
In [23]: da_dask
Out[23]:
<xarray.DataArray (time: 10, y: 128, x: 256)>
dask.array<xarray<thisarray>, shape=(10, 128, 256), dtype=float64, chunksize=(2, 128, 256), chunktype=numpy.ndarray>
Dimensions without coordinates: time, y, x
We now filter our data lazily.
In [24]: da_filtered_lazy = filter.apply(da_dask, dims=['y', 'x'])
In [25]: da_filtered_lazy
Out[25]:
<xarray.DataArray (time: 10, y: 128, x: 256)>
dask.array<transpose, shape=(10, 128, 256), dtype=float64, chunksize=(2, 128, 256), chunktype=numpy.ndarray>
Dimensions without coordinates: time, y, x
Nothing has actually been computed yet. We can trigger computation as follows:
In [26]: %time da_filtered_computed = da_filtered_lazy.compute()
CPU times: user 208 ms, sys: 13.1 ms, total: 221 ms
Wall time: 129 ms
Here we got only a very modest speedup because our example data are too small. For bigger data, the performance benefit will be more evident.