Processing (s2stools.process)

Data processing.

Functions

s2stools.process.add_model_cycle_ecmwf(ds)

Add a coordinate cycle to a dataset that denotes the ECMWF model cycle.

Parameters:

ds (xr.Dataset) – ECMWF S2S forecast data

Returns:

ds – dataset with new coordinate

Return type:

xr.Dataset

s2stools.process.add_validtime(da)

Given a DataArray or Dataset with dimensions (‘reftime’, ‘hc_year’, ‘leadtime’), add a coordinate validtime that indicates the target date of the forecast. Example: reftime="2000-01-01", hc_year=-1, leadtime=+3D corresponds to validtime "1999-01-04".

Parameters:

da (xr.DataArray or xr.Dataset) – Input data, requires dimensions (‘reftime’, ‘hc_year’, ‘leadtime’).

Returns:

Same dataset as input, but with coordinate validtime.

Return type:

xr.DataArray or xr.Dataset

Notes

Validtime is of type np.datetime64 and it will not be a dimension.

Warning

Only dimension leadtime is supported, not days_since_init.

Warning

Only makes sense for ECMWF data.
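The date arithmetic behind validtime can be sketched with the standard library alone. The helper valid_date below is hypothetical and not part of s2stools. It assumes that hc_year shifts the calendar year of reftime and that a 29 February reference date falls back to 28 February in non-leap years; the library may handle that case differently.

```python
from datetime import date, timedelta


def valid_date(reftime: date, hc_year: int, leadtime_days: int) -> date:
    """Hypothetical helper: shift reftime by hc_year years, then add the lead time."""
    target_year = reftime.year + hc_year
    try:
        init = reftime.replace(year=target_year)
    except ValueError:
        # Assumption: 29 February maps to 28 February in non-leap years.
        init = reftime.replace(year=target_year, day=28)
    return init + timedelta(days=leadtime_days)


# Hindcast started 20 years before the realtime forecast, lead time 0 days.
print(valid_date(date(2017, 11, 16), -20, 0))  # 1997-11-16
# Realtime forecast at the longest lead time of a 47-step forecast.
print(valid_date(date(2017, 11, 16), 0, 46))   # 2018-01-01
```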

s2stools.process.combine_s2s_and_reanalysis(s2s, reanalysis, ensfc=True)

Project a reanalysis time series onto S2S forecast data. The resulting object has the dimensions of the s2s dataset.

Parameters:
  • s2s (xr.Dataset | xr.DataArray)

  • reanalysis (xr.Dataset | xr.DataArray)

  • ensfc (bool) – If True, stack resulting forecasts to ensemble forecasts ([reftime, hc_year] -> fc)

Returns:

combined_data

Return type:

xr.Dataset

Examples

>>> ds_s2s
<xarray.Dataset>
Dimensions:    (leadtime: 47, longitude: 2, latitude: 1, number: 51,
                reftime: 2, hc_year: 21)
Coordinates:
  * leadtime   (leadtime) timedelta64[ns] 0 days 1 days ... 45 days 46 days
  * longitude  (longitude) float32 -180.0 -177.5
  * latitude   (latitude) float32 60.0
  * number     (number) int64 0 1 2 3 4 5 6 7 8 9 ... 42 43 44 45 46 47 48 49 50
  * reftime    (reftime) datetime64[ns] 2017-11-16 2017-11-20
  * hc_year    (hc_year) int64 -20 -19 -18 -17 -16 -15 -14 ... -5 -4 -3 -2 -1 0
    validtime  (reftime, leadtime, hc_year) datetime64[ns] 1997-11-16 ... 201...
Data variables:
    u          (reftime, latitude, longitude, leadtime, hc_year, number) float32 dask.array<chunksize=(1, 1, 2, 47, 20, 1), meta=np.ndarray>
>>> ds_reanalysis
<xarray.Dataset>
Dimensions:    (time: 30, latitude: 1, longitude: 2)
Coordinates:
  * time       (time) datetime64[ns] 2017-11-01 2017-11-02 ... 2017-11-30
  * longitude  (longitude) float32 -180.0 -177.5
  * latitude   (latitude) float32 60.0
Data variables:
    u          (time, latitude, longitude) float32 dask.array<chunksize=(30, 1, 2), meta=np.ndarray>
>>> import s2stools.process
>>> s2stools.process.combine_s2s_and_reanalysis(ds_s2s, ds_reanalysis)
<xarray.Dataset>
Dimensions:    (leadtime: 47, longitude: 2, latitude: 1, number: 51,
                reftime: 2, hc_year: 21)
Coordinates:
  * leadtime   (leadtime) timedelta64[ns] 0 days 1 days ... 45 days 46 days
  * longitude  (longitude) float32 -180.0 -177.5
  * latitude   (latitude) float32 60.0
  * number     (number) int64 0 1 2 3 4 5 6 7 8 9 ... 42 43 44 45 46 47 48 49 50
  * reftime    (reftime) datetime64[ns] 2017-11-16 2017-11-20
  * hc_year    (hc_year) int64 -20 -19 -18 -17 -16 -15 -14 ... -5 -4 -3 -2 -1 0
    validtime  (reftime, leadtime, hc_year) datetime64[ns] 1997-11-16 ... 201...
Data variables:
    u          (reftime, latitude, longitude, leadtime, hc_year, number) float32 dask.array<chunksize=(1, 1, 2, 47, 20, 1), meta=np.ndarray>
    u_verif    (reftime, leadtime, hc_year, latitude, longitude) float32 dask.array<chunksize=(2, 47, 21, 1, 2), meta=np.ndarray>
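Conceptually, the projection looks up the reanalysis value at each forecast valid date. The following is a minimal pure-Python sketch of that idea, not the library's implementation. The toy values are made up, and it is an assumption that valid dates outside the reanalysis period end up as NaN.

```python
from datetime import date, timedelta

# Toy reanalysis series: one value per day in November 2017.
reanalysis = {date(2017, 11, 1) + timedelta(days=i): float(i) for i in range(30)}

# Valid dates of one forecast (reftime 2017-11-16, lead times 0 to 46 days).
validtimes = [date(2017, 11, 16) + timedelta(days=lead) for lead in range(47)]

# Look up the reanalysis value for each valid date.
# Assumption: dates without reanalysis data become NaN.
verif = [reanalysis.get(day, float("nan")) for day in validtimes]

print(verif[0])    # value for 2017-11-16
print(len(verif))  # one entry per lead time
```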

s2stools.process.concat_era5_before_s2s(s2s: DataArray, era5: DataArray, max_neg_leadtime_days: int = 46) -> DataArray

Prepend ERA5 data to the start of each forecast. The ERA5 values are stored at negative leadtimes.

Parameters:
  • s2s (xr.DataArray)

  • era5 (xr.DataArray) – requires dimension time

  • max_neg_leadtime_days (int) – maximum negative leadtime (i.e. number of ERA5 days to append)

Returns:

da – data array with s2s and era5 combined

Return type:

xr.DataArray
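The negative-leadtime convention can be illustrated for a single forecast with plain dates. This is only a sketch: the variable names are illustrative, and it is an assumption that ERA5 fills lead times -max_neg_leadtime_days to -1 while the forecast itself starts at lead time 0.

```python
from datetime import date, timedelta

init = date(2017, 11, 16)   # forecast start (reftime)
max_neg_leadtime_days = 3   # number of ERA5 days to prepend

# ERA5 days before the forecast start get lead times -3, -2, -1.
era5_part = [
    (timedelta(days=-k), init - timedelta(days=k))
    for k in range(max_neg_leadtime_days, 0, -1)
]
# Forecast days keep their non-negative lead times (first three shown).
s2s_part = [(timedelta(days=k), init + timedelta(days=k)) for k in range(3)]

for leadtime, day in era5_part + s2s_part:
    print(leadtime.days, day)
```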

s2stools.process.reft_hc_year_to_fc_init_date(s2s_data)

Go from dimensions (reftime, hc_year) to dimension fc_init_date.

Parameters:

s2s_data (xr.DataArray | xr.Dataset)

Returns:

data

Return type:

xr.DataArray | xr.Dataset

s2stools.process.s2sparser(ds)

Preprocess a raw S2S dataset so that it has the dimensions reftime, hc_year, and leadtime. The coordinate validtime is added automatically. Files need to have the forecast realtime date somewhere in the filename, e.g., s2s_something_2017-11-16.nc.

Parameters:

ds (xr.Dataset) – dataset

Return type:

xr.Dataset

Warning

Realtime and hindcast forecasts are combined in a single dataset. If they have different ensemble sizes, then the resulting dataset is larger than necessary as coordinates span full dimension space, e.g., ensemble members 12-51 are padded with NaN. For a more efficient solution consider using xarray-datatree.

Examples

>>> import xarray as xr
>>> import s2stools.process
>>> # Pass s2sparser as the preprocess function when opening multiple files:
>>> ds = xr.open_mfdataset("/some/path/filename_2017*.nc", preprocess=s2stools.process.s2sparser)
>>> ds
<xarray.Dataset>
Dimensions:    (leadtime: 47, longitude: 2, latitude: 1, number: 51,
                reftime: 2, hc_year: 21)
Coordinates:
  * leadtime   (leadtime) timedelta64[ns] 0 days 1 days ... 45 days 46 days
  * longitude  (longitude) float32 -180.0 -177.5
  * latitude   (latitude) float32 60.0
  * number     (number) int64 0 1 2 3 4 5 6 7 8 9 ... 42 43 44 45 46 47 48 49 50
  * reftime    (reftime) datetime64[ns] 2017-11-16 2017-11-20
  * hc_year    (hc_year) int64 -20 -19 -18 -17 -16 -15 -14 ... -5 -4 -3 -2 -1 0
    validtime  (reftime, leadtime, hc_year) datetime64[ns] 1997-11-16 ... 201...
Data variables:
    u          (reftime, latitude, longitude, leadtime, hc_year, number) float32 dask.array<chunksize=(1, 1, 2, 47, 20, 1), meta=np.ndarray>
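Because the realtime date is read from the filename, it can help to check file names before opening them. The helper reftime_from_filename below is hypothetical and only mirrors the naming convention described above (a YYYY-MM-DD date somewhere in the name); it does not reproduce how s2sparser itself extracts the date.

```python
import re
from datetime import date, datetime


def reftime_from_filename(filename: str) -> date:
    """Hypothetical helper: find a YYYY-MM-DD date anywhere in a file name."""
    match = re.search(r"\d{4}-\d{2}-\d{2}", filename)
    if match is None:
        raise ValueError(f"no date found in {filename!r}")
    return datetime.strptime(match.group(), "%Y-%m-%d").date()


print(reftime_from_filename("s2s_something_2017-11-16.nc"))  # 2017-11-16
```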

s2stools.process.save_one_file_per_reftime(data: Dataset, path: str, create_subdirectory=None)

Save S2S Dataset with one file per reftime.

Parameters:
  • data (xr.Dataset) – Dataset to save.

  • path (str) – Target path including filename. _REFTIME.nc will be added. E.g.: /home/foo/s2s_somefilename

  • create_subdirectory (str or None) – Name of a subdirectory for the output files. If the subdirectory already exists, an error is raised; otherwise it is created and the files are saved into it. Defaults to None, in which case no subdirectory is created and the files are saved directly to ‘path’.

s2stools.process.sel_fc_around_dates(s2s, dates, tolerance_days)

Select forecasts around specific dates.

Parameters:
  • s2s (xr.Dataset) – Dataset with dimensions (‘reftime’, ‘hc_year’, ‘leadtime’)

  • dates (list) – List of dates to select around.

  • tolerance_days (int) – Tolerance in days to select around the date. E.g., if tolerance_days=3, then include forecasts from 3 days before and after the date.

Return type:

xr.Dataset
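The tolerance rule can be sketched on plain dates. The helper is_near is hypothetical, and two points are assumptions rather than documented behavior: that the comparison is made against the forecast start date, and that the window is inclusive at both ends.

```python
from datetime import date, timedelta


def is_near(init_date: date, dates: list, tolerance_days: int) -> bool:
    """Hypothetical helper: True if init_date lies within the tolerance of any date."""
    tolerance = timedelta(days=tolerance_days)
    # Assumption: the window is inclusive at both ends.
    return any(abs(init_date - d) <= tolerance for d in dates)


inits = [date(2017, 11, 16), date(2017, 11, 20), date(2017, 11, 27)]
events = [date(2017, 11, 18)]

# Keep forecasts that start within 3 days of an event date.
selected = [d for d in inits if is_near(d, events, tolerance_days=3)]
print(selected)
```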

s2stools.process.stack_ensfc(d, reset_index=True)

Go from dimensions (reftime, hc_year) to dimension fc.

Parameters:
  • d (xr.DataArray | xr.Dataset)

  • reset_index (bool) – If True, drop multiindex and flatten around new index fc.

Returns:

data

Return type:

xr.DataArray | xr.Dataset

s2stools.process.stack_fc(d, reset_index=True)

Go from dimensions (reftime, hc_year, number) to dimension fc.

Parameters:
  • d (xr.DataArray | xr.Dataset)

  • reset_index (bool) – If True, drop multiindex and flatten around new index fc.

Returns:

data

Return type:

xr.DataArray | xr.Dataset
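The difference between stack_ensfc and stack_fc is which dimensions are merged into fc. The sketch below enumerates the combinations with plain tuples to show the resulting sizes; the toy coordinate values are made up, and it does not use or reproduce the library's xarray-based implementation.

```python
from itertools import product

reftimes = ["2017-11-16", "2017-11-20"]
hc_years = [-1, 0]
numbers = [0, 1, 2]

# stack_ensfc: one fc entry per (reftime, hc_year); number stays a dimension.
ensfc = list(product(reftimes, hc_years))
# stack_fc: one fc entry per (reftime, hc_year, number).
fc = list(product(reftimes, hc_years, numbers))

print(len(ensfc), len(fc))  # 4 12
print(ensfc[0], fc[0])
```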