Adding a Custom Reader to SatPy¶
In order to add a reader to SatPy, you will need to create two files:
- a YAML file describing the files to read and the datasets that are available
- a python file implementing the actual reading of the datasets and metadata
For this tutorial, we will implement a reader for the EUMETSAT NetCDF format for SEVIRI data.
Naming your reader¶
SatPy tries to follow a standard scheme for naming its readers. These names are used in filenames, but are also typed by users, so it is important that the name be recognizable and clear. Although some special cases exist, most fit into the following naming scheme:
<sensor>[_<processing level>[_<level detail>]][_<file format>]
All components of the name should be lowercase and use underscores as the main separator between fields. Hyphens should be used as an intra-field separator if needed (ex. goes-imager).
- sensor: The first component of the name represents the sensor or instrument that observed the data stored in the files being read. If the files are the output of a specific processing software or a certain algorithm implementation that supports multiple sensors then a lowercase version of that software’s name should be used (e.g. clavrx for CLAVR-x, nucaps for NUCAPS). The sensor field is the only required field of the naming scheme. If it is actually an instrument name then the reader name should include one of the other optional fields. If sensor is a software package then that may be enough without any additional information to uniquely identify the reader.
- processing level: This field marks the specific level of processing or calibration that has been performed to produce the data in the files being read. Common values of this field include: sdr for Sensor Data Record (SDR), edr for Environmental Data Record (EDR), l1b for Level 1B, and l2 for Level 2.
- level detail: In cases where the processing level is not enough to completely define the reader this field can be used to provide a little more context. For example, some VIIRS EDR products are specific to a particular field of study or type of scientific event, like a flood or cloud product. In these cases the detail field can be added to produce a name like viirs_edr_flood. This field shouldn’t be used unless processing level is also specified.
- file format: If the file format of the files is informative to the user or can distinguish one reader from another then this field should be specified. Common format names should be abbreviated following existing abbreviations like nc for NetCDF3 or NetCDF4, hdf for HDF4, h5 for HDF5.
The existing readers table can be used for reference. When in doubt, reader names can be discussed in the GitHub pull request that adds the reader to SatPy, or in a GitHub issue.
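The naming scheme above can be sketched as a small helper. This is only an illustration: `build_reader_name` is a hypothetical function, not part of SatPy, and the choice to turn spaces and underscores inside a field into hyphens is an assumption drawn from the separator rules described above.

```python
def build_reader_name(sensor, level=None, detail=None, file_format=None):
    """Assemble <sensor>[_<processing level>[_<level detail>]][_<file format>].

    Hypothetical helper for illustration only; SatPy does not provide it.
    """
    # The level detail field should not be used without a processing level.
    if detail is not None and level is None:
        raise ValueError("level detail requires a processing level")

    fields = [sensor, level, detail, file_format]
    cleaned = []
    for field in fields:
        if field is None:
            continue
        # Lowercase everything; underscores are reserved for separating
        # fields, so separators inside a field become hyphens (assumption).
        cleaned.append(field.strip().lower().replace("_", "-").replace(" ", "-"))
    return "_".join(cleaned)


print(build_reader_name("VIIRS", "EDR", "flood"))         # viirs_edr_flood
print(build_reader_name("goes imager", file_format="nc"))  # goes-imager_nc
print(build_reader_name("clavrx"))                          # clavrx
```

A software package name such as clavrx is accepted on its own, while an instrument name would normally be combined with at least one of the optional fields.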
The YAML file¶
The YAML file is composed of three sections:
- the reader section, which provides basic parameters for the reader
- the file_types section, which gives the patterns of the files this reader can handle
- the datasets section, describing the datasets available from this reader
The reader section¶
The reader section provides basic parameters for the reader.
The parameters to provide in this section are:
- description: General description of the reader
- name: this is the name of the reader; it should be the same as the filename (without the .yaml extension). The naming convention for this is described in the Naming your reader section above.
- sensors: the list of sensors this reader will support
- reader: the metareader to use; in most cases the FileYAMLReader is a good choice.
reader:
  description: NetCDF4 reader for the Eumetsat MSG format
  name: nc_seviri_l1b
  sensors: [seviri]
  reader: !!python/name:satpy.readers.yaml_reader.FileYAMLReader
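As a quick sanity check of the rules above, the sketch below verifies that a parsed reader section has the four listed parameters and that its name matches the YAML filename. `check_reader_section` is a hypothetical helper, not a SatPy function, and the section is written as a plain dict (with the reader class given as a string) so that the example runs without SatPy or a YAML parser.

```python
from pathlib import Path

# The parameters listed above for the `reader` section.
EXPECTED_KEYS = ("description", "name", "sensors", "reader")


def check_reader_section(yaml_path, reader_section):
    """Return a list of problems found in a parsed `reader` section.

    Hypothetical helper for illustration; not part of SatPy.
    """
    problems = [f"missing key: {key}" for key in EXPECTED_KEYS
                if key not in reader_section]
    # The reader name should equal the filename without the .yaml extension.
    expected_name = Path(yaml_path).stem
    name = reader_section.get("name")
    if name is not None and name != expected_name:
        problems.append(f"name {name!r} does not match filename {expected_name!r}")
    return problems


section = {
    "description": "NetCDF4 reader for the Eumetsat MSG format",
    "name": "nc_seviri_l1b",
    "sensors": ["seviri"],
    "reader": "satpy.readers.yaml_reader.FileYAMLReader",
}
print(check_reader_section("nc_seviri_l1b.yaml", section))  # []
```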
The file_types section¶
Each file type needs to provide:
- file_reader, the class that will handle the files for this reader, which you will implement in the corresponding python file (see next section)
- file_patterns, the patterns to match to find files this reader can handle. The syntax is basically the same as Python's string format syntax, with the addition of time formats. See the trollsift package documentation for more details.
- Optionally, a file type can have a requires field: a list of the file types that the current file type needs in order to function. For example, the HRIT MSG format segment files each need a prologue and epilogue file to be read properly, hence in that case requires: [HRIT_PRO, HRIT_EPI] has been added to the file type definition.
file_types:
  nc_seviri_l1b:
    file_reader: !!python/name:satpy.readers.nc_seviri_l1b.NCSEVIRIFileHandler
    file_patterns: ['W_XX-EUMETSAT-Darmstadt,VIS+IR+IMAGERY,{satid:4s}+SEVIRI_C_EUMG_{processing_time:%Y%m%d%H%M%S}.nc']
  nc_seviri_l1b_hrv:
    file_reader: !!python/name:satpy.readers.nc_seviri_l1b.NCSEVIRIHRVFileHandler
    file_patterns: ['W_XX-EUMETSAT-Darmstadt,HRV+IMAGERY,{satid:4s}+SEVIRI_C_EUMG_{processing_time:%Y%m%d%H%M%S}.nc']
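To make it concrete what the first pattern above extracts from a filename, here is a standard-library sketch that mimics it with a regular expression. In practice trollsift does this parsing for you; the regex, the `parse_filename` helper and the example filename are all assumptions made for illustration.

```python
import re
from datetime import datetime

# Hand-written regex equivalent of the nc_seviri_l1b pattern:
# {satid:4s} is a 4-character string, {processing_time:%Y%m%d%H%M%S}
# is a 14-digit timestamp.
FILENAME_RE = re.compile(
    r"W_XX-EUMETSAT-Darmstadt,VIS\+IR\+IMAGERY,"
    r"(?P<satid>.{4})\+SEVIRI_C_EUMG_"
    r"(?P<processing_time>\d{14})\.nc$"
)


def parse_filename(filename):
    """Return the fields encoded in the filename, or None if it doesn't match."""
    match = FILENAME_RE.match(filename)
    if match is None:
        return None
    info = match.groupdict()
    info["processing_time"] = datetime.strptime(
        info["processing_time"], "%Y%m%d%H%M%S")
    return info


# Hypothetical filename, constructed to fit the pattern.
name = "W_XX-EUMETSAT-Darmstadt,VIS+IR+IMAGERY,MSG3+SEVIRI_C_EUMG_20180101120000.nc"
print(parse_filename(name))
# {'satid': 'MSG3', 'processing_time': datetime.datetime(2018, 1, 1, 12, 0)}
```

The resulting dictionary corresponds to the filename info that is later handed to the file handler.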
The datasets section¶
The datasets section describes each dataset available in the files. The parameters provided are made available to the methods of the implementing class.
Parameters you can define include, for example:
- name
- sensor
- resolution
- wavelength
- polarization
- standard_name: the name used for the dataset, which is used to determine what kind of data it is and to handle it appropriately
- units: the units of the data, important to get consistent processing across multiple platforms/instruments
- modifiers: what modifications have already been applied to the data, e.g. sunz_corrected
- file_type
- coordinates: this tells which datasets to load to navigate the current dataset
- and any other field that is relevant for the reader
This section can be copied and adapted from existing SEVIRI readers, for example the msg_native reader.
datasets:
  HRV:
    name: HRV
    resolution: 1000.134348869
    wavelength: [0.5, 0.7, 0.9]
    calibration:
      reflectance:
        standard_name: toa_bidirectional_reflectance
        units: "%"
      radiance:
        standard_name: toa_outgoing_radiance_per_unit_wavelength
        units: W m-2 um-1 sr-1
      counts:
        standard_name: counts
        units: count
    file_type: nc_seviri_l1b_hrv
  IR_016:
    name: IR_016
    resolution: 3000.403165817
    wavelength: [1.5, 1.64, 1.78]
    calibration:
      reflectance:
        standard_name: toa_bidirectional_reflectance
        units: "%"
      radiance:
        standard_name: toa_outgoing_radiance_per_unit_wavelength
        units: W m-2 um-1 sr-1
      counts:
        standard_name: counts
        units: count
    file_type: nc_seviri_l1b
    nc_key: 'ch3'
  IR_039:
    name: IR_039
    resolution: 3000.403165817
    wavelength: [3.48, 3.92, 4.36]
    calibration:
      brightness_temperature:
        standard_name: toa_brightness_temperature
        units: K
      radiance:
        standard_name: toa_outgoing_radiance_per_unit_wavelength
        units: W m-2 um-1 sr-1
      counts:
        standard_name: counts
        units: count
    file_type: nc_seviri_l1b
    nc_key: 'ch4'
  IR_087:
    name: IR_087
    resolution: 3000.403165817
    wavelength: [8.3, 8.7, 9.1]
    calibration:
      brightness_temperature:
        standard_name: toa_brightness_temperature
        units: K
      radiance:
        standard_name: toa_outgoing_radiance_per_unit_wavelength
        units: W m-2 um-1 sr-1
      counts:
        standard_name: counts
        units: count
    file_type: nc_seviri_l1b
  IR_097:
    name: IR_097
    resolution: 3000.403165817
    wavelength: [9.38, 9.66, 9.94]
    calibration:
      brightness_temperature:
        standard_name: toa_brightness_temperature
        units: K
      radiance:
        standard_name: toa_outgoing_radiance_per_unit_wavelength
        units: W m-2 um-1 sr-1
      counts:
        standard_name: counts
        units: count
    file_type: nc_seviri_l1b
  IR_108:
    name: IR_108
    resolution: 3000.403165817
    wavelength: [9.8, 10.8, 11.8]
    calibration:
      brightness_temperature:
        standard_name: toa_brightness_temperature
        units: K
      radiance:
        standard_name: toa_outgoing_radiance_per_unit_wavelength
        units: W m-2 um-1 sr-1
      counts:
        standard_name: counts
        units: count
    file_type: nc_seviri_l1b
  IR_120:
    name: IR_120
    resolution: 3000.403165817
    wavelength: [11.0, 12.0, 13.0]
    calibration:
      brightness_temperature:
        standard_name: toa_brightness_temperature
        units: K
      radiance:
        standard_name: toa_outgoing_radiance_per_unit_wavelength
        units: W m-2 um-1 sr-1
      counts:
        standard_name: counts
        units: count
    file_type: nc_seviri_l1b
  IR_134:
    name: IR_134
    resolution: 3000.403165817
    wavelength: [12.4, 13.4, 14.4]
    calibration:
      brightness_temperature:
        standard_name: toa_brightness_temperature
        units: K
      radiance:
        standard_name: toa_outgoing_radiance_per_unit_wavelength
        units: W m-2 um-1 sr-1
      counts:
        standard_name: counts
        units: count
    file_type: nc_seviri_l1b
  VIS006:
    name: VIS006
    resolution: 3000.403165817
    wavelength: [0.56, 0.635, 0.71]
    calibration:
      reflectance:
        standard_name: toa_bidirectional_reflectance
        units: "%"
      radiance:
        standard_name: toa_outgoing_radiance_per_unit_wavelength
        units: W m-2 um-1 sr-1
      counts:
        standard_name: counts
        units: count
    file_type: nc_seviri_l1b
  VIS008:
    name: VIS008
    resolution: 3000.403165817
    wavelength: [0.74, 0.81, 0.88]
    calibration:
      reflectance:
        standard_name: toa_bidirectional_reflectance
        units: "%"
      radiance:
        standard_name: toa_outgoing_radiance_per_unit_wavelength
        units: W m-2 um-1 sr-1
      counts:
        standard_name: counts
        units: count
    file_type: nc_seviri_l1b
  WV_062:
    name: WV_062
    resolution: 3000.403165817
    wavelength: [5.35, 6.25, 7.15]
    calibration:
      brightness_temperature:
        standard_name: toa_brightness_temperature
        units: "K"
      radiance:
        standard_name: toa_outgoing_radiance_per_unit_wavelength
        units: W m-2 um-1 sr-1
      counts:
        standard_name: counts
        units: count
    file_type: nc_seviri_l1b
  WV_073:
    name: WV_073
    resolution: 3000.403165817
    wavelength: [6.85, 7.35, 7.85]
    calibration:
      brightness_temperature:
        standard_name: toa_brightness_temperature
        units: "K"
      radiance:
        standard_name: toa_outgoing_radiance_per_unit_wavelength
        units: W m-2 um-1 sr-1
      counts:
        standard_name: counts
        units: count
    file_type: nc_seviri_l1b
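The wavelength entries above can be read as a band description. Assuming the three values are the minimum, central and maximum wavelength in micrometres, the sketch below picks the dataset whose band contains a requested wavelength, preferring the closest central wavelength. `find_by_wavelength` is a hypothetical helper and a simplification; SatPy's own matching logic may differ.

```python
def find_by_wavelength(datasets, wavelength):
    """Return the name of the dataset whose band best matches `wavelength`.

    `datasets` maps dataset names to dicts with a `wavelength` entry of
    [minimum, central, maximum] (assumed meaning of the three values).
    Returns None when no band contains the requested wavelength.
    """
    candidates = [
        (abs(info["wavelength"][1] - wavelength), name)
        for name, info in datasets.items()
        if info["wavelength"][0] <= wavelength <= info["wavelength"][2]
    ]
    # The smallest distance to the central wavelength wins.
    return min(candidates)[1] if candidates else None


# A subset of the datasets section above, as it would look once parsed.
datasets = {
    "HRV": {"wavelength": [0.5, 0.7, 0.9]},
    "VIS006": {"wavelength": [0.56, 0.635, 0.71]},
    "IR_108": {"wavelength": [9.8, 10.8, 11.8]},
    "IR_120": {"wavelength": [11.0, 12.0, 13.0]},
}
print(find_by_wavelength(datasets, 10.8))  # IR_108
print(find_by_wavelength(datasets, 0.6))   # VIS006
print(find_by_wavelength(datasets, 20.0))  # None
```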
The YAML file is now ready; let’s go on with the corresponding python file.
The python file¶
The python file needs to implement a file handler class for each file type that we want to read. Such a class needs to implement a few methods:
- the __init__ method, which takes as arguments
  - the filename (string)
  - the filename info (dict) that we get by parsing the filename using the pattern defined in the yaml file
  - the filetype info that we get from the filetype definition in the yaml file
  This method can also receive other file handler instances as parameters if the filetype at hand has requirements. (See the explanation of the requires field in the file_types section above.)
- the get_dataset method, which takes as arguments
  - the dataset ID of the dataset to load
  - the dataset info, that is, the description of the channel in the YAML file
  This method has to return an xarray.DataArray instance containing the data and metadata of the loaded dataset if the loading is successful, or None if the loading was unsuccessful.
- the get_area_def method, which takes as its single argument the dataset ID for which we want the area. For data that cannot be geolocated with an area definition, the pixel coordinates need to be loadable from get_dataset for the resulting scene to be navigated. That is, if the data cannot be geolocated with an area definition then the dataset section should specify coordinates: [longitude_dataset, latitude_dataset]
- Optionally, the get_bounding_box method can be implemented if filtering files by area is desirable for this data type
On top of that, two attributes need to be defined: start_time and end_time, which define the start and end times of the sensing.
# this is nc_seviri_l1b.py
import xarray as xr

# Import locations assumed for the SatPy version this tutorial targets
from satpy import CHUNK_SIZE
from satpy.readers.file_handlers import BaseFileHandler


class NCSEVIRIFileHandler(BaseFileHandler):
    def __init__(self, filename, filename_info, filetype_info):
        super(NCSEVIRIFileHandler, self).__init__(filename, filename_info, filetype_info)
        self.nc = None

    def get_dataset(self, dataset_id, dataset_info):
        if dataset_id.calibration != 'radiance':
            # TODO: implement calibration to reflectance or brightness temperature
            return
        if self.nc is None:
            # Open the file lazily on first use, chunked for dask
            self.nc = xr.open_dataset(self.filename,
                                      decode_cf=True,
                                      mask_and_scale=True,
                                      chunks={'num_columns_vis_ir': CHUNK_SIZE,
                                              'num_rows_vis_ir': CHUNK_SIZE})
            # Rename the file's dimensions to x and y
            self.nc = self.nc.rename({'num_columns_vis_ir': 'x', 'num_rows_vis_ir': 'y'})
        # nc_key comes from the dataset definition in the YAML file
        dataset = self.nc[dataset_info['nc_key']]
        dataset.attrs.update(dataset_info)
        return dataset

    def get_area_def(self, dataset_id):
        # TODO
        pass


class NCSEVIRIHRVFileHandler(BaseFileHandler):
    # left as an exercise to the reader :)
    pass
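The handler above does not yet define the start_time and end_time attributes mentioned earlier. One possible shape, sketched as a standalone class so that it runs without SatPy, is to expose them as properties. Everything here is an assumption for illustration: the class name is hypothetical, the processing time parsed from the filename merely stands in for the sensing start (a real reader should take the sensing times from the file's metadata), and the end time assumes the nominal 15-minute SEVIRI repeat cycle.

```python
from datetime import datetime, timedelta


class TimedHandlerSketch:
    """Stand-in for a file handler; a real one would subclass BaseFileHandler."""

    def __init__(self, filename, filename_info, filetype_info):
        self.filename = filename
        self.filename_info = filename_info
        self.filetype_info = filetype_info

    @property
    def start_time(self):
        # ASSUMPTION: reuse the time parsed from the filename; a real reader
        # should read the sensing start from the file's metadata instead.
        return self.filename_info["processing_time"]

    @property
    def end_time(self):
        # ASSUMPTION: nominal 15-minute SEVIRI repeat cycle.
        return self.start_time + timedelta(minutes=15)


handler = TimedHandlerSketch(
    "example.nc",  # hypothetical filename
    {"satid": "MSG3", "processing_time": datetime(2018, 1, 1, 12, 0)},
    {},
)
print(handler.start_time)  # 2018-01-01 12:00:00
print(handler.end_time)    # 2018-01-01 12:15:00
```

Using properties keeps the values read-only and lets a real implementation defer opening the file until the times are actually requested.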