Chapter 4

Architecture of ocean monitoring and forecasting systems


Avichal Mehra

Roland Aznar, Stefania Ciliberti, Laurence Crosnier, Marie Drevillon, Yann Drillet, Begoña Pérez Gómez, Antonio Reppucci, Joseph Sudheer, Marcos Garcia Sotillo, Marina Tonani, P. N. Vinaychandran, and Aihong Zhong

4.5 Validation and Verification

Operational ocean services provide routine marine products to an ever-widening community of users and stakeholders. Some of the products delivered are generated by means of ocean models (i.e. forecasts, analyses, or reanalyses). Ocean models are powerful computational tools able to produce useful information in the absence of (or in between) ground truth information. The reliability of this information depends on the realism of the model itself, but also on the accuracy of its initial and boundary conditions, and on the capacity to constrain the model with contemporaneous high-quality observations. Information on model quality and performance is arguably more crucial for end-users than the model solutions themselves. The reliability of model solutions must therefore be assessed, and the Model Product Quality (MPQ) must be quantified at the analysis, forecast, and reanalysis stages; it must also be properly documented for end-users.

The purpose of this section is to give a general overview of the methodology and processes commonly applied by existing operational ocean services to validate and verify their ocean model products. In particular, standard validation metrics and protocols were designed for ocean model analyses and forecasts, and agreed among the community of OceanPredict forecasters (Hernandez et al., 2015, 2018). This section focuses on describing these validation methodologies and standards for model products. Specific details on the thematic (process-oriented) validation for each kind of model used in the operational oceanography (OO) community (i.e. waves, storm surge, ocean circulation, biogeochemical, etc.), along with examples, illustrations and use cases, can be found in Chapters 5 to 9.

4.5.1 Basic statistical tools for time series validation

Several metrics can be computed for a quantitative validation of model time series against data: the bias, the maximum error (MaxErr), the root mean square error (RMSE), the Pearson correlation coefficient (R) and the Scatter Index (SI) are some of the most common examples and are obtained as:
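
Written out in their usual textbook form, these metrics read as follows. The normalisation of SI differs between authors; the variant shown here (RMSE divided by the mean observation) is an assumption, not a definition taken from this section, and other variants remove the bias before normalising.

```latex
\begin{align}
\mathrm{Bias}   &= \frac{1}{N}\sum_{i=1}^{N}\left(P_i - O_i\right) = \bar{P} - \bar{O} \\
\mathrm{MaxErr} &= \max_{1 \le i \le N}\left|P_i - O_i\right| \\
\mathrm{RMSE}   &= \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(P_i - O_i\right)^{2}} \\
R               &= \frac{\sum_{i=1}^{N}\left(P_i - \bar{P}\right)\left(O_i - \bar{O}\right)}
                        {\sqrt{\sum_{i=1}^{N}\left(P_i - \bar{P}\right)^{2}}\,
                         \sqrt{\sum_{i=1}^{N}\left(O_i - \bar{O}\right)^{2}}} \\
\mathrm{SI}     &= \frac{\mathrm{RMSE}}{\bar{O}} \qquad \text{(assumed convention)}
\end{align}
```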

where Pi and Oi refer to the forecasted and observed signals respectively, N is the number of time records, and the overbar (¯) is the mean operator. Other types of skill score can also be used, such as the Coefficient of Efficiency (COE) (Legates and McCabe, 1999, 2013), obtained as:
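
One common way to write it, assuming the absolute-difference form rather than the squared-difference (Nash-Sutcliffe type) form, is:

```latex
\mathrm{COE} = 1 - \frac{\sum_{i=1}^{N}\left|P_i - O_i\right|}
                        {\sum_{i=1}^{N}\left|O_i - \bar{O}\right|}
```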

A perfect model has COE = 1.0. A value of COE = 0.0 implies that the model predicts the measured values no better than the measured mean does, and a negative COE indicates that the computed signal performs worse than the measured mean.
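
The metrics listed in this subsection can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not a reference implementation: the function name `validation_metrics` is hypothetical, SI is assumed to be RMSE divided by the mean observation, and COE is assumed to take the absolute-difference form.

```python
import numpy as np

def validation_metrics(pred, obs):
    """Common time-series validation metrics for paired model/observation records."""
    p = np.asarray(pred, dtype=float)
    o = np.asarray(obs, dtype=float)
    # Keep only the time records where both series hold a finite value.
    ok = np.isfinite(p) & np.isfinite(o)
    p, o = p[ok], o[ok]
    err = p - o
    rmse = np.sqrt(np.mean(err ** 2))
    return {
        "bias": float(np.mean(err)),
        "max_err": float(np.max(np.abs(err))),
        "rmse": float(rmse),
        "r": float(np.corrcoef(p, o)[0, 1]),
        # Assumed SI convention: RMSE normalised by the mean observation.
        "si": float(rmse / np.mean(o)),
        # Assumed COE form: 1 - sum|P - O| / sum|O - mean(O)|.
        "coe": float(1.0 - np.sum(np.abs(err)) / np.sum(np.abs(o - np.mean(o)))),
    }

if __name__ == "__main__":
    scores = validation_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
    for name, value in scores.items():
        print(f"{name}: {value:.3f}")
```

Note that the assumed SI is ill-conditioned when the mean observation is close to zero (for instance, sea level anomalies), which is one reason bias-removed or differently normalised variants are also in use.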

4.5.2 Ocean forecasting standard metrics for validation and intercomparison

There are different types of model products (i.e. forecast, analysis, reanalysis) and different types of model evaluation methodologies, most of which are based on comparison with reference values and aim at building performance and skill scores. The methods most commonly applied to assess OO models include:

  1. Analysis (or forecast at various forecast lengths) versus contemporaneous observations (in situ, but also satellite) in observation space. This type of comparison with observations is also performed by the data assimilation system, so it is used extensively in operational oceanography. Since ocean in-situ observations are sparse and unevenly distributed, representativeness issues are frequent. Depending on the observation coverage, the comparisons are either local (at one given observation location), or the statistics of the differences between model solutions and observations are computed over rather large areas or long periods of time.
  2. Model forecast versus model analysis (or observation only). In this case, the model forecast for a specific day is compared to the analysis of the same day, assuming that the analysis is the best available estimate of the ocean state for that day; this methodology can be applied only in delayed mode, when the analysis is available. The forecast can also be compared with gridded observations (an analysis of observations only, for instance satellite L4 observations).
  3. Forecast versus persistence. Model fields at various forecast lengths are compared with the persistence of the last available analysis (or observations), that is, with what would have been the best estimate of the ocean state for that day had no model forecast been available. The model forecast is expected to be more accurate than persistence, and this comparison makes it possible to quantify the skill of the forecast.
  4. Analysis (or forecast) versus climatology or versus literature estimates for less observed quantities. This approach is commonly used with currents or transports.
  5. Observed versus modelled feature structure. In this case, the structure (location or intensity) of an observed feature (such as an ocean front or eddy) is compared to its modelled counterpart. Categorical scores can be defined from this type of model validation, possibly introducing space and/or time lags.
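
The forecast-versus-persistence comparison in point 3 can be turned into a single skill number. The sketch below assumes an RMSE-ratio skill score (an MSE-based ratio is an equally common choice), and the function names are hypothetical.

```python
import numpy as np

def rmse(a, b):
    """Root mean square difference between two equally shaped fields or series."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sqrt(np.mean((a - b) ** 2)))

def persistence_skill(forecast, last_analysis, verifying):
    """Skill of a forecast relative to persisting the last available analysis.

    1 means a perfect forecast, 0 means no better than persistence,
    and negative values mean the forecast is worse than persistence.
    `verifying` is the reference for the target day (analysis or observations).
    """
    rmse_forecast = rmse(forecast, verifying)
    rmse_persistence = rmse(last_analysis, verifying)
    # Undefined if persistence is already perfect (rmse_persistence == 0).
    return 1.0 - rmse_forecast / rmse_persistence

if __name__ == "__main__":
    verifying = [2.0, 2.0, 2.0]      # reference state on the target day
    last_analysis = [0.0, 0.0, 0.0]  # initial condition, persisted
    forecast = [1.0, 1.0, 1.0]       # model forecast for the target day
    print(persistence_skill(forecast, last_analysis, verifying))
```

Computing this score for each forecast length shows how quickly the added value of the model over persistence changes with lead time.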

The results of these comparisons between model outputs and reference values can be combined in different ways to derive MPQ monitoring scores or metrics. The numerical weather prediction community has a long tradition of model forecast verification methods, with vigorous progress linked to the advent of probabilistic methods in operational numerical weather prediction (Jolliffe and Stephenson, 2003; Nurmi, 2003). The OO forecasting community, on the other hand, is constrained by the limited number of oceanic observations and their uneven distribution (most of them at the surface), and has shown that quality assessment must include four types of metrics to properly assess the consistency, representativeness, accuracy, performance, and robustness of ocean model outputs (Crosnier and Le Provost, 2007; Hernandez et al., 2009). These four classes of metrics (Figure 4.28) were adopted by GODAE OceanPredict and have been used extensively in different OO initiatives. For instance, these four classes (with specific computation methods and definitions of reference geographical areas) have allowed regular intercomparison exercises between global and regional ocean forecasts (see Ryan et al. (2015) for an intercomparison of global ocean forecasts). A further type of metrics, defined from user feedback and called “user oriented” (such as the categorical scores in point 5), is also instrumental for quantifying the uncertainties relevant to specific applications (Maksymczuk et al., 2016). Categorical scores using space and time lags, or specific case studies, can also help to account for the double penalty effect, which can lower statistical performance when high-resolution model outputs are compared with observations, as pointed out by Crocker et al. (2020).
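
The double penalty effect can be illustrated with a toy categorical score. In the sketch below, an event that the model places one time step away from the observed one counts as both a miss and a false alarm under exact matching, but as a hit once a time-lag tolerance is allowed. The function names and the simple tolerance window are assumptions made for illustration; this is not the method of Crocker et al. (2020).

```python
import numpy as np

def dilate(events, lag):
    """Flag every time step lying within +/- `lag` steps of an event."""
    e = np.asarray(events, dtype=bool)
    out = e.copy()
    for k in range(1, lag + 1):
        out[k:] |= e[:-k]   # spread each event forward in time
        out[:-k] |= e[k:]   # spread each event backward in time
    return out

def categorical_scores(model_events, obs_events, lag=0):
    """Hits, misses and false alarms for binary event series, with a time tolerance."""
    m = np.asarray(model_events, dtype=bool)
    o = np.asarray(obs_events, dtype=bool)
    hits = int(np.sum(o & dilate(m, lag)))            # observed events matched by the model
    misses = int(np.sum(o)) - hits                    # observed events with no model event nearby
    false_alarms = int(np.sum(m & ~dilate(o, lag)))   # model events with no observed event nearby
    n_obs, n_model = int(np.sum(o)), int(np.sum(m))
    return {
        "hits": hits,
        "misses": misses,
        "false_alarms": false_alarms,
        "pod": hits / n_obs if n_obs else float("nan"),             # probability of detection
        "far": false_alarms / n_model if n_model else float("nan"), # false alarm ratio
    }

if __name__ == "__main__":
    observed = [0, 0, 1, 0, 0, 0]
    modelled = [0, 0, 0, 1, 0, 0]   # same event, displaced by one time step
    print(categorical_scores(modelled, observed, lag=0))
    print(categorical_scores(modelled, observed, lag=1))
```

The same idea extends to space by widening the matching neighbourhood around each grid point instead of each time step.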

Figure 4.28. Classes of metrics currently used in the OceanPredict community to monitor the quality of ocean analyses and forecasts: a complete range of statistics and comparisons in space and time are necessary to assess the consistency, representativeness, accuracy, performance, and robustness of ocean model outputs.

4.5.3 Qualification, validation and verification processes in support of operational ocean models’ production

Qualification, validation and verification are terms commonly used in the quality control of OO model products. Usually, qualification refers to model quality assessment at the development stage, during which model parameters are optimised. In OO services such as the Copernicus Marine Service, the qualification phase refers to a comprehensive scientific assessment of any new or updated operational ocean model application, performed before the proposed system enters into service (Sotillo et al., 2021). This qualification phase is often used to quantify the added value of the updated model system with respect to its previous version, comparing the performance of both system versions (Vn+1 versus Vn) against a well-defined list of metrics and using the same reference observational data. Validation, on the other hand, refers to the assessment of the performance of operational ocean analyses and forecasts while in operation. Finally, verification is defined by Hernandez et al. (2015) as the a posteriori quantification of operational ocean forecast skill, preferably based on independent data, that is, observational products not used to constrain the model products (for instance, through any kind of data assimilation).

Achieving the best possible MPQ is a major objective for OO centres, and MPQ itself is a key performance indicator for any OO service. Several model quality assessment stages can be defined along the life of an OO model product. Figure 4.29 illustrates the typical MPQ assurance loop adopted by OO services to ensure and quantify the quality of their model products. This approach, which has a long tradition in the operational meteorological and climate community, is becoming popular across OO services. It addresses MPQ at each major stage in the development of an operational oceanography model (i.e. development, transition into operations, operational routine, and “after sales service” including delayed-mode validation and expertise), using dedicated model assessment processes.

Figure 4.29. Schematic view of different Model Product Quality assessment processes applied along the life of an Operational Oceanography (OO) service product in the development and dissemination stages. All processes rely on the use of the standard metrics (Figure 4.28) to compare the model product with observations as well as with other model solutions.

As shown in Figure 4.29, six main steps or phases can be distinguished within the MPQ assurance process. The first one, focused on research and development activities, supports the implementation of new model products, or the update of existing ones, to be operationally delivered. At this research and development phase, relevant scientific quality information is produced, which can later be published in peer-reviewed publications, mostly ensuring that the ocean model application is state-of-the-art and based as much as possible on cutting-edge science. Both model versus observations (model-obs) comparisons and intercomparisons with other available model solutions (model-model intercomparisons) can be performed in support of this forecasting system development phase, and they are the basis for the evaluation of model sensitivity tests and scenarios. User-oriented metrics, such as categorical scores or Lagrangian drift evaluations (Drévillon et al., 2013), can be used in specific case studies to quantify the impact of changes in the model system, either during the system development phase or to prepare specific Observing System Experiments (OSEs) and Observing System Simulation Experiments (OSSEs).

When the new model set-up has been scientifically tested, and before the model system is scheduled for entry into service, there is a pre-operational qualification stage during which the expected (reference) product quality is established. In the qualification phase, it is critical that the model solution tested is generated in a pre-operational environment that ensures conditions analogous to those later applied in operations (i.e. same model applications, same type of forcing data, and analogous observational data sources to be assimilated). It is also important to compare the quality of the product with its previous versions to ensure that there is no regression in terms of MPQ. The stability in time of the model performance is also assessed, using a data record of at least one year. Finally, as an outcome of this phase, the OO services can issue the “static” reference documentation on the quality of the product, based on the different assessment metrics computed. This document can later be delivered to end-users together with the product itself; see, for instance, the Quality Information Document (QUID) delivered together with any Copernicus Marine Service ocean product.
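
A qualification check of this kind can be sketched as follows: both system versions are scored against the same reference observations, a regression is flagged if the new version has a larger error, and the stability in time is examined by splitting the record into windows (for example, 12 windows over a one-year record). The function name, the use of RMSE as the only metric, and the tolerance parameter are assumptions made for illustration.

```python
import numpy as np

def qualification_check(v_old, v_new, obs, n_windows=12, tolerance=0.0):
    """Compare two system versions against the same reference observations.

    Assumes the three series are co-located in time and that the record
    holds at least `n_windows` values, so that every window is non-empty.
    """
    obs = np.asarray(obs, dtype=float)
    err_old = np.asarray(v_old, dtype=float) - obs
    err_new = np.asarray(v_new, dtype=float) - obs
    rmse_old = float(np.sqrt(np.mean(err_old ** 2)))
    rmse_new = float(np.sqrt(np.mean(err_new ** 2)))
    # RMSE of the new version in consecutive windows, to inspect stability in time.
    by_window = [float(np.sqrt(np.mean(w ** 2)))
                 for w in np.array_split(err_new, n_windows)]
    return {
        "rmse_old": rmse_old,
        "rmse_new": rmse_new,
        # Regression: the new version is worse than the old one beyond the tolerance.
        "regression": bool(rmse_new > rmse_old * (1.0 + tolerance)),
        "rmse_new_by_window": by_window,
    }

if __name__ == "__main__":
    obs = np.zeros(24)
    v_n = np.ones(24)              # previous version Vn: constant error of 1.0
    v_n_plus_1 = 0.5 * np.ones(24) # candidate version Vn+1: constant error of 0.5
    report = qualification_check(v_n, v_n_plus_1, obs)
    print(report["rmse_old"], report["rmse_new"], report["regression"])
```

In practice the same comparison would be repeated for each metric and each reference data set in the agreed list, not for RMSE alone.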

Once the model system is in operation, the OO centres perform the scientific validation and verification of the model products delivered on a routine, online, near-real-time (NRT) basis, together with the control of the operational production. This online validation usually includes forecast model assessments with the available observational data sources (especially from NRT operational products) or with other model solutions (the most recent available analysis or, in the case of regional models, the parent solution in which they are nested). This first online validation process is later completed with an extra assessment done in delayed mode. This delayed-mode validation, typically performed monthly, makes it possible to generate more complete and robust validation metrics, extending the obs-model comparisons with observational information from extra or more thoroughly quality-controlled data sources and with more complete series of analysis and forecast cycles.

Finally, user feedback focused on specific processes, areas or events, as well as extra model product assessments performed by the producers themselves or by producers in collaborative frameworks (such as scientific research projects or other initiatives with targeted end-users) can significantly enhance the knowledge of the model products.

OO services are continuously progressing towards the regular delivery of up-to-date quality information, although gaps remain in the operational capacity to assess model solutions, mostly linked to shortcomings in the availability of ocean observations, especially in NRT. Observational data used for model skill assessment and validation mainly originate from drifting profilers, fixed mooring platforms, tide gauges, and remote sensing. In their review of the operational modelling capacity in the European Seas, Capet et al. (2020) point out that only 20% of operational model services provide a dynamic uncertainty together with the forecast products. This uncertainty would be required for a real-time provision of confidence levels associated with the forecasts, as is usual, for instance, in weather forecasts. This lack of uncertainty information, associated with a lack of observations, also affects the data assimilation capacity (Capet et al. (2020) noted that data assimilation is implemented for only 23% of the surveyed models, and remains exceptional in biogeochemical systems). The development of ensemble forecasting and of probabilistic uncertainty information may help to fill this gap in the future. Peng et al. (2021) stressed the need for findability, accessibility, interoperability and reusability (FAIR data principles) of the information in Earth science datasets. This confirms that pertinent product quality information has to be developed further as part of OO services.
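
As a minimal illustration of what a dynamic uncertainty derived from an ensemble forecast could look like, the sketch below summarises a set of ensemble members by their mean, their spread, and an empirical 5th to 95th percentile band. The function name, the array layout, and the choice of percentiles are assumptions; a raw ensemble spread is not a calibrated confidence level unless it has been verified against observations.

```python
import numpy as np

def ensemble_summary(members, lower_pct=5.0, upper_pct=95.0):
    """Summarise an ensemble forecast stored as an array (n_members, n_times)."""
    m = np.asarray(members, dtype=float)
    mean = m.mean(axis=0)                 # ensemble mean at each time
    spread = m.std(axis=0, ddof=1)        # sample standard deviation across members
    lower, upper = np.percentile(m, [lower_pct, upper_pct], axis=0)
    return {"mean": mean, "spread": spread, "lower": lower, "upper": upper}

if __name__ == "__main__":
    # Three hypothetical members, four forecast times (e.g. sea surface temperature in deg C).
    members = [
        [15.0, 15.2, 15.5, 15.9],
        [15.0, 15.3, 15.8, 16.4],
        [15.0, 15.1, 15.3, 15.5],
    ]
    summary = ensemble_summary(members)
    for name, values in summary.items():
        print(name, np.round(values, 2))
```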


