Copyright (C) 2017 Marco van Hulten <marco@klimato.org>. See the end of the file for licence conditions.

ComPlot README.md

ComPlot logo

Summary

ComPlot is a library of Ferret scripts that aims to create intuitive plots comparing one or more model outputs with one or more observational datasets. The main purpose of ComPlot is to compare model results with data by plotting the data onto the model output as dots. It is meant for oceanographers who want to compare ocean model output with observations, and do so in freedom. ComPlot is free software. This means you are free to run it, to study and change it, and to redistribute copies with or without changes. If you publish something using ComPlot, please cite the associated published paper.

Content of directory

This directory tree holds ComPlot, which consists of the following files and directories:

*: The paper is licensed under both CC-BY-SA-3.0 and CC-BY-4.0 to be compatible with the licence requirements of both Savannah and the Journal of Open Source Software. These conditions also apply to logo.svg and mang.png.

If you find bugs, please check if they are already reported and, if not, report them through the online bug tracking system. There is also a mailing list where you can discuss issues and receive release notifications.

Availability

You can download the latest version of the software with Mercurial through this command:

hg clone http://hg.savannah.nongnu.org/hgweb/complot/

or you can browse the repository on the web. Alternatively, you may download a release from the download area. In any case, you need to download the data set file complot_dsets.tar.gz from that location.

Installation

If you have not installed Ferret yet, please do so by following the online instructions; you need at least version 6.95. The 64-bit Linux RH6 files work on most GNU systems. After obtaining the three filesets, follow the installation instructions. On a 32-bit system you need the 32-bit Linux RH6 fer_{environment,executables}.tar.gz files instead; on Debian 9, additionally execute as root:

apt-get install libcurl4-gnutls-dev
cd /usr/lib/ && ln -s i386-linux-gnu/libcurl-gnutls.so.4 libcurl.so.4

To test your installation, start ferret and try to plot surface salinity from a provided dataset:

use levitus_climatology
shade salt[k=1]

Assuming you have a personal Ferret directory ~/ferret/, download the most recent ComPlot software, for instance:

cd ~/ferret/
wget http://download.savannah.nongnu.org/releases/complot/complot-1.0.0.tar.gz
tar -xzf complot-1.0.0.tar.gz

Alternatively, you can check out the latest version through Mercurial:

cd ~/ferret/
hg clone http://hg.savannah.nongnu.org/hgweb/complot/

The repository does not contain the large NetCDF files, so you need to download the datasets (complot_dsets.tar.gz) separately and extract them:

cd complot/
wget http://download.savannah.nongnu.org/releases/complot/complot_dsets.tar.gz
tar -xzf complot_dsets.tar.gz && rm complot_dsets.tar.gz

and, if using Bash, add this to your .bashrc (after the line where you source ferret_paths.sh, which is included with your Ferret installation):

export FER_GO="${HOME}/ferret/complot/scripts ${FER_GO}"
export FER_PALETTE="${HOME}/ferret/complot/palettes ${FER_PALETTE}"
export FER_DATA=". ${HOME}/ferret/complot/data ${FER_DATA}"

Ferret should already provide these scripts, which are required by ComPlot:

land.jnl        fland.jnl
margins.jnl     polymark.jnl

The Unix commands awk(1) and sed(1) are needed as well; any GNU or BSD system should provide them. You may put lim2lev.awk in your ~/bin/ directory (or symlink it there), assuming your ${PATH} contains ${HOME}/bin; alternatively, add ~/ferret/complot/bin to your ${PATH}. Sourcing .bashrc and then starting Ferret or PyFerret makes Ferret aware of any ${FER_*} variables added to your .bashrc:

. ~/.bashrc
pyferret || ferret
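The lim2lev.awk step mentioned above could, for example, be done with a symlink; this is a sketch that assumes the example install location ~/ferret/complot/ used throughout this README:

```shell
# Make lim2lev.awk callable from anywhere, assuming ~/bin is on your PATH.
# The source path is the example install location from this README.
mkdir -p "$HOME/bin"
ln -sf "$HOME/ferret/complot/bin/lim2lev.awk" "$HOME/bin/lim2lev.awk"
```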

Usage

At the moment you need to put model output in the directory where you run Ferret (and . must be in the FER_DATA shell variable):

ln -s ${HOME}/ferret/complot/data/Fmang-16B05-LD40_Bio_1y_ptrc_T_P80.nc
ln -s ${HOME}/ferret/complot/data/Fmang-16B05-LD40_Bio_1y_diad_T_P80.nc
ferret

Note that those files are not part of the code repository but need to be downloaded separately as described in the installation instructions above.

Quick demonstration

As you can see in init_tracers.jnl, by default we set study = "demo" so that the relevant demonstration model output and observational datasets are loaded. For a nice demonstration you can issue this command:

go complot_demo

Usage with proper data

You first need to edit init_tracers.jnl: the variable study may be changed to a string that defines which model output and observational datasets to load. In load_data.jnl you may define an array of dataset names (datanames) to be loaded for the specified study. You may also need to edit load_{model,data}.jnl and setup_NEMO_filenames. The file load_model.jnl loads the actual model data, but the model files are specified in set_model_files.jnl, again per study.

ComPlot has been tested with NEMO output, but other model outputs can be loaded as well. There is a manual way to specify the model files: set study to "none" and specify the ptrc (tracer output) and diad (diagnostics) files in the respective code block of set_model_files.jnl. These can optionally be arrays of strings, but they must contain the same number of strings. You may set diad equal to ptrc if all of your model variables (defined in set_model_variables.jnl) are defined in that file. There is also a provision for NorESM/CESM in the same file where the filenames are defined. Only the use of NEMO output has been tested so far, so at the moment ComPlot may only accept NEMO-like filename formatting.

Advanced usage and developer information

The filenames are put together based on the choice of study defining the simulation name (simulations), the job number (jobind) and the time index in the output file (timeind):

simulations     String of simulation name
jobind      Integer of job number
timeind     Integer of the time index

Any or all of these variables can be arrays (of the respective type). The variable names are singular because they are scalars in the trivial case (which may still be treated as arrays in Ferret; a sort of polymorphism). You may even mix arrays and scalars, as long as any two arrays of length larger than one have the same length. Furthermore, the simulations can be grouped in different studies; often these correspond to the different models you are working on, or to different reports or papers. At the moment only one study can be loaded. You may want, for instance, to look at different timesteps of one model output, or to analyse several sensitivity simulations all at the same timestep. You also need to set a model output frequency: output_freq = 1y in the case of yearly averaged model output. From these variables two others are built by means of concatenation:
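As a hypothetical sketch (the simulation names are made up), mixing an array with scalars could look like this in set_model_files.jnl:

```ferret
! Hypothetical entries: two simulations compared at the same job number
! and the same time index (the scalars apply to both simulations).
let simulations = {"runA_Bio", "runB_Bio"}   ! array of two simulation names
let jobind      = 80                         ! scalar job number
let timeind     = 10                         ! scalar time index
define symbol output_freq = 1y               ! yearly averaged output
```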

ptrc = simulations + "_($output_freq)_ptrc_T_P" + jobind + ".nc"
diad = simulations + "_($output_freq)_diad_T_P" + jobind + ".nc"
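For instance, the demo file used earlier in this README follows this scheme; sketched here in shell purely for illustration, with the values of the demo dataset:

```shell
# Reconstruct the demo ptrc filename from its parts (values taken from
# the example dataset used earlier in this README).
simulations="Fmang-16B05-LD40_Bio"
output_freq="1y"
jobind=80
ptrc="${simulations}_${output_freq}_ptrc_T_P${jobind}.nc"
echo "$ptrc"   # prints Fmang-16B05-LD40_Bio_1y_ptrc_T_P80.nc
```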

If this does not work for you, e.g. because you do not use this package to analyse NEMO output, you can set study to "none", in which case you must define the array of filenames ptrc (and diad) directly in load_model.jnl. In any case, the strings in ptrc and diad must refer to existing files in one of your ${FER_DATA} directories. I use the current directory, since load_model.jnl attempts to copy missing files to that location.

After the filenames are defined, load_model.jnl will try to copy files not found in ${FER_DATA} to the current directory, and then load the files: first the ptrc files (1..n_sims) and then the diad files (n_sims+1..2*n_sims). You might want to change the server and path names; otherwise, make sure that all model output files are present in the current directory (where you start ferret). Some grid information is loaded and calculated. Finally, tracers based on the model output are defined for comparison with observations.

The file load_data.jnl loads observational data by defining an array of dataset names: datanames. You may want to put the most recent, or the best, datasets later in the array than older datasets, since locations will sometimes overlap and we want the most recent data plotted on top of the rest. After this some constants and units are defined, followed by the dataset blocks, where each string in datanames is used as a string symbol for the filename. Then, with the file command, the variables of the file concerned are loaded. The dataset names will also be used as suffixes of the variable names. Assuming datanames = {"GEOSECS", "IDP", ...} and that we have O2 from the simulation output, O2_GEOSECS, O2_IDP and so on are the corresponding observational data. The corresponding coordinates are called Longitude, Latitude and Depth for each observational dataset, and should be used as Longitude[d=($GEOSECS)] and so on.
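For example, after load_data.jnl has run, the observed oxygen and its coordinates could be inspected like this (a sketch; it assumes the GEOSECS dataset of the demo study is loaded):

```ferret
! Inspect observed oxygen and its station coordinates from GEOSECS
list/l=1:5 O2_GEOSECS                  ! first five observed values
list/l=1:5 Longitude[d=($GEOSECS)]     ! matching station longitudes
```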

Usually it is not trivial what a model variable corresponds to in reality. Even if the modeller has a well-defined intention, the precise interpretation of a prognostic variable might need to be revised during model development. Similarly, observables are often defined operationally, meaning that the measured value of an observable may define it in a different way than initially intended (so this is a qualitative notion, but there may simply be biases as well). We need to keep track of the meaning and definitions of all variables that somehow refer to reality, or at the very least the model variables and observational data must refer to each other! For this we use the file tracers.txt, which conceptually describes each tracer that we want to analyse. The descriptions that end with (raw) are verbatim variable names from the simulation output; the other variables are defined at the end of load_model.jnl. Concerning the observations, every variable that we want to analyse should be defined in at least one observational dataset (otherwise we would only be plotting simulation output). Besides model- and observation-specific variable definitions, general properties of the variables, such as the common units and plotting ranges, can be defined in init_tracers.jnl.

To get to the actual plotting, it is advised to use the c2d_ ("compare to data") scripts. These are self-describing and there is a demo script as well:

go c2d_mix PO4      ! to plot phosphate in different ways
go/help c2d_4depths ! to show the helpful script for plotting four depths
go complot_demo     ! needs to be the only one here...

The basic syntax of the c2d_ scripts is explained through this example:

go c2d_4depths tPOC 10

The first argument is the tracer tPOC and is obligatory. The second argument for this script is optional (default is 1) and signifies the plot index. Depending on the settings in set_model_files.jnl, this can be the time index, the job number or year, or the simulation. For instance, if only one job number/year and one simulation are provided, ComPlot will presume that you want to plot time index 10.

You could also provide multiple tracers, provided that they are separated by commas and enclosed in quotes:

go c2d_2tracers "sPOC,bPOC" 10

but c2d_2tracers is not (yet) provided by ComPlot.

For an analytically exact placement of the colour key, set the symbol Outside_margin in your c2d_ script, and use it for the margin at the right: the colour bar will be placed in that margin.

Structure

While all scripts reside in a single directory, the filename prefixes show that there are different types of scripts; these define the structure of the package.

init_*.jnl      Initialise stuff
load_*.jnl      Load model or observational data
set_*.jnl       Set variables, among which file names
c2d_*.jnl       User front-end scripts
render_*.jnl    Actual plotting -- DO NOT CHANGE
*.jnl           Miscellaneous -- DO NOT CHANGE

Typically a user would call a c2d_ script that makes some figure. Each script is meant for a specific location and orientation in the ocean, and for a given number and choice of simulations to be considered. The scripts are independent of the particular set of simulations and observational datasets; those sets are determined in the load_ scripts.

Typically a user would modify the load_ scripts to include different (observational or model) data, or add c2d_ scripts based on any existing c2d_ script.

The init_ scripts contain general initialisation stuff, and should only be changed if you must change the plotting range, vertical domain or whether we want to interpolate the model data. The render_ scripts and miscellaneous scripts should be left alone.

The scripts are called like this:

c2d_
|
|-- init_tracers
|   |
|   |-- load_model
|   `-- load_data
|
|-- init_visuals
`-- render_

Since the c2d_ scripts may be too specific, may contain quirks, or you may just want to do things differently, you are not bound to call these scripts. You can just as well call load_model.jnl to load the simulation output, or init_tracers.jnl if you also want the observational data (and the tracer-specific symbols defined in init_tracers.jnl). Having done that, you may set up your own viewport and plot data by directly using the render_ scripts, but be warned that these may expect certain symbols to be defined (symbols that are usually defined in the c2d_ scripts)!

When parts of this code are used, please refer to this repository and/or the accompanying paper. Where appropriate, also consider the scientific literature cited in this package.

Coding conventions

Indentation is four spaces. Thou shalt not use tab characters. Although Ferret is case-insensitive for variable and symbol names, it is still useful to have some naming conventions. Try to use variables instead of symbols, especially for numbers. Variable names should be lower case: different words may be separated with an underscore, but compounds should be connected if they are still readable that way.

let example_variable = 123.
list example_variable
define symbol Example_symbol = "xyz"
show symbol Example_symbol

Symbol names should begin with an upper-case letter but otherwise be lower case. Do not use camel case, except for Ferret functions like SampleXY(), as this often improves readability.

Local variables and symbols should start with $0_, or with a prefix xy_, where x ∈ {c, i, l, r, s} signifies the kind of script (c2d_*, init_*, load_*, render_*, set_*) and y equals the first character after the first underscore. For instance, c4_ind can be a generic loop variable in c2d_4depths.jnl and rl_n a local number in render_layer.jnl. Variables and symbols formatted thus should be cancelled at the end of each script:

cancel variable rl_* ! or $0_* if you prefer
cancel symbol rl_*   ! or $0_* if you prefer

Ferret commands should generally be written out in full and in lower case, except when this gets too messy (e.g. overly long lines). In function names, do not capitalise after underscores, so write XCat_str(). Lines are preferably limited to 80 characters and may not be longer than 100 characters (akin to the NEMO coding conventions).

If there is no appropriate c2d_*.jnl user script for what you want to plot, you can create new scripts by using c2d_TEMPLATE.jnl:

cp c2d_TEMPLATE.jnl c2d_my_plot.jnl

or you can start from any of the c2d_*.jnl scripts (e.g. c2d_4depths.jnl if you want to plot horizontal sections).

Copying

You may reuse this document according to the conditions given by the GNU Free Documentation License version 1.3 (fdl-1.3.txt) or any later version.

The files logo.svg and logo.png: Copyright (C) 2017 Marco van Hulten. You may reuse these files according to the conditions given by the Creative Commons Attribution-ShareAlike 3.0 licence (data/CC-BY-SA-3.0.txt).