Tutorial 12: Data Archival#
The final step of any analysis workflow should be to archive the simulation files used in reporting and documentation, both input and output files. The archival task is generally performed once at the end of a project and limited to the final, peer-reviewed simulation results. However, if the task of archiving these files is added to the automated workflow, it is easier to guarantee that the archived files are in sync with the simulation results. Of course, it is not enough to produce the archive; it must also be stored somewhere for retrieval by colleagues and the analysis report audience.
The archive can include compute environment information and repository version information for improved reproducibility.
For the reproducible version number, it is beneficial to use a versioning scheme that includes information from the project’s version control system, e.g. git. The WAVES project uses git and setuptools_scm [55] to build either a clean version number that is uniquely tied to a single commit, e.g. 1.2.3, or a version number appended with the short git hash to uniquely identify the project commit. Setting up git, git tags, and a setuptools_scm version number is outside the scope of this tutorial, but highly recommended.
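The two version styles described above can be illustrated with a short Python sketch. The helper name `build_version`, its arguments, and the `+g` separator are assumptions made for illustration only; this is not the setuptools_scm implementation, and the exact strings it produces depend on its configuration.

```python
def build_version(tag: str, commits_since_tag: int, short_hash: str) -> str:
    """Return a clean version on a tagged commit, otherwise append the short git hash.

    Hypothetical helper for illustration; not the setuptools_scm implementation.
    """
    if commits_since_tag == 0:
        # The commit is tagged, so the tag alone identifies it
        return tag
    # Untagged commit: append the short hash so the version still maps to one commit
    return f"{tag}+g{short_hash}"


print(build_version("1.2.3", 0, "abc1234"))  # 1.2.3
print(build_version("1.2.3", 4, "abc1234"))  # 1.2.3+gabc1234
```

Either form can be embedded in an archive file name, so an archive can be traced back to the commit that produced it.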
Environment#
SCons and WAVES can be installed in a Conda environment with the Conda package manager. See the Conda installation and Conda environment management documentation for more details about using Conda.
Note
The SALib and numpy versions may not need to be this strict for most tutorials. However, Tutorial: Sensitivity Study uncovered some undocumented SALib version sensitivity to numpy surrounding the numpy v2 rollout.
Create the tutorials environment if it doesn’t exist
$ conda create --name waves-tutorial-env --channel conda-forge waves 'scons>=4.6' matplotlib pandas pyyaml xarray seaborn 'numpy>=2' 'salib>=1.5.1' pytest
Activate the environment
$ conda activate waves-tutorial-env
Some tutorials require additional third-party software that is not available for the Conda package manager. This software must be installed separately and either made available to SConstruct by modifying your system’s PATH or by modifying the SConstruct search paths provided to the waves.scons_extensions.add_program() method.
Warning
STOP! Before continuing, check that the documentation version matches your installed package version.
You can find the documentation version in the upper-left corner of the webpage.
You can find the installed WAVES version with waves --version.
If they don’t match, you can launch identically matched documentation with the WAVES Command-Line Utility docs subcommand as waves docs.
Directory Structure#
Create and change to a new project root directory to house the tutorial files if you have not already done so. For example
$ mkdir -p ~/waves-tutorials
$ cd ~/waves-tutorials
$ pwd
/home/roppenheimer/waves-tutorials
Note
If you skipped any of the previous tutorials, run the following commands to create a copy of the necessary tutorial files.
$ pwd
/home/roppenheimer/waves-tutorials
$ waves fetch --overwrite --tutorial 11 && mv tutorial_11_regression_testing_SConstruct SConstruct
WAVES fetch
Destination directory: '/home/roppenheimer/waves-tutorials'
Download and copy the tutorial_11_regression_testing file to a new file named tutorial_12_archival with the WAVES Command-Line Utility fetch subcommand.
$ pwd
/home/roppenheimer/waves-tutorials
$ waves fetch --overwrite tutorials/tutorial_11_regression_testing && cp tutorial_11_regression_testing tutorial_12_archival
WAVES fetch
Destination directory: '/home/roppenheimer/waves-tutorials'
SConscript#
A diff against the tutorial_11_regression_testing file from Tutorial 11: Regression Testing is included below to help identify the changes made in this tutorial.
waves-tutorials/tutorial_12_archival
--- /home/runner/work/waves/waves/build/docs/tutorials_tutorial_11_regression_testing
+++ /home/runner/work/waves/waves/build/docs/tutorials_tutorial_12_archival
@@ -7,6 +7,8 @@
* ``datacheck_alias`` - String for the alias collecting the datacheck workflow targets
* ``regression_alias`` - String for the alias collecting the regression test suite targets
+ * ``archive_prefix`` - String prefix for archive target(s) containing identifying project and version information
+ * ``project_configuration`` - String absolute path to the project SCons configuration file
* ``unconditional_build`` - Boolean flag to force building of conditionally ignored targets
* ``abaqus`` - String path for the Abaqus executable
"""
@@ -25,6 +27,7 @@
# Simulation variables
build_directory = pathlib.Path(Dir(".").abspath)
workflow_name = build_directory.name
+workflow_configuration = [env["project_configuration"], workflow_name]
parameter_study_file = build_directory / "parameter_study.h5"
# Collect the target nodes to build a concise alias for all targets
@@ -191,11 +194,19 @@
)
)
+# Data archival
+archive_name = f"{env['archive_prefix']}-{workflow_name}.tar.bz2"
+archive_target = env.Tar(
+ target=archive_name,
+ source=workflow + workflow_configuration,
+)
+
# Collector alias based on parent directory name
env.Alias(workflow_name, workflow)
env.Alias(f"{workflow_name}_datacheck", datacheck)
env.Alias(env["datacheck_alias"], datacheck)
env.Alias(env["regression_alias"], datacheck)
+env.Alias(f"{workflow_name}_archive", archive_target)
if not env["unconditional_build"] and not env["ABAQUS_PROGRAM"]:
print(f"Program 'abaqus' was not found in construction environment. Ignoring '{workflow_name}' target(s)")
First, we add the new environment keys required by the SConscript file that will be used by the archive task.
Second, we build a list of all required SCons configuration files for the current workflow, where project_configuration points to the SConstruct file and, by the project’s naming convention, the build directory name matches the current SConscript file name. These SCons workflow configuration files will be archived with the output of the workflow for reproducibility of the workflow task definitions.
For advanced workflows that reuse SConscript files, e.g. Tutorial: Task Definition Reuse, it may be necessary to recover the current SConscript file name with a Python lambda expression as seen in the SConstruct modifications below. If the current workflow uses more than one SConscript file, the workflow_configuration list should be updated to include all configuration files for the archive task.
Next, we define the actual archive task using the SCons Tar builder [39]. The archive target is
constructed from a prefix including the current project name and version in the SConstruct file. Including the
version number will allow us to keep multiple archives simultaneously, provided the version number is incremented
between workflow executions and as the project changes. We append the current workflow name in the archive target for
projects that may contain many unique, independent workflows which can be archived separately. The archive task sources
are compiled from all previous workflow targets and the workflow configuration file(s). In principle, it may be
desirable to archive the workflow’s source files, as well. However, if a version control system is used to build the
version number as in Tutorial: setuptools_scm, the source files may also be recoverable from the version
control state which is embedded in the version number.
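The archive name and source list construction can be tried outside of SCons with a plain Python sketch. Here a dictionary stands in for the SCons construction environment and the `workflow` list holds a single hypothetical placeholder file name. In the real SConscript, `workflow` is the list of target nodes collected by the earlier tasks.

```python
import pathlib

# Stand-in for the SCons construction environment; values mirror the tutorial SConstruct
env = {
    "archive_prefix": "WAVES-TUTORIAL-0.1.0",
    "project_configuration": pathlib.Path("/home/roppenheimer/waves-tutorials/SConstruct"),
}
workflow_name = "tutorial_12_archival"

# Hypothetical placeholder for the target nodes collected earlier in the SConscript
workflow = ["parameter_set0/rectangle_compression.odb"]

# The project configuration file and the workflow SConscript file are archived with the outputs
workflow_configuration = [env["project_configuration"], workflow_name]
archive_name = f"{env['archive_prefix']}-{workflow_name}.tar.bz2"
archive_sources = workflow + workflow_configuration

print(archive_name)  # WAVES-TUTORIAL-0.1.0-tutorial_12_archival.tar.bz2
```

Because the version is part of the file name, incrementing the version produces a new archive file instead of overwriting the previous one.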
Finally, we create a dedicated archive alias to match the workflow alias. Here we separate the aliases because workflows with large output files may require significant time to archive. This may be undesirable during workflow construction and troubleshooting. It is also typical for the archival task to be performed once at reporting time when the post-processing plots have been finalized.
SConstruct#
A diff against the SConstruct file from Tutorial 11: Regression Testing is included below to help identify the changes made in this tutorial.
waves-tutorials/SConstruct
--- /home/runner/work/waves/waves/build/docs/tutorials_tutorial_11_regression_testing_SConstruct
+++ /home/runner/work/waves/waves/build/docs/tutorials_tutorial_12_archival_SConstruct
@@ -3,6 +3,7 @@
import os
import sys
import pathlib
+import inspect
import waves
@@ -59,6 +60,7 @@
unconditional_build=GetOption("unconditional_build"),
print_build_failures=GetOption("print_build_failures"),
abaqus_commands=GetOption("abaqus_command"),
+ TARFLAGS="-c -j",
)
# Conditionally print failed task *.stdout files
@@ -76,13 +78,17 @@
# Set project internal variables and variable substitution dictionaries
project_name = "WAVES-TUTORIAL"
version = "0.1.0"
-project_dir = pathlib.Path(Dir(".").abspath)
+archive_prefix = f"{project_name}-{version}"
+project_configuration = pathlib.Path(inspect.getfile(lambda: None))
+project_dir = project_configuration.parent
project_variables = {
+ "project_configuration": project_configuration,
"project_name": project_name,
"project_dir": project_dir,
"version": version,
"regression_alias": "regression",
"datacheck_alias": "datacheck",
+ "archive_prefix": archive_prefix,
}
for key, value in project_variables.items():
env[key] = value
@@ -114,6 +120,7 @@
"tutorial_08_data_extraction",
"tutorial_09_post_processing",
"tutorial_11_regression_testing",
+ "tutorial_12_archival",
]
for workflow in workflow_configurations:
build_dir = env["variant_dir_base"] / workflow
Note that we retrieve the project configuration SConstruct file name and location with a Python lambda expression [40]. We do this to recover the absolute path to the current configuration file and because some projects may choose to use a non-default filename for the project configuration file. In Python 3, you would normally use the __file__ attribute; however, this attribute is not defined for SCons configuration files. Instead, we can recover the configuration file name and absolute path with the same method used in Tutorial 01: Geometry and Tutorial 02: Partition and Mesh for the Abaqus Python 2 journal files. For consistency with the configuration file path, we assume that the parent directory of the configuration file is the same as the project root directory.
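The lambda approach can be tried in any Python file. The following is a minimal sketch, assuming it is saved and run as an ordinary script; the variable names are illustrative. Outside of SCons the reported path may be relative, depending on how the interpreter was invoked.

```python
import inspect
import pathlib

# A lambda defined in this file records the file name in its code object, so
# inspect.getfile() can report it even where __file__ is not defined.
configuration_file = pathlib.Path(inspect.getfile(lambda: None))

# Assume the project root is the directory that contains the configuration file
project_dir = configuration_file.parent

print(configuration_file.name)
print(project_dir)
```

The same two lines work regardless of the configuration file name, which is why the tutorial prefers them over a hard-coded `SConstruct` path.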
The environment is also modified to provide non-default configuration options to the SCons Tar builder. Here, we request the bzip2 compression algorithm for the archive file and a commonly used file extension to match. You can read more about tar archives in the GNU tar documentation [56] and the SCons Tar builder in the SCons manpage [39].
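For readers unfamiliar with the tar flags, `-c` creates an archive and `-j` filters it through bzip2. A roughly comparable operation can be sketched with the Python standard library `tarfile` module. This is an illustration of the archive format only, not what the SCons Tar builder runs internally, and the source file is a hypothetical stand-in for a workflow output.

```python
import pathlib
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as temporary_directory:
    root = pathlib.Path(temporary_directory)

    # Hypothetical stand-in for a workflow output file
    source = root / "rectangle_compression.csv"
    source.write_text("time,stress\n0.0,0.0\n")

    # Mode "w:bz2" creates a new bzip2-compressed archive, comparable to ``tar -c -j -f``
    archive = root / "WAVES-TUTORIAL-0.1.0-tutorial_12_archival.tar.bz2"
    with tarfile.open(archive, mode="w:bz2") as tar:
        tar.add(source, arcname=source.name)

    # Re-open the archive to confirm the member list
    with tarfile.open(archive, mode="r:bz2") as tar:
        members = tar.getnames()

print(members)  # ['rectangle_compression.csv']
```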
Build Targets#
Build the archive target. Note that the usual workflow target does not include the archive task because it is not required until the project developer is ready to begin final reporting.
$ pwd
/home/roppenheimer/waves-tutorials
$ scons tutorial_12_archival_archive --jobs=4
Output Files#
The output should look identical to Tutorial 11: Regression Testing with the addition of a single *.tar.bz2 file. You can inspect the contents of the archive as below.
$ pwd
/home/roppenheimer/waves-tutorials
$ find build -name "*.tar.bz2"
build/tutorial_12_archival/WAVES-TUTORIAL-0.1.0-tutorial_12_archival.tar.bz2
$ tar -tjf $(find build -name "*.tar.bz2") | grep -E "parameter_set0|SConstruct|^tutorial_12_archival"
build/tutorial_12_archival/parameter_set0/rectangle_geometry.cae
build/tutorial_12_archival/parameter_set0/rectangle_geometry.jnl
build/tutorial_12_archival/parameter_set0/rectangle_geometry.stdout
build/tutorial_12_archival/parameter_set0/rectangle_partition.cae
build/tutorial_12_archival/parameter_set0/rectangle_partition.jnl
build/tutorial_12_archival/parameter_set0/rectangle_partition.stdout
build/tutorial_12_archival/parameter_set0/rectangle_mesh.inp
build/tutorial_12_archival/parameter_set0/rectangle_mesh.cae
build/tutorial_12_archival/parameter_set0/rectangle_mesh.jnl
build/tutorial_12_archival/parameter_set0/rectangle_mesh.stdout
build/tutorial_12_archival/parameter_set0/rectangle_compression.inp.in
build/tutorial_12_archival/parameter_set0/rectangle_compression.inp
build/tutorial_12_archival/parameter_set0/assembly.inp
build/tutorial_12_archival/parameter_set0/boundary.inp
build/tutorial_12_archival/parameter_set0/field_output.inp
build/tutorial_12_archival/parameter_set0/materials.inp
build/tutorial_12_archival/parameter_set0/parts.inp
build/tutorial_12_archival/parameter_set0/history_output.inp
build/tutorial_12_archival/parameter_set0/rectangle_compression.sta
build/tutorial_12_archival/parameter_set0/rectangle_compression.stdout
build/tutorial_12_archival/parameter_set0/rectangle_compression.odb
build/tutorial_12_archival/parameter_set0/rectangle_compression.dat
build/tutorial_12_archival/parameter_set0/rectangle_compression.msg
build/tutorial_12_archival/parameter_set0/rectangle_compression.com
build/tutorial_12_archival/parameter_set0/rectangle_compression.prt
build/tutorial_12_archival/parameter_set0/rectangle_compression.h5
build/tutorial_12_archival/parameter_set0/rectangle_compression_datasets.h5
build/tutorial_12_archival/parameter_set0/rectangle_compression.csv
build/tutorial_12_archival/parameter_set0/rectangle_compression.h5.stdout
SConstruct
tutorial_12_archival
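On systems without the tar and grep command line tools, a similar inspection can be sketched with the Python standard library. The function name `list_archive_members` is hypothetical, and the archive path in the usage block assumes the build directory layout shown above.

```python
import pathlib
import re
import tarfile


def list_archive_members(archive_path, pattern):
    """Return the archive member names that match a regular expression."""
    expression = re.compile(pattern)
    with tarfile.open(archive_path, mode="r:bz2") as tar:
        return [name for name in tar.getnames() if expression.search(name)]


if __name__ == "__main__":
    # Assumed archive location; adjust to match your build directory
    archive = pathlib.Path("build/tutorial_12_archival/WAVES-TUTORIAL-0.1.0-tutorial_12_archival.tar.bz2")
    if archive.exists():
        for name in list_archive_members(archive, r"parameter_set0|SConstruct|^tutorial_12_archival"):
            print(name)
```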
Workflow Visualization#
View the workflow directed graph by running the following command and opening the image in your preferred image viewer. First, plot the workflow with all parameter sets.
$ pwd
/home/roppenheimer/waves-tutorials
$ waves visualize tutorial_12_archival_archive --output-file tutorial_12_archival.png --width=60 --height=12 --exclude-list /usr/bin .stdout .jnl .prt .com .msg .dat .sta
The output should look similar to the figure below.

In this image of the archive target’s full directed graph we see that the full workflow feeds down into a single archive file on the left hand side. Since the archive alias contains only the archive target and not the full workflow, there is only a single connection between the archive alias and the archive file itself. We could specify the archive target by relative path directly, but the alias saves some typing and serves as a consistent command when the project version number changes. This is especially helpful when using a dynamic version number built from a version control system as introduced in the supplemental Tutorial: setuptools_scm.
Now plot the workflow with only the first set, set0.
$ pwd
/home/roppenheimer/waves-tutorials
$ waves visualize tutorial_12_archival_archive --output-file tutorial_12_archival_set0.png --width=60 --height=8 --exclude-list /usr/bin .stdout .jnl .prt .com .msg .dat .sta --exclude-regex "set[1-9]"
The output should look similar to the figure below.

As in previous tutorials, the full image is useful for describing simulation size and scope, but the image for a single parameter set is more readable and makes it easier to see individual file connections.