Day 3 - Beyond atomistics

As illustrated by the first two days, pyiron was originally designed for atomistic simulation, or ab initio thermodynamics to be more specific. Still, since its publication and its release as an open-source software package on GitHub, pyiron has been extended beyond atomistics. To foster these activities, pyiron, just like AiiDA, joined NumFOCUS as an affiliated project to promote open science and open code in the materials science community. The content of the third day therefore focuses on how to use pyiron beyond atomistics and on our general activities in the materials science community to promote open code.

Implement your own class

One very direct way to use pyiron beyond atomistics is to implement your own job class, either as a ScriptJob as demonstrated yesterday or as a pyiron job class. In this example the pyiron TemplateJob class is used to derive a ToyJob class which simply copies the input to the output. The input consists of a single entry input_energy, which is set to 100 and copied to job["output/generic/energy_tot"]. The important part is the simplicity of implementing new classes: all that is required is essentially a write_input() and a collect_output() function. The write_input() function takes the generic input defined in job.input as a Python dictionary and writes the corresponding input files for a given simulation code. Afterwards the executable is called; in this simple case it is just a shell command which copies the file input to the output file output using cat input > output. Finally, the collect_output() function reads the output file, parses the output variables (in this example just a single one) and stores the output in the HDF5 file using the self.project_hdf5 interface, which accepts basically any kind of dictionary. So the task of the collect_output() function can be summarized as parsing the output and returning a Python dictionary for pyiron to store.

from os.path import join
from pyiron_base import TemplateJob, Project

class ToyJob(TemplateJob):
    def __init__(self, project, job_name):
        super().__init__(project, job_name)
        self.input['input_energy'] = 100  # generic input with a default value
        self.executable = "cat input > output"  # shell command standing in for a simulation code

    def write_input(self):
        # dump the generic input as an "input" file in the working directory
        self.input.write_file(
            file_name="input",
            cwd=self.working_directory
        )

    def collect_output(self):
        # parse the single output value from the output file
        file = join(self.working_directory, "output")
        with open(file) as f:
            line = f.readlines()[0]
        energy = float(line.split()[1])
        # store the parsed output in the job's HDF5 file
        with self.project_hdf5.open("output/generic") as h5out:
            h5out["energy_tot"] = energy
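
Stripped of the pyiron machinery, the round trip performed by write_input(), the executable and collect_output() can be sketched in plain Python. The snippet below only illustrates the file format assumed above (a single "key value" line); it is not part of the pyiron API:

```python
import shutil
import tempfile
from os.path import join

# stand-in for the job's working directory
workdir = tempfile.mkdtemp()

# write_input(): dump the generic input as a "key value" line
with open(join(workdir, "input"), "w") as f:
    f.write("input_energy 100\n")

# the executable "cat input > output" just copies the file
shutil.copy(join(workdir, "input"), join(workdir, "output"))

# collect_output(): parse the value back into a dictionary
# that pyiron would then store in HDF5 under output/generic
with open(join(workdir, "output")) as f:
    first_line = f.readlines()[0]
result = {"energy_tot": float(first_line.split()[1])}
print(result)  # {'energy_tot': 100.0}
```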

After the ToyJob class is defined it can be used like any other pyiron class:

pr = Project('test')

Only the creation of the job is slightly different: instead of selecting the job_type from pr.job_type.*, the ToyJob class is passed directly:

job = pr.create_job(job_type=ToyJob, job_name="toy")

Then the job can be executed like any other job - at least when the Jupyter notebook is executed inline. To submit a custom job class defined in a Jupyter notebook to a remote computing cluster, it is recommended to submit the whole Jupyter notebook as a ScriptJob.

job.run()
The job toy was saved and received the ID: 68

Finally, the output can be accessed in the same way as already demonstrated on days one and two:

job['output/generic/energy_tot']
100.0

So while the number of simulation codes currently implemented in pyiron is restricted to the ones we primarily use, it is easy to add new simulation codes. Two examples would be:

Still, both of these interfaces are currently in a highly experimental state, and we are looking for experienced users who are interested in using pyiron for their research and can help us with their code-specific knowledge. Apart from this, pyiron can in principle be used to implement any kind of calculation which benefits from a unified storage layer and the interface to HPC infrastructure.

Publish your workflow

In the same direction, pyiron not only supports developers by providing them with a platform to integrate their plugins, but also scientists in general with a framework to publish scientific workflows developed with pyiron. The pyiron-publication-template combines continuous integration on the GitHub platform with Jupyter Book for a simple website and mybinder.org for an interactive user experience.

As a scientist you can upload:

  • your Jupyter notebook, which defines the physical steps of your method.

  • your conda environment as an environment.yml file to fix the versions of the executables you used.

  • your resources, like existing pyiron calculations, which can be exported using pr.pack(), or specific parameter databases such as interatomic potentials.
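
Of these three items, the environment file is the most mechanical one. A minimal environment.yml might look like the following sketch; the package selection and version pins are purely illustrative and should match whatever your notebook actually uses:

```yaml
channels:
  - conda-forge
dependencies:
  # illustrative pins - replace with the packages your notebook imports
  - python=3.10
  - pyiron_base=0.6.3
  - numpy=1.24.3
```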

While the technology is rather new, there is already a handful of examples available which demonstrate the advantage of such a template as a unified way to publish new simulation protocols and workflows:

Open science

Finally, beyond the pyiron project itself, the developers of the pyiron project also contribute to other open-source software packages in the materials science community, such as ASE and pymatgen. Other contributions include:

  • The release of components originally developed for pyiron as standalone packages, as they might also be relevant to users outside the pyiron community. Examples are a simple interface to HPC queuing systems, based on the observation that modern queuing systems offer a lot of different settings but most users stick to a set of predefined templates, and a parallel interface to the LAMMPS library based on mpi4py which is directly accessible from a Jupyter notebook executed in a serial process.

  • As demonstrated on the first day, installing pyiron is easy because all dependencies are already included in conda-forge. This was not the case a few years ago: over the last few years the pyiron developers contributed over 100 packages to the conda-forge community, ranging from simple Python codes to DFT codes written in Fortran.

  • Finally, this workshop, as well as our other virtual workshops, extensively uses JupyterHub in combination with Docker containers to provide virtual environments for the participants. These virtual environments are built from conda packages using GitHub-based continuous integration. The build process is also publicly available and could be used to automate the creation of virtual environments for other workshops as well.

Summary

The third day highlighted:

  • the development of your own job classes, which do not even have to be related to atomistics or materials science in general.

  • the workflow to publish Jupyter notebooks created with pyiron as a new way of sharing your work.

  • further contributions of the pyiron developers to the open science community in general.

Thank you for your attention.