Workflow
A Pyiron Workflow represents a complete scientific pipeline composed of interconnected computational steps. While nodes define individual operations, the workflow orchestrates how these operations connect and execute in sequence to transform inputs into meaningful scientific results. This section explains how to assemble nodes into robust, reproducible workflows that capture your scientific process with precision.
The Conceptual Foundation
Workflows in Pyiron embody the principle of executable scientific methodology - transforming research procedures into structured, verifiable computational processes. At their core, workflows address critical challenges in computational science:
- Process Documentation: Scientific workflows should serve as executable documentation of research methods, making procedures transparent and shareable.
- Dependency Management: Complex analyses involve numerous interdependent steps; workflows explicitly map these dependencies to ensure correct execution order.
- Parameter Propagation: Scientific studies often require systematic parameter variations; workflows provide structured mechanisms for managing these variations.
- Result Traceability: Understanding how final results derive from initial inputs is essential for validation; workflows maintain this lineage throughout execution.
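The dependency-management point can be illustrated without Pyiron. The following is a minimal sketch using Python's standard-library `graphlib`; the `dependencies` mapping is a hypothetical stand-in for the connections a workflow records (its step names mirror the example nodes used later in this section) and is not Pyiron's internal representation.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each step lists the steps it needs first.
dependencies = {
    "grid_params": set(),
    "grid": {"grid_params"},
    "density": {"grid"},
    "figure": {"density"},
}

# static_order() yields each step only after all of its prerequisites.
execution_order = list(TopologicalSorter(dependencies).static_order())
print(execution_order)
```

Because every step appears only after the steps it depends on, an order of this kind is what lets a workflow system run interdependent steps correctly without the user scheduling them by hand.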
Workflow Construction Principles
A Pyiron Workflow is created by connecting nodes according to specific principles that ensure scientific rigor and computational reliability.
Basic Workflow Structure
First, let's define the necessary nodes:
import numpy as np
from matplotlib.figure import Figure
from typing import Literal
from pyiron_core.pyiron_workflow import (
    as_function_node,
    as_out_dataclass_node,
    as_inp_dataclass_node,
)
from pyiron_core.pyiron_workflow.data_fields import DataArray, EmptyArrayField

# Define dataclass nodes
@as_inp_dataclass_node
class GridParams:
    """Parameters for creating a computational grid"""
    x_min: float = 0.0
    x_max: float = 10.0
    y_min: float = 0.0
    y_max: float = 10.0
    nx: int = 100
    ny: int = 100

@as_out_dataclass_node
class Grid2D:
    """Holds a 2D grid structure"""
    x: DataArray = EmptyArrayField
    y: DataArray = EmptyArrayField
    X: DataArray = EmptyArrayField
    Y: DataArray = EmptyArrayField

@as_out_dataclass_node
class DensityResults:
    """Holds density calculation results"""
    density: DataArray = EmptyArrayField
    grid: Grid2D = EmptyArrayField

# Define function nodes
@as_function_node
def CreateGrid(params: GridParams) -> Grid2D:
    """Creates a 2D grid based on parameters"""
    x = np.linspace(params.x_min, params.x_max, params.nx)
    y = np.linspace(params.y_min, params.y_max, params.ny)
    X, Y = np.meshgrid(x, y, indexing="ij")
    grid = Grid2D().dataclass()
    grid.x = x
    grid.y = y
    grid.X = X
    grid.Y = Y
    return grid  # Output port will be named "grid"

@as_function_node
def ComputeDensity(grid: Grid2D, amplitude: float = 1.0) -> DensityResults:
    """Computes a sample density field on the grid"""
    density = amplitude * np.sin(grid.X) * np.cos(grid.Y)
    results = DensityResults().dataclass()
    results.density = density
    results.grid = grid
    return results  # Output port will be named "results"

@as_function_node
def PlotResults(results: DensityResults, cmap: Literal["viridis", "plasma", "coolwarm"] = "viridis") -> Figure:
    """Creates a visualization of the density results"""
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots()
    pcm = ax.pcolormesh(
        results.grid.X,
        results.grid.Y,
        results.density,
        cmap=cmap
    )
    fig.colorbar(pcm, ax=ax, label="Density")
    ax.set_xlabel("X")
    ax.set_ylabel("Y")
    ax.set_title("Density Field")
    return fig  # Output port will be named "fig"
Now we can assemble the workflow:
from pyiron_core.pyiron_workflow import Workflow
# 1. Create a workflow instance with a descriptive name
wf = Workflow("density_analysis_workflow")
# 2. Add nodes to the workflow with meaningful attribute names
wf.grid_params = GridParams(
    x_min=0.0,
    x_max=2 * np.pi,
    nx=200  # Only specify non-default parameters
)
wf.grid = CreateGrid(params=wf.grid_params)
wf.density = ComputeDensity(grid=wf.grid, amplitude=np.sqrt(2.0))
wf.figure = PlotResults(results=wf.density, cmap="plasma")
Understanding Workflow Syntax
In Pyiron Workflow, the syntax has specific meaning that's crucial to understand:
wf.grid_1 = CreateGrid(params=wf.grid_params)
wf.grid_2 = CreateGrid(params=wf.grid_params)
- Left side (wf.grid_1): This assigns a label (grid_1) to the node instance within the workflow. You use this label to reference that specific node instance later in the workflow.
- Right side (CreateGrid(params=wf.grid_params)): This creates a new instance of the CreateGrid node type, with its params input connected to the output of the grid_params node.
When you create multiple instances of the same node type (like grid_1 and grid_2 above), you're creating distinct computational steps that can be executed independently, even if they use the same input parameters.
Output Port Naming Rules
Critical Insight: Output port names are determined by the variable name used in the return statement, not by the function name or return type.
@as_function_node
def CreateGrid(params: GridParams) -> Grid2D:
    # ... computation ...
    grid = Grid2D().dataclass()
    # ... populate grid ...
    return grid  # Output port will be named "grid"
In this example, the output port is named grid because that's the variable name in the return statement.
Similarly:
@as_function_node
def ComputeDensity(grid: Grid2D, amplitude: float = 1.0) -> DensityResults:
    # ... computation ...
    results = DensityResults().dataclass()
    # ... populate results ...
    return results  # Output port will be named "results"
Here, the output port is named results.
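To make the naming rule concrete, here is a minimal sketch of how names could be read from a return statement using Python's standard-library `ast` module. This only illustrates the rule; it is not Pyiron's implementation, and the helper `returned_names` and both source snippets are hypothetical.

```python
import ast

# Hypothetical function sources, stripped of decorators and type hints.
single_output_source = '''
def create_grid(params):
    grid = dict(params)
    return grid
'''

multi_output_source = '''
def split_data(array):
    even_elements = array[::2]
    odd_elements = array[1::2]
    return even_elements, odd_elements
'''

def returned_names(func_source: str) -> list[str]:
    """Collect the variable names used in the first return statement found."""
    tree = ast.parse(func_source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Return) and node.value is not None:
            value = node.value
            # A tuple return gives one name per element; otherwise a single name.
            elements = value.elts if isinstance(value, ast.Tuple) else [value]
            return [e.id for e in elements if isinstance(e, ast.Name)]
    return []

single_ports = returned_names(single_output_source)
multi_ports = returned_names(multi_output_source)
print(single_ports, multi_ports)
```

The sketch also shows why returning a named variable matters: an expression such as `return a + b` carries no variable name from which a port label could be derived.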
Single-Output vs. Multi-Output Node Connections
For single-output nodes (most common case):
wf.grid = CreateGrid(params=wf.grid_params)
Here the grid_params node has only one output, so Pyiron connects to that output automatically when you pass the node itself. This is the recommended approach for single-output nodes.
For multi-output nodes:
@as_function_node
def SplitData(array: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Splits an array into even and odd indexed elements"""
    even_elements = array[::2]  # Named variable for return
    odd_elements = array[1::2]  # Named variable for return
    return even_elements, odd_elements  # Output ports will be "even_elements" and "odd_elements"

# Create a node with multiple outputs
# (array passed directly for brevity; see rule 2 under Key Construction Rules)
wf.split = SplitData(array=np.arange(10))
# Connect to specific outputs using named labels
# (ProcessData is defined in the Multiple Output Handling section)
wf.process_even = ProcessData(data=wf.split.outputs.even_elements)
wf.process_odd = ProcessData(data=wf.split.outputs.odd_elements)
For multi-output nodes, select the output explicitly: use the .outputs attribute followed by the specific output label.
This distinction ensures clarity about which part of a node's output is being used, especially when nodes produce multiple related values.
Key advantage of input dataclasses: When creating input dataclass nodes, you only need to specify parameters that differ from their defaults. In the Basic Workflow Structure example, y_min, y_max, and ny keep their default values, making the workflow definition more concise and readable.
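The default-override behaviour is the same one a plain Python dataclass provides. The sketch below uses only the standard library; `GridParamsSketch` is a hypothetical stand-in with the same fields as GridParams and does not involve the as_inp_dataclass_node decorator.

```python
from dataclasses import dataclass
import math

@dataclass
class GridParamsSketch:
    """Plain-dataclass stand-in mirroring the GridParams fields."""
    x_min: float = 0.0
    x_max: float = 10.0
    y_min: float = 0.0
    y_max: float = 10.0
    nx: int = 100
    ny: int = 100

# Override only the values that differ from the defaults.
params = GridParamsSketch(x_max=2 * math.pi, nx=200)
print(params.y_min, params.y_max, params.ny)
```

Fields that are not mentioned in the constructor call keep the values declared in the class body.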
Running Your Workflow
Once you've defined your workflow, you can execute it in several ways:
Full Execution
# Execute the entire workflow
wf.run()
# Now you can access results through each node's output port
density_result = wf.density.outputs.results.value
figure = wf.figure.outputs.fig.value
Partial Execution (For Debugging)
# Execute only up to a specific node
wf.grid.pull() # Executes all nodes needed to compute wf.grid
# Now wf.grid is available, but wf.density and wf.figure are not yet computed
print(wf.grid.outputs.grid.value.X.shape) # Access the actual grid data
# Continue execution to the next node
wf.density.pull()
Why partial execution matters for debugging:
- You can inspect intermediate results without running the entire workflow
- If an error occurs, you can pinpoint exactly which node caused it
- You can verify that each step produces the expected output before proceeding
- It's much faster to test individual components than the entire workflow
On-Demand Execution
# Accessing a node's output automatically triggers execution
density_result = wf.density # Executes all necessary nodes
# Subsequent accesses are fast (results are cached)
density_result_again = wf.density
Understanding Node Output Structure
In Pyiron Workflow, when you create a node like wf.grid = CreateGrid(...), wf.grid is not the actual data but a node object that manages the computation. To access the actual data, you need to navigate through:
# Get the actual Grid2D object produced by the node
grid_data = wf.grid.outputs.grid.value
# Now you can access its fields
print(grid_data.X.shape)
The structure is:
- wf.grid - The node object
- .outputs - Contains all output ports
- .grid - The specific output port (named after the return variable)
- .value - The actual data value produced by the node
Important: When connecting nodes in a workflow, you should connect the node objects directly, not their internal data:
# ✅ CORRECT - connecting node objects
wf.density = ComputeDensity(grid=wf.grid, amplitude=0.5)
# ❌ INCORRECT - trying to connect internal data
wf.density = ComputeDensity(grid=wf.grid.outputs.grid.value, amplitude=0.5)
The workflow system automatically handles the data flow between nodes. Only when you need to access results for analysis or visualization should you use the .outputs.port_name.value pattern.
Key Construction Rules
1. All Computation Must Occur Within Nodes
The workflow assembly code should contain only node connections. The one exception is a simple expression that evaluates directly to a basic type:
# First define the necessary nodes
@as_function_node
def ComputeDensity(grid: Grid2D, amplitude: float) -> DensityResults:
    """Computes density with the given amplitude"""
    density = amplitude * np.sin(grid.X) * np.cos(grid.Y)
    results = DensityResults().dataclass()
    results.density = density
    results.grid = grid
    return results  # Output port will be "results"

# ✅ Acceptable - expression that evaluates to a basic type
wf.density = ComputeDensity(grid=wf.grid, amplitude=np.sqrt(2.0))

# ❌ Worse practice - magic number
wf.density = ComputeDensity(grid=wf.grid, amplitude=1.41421356237)  # Hard to understand what this number represents

# ✅ Best practice - create a node for the computation if you want to track the recipe
@as_function_node
def CalculateAmplitude(base: float) -> float:
    result = np.sqrt(base)  # Named variable for return
    return result  # Output port will be "result"

wf.amplitude = CalculateAmplitude(base=2.0)
wf.density = ComputeDensity(grid=wf.grid, amplitude=wf.amplitude)
Important clarification: Expressions like np.sqrt(2.0) are acceptable in workflow assembly because they evaluate directly to a basic type (float). This is preferable to using magic numbers because:
1. It's self-documenting (you can see where the value comes from)
2. It's more precise (avoids rounding errors from manual calculation)
3. It's more maintainable (changing the base value is straightforward)
However, if you want the workflow to track how a parameter was derived (for provenance and reproducibility), create a dedicated node as shown in the best practice example.
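The precision argument can be checked directly. This short sketch uses the standard-library `math.sqrt` rather than `np.sqrt` so that it runs without NumPy; the variable names are illustrative only.

```python
import math

exact = math.sqrt(2.0)        # value computed at full double precision
hand_typed = 1.41421356237    # the magic number from the example above
difference = exact - hand_typed

# The hand-typed constant is truncated after 11 decimal places,
# so it differs from the computed value by a small positive amount.
print(f"{difference:.3e}")
```

The discrepancy is tiny, but it is introduced silently and can propagate through every downstream node that consumes the amplitude.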
2. Inputs Must Be Basic Literals or Node Outputs
Workflow connections should only use:
- Basic literals (int, float, str, bool)
- Literal or Optional[Literal] values
- Outputs from other nodes in the workflow
# First define the necessary nodes
@as_function_node
def ComputeDensity(grid: Grid2D, amplitude: float) -> DensityResults:
    """Computes density with the given amplitude"""
    density = amplitude * np.sin(grid.X) * np.cos(grid.Y)
    results = DensityResults().dataclass()
    results.density = density
    results.grid = grid
    return results  # Output port will be "results"

# ✅ Correct - expression that evaluates to a basic type
wf.density = ComputeDensity(grid=wf.grid, amplitude=1.0 / np.sqrt(2.0))

# ❌ Wrong - complex object creation
wf.density = ComputeDensity(grid=wf.grid, amplitude=np.array([1.0, 2.0, 3.0]))  # Should be in a node
3. Explicit Connection Semantics
Each workflow attribute assignment represents a connection between nodes:
# First define the necessary nodes
@as_function_node
def CreateGrid(params: GridParams) -> Grid2D:
    """Creates a 2D grid based on parameters"""
    x = np.linspace(params.x_min, params.x_max, params.nx)
    y = np.linspace(params.y_min, params.y_max, params.ny)
    X, Y = np.meshgrid(x, y, indexing="ij")
    grid = Grid2D().dataclass()
    grid.x = x
    grid.y = y
    grid.X = X
    grid.Y = Y
    return grid  # Output port will be "grid"

# This creates a connection where:
# - The output of CreateGrid becomes available as wf.grid
# - The params input of CreateGrid receives wf.grid_params
wf.grid_params = GridParams(x_min=0.0, x_max=10.0, nx=100)
wf.grid = CreateGrid(params=wf.grid_params)
Dataclass Handling in Workflows
When working with dataclass nodes, follow these specific patterns:
Input Dataclasses (Configuration)
# First define the necessary nodes
@as_inp_dataclass_node
class GridParams:
    """Parameters for creating a computational grid"""
    x_min: float = 0.0
    x_max: float = 10.0
    y_min: float = 0.0  # Read by CreateGrid, so it must be declared here
    y_max: float = 10.0  # Read by CreateGrid, so it must be declared here
    nx: int = 100
    ny: int = 100

# Create input dataclass with only non-default parameters
wf.grid_params = GridParams(
    x_min=0.0,
    x_max=2 * np.pi,
    nx=200  # Only specify values that differ from defaults
)
Output Dataclasses (Results)
# First define the necessary nodes
@as_out_dataclass_node
class Grid2D:
    """Holds a 2D grid structure"""
    x: DataArray = EmptyArrayField
    y: DataArray = EmptyArrayField
    X: DataArray = EmptyArrayField
    Y: DataArray = EmptyArrayField

@as_function_node
def CreateGrid(params: GridParams) -> Grid2D:
    """Creates a 2D grid based on parameters"""
    x = np.linspace(params.x_min, params.x_max, params.nx)
    y = np.linspace(params.y_min, params.y_max, params.ny)
    X, Y = np.meshgrid(x, y, indexing="ij")
    grid = Grid2D().dataclass()
    grid.x = x
    grid.y = y
    grid.X = X
    grid.Y = Y
    return grid  # Output port will be "grid"

@as_out_dataclass_node
class DensityResults:
    """Holds density calculation results"""
    density: DataArray = EmptyArrayField
    grid: Grid2D = EmptyArrayField

@as_function_node
def ComputeDensity(grid: Grid2D, amplitude: float = 1.0) -> DensityResults:
    """Computes a sample density field on the grid"""
    density = amplitude * np.sin(grid.X) * np.cos(grid.Y)
    results = DensityResults().dataclass()
    results.density = density
    results.grid = grid
    return results  # Output port will be "results"

# Connect nodes using dataclass outputs
wf.grid = CreateGrid(params=wf.grid_params)
wf.density = ComputeDensity(grid=wf.grid, amplitude=0.5)
Important: Always connect the entire dataclass object, not individual fields:
# ❌ Wrong - breaking the Grid2D dataclass by extracting individual fields
wf.plot = PlotDensity(x=wf.grid.x, y=wf.grid.y, density=wf.density.density)
# ✅ Correct - passing the entire Grid2D dataclass
wf.plot = PlotDensity(grid=wf.grid, density=wf.density)
Multiple Output Handling
When working with nodes that produce multiple outputs, access them through named labels:
# First define the necessary nodes
@as_function_node
def SplitData(array: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Splits an array into even and odd indexed elements"""
    even_elements = array[::2]  # Named variable for return
    odd_elements = array[1::2]  # Named variable for return
    return even_elements, odd_elements  # Output ports will be "even_elements" and "odd_elements"

@as_function_node
def ProcessData(data: np.ndarray) -> np.ndarray:
    """Processes the data array"""
    processed = data * 2  # Named variable for return
    return processed  # Output port will be "processed"

# Create a node with multiple outputs
# (array passed directly for brevity; see rule 2 under Key Construction Rules)
wf.split = SplitData(array=np.arange(10))
# Connect to specific outputs using named labels
wf.process_even = ProcessData(data=wf.split.outputs.even_elements)
wf.process_odd = ProcessData(data=wf.split.outputs.odd_elements)
Never use numeric indexing to access multiple outputs:
# ❌ Wrong - numeric indexing is forbidden
wf.process = ProcessData(data=wf.split.outputs[0])
# ✅ Correct - named access
wf.process = ProcessData(data=wf.split.outputs.even_elements)
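The reason for preferring names over positions can be shown with plain Python. In the sketch below, a standard-library `NamedTuple` is a hypothetical stand-in for a node's output collection; it is not how Pyiron stores outputs.

```python
from typing import NamedTuple

class SplitOutputs(NamedTuple):
    """Stand-in for a multi-output node's outputs, addressed by label."""
    even_elements: list
    odd_elements: list

def split_data(values: list) -> SplitOutputs:
    """Split a list into even-indexed and odd-indexed elements."""
    return SplitOutputs(even_elements=values[::2], odd_elements=values[1::2])

outputs = split_data(list(range(10)))

# Named access states intent and survives a reordering of the outputs.
even = outputs.even_elements
odd = outputs.odd_elements
print(even, odd)
```

If the function later returned its outputs in a different order, code written against `outputs[0]` would silently pick up the wrong data, while the named form would keep working.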
Workflow Execution Model
Pyiron Workflows follow a lazy evaluation model with explicit execution control:
Execution Process
- Workflow Assembly: Define all nodes and connections (no computation occurs)
- Validation: The system verifies all connections are valid
- Execution: When results are requested, the system executes only necessary nodes
- Result Delivery: Final results are returned to the user
Selective Execution
Pyiron intelligently executes only the nodes necessary to produce requested results:
# If you only need the grid, only CreateGrid executes
grid_data = wf.grid
# If you need the density, both CreateGrid and ComputeDensity execute
density_data = wf.density
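The pull-based model described above can be sketched in a few lines of plain Python. `LazyNode` is a hypothetical illustration of lazy evaluation with caching; its names and behaviour are assumptions made for this sketch and do not reproduce Pyiron's node class.

```python
class LazyNode:
    """Hypothetical stand-in for a workflow node; not the Pyiron class."""

    def __init__(self, name, func, *upstream):
        self.name = name
        self.func = func
        self.upstream = upstream
        self._cache = None
        self._has_run = False

    def pull(self, log):
        """Run upstream nodes on demand, then this node; reuse cached results."""
        if not self._has_run:
            inputs = [node.pull(log) for node in self.upstream]
            self._cache = self.func(*inputs)
            self._has_run = True
            log.append(self.name)  # record which nodes actually executed
        return self._cache

# Assembly: nothing is computed while the chain is being connected.
params = LazyNode("grid_params", lambda: 4)
grid = LazyNode("grid", lambda n: list(range(n)), params)
density = LazyNode("density", lambda g: [2 * v for v in g], grid)
figure = LazyNode("figure", lambda d: f"plot of {len(d)} values", density)

executed = []
grid_value = grid.pull(executed)        # runs grid_params and grid only
after_grid = list(executed)
density_value = density.pull(executed)  # reuses the cached grid, runs density
print(after_grid, executed)
```

Pulling `grid` runs only the nodes upstream of it, pulling `density` afterwards adds a single execution, and `figure` never runs because nothing requested it.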
Common Workflow Patterns
Parameter Sweep Pattern
import numpy as np
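from pyiron_core.pyiron_workflow import (
    Workflow,
    as_function_node,
    as_inp_dataclass_node,
    as_out_dataclass_node,  # Needed for Grid2D and DensityResults below
)
from pyiron_core.pyiron_workflow.data_fields import DataArray, EmptyArrayField

# Define dataclass nodes
@as_inp_dataclass_node
class GridParams:
    """Parameters for creating a computational grid"""
    x_min: float = 0.0
    x_max: float = 10.0
    y_min: float = 0.0  # Read by CreateGrid, so it must be declared here
    y_max: float = 10.0  # Read by CreateGrid, so it must be declared here
    nx: int = 100
    ny: int = 100
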
@as_out_dataclass_node
class Grid2D:
    """Holds a 2D grid structure"""
    x: DataArray = EmptyArrayField
    y: DataArray = EmptyArrayField
    X: DataArray = EmptyArrayField
    Y: DataArray = EmptyArrayField

@as_out_dataclass_node
class DensityResults:
    """Holds density calculation results"""
    density: DataArray = EmptyArrayField
    grid: Grid2D = EmptyArrayField
    amplitude: float = 0.0

# Define function nodes
@as_function_node
def CreateGrid(params: GridParams) -> Grid2D:
    """Creates a 2D grid based on parameters"""
    x = np.linspace(params.x_min, params.x_max, params.nx)
    y = np.linspace(params.y_min, params.y_max, params.ny)
    X, Y = np.meshgrid(x, y, indexing="ij")
    grid = Grid2D().dataclass()
    grid.x = x
    grid.y = y
    grid.X = X
    grid.Y = Y
    return grid  # Output port will be "grid"

@as_function_node
def ComputeDensity(grid: Grid2D, amplitude: float) -> DensityResults:
    """Computes a sample density field on the grid"""
    density = amplitude * np.sin(grid.X) * np.cos(grid.Y)
    results = DensityResults().dataclass()
    results.density = density
    results.grid = grid
    results.amplitude = amplitude
    return results  # Output port will be "results"

# Create the workflow
wf = Workflow("parameter_sweep")
# Base configuration (only non-default parameters)
wf.base_params = GridParams(x_max=2 * np.pi, nx=200)
# Multiple runs with different amplitudes
amplitudes = [0.1, 0.5, 1.0, 2.0]
for i, amp in enumerate(amplitudes):
    # Create unique attribute names for each run
    grid_attr = f"grid_{i}"
    density_attr = f"density_{i}"
    # Connect nodes for this parameter value
    setattr(wf, grid_attr, CreateGrid(params=wf.base_params))
    setattr(wf, density_attr, ComputeDensity(
        grid=getattr(wf, grid_attr),
        amplitude=amp
    ))

# Execute only up to grid_1 for debugging
wf.grid_1.pull()
print(f"Grid shape: {wf.grid_1.outputs.grid.value.X.shape}")
# Execute the full chain for the run at index 1 (amplitude 0.5)
wf.density_1.pull()
Conditional Workflow Pattern
import numpy as np
from typing import Literal
from pyiron_core.pyiron_workflow import Workflow, as_function_node, as_inp_dataclass_node, as_out_dataclass_node
from pyiron_core.pyiron_workflow.data_fields import DataArray, EmptyArrayField
# Define dataclass nodes
@as_inp_dataclass_node
class AnalysisParams:
    """Parameters controlling the analysis method"""
    method: Literal["simple", "advanced"] = "simple"

@as_out_dataclass_node
class Grid2D:
    """Holds a 2D grid structure"""
    x: DataArray = EmptyArrayField
    y: DataArray = EmptyArrayField
    X: DataArray = EmptyArrayField
    Y: DataArray = EmptyArrayField

@as_out_dataclass_node
class Results:
    """Holds analysis results"""
    data: DataArray = EmptyArrayField
    method: str = ""

# Define function nodes
@as_function_node
def CreateGrid(nx: int = 100, ny: int = 100) -> Grid2D:
    """Creates a standard 2D grid"""
    x = np.linspace(0, 10, nx)
    y = np.linspace(0, 10, ny)
    X, Y = np.meshgrid(x, y, indexing="ij")
    grid = Grid2D().dataclass()
    grid.x = x
    grid.y = y
    grid.X = X
    grid.Y = Y
    return grid  # Output port will be "grid"

@as_function_node
def SimpleAnalysis(grid: Grid2D) -> Results:
    """Performs a simple analysis on the grid"""
    result = np.mean(grid.X, axis=1)  # Named variable for return
    results = Results().dataclass()
    results.data = result
    results.method = "simple"
    return results  # Output port will be "results"

@as_function_node
def AdvancedAnalysis(grid: Grid2D) -> Results:
    """Performs an advanced analysis on the grid"""
    result = np.sin(grid.X) * np.cos(grid.Y)  # Named variable for return
    results = Results().dataclass()
    results.data = result
    results.method = "advanced"
    return results  # Output port will be "results"

@as_function_node
def SelectResults(simple: Results, advanced: Results, params: AnalysisParams) -> Results:
    """Selects the appropriate results based on params.method"""
    # The whole dataclass is passed in; the field is read inside the node
    selected = advanced if params.method == "advanced" else simple  # Named variable for return
    return selected  # Output port will be "selected"

# Create the workflow
wf = Workflow("conditional_analysis")
# Base configuration
wf.params = AnalysisParams(method="advanced")
# Create different processing paths
wf.grid = CreateGrid(nx=100)
wf.simple_results = SimpleAnalysis(grid=wf.grid)
wf.advanced_results = AdvancedAnalysis(grid=wf.grid)
# Select output based on parameter (connect the entire dataclass, not a single field)
wf.final_results = SelectResults(
    simple=wf.simple_results,
    advanced=wf.advanced_results,
    params=wf.params
)
# Debugging: Check intermediate results
wf.grid.pull() # Execute only the grid creation
print(f"Grid shape: {wf.grid.outputs.grid.value.X.shape}")
wf.simple_results.pull() # Execute simple analysis path
print(f"Simple results shape: {wf.simple_results.outputs.results.value.data.shape}")
Common Mistakes to Avoid
Incorrect Output Port Access
# First define the necessary nodes
@as_out_dataclass_node
class Grid2D:
    """Holds a 2D grid structure"""
    x: DataArray = EmptyArrayField
    y: DataArray = EmptyArrayField
    X: DataArray = EmptyArrayField
    Y: DataArray = EmptyArrayField

@as_function_node
def CreateGrid(params: GridParams) -> Grid2D:
    """Creates a 2D grid based on parameters"""
    x = np.linspace(params.x_min, params.x_max, params.nx)
    y = np.linspace(params.y_min, params.y_max, params.ny)
    X, Y = np.meshgrid(x, y, indexing="ij")
    grid = Grid2D().dataclass()
    grid.x = x
    grid.y = y
    grid.X = X
    grid.Y = Y
    return grid  # Output port will be "grid"

# ❌ Wrong - incorrect output port name
wf.grid.pull()
grid_data = wf.grid.outputs.create_grid.value # No such port
# ✅ Correct - using the return variable name as port name
wf.grid.pull()
grid_data = wf.grid.outputs.grid.value
Numeric Indexing of Multiple Outputs
# First define the necessary nodes
@as_function_node
def SplitData(array: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Splits an array into even and odd indexed elements"""
    even_elements = array[::2]  # Named variable for return
    odd_elements = array[1::2]  # Named variable for return
    return even_elements, odd_elements  # Output ports will be "even_elements" and "odd_elements"

@as_function_node
def ProcessData(data: np.ndarray) -> np.ndarray:
    """Processes the data array"""
    processed = data * 2  # Named variable for return
    return processed  # Output port will be "processed"

# ❌ Wrong - numeric indexing
wf.process = ProcessData(data=wf.split.outputs[0])
# ✅ Correct - named access
wf.process = ProcessData(data=wf.split.outputs.even_elements)
Breaking Dataclass Integrity
# First define the necessary nodes
@as_out_dataclass_node
class Grid2D:
    """Holds a 2D grid structure"""
    x: DataArray = EmptyArrayField
    y: DataArray = EmptyArrayField
    X: DataArray = EmptyArrayField
    Y: DataArray = EmptyArrayField

@as_out_dataclass_node
class DensityResults:
    """Holds density calculation results"""
    density: DataArray = EmptyArrayField
    grid: Grid2D = EmptyArrayField

@as_function_node
def CreateGrid(params: GridParams) -> Grid2D:
    """Creates a 2D grid based on parameters"""
    x = np.linspace(params.x_min, params.x_max, params.nx)
    y = np.linspace(params.y_min, params.y_max, params.ny)
    X, Y = np.meshgrid(x, y, indexing="ij")
    grid = Grid2D().dataclass()
    grid.x = x
    grid.y = y
    grid.X = X
    grid.Y = Y
    return grid  # Output port will be "grid"

@as_function_node
def ComputeDensity(grid: Grid2D, amplitude: float = 1.0) -> DensityResults:
    """Computes a sample density field on the grid"""
    density = amplitude * np.sin(grid.X) * np.cos(grid.Y)
    results = DensityResults().dataclass()
    results.density = density
    results.grid = grid
    return results  # Output port will be "results"

@as_function_node
def PlotDensity(grid: Grid2D, density: DensityResults) -> Figure:
    """Plots the density field"""
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots()
    # Read the array from the DensityResults dataclass inside the node
    pcm = ax.pcolormesh(grid.X, grid.Y, density.density)
    return fig  # Output port will be "fig"

# Correct workflow setup
wf = Workflow("example")
wf.grid_params = GridParams()
wf.grid = CreateGrid(params=wf.grid_params)
wf.density = ComputeDensity(grid=wf.grid, amplitude=0.5)

# ❌ Wrong - breaking the Grid2D dataclass by extracting individual fields
wf.plot = PlotDensity(x=wf.grid.x, y=wf.grid.y, density=wf.density.density)

# ✅ Correct - connecting the node objects so each dataclass is passed whole
wf.plot = PlotDensity(grid=wf.grid, density=wf.density)
External File Operations
# First define the necessary nodes
@as_function_node
def LoadData(filename: str) -> np.ndarray:
    """Loads data from a file"""
    import numpy as np
    data = np.loadtxt(filename)  # Named variable for return
    return data  # Output port will be "data"

@as_function_node
def CreateGrid(data: np.ndarray) -> Grid2D:
    """Creates a grid from loaded data"""
    # Implementation would use the loaded data
    grid = Grid2D().dataclass()
    # ...
    return grid  # Output port will be "grid"

# ❌ Wrong - direct file access
import numpy as np
data = np.loadtxt("input.dat")
wf.grid = CreateGrid(data=data)
# ✅ Correct - file operations in dedicated nodes
wf.loader = LoadData(filename="input.dat")
wf.grid = CreateGrid(data=wf.loader)
Why This Workflow Model Matters
The Pyiron Workflow approach delivers significant advantages for scientific computing:
- Reproducibility: The complete computational procedure is captured in an executable form, making studies reproducible by others.
- Modifiability: Individual components can be updated without rewriting the entire workflow, supporting method evolution.
- Transparency: The workflow structure makes data dependencies explicit, enhancing scientific reviewability.
- Error Reduction: Strict connection rules prevent common errors like incompatible data connections.
- Collaboration Support: Well-structured workflows are easier to share and understand among research teams.
- Provenance Tracking: The system automatically tracks how results were generated, supporting scientific validation.
- Debugging Efficiency: The ability to execute workflows partially (wf.grid.pull()) makes identifying and fixing issues significantly easier.
By following these workflow construction principles, researchers can create computational pipelines that are not just functional, but also scientifically rigorous, transparent, and shareable - transforming code into a meaningful part of the scientific record.