Nodes
Nodes are the fundamental building blocks of a Pyiron Workflow. They represent individual computational steps that can be connected to form a complete scientific workflow. Unlike regular functions, nodes are specialized classes that must be instantiated and executed following a specific pattern.
Node Execution Model
Because a node is a class rather than a plain function, using one takes two steps: instantiate it with its inputs, then execute it. The result is then read from the node's outputs:
# Create a node instance
square = SquareNumber(x=2.0)
# Execute the node
result = square.run()
# Access the output
output_value = square.outputs.result.value
The complete usage pattern is:
1. Instantiate the node with parameters (square = SquareNumber(x=2.0))
2. Execute the node (result = square.run())
3. Access outputs through the structured output system (square.outputs.result.value)
Important: Simply calling a node like a function (SquareNumber(x=2.0)) does not execute it - it only creates a node instance.
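The instantiate-then-run pattern can be illustrated without the library. The following is a minimal sketch in plain Python, not Pyiron Workflow's implementation: ToyNode and ToyPort are hypothetical stand-ins that only mimic the three steps above.

```python
from types import SimpleNamespace
from typing import Any, Callable


class ToyPort:
    """Hypothetical stand-in for an output port: holds one named value."""

    def __init__(self) -> None:
        self.value: Any = None  # stays None until the node has run


class ToyNode:
    """Hypothetical node: stores inputs at construction, computes only in run()."""

    def __init__(self, func: Callable[..., Any], label: str, **inputs: Any) -> None:
        self._func = func
        self._label = label
        self._inputs = inputs
        self.outputs = SimpleNamespace(**{label: ToyPort()})

    def run(self) -> Any:
        value = self._func(**self._inputs)
        getattr(self.outputs, self._label).value = value
        return value


square = ToyNode(lambda x: x * x, label="result", x=2.0)  # 1. instantiate
before = square.outputs.result.value                      # nothing computed yet
returned = square.run()                                   # 2. execute
after = square.outputs.result.value                       # 3. read the output
print(before, returned, after)  # None 4.0 4.0
```

Constructing the object only records the inputs; the computation happens in run(), which is why the output port is empty until then.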
Function Nodes
Function nodes perform computations and are created using the @as_function_node decorator. Each function node must follow these key requirements:
Output Naming Requirement
Every function node must have named outputs. There are two primary ways to provide output names:
Option 1: Return named variables (recommended)
@as_function_node
def SquareNumber(x: float) -> float:
    result = x * x  # The variable name "result" becomes the output name
    return result
Option 2: Specify output name in the decorator
@as_function_node("square")
def SquareNumber(x: float) -> float:
    return x * x
Important: Returning a bare expression like x * x only works when the decorator supplies the output name, as in Option 2. Without a decorator label and without a named return variable, the system has no valid name for the output port.
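To see why a named return variable matters, it helps to look at what a decorator can learn from the function's source. The sketch below uses Python's standard ast module to pull variable names out of a return statement. It is an illustration of the idea only; infer_output_labels is a hypothetical helper, and Pyiron Workflow's actual parsing may differ.

```python
import ast


def infer_output_labels(source: str) -> list[str]:
    """Collect variable names from the first return statement found in `source`.

    Returns an empty list when the returned expression has no usable name.
    """
    func_def = ast.parse(source).body[0]
    ret = next((n for n in ast.walk(func_def) if isinstance(n, ast.Return)), None)
    if ret is None or ret.value is None:
        return []
    if isinstance(ret.value, ast.Name):  # return result
        return [ret.value.id]
    if isinstance(ret.value, ast.Tuple):  # return a, b
        elements = ret.value.elts
        if all(isinstance(e, ast.Name) for e in elements):
            return [e.id for e in elements]
    return []  # e.g. return x * x: an expression carries no name


NAMED = """
def SquareNumber(x):
    result = x * x
    return result
"""

UNNAMED = """
def SquareNumber(x):
    return x * x
"""

print(infer_output_labels(NAMED))    # ['result']
print(infer_output_labels(UNNAMED))  # []
```

A returned expression such as x * x is a multiplication node in the syntax tree, so there is no identifier to reuse as a port name.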
Single Return Statement Requirement
Each function node must have exactly one return statement. This ensures consistent output port naming and prevents potential conflicts.
# ❌ Wrong - multiple return statements
@as_function_node
def ProcessValue(x: float) -> float:
    if x > 0:
        return x * 2
    else:
        return x / 2

# ✅ Correct - single return statement
@as_function_node
def ProcessValue(x: float) -> float:
    result = x * 2 if x > 0 else x / 2
    return result
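A function can be checked against this rule before it is decorated. The sketch below counts return statements with the standard ast module while skipping nested function definitions. count_returns and ReturnCounter are hypothetical names, and this is not the check Pyiron Workflow itself performs.

```python
import ast


class ReturnCounter(ast.NodeVisitor):
    """Counts return statements, ignoring those inside nested functions."""

    def __init__(self) -> None:
        self.count = 0

    def visit_Return(self, node: ast.Return) -> None:
        self.count += 1

    def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
        pass  # do not descend into nested function definitions

    visit_AsyncFunctionDef = visit_FunctionDef


def count_returns(source: str) -> int:
    """Return the number of return statements in the first function in `source`."""
    func_def = ast.parse(source).body[0]
    counter = ReturnCounter()
    for statement in func_def.body:
        counter.visit(statement)
    return counter.count


MULTI = """
def ProcessValue(x):
    if x > 0:
        return x * 2
    else:
        return x / 2
"""

SINGLE = """
def ProcessValue(x):
    result = x * 2 if x > 0 else x / 2
    return result
"""

print(count_returns(MULTI))   # 2
print(count_returns(SINGLE))  # 1
```

With two return statements there are two candidate expressions for the same port, which is the ambiguity the single-return rule avoids.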
Type Annotations
Function nodes can use any valid Python type annotation, but there are important guidelines for different contexts:
For User-Facing Inputs (Workflow Endpoints)
Only these types should be used for parameters that users will directly set:
- Primitive types: int, float, bool, str
- Literal types for constrained choices: Literal["option1", "option2"]
- Optional[Literal[...]] for optional constrained choices
For Internal Connections
More complex types can be used when connecting to other nodes:
- np.ndarray
- Custom dataclasses (e.g., Mesh2D)
- Other complex types that flow between nodes
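The user-facing guideline can be written as a small predicate over annotations. The sketch below uses only the standard typing module and follows the list above literally: primitives, Literal, and Optional[Literal[...]]. is_user_facing is a hypothetical helper, not part of Pyiron Workflow.

```python
import types
from typing import Any, Literal, Optional, Union, get_args, get_origin

PRIMITIVES = (int, float, bool, str)
# typing.Union covers Optional[...]; types.UnionType covers the `X | None` spelling.
UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))


def is_user_facing(annotation: Any) -> bool:
    """True if `annotation` is a primitive, a Literal, or Optional[Literal[...]]."""
    if any(annotation is primitive for primitive in PRIMITIVES):
        return True
    origin = get_origin(annotation)
    if origin is Literal:
        return True
    if origin in UNION_ORIGINS:
        args = get_args(annotation)
        non_none = [a for a in args if a is not type(None)]
        # Optional[Literal[...]] is a union of exactly one Literal and None.
        return (
            len(args) == 2
            and len(non_none) == 1
            and get_origin(non_none[0]) is Literal
        )
    return False


print(is_user_facing(float))                                 # True
print(is_user_facing(Literal["linear", "cubic"]))            # True
print(is_user_facing(Optional[Literal["linear", "cubic"]]))  # True
print(is_user_facing(list[float]))                           # False
```

Container and array types fall through to False here, which matches their role as internal connection types rather than values a user sets directly.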
Multiple Output Ports
Function nodes can produce multiple outputs in two ways:
Option 1: Return multiple named variables
@as_function_node
def SplitData(array: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Splits an array into even and odd indexed elements"""
    even_elements = array[::2]
    odd_elements = array[1::2]
    return even_elements, odd_elements
The returned variable names (even_elements, odd_elements) become the output port names.
Option 2: Specify output labels in the decorator
# Labels passed with the labels keyword
@as_function_node(labels=["even", "odd"])
def SplitData(array: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    return array[::2], array[1::2]

# Same labels passed positionally
@as_function_node(["even", "odd"])
def SplitData(array: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    return array[::2], array[1::2]
Important: When accessing multiple outputs, they must be referenced by their named labels (not numeric indices).
Critical Note: The parameter name is labels (not output_labels as previously documented).
Dataclass Nodes
Dataclass nodes provide structured containers for passing multiple related values between function nodes. There are two distinct types with different purposes:
Why Separate Decorators for Input and Output Dataclasses?
The separation between @as_inp_dataclass_node and @as_out_dataclass_node serves important conceptual and technical purposes:
- Different Roles in the Workflow:
  - Input dataclasses represent configuration parameters that users set before execution
  - Output dataclasses represent computation results that flow between nodes during execution
- Workflow Engine Requirements:
  - The engine treats inputs and outputs differently for dependency tracking
  - Input dataclasses are typically "frozen" after workflow creation
  - Output dataclasses are generated dynamically during workflow execution
- User Experience:
  - Separating inputs from outputs makes workflows more intuitive to understand
  - GUI tools can present input parameters differently from results
- Provenance Tracking:
  - Distinguishing between configuration (inputs) and results (outputs) is critical for scientific reproducibility
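The split between fixed configuration and computed results has a direct analogue in standard Python dataclasses. The sketch below uses only the standard library as a stand-in for the two decorators: GridConfig, GridResult, and build_grid are hypothetical names, and frozen=True is an assumption chosen to mimic inputs that stay fixed once the workflow is created.

```python
from dataclasses import FrozenInstanceError, dataclass, field


@dataclass(frozen=True)
class GridConfig:
    """Configuration: set by the user before execution, then read-only."""

    x_min: float = 0.0
    x_max: float = 10.0
    nx: int = 100


@dataclass
class GridResult:
    """Results: filled in while the computation runs."""

    spacing: float = 0.0
    points: list[float] = field(default_factory=list)


def build_grid(config: GridConfig) -> GridResult:
    """Compute evenly spaced points from a read-only configuration."""
    result = GridResult()
    result.spacing = (config.x_max - config.x_min) / (config.nx - 1)
    result.points = [config.x_min + i * result.spacing for i in range(config.nx)]
    return result


config = GridConfig(nx=5)
result = build_grid(config)
print(result.points)  # [0.0, 2.5, 5.0, 7.5, 10.0]

try:
    config.nx = 10  # configuration cannot change during the computation
    config_was_modified = True
except FrozenInstanceError:
    config_was_modified = False
print(config_was_modified)  # False
```

The result object is mutable because it is built up during the run, while the configuration rejects assignment, so a computation cannot silently change the settings it was started with.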
Input Dataclass Nodes (@as_inp_dataclass_node)
Use this decorator when:
- You're defining user-configurable parameters (not calculation results)
- The data represents settings that control how calculations should run
- You want values to be easily adjustable by users before workflow execution
- The data doesn't depend on other nodes' calculations
- Examples: grid specifications, simulation parameters, file paths
@as_inp_dataclass_node
class GridParams:
    """Parameters for creating a computational grid"""
    x_min: float = 0.0
    x_max: float = 10.0
    y_min: float = 0.0
    y_max: float = 10.0
    nx: int = 100
    ny: int = 100
Usage:
params = GridParams().run() # Execute to get the dataclass instance
assert params.nx == 100
Important: Input dataclass nodes must be executed with .run(), not .dataclass().
Output Dataclass Nodes (@as_out_dataclass_node)
Use this decorator when:
- You're bundling results from a calculation
- The data will be generated by a node and passed to other nodes
- The data represents intermediate or final results of computations
- The data depends on other nodes' outputs or calculations
- Examples: simulation results, processed data, visualization elements
@as_out_dataclass_node
class SimulationResults:
    """Holds results from a simulation run"""
    density: np.ndarray = EmptyArrayField
    potential: np.ndarray = EmptyArrayField
    energy: float = 0.0
Using Dataclasses in Function Nodes
Creating a dataclass instance:
@as_function_node
def CreateGrid(params: GridParams) -> SimulationResults:
    x = np.linspace(params.x_min, params.x_max, params.nx)
    y = np.linspace(params.y_min, params.y_max, params.ny)
    results = SimulationResults().dataclass()
    results.density = np.zeros((params.nx, params.ny))
    # ... other calculations
    return results
Using a dataclass as input:
@as_function_node
def AnalyzeResults(results: SimulationResults) -> float:
    average_density = np.mean(results.density)
    return average_density
Dataclass Usage Rules
| Context | Correct Syntax | Reason |
|---|---|---|
| Type hint | def foo(data: SimulationResults): | SimulationResults is the type descriptor |
| Return annotation | -> SimulationResults | Matches the factory type |
| Creating instance (inside function node) | obj = SimulationResults().dataclass() | Produces the actual container |
| Executing input dataclass | params = GridParams().run() | Gets the dataclass instance |
Node Definition Location
Critical Requirement: Node definitions (both dataclass nodes and function nodes) must be defined at the module level, not inside functions or methods.
✅ CORRECT - Module-level definition:
@as_function_node
def SquareNumber(x: float) -> float:
    result = x * x
    return result
# Later in your code...
square = SquareNumber(x=2.0)
square.run()
result = square.outputs.result.value
❌ INCORRECT - Definition inside a function:
def process_data():
    @as_function_node
    def SquareNumber(x: float) -> float:
        result = x * x
        return result

    square = SquareNumber(x=2.0)
    square.run()
    return square.outputs.result.value
Defining nodes inside functions causes:
- Inconsistent output port naming
- Workflow execution failures
- Difficulty tracking node types
- Problems with workflow serialization
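One likely contributor to the serialization problem is a general Python behavior rather than anything specific to Pyiron Workflow: pickle stores functions and classes by their qualified name, and objects defined inside a function have no importable name. The sketch below shows this with plain functions; all names in it are hypothetical, and how Pyiron Workflow itself serializes nodes is not assumed here.

```python
import pickle


def module_level_square(x: float) -> float:
    """Defined at module level: reachable by its qualified name."""
    result = x * x
    return result


def make_local_square():
    """Returns a function defined inside another function."""

    def local_square(x: float) -> float:
        result = x * x
        return result

    return local_square


def can_pickle(obj) -> bool:
    """Report whether pickle accepts `obj`."""
    try:
        pickle.dumps(obj)
    except (AttributeError, pickle.PicklingError):
        return False
    return True


local_square = make_local_square()

# The qualified name of a nested definition contains "<locals>",
# so it cannot be looked up again by importing the module.
print(local_square.__qualname__)  # make_local_square.<locals>.local_square
print(can_pickle(local_square))   # False
```

Both functions compute the same value; only the place of definition differs, and that alone decides whether the object can be found again by name.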
Output Access Pattern
After executing a node, you can access its outputs through the structured .outputs attribute:
# For a node that returns a single value
square = SquareNumber(x=2.0)
square.run()
result = square.outputs.result.value # Access the output value
# For a node that returns multiple values
split_data = SplitData(array=np.array([1, 2, 3, 4, 5, 6]))
split_data.run()
even = split_data.outputs.even_elements.value
odd = split_data.outputs.odd_elements.value
The .outputs attribute provides:
- Named access to all output ports
- A .value property for each port that contains the actual data
- Consistent structure regardless of node type
Important: The .run() method returns the primary output value, but to access all outputs you must use the .outputs.port_name.value pattern.
Common Mistakes to Avoid
Incorrect Node Execution
# ❌ Wrong - treating node as a regular function
result = SquareNumber(x=2.0)
# ✅ Correct - proper node execution
square = SquareNumber(x=2.0)
result = square.run()
Incorrect Input Dataclass Usage
# ❌ Wrong - using .dataclass() on input dataclass
params = GridParams().dataclass()
# ✅ Correct - using .run() on input dataclass
params = GridParams().run()
Invalid Output Naming
# ❌ Wrong - no output name
@as_function_node
def Square(x: float) -> float:
    return x * x

# ✅ Correct - named variable
@as_function_node
def Square(x: float) -> float:
    result = x * x
    return result
Multiple Return Statements
# ❌ Wrong - multiple return statements
@as_function_node
def Process(x: float) -> float:
    if x > 0:
        return x * 2
    return x / 2

# ✅ Correct - single return statement
@as_function_node
def Process(x: float) -> float:
    result = x * 2 if x > 0 else x / 2
    return result
Incorrect Dataclass Usage
# ❌ Wrong - calling the node inside a type hint
def Process(data: SimulationResults().dataclass()):
    ...

# ✅ Correct - just use the class name
def Process(data: SimulationResults):
    ...
Missing Dataclass Instance Creation
# ❌ Wrong - this creates a node, not the dataclass instance
results = SimulationResults()
# ✅ Correct - must use .dataclass()
results = SimulationResults().dataclass()
Why This Distinction Matters
The separation between input and output dataclasses creates a clear boundary between:
- What the user controls (input dataclasses)
- What the workflow computes (output dataclasses)
This distinction:
- Makes workflows more understandable and maintainable
- Enables better tooling support (e.g., GUIs that highlight configurable parameters)
- Improves scientific reproducibility by clearly separating configuration from results
- Helps prevent common errors like accidentally modifying input parameters during execution
For information on connecting nodes into workflows, see the Workflow Construction section.