Document Version: 3.0
Core Concept: A cognitive learning framework designed to transform fixed hyperparameters (such as learning rate and model capacity) into dynamic policies driven in real time by the intrinsic "surprise" (Surprise) of the data. It is essentially an adaptive hyperparameter scheduling algorithm that allows a model to autonomously decide "how much to learn" and "with what capacity to learn" based on the value of the learning content. This framework originates from the Integrated Predictive Workspace Theory, with further details available in the paper at https://github.com/dmf-archive/IPWT.
Traditional training paradigms rely on manually set hyperparameters that are typically fixed or decay according to a predetermined schedule throughout the training process. This “one-size-fits-all” approach ignores the vast differences in learning value contained in different data batches.
PILF’s design philosophy is: to replace static, human-set rules with dynamic, data-driven policies.
It no longer blindly uses a fixed learning rate or model capacity. Instead, it dynamically and proportionally adjusts its learning behavior by assessing the Surprise from each data batch:
- Dynamic Learning Rate: When Surprise is moderate, it signals valuable "learnable zone" information, and the system assigns a higher learning rate. When Surprise is too low (redundant information) or too high (anomalous information), it assigns a learning rate close to zero, naturally achieving "ignore" and "reject" effects. This directly replaces manually set learning rate schedulers.
- Dynamic Capacity: In a Mixture-of-Experts (MoE) architecture, Surprise not only adjusts the learning rate but also determines the number of "experts" k to activate. Simple tasks (low Surprise) require only a few experts, while complex tasks (high Surprise) dynamically engage more experts. This replaces fixed Top-K routing. A minimal sketch of both policy functions follows this list.
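To make these two policies concrete, here is a minimal sketch of a surprise-to-learning-rate mapping f(S) and a surprise-to-capacity mapping g(S). The Gaussian form of f follows the formula given later in this document; the linear form of g and all constants here are illustrative assumptions, not the repository's implementation.

```python
import math

def lr_modifier(surprise: float, mu: float, sigma: float) -> float:
    """f(S): Gaussian response -- highest when Surprise sits near its running mean (mu),
    falling toward 0 for very low (redundant) or very high (anomalous) Surprise."""
    return math.exp(-0.5 * ((surprise - mu) / sigma) ** 2)

def num_experts(surprise: float, mu: float, sigma: float, k_min: int = 1, k_max: int = 4) -> int:
    """g(S): illustrative capacity policy -- activate more experts as Surprise rises
    above its running mean (the exact mapping is an assumption)."""
    z = max(0.0, (surprise - mu) / sigma)              # how far above "normal" this batch is
    k = k_min + int(round(min(z, 3.0) / 3.0 * (k_max - k_min)))
    return min(k, k_max)

# Example: with mu=1.0 and sigma=0.5, a moderate Surprise of 1.2 keeps ~92% of the base
# learning rate, while a Surprise of 3.0 is damped to ~0 yet routed to the full k_max experts.
print(lr_modifier(1.2, 1.0, 0.5), lr_modifier(3.0, 1.0, 0.5), num_experts(3.0, 1.0, 0.5))
```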
PILR-S is the direct application of the PILF idea to any standard neural network. It focuses on one question: how to dynamically adjust the learning rate based on Surprise? This is achieved using the core calculation toolkit from the SigmaPI project, which is a required dependency. The testing framework and experiments for PILF are detailed in Section 3.
It replaces the traditional "gating" logic of whether to execute `optimizer.step()` with a smooth, continuous learning rate modulator.
```mermaid
sequenceDiagram
    participant Trainer
    participant Model
    participant SigmaPI_Monitor
    participant LRScheduler as PILR-S
    participant Optimizer

    Trainer->>Model: Feedforward
    Model-->>Trainer: Return logits
    Trainer->>SigmaPI_Monitor: calculate(model, logits)
    SigmaPI_Monitor-->>Trainer: Return pi_metrics (incl. Surprise)
    Trainer->>LRScheduler: update(Surprise)
    activate LRScheduler
    LRScheduler->>LRScheduler: lr_modifier = gaussian(Surprise, EMA, std)
    LRScheduler-->>Trainer: Return lr_modifier
    deactivate LRScheduler
    Trainer->>Trainer: Calculate loss & loss.backward()
    Trainer->>Optimizer: Set effective_lr = base_lr * lr_modifier
    Trainer->>Optimizer: step()
    Trainer->>Optimizer: Restore base_lr
```
Mechanism Explained:

- Surprise Calculation: Currently, Surprise is calculated using the norm of the backpropagation gradients. In the future, it is entirely feasible to use accumulated gradients from the Forward-Forward Algorithm as the source of surprise. This process would not need to wait for expensive backpropagation, allowing for a rapid assessment of learning value.
- Dynamic Modulation: The PILR-S module receives the Surprise and calculates a smooth modulation factor `lr_modifier` (ranging from 0 to 1) using a Gaussian function `exp(-0.5 * ((surprise - mu) / sigma)^2)`, based on its relationship with the Exponential Moving Average (EMA) and standard deviation (std) of Surprise.
- Weight Update: The standard `loss.backward()` is executed only after `lr_modifier` is calculated. Subsequently, the optimizer uses `effective_lr = base_lr * lr_modifier` to perform the weight update. `optimizer.step()` is always executed, but its update magnitude has been pre-emptively and dynamically scaled by Surprise. A self-contained sketch of this scheduler appears after this list.
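Below is a minimal, self-contained sketch of the loop described above. Names such as `SurpriseLRScheduler`, `grad_norm_surprise`, and `train_step` are illustrative rather than the SigmaPI/PILF API, and for simplicity the sketch derives Surprise from the gradient norm after `loss.backward()`, whereas the pipeline above obtains it from the SigmaPI monitor on the logits before backpropagation.

```python
import math
import torch

def grad_norm_surprise(model: torch.nn.Module) -> float:
    """Illustrative Surprise proxy: the global L2 norm of the current gradients."""
    total = 0.0
    for p in model.parameters():
        if p.grad is not None:
            total += p.grad.detach().pow(2).sum().item()
    return math.sqrt(total)

class SurpriseLRScheduler:
    """Tracks an EMA and std of Surprise and emits a Gaussian lr_modifier in (0, 1]."""
    def __init__(self, momentum: float = 0.99, eps: float = 1e-8):
        self.momentum, self.eps = momentum, eps
        self.ema, self.var = None, 0.0

    def update(self, surprise: float) -> float:
        if self.ema is None:                       # first batch: initialize the statistics
            self.ema = surprise
            return 1.0
        delta = surprise - self.ema
        self.ema += (1 - self.momentum) * delta    # exponential moving average
        self.var = self.momentum * (self.var + (1 - self.momentum) * delta * delta)
        std = math.sqrt(self.var) + self.eps
        return math.exp(-0.5 * ((surprise - self.ema) / std) ** 2)

# Usage inside a standard training step (base_lr is the optimizer's nominal learning rate):
def train_step(model, optimizer, scheduler, loss_fn, x, y, base_lr):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()                                # gradients needed for the Surprise proxy
    lr_mod = scheduler.update(grad_norm_surprise(model))
    for group in optimizer.param_groups:
        group["lr"] = base_lr * lr_mod             # effective_lr = base_lr * lr_modifier
    optimizer.step()                               # always executed, magnitude pre-scaled
    for group in optimizer.param_groups:
        group["lr"] = base_lr                      # restore base_lr for the next batch
    return loss.item(), lr_mod
```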
PILF is the full implementation on an MoE architecture, extending the dynamic scheduling concept to model capacity allocation.
```mermaid
graph TD
    Input --> InitialSurprise["Initial Surprise Assessment"]

    subgraph DynamicPolicy [Surprise-Driven Dynamic Policy]
        direction LR
        InitialSurprise -- "g(Surprise)" --> k_Value["k = g(S)"]
        InitialSurprise -- "f(Surprise)" --> lr_mod_Value["lr_mod = f(S)"]
    end

    k_Value --> HierarchicalGatingNetwork["Hierarchical Gating (route to k experts)"]
    HierarchicalGatingNetwork --> MicroExpertPool[...]
    MicroExpertPool --> Aggregator
    Aggregator --> Logits
    Logits --> LossCalculation
    LossCalculation -- Gradients --> SelectiveUpdate

    subgraph SelectiveUpdate [Selective Update Module]
        direction LR
        lr_mod_Value --> SetLR["Set effective_lr"]
        SetLR --> OptimizerStep["Optimizer.step()"]
    end

    OptimizerStep -- Updates only active experts & gating --> FinalModel
```
Training Loop Explained:

- Dual Dynamic Decision: The model receives data and calculates an initial Surprise. Based on this Surprise, PILF makes two decisions in parallel:
  - Capacity Decision: `k = g(Surprise)`, determining how many experts to activate.
  - Learning Rate Decision: `lr_modifier = f(Surprise)`, determining the learning intensity.
- Dynamic Routing and Computation: The gating network routes the task to the most appropriate experts based on the `k` value.
- Dynamic Weight Update: After calculating the loss and gradients, the optimizer uses the effective learning rate modulated by `lr_modifier` to update only the activated experts and the gating network. A condensed sketch of this step follows below.
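A condensed sketch of this dual decision is shown below, reusing `num_experts` (g) and `SurpriseLRScheduler` (f) from the earlier sketches. The `top_k=` argument on the MoE forward pass and the use of a cheap no-grad loss as the initial Surprise proxy are assumptions made for illustration, not the repository's interfaces.

```python
import torch

def pilf_step(moe_model, optimizer, scheduler, loss_fn, x, y, base_lr, k_default=2, k_max=16):
    """One PILF-style training step (interfaces such as `top_k=` are assumptions)."""
    # 1) Initial Surprise assessment: approximated here by the loss of a cheap,
    #    no-grad forward pass with a small default expert budget.
    with torch.no_grad():
        surprise = loss_fn(moe_model(x, top_k=k_default), y).item()

    # 2) Dual dynamic decision driven by the same Surprise value.
    mu = scheduler.ema if scheduler.ema is not None else surprise
    k = num_experts(surprise, mu, sigma=1.0, k_min=1, k_max=k_max)   # g(S): capacity
    lr_mod = scheduler.update(surprise)                              # f(S): learning intensity

    # 3) Dynamic routing, loss, and selective update of the active experts + gating network.
    optimizer.zero_grad()
    loss = loss_fn(moe_model(x, top_k=k), y)
    loss.backward()                          # gradients exist only for the k active experts
    for group in optimizer.param_groups:
        group["lr"] = base_lr * lr_mod       # effective_lr = base_lr * lr_modifier
    optimizer.step()
    for group in optimizer.param_groups:
        group["lr"] = base_lr                # restore base_lr for the next batch
    return loss.item(), k, lr_mod
```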
Our test suite is now centered around a lightweight (~1M parameter) Vision Transformer architecture to facilitate rapid experimentation on cognitive learning principles. We compare three main variants on CIFAR-10, using SVHN as an Out-of-Distribution (OOD) validation set.
The goal is to observe how different learning strategies perform under resource constraints, providing a clearer view of the benefits of mechanisms like Predictive Integrity Learning Rate Scheduler (PILR-S).
“Don’t just train your model. Understand its mind.”
| Baseline ViT | 4×1 MoE-ViT | 16×4 MoE-ViT | 16×4 PILR-S-MoE-ViT with 3σ Learning |
| --- | --- | --- | --- |
| ~0.81M | ~1.21M | ~1.23M | ~1.23M |
We also conducted rehearsal experiments on MNIST and FashionMNIST datasets to further explore continual learning capabilities.
| 8×2 all time (FashionMNIST -> MNIST) | 8×2 in pretrain + 8×2 PILR-S in rehearsal (FashionMNIST -> MNIST) | 8×2 PILR-S all time (FashionMNIST -> MNIST) |
| --- | --- | --- |
This project relies on the `sigma-pi` package for core calculations. To replicate the experiments and use the full testing framework, you must first clone this repository:
```bash
git clone https://github.com/dmf-archive/PILF.git
cd PILF
```
Note: This package does not automatically install PyTorch. Please manually install the appropriate version for your system (CPU or CUDA) before proceeding. For CUDA-enabled systems, it is recommended to use `uv` or `pip`:
```bash
# Example for CUDA 12.1
uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```
After setting up PyTorch, install the testing framework dependencies:
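The exact command depends on how the repository packages its test dependencies; a common pattern, assuming a `requirements.txt` at the repository root (an assumption, not confirmed here), is:

```bash
# Assumed dependency file name; check the repository for the actual instructions.
pip install -r requirements.txt
```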
The testing framework is modular and configuration-driven.
Create or modify a configuration file in `test/configs/`. For example, `test/configs/base_vit.py`:
```python
# test/configs/base_vit.py

# Model parameters
model_config = {
    'model_type': 'base',
    'embed_dim': 128,
    'depth': 6,
    # ... other model params
}

# Training parameters
train_config = {
    'epochs': 20,
    'batch_size': 256,
    # ... other training params
}
```
Launch the experiment from the root directory using the `test/run_experiment.py` script:
```bash
python test/run_experiment.py --config test/configs/base_vit.py
```
To run the other variants, simply point to their respective config files:
```bash
# Run MoE-ViT experiment
python test/run_experiment.py --config test/configs/moe_vit.py

# Run PILR-S-MoE-ViT experiment
python test/run_experiment.py --config test/configs/gbp_moe_vit.py
```
- Transforms Hyperparameters into Policies: Converts learning rate and model capacity from developer-set “static hyperparameters” into “dynamic policies” that the model adjusts autonomously based on data value.
- Unifies "Learning" and "Forgetting": By linking the learning rate to Surprise, PILF provides a unified framework to handle learning, ignoring (low Surprise leads to low `lr`), and rejecting (high Surprise leads to low `lr`), thereby intrinsically mitigating catastrophic forgetting.
- On-Demand Resource Allocation: PILF achieves true on-demand computation, where simple tasks consume minimal resources and complex tasks dynamically call upon more resources, significantly improving efficiency.
This project is licensed under the AGPLv3. See the `LICENSE` file for details.