Data-Driven Physics

Physics-informed ML, neural ODEs, and symbolic regression — blending inductive biases with data to learn governing dynamics.

Figure: Hybrid modeling: physics and machine learning. Hybrid models merge conservation laws with flexible function approximators, anchoring learning to physical structure.

Introduction

Data-driven physics asks a simple question: when equations are incomplete or parameters unknown, how can data finish the model without violating the laws we already trust? Pure machine learning is flexible but forgetful; pure theory can be elegant but brittle. The modern answer is hybrid modeling — encode conservation, symmetries, and units directly into learning systems so that data fills in only what is missing. This page surveys three pillars of the field: physics-informed neural networks (PINNs), neural differential equations (Neural ODEs/PDEs), and symbolic regression that discovers interpretable equations from measurements.

We adopt two lenses. The physics lens emphasizes invariants, constitutive laws, and causal structure. The computation lens studies optimization landscapes, regularization, and numerical stability. Successful practice lives at their intersection: loss functions must honor physics yet be trainable; architectures must reflect geometry yet be expressive; discretizations must match the data’s resolution.

Physics-Informed Neural Networks (PINNs)

A PINN represents an unknown field \(u_\theta(x,t)\) with a neural network and trains it so that the governing PDE residual is minimized at collocation points while respecting boundary/initial conditions. With automatic differentiation we compute derivatives \(\partial_t u_\theta, \nabla u_\theta, \nabla\!\cdot\!(k\nabla u_\theta)\) exactly with respect to network inputs, building a physics loss like \[ \mathcal{L}_{\text{PDE}}=\frac{1}{N}\sum_{i}\big\|\mathcal{N}\big[u_\theta\big](x_i,t_i)-f(x_i,t_i)\big\|^2, \] where \(\mathcal{N}\) is the differential operator. Data terms \(\mathcal{L}_{\text{data}}\) and boundary terms \(\mathcal{L}_{\text{BC}}\) complete the objective. The result is a mesh-free surrogate that interpolates sparse sensors while satisfying the PDE everywhere.
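
To make the collocation loss concrete, here is a minimal sketch for the 1D heat equation \(u_t=\nu u_{xx}\), with derivatives supplied by PyTorch autodiff. The network size, sampling strategy, and value of \(\nu\) are illustrative assumptions, not a prescribed recipe.

```python
# Minimal PINN sketch for the 1D heat equation u_t = nu * u_xx.
# Architecture, collocation sampling, and nu are illustrative choices.
import torch
import torch.nn as nn

torch.manual_seed(0)
nu = 0.1
net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(),
                    nn.Linear(64, 64), nn.Tanh(),
                    nn.Linear(64, 1))

def pde_residual(x, t):
    """Residual N[u](x,t) = u_t - nu*u_xx, derivatives via autograd."""
    x = x.requires_grad_(True)
    t = t.requires_grad_(True)
    u = net(torch.cat([x, t], dim=1))
    ones = torch.ones_like(u)
    u_t = torch.autograd.grad(u, t, ones, create_graph=True)[0]
    u_x = torch.autograd.grad(u, x, ones, create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x, x, torch.ones_like(u_x),
                               create_graph=True)[0]
    return u_t - nu * u_xx

x_c, t_c = torch.rand(1024, 1), torch.rand(1024, 1)   # collocation points
loss_pde = pde_residual(x_c, t_c).pow(2).mean()
# The full objective adds weighted data and boundary terms:
# loss = loss_pde + lambda_data * loss_data + lambda_bc * loss_bc
```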

Practice. Non-dimensionalize inputs; encode boundary conditions by construction (e.g., multiply the network output by functions that vanish on the boundary); balance loss weights adaptively (residuals and data often sit on different scales); and monitor physical quantities (flux, energy) during training. For multiphysics or discontinuities, use domain decomposition or adaptive sampling to place collocation points where the residual is large.
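
As one way to encode a boundary condition by construction, the sketch below (continuing the heat-equation example above, and assuming homogeneous Dirichlet conditions on \([0,1]\)) multiplies the raw network output by a function that vanishes on the boundary:

```python
# Hard-constraint sketch (reuses `net` from the example above;
# assumes u(0,t) = u(1,t) = 0 on the unit interval).
def u_hat(x, t):
    raw = net(torch.cat([x, t], dim=1))
    return x * (1.0 - x) * raw   # vanishes at x=0 and x=1 by construction
# L_BC then drops from the objective; only residual and data terms remain.
```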

Limits. PINNs trade exact discretization for flexibility; stiff PDEs and high-frequency features can be hard to learn. Spectral bias pulls networks toward smooth solutions; curriculum schedules (coarse-to-fine), Fourier features, or coordinate transforms alleviate this.
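
One common mitigation is a random Fourier-feature encoding of the inputs; a minimal sketch follows, in which the frequency scale is an assumed tuning knob rather than a canonical value.

```python
# Fourier-feature sketch to counter spectral bias (scale of B is assumed).
import torch

B = 10.0 * torch.randn(2, 128)          # random frequency matrix

def encode(xt):                          # xt: (N, 2) raw (x, t) inputs
    proj = xt @ B
    return torch.cat([torch.sin(proj), torch.cos(proj)], dim=1)  # (N, 256)
# The MLP's first layer then takes 256 features instead of raw coordinates,
# making high-frequency solution content easier to represent.
```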

Figure: PINN training with collocation and boundary losses. A PINN balances PDE residuals, boundary conditions, and data misfit; autodiff supplies derivatives without a mesh.

Neural Differential Equations

Neural ODEs replace a hand-written right-hand side \(\dot y=f(t,y)\) with a learnable vector field \(\dot y=f_\theta(t,y)\) integrated by a numerical solver. Training uses adjoint sensitivity to compute gradients through the integrator with \(\mathcal{O}(1)\) memory, letting the model adapt its effective depth to the dynamics. For systems with known structure, we build Hamiltonian Neural Networks where \(f_\theta\) derives from a learned Hamiltonian \(H_\theta(q,p)\), enforcing symplectic flow and energy preservation. For dissipative systems, Neural SDEs incorporate stochastic forcing and learn drift/diffusion terms consistent with observed statistics.
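
A minimal Hamiltonian-network sketch is below: the vector field is derived from a learned scalar \(H_\theta(q,p)\), so the symplectic structure is built in. The architecture is an assumption, and integration plus adjoint training would be delegated to an ODE-solver library (for example torchdiffeq) rather than hand-rolled here.

```python
# HNN sketch: dynamics come from differentiating a learned Hamiltonian.
import torch
import torch.nn as nn

class HNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.H = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 1))

    def forward(self, t, y):
        """y = (q, p); returns (dH/dp, -dH/dq), a Hamiltonian vector field."""
        y = y.requires_grad_(True)          # t unused: autonomous system
        dH = torch.autograd.grad(self.H(y).sum(), y, create_graph=True)[0]
        return torch.cat([dH[..., 1:2], -dH[..., 0:1]], dim=-1)

y0 = torch.randn(16, 2)     # batch of (q, p) states
dy = HNN()(0.0, y0)         # evaluate the learned vector field
# An adjoint-trained rollout would pass this module to a solver, e.g.
# torchdiffeq's odeint_adjoint(model, y0, t_grid), for O(1)-memory gradients.
```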

Numerics. The solver is part of the model: step-size control, stiffness handling, and event detection all affect gradients and generalization. When data are irregularly sampled, continuous-time models interpolate naturally; when trajectories are long and chaotic, Lyapunov growth demands regularization and short prediction windows with receding-horizon training.

Neural PDEs. Parameterize fluxes or closures inside finite-volume/element schemes (e.g., learn subgrid turbulence stress while conserving mass and momentum). Here the neural network is not a surrogate for the full PDE, but a trainable piece inside a stable solver.
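
A sketch of the pattern: a learned interface flux inside a conservative 1D finite-volume update. The stencil and network sizes are assumptions, but conservation follows from the flux-difference form regardless of what the network outputs.

```python
# Learned-closure sketch: conservative 1D finite-volume step where the
# interface flux comes from a small network (stencil/sizes are assumptions).
import torch
import torch.nn as nn

flux_net = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1))

def fv_step(u, dt, dx):
    """One step of u_i <- u_i - (dt/dx) * (F_{i+1/2} - F_{i-1/2})."""
    stencil = torch.stack([u[:-1], u[1:]], dim=1)   # (left, right) cell pairs
    F = flux_net(stencil).squeeze(1)                # one flux per interface
    u_new = u.clone()                               # boundary cells held fixed
    u_new[1:-1] = u[1:-1] - (dt / dx) * (F[1:] - F[:-1])
    return u_new
# Because updates are flux differences, mass moves only between neighboring
# cells: the interior total changes only through the two end interfaces.
```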

Figure: Neural ODE flow and adjoint sensitivity. Neural ODEs learn continuous dynamics; the adjoint method backpropagates through the solver without storing intermediate states.

Symbolic Regression & Equation Discovery

Symbolic regression searches over expressions to find compact laws that explain data — think of rediscovering \(F=ma\) or \(u_t=\nu u_{xx}\) from trajectories or fields. Sparse identification of nonlinear dynamics (SINDy) constructs a library of candidate terms and selects a parsimonious model via sparse regression. Genetic programming explores expression trees. Physics constraints (dimensional analysis, symmetries) shrink the search space and improve identifiability; automatic differentiation supplies clean derivatives when noise is moderate, while weak-form approaches integrate against test functions to tolerate noise and coarse sampling.
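
A compact sketch of the SINDy idea, using sequentially thresholded least squares over a monomial library; the library contents, threshold, and iteration count are assumptions of this toy version.

```python
# SINDy sketch: sparse regression of X_dot onto a candidate term library.
import numpy as np

def sindy(X, X_dot, threshold=0.1, iters=10):
    """X: (n_samples, n_states); X_dot: matching time derivatives."""
    n = X.shape[1]
    cols = [np.ones(len(X))]                               # constant term
    cols += [X[:, i] for i in range(n)]                    # linear terms
    cols += [X[:, i] * X[:, j] for i in range(n)
             for j in range(i, n)]                         # quadratic terms
    Theta = np.column_stack(cols)
    Xi = np.linalg.lstsq(Theta, X_dot, rcond=None)[0]
    for _ in range(iters):                                 # threshold, refit
        small = np.abs(Xi) < threshold
        Xi[small] = 0.0
        for k in range(X_dot.shape[1]):
            big = ~small[:, k]
            if big.any():
                Xi[big, k] = np.linalg.lstsq(Theta[:, big], X_dot[:, k],
                                             rcond=None)[0]
    return Xi           # sparse coefficients: X_dot ~= Theta(X) @ Xi
```

Weak-form variants replace the pointwise derivatives in X_dot with integrals against test functions, which is why they tolerate noisier, coarser data.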

The reward is interpretability: discovered equations become simulators, suggest conserved quantities, and guide further experiments. The cost is a combinatorial search over expression space; regularization, priors, and active data collection tame it.

Hybrid Modeling Playbook

1) Start with units and scales — non-dimensionalize to expose small parameters and stiffness.
2) Encode hard constraints (mass conservation, boundary values) by construction; penalize soft ones (constitutive laws) in the loss.
3) Split models into trusted physics and learned closures; keep the numerical integrator stable and differentiable.
4) Validate with a priori tests (recovery of known solutions), a posteriori rollouts, and conservation diagnostics (see the sketch below).
5) Quantify uncertainty: ensembles, Bayesian layers, and interval bounds distinguish data scarcity from model misfit.
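
For the conservation diagnostic in step 4, a minimal sketch; the quantity function and tolerance are placeholder assumptions.

```python
# Conservation diagnostic sketch: relative drift of a conserved quantity.
import torch

def conservation_drift(rollout, quantity_fn, tol=1e-3):
    """rollout: (T, state_dim) trajectory; quantity_fn: state -> scalar."""
    q = torch.stack([quantity_fn(y) for y in rollout])
    drift = (q - q[0]).abs().max() / q[0].abs().clamp_min(1e-12)
    return drift.item(), bool(drift < tol)   # (relative drift, passes check?)
```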

Examples. Learn drag coefficients from sparse trajectory data while enforcing Newton’s laws; recover viscosity fields from pressure/velocity snapshots by minimizing Navier–Stokes residuals; emulate quantum time evolution with a neural correction to a split-operator scheme; infer constitutive relations in materials from stress–strain data with thermodynamic consistency.

Quick Quiz – Data-Driven Physics

1) What makes a neural network “physics-informed” in a PINN?

2) A key advantage of Neural ODEs over discrete-time RNNs is…

3) Symbolic regression is primarily used to…