Extreme Theory of Functional Connections
A physics-informed neural network method for solving parametric differential equations.
We present a novel, accurate, fast, and robust physics-informed neural network method for solving problems involving differential equations (DEs), called the Extreme Theory of Functional Connections, or X-TFC. The proposed method is a synergy of two recently developed frameworks for solving problems involving DEs: the Theory of Functional Connections (TFC) and Physics-Informed Neural Networks (PINNs). Here, the latent solution of the DEs is approximated by a TFC constrained expression that employs a Neural Network (NN) as the free-function. The TFC approximated solution form always analytically satisfies the constraints of the DE, while maintaining an NN with unconstrained parameters. X-TFC uses a single-layer NN trained via the Extreme Learning Machine (ELM) algorithm. This choice exploits the approximating properties of the ELM algorithm, which reduces the training of the network to a simple least-squares problem, because the only trainable parameters are the output weights. The proposed methodology was tested over a wide range of problems, including the approximation of solutions to linear and nonlinear ordinary DEs (ODEs), systems of ODEs, and partial DEs (PDEs). The results show that, for most of the problems considered, X-TFC achieves high accuracy with low computational time, even for large-scale PDEs, without suffering from the curse of dimensionality.
X-TFC Solutions of Differential Equations
Differential equations are a powerful tool for the mathematical modelling of problems arising in scientific fields such as physics, engineering, finance, biology, chemistry, and oceanography, to name a few. We can express DEs, in their most general implicit form, as
$$ \gamma f_t + \mathcal{N}\left[f;\lambda\right] = 0 $$
subject to certain constraints, where \( f(t,x) \) represents the unknown solution, \( \mathcal{N}\left[ f ; \lambda \right] \) is a linear or nonlinear operator acting on \(f \) and parameterized by \( \lambda \), and the subscript \( t \) refers to the partial derivative of \(f \) with respect to \(t \).
The first step in our general physics-informed framework is to approximate the latent solution \(f\) with a constrained expression that analytically satisfies the constraints, as follows $$ f(\mathbf{x}; \Theta ) = f_{CE}(\mathbf{x}, g(\mathbf{x}); \Theta) = A(\mathbf{x}; \Theta) + B(\mathbf{x}, g(\mathbf{x}); \Theta) $$ where \(\mathbf{x} = [t, x]^T \in \Omega \subseteq \mathbb{R}^{n+1}\) with \(t \geq 0\), \(\Theta = [\gamma, \lambda]^T \in \mathbb{P} \subseteq \mathbb{R}^{m+1}\), \(A(\mathbf{x}; \Theta)\) analytically satisfies the constraints, and \(B(\mathbf{x}, g(\mathbf{x}); \Theta)\) projects the free-function \(g(\mathbf{x})\), a real-valued function, onto the space of functions that vanish at the constraints. According to the X-TFC method, the free-function \(g(\mathbf{x})\) is chosen to be a single-layer feedforward NN, in particular an ELM, that is $$ g(\mathbf{x}) = \sum_{j=1}^{L} \beta_j \sigma\left(w_j^T \mathbf{x} + b_j \right) = [\sigma_1, \dots, \sigma_L] \, \beta = \sigma^T \beta $$ where \(L\) is the number of hidden neurons, \(w_j\) is the vector of input weights connecting the \(j^{th}\) hidden neuron to the input nodes, \(\beta_j\) is the output weight connecting the \(j^{th}\) hidden neuron to the output node, \(b_j\) is the threshold (i.e., bias) of the \(j^{th}\) hidden neuron, and \(\sigma(\cdot)\) is the activation function. Both the input weights and the biases are randomly selected beforehand and kept fixed, according to the ELM algorithm.
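To make the construction concrete, the sketch below implements an ELM free-function and a constrained expression in Python (using NumPy, consistent with the tooling mentioned for the example that follows). It assumes, purely for illustration, a scalar input \(t\) and a single initial-value constraint \(f(0) = f_0\), for which \(f(t) = g(t) + \big(f_0 - g(0)\big)\) is a valid constrained expression; the function names and problem setup are illustrative assumptions, not the code used for the results reported below.

```python
import numpy as np

# Minimal illustrative sketch of the X-TFC building blocks (not the authors' code).
# Assumed constraint: a single initial condition f(0) = f0, satisfied analytically
# by the constrained expression f(t) = g(t) + (f0 - g(0)).

rng = np.random.default_rng(0)
L = 150                                    # number of hidden neurons
w = rng.uniform(-1.0, 1.0, size=L)         # input weights, drawn once and kept fixed (ELM)
b = rng.uniform(-1.0, 1.0, size=L)         # biases, drawn once and kept fixed (ELM)

def sigma(t):
    """Hidden-layer activations sigma(w*t + b) for scalar or array input t; shape (N, L)."""
    return np.tanh(np.outer(np.atleast_1d(t), w) + b)

def g(t, beta):
    """ELM free-function g(t) = sigma(t) @ beta; beta are the only trainable parameters."""
    return sigma(t) @ beta

def f_ce(t, beta, f0=1.0):
    """TFC constrained expression satisfying f(0) = f0 analytically, for any beta."""
    return g(t, beta) + (f0 - g(0.0, beta))
```

Because the constraint is satisfied for any choice of \(\beta\), training only has to drive the DE residual of the constrained expression to zero at the training points.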
For this problem, the free-function was chosen to be an ELM with 150 neurons and the tanh activation function. The problem was discretized over \(20 \times 20\) training points spanning the domain, and each iteration of the nonlinear least-squares was solved using NumPy's lstsq function. The total execution time was 22.48 seconds, while the nonlinear least-squares itself, which required 10 iterations, took 52.6 milliseconds. The maximum and average errors on the training set were \(7.634\times10^{-11}\) and \(9.497\times10^{-12}\), respectively; on the test set, a \(100\times100\) grid of uniformly spaced points, the maximum and average errors were \(8.977\times10^{-11}\) and \(1.068\times10^{-11}\), respectively.
The following figure summarizes the results of this example.
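The iterative least-squares described above can be sketched as a Gauss-Newton-style update of the output weights \(\beta\), with each linearized subproblem solved by NumPy's lstsq. In the sketch below, `residual` and `jacobian` are hypothetical placeholders for the problem-specific DE residual of the constrained expression at the training points and its derivative with respect to \(\beta\); this is an illustrative outline under those assumptions, not the exact routine used to produce the timings above.

```python
import numpy as np

def train_xtfc(residual, jacobian, beta0, n_iter=10, tol=1e-12):
    """Iteratively refine the ELM output weights beta by linearized least-squares.

    residual(beta): DE residual vector of the constrained expression at the training points.
    jacobian(beta): Jacobian of that residual with respect to beta, shape (n_points, L).
    """
    beta = beta0.copy()
    for _ in range(n_iter):
        r = residual(beta)
        J = jacobian(beta)
        # Solve the linearized subproblem J @ delta = -r in the least-squares sense.
        delta, *_ = np.linalg.lstsq(J, -r, rcond=None)
        beta += delta
        if np.linalg.norm(delta) < tol:   # stop once the update is negligible
            break
    return beta
```

For linear DEs the residual is linear in \(\beta\), so a single lstsq call suffices; the loop is only needed for nonlinear problems such as the one above.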