Simulating the 2-Electron Quantum Dot: Bare-Metal Cython, Tensor Trains, and Strong-Field Dynamics
Simulating the exact quantum mechanics of a two-body interacting system is a notoriously brutal computational challenge. For two electrons confined in a harmonic trap (a Quantum Dot, or Artificial Atom), the physics is governed by the global trapping potential and the Coulomb repulsion that entangles the two particles.
Solving the stationary states and time-dependent dynamics of this system from the ground up requires navigating the curse of dimensionality, bypassing the Coulomb singularity, and writing an engine capable of pushing silicon to its absolute limits. Here is how we engineered a custom, bare-metal quantum solver to crack the 2-body problem natively on Apple Silicon.
1. Bypassing the Coulomb Cusp with Momentum Sinc Basis
The fundamental nightmare of the Coulomb interaction is the $1/|r|$ singularity. In standard real-space grid methods, as two electrons approach each other ($x_i \approx x_j$), the potential energy blows up to infinity. This creates a sharp "cusp" in the multi-particle wavefunction that demands an astronomically dense spatial grid to resolve accurately.
To completely sidestep this numerical trap, the engine calculates the exact Coulomb matrix elements in a momentum sinc basis. By transforming the problem into momentum space, the sharp singularity smooths out into a mathematically well-behaved operator, allowing for high-precision exact matrix elements without artificially inflating the grid resolution.
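To make the idea concrete, here is a minimal 1D sketch of a sinc (DVR) basis, in which the momentum-space matrix elements of the kinetic operator are known exactly in closed form and local potentials stay diagonal. This is a generic illustration of why such a basis is attractive, not the engine's actual Coulomb code; all names and parameters below are our own.

```python
import numpy as np

def sinc_dvr_hamiltonian(n=61, dx=0.15, omega=1.0):
    """1D harmonic trap in a sinc (DVR) basis, atomic units (hbar = m = 1)."""
    i = np.arange(n)
    x = (i - n // 2) * dx                      # uniform grid centred on 0
    d = i[:, None] - i[None, :]
    # Kinetic matrix elements of p^2/2 are exact in the sinc basis:
    #   T_ii = pi^2 / (6 dx^2),  T_ij = (-1)^(i-j) / ((i-j)^2 dx^2)
    T = np.where(d == 0, np.pi**2 / 6.0,
                 (-1.0) ** d / np.where(d == 0, 1, d) ** 2) / dx**2
    V = np.diag(0.5 * omega**2 * x**2)         # local potentials are diagonal
    return T + V

E = np.linalg.eigvalsh(sinc_dvr_hamiltonian())
# Lowest levels converge exponentially toward the exact 0.5, 1.5, 2.5, ...
```

Because the basis is band-limited, no finite-difference error is introduced; the exact Coulomb elements described above exploit the same momentum-space structure.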
2. The Algorithmic Cheat Code: Tensor Train Reductions
For $N$ particles on an $L$-point grid per spatial dimension in 3D, the state tensor scales as $L^{3N}$, causing the memory requirements to explode. To conquer this, the engine employs advanced Tensor Train (TT) reductions (the mathematical sibling of Matrix Product States).
By compressing the massive multi-dimensional Hilbert space into a 1D chain of small, dense core tensors, the exponential $O(L^{3N})$ memory bottleneck is shattered: a state with maximum TT rank $r$ needs only $O(3N L r^2)$ numbers. This allows highly entangled 2-electron states to be represented to controlled accuracy (set by the retained TT ranks) at a fraction of the computational overhead.
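The mechanics can be sketched with the textbook TT-SVD construction (sequential SVDs with rank truncation). This is a minimal illustration, not the engine's optimized implementation; all names are our own.

```python
import numpy as np

def tt_decompose(tensor, tol=1e-12):
    """Sequential-SVD Tensor Train decomposition of a dense tensor.

    Returns 3-index cores G[k] of shape (r_k, n_k, r_{k+1}); ranks are set
    by discarding singular values below tol * s_max at each unfolding.
    """
    shape = tensor.shape
    cores, rank = [], 1
    mat = tensor.reshape(shape[0], -1)
    for k in range(len(shape) - 1):
        mat = mat.reshape(rank * shape[k], -1)
        U, s, Vt = np.linalg.svd(mat, full_matrices=False)
        keep = max(1, int(np.count_nonzero(s > tol * s[0])))
        cores.append(U[:, :keep].reshape(rank, shape[k], keep))
        mat = s[:keep, None] * Vt[:keep]
        rank = keep
    cores.append(mat.reshape(rank, shape[-1], 1))
    return cores

def tt_contract(cores):
    """Rebuild the dense tensor from its cores (verification only)."""
    out = cores[0].reshape(-1, cores[0].shape[-1])
    for core in cores[1:]:
        out = (out @ core.reshape(core.shape[0], -1)).reshape(-1, core.shape[-1])
    return out.reshape([c.shape[1] for c in cores])

# Demo: a 6-way tensor built from 2 separable terms compresses to ranks <= 2.
rng = np.random.default_rng(0)
legs = rng.standard_normal((2, 6, 5))          # 2 terms, 6 modes, 5 points each
dense = sum(np.einsum('i,j,k,l,m,n->ijklmn', *t) for t in legs)
cores = tt_decompose(dense)
max_rank = max(c.shape[-1] for c in cores[:-1])
rel_err = np.linalg.norm(tt_contract(cores) - dense) / np.linalg.norm(dense)
```

In the demo the $5^6$-entry tensor is stored as six small cores; weakly entangled states keep these ranks low, which is exactly where the compression pays off.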
3. Architecture: The Contiguous Memory Stack
Even with elegant math, fragmented memory can destroy performance. In a standard Python environment, passing thousands of scattered state vectors to a CPU core results in catastrophic cache misses.
To solve this, the engine abandons fragmented objects in favor of a Central Contiguous C-Stack: all Tensor Train cores and quantum states are laid out in a single aligned block of physical RAM. When a tensor contraction begins, the CPU's hardware prefetcher streams the data into the ultra-fast L1/L2 caches, bypassing the Python object layer (and its garbage collector) entirely and eliminating pointer-chasing overhead.
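In NumPy terms, the layout can be sketched as one flat allocation with each core exposed as a strided view into it. The real engine does this at the C level; the function below is only an illustration with hypothetical shapes.

```python
import numpy as np

def allocate_core_stack(core_shapes, dtype=np.float64):
    """Pack every TT core into one contiguous block; return (buffer, views).

    Each "core" is a view into the same flat allocation, so a sweep over the
    chain walks linearly through physical memory instead of chasing pointers
    between scattered Python objects.
    """
    sizes = [int(np.prod(s)) for s in core_shapes]
    offsets = np.concatenate(([0], np.cumsum(sizes)))
    buffer = np.zeros(offsets[-1], dtype=dtype)      # one malloc, one block
    views = [buffer[offsets[k]:offsets[k + 1]].reshape(core_shapes[k])
             for k in range(len(core_shapes))]
    return buffer, views

# Hypothetical 3-core chain on a 21-point grid with rank 8:
buffer, cores = allocate_core_stack([(1, 21, 8), (8, 21, 8), (8, 21, 1)])
```

Because every view aliases the same buffer, a contraction kernel can be handed one base pointer plus offsets, which is what makes the later nogil C loops cache-friendly.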
4. Unleashing the M4: Bare-Metal Cython and nogil
Python’s Global Interpreter Lock (GIL) fundamentally strangles multi-threading, limiting highly parallel quantum tensor contractions to a single core.
The engine's hot loops were refactored into pure C using Cython's nogil compiler directives and the OpenMP-backed prange construct. By stripping the Python API out of the matrix-generation loop entirely, the engine dynamically distributes the exact Coulomb tensor contractions across the Apple M4 chip's Performance Cores.
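The pattern can be sketched in Cython as follows. The function name, signature, and the softened 1D interaction are hypothetical placeholders (not the engine's actual API), and the extension must be compiled with OpenMP enabled (e.g. -fopenmp) for prange to parallelize.

```cython
# cython: boundscheck=False, wraparound=False
# Illustrative sketch only -- names and the interaction form are hypothetical.
from cython.parallel import prange
from libc.math cimport sqrt

def fill_interaction(double[:, ::1] V, double[::1] x, double soft):
    """Fill V[i, j] with a pairwise interaction over the grid x.

    The loop body contains no Python objects, so the GIL is released and
    OpenMP fans the rows out across the available cores.
    """
    cdef Py_ssize_t i, j, n = x.shape[0]
    cdef double d2
    for i in prange(n, nogil=True, schedule='static'):
        for j in range(n):
            d2 = (x[i] - x[j]) * (x[i] - x[j])
            V[i, j] = 1.0 / sqrt(d2 + soft)
```

Variables assigned inside the prange body (here d2) are automatically made thread-private by Cython, so no explicit locking is needed.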
The Result: A massive leap in compute efficiency. The engine achieves maximum parallel thread utilization across the M4's Performance Cores while keeping memory overhead virtually nonexistent.
5. Diagonalizing the 6D Hilbert Space
Using Gorgophone, our third-generation Sums of Products engine, the physical spectrum is discretized on an 85-million-point ($21^6$) lattice in momentum space. The raw matrix elements for the real-valued interacting states are then piped directly into an interactive Streamlit dashboard, allowing users to visualize the energy gaps, manipulate the Hamiltonian, and explore the highly correlated quantum spectrum in real time.
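As a sanity check of what the full 6D build computes, a drastically scaled-down analogue can be diagonalized densely: two electrons in a 1D harmonic trap with a softened Coulomb term on a small grid. The softening, grid sizes, and function below are illustrative stand-ins, not Gorgophone itself.

```python
import numpy as np

def two_electron_spectrum(n=21, dx=0.3, omega=1.0, soft=1.0):
    """Two electrons in a 1D harmonic trap with a softened Coulomb term.

    H = h(x1) + h(x2) + 1/sqrt((x1 - x2)^2 + soft), built on an n-point
    grid per electron (exact sinc-basis kinetic energy) and diagonalized
    densely as an (n^2 x n^2) matrix.
    """
    i = np.arange(n)
    x = (i - n // 2) * dx
    d = i[:, None] - i[None, :]
    T = np.where(d == 0, np.pi**2 / 6.0,
                 (-1.0) ** d / np.where(d == 0, 1, d) ** 2) / dx**2
    h1 = T + np.diag(0.5 * omega**2 * x**2)        # one-electron Hamiltonian
    eye = np.eye(n)
    W = 1.0 / np.sqrt((x[:, None] - x[None, :]) ** 2 + soft)
    H = np.kron(h1, eye) + np.kron(eye, h1) + np.diag(W.ravel())
    return np.linalg.eigvalsh(H)

E = two_electron_spectrum()
# Repulsion lifts the ground state above the noninteracting value of 1.0
```

The same structure, Kronecker sums of one-body terms plus a diagonal interaction, carries over to the 6D momentum-space lattice, where dense diagonalization gives way to the TT machinery above.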