Adam: Splitting
It isolates the stochastic direction (the sign of the gradient) from the adaptive step size (the relative variance).
It argues that Adam's second moment actually causes word representations to become narrow and directional (anisotropic).
This paper effectively "splits" the Adam algorithm into two distinct components to study them: Splitting Adam
This version of ADAM is used for "splitting" an elite population of particles to better sample rare events or solve multi-objective optimization problems.
It proposes Coupled Adam to fix this specific side effect. It isolates the stochastic direction (the sign of
If you are coming from a statistics or rare-event simulation background, "ADAM" refers to .
It shows that Adam minimizes a specific form of sharpness —specifically the trace of the square root of the Hessian—which is fundamentally different from how SGD behaves. 4. Better Embeddings with Coupled Adam It proposes Coupled Adam to fix this specific side effect
It's often applied to power grid reliability or particle transport. 3. Adam Reduces a Unique Form of Sharpness