Part 5: Switching to GROMACS for higher throughput
Part 5 of the Getting Started with Membrane MD series. Last updated: May 2026. Tested with GROMACS 2025.2 (OpenCL build) on an Apple M1 Ultra.
The previous parts of this series got you running and analyzing membrane simulations in NAMD. NAMD is an excellent choice for learning — its inputs are human-readable, its outputs are easy to inspect, and the same .psf/.pdb files are accepted by VMD without any conversion. But for production research where you need to generate hundreds of nanoseconds of trajectory, NAMD on Apple Silicon is CPU-only and tops out around 8-9 ns/day on a Mac Studio M1 Ultra. A trajectory of 500 ns — typical for publication-quality bilayer work — would take roughly two months.
GROMACS is a different MD engine with the same scientific lineage as NAMD (both implement standard force fields like CHARMM36, both produce equivalent physics) but with much better performance on a wider variety of hardware. Notably, GROMACS 2025 has working OpenCL support for Apple Silicon GPUs, which gives roughly a 3.5× speedup over NAMD on the same hardware. The same 500 ns trajectory now takes a little over two weeks (about 17 days) instead of two months — bringing publication-quality work within reach on a single Mac.
This part of the series covers everything needed to switch: building a GROMACS-format system in CHARMM-GUI, installing GROMACS on Apple Silicon (with a small but unavoidable Xcode quirk), and running the equivalent of the workflow from Part 3. We finish with a quantitative head-to-head comparison.
In this series
- Part 1: Setting up your Mac for MD simulations
- Part 2: Building a lipid bilayer with CHARMM-GUI
- Part 3: Running your first simulation in NAMD
- Part 4: Visualization and analysis in VMD
- Part 5: Switching to GROMACS for higher throughput (this page)
Benchmark summary
To give you the punchline upfront — here’s what NAMD and GROMACS look like running the identical DOPC bilayer system (200 lipids, 48,412 atoms, 30°C, 0.15 M NaCl, CHARMM36 force field) on the same M1 Ultra Mac Studio:
| Metric | NAMD 3.0.2 | GROMACS 2025.2 | Experimental |
|---|---|---|---|
| Performance | 8.5 ns/day | 30.0 ns/day | — |
| Speedup over NAMD | 1.0× | 3.5× | — |
| Wall time for 1 ns | 2h 50m | 47 min | — |
| Wall time for 10 ns production | 28 hours | 8 hours | — |
| GPU acceleration | CPU only | OpenCL (Apple GPU) | — |
| Mean APL (last 5 ns) | 68.55 Å² | 67.73 Å² | ~67.5 Å² |
| Mean thickness (last 5 ns) | 38.41 Å | 38.63 Å | 37-39 Å |
Both engines reproduce the experimental DOPC values within statistical fluctuation (~±1 Å² for APL). The equivalence in scientific output combined with the throughput difference makes GROMACS the better choice for production work on Apple Silicon.
A note on what we’re not comparing: NAMD on Linux with an NVIDIA GPU (in GPU-resident mode) does roughly 100-300 ns/day on this same system — 12-35× faster than NAMD CPU on Mac. So an NVIDIA Linux workstation remains the right choice for very large or very long simulations. For learning, development, and small-to-medium projects on the Mac you already have, NAMD CPU and GROMACS OpenCL are both viable; GROMACS just gets you results sooner.
Step 1: Rebuild the system in CHARMM-GUI
GROMACS reads different file formats than NAMD: .gro for coordinates, .top for topology, .mdp for run parameters. CHARMM-GUI generates all of these directly if you tell it to. The simplest approach is to start a fresh CHARMM-GUI job with the GROMACS output option checked.
Repeat the workflow from Part 2 end-to-end, with one difference at Step 8 (Force Field Options):
- Uncheck NAMD
- Check GROMACS (the other options can stay the same)
Everything else — the lipid composition, box geometry, ions, equilibration protocol, force field — should match what you used in Part 2. When CHARMM-GUI finishes generating the system, download the tarball and unpack it into a new folder, parallel to (but separate from) your NAMD project:
mkdir -p ~/Dropbox/path/to/MD-simulations/dopc_test_gromacs
mv ~/Downloads/charmm-gui.tgz ~/Dropbox/path/to/MD-simulations/dopc_test_gromacs/
cd ~/Dropbox/path/to/MD-simulations/dopc_test_gromacs/
tar -xzf charmm-gui.tgz
cd charmm-gui-*
ls gromacs/
You should see something like:
index.ndx
README
step5_input.gro
step5_input.pdb
step5_input.psf
step6.0_minimization.mdp
step6.1_equilibration.mdp
... step6.6_equilibration.mdp
step7_production.mdp
topol.top
toppar/
The .mdp files are GROMACS’s equivalent of NAMD’s .inp files — text files specifying run parameters (integrator, timestep, thermostat, etc.). The topol.top is the topology, and the toppar/ folder contains the force field parameter files in GROMACS’s .itp format.
Step 2: Install GROMACS on Apple Silicon
GROMACS isn’t distributed as a ready-made macOS binary with Apple GPU support, so we need to build it from source. The recipe is well-trodden but takes ~30-45 minutes of compile time.
Prerequisites
We need Homebrew (for cmake), a working C++ compiler (Apple’s clang, which comes with the Xcode Command Line Tools), and a small fix for an Xcode CLT quirk we’ll explain shortly.
If you haven’t installed Homebrew already (we used it in Part 1 for some adjacent tooling), do so now:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Then install cmake:
brew install cmake
The Xcode SDKROOT quirk
Modern Xcode Command Line Tools (version 26.x as of mid-2026) install the C++ standard library headers only inside the macOS SDK, not in the location where Apple’s clang searches by default. This is a known Apple quirk that breaks any C++ build that includes <iostream> (i.e., every C++ project) unless we set the SDKROOT environment variable.
Add this to your ~/.zshrc:
echo 'export SDKROOT=$(xcrun --show-sdk-path)' >> ~/.zshrc
source ~/.zshrc
Verify it took effect:
echo "SDKROOT=$SDKROOT"
Should print something like /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk. Now C++ compilation will find the standard library headers correctly.
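If you want to confirm the fix before committing to a long GROMACS build, one optional check is to compile a throwaway C++ file that includes <iostream> (the file name and path below are arbitrary):
# Optional sanity check: this should compile and print a message.
# If SDKROOT is missing you'll see the "'iostream' file not found" error instead.
cat > /tmp/sdk_check.cpp << 'EOF'
#include <iostream>
int main() { std::cout << "C++ headers found\n"; return 0; }
EOF
clang++ /tmp/sdk_check.cpp -o /tmp/sdk_check && /tmp/sdk_check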
Download GROMACS 2025.2
cd ~/software
curl -O https://ftp.gromacs.org/gromacs/gromacs-2025.2.tar.gz
tar -xzf gromacs-2025.2.tar.gz
cd gromacs-2025.2
mkdir build
cd build
Configure with cmake
GROMACS uses out-of-tree builds (build artifacts in a separate build/ directory). The configuration flags below enable Apple GPU support and disable features that don’t work cleanly on macOS:
cmake .. \
-DGMX_BUILD_OWN_FFTW=ON \
-DGMX_GPU=OpenCL \
-DGMX_OPENMP=OFF \
-DREGRESSIONTEST_DOWNLOAD=ON \
-DCMAKE_INSTALL_PREFIX=$HOME/software/gromacs-2025.2-install \
-DGMX_DOUBLE=OFF
What each flag does:
- -DGMX_BUILD_OWN_FFTW=ON — GROMACS downloads and builds its own copy of FFTW (avoids Homebrew FFTW version mismatches)
- -DGMX_GPU=OpenCL — enables OpenCL GPU support. On macOS this routes to the Apple GPU via Apple’s OpenCL implementation.
- -DGMX_OPENMP=OFF — disables OpenMP. Apple’s clang doesn’t ship with OpenMP support, and trying to add it causes more problems than it solves. We get parallelism via thread-MPI instead.
- -DREGRESSIONTEST_DOWNLOAD=ON — fetches regression tests we can optionally run later to verify the build
- -DCMAKE_INSTALL_PREFIX=... — where make install will copy the final binaries
- -DGMX_DOUBLE=OFF — single precision build (good for production; we’ll build double precision separately for the minimization step)
The cmake configuration takes 1-2 minutes. The final summary should show no errors. Harmless warnings about missing Sphinx or LaTeX can be ignored (those are documentation tools, not needed).
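If you want extra assurance before the long compile, the cache cmake just wrote records what it detected. For example, you can check that OpenCL was picked up (optional; the exact variable names may differ slightly between GROMACS versions):
# GMX_GPU should be set to OpenCL and Apple's OpenCL framework should be listed
grep -i "opencl" CMakeCache.txt | head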
Compile and install
make -j8
make install
The compile step is the long one — about 30-45 minutes on an M1 Ultra. The -j8 parallelizes the build across 8 cores. There will be lots of output; warnings about deprecated features and unused variables are normal. As long as you don’t see lines starting with error:, the build is progressing fine.
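If you would rather not watch the full firehose of compiler output, one optional variation is to log the build and search the log afterwards (the plain make -j8 above works just as well):
# Capture the build output to a file, then look for genuine compiler errors
make -j8 2>&1 | tee build.log
grep -n "error:" build.log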
Build double-precision for minimization
CHARMM-GUI’s GROMACS workflow uses double-precision GROMACS for the minimization step (more numerically stable when relieving severe atom clashes from the initial structure). We need to build that separately because OpenCL isn’t supported in double precision:
cd ~/software/gromacs-2025.2
mkdir build-double
cd build-double
cmake .. \
-DGMX_BUILD_OWN_FFTW=ON \
-DGMX_GPU=OFF \
-DGMX_OPENMP=OFF \
-DCMAKE_INSTALL_PREFIX=$HOME/software/gromacs-2025.2-install \
-DGMX_DOUBLE=ON \
-DGMX_DEFAULT_SUFFIX=OFF \
-DGMX_BINARY_SUFFIX=_d \
-DGMX_LIBS_SUFFIX=_d
make -j8
make install
Differences from the single-precision build:
- -DGMX_GPU=OFF — no GPU (OpenCL doesn’t support double precision)
- -DGMX_DOUBLE=ON — enables double precision
- -DGMX_BINARY_SUFFIX=_d — names the binary gmx_d (so it doesn’t collide with the single-precision gmx)
This second build takes another ~30-45 minutes.
Add GROMACS to your shell
GROMACS provides a setup script that adds itself to your PATH and configures environment variables:
echo 'source $HOME/software/gromacs-2025.2-install/bin/GMXRC.zsh' >> ~/.zshrc
source ~/.zshrc
Verify:
which gmx
which gmx_d
gmx --version | head -20
The --version output should report:
Precision: mixed
GPU support: OpenCL
SIMD instructions: ARM_NEON_ASIMD
GPU FFT library: VkFFT internal (1.3.1) with OpenCL backend
If you see those four things, the build is complete and ready to run.
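Optionally, since we configured with -DREGRESSIONTEST_DOWNLOAD=ON, you can also run GROMACS’s own test suite against the new binaries before trusting them with science. From the single-precision build directory:
# Runs the unit and regression tests against the freshly built binaries.
# This takes a while, but it's the most thorough verification of the build.
cd ~/software/gromacs-2025.2/build
make check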
Step 3: Set up the simulation directory
Following the same convention as the NAMD setup (cloud-synced inputs, local-disk active runs):
mkdir -p ~/simulations/dopc_test_gromacs
cd ~/simulations/dopc_test_gromacs
cp -r ~/Dropbox/path/to/MD-simulations/dopc_test_gromacs/charmm-gui-*/gromacs/* .
ls
You should now have all the .mdp, .gro, .top, .ndx files plus the toppar/ folder in your working directory.
A key thing to note: unlike the NAMD workflow, GROMACS doesn’t require any patching of the CHARMM-GUI output. The topol.top references the GROMACS-format parameter files in toppar/, which use a clean syntax that GROMACS reads natively. No WRNLEV/BOMLEV comments, no parameter pruning, no path fixes. You can run the workflow as-is.
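If you’re curious how that works, the references are ordinary include lines near the top of topol.top, with paths relative to the working directory:
# Show the relative-path includes that pull in the toppar/ parameter files
grep '#include' topol.top | head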
Step 4: Examine the production parameters
Before running, let’s look at what GROMACS will actually do. Open step7_production.mdp:
cat step7_production.mdp
You’ll see something like:
integrator = md
dt = 0.002
nsteps = 500000
nstxout-compressed = 50000
...
tcoupl = v-rescale
tc_grps = MEMB SOLV
tau_t = 1.0 1.0
ref_t = 303.15 303.15
;
pcoupl = C-rescale
pcoupltype = semiisotropic
tau_p = 5.0
compressibility = 4.5e-5 4.5e-5
ref_p = 1.0 1.0
This matches the NAMD setup closely: 2 fs timestep, 500,000 steps per chunk = 1 ns/chunk, trajectory every 100 ps, 303.15 K, 1 bar with semi-isotropic pressure coupling (the membrane equivalent of NPT).
One notable difference from NAMD: GROMACS uses the v-rescale thermostat and C-rescale barostat by default, while NAMD typically uses Langevin dynamics for both. Both are valid NPT ensemble samplers, and both produce the same equilibrium distribution. The instantaneous fluctuation patterns will look slightly different between engines on short timescales but the averages converge to the same values.
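If you want to see how these settings evolve from equilibration into production, a quick grep across the .mdp files does it (the exact key spellings and spacing can vary a little between CHARMM-GUI versions, so adjust the pattern if it comes back empty):
# Compare the core run parameters across the six equilibration steps and production
grep -E "^(integrator|dt|nsteps|tcoupl|pcoupl) " step6.?_equilibration.mdp step7_production.mdp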
Step 5: Build the launcher script
The CHARMM-GUI GROMACS output includes a README file with a recommended workflow in csh syntax. We’ll rewrite it as a zsh script with error handling, analogous to the NAMD launcher from Part 3:
cat > run_gromacs.sh << 'EOF'
#!/bin/zsh
# Run GROMACS minimization + 6 equilibration steps + 10 production chunks
# Mirrors CHARMM-GUI's csh README workflow but in zsh with error checking
set -e # Exit immediately on any command failure
NCORES=8
# ============================================================
# MINIMIZATION (single step, double precision)
# ============================================================
echo "=========================================="
echo "Starting MINIMIZATION at $(date)"
echo "=========================================="
gmx_d grompp -f step6.0_minimization.mdp \
-o step6.0_minimization.tpr \
-c step5_input.gro \
-r step5_input.gro \
-p topol.top \
-n index.ndx \
-maxwarn 1
gmx_d mdrun -v -deffnm step6.0_minimization -nt $NCORES
echo "Finished MINIMIZATION at $(date)"
# ============================================================
# EQUILIBRATION (6 steps, single precision)
# ============================================================
PREV="step6.0_minimization"
for i in 1 2 3 4 5 6; do
CURR="step6.${i}_equilibration"
echo "=========================================="
echo "Starting ${CURR} at $(date)"
echo "=========================================="
gmx grompp -f ${CURR}.mdp \
-o ${CURR}.tpr \
-c ${PREV}.gro \
-r step5_input.gro \
-p topol.top \
-n index.ndx \
-maxwarn 1
gmx mdrun -v -deffnm ${CURR} -nt $NCORES
echo "Finished ${CURR} at $(date)"
PREV=${CURR}
done
# ============================================================
# PRODUCTION (10 chunks of 1 ns each, single precision)
# ============================================================
NCHUNKS=10
for i in $(seq 1 $NCHUNKS); do
CURR="step7_${i}"
echo "=========================================="
echo "Starting production chunk ${i}/${NCHUNKS} (${CURR}) at $(date)"
echo "=========================================="
if [ $i -eq 1 ]; then
gmx grompp -f step7_production.mdp \
-o ${CURR}.tpr \
-c step6.6_equilibration.gro \
-p topol.top \
-n index.ndx \
-maxwarn 1
else
PREV_CHUNK="step7_$((i-1))"
gmx grompp -f step7_production.mdp \
-o ${CURR}.tpr \
-c ${PREV_CHUNK}.gro \
-t ${PREV_CHUNK}.cpt \
-p topol.top \
-n index.ndx \
-maxwarn 1
fi
gmx mdrun -v -deffnm ${CURR} -nt $NCORES
echo "Finished chunk ${i} at $(date)"
done
echo "=========================================="
echo "All GROMACS phases completed at $(date)"
echo "=========================================="
EOF
chmod +x run_gromacs.sh
Key things to understand about this script:
- set -e — exits immediately if any command fails. Replaces the per-step exit-code checking from the NAMD launcher with simpler “fail fast” semantics.
- The GROMACS pattern is two steps per simulation segment. First gmx grompp (the preprocessor) compiles the parameter file, structure, and topology into a single binary .tpr file. Then gmx mdrun runs the simulation from that .tpr. This is unlike NAMD, where one command (namd3 input.inp) does both jobs.
- Continuation between chunks uses -t prev.cpt — the checkpoint file. This makes production chunks bit-perfect continuations of each other, with velocities preserved across chunk boundaries. (A handy corollary, resuming an interrupted chunk, is sketched just after this list.)
- Minimization uses gmx_d (double precision) but everything else uses gmx (single precision with GPU).
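One practical corollary of the checkpoint-based continuation: if a chunk dies partway through (power loss, accidental reboot), you normally don’t need to rerun it from the start. mdrun can resume from the chunk’s own checkpoint with -cpi; the chunk number below is just a hypothetical example, and 8 threads matches the script’s NCORES:
# Resume an interrupted production chunk from its checkpoint; with -deffnm,
# mdrun appends to that chunk's existing output files.
gmx mdrun -v -deffnm step7_3 -cpi step7_3.cpt -nt 8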
Step 6: Launch the pipeline
Same workflow as NAMD: prevent the Mac from sleeping, then launch the script detached from Terminal.
caffeinate -i &
nohup ./run_gromacs.sh > gromacs_master.log 2>&1 &
Monitor progress:
tail -f ~/simulations/dopc_test_gromacs/gromacs_master.log
The script runs minimization (~5-10 min), then six equilibration steps (~14-30 min each), then ten production chunks (~47 min each).
Total wall time for the complete pipeline: roughly 9-10 hours. Compare to ~34 hours for the equivalent NAMD pipeline.
Step 7: Per-chunk benchmark
When each production chunk finishes, GROMACS prints its performance directly at the end of the chunk’s log file. Extract all of them:
grep "Performance:" ~/simulations/dopc_test_gromacs/step7_*.log
You’ll see output like:
step7_1.log:Performance: 29.456 0.815
step7_2.log:Performance: 30.752 0.780
...
step7_10.log:Performance: 30.920 0.776
The two columns are ns/day and hour/ns. On the M1 Ultra we measured a mean of 30.0 ns/day, very consistent across chunks (~28.8 to 30.9). Compared to NAMD’s 8.5 ns/day on the same hardware, that’s a 3.5× speedup.
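If you’d rather not average those by hand, a small awk one-liner over the same grep output does it (with the filename prefix included, the ns/day value is the second whitespace-separated field):
# Mean ns/day across all completed production chunks
grep "Performance:" step7_*.log | awk '{ sum += $2; n++ } END { printf "mean: %.1f ns/day over %d chunks\n", sum/n, n }'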
Step 8: Visualization needs PBC repair
GROMACS writes trajectories with atoms wrapped individually to the simulation box. This is efficient for computation but produces visually confusing renders in VMD — lipid molecules straddling box boundaries appear with bonds crossing the entire box.
NAMD, by contrast, writes molecules whole, so no post-processing is needed.
To fix the GROMACS trajectories for visualization, use gmx trjconv:
for i in {1..10}; do
echo 0 | gmx trjconv -f step7_${i}.xtc -s step7_${i}.tpr -o step7_${i}_whole.xtc -pbc mol -ur compact
done
Decoded:
- -pbc mol — keep molecules whole across periodic boundaries
- -ur compact — use the most compact unit cell representation
- echo 0 | — pipe in the group selection (group 0 = “System” — the entire system)
This generates 10 new _whole.xtc files alongside the originals. Load these into VMD instead of the originals.
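If you want to confirm the post-processing kept every frame, gmx check reports the frame count and time range of a trajectory file. For example, for the first chunk:
# Report frames, time range, and contents of a post-processed trajectory
gmx check -f step7_1_whole.xtc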
Step 9: Load and analyze the GROMACS trajectory in VMD
The CHARMM-GUI GROMACS output includes a step5_input.psf for this system, the same format we used with NAMD, so VMD can read the system the same way. Save this as load_production_gromacs.tcl in your working directory:
cat > load_production_gromacs.tcl << 'EOF'
mol new step5_input.psf type psf waitfor all
for {set i 1} {$i <= 10} {incr i} {
mol addfile step7_${i}_whole.xtc type xtc waitfor all
}
puts "Loaded [molinfo top get numframes] frames"
EOF
Launch VMD and source it:
vmd
In the Tk Console:
source load_production_gromacs.tcl
You should see ~110 frames loaded (10 chunks × 11 frames each — GROMACS writes a frame at t=0 of each chunk, so the frame at each of the nine chunk boundaries appears twice; the duplicates are harmless for analysis).
Compute APL and thickness using the same Tk Console scripts from Part 4. The values will match the NAMD results within statistical fluctuation, confirming that the two engines produce equivalent physics on this system.
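As a quick cross-check that doesn’t involve VMD at all, you can also pull the lateral box dimensions from a chunk’s energy file with gmx energy and convert them to an area per lipid. This is a sketch, not a replacement for the Part 4 analysis: it assumes the usual Box-X/Box-Y energy terms (present in any NPT run) and this system’s 100 lipids per leaflet.
# Extract the lateral box dimensions (nm) from the last production chunk
echo "Box-X Box-Y" | gmx energy -f step7_10.edr -o box.xvg
# APL in Å² = Box-X * Box-Y * 100 (nm² to Å²) / 100 lipids per leaflet;
# for this 200-lipid system the two factors of 100 cancel.
awk '!/^[#@]/ { sum += $2 * $3; n++ } END { printf "mean APL (chunk 10): %.2f\n", sum/n }' box.xvg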
Where to go from here
You now have working installations of both NAMD and GROMACS, with verified-equivalent output on a benchmark system. Most Heberle Lab work going forward will use GROMACS for the 3.5× throughput, while NAMD remains useful as a sanity check and as a teaching tool because of its more readable inputs.
Some natural next directions:
- Longer production runs. With 30 ns/day throughput, 500 ns of trajectory takes about 17 days. This is the typical length needed for converged equilibrium averages of slow membrane properties (lipid diffusion, headgroup order parameters, etc.) and brings publication-quality work within reach.
- More complex systems. CHARMM-GUI can build asymmetric bilayers, mixed-lipid bilayers, and membrane-protein systems with the same workflow. The GROMACS pipeline handles all of these without modification.
- More advanced analyses. GROMACS includes a suite of analysis tools (gmx rdf, gmx density, gmx order, gmx msd) that can compute properties like radial distribution functions, electron density profiles, lipid tail order parameters, and mean squared displacements. The gmx help command lists what’s available; a small gmx density example follows below.
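As a small taste of those tools, here is a hedged example of a mass density profile along the membrane normal, run on the post-processed trajectory of the last chunk (group 0 is the whole system when no index file is supplied; see gmx help density for the electron- and charge-density variants):
# Mass density profile along z (the membrane normal) for production chunk 10
echo 0 | gmx density -f step7_10_whole.xtc -s step7_10.tpr -o density_z.xvg -d Z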
Troubleshooting
Build fails with 'iostream' file not found or similar standard library errors
The SDKROOT variable isn’t set. Check echo $SDKROOT — should print a path. If empty, add the export to ~/.zshrc and re-source it (see Step 2 above).
gmx grompp aborts with “Cannot find force field…”
Make sure the working directory contains the toppar/ subfolder copied from the CHARMM-GUI output. The topol.top file references files in toppar/ with relative paths.
Visualization shows bilayer with bonds crossing the entire box
GROMACS wrote the trajectory with atoms wrapped individually. Apply the gmx trjconv -pbc mol post-processing step described in Step 8 to generate whole-molecule trajectories for visualization.
Performance much lower than 30 ns/day
Check that GPU acceleration is actually engaged. In any production step7_*.log, you should see lines like PME tasks will do all aspects on the GPU and PP tasks will do (non-perturbed) short-ranged interactions on the GPU. If you see only CPU tasks, the OpenCL build isn’t being used — verify gmx --version reports GPU support: OpenCL.
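A quick way to check is to grep a production log for the GPU task-assignment lines:
# Prints the "... on the GPU" lines if OpenCL offload is active; no output suggests a CPU-only run
grep "on the GPU" step7_1.log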
Production chunks run at slightly different speeds
Normal. Background system activity (web browsing, etc.) reduces available CPU/GPU resources. Variation of ~5% between chunks is expected; larger variations suggest something else is competing for resources.