Matrix-Free Methods

PhoXonic.jl provides matrix-free implementations for large-scale calculations where storing dense matrices is prohibitive.

Overview

Memory and Computational Complexity

OperationDenseMatrix-Free
MemoryO(N²)O(N)
Matrix constructionO(N³)O(N)
Matrix-vector productO(N²)O(N log N)

For a 3D calculation with N = 10,000 plane waves and 3 components:

  • Dense: ~14 GB for matrices
  • Matrix-free: ~few MB

Supported Wave Types

Wave TypeDimensionLHS FFTsRHS FFTs
TEWave2D42
TMWave2D42
SHWave2D42
PSVWave2D164
Photonic1D1D22
Longitudinal1D1D22
FullVectorEM3D246
FullElastic3D426

Grid Resolution Constraint

Important

The solver's grid resolution must be large enough to contain all plane wave indices.

The required resolution in each dimension is:

resolution >= 2 * cutoff + 1
CutoffMin ResolutionRecommended
7 (default)1516 or 32
102132
153132

Dense vs Matrix-Free Equivalence

When the resolution constraint is satisfied, matrix-free and dense methods produce mathematically identical results (up to floating-point rounding errors ~10⁻¹⁵).

Dense method (convolution matrix):

M[i,j] = ε̃(Gᵢ - Gⱼ)
y = M * x                    # O(N²) matrix-vector product

Matrix-free method (FFT-based):

y = FFT{ ε(r) · iFFT{x} }    # O(N log N) via FFT

By the convolution theorem of discrete FFT, both compute the same cyclic convolution.

Why Aliasing Occurs

When the grid is too small, aliasing corrupts the results:

cutoff=7 → plane wave indices: -7 to +7

With resolution 16×16:
  - Grid can hold indices -8 to +7
  - Problem: multiplying two plane waves extends the range
    Example: G=5 × G=5 → needs G=10, but grid can only hold up to G=7
  - High-frequency components wrap around (cyclic convolution artifact)
  - Result: ~4% error

With resolution 32×32:
  - Grid can hold indices -16 to +15
  - Sufficient margin for all products of plane waves
  - Result: ~10⁻¹⁶ error (machine precision)

Example: Correct vs Incorrect Resolution

# Incorrect: resolution too small for cutoff=7
solver_bad = Solver(TEWave(), geo, (8, 8); cutoff=7)  # Error ~96%

# Correct: resolution >= 2*7+1 = 15
solver_good = Solver(TEWave(), geo, (16, 16); cutoff=7)  # Error ~10⁻¹⁶
solver_good = Solver(TEWave(), geo, (32, 32); cutoff=7)  # Also correct

Usage with Iterative Eigensolvers

See also: examples/208_solver_comparison.jl - Dense vs LOBPCG performance comparison

Matrix-free operators are used automatically with iterative solvers:

using PhoXonic

# Create geometry
lat = square_lattice(1.0)
mat_rod = Dielectric(8.9)
mat_air = Dielectric(1.0)
rod = Circle([0.5, 0.5], 0.2)
geo = Geometry(lat, mat_air, [(rod, mat_rod)])

# Use resolution >= 2*cutoff+1
solver = Solver(TEWave(), geo, (32, 32), KrylovKitMethod(); cutoff=7)

# Solve - uses matrix-free internally
k = [0.5, 0.0]
ω, modes = solve(solver, k; bands=1:5)

For 3D calculations with matrix-free methods, FullVectorEM with shift-and-invert is used:

lat = cubic_lattice(1.0)
geo = Geometry(lat, Dielectric(1.0), [(Sphere([0.5,0.5,0.5], 0.3), Dielectric(12.0))])

# Resolution must be >= 2*cutoff+1 in each dimension
solver = Solver(FullVectorEM(), geo, (16, 16, 16), KrylovKitMethod(shift=0.01); cutoff=7)
ω, modes = solve(solver, [0.1, 0.1, 0.1]; bands=1:6)

Note: For most 3D photonic crystal calculations, TransverseEM with DenseMethod() is recommended as it eliminates spurious modes without requiring shift-and-invert. See 3D Calculations for details.

Unified Green's Function API

PhoXonic provides a unified API for Green's function, DOS, and LDOS calculations. The same GFMethod types work with all three functions.

For detailed documentation on DOS/LDOS functions, see DOS / LDOS.

Available Methods

MethodMemoryDescription
DirectGF()O(N²)LU factorization, most accurate
RSKGF()O(N²)ReducedShiftedKrylov.jl
MatrixFreeGF()O(N)Matrix-free RSCG, best for large systems

Basic Usage

using PhoXonic

solver = Solver(TEWave(), geo, (32, 32); cutoff=7)
k = [0.5, 0.0]
ω_values = [0.3, 0.4, 0.5, 0.6]
k_points = [[0.0, 0.0], [0.5, 0.0], [0.5, 0.5]]

# Default (DirectGF) - no method argument needed
G = compute_greens_function(solver, k, ω_values, source; η=1e-2)
dos = compute_dos(solver, ω_values, k_points; η=1e-2)
ldos = compute_ldos(solver, [0.5, 0.5], ω_values, k_points; η=1e-2)

# With explicit method
G = compute_greens_function(solver, k, ω_values, source, DirectGF(); η=1e-2)
dos = compute_dos(solver, ω_values, k_points, MatrixFreeGF(); η=1e-2)
ldos = compute_ldos(solver, [0.5, 0.5], ω_values, k_points, RSKGF(); η=1e-2)

MatrixFreeGF Options

MatrixFreeGF has an rhs_inv_method option for RHS⁻¹ application:

OptionDescriptionSpeedAccuracy
ApproximateRHSInv()Element-wise 1/ε (default)FastApproximate for inhomogeneous media
CGRHSInv()Inner CG iterationSlowerExact
# Fast approximate method (default)
ldos = compute_ldos(solver, pos, ω_values, k_points, MatrixFreeGF(); η=1e-2)

# Exact CG method
ldos = compute_ldos(solver, pos, ω_values, k_points, MatrixFreeGF(rhs_inv_method=CGRHSInv()); η=1e-2)

# CGRHSInv with custom parameters
ldos = compute_ldos(solver, pos, ω_values, k_points,
    MatrixFreeGF(rhs_inv_method=CGRHSInv(atol=1e-12, rtol=1e-12, maxiter=200)); η=1e-2)
When to use CGRHSInv
  • ApproximateRHSInv() is sufficient for most cases, especially for peak detection
  • Use CGRHSInv() when accurate absolute values are needed for high-contrast structures
RSCG Convergence

The RSCG solver may not fully converge for all problems. This affects RSKGF and MatrixFreeGF equally.

Symptoms of incomplete convergence:

  • Relative errors of a few percent compared to direct solve

Mitigation strategies:

  • Increase itmax parameter
  • Use smaller η (broadening) values
  • For critical applications, use DirectGF()
Practical Usage

MatrixFreeGF() is recommended for:

  • Peak/resonance frequency detection (accurate even without full convergence)
  • Large-scale calculations where Dense methods are memory-prohibitive
  • Qualitative spectral analysis

DirectGF() is recommended for:

  • Accurate absolute LDOS/DOS values
  • Quantitative analysis requiring high precision
  • Smaller systems where O(N²) memory is acceptable

Method Comparison

# Direct solve - O(N²) memory, O(N³) per frequency
G_direct = compute_greens_function(solver, k, ω_values, source, DirectGF(); η=1e-2)

# Dense RSK - O(N²) memory, efficient for many frequencies
G_rsk = compute_greens_function(solver, k, ω_values, source, RSKGF(); η=1e-2)

# Matrix-free - O(N) memory, O(N log N) per iteration
G_mf = compute_greens_function(solver, k, ω_values, source, MatrixFreeGF(); η=1e-2)

# Matrix-free with exact RHS⁻¹
G_mf_cg = compute_greens_function(solver, k, ω_values, source, MatrixFreeGF(rhs_inv_method=CGRHSInv()); η=1e-2)

Low-Level API

FFTContext and MatrixFreeWorkspace

For optimal performance in loops, separate FFT plan creation from workspace allocation:

# FFTContext: holds FFT plans (create once, reuse)
ctx = FFTContext(solver)

# MatrixFreeWorkspace: holds work arrays (one per thread)
workspace = MatrixFreeWorkspace(ctx)

# Create operator with explicit context and workspace
op = MatrixFreeOperator(solver, k, ctx, workspace)
Thread Safety

FFT plan creation is NOT thread-safe (FFTW.jl limitation). Create FFTContext from a single thread before entering parallel regions.

FFT plan execution IS thread-safe. The same FFTContext can be shared across threads, but each thread must have its own MatrixFreeWorkspace to avoid data races.

MatrixFreeOperator

# Simple usage (creates new FFT plans internally)
op = MatrixFreeOperator(solver, k)

# Optimized usage (reuse FFT plans)
ctx = FFTContext(solver)
workspace = MatrixFreeWorkspace(ctx)
op = MatrixFreeOperator(solver, k, ctx, workspace)

# Apply LHS: y = LHS * x
y = zeros(ComplexF64, N)
apply_lhs!(y, op, x)

# Apply RHS: y = RHS * x
apply_rhs!(y, op, x)

MatrixFreeEffectiveHamiltonian

For RSCG integration:

# Create effective Hamiltonian H = RHS⁻¹ * LHS
op = MatrixFreeOperator(solver, k)
H = MatrixFreeEffectiveHamiltonian(op)

# Negate for RSCG: A = -H
A = NegatedOperator(H)

# Solve (σI + A)x = b for multiple shifts
shifts = [ω^2 + im*η for ω in ω_values]
x_solutions, stats = ReducedShiftedKrylov.rscg(A, b, shifts)

LinearMap Interface

using LinearMaps  # https://julialinearalgebra.github.io/LinearMaps.jl/stable/

# Convert to LinearMap for use with other iterative solvers
A_lhs = to_linear_map_lhs(op)  # LHS as LinearMap
B_rhs = to_linear_map_rhs(op)  # RHS as LinearMap

Performance Tips

  1. Choose resolution wisely: Use resolution = 2^n for optimal FFT performance (e.g., 16, 32, 64 instead of 15, 31, 63).

  2. Reuse FFT plans: Use FFTContext and MatrixFreeWorkspace to avoid recreating FFT plans in loops:

    ctx = FFTContext(solver)           # Create once (not thread-safe)
    workspace = MatrixFreeWorkspace(ctx)  # One per thread
    
    for k in k_points
        op = MatrixFreeOperator(solver, k, ctx, workspace)
        # ... use op ...
    end
  3. Thread safety: FFT plan execution is thread-safe, but creation is not. For parallel k-point loops:

    ctx = FFTContext(solver)  # Create once in main thread
    Threads.@threads for k in k_points
        ws = MatrixFreeWorkspace(ctx)  # Each thread needs its own workspace
        op = MatrixFreeOperator(solver, k, ctx, ws)
        # ... use op ...
    end
  4. 3D calculations: Matrix-free is essential for 3D. Dense matrices for N = 10,000 require ~14 GB, while matrix-free uses only a few MB.

  5. RSCG convergence: Matrix-free RSCG often converges better than dense RSCG due to better numerical conditioning.

API Reference

  • Advanced API - MatrixFreeOperator, FFTContext, EffectiveHamiltonian