TensorFlow Backprop: Mixed Real & Complex Functions
Learn how TensorFlow handles backpropagation through mixed real and complex-valued functions using Wirtinger derivatives, GradientTape, and complex-aware ops. Covers chain rule, domain boundaries, and practical tips for CVNNs in comms systems.
How does TensorFlow handle backpropagation through mixed real and complex-valued functions?
I’m training an end-to-end communication system using TensorFlow and Sionna that involves multiple functions with mixed real and complex domains. My system pipeline looks something like this:
import tensorflow as tf
# input: real-valued bits
bits = tf.random.uniform([batch_size, num_bits], 0, 2, dtype=tf.int32)
# f1: Neural mapper (real → complex)
symbols = neural_mapper(bits) # output: complex64
# f2: Channel (complex → complex)
received = OFDMChannel(symbols) # output: complex64
# f3: Neural demapper (complex → real)
llrs = neural_demapper(received) # output: float32
# loss: real-valued
loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=tf.cast(bits, tf.float32), logits=llrs)  # labels must be float
When I use tf.GradientTape() to compute gradients and update my neural mapper parameters, training seems to work without errors (the loss backpropagates through the multiple functions back to the neural mapper). However, I don’t understand:
- How does the chain rule work when we’re mixing complex and real domains? The mathematical chain rule I learned assumes everything is in the same domain.
- What happens at domain boundaries? For example:
  - When neural_mapper outputs complex values from real inputs
  - When operations like tf.abs(), tf.angle(), or tf.real() convert complex → real
  - When the loss backpropagates into complex-valued layers
- Does TensorFlow:
  - Automatically convert complex tensors to real (e.g., treat complex as two real channels)?
  - Use complex differentiation (e.g., Wirtinger derivatives)? If so, does it handle real and complex operations differently?
Which approach does TensorFlow use for gradient computation across mixed real/complex graphs, and are there any caveats or recommended practices when training models that mix real and complex tensors?
TensorFlow runs backpropagation through mixed real ↔ complex graphs by treating complex dtypes as first‑class tensors and using complex-aware gradients (i.e., Wirtinger-style differentiation) inside its autodiff engine (tf.GradientTape). At domain boundaries (tf.complex, tf.real, tf.abs, tf.angle, etc.) TensorFlow uses registered complex gradients when available; if an op lacks complex support you’ll get None and must provide a custom gradient or split into real/imag. In practice the gradient that reaches your mapper will be a complex-typed value (or real pieces if the weights are stored as separate real/imag variables) and optimizers update the real/imag components accordingly.
Contents
- How TensorFlow’s autodiff computes complex gradients
- Chain rule across mixed real/complex domains
- Domain boundaries: complex→real and real→complex ops
- Practical caveats & recommended practices for TensorFlow complex gradients
- Minimal end-to-end TensorFlow example
- Sources
- Conclusion
How TensorFlow’s autodiff computes complex gradients
TensorFlow’s automatic differentiation (tf.GradientTape) records the forward operations and then walks them backwards, applying each op’s registered gradient rule to accumulate gradients — the same mechanism works for complex dtypes as for real dtypes. The official guide explains the recording/backprop mechanism in general and shows that ops executed inside a GradientTape are replayed in reverse to compute gradients [https://www.tensorflow.org/guide/autodiff]. TensorFlow also exposes complex dtypes (complex64/complex128) and functions to build complex tensors (e.g., tf.complex) so complex-valued graphs are representable as first‑class tensors [https://www.tensorflow.org/api_docs/python/tf/dtypes/complex].
At the implementation level TensorFlow’s eager backprop machinery treats complex tensors as differentiable types and relies on gradient registrations for each op; see the backprop source where complex/real dtypes are considered differentiable [https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/eager/backprop.py]. Academic and community work confirms the conceptual approach: frameworks compute complex gradients using Wirtinger calculus (treating z and z* as independent) when needed, and TensorFlow’s complex-autodiff behaviour has been described in the literature [https://arxiv.org/pdf/2312.06087, https://arxiv.org/pdf/2302.08286].
So: TensorFlow does not “throw away” the imaginary part silently. It propagates gradients using complex-aware rules when ops provide them; otherwise you must intervene.
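You can see this with a one-variable experiment: a real-valued loss built from a complex tf.Variable yields a complex-typed gradient, with no silent conversion to two real channels. The value shown in the comment assumes the usual convention that, for a real loss, the returned gradient corresponds to ∂L/∂x + i ∂L/∂y; treat that as something to verify on your TF version.

import tensorflow as tf

z = tf.Variable(tf.complex(3.0, 4.0))    # a single complex64 variable, z = 3 + 4j

with tf.GradientTape() as tape:
    loss = tf.abs(z) ** 2                # real-valued loss L = |z|^2 = x^2 + y^2 (non-holomorphic)

grad = tape.gradient(loss, z)
print(grad.dtype)                        # complex64: the imaginary part is not thrown away
print(grad)                              # expected ≈ 6 + 8j, i.e. ∂L/∂x + i ∂L/∂y = 2z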
Chain rule across mixed real/complex domains
The math you learned for the chain rule extends, but you must express derivatives in the complex plane using Wirtinger derivatives. Let z = x + i y and define the Wirtinger operators

∂/∂z = (1/2)(∂/∂x − i ∂/∂y),    ∂/∂z* = (1/2)(∂/∂x + i ∂/∂y)

For a real scalar loss L that depends on complex tensors (L = L(z, z*)), the chain rule for a real parameter θ that influences z is

∂L/∂θ = (∂L/∂z)(∂z/∂θ) + (∂L/∂z*)(∂z*/∂θ) = 2 Re[(∂L/∂z)(∂z/∂θ)],

where the last equality uses the fact that L is real (so ∂L/∂z* = (∂L/∂z)*) and θ is real (so ∂z*/∂θ = (∂z/∂θ)*).
Why two terms? Because most useful functions in communications are not holomorphic (they depend on both z and z*), so you must account for both partials. TensorFlow’s autodiff accumulates these contributions by applying each op’s registered gradient function. If the op’s gradient is implemented using Wirtinger rules (most standard complex-aware ops are), the contributions are correct and the resulting gradient, when projected to your trainable variables, yields the same real partial derivatives on the real and imaginary parts that you’d get by manual differentiation.
Reference: this is the standard approach used in complex-valued neural network theory and backpropagation analyses [https://pmc.ncbi.nlm.nih.gov/articles/PMC4012068/, https://papers.neurips.cc/paper_files/paper/2022/file/dc06d4d2792265fb5454a6092bfd5c6a-Paper-Conference.pdf].
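To make the two-term rule concrete, here is a small numerical check (illustrative values, not from your pipeline): a real parameter θ produces a complex intermediate z = θ + 2θi and a real loss L = |z|² = 5θ²; the autodiff result and the manual Wirtinger formula should agree.

import tensorflow as tf

theta = tf.Variable(1.0)                         # real parameter

with tf.GradientTape() as tape:
    z = tf.complex(theta, 2.0 * theta)           # z(θ) = θ + 2θi, so ∂z/∂θ = 1 + 2i
    loss = tf.abs(z) ** 2                        # L = |z|^2 = 5 θ^2 (real)

auto_grad = tape.gradient(loss, theta)

# Manual Wirtinger chain rule: ∂L/∂θ = 2 Re[(∂L/∂z)(∂z/∂θ)], with ∂L/∂z = z* for L = z z*.
dL_dz = tf.math.conj(tf.complex(theta, 2.0 * theta))
dz_dtheta = tf.complex(1.0, 2.0)
manual_grad = 2.0 * tf.math.real(dL_dz * dz_dtheta)

print(float(auto_grad), float(manual_grad))      # both should print 10.0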
Domain boundaries: complex→real and real→complex ops
How the gradient is propagated depends on the operation at the boundary. Below are common cases you’ll meet in an end‑to‑end comms stack.
Real → Complex (neural mapper outputs complex values)
Most mapper implementations produce two real channels (real part and imaginary part) and combine them with tf.complex(real, imag). The tf.complex construction is linear: if z = x + i y, the derivative of z wrt x and y is trivial (1 and i factors), so gradients back to the real-valued parameters that produced x and y flow normally. TensorFlow supports tf.complex and complex dtypes directly [https://www.tensorflow.org/api_docs/python/tf/dtypes/complex].
Example pattern:
- Build two real-valued heads (Dense layers) that output r and i
- Combine them into complex symbols: symbols = tf.complex(r, i)
This is the most compatible approach with standard Keras layers.
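A minimal sketch of this boundary in isolation (layer sizes are arbitrary): two real Dense heads feed tf.complex, a real loss is taken downstream, and the gradients that reach the heads’ kernels come back as ordinary float32 tensors.

import tensorflow as tf

x = tf.random.normal([4, 8])                     # real-valued inputs
r_head = tf.keras.layers.Dense(2)                # real-part head
i_head = tf.keras.layers.Dense(2)                # imaginary-part head

with tf.GradientTape() as tape:
    z = tf.complex(r_head(x), i_head(x))         # real → complex boundary
    loss = tf.reduce_mean(tf.abs(z) ** 2)        # real-valued loss downstream

grads = tape.gradient(loss, r_head.trainable_variables + i_head.trainable_variables)
print([g.dtype for g in grads])                  # all float32: gradients to real weights stay real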
Complex → Real ops (tf.real, tf.abs, tf.angle, other measurements)
- tf.real / tf.math.real and tf.math.imag are linear extracts of Re/Im; their gradients flow back to the corresponding components (they’re simple and well-behaved).
- tf.abs (magnitude) and tf.angle are non-holomorphic. Using Wirtinger calculus you get closed-form partials; for example, for the magnitude (z ≠ 0):

  ∂|z|/∂z = z*/(2|z|),    ∂|z|/∂z* = z/(2|z|)

So tf.abs produces a real tensor, while the gradient that flows back into z is complex-valued and built from these partials; for a real downstream loss it is proportional to z/|z| (that is, to ∂|z|/∂x + i ∂|z|/∂y). TensorFlow propagates this automatically because tf.abs has a complex gradient registered, as do the other standard TF ops. tf.angle also has a registered gradient, but beware of branch-cut behaviour and the non-differentiability at z = 0; both can cause instabilities.
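A quick check at a sample point (a sketch; the expected values in the comments follow from the formulas above, assuming the gradient handed back for a real downstream loss is ∂L/∂x + i ∂L/∂y):

import tensorflow as tf

z = tf.Variable(tf.complex(3.0, 4.0))            # |z| = 5

with tf.GradientTape(persistent=True) as tape:
    mag = tf.abs(z)                              # complex → real
    ang = tf.math.angle(z)                       # complex → real, ill-behaved at z = 0

print(tape.gradient(mag, z))                     # expected ≈ z/|z| = 0.6 + 0.8j
print(tape.gradient(ang, z))                     # expected ≈ (-y + ix)/|z|^2 = -0.16 + 0.12j
del tape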
Loss (real) → complex-valued layers
When your final loss is real (like cross-entropy computed on real logits), the backward pass computes the appropriate Wirtinger partials upstream. Practically:
- If a parameter is stored as separate real and imag variables (common when you implement complex layers using two real weights), you’ll see real gradients for each part.
- If a parameter is a single complex tf.Variable (less common but supported), the gradient returned for that variable will be complex dtype. TensorFlow’s backprop machinery and optimizers handle complex dtypes by updating real/imag parts consistently (but check behavior if you implement custom optimizers).
If any op in the chain has no gradient for complex inputs you’ll get None for that path — that’s your signal to implement a gradient or change design.
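For the single complex tf.Variable case, a tiny sketch (the variable and target values are made up): the gradient comes back as complex64, and a manual update moves the real and imaginary parts together. If you are unsure how your optimizer treats complex variables, this kind of manual step is an easy sanity check.

import tensorflow as tf

w = tf.Variable(tf.complex(0.5, -0.5))           # a single complex trainable parameter
target = tf.complex(1.0, 2.0)

with tf.GradientTape() as tape:
    loss = tf.abs(w - target) ** 2               # real-valued loss

g = tape.gradient(loss, w)
print(g.dtype)                                   # complex64
w.assign_sub(tf.complex(0.1, 0.0) * g)           # manual SGD step on real and imaginary parts at once
print(w.numpy())                                 # moves toward 1 + 2j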
(Background references: TF autodiff behavior and code-level handling of complex dtypes [https://www.tensorflow.org/guide/autodiff, https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/eager/backprop.py]; theoretical justification via Wirtinger calculus [https://arxiv.org/pdf/2312.06087].)
Practical caveats & recommended practices for TensorFlow complex gradients
- Check op gradient support early. If tape.gradient(loss, var) returns None for some var, find the op that severs the path. That usually means the op lacks a complex gradient (or you forgot to mark a variable trainable).
- If an op lacks complex gradients, either:
  - implement a custom gradient with @tf.custom_gradient (see the sketch after this list), or
  - re-express the computation in real and imaginary channels (a common practical workaround), e.g., feed tf.concat([tf.math.real(z), tf.math.imag(z)], axis=-1) into standard Keras layers. StackOverflow threads cover this pattern [https://stackoverflow.com/questions/47721615/how-to-backpropagate-with-complex-valued-weights, https://stackoverflow.com/questions/57108959/how-does-tf-gradients-manages-complex-functions].
- Prefer numeric safety: ops like tf.abs divide by the magnitude; add a small epsilon when you implement custom formulas to avoid NaN at z = 0.
- Beware tf.angle and branch cuts. Angle’s gradient is valid away from the origin; avoid relying on angle in optimization unless you handle the discontinuities.
- Keras layer compatibility: most built-in Keras layers are real-valued. If you want true complex layers, either:
  - build layers from two real heads (my preference), or
  - use community CVNN libraries that add complex layers/activations on top of TF (examples: cvnn, keras-complex) [https://github.com/NEGU93/cvnn, https://github.com/JesperDramsch/keras-complex].
- Debugging tips:
  - Print dtypes and check for None gradients: grads = tape.gradient(loss, variables), then for v, g in zip(variables, grads): print(v.name, g is None, None if g is None else g.dtype)
  - Do a small finite-difference test (perturb the real and imaginary parts separately) to sanity-check an autodiff gradient for a particular tensor.
- Optimizers: TensorFlow’s optimizers generally work with complex dtypes by keeping slot variables of matching dtype; training that “just worked” in your run suggests your chosen optimizer accepted complex gradients. If you run into surprising behavior, split weights into real/imag and update them separately.
Practical reading: community libraries and papers on CVNNs are useful because they gather tested layers and activation choices that make training stable [https://github.com/NEGU93/cvnn, https://arxiv.org/pdf/2302.08286].
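Here is the @tf.custom_gradient pattern from the list above, sketched for an epsilon-stabilised magnitude (an illustrative op, not something TF or Sionna ships): the forward pass returns a real tensor, and grad_fn must hand back a gradient with the complex dtype of the input.

import tensorflow as tf

@tf.custom_gradient
def stable_abs(z):
    """|z| with a gradient that stays finite at z = 0 (illustrative sketch)."""
    eps = 1e-9
    mag = tf.sqrt(tf.math.real(z) ** 2 + tf.math.imag(z) ** 2 + eps)

    def grad_fn(upstream):
        # upstream is real (dtype of mag); the gradient returned for z must be complex:
        # ∂|z|/∂x + i ∂|z|/∂y = z / |z|, using the eps-stabilised magnitude.
        return tf.complex(upstream / mag, tf.zeros_like(upstream)) * z

    return mag, grad_fn

z = tf.constant([0.0 + 0.0j, 3.0 + 4.0j], dtype=tf.complex64)
with tf.GradientTape() as tape:
    tape.watch(z)
    loss = tf.reduce_sum(stable_abs(z))
print(tape.gradient(loss, z))                    # finite everywhere, ≈ [0 + 0j, 0.6 + 0.8j]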
Minimal end-to-end TensorFlow example
Below is a compact pattern that matches your pipeline: the mapper produces complex symbols from real inputs, channel is complex, demapper consumes complex signals by first converting to real/imag channels, and the loss is real. This demonstrates how gradients flow back to mapper parameters.
import tensorflow as tf
batch_size = 16
num_bits = 8
bits = tf.cast(tf.random.uniform([batch_size, num_bits], 0, 2, dtype=tf.int32), tf.float32)
# Mapper: two real heads -> complex symbols
class Mapper(tf.keras.Model):
def __init__(self, symbol_dim):
super().__init__()
self.r_head = tf.keras.layers.Dense(symbol_dim)
self.i_head = tf.keras.layers.Dense(symbol_dim)
def call(self, x):
r = self.r_head(x)
i = self.i_head(x)
return tf.complex(r, i) # complex64
# Demapper: take complex input, split to real/imag, then real nets -> logits
demapper = tf.keras.Sequential([
tf.keras.layers.Lambda(lambda z: tf.concat([tf.math.real(z), tf.math.imag(z)], axis=-1)),
tf.keras.layers.Dense(64, activation='relu'),
tf.keras.layers.Dense(num_bits) # logits (real)
])
mapper = Mapper(symbol_dim=4)
opt = tf.keras.optimizers.Adam(1e-3)
# Single training step
with tf.GradientTape() as tape:
symbols = mapper(bits) # complex64
# Simple channel: complex gain + AWGN
h = tf.complex(1.2, -0.3)
noise = tf.complex(tf.random.normal(tf.shape(symbols), stddev=0.05),
tf.random.normal(tf.shape(symbols), stddev=0.05))
received = h * symbols + noise # complex64
llrs = demapper(received) # float32 logits
loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=bits, logits=llrs))
vars_all = mapper.trainable_variables + demapper.trainable_variables
grads = tape.gradient(loss, vars_all)
for v, g in zip(vars_all, grads):
print(v.name, "dtype:", v.dtype, "grad is None?", g is None, "grad dtype:", None if g is None else g.dtype)
opt.apply_gradients(zip(grads, vars_all))
Notes about the example:
- mapper here has real-valued weights (two Dense heads), so the gradients to those weights are real. If you instead implement complex kernels (store real/imag weights and combine them into a complex kernel inside a custom layer), TF will differentiate w.r.t. the real/imag weights and the gradients will match the Wirtinger-derived partials; a minimal sketch of such a layer follows these notes.
- If grads contains None for some variable, inspect the ops between that variable and the loss. That usually signals a missing complex gradient for an op or a non-differentiable operation (quantization, argmax, stop_gradient).
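For the complex-kernel variant in the first note, a minimal sketch could look like the layer below (an illustrative layer, not a Keras or Sionna built-in): the trainable weights stay real-valued float32 variables, the complex kernel is assembled inside call, and standard optimizers keep updating the real parts.

import tensorflow as tf

class ComplexDense(tf.keras.layers.Layer):
    """y = z @ (W_re + i W_im): complex in/out, real-valued trainable weights."""

    def __init__(self, units):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        in_dim = int(input_shape[-1])
        self.w_re = self.add_weight(name="w_re", shape=(in_dim, self.units),
                                    initializer="glorot_uniform", trainable=True)
        self.w_im = self.add_weight(name="w_im", shape=(in_dim, self.units),
                                    initializer="glorot_uniform", trainable=True)

    def call(self, z):
        kernel = tf.complex(self.w_re, self.w_im)   # real weights → complex kernel
        return tf.matmul(z, kernel)                 # complex64 in, complex64 out

# Gradients w.r.t. w_re / w_im come back as float32, matching the Wirtinger-derived partials.
layer = ComplexDense(4)
out = layer(tf.complex(tf.random.normal([2, 8]), tf.random.normal([2, 8])))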
Sources
- TensorFlow: Introduction to gradients and automatic differentiation
- tf.dtypes.complex | TensorFlow API
- TensorFlow backprop (eager) — source
- How does tf.gradients manage complex functions? — Stack Overflow
- Complex-Valued Neural Networks - Theory and Analysis (arXiv)
- Theory and Implementation of Complex-Valued Neural Networks (arXiv)
- Convergence analysis of fully complex backpropagation algorithm based on Wirtinger calculus (PMC)
- cvnn — Library to help implement a complex-valued neural network using TensorFlow (GitHub)
- Keras Complex — complex-valued convolutional neural networks (GitHub)
- Real-Valued Backpropagation is Unsuitable for Complex-Valued Neural Networks (NeurIPS paper)
- Autograd mechanics — PyTorch docs (comparative notes on complex gradients)
- How to backpropagate with complex valued weights — Stack Overflow
Conclusion
TensorFlow does backpropagation across mixed real and complex-valued functions by treating complex tensors as first-class dtypes and applying complex-aware gradient rules (consistent with Wirtinger calculus) via tf.GradientTape. Domain boundaries (tf.complex, tf.real, tf.abs, tf.angle, etc.) are handled by per-op gradient implementations; when an op lacks complex support you’ll see None and should either provide a custom gradient or express the computation in real/imag channels. In short: TensorFlow complex gradients work end‑to‑end for common communications pipelines (including Sionna channels), but watch for ops without complex gradients, numerical edge cases (|z|≈0, angle branch cuts), and Keras layer compatibility — and test by printing gradients and doing small finite-difference checks.