
TensorFlow Backprop: Mixed Real & Complex Functions

Learn how TensorFlow handles backpropagation through mixed real and complex-valued functions using Wirtinger derivatives, GradientTape, and complex-aware ops. Covers chain rule, domain boundaries, and practical tips for CVNNs in comms systems.


How does TensorFlow handle backpropagation through mixed real and complex-valued functions?

I’m training an end-to-end communication system using TensorFlow and Sionna that involves multiple functions with mixed real and complex domains. My system pipeline looks something like this:

python
import tensorflow as tf

# input: real-valued bits
bits = tf.random.uniform([batch_size, num_bits], 0, 2, dtype=tf.int32)

# f1: Neural mapper (real → complex)
symbols = neural_mapper(bits) # output: complex64

# f2: Channel (complex → complex) 
received = OFDMChannel(symbols) # output: complex64

# f3: Neural demapper (complex → real)
llrs = neural_demapper(received) # output: float32

# loss: real-valued
loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=bits, logits=llrs)

When I use tf.GradientTape() to compute gradients and update my neural mapper parameters, training seems to work without errors (the loss backpropagates through the multiple functions back to the neural mapper). However, I don’t understand:

  1. How does the chain rule work when we’re mixing complex and real domains? The mathematical chain rule I learned assumes everything is in the same domain.

  2. What happens at domain boundaries? For example:

  • When neural_mapper outputs complex values from real inputs
  • When operations like tf.abs(), tf.angle(), or tf.real() convert complex → real
  • When the loss backpropagates into complex-valued layers
  3. Does TensorFlow:
  • Automatically convert complex tensors to real (e.g., treat complex as two real channels)?
  • Use complex differentiation (e.g., Wirtinger derivatives)? If so, does it handle real and complex operations differently?

Which approach does TensorFlow use for gradient computation across mixed real/complex graphs, and are there any caveats or recommended practices when training models that mix real and complex tensors?

TensorFlow runs backpropagation through mixed real ↔ complex graphs by treating complex dtypes as first‑class tensors and using complex-aware gradients (i.e., Wirtinger-style differentiation) inside its autodiff engine (tf.GradientTape). At domain boundaries (tf.complex, tf.real, tf.abs, tf.angle, etc.) TensorFlow uses registered complex gradients when available; if an op lacks complex support you’ll get None and must provide a custom gradient or split into real/imag. In practice the gradient that reaches your mapper will be a complex-typed value (or real pieces if the weights are stored as separate real/imag variables) and optimizers update the real/imag components accordingly.


How TensorFlow’s autodiff computes complex gradients

TensorFlow’s automatic differentiation (tf.GradientTape) records the forward operations and then walks them backwards, applying each op’s registered gradient rule to accumulate gradients — the same mechanism works for complex dtypes as for real dtypes. The official guide explains the recording/backprop mechanism in general and shows that ops executed inside a GradientTape are replayed in reverse to compute gradients [https://www.tensorflow.org/guide/autodiff]. TensorFlow also exposes complex dtypes (complex64/complex128) and functions to build complex tensors (e.g., tf.complex) so complex-valued graphs are representable as first‑class tensors [https://www.tensorflow.org/api_docs/python/tf/dtypes/complex].

At the implementation level TensorFlow’s eager backprop machinery treats complex tensors as differentiable types and relies on gradient registrations for each op; see the backprop source where complex/real dtypes are considered differentiable [https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/eager/backprop.py]. Academic and community work confirms the conceptual approach: frameworks compute complex gradients using Wirtinger calculus (treating z and z* as independent) when needed, and TensorFlow’s complex-autodiff behaviour has been described in the literature [https://arxiv.org/pdf/2312.06087, https://arxiv.org/pdf/2302.08286].

So: TensorFlow does not “throw away” the imaginary part silently. It propagates gradients using complex-aware rules when ops provide them; otherwise you must intervene.
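
You can see this with a minimal tape experiment (the values below are illustrative, not from your pipeline): differentiate the real scalar |z|² with respect to a watched complex tensor and the tape hands back a complex64 gradient, which for this loss matches ∂L/∂x + i ∂L/∂y.

python
import tensorflow as tf

# Watch a complex tensor and differentiate a real-valued loss through it.
z = tf.constant(3.0 + 4.0j, dtype=tf.complex64)

with tf.GradientTape() as tape:
    tape.watch(z)
    loss = tf.square(tf.abs(z))  # |z|^2, a real scalar

grad = tape.gradient(loss, z)
print(grad.dtype)  # complex64: the imaginary part is not discarded
print(grad)        # ~6+8j, i.e. 2z = dL/dx + i*dL/dy for L = x^2 + y^2

The single complex number packs both real partials, which is exactly what a gradient-descent update on the real and imaginary parts needs.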


Chain rule across mixed real/complex domains

The math you learned for the chain rule extends — but you must express derivatives in the complex plane using Wirtinger derivatives. Let z = x + i y. Define the Wirtinger operators

\frac{\partial}{\partial z}=\frac{1}{2}\Big(\frac{\partial}{\partial x}-i\frac{\partial}{\partial y}\Big),\qquad \frac{\partial}{\partial z^*}=\frac{1}{2}\Big(\frac{\partial}{\partial x}+i\frac{\partial}{\partial y}\Big).

For a real scalar loss L that depends on complex tensors (L = L(z,z*)), the chain rule for a parameter θ that influences z is

\frac{dL}{d\theta} = \frac{\partial L}{\partial z}\frac{dz}{d\theta} + \frac{\partial L}{\partial z^*}\frac{dz^*}{d\theta}.

Why two terms? Because most useful functions in communications are not holomorphic (they depend on both z and z*), so you must account for both partials. TensorFlow’s autodiff accumulates these contributions by applying each op’s registered gradient function. If the op’s gradient is implemented using Wirtinger rules (most standard complex-aware ops are), the contributions are correct and the resulting gradient, when projected to your trainable variables, yields the same real partial derivatives on the real and imaginary parts that you’d get by manual differentiation.

Reference: this is the standard approach used in complex-valued neural network theory and backpropagation analyses [https://pmc.ncbi.nlm.nih.gov/articles/PMC4012068/, https://papers.neurips.cc/paper_files/paper/2022/file/dc06d4d2792265fb5454a6092bfd5c6a-Paper-Conference.pdf].
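
If you want to convince yourself that the tape really implements this two-term rule, push a real parameter through a complex intermediate and compare against the hand-derived derivative. A small sanity check with values of my own: with z = θ·c and L = |z|², the two-term rule gives dL/dθ = 2θ|c|², and the tape agrees.

python
import tensorflow as tf

theta = tf.Variable(0.7)                          # real trainable parameter
c = tf.constant(1.0 + 2.0j, dtype=tf.complex64)   # fixed complex constant

with tf.GradientTape() as tape:
    z = c * tf.complex(theta, 0.0)   # real -> complex boundary
    loss = tf.square(tf.abs(z))      # |z|^2 = theta^2 * |c|^2, real scalar

auto_grad = tape.gradient(loss, theta)

# Two-term Wirtinger chain rule:
# dL/dtheta = dL/dz * dz/dtheta + dL/dz* * dz*/dtheta = 2 * theta * |c|^2
manual_grad = 2.0 * theta * tf.square(tf.abs(c))

print(auto_grad.numpy(), manual_grad.numpy())  # both ~7.0 for theta=0.7, |c|^2=5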


Domain boundaries: complex→real and real→complex ops

How the gradient is propagated depends on the operation at the boundary. Below are common cases you’ll meet in an end‑to‑end comms stack.

Real → Complex (neural mapper outputs complex values)

Most mapper implementations produce two real channels (real part and imaginary part) and combine them with tf.complex(real, imag). The tf.complex construction is linear: if z = x + i y, the derivative of z wrt x and y is trivial (1 and i factors), so gradients back to the real-valued parameters that produced x and y flow normally. TensorFlow supports tf.complex and complex dtypes directly [https://www.tensorflow.org/api_docs/python/tf/dtypes/complex].

Example pattern:

  • Build two real-valued heads (Dense layers) → r, i
  • symbols = tf.complex(r, i)
    This is the most compatible approach with standard Keras layers.
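
A stripped-down version of that pattern (hypothetical tensors standing in for the two Dense heads) shows that the gradients arriving at the real-valued parts are themselves real:

python
import tensorflow as tf

# Two real tensors standing in for the outputs of the r/i heads.
r = tf.Variable([1.0, -0.5])
i = tf.Variable([0.25, 2.0])

with tf.GradientTape() as tape:
    symbols = tf.complex(r, i)                        # real -> complex64
    loss = tf.reduce_sum(tf.square(tf.abs(symbols)))  # real scalar = sum(r^2 + i^2)

dr, di = tape.gradient(loss, [r, i])
print(dr.dtype, di.dtype)      # float32 float32: real gradients for real weights
print(dr.numpy(), di.numpy())  # ~2*r and ~2*i, as expected from d(r^2 + i^2)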

Complex → Real ops (tf.real, tf.abs, tf.angle, other measurements)

  • tf.real / tf.math.real and tf.math.imag are linear extracts of Re/Im; their gradients flow back to the corresponding components (they’re simple and well-behaved).
  • tf.abs (magnitude) and tf.angle are non-holomorphic. Using Wirtinger calculus you get closed-form partials; for example (z ≠ 0)

\frac{\partial |z|}{\partial z^*} = \frac{z}{2|z|}.

So tf.abs produces a real tensor, and the gradient that flows back to z is complex-valued, proportional to z/|z| (the ∂/∂z* direction); TensorFlow propagates it because tf.abs has a complex gradient registered. tf.angle also has a registered gradient, but beware of branch-cut behaviour and non-differentiability at z = 0, which can cause instabilities.
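
You can check the |z| formula numerically; in the sketch below (illustrative value) the tape's gradient for tf.abs equals z/|z|, i.e. twice the ∂/∂z* partial above, and tf.angle also returns a finite complex gradient away from the origin:

python
import tensorflow as tf

z = tf.constant(3.0 + 4.0j, dtype=tf.complex64)

with tf.GradientTape(persistent=True) as tape:
    tape.watch(z)
    mag = tf.abs(z)            # |z| = 5.0, float32
    phase = tf.math.angle(z)   # atan2(4, 3), float32

grad_mag = tape.gradient(mag, z)
grad_phase = tape.gradient(phase, z)
del tape

# tf.abs: gradient is z/|z| = 2 * d|z|/dz*, matching the closed form above
print(grad_mag.numpy(), (z / tf.cast(tf.abs(z), tf.complex64)).numpy())

# tf.angle: complex-valued gradient, fine here but undefined at z = 0 and
# discontinuous across the branch cut on the negative real axis
print(grad_phase.numpy())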

Loss (real) → complex-valued layers

When your final loss is real (like cross-entropy computed on real logits), the backward pass computes the appropriate Wirtinger partials upstream. Practically:

  • If a parameter is stored as separate real and imag variables (common when you implement complex layers using two real weights), you’ll see real gradients for each part.
  • If a parameter is a single complex tf.Variable (less common but supported), the gradient returned for that variable will be complex dtype. The backprop machinery handles this consistently, but verify how your optimizer treats complex variables before relying on it (or apply the update manually).

If any op in the chain has no gradient for complex inputs you’ll get None for that path — that’s your signal to implement a gradient or change design.

(Background references: TF autodiff behavior and code-level handling of complex dtypes [https://www.tensorflow.org/guide/autodiff, https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/eager/backprop.py]; theoretical justification via Wirtinger calculus [https://arxiv.org/pdf/2312.06087].)
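
Coming back to the complex tf.Variable case, here is a minimal sketch (my own toy values) that uses a manual update rather than a Keras optimizer, since optimizer support for complex variables is worth verifying first:

python
import tensorflow as tf

# A single complex-valued trainable parameter, e.g. a learnable channel tap.
w = tf.Variable(tf.constant(0.8 - 0.2j, dtype=tf.complex64))
target = tf.constant(1.0 + 0.5j, dtype=tf.complex64)

with tf.GradientTape() as tape:
    loss = tf.square(tf.abs(w - target))   # real scalar

grad = tape.gradient(loss, w)
print(grad.dtype)   # complex64

# Manual gradient step: one complex update moves both real and imaginary parts.
lr = 0.1
w.assign_sub(tf.cast(lr, tf.complex64) * grad)
print(w.numpy())    # moved towards the target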


Practical caveats & recommended practices for TensorFlow complex gradients

  • Check op gradient support early. If tape.gradient(loss, var) returns None for some var, find the op that severs the path. That usually means the op lacks a complex gradient (or you forgot to mark a variable trainable).
  • If an op lacks complex gradients, either:
  • implement a custom gradient with @tf.custom_gradient (a sketch follows below this list), or
  • re-express the computation in real and imaginary channels (common practical workaround — see community examples) — e.g., feed tf.concat([tf.math.real(z), tf.math.imag(z)], axis=-1) into standard Keras layers. StackOverflow threads cover this pattern [https://stackoverflow.com/questions/47721615/how-to-backpropagate-with-complex-valued-weights, https://stackoverflow.com/questions/57108959/how-does-tf-gradients-manages-complex-functions].
  • Prefer numeric safety: ops like tf.abs divide by magnitude — add a small epsilon when you implement custom formulas to avoid NaN at z = 0.
  • Beware tf.angle and branch cuts. Angle’s gradient is valid away from the origin; avoid relying on angle in optimization unless you handle discontinuities.
  • Keras layer compatibility: most built-in Keras layers are real-valued. If you want true complex layers, either:
  • build layers from two real heads (my preferred approach), or
  • use community CVNN libraries that add complex layers/activations on top of TF (examples: cvnn, keras-complex) [https://github.com/NEGU93/cvnn, https://github.com/JesperDramsch/keras-complex].
  • Debugging tips:
  • Print dtypes and check for None gradients: grads = tape.gradient(loss, vars); for v,g in zip(vars,grads): print(v.name, g is None, None if g is None else g.dtype)
  • Do a small finite-difference test (perturb real and imag parts separately) to sanity-check an autodiff gradient for a particular tensor.
  • Optimizers: if all of your trainable variables are real-valued (the usual case when complex layers are built from real/imag weights), the optimizer only ever sees real gradients, which by itself explains training that “just worked”. Genuinely complex tf.Variable parameters do receive complex gradients from the tape, but optimizer support for them is less uniform; if you run into surprising behavior, split the weights into real/imag parts and update them separately.

Practical reading: community libraries and papers on CVNNs are useful because they gather tested layers and activation choices that make training stable [https://github.com/NEGU93/cvnn, https://arxiv.org/pdf/2302.08286].
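
For the custom-gradient route mentioned in the list above, here is a sketch of an epsilon-protected magnitude (my own illustration, not TensorFlow's internal implementation): the backward rule reproduces the |z| formula from earlier but keeps the division finite at z = 0.

python
import tensorflow as tf

@tf.custom_gradient
def safe_abs(z):
    """Magnitude of a complex tensor with an epsilon-protected gradient."""
    eps = 1e-12
    mag = tf.abs(z)

    def grad_fn(upstream):
        # Wirtinger-style rule: the tape expects upstream * z / |z|
        # (i.e. 2 * d|z|/dz* scaled by upstream); eps avoids NaN at z = 0.
        denom = tf.cast(mag + eps, z.dtype)
        return tf.cast(upstream, z.dtype) * z / denom

    return mag, grad_fn

z = tf.constant([0.0 + 0.0j, 3.0 + 4.0j], dtype=tf.complex64)
with tf.GradientTape() as tape:
    tape.watch(z)
    loss = tf.reduce_sum(safe_abs(z))

print(tape.gradient(loss, z).numpy())  # finite everywhere, including at z = 0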


Minimal end-to-end TensorFlow example

Below is a compact pattern that matches your pipeline: the mapper produces complex symbols from real inputs, channel is complex, demapper consumes complex signals by first converting to real/imag channels, and the loss is real. This demonstrates how gradients flow back to mapper parameters.

python
import tensorflow as tf

batch_size = 16
num_bits = 8
bits = tf.cast(tf.random.uniform([batch_size, num_bits], 0, 2, dtype=tf.int32), tf.float32)

# Mapper: two real heads -> complex symbols
class Mapper(tf.keras.Model):
    def __init__(self, symbol_dim):
        super().__init__()
        self.r_head = tf.keras.layers.Dense(symbol_dim)
        self.i_head = tf.keras.layers.Dense(symbol_dim)

    def call(self, x):
        r = self.r_head(x)
        i = self.i_head(x)
        return tf.complex(r, i)  # complex64

# Demapper: take complex input, split to real/imag, then real nets -> logits
demapper = tf.keras.Sequential([
    tf.keras.layers.Lambda(lambda z: tf.concat([tf.math.real(z), tf.math.imag(z)], axis=-1)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(num_bits)  # logits (real)
])

mapper = Mapper(symbol_dim=4)
opt = tf.keras.optimizers.Adam(1e-3)

# Single training step
with tf.GradientTape() as tape:
    symbols = mapper(bits)  # complex64
    # Simple channel: complex gain + AWGN
    h = tf.complex(1.2, -0.3)
    noise = tf.complex(tf.random.normal(tf.shape(symbols), stddev=0.05),
                       tf.random.normal(tf.shape(symbols), stddev=0.05))
    received = h * symbols + noise  # complex64
    llrs = demapper(received)  # float32 logits
    loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=bits, logits=llrs))

vars_all = mapper.trainable_variables + demapper.trainable_variables
grads = tape.gradient(loss, vars_all)

for v, g in zip(vars_all, grads):
    print(v.name, "dtype:", v.dtype, "grad is None?", g is None,
          "grad dtype:", None if g is None else g.dtype)

opt.apply_gradients(zip(grads, vars_all))

Notes about the example:

  • mapper here has real-valued weights (two Dense heads) so gradients to those weights are real. If you instead implement complex kernels (store real/imag weights and combine them into a complex kernel inside a custom layer), TF will differentiate w.r.t. the real/imag weights and the gradients will match the Wirtinger-derived partials; a sketch of such a layer follows below.
  • If grads contains None for some variable, inspect the ops between that variable and the loss. That usually signals a missing complex gradient for an op or a non-differentiable operation (quantization, argmax, stop_gradient).
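
If you prefer the complex-kernel variant from the first note, a minimal sketch of such a layer (hypothetical, not a library class) keeps the trainable variables real and assembles the complex kernel inside call, so the tape returns real float32 gradients for kernel_re and kernel_im:

python
import tensorflow as tf

class ComplexDense(tf.keras.layers.Layer):
    """Dense layer whose complex kernel is stored as two real weight matrices."""

    def __init__(self, units):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        in_dim = int(input_shape[-1])
        self.kernel_re = self.add_weight(name="kernel_re", shape=(in_dim, self.units))
        self.kernel_im = self.add_weight(name="kernel_im", shape=(in_dim, self.units))

    def call(self, z):  # z: complex64, shape [batch, in_dim]
        kernel = tf.complex(self.kernel_re, self.kernel_im)
        return tf.matmul(z, kernel)  # complex64 output

layer = ComplexDense(4)
z = tf.complex(tf.random.normal([2, 3]), tf.random.normal([2, 3]))

with tf.GradientTape() as tape:
    out = layer(z)
    loss = tf.reduce_mean(tf.square(tf.abs(out)))   # real scalar

grads = tape.gradient(loss, layer.trainable_variables)
print([g.dtype for g in grads])   # [tf.float32, tf.float32]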

Conclusion

TensorFlow does backpropagation across mixed real and complex-valued functions by treating complex tensors as first-class dtypes and applying complex-aware gradient rules (consistent with Wirtinger calculus) via tf.GradientTape. Domain boundaries (tf.complex, tf.real, tf.abs, tf.angle, etc.) are handled by per-op gradient implementations; when an op lacks complex support you’ll see None and should either provide a custom gradient or express the computation in real/imag channels. In short: TensorFlow complex gradients work end‑to‑end for common communications pipelines (including Sionna channels), but watch for ops without complex gradients, numerical edge cases (|z|≈0, angle branch cuts), and Keras layer compatibility — and test by printing gradients and doing small finite-difference checks.
