C nan(tagp) string rationale and IEEE 754 payloads
Why C's nan(const char *tagp) accepts a string: WG14 rationale, how tagp maps to IEEE 754 QNaN payloads, common parsing strategies and portability implications.
Why does the C standard specify the nan functions to accept a string tag (for example, double nan(const char *tagp);) to encode a quiet-NaN payload instead of an integer argument (for example, double nan(unsigned long int tag))? What was the rationale for the string-based interface, how is the tagp string parsed into a QNaN payload, and what are the portability and implementation implications (payload size, endianness, parsing complexity, and behavior when quiet NaNs are unsupported)?
The C library uses nan(const char *tagp) so that implementations can map a flexible, human-readable string to an implementation-defined quiet-NaN payload while following existing C string-based API conventions; this avoids hard-coding a fixed integer width or bit layout into the standard. IEEE 754 reserves payload bits in a NaN but does not assign them a meaning, so each implementation decides how to parse tagp (commonly as a hex or decimal number, or by packing or hashing ASCII bytes) into the available bits. Payload width, canonicalization, and byte ordering are all implementation-defined, so relying on payload contents across platforms is non-portable. When quiet NaNs are not supported at all, ISO C specifies that the nan functions return zero.
Contents
- Why C chose a string for nan(tagp)
- NaN payload and IEEE 754 background
- C standard rationale and the WG14 decision (N1957)
- How tagp is parsed into a QNaN payload
- Implementation strategies, examples and NaN‑boxing
- Portability: payload size, endianness, parsing complexity, and no‑QNaN behavior
- Practical advice and best practices
- Sources
- Conclusion
Why C chose a string for nan(tagp)
Short answer: a string keeps the interface flexible and consistent with other C library functions that accept textual arguments, and it lets implementations decide how to encode diagnostic information into the limited and format‑dependent NaN payload. The WG14 working note explicitly argues that the tag argument should be a pointer to a string “analogously to all other library functions that operate on strings” so library authors and callers can use a familiar textual convention rather than bind the API to a particular integer size or layout [https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1957.htm]. POSIX follows the ISO C semantics for nan/nanl and similarly treats the tag as a character sequence that the implementation interprets [https://pubs.opengroup.org/onlinepubs/9699919799/functions/nanl.html].
Why not an integer? An integer argument would force callers and implementers into a particular width/encoding (what size? where do the bits go?) and would be a poor fit with the rest of the C library’s string-oriented text interfaces. A string is source-code friendly (you can write nan("0xDEAD") or nan("foo")), and the standard deliberately leaves the exact mapping into payload bits up to implementations so they can adapt to hardware and ABI constraints.
NaN payload and IEEE 754 background
IEEE 754 provides space for multiple NaN encodings and allows NaNs to carry a payload: extra bits in the significand that have no defined arithmetic meaning but can encode diagnostic information. The standard does not assign a meaning to the payload. In the binary formats the most-significant fraction bit distinguishes quiet from signaling NaNs (IEEE 754-2008 and later recommend that a set bit mean quiet), the remaining fraction bits form the payload, and the sign bit has no arithmetic meaning for a NaN, so some implementations treat it as one more tag bit [https://www-users.cse.umn.edu/~vinals/tspot_files/phys4041/2020/IEEE Standard 754-2019.pdf; https://en.wikipedia.org/wiki/NaN].
Typical payload capacities (common binary formats):
- binary32 (float): fraction = 23 bits; with a quiet/signaling distinction the usable payload is typically ~22 bits.
- binary64 (double): fraction = 52 bits; usable payload is typically ~51 bits.
Those numbers are “typical” because hardware, compiler runtimes, or libc implementations may reserve or canonicalize some bits.
C standard rationale and the WG14 decision (N1957)
WG14 recorded the decision to require tagp to point to a character string primarily to follow existing library conventions and to leave the encoding flexible for implementations [https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1957.htm]. POSIX mirrors ISO C on this point and documents that nan/nanl accept a char sequence that implementations interpret [https://pubs.opengroup.org/onlinepubs/9699919799/functions/nanl.html]. The standard intentionally does not mandate a numeric encoding or bit layout.
An important consequence: since the standard leaves the encoding up to implementations, behavior can vary. One fallback is fixed, however: ISO C specifies that if the implementation does not support quiet NaNs, the nan functions return zero, so no payload is produced at all [https://stackoverflow.com/questions/79866918/the-nan-functions-what-is-the-rationale-for-choosing-a-string-based-interface-o].
How tagp is parsed into a QNaN payload
There is no single parsing rule guaranteed by the standard — implementations pick a strategy. Common approaches you’ll see in practice:
- Numeric parse: if the string looks like a number (often supporting 0x hex), call strtoull(tagp, NULL, 0) (or an equivalent) and insert the low-order bits into the payload. On implementations that do this, nan("0xF") produces a payload with value 0xF [https://en.cppreference.com/w/c/numeric/math/nan].
- Pack ASCII: take 1…N bytes of the ASCII characters from tagp and pack them into the available payload bits (truncating or dropping excess bytes). This lets nan("ERR") encode a short textual tag.
- Hash/truncate: compute a small hash (e.g., CRC or truncated SHA1) of the string and store the truncated hash in the payload, useful when the tag is wider than the payload.
- Hybrid: try numeric parse first; if that fails, fall back to ASCII packing or hashing.
Implementation example (toy sketch — illustrates the idea, not standard code):
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>

double nan_from_tag(const char *tagp) {
    if (!tagp || !*tagp) return nan("");
    char *end;
    unsigned long long v = strtoull(tagp, &end, 0);  // try numeric parse; base 0 accepts 0x hex
    if (end == tagp || *end != '\0') {
        // not a complete number: pack ASCII into v instead
        // (up to 6 bytes = 48 bits, which fits the 51-bit payload assumed below)
        v = 0;
        for (size_t i = 0; i < 6 && tagp[i]; ++i)
            v = (v << 8) | (unsigned char)tagp[i];
    }
    // mask to payload width and set the quiet bit (assumes IEEE 754 binary64)
    const unsigned long long payload_mask = (1ULL << 51) - 1;  // 51-bit payload
    const unsigned long long qnan_bit = 1ULL << 51;
    unsigned long long bits = (0x7FFULL << 52) | qnan_bit | (v & payload_mask);
    double d;
    memcpy(&d, &bits, sizeof d);  // copy the bit pattern into a double
    return d;
}
To inspect payloads many examples use nan("1"), nan("0xF") and then memcpy the double into an integer to print the bit pattern; cppreference shows this exact kind of demonstration [https://en.cppreference.com/w/c/numeric/math/nan].
Implementation strategies, examples and NaN‑boxing
Different projects use the payload for different purposes:
- Diagnostics: encode small provenance IDs (which function produced the NaN) so logs can show where an invalid value came from.
- NaN‑boxing (used by some JS engines): use the payload bits in a double to store pointer/type tags when representing multiple runtime types in 64 bits; see the JavaScriptCore/V8-style descriptions in blog posts such as “The Secret Life of NaN” [https://anniecherkaev.com/the-secret-life-of-nan].
- Intermediary metadata: temporarily stash small state values in payload bits when moving data in-memory within a controlled runtime.
Two pitfalls implementers and users must watch:
- Canonicalization: some runtimes or instruction sets canonicalize NaNs on certain operations, which destroys or replaces payload bits. RISC-V floating-point arithmetic, for example, returns a canonical NaN instead of propagating an input payload. IEEE 754-2019 recommends, but does not require, that operations preserve the payload of a NaN operand [https://www-users.cse.umn.edu/~vinals/tspot_files/phys4041/2020/IEEE Standard 754-2019.pdf].
- ABI/HW interaction: compiler optimizations, library routines, or hardware conversions may clear or alter payloads.
Because of these issues, many implementations either document their payload encoding or restrict payload use to internal diagnostics.
Portability: payload size, endianness, parsing complexity, and no‑QNaN behavior
Payload size and layout:
- The number of bits available for payload depends on the floating format (single/double/extended) and whether one fraction bit is reserved for the quiet/signaling indicator. Typical usable sizes: ~22 bits for binary32, ~51 bits for binary64, but exact numbers depend on format and implementation [https://en.wikipedia.org/wiki/IEEE_754; https://stackoverflow.com/questions/19800415/why-does-ieee-754-reserve-so-many-nan-values].
- Implementation-defined variation means you can’t rely on a fixed payload width across compilers, OSes, or hardware.
Endianness:
- If an implementation packs ASCII bytes into the payload, the result depends on the packing order it chooses and, when the bytes are copied with memcpy rather than shifted into place, on host endianness as well. That makes the numeric interpretation of the payload, or any reconstruction of the text from it, non-portable between implementations and between little-endian and big-endian platforms.
Parsing complexity and invalid tags:
- Implementations must handle non-numeric or oversized tags (truncate, hash, or treat as empty). POSIX and ISO C require tagp to point to a sequence of characters; if that sequence is neither empty nor a valid n-char sequence (digits, letters, and underscores), ISO C treats the call as equivalent to strtod("NAN", (char **)NULL), and what a valid sequence means is left to the implementation [https://pubs.opengroup.org/onlinepubs/9699919799/functions/nanl.html].
Behavior when quiet NaNs are unsupported:
- ISO C specifies that the nan functions return zero when the implementation does not support quiet NaNs. Environments that support quiet NaNs but choose not to expose payloads may return a default NaN regardless of the tag; do not assume a particular payload across systems [https://stackoverflow.com/questions/79866918/the-nan-functions-what-is-the-rationale-for-choosing-a-string-based-interface-o; https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1957.htm].
Portability summary: because payload encoding, size, and preservation are all implementation-defined (and sometimes lossy), using NaN payloads for cross-platform protocol or persistent storage is brittle. Analyses of NaN propagation and portability warn strongly that binary NaN payloads are not portable across systems [https://grouper.ieee.org/groups/msc/ANSI_IEEE-Std-754-2019/background/nan-propagation.pdf].
Practical advice and best practices
- Don’t rely on NaN payloads for program logic that must work across different compilers, OSes, or machines. Use explicit metadata channels (struct fields, tagged unions, separate logs) for portable designs.
- Use nan(tag) only for local diagnostics where you control both writer and reader (e.g., debugging within a single runtime).
- If you must use payloads in a controlled environment (NaN‑boxing), document the encoding rigorously and test for preservation (generate several distinct tagged NaNs and read back the bit patterns across your target toolchain and platforms). Example test pattern:
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <inttypes.h>
#include <math.h>

int main(void) {
    double a = nan("0x1");
    double b = nan("0x2");
    uint64_t A, B;
    /* copy the representations out so they can be printed as integers */
    memcpy(&A, &a, sizeof A);
    memcpy(&B, &b, sizeof B);
    printf("a bits = 0x%016" PRIx64 "\n", A);
    printf("b bits = 0x%016" PRIx64 "\n", B);
    return 0;
}
- Prefer strtoull with base 0 if you want numeric tag support (it recognizes 0x hex), and mask to the implementation’s payload width. But remember: masking and bit-setting assume a particular NaN layout (you should confirm it on your target platform first).
- When creating NaNs for interchange (files, networks), do not rely on payloads; explicitly encode diagnostic fields in the interchange format.
Sources
- WG14 working note N1957 — “nan should take a string argument”: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1957.htm
- POSIX nanl documentation (follows ISO C): https://pubs.opengroup.org/onlinepubs/9699919799/functions/nanl.html
- cppreference — nan examples and notes: https://en.cppreference.com/w/c/numeric/math/nan
- cplusplus.com — nan function reference: https://cplusplus.com/reference/cmath/nan-function/
- IEEE Standard 754‑2019 (PDF): https://www-users.cse.umn.edu/~vinals/tspot_files/phys4041/2020/IEEE Standard 754-2019.pdf
- “The Secret Life of NaN” (NaN‑boxing blog): https://anniecherkaev.com/the-secret-life-of-nan
- Stack Overflow — “The nan functions: rationale for string-based interface”: https://stackoverflow.com/questions/79866918/the-nan-functions-what-is-the-rationale-for-choosing-a-string-based-interface-o
- Stack Overflow — “What is the char-sequence argument to NaN generating functions?”: https://stackoverflow.com/questions/30087061/what-is-the-char-sequence-argument-to-nan-generating-functions-for
- Stack Overflow — “Why does IEEE 754 reserve so many NaN values?”: https://stackoverflow.com/questions/19800415/why-does-ieee-754-reserve-so-many-nan-values
- Agner Fog — notes on NaN payload propagation / portability: https://grouper.ieee.org/groups/msc/ANSI_IEEE-Std-754-2019/background/nan-propagation.pdf
- TutorialKart — nan() examples: https://www.tutorialkart.com/c-programming/c-math-nan/
- Wikipedia — NaN: https://en.wikipedia.org/wiki/NaN
- Wikipedia — IEEE 754: https://en.wikipedia.org/wiki/IEEE_754
Conclusion
The C API uses nan(const char *tagp) because a textual tag is flexible, matches C library string conventions, and leaves the actual bit-level encoding to implementations, which is necessary because IEEE 754 reserves NaN payload bits but does not define a universal payload format. Implementations commonly parse tagp as hex or decimal, or pack or hash ASCII into the available payload bits; however, payload width, byte ordering, and canonicalization are all implementation-defined, and when quiet NaNs are not available the functions simply return zero. Bottom line: nan(tagp) is useful for local diagnostics or tightly controlled runtimes (e.g., NaN‑boxing), but you should not rely on NaN payload contents for portable or persistent program logic.