Preserve All Digits: Polars Read Excel to Utf8

Learn how to preserve up to 12 decimal digits when reading Excel with Polars to string (Utf8). Fix truncation using xlsx2csv_options, infer_schema_length=0, schema_overrides, or openpyxl engine for exact precision.

How can I preserve all digits (up to 12 decimal places) when reading an Excel column with Polars and casting it to String (Utf8)?

I’m reading an XLSX file uploaded from a React frontend to a FastAPI backend. On the frontend, I use the xlsx library to read and display the data as JSON:

js
reader.onload = (evt) => {
  const data = new Uint8Array(evt.target.result);
  const workbook = XLSX.read(data, { type: 'array' });
  const sheetName = workbook.SheetNames[0];
  const sheet = workbook.Sheets[sheetName];
  const jsonData = XLSX.utils.sheet_to_json(sheet, { raw: true });
  setTableData(jsonData);
};

The numbers in the table have up to 12 decimal digits, e.g. 283.528580810862.

I then upload the file to my FastAPI backend using Axios:

js
const formData = new FormData();
formData.append("file", file, file.name);

const response = await axios.post(`${API_URL}/data/upload`, formData, {
  headers: {
    Authorization: `Bearer ${token}`,
    'Content-Type': 'multipart/form-data'
  },
});

On the backend, I read the Excel file with Polars, attempting to cast one column to string:

py
from io import BytesIO

import polars as pl

def parse_user_data(contents: bytes):
    buffer = BytesIO(contents)

    df_raw = pl.read_excel(
        buffer,
        has_header=False,
        schema_overrides={"column_5": pl.Utf8},
    ).with_columns(pl.col(pl.Utf8).str.to_lowercase())

    print(df_raw.select(pl.col("column_5")).head(20))

However, the printed output truncates the value to 283.528580811.

To my understanding, casting the column to String/Utf8 should preserve the exact digits as displayed on the frontend. How can I read Excel with Polars in a way that preserves all decimal digits as a string (Utf8)?

You can preserve all digits by keeping the value from being rounded before it reaches the Utf8 column. Either tell the xlsx2csv engine to ignore numeric formats and force the column to Utf8 (with inference disabled), or read with engine="openpyxl", which skips the CSV round-trip. Both approaches avoid the intermediate rounding that produced 283.528580811 and keep values like 283.528580810862 intact as strings. After reading as Utf8 you can optionally parse to a fixed-scale decimal type if you need numeric math.



polars read excel — why digits get lost

With the xlsx2csv engine (the default in older Polars releases; recent releases default to calamine), Polars converts the sheet to CSV and then parses that CSV with read_csv, which performs dtype inference. Digits can be lost anywhere in this conversion and parsing pipeline. A Float64 holds roughly 15 to 17 significant digits, so 283.528580810862 itself fits, but if the converter writes the cell out with a shorter number format, the CSV text already reads 283.528580811 before Polars parses it. See the read_excel notes on the xlsx2csv engine and conversion flow in the Polars documentation (polars.read_excel) and the Excel IO guide (Excel - Polars user guide).

Why doesn’t casting to Utf8 after reading fix it? Because the rounding already happened during the CSV/Float64 phase. Casting a Float64 to string just formats the (already rounded) float — it doesn’t recover digits that were lost earlier. The fix is to stop numeric inference at read time so the column never becomes Float64.
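The effect can be reproduced without Polars. The sketch below is plain Python; the "%.12g" format is only an assumed stand-in for a formatting step that keeps 12 significant digits, not a statement about xlsx2csv internals.

```python
original = 283.528580810862        # value as stored in the sheet

# Assumption: some step formats the number with 12 significant digits.
csv_text = "%.12g" % original      # text that would land in the CSV

reparsed = float(csv_text)         # what a CSV parser would infer as Float64
as_string = repr(reparsed)         # "casting to string" afterwards

print(repr(original))   # the float itself still carries all 15 digits
print(csv_text)         # digits dropped by the formatting step
print(as_string)        # a later cast to string cannot bring them back
```

Once the shortened text has been produced, every later step only sees the shortened value, which is why the fix has to happen at read time.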


polars read excel & polars string — concrete fixes to preserve decimals as Utf8

You have two practical, reliable options that preserve the full decimal text:

  • Option A (fast): keep the xlsx2csv engine but tell the converter not to apply numeric formats, and disable dtype inference so Polars reads the column as Utf8.
  • Option B (robust): use engine="openpyxl" so Polars reads cell values directly from the workbook (no CSV intermediate).

Both are shown below.

Option A: keep xlsx2csv but disable numeric inference

Why this works: xlsx2csv does the Excel→CSV conversion. If you pass xlsx2csv_options={"ignore_formats": ["float", "number"]} you prevent numeric-format interpretation in that step; combined with infer_schema_length=0 and schema_overrides you force Polars to keep the column as Utf8.

Example (BytesIO from your FastAPI upload):

py
from io import BytesIO

import polars as pl

def parse_user_data(contents: bytes):
    buffer = BytesIO(contents)

    df = pl.read_excel(
        buffer,
        has_header=False,  # keep what you currently use
        engine="xlsx2csv",  # set explicitly; recent Polars releases default to calamine
        infer_schema_length=0,  # disable dtype inference
        schema_overrides={"column_5": pl.Utf8},  # force the target column to string
        # critical: read numeric cells as text
        # (newer Polars releases name this parameter engine_options)
        xlsx2csv_options={"ignore_formats": ["float", "number"]},
    )

    # If you intended to lowercase only that string column, target it by name:
    df = df.with_columns(pl.col("column_5").str.to_lowercase())

    # Inspect raw strings:
    print(df["column_5"][:20].to_list())
    return df

Notes:

  • infer_schema_length=0 disables Polars’ schema guessing so schema_overrides takes effect cleanly; see the GitHub issue discussion about inference causing truncation: https://github.com/pola-rs/polars/issues/18612.
  • Make sure xlsx2csv is installed in your environment (Polars requires it for engine="xlsx2csv").
  • This approach preserves exactly what xlsx2csv emits as the cell text, so values like "283.528580810862" remain intact.

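Option B: use engine="openpyxl" to read cell values directly

openpyxl reads the workbook's stored cell values without the CSV round-trip. Numeric cells arrive as Python floats at full double precision, which is enough for a 15-significant-digit value such as 283.528580810862. It's a bit slower but straightforward and reliable for precision-critical data.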

py
from io import BytesIO

import polars as pl

def parse_user_data_openpyxl(contents: bytes):
    buffer = BytesIO(contents)

    df = pl.read_excel(
        buffer,
        engine="openpyxl",  # direct read of the xlsx file
        has_header=False,
        schema_overrides={"column_5": pl.Utf8},  # still useful if you want a string column
    )

    print(df["column_5"][:20].to_list())
    return df

Install openpyxl if you don't have it: pip install openpyxl. This path is simple: there is no CSV intermediate and no number-format step that shortens the value before Polars sees it, so the string column keeps all the digits.
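If the output still looks wrong with either engine, it can help to check which digits the workbook itself stores. An .xlsx file is a ZIP archive of XML parts, so the sketch below reads the raw <v> text of each cell using only the standard library. The function name stored_cell_values and the default part path xl/worksheets/sheet1.xml are assumptions for illustration; a real workbook may use a different part name, and cells marked t="s" hold an index into the shared-strings table rather than the text itself.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by SpreadsheetML worksheet parts.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

def stored_cell_values(
    contents: bytes,
    sheet_path: str = "xl/worksheets/sheet1.xml",  # assumed part name
) -> dict:
    """Map cell references (e.g. "E1") to the raw text stored in <v>."""
    with zipfile.ZipFile(io.BytesIO(contents)) as archive:
        root = ET.fromstring(archive.read(sheet_path))

    values = {}
    for cell in root.iterfind(".//m:c", NS):
        stored = cell.find("m:v", NS)
        if stored is not None and stored.text is not None:
            values[cell.get("r")] = stored.text
    return values
```

Calling stored_cell_values(contents) on the uploaded bytes shows the text the file actually contains. The stored text may carry more digits than Excel displays, so compare it against the string Polars returns rather than against the on-screen value.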


Verifying results and optional Decimal parsing

Quick checks you can run after reading:

  • Confirm the backend string equals the frontend text:
py
s = df["column_5"][0] # first value as Python string
assert s == "283.528580810862"
  • If you need to do numeric math without floating-point rounding, parse the Utf8 column into a fixed-scale decimal type rather than Float64. One way is to cast the string column to pl.Decimal (adjust precision and scale to your needs). Polars also offers str.to_decimal, but its parameters differ between releases, so check the documentation for your version:
py
df = df.with_columns(
    pl.col("column_5")
    .cast(pl.Decimal(precision=30, scale=12))  # string -> fixed-scale decimal
    .alias("column_5_decimal")
)

Parsing from the Utf8 string is safe because no precision was lost during the read step. Contrast that with trying to format or increase precision after a Float64 has already been created — you can’t recover the original digits.
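The same point can be illustrated with Python's standard decimal module, independent of Polars. This is a minimal sketch: building a Decimal from the string keeps the digits as written, while building it from a float exposes the binary approximation.

```python
from decimal import Decimal

text = "283.528580810862"

from_text = Decimal(text)           # digits taken directly from the string
from_float = Decimal(float(text))   # exact value of the nearest binary double

print(from_text)
print(from_float)                   # long tail of digits from the binary form

# Arithmetic on the string-derived value stays exact at 12 decimal places.
total = from_text + Decimal("0.000000000001")
print(total)
```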

Also: you might see suggestions to use pl.Config.set_float_precision(12). That only affects how floats are displayed; it does not restore digits lost by earlier conversion to Float64. See discussions about display vs stored precision: https://stackoverflow.com/questions/75121315/float-decimal-point-display-setting-in-polars.


Common pitfalls and quick troubleshooting

  • Casting after inference: calling .with_columns(pl.col("column_5").cast(pl.Utf8)) or similar after Polars already inferred Float64 won’t recover digits. Prevent inference instead.
  • Wrong schema key: if your file has headers, use the header name in schema_overrides; when has_header=False Polars uses names like "column_1", "column_2", etc.
  • Selector scope in with_columns: your snippet used .with_columns(pl.col(pl.Utf8).str.to_lowercase()). That is valid, but it selects every Utf8 column and lowercases all of them. To change only one column, target it by name: .with_columns(pl.col("column_5").str.to_lowercase()).
  • Missing engine packages: install openpyxl for engine="openpyxl", or xlsx2csv for engine="xlsx2csv".
  • Mixed-type columns: if a column contains some numbers and some text, disabling inference (infer_schema_length=0) plus schema_overrides keeps everything as string; then handle parsing explicitly.
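For the mixed-type case, explicit parsing could look like the following sketch. It uses only the standard library; the helper name parse_decimal_or_none and the choice to map non-numeric text to None are assumptions for illustration, not part of Polars.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional

def parse_decimal_or_none(text: Optional[str]) -> Optional[Decimal]:
    """Return an exact Decimal for numeric text, or None for anything else."""
    if text is None:
        return None
    cleaned = text.strip()
    if not cleaned:
        return None
    try:
        value = Decimal(cleaned)
    except InvalidOperation:
        return None
    # Decimal also accepts "nan" and "inf"; treat those as non-numeric here.
    return value if value.is_finite() else None

raw = ["283.528580810862", " 12.5 ", "n/a", "", None]
parsed = [parse_decimal_or_none(item) for item in raw]
print(parsed)
```

Applied to the values of the string column (for example via df["column_5"].to_list()), this keeps numeric cells exact and leaves a clear marker where the cell held text.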


Conclusion

To preserve all digits (up to 12 decimals) when reading an XLSX column with Polars, stop numeric formatting and inference at read time and read the column as Utf8: either pass xlsx2csv_options={"ignore_formats": ["float", "number"]} with infer_schema_length=0 plus schema_overrides, or use engine="openpyxl" to skip the CSV round-trip. After that, parse to a fixed-scale decimal type if you need numeric operations, but don't rely on casting to string after the value has already been rounded. With either approach, read_excel returns exact string values like "283.528580810862".
