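XCom (cross-communication) in Airflow stores task return values in the metadata DB. The commonly cited limit is ~48KB, though the actual ceiling depends on the Airflow version and the database backend. Pushing large DataFrames or lists through XCom can fail, sometimes silently, or cause DB bloat.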

Fixes:

  1. Use object storage: write results to S3/GCS and pass only the path via XCom
  2. Use a custom XCom backend: point xcom_backend (in the [core] section of airflow.cfg) at a class that stores values in S3
  3. Restructure the DAG: avoid passing large data between tasks entirely
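
Fix 1 can be sketched in plain Python. This is a minimal illustration, not Airflow code: `extract` and `transform` are hypothetical task callables, and a local temp directory (`STORAGE_ROOT`) stands in for the S3/GCS bucket. In a real DAG the functions would be Airflow tasks, and the returned string would be a bucket URI written through the provider's client.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stand-in for a bucket; in a real DAG this would be an
# S3/GCS location written with the provider's client or hook.
STORAGE_ROOT = Path(tempfile.mkdtemp(prefix="xcom_demo_"))


def extract(run_id: str) -> str:
    """Produce a large result, persist it, and return only its location."""
    rows = [{"id": i, "value": i * 2} for i in range(10_000)]
    target = STORAGE_ROOT / f"{run_id}.json"
    target.write_text(json.dumps(rows))
    # The return value is what would land in XCom: a short string.
    return str(target)


def transform(path: str) -> int:
    """Load the payload from storage using the path received via XCom."""
    rows = json.loads(Path(path).read_text())
    return sum(row["value"] for row in rows)


# Simulate the upstream task handing its return value to the downstream task.
path = extract("demo_run")
total = transform(path)
```

Only `path` crosses the task boundary here; the 10,000 rows never touch the metadata DB.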

The root cause is usually treating XCom like a data pipe instead of a signaling mechanism. Keep XCom for small metadata (IDs, paths, status), not payloads.
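
One way to enforce this is a size check on whatever a task is about to return. The sketch below is an assumption-laden illustration: `xcom_safe` and `XCOM_SOFT_LIMIT_BYTES` are hypothetical names, the threshold reuses the ~48KB figure from above, and JSON length is only a rough proxy for how Airflow actually serializes the value.

```python
import json

# Assumption: reuse the ~48KB figure as a soft threshold.
XCOM_SOFT_LIMIT_BYTES = 48 * 1024


def xcom_safe(value, limit: int = XCOM_SOFT_LIMIT_BYTES):
    """Return value unchanged if its JSON encoding fits under limit, else raise."""
    size = len(json.dumps(value).encode("utf-8"))
    if size > limit:
        raise ValueError(
            f"XCom payload is {size} bytes (limit {limit}); "
            "write it to storage and return the path instead"
        )
    return value


# Small metadata passes through untouched.
small = xcom_safe({"path": "s3://example-bucket/run/output.parquet", "rows": 10_000})

# A large list is rejected before it can reach the metadata DB.
try:
    xcom_safe(list(range(100_000)))
    oversized_rejected = False
except ValueError:
    oversized_rejected = True
```

Wrapping a task's return value in a guard like this turns an accidental payload into a loud failure at the task that produced it.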
