BANANDRE
NO ONE CARES ABOUT CODE

Tagged with

#pyspark

3 articles found

Data Modeling Is Killing Your PySpark Performance, Not Join Optimization

A technical post-mortem on why a 50k-row table brought down a Databricks cluster, exposing the dangerous gap between software engineering instincts and distributed data architecture

#dag-complexity #data-modeling #databricks ...

The Persistent Nightmare of Datetime Handling in Data Engineering

Despite decades of computing progress, datetime formatting remains a major pain point for data engineers, leading to bugs, pipeline breaks, and widespread frustration across systems and timezones.

#aws-glue #data-engineering #datetime ...

JSON in PySpark: The Performance Trap You’re Probably Falling For

Why writing large PySpark DataFrames as JSON to S3 is fundamentally flawed – and what you should do instead

#data-engineering #json #pyspark ...

2026 BANANDRE