Stop the Garbage Before It Lands: A Deep Look at Databricks Labs DQX
A practical guide to DQX, the PySpark-native data quality framework from Databricks Labs that quarantines bad data before it reaches your gold tables.
Data & AI Architect
I integrate software and AI applications in enterprise settings, build digital products, and write about innovations in tech.
Writing
A technical deep dive into PaperBanana, an open-source agentic pipeline that generates publication-quality methodology diagrams and statistical plots from text.
Why Stanford's DSPy framework treats prompts as compiled artifacts, and how GEPA — the ICLR 2026 Oral — outperforms reinforcement learning with 35x fewer rollouts.
Exploring Microsoft Research's Data Formulator, the concept-binding paradigm, and how its AI agents remove the tidy-data tax from visualization authoring.
How the Open Service Broker API turned the N x M problem of platform-service integration into N + M, and how it fits alongside Kubernetes Operators today.
A walkthrough of turbovec, a Rust + Python vector index built on Google Research's TurboQuant algorithm — 16x compression, faster than FAISS, no codebook training.
How Microsoft Research's bocpy library brings deadlock-free, ownership-based concurrency to Python through cowns, behaviors, and CPython sub-interpreters.
How modern enterprises can share data while maintaining control and compliance.