GrumpyStack's Newsletter
🐱 Docker Series + Spark Prep = Happy Containers (Mostly)

Grumpy Stack
June 13, 2025

Welcome to this week’s GrumpyStack post. I hope you had an amazing week.

If you’ve ever copy-pasted a docker run command and prayed it worked, this is for you.
I’ve kicked off a Docker tutorial series and four solid episodes are already live.

Whether you're a data engineer, backend dev or just someone trying to stop losing containers to the void, you’re in the right place.

🧠 Wait, what’s Docker again?

Docker is a tool that lets you package up your app (code, dependencies, weird Linux quirks and all) into neat, portable containers that run anywhere. It's like a zip file with superpowers (and a runtime). It also avoids the famous “it works on my machine” effect that we all go through at least once in our lives (usually on a Friday at 6pm).

🐳 Docker Series So Far:

Docker 101: What Are Containers and Why They Matter
A plain breakdown of why containers are more than just hype (and what problems they actually solve).
Docker Under the Hood: The Linux Magic Behind Containers
Spoiler: Docker is just Linux wearing sunglasses. We dive into namespaces, cgroups and other kernel party tricks.
Docker Fundamentals: Your First Steps with Containers
Learn the commands, build your first image and run a container without feeling like you’re defusing a bomb.
Docker Data Persistence Explained (and How to Stop Losing Your Data)
Volumes, bind mounts and why your PostgreSQL container keeps forgetting stuff.

Click here to start your Docker journey!

✨ Bonus article: Mastering Apache Spark for Data Engineering Interviews

I also published a new article that’s part cheat sheet, part survival guide.

This one’s for anyone who’s stared at a whiteboard being asked about data skew, joins, or caching, while their brain quietly reboots.

In the article, we cover:

🔥 Main Apache Spark concepts
🔀 Broadcast vs shuffle joins (and when they stab you in the back)
🧠 Memory, partitions, and window functions
⚠️ Common mistakes + how to sound like you’ve actually used Spark

It’s based on real prep I did for an interview.
(They asked zero Spark questions. Sadge.)

👉 Read it here
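To give a taste of the broadcast vs shuffle join topic, here’s a toy plain-Python sketch (no Spark involved). The tables, the partition layout and the function names are all made up for illustration: each inner list stands in for the rows held by one executor.

```python
from collections import defaultdict

# Hypothetical toy data. Each inner list plays the role of one partition.
orders = [                      # "big" table: (customer_id, amount)
    [(1, 30), (2, 15)],
    [(3, 99), (1, 5)],
]
customers = [(1, "Ada"), (2, "Linus"), (3, "Grace")]  # "small" table: (customer_id, name)


def broadcast_join(big_partitions, small_table):
    """Copy the small table to every partition; rows of the big table never move."""
    lookup = dict(small_table)  # the "broadcast" copy each executor would receive
    return [
        [(key, amount, lookup[key]) for key, amount in part if key in lookup]
        for part in big_partitions
    ]


def shuffle_join(left_partitions, right_partitions, num_partitions=2):
    """Re-partition BOTH sides by hash(key) so matching keys meet, then join locally."""

    def shuffle(partitions):
        buckets = [[] for _ in range(num_partitions)]
        for part in partitions:
            for key, value in part:
                buckets[hash(key) % num_partitions].append((key, value))
        return buckets

    left, right = shuffle(left_partitions), shuffle(right_partitions)
    result = []
    for left_part, right_part in zip(left, right):
        index = defaultdict(list)  # local hash table built from the right side
        for key, value in right_part:
            index[key].append(value)
        result.append([(k, lv, rv) for k, lv in left_part for rv in index[k]])
    return result


broadcasted = broadcast_join(orders, customers)
shuffled = shuffle_join(orders, [customers])
flatten = lambda parts: sorted(row for part in parts for row in part)
print(flatten(broadcasted) == flatten(shuffled))  # True: same rows, different data movement
```

Both produce the same rows; the difference is what moves. The broadcast version never moves the big table, which is why it’s fast when one side is small. The shuffle version sends both sides across the network, which is what makes it expensive (and where data skew hurts, since every row of a hot key lands in the same partition). In real Spark, the engine broadcasts automatically when a table’s estimated size is under spark.sql.autoBroadcastJoinThreshold, and you can force it with the broadcast() function from pyspark.sql.functions.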

🔗 7 Real-World Python Project Ideas That Solve Actual Business Problems

A great article by Raphael Schols with 7 Python project ideas, each explained in detail:

Automating Excel Reports
Auto-Generate PowerPoint Presentations
Extract Structured Data from PDFs or Images
Automate Email File Extraction
Create a Self-Service Analytics Portal
Build a Scalable Web & Cloud Data Pipeline
Build a Natural Language SQL Assistant

For each project, he details the problem, the solution and why it’s a valuable project for getting better in the data world.

🔗 Why Cron Jobs Are Dead — And CDC Is the Killer

I don’t agree with the article’s conclusion (“Cron jobs belong to the past. CDC is how you future-proof your backend.”), but it’s a good introduction to CDC (Change Data Capture). In short, a CDC system hooks into a database and picks up data changes as they happen (typically by reading the database’s transaction log), instead of repeatedly polling the tables.

It’s indeed a revolution and a really powerful tool. However, cron jobs will remain the rule, just as jQuery and PHP still power most websites even though plenty of great newer tools were supposed to take over from them.

Cron jobs also remain super useful because they’re so simple to set up.
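For anyone new to CDC, here’s a toy Python sketch of the log-based idea next to a cron-style full re-read. Everything in it (ToyDatabase, CdcConsumer, cron_style_snapshot, the in-memory log) is hypothetical: the list stands in for a real database’s transaction log and the consumer for a real CDC connector, so treat it as an illustration of the concept, not of how any particular tool works.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class ChangeEvent:
    lsn: int                      # position in the change log (like a log sequence number)
    op: str                       # "insert", "update" or "delete"
    key: int
    row: dict[str, Any] | None    # None for deletes


@dataclass
class ToyDatabase:
    """Hypothetical in-memory table that appends every write to a change log."""
    rows: dict[int, dict[str, Any]] = field(default_factory=dict)
    log: list[ChangeEvent] = field(default_factory=list)

    def _append(self, op: str, key: int, row: dict[str, Any] | None) -> None:
        self.log.append(ChangeEvent(lsn=len(self.log), op=op, key=key, row=row))

    def upsert(self, key: int, row: dict[str, Any]) -> None:
        op = "update" if key in self.rows else "insert"
        self.rows[key] = row
        self._append(op, key, row)

    def delete(self, key: int) -> None:
        if key in self.rows:
            del self.rows[key]
            self._append("delete", key, None)


class CdcConsumer:
    """Remembers an offset into the log and only processes events after it."""

    def __init__(self) -> None:
        self.offset = 0
        self.replica: dict[int, dict[str, Any]] = {}

    def read_new_changes(self, db: ToyDatabase) -> list[ChangeEvent]:
        new_events = db.log[self.offset:]
        for event in new_events:
            if event.op == "delete":
                self.replica.pop(event.key, None)
            else:
                self.replica[event.key] = event.row
        self.offset += len(new_events)
        return new_events


def cron_style_snapshot(db: ToyDatabase) -> dict[int, dict[str, Any]]:
    """A simple scheduled job: re-read the whole table on every run, changed or not."""
    return dict(db.rows)


db = ToyDatabase()
consumer = CdcConsumer()
db.upsert(1, {"name": "Ada"})
db.upsert(2, {"name": "Linus"})
print([e.op for e in consumer.read_new_changes(db)])           # ['insert', 'insert']
db.upsert(1, {"name": "Ada L."})
db.delete(2)
print([(e.op, e.key) for e in consumer.read_new_changes(db)])  # [('update', 1), ('delete', 2)]
print(consumer.replica == db.rows)                              # True
```

The design point: the consumer only keeps an offset into the log, so each run touches just the new events (deletes included), whereas the cron-style snapshot re-reads every row even when nothing changed. That snapshot is also about as simple as code gets, which is exactly why cron jobs stick around.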

That’s it for this week!

Thanks to all the new subscribers and thanks for reading this.

👉 Got feedback or an idea for the next issue? Just hit reply: I read everything, and if your idea makes it into the next post, you'll get a shout-out (or at least a virtual high-five from GrumpyStack 🐾).

Thanks again for being here — it's genuinely appreciated.

Thanks for reading — Pierre (a.k.a. GrumpyStack)