Category: data science

  • Managing Python Environments: pyenv and uv Tutorial (Data Science Engineering Gap Part 1)

    This is the second post in a series about bridging the gap from beginner programmer to advanced data science practitioner. These aren’t programming concepts – they’re software engineering practices that enable you to build robust, maintainable systems.

    How to Fix the “Works On My Machine” Problem in Python

    You’ve written some Python code that works perfectly on your laptop. You share it with a colleague, and suddenly nothing runs. Or worse – you come back to your own project from last year, and it’s completely broken. Python has been updated, some packages followed the new version, others didn’t, and your carefully crafted solution is now a pile of import errors.

    This isn’t a hypothetical scenario. It’s the daily reality of working with Python without proper environment management.

    I’ve seen this play out in painful ways. A colleague once spent hours trying to figure out why a package was running slowly, only to discover that the original implementation ran on PyPy (an alternative Python implementation with a just-in-time compiler, often much faster than CPython for pure-Python code), but nobody had documented this crucial detail. Another project mysteriously failed because one developer used conda’s Python, another used the system Python, and a third had installed vanilla Python from python.org. Same code, three different Python installations, three different sets of problems.

    The fundamental issue: Python isn’t just Python. There are different versions (3.10, 3.11, 3.12), different implementations (CPython, PyPy, Jython), and countless package versions that may or may not work together. Unless you manage these variables explicitly, reproducing a working setup is left to chance.
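
    One way to make these variables visible is to have a project report which interpreter it is actually running on. The snippet below is a minimal sketch using only the standard library; the function name `describe_interpreter` is a hypothetical helper chosen for illustration, not part of pyenv or uv.

```python
import platform
import sys


def describe_interpreter() -> dict:
    """Collect the facts that distinguish one Python installation from another."""
    return {
        # Which implementation is running, e.g. "CPython" or "PyPy"
        "implementation": platform.python_implementation(),
        # Full version string of the running interpreter
        "version": platform.python_version(),
        # Path of the interpreter binary (system, conda, python.org, ...)
        "executable": sys.executable,
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

    Recording this output in a README or a bug report would likely have surfaced both the undocumented PyPy dependency and the three-installations mismatch described above.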

    (more…)
  • The Data Science Engineering Gap: Part 0 – Your Development Environment

    This is the first post in a series about bridging the gap from beginner programmer to advanced data science practitioner. This transition isn’t just about learning more Python – it’s about adopting the software engineering practices and tools that enable you to build robust, maintainable systems.

    The Hidden Complexity of Professional Practice

    Here’s what nobody tells you about becoming an advanced data science practitioner: the hardest part isn’t mastering algorithms or learning new libraries. It’s developing the software engineering discipline that separates beginners from professionals.

    You can solve problems with Python. You understand pandas, numpy, and scikit-learn. You might even know some deep learning frameworks. But there’s still a massive gap between “I can write code that works” and “I can build systems that others can use, maintain, and extend.”

    This gap isn’t about programming knowledge – it’s about engineering practices. And honestly? It’s complex and takes time to master. We’re talking about a completely different skillset from the algorithmic thinking you’ve been developing. These are the practices that make the difference between code that works once on your machine and code that works reliably for everyone.

    (more…)
  • Navigating Industry Transitions: Books That Helped Me Lead Data Science Across Domains

    If you’re a data scientist stepping into leadership roles or moving between industries, this post is for you.

    Leading data science teams across different industries has taught me that technical expertise alone isn’t enough: each domain comes with its own language, stakeholders, and business logic. Over the years, I’ve moved from enterprise search to fintech/regtech, and now to social media analytics for the FMCG sector. Each transition meant not just tackling new technical challenges, but learning entirely different ways of thinking about business problems.

    As a head of data science, I’ve discovered that the most challenging part of these transitions isn’t adapting algorithms or learning new tools—it’s understanding how each industry operates and communicating effectively with stakeholders who have completely different backgrounds and priorities. Here are the books that became essential guides through these domain shifts.

    (more…)
  • Statistical Thinking as Philosophy: Essential Readings – Part I.

    “Philosophy of science without history of science is empty; history of science without philosophy of science is blind.” — Imre Lakatos

    Statistics isn’t just a collection of mathematical techniques—it’s a way of thinking about the world, addressing uncertainty, and drawing conclusions from incomplete information. As data scientists, machine learning engineers, and AI practitioners, we often apply statistical methods without reflecting on their theoretical foundations. Yet our work implicitly embodies philosophical stances about knowledge, evidence, and inference.

    This series presents foundational readings that shed light on the philosophical aspects of statistics. They are not intended to turn data practitioners into philosophers, but to offer accessible ways to reflect on the assumptions that underlie our daily work.

    (more…)
  • Rust: Python’s New Best Friend – A Data Scientist’s Journey

    As Python continues to dominate data science, a quiet revolution is happening beneath the surface. Increasingly, Rust powers our most critical Python tools, delivering compiled-language speed while keeping the Python interface we know and love. This hybrid approach changes how we work as data scientists: we can develop rapidly in Python and still get production-grade performance.

    My journey with Rust began six years ago as a distant curiosity. I heard the name in conference talks and saw it climbing GitHub’s language popularity charts, but it remained just another programming language on my “maybe someday” list.

    (more…)