Category: Python

  • Managing Python Environments: pyenv and uv Tutorial (Data Science Engineering Gap Part 1)

    This is the second post in a series about bridging the gap from beginner programmer to advanced data science practitioner. These aren’t programming concepts – they’re software engineering practices that enable you to build robust, maintainable systems.

    How to Fix the “Works On My Machine” Problem in Python

    You’ve written some Python code that works perfectly on your laptop. You share it with a colleague, and suddenly nothing runs. Or worse: you come back to your own project from last year, and it’s completely broken. Python has been upgraded, some of your packages released versions that support the new interpreter while others didn’t, and your carefully crafted solution is now a pile of import errors.

    This isn’t a hypothetical scenario. It’s the daily reality of working with Python without proper environment management.

    I’ve seen this play out in painful ways. A colleague once spent hours trying to figure out why a package was running slowly, only to discover that the original implementation ran on PyPy (an alternative Python implementation whose just-in-time compiler can run pure-Python code much faster than standard CPython), but nobody had documented this crucial detail. Another project mysteriously failed because one developer used conda’s Python, another used the system Python, and a third had installed vanilla Python from python.org. Same code, three different Python installations, three different sets of problems.

    The fundamental issue: Python isn’t just Python. There are different versions (3.10, 3.11, 3.12), different implementations (CPython, PyPy, Jython), and countless package versions that may or may not work together. Without managing these variables explicitly, reproducibility becomes impossible.
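    These variables can at least be made visible. As a minimal sketch using only the standard library, the snippet below records which implementation and version is running, where the interpreter binary lives, and whether it sits inside a virtual environment, and then refuses to run on the wrong interpreter. The helper names `describe_interpreter` and `require_python` are hypothetical illustrations, not part of pyenv or uv.

```python
import platform
import sys


def describe_interpreter() -> dict:
    """Collect the facts that distinguish one Python installation from another."""
    return {
        "implementation": platform.python_implementation(),  # e.g. 'CPython' or 'PyPy'
        "version": platform.python_version(),                # e.g. '3.12.4'
        "executable": sys.executable,                        # which binary is actually running
        # Inside a venv, sys.prefix points at the environment while
        # sys.base_prefix points at the interpreter it was created from.
        "in_virtualenv": sys.prefix != sys.base_prefix,
    }


def require_python(minimum: tuple, implementation: str = "CPython") -> None:
    """Fail fast with a clear message instead of failing later with import errors."""
    actual_impl = platform.python_implementation()
    if actual_impl != implementation:
        raise RuntimeError(f"Expected {implementation}, got {actual_impl}")
    if sys.version_info[:2] < tuple(minimum):
        raise RuntimeError(
            f"Need Python {minimum[0]}.{minimum[1]}+, got {platform.python_version()}"
        )


if __name__ == "__main__":
    # Print a small report that could be pasted into a README or a log file.
    for key, value in describe_interpreter().items():
        print(f"{key:>15}: {value}")
```

    pyenv and uv both pin a project’s Python version in a `.python-version` file; a call like `require_python((3, 11))` at the top of an entry point is a cheap backstop for when someone runs the code outside that setup, and logging `describe_interpreter()` would have documented the PyPy detail in the story above.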

  • Rust: Python’s New Best Friend – A Data Scientist’s Journey

    As Python continues to dominate data science, a quiet revolution is happening underneath the surface. Increasingly, Rust is powering our most critical Python tools, such as Polars, pydantic-core, Ruff, and uv, delivering large speedups while keeping the Python interface we know and love. This hybrid approach transforms our work as data scientists: we develop quickly in Python and get production-grade performance from compiled code.

    My journey with Rust began six years ago as a distant curiosity. I heard the name in conference talks and saw it climbing GitHub’s language popularity charts, but it remained just another programming language on my “maybe someday” list.

  • Why Probabilistic Programming? A Journey Through the Monty Hall Problem

    Even brilliant minds can be led astray by probability puzzles. When presented with the Monty Hall Problem, renowned mathematician Paul Erdős initially rejected the correct solution – and he wasn’t alone. Thousands of readers, including PhDs in mathematics and statistics, wrote angry letters to Marilyn vos Savant when she published the correct solution in Parade magazine. Their passionate resistance reveals something fascinating about how humans reason about uncertainty.

    To explore these ideas hands-on, we’ve created a Jupyter notebook that implements both traditional and probabilistic programming approaches to the Monty Hall Problem. The notebook includes code for simulating the game, modeling player behavior, and analyzing how people learn from experience.
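    To make the puzzle concrete, here is a minimal plain-Python sketch; it is not the notebook’s code and does not use a probabilistic programming library. It assumes the standard rules (the car is placed uniformly at random, and the host always opens a door that hides no car and was not picked, choosing uniformly when two such doors exist). It computes the win probability twice: exactly, by enumerating every outcome with `fractions.Fraction`, and approximately, by simulating many games.

```python
import random
from fractions import Fraction

DOORS = (0, 1, 2)


def exact_win_probability(switch: bool) -> Fraction:
    """Sum the probability of every (car, pick, reveal) outcome that ends in a win."""
    total = Fraction(0)
    for car in DOORS:
        for pick in DOORS:
            # The host may only open a door that is neither the pick nor the car.
            options = [d for d in DOORS if d != pick and d != car]
            for reveal in options:
                # Car and first pick are uniform; the host picks uniformly among options.
                p = Fraction(1, 3) * Fraction(1, 3) * Fraction(1, len(options))
                if switch:
                    final = next(d for d in DOORS if d != pick and d != reveal)
                else:
                    final = pick
                if final == car:
                    total += p
    return total


def simulate(switch: bool, n_games: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the win rate for a fixed strategy."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_games):
        car = rng.choice(DOORS)
        pick = rng.choice(DOORS)
        reveal = rng.choice([d for d in DOORS if d != pick and d != car])
        if switch:
            pick = next(d for d in DOORS if d != pick and d != reveal)
        wins += pick == car
    return wins / n_games


print(exact_win_probability(switch=False))  # 1/3
print(exact_win_probability(switch=True))   # 2/3
```

    Enumeration makes the asymmetry visible: staying wins only in the three outcomes where the first pick was already the car, while switching wins in the six where it wasn’t. The simulated win rates from `simulate` should land close to these exact values.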

  • Introducing chronowords: A Python Package for Diachronic Word Embeddings

    We’re excited to announce the release of chronowords, a Python package designed to facilitate the analysis of semantic change in text over time. Through our research, we frequently encountered the need for temporal text analysis, which led us to develop this package to make diachronic (time-based) word embedding analysis more accessible.
