[Visual Data Storytelling #31] The one book every data scientist should read

The answer eluded me for a long time, because data science is about as broad as a field can be. Can one book answer it all?

What is a thing?

I'm often asked:

What's a book you'd recommend to all data scientists?

First, you'd need to understand what a data scientist is.

In recent years, I've had various roles, all of which fell under the same umbrella term of "data scientist."

  • I've done research, statistics, data viz, machine learning, web scraping…;

  • I've written reports, slides, websites, dashboards, papers…;

  • I've used PyCharm, VS Code, notebooks, Excel, PowerPoint…;

  • I've written code in Python, JS, SQL, R, Java…;

  • I've had meetings with business executives and convenience-store cashiers…;

  • I've worked with geopoliticians, computer network operators, climatologists, retailers…

There's a plethora of technical books that expertly describe each of these tasks.

The best books even come with snippets of code from the trendiest frameworks that can be executed directly in a notebook—whether for training logistic regression models, visualizing PCA, or other tasks.

But these works age, and the trendy frameworks become obsolete, replaced by newer tools, requiring us to relearn everything.

Thus, the only constant I've found in all the roles I've held is the necessity to question and re-question constantly.

This lesson, I had to learn over and over again.

  • Data science teaches us to correct for inherent biases;

  • Design teaches us to question our users' goals;

  • Epistemology teaches us that the path of science is paved with false opinions, and that language often misleads with its deceptive metaphors (river of time, I'm looking at you) [1];

  • Linguistics, or comparative grammar, teaches us to question the meaning of words.

So, when embarking on a data project, any data project, the first thing to do is to ask questions:

Who? How? Why?

Is there a book that teaches us to question like this?

There is one that has escaped the usual recommended reading lists for aspiring data scientists, and this is the book I'm going to summarize here.

*

This short book was written by one of the most influential philosophers of the 20th century, Martin Heidegger (1889–1976).

It's not his magnum opus, "Being and Time," which is still a reference, but a transcript of a lecture given at the University of Freiburg in 1935, which poses a question that was supposedly settled long ago to general satisfaction:

What is a thing?

Martin Heidegger

With this trivial question, Heidegger examines a concept that seems fundamentally useless.

Does a data scientist need to ask such a fundamental question that takes us beyond even the stars, considering that the "thingness" of a thing—the essence of being a thing—cannot be a thing itself?

One ends up far from stable ground, farther even than Thales, who, while contemplating the stars, fell into a well, a famous anecdote related by Plato in his "Theaetetus."

How dizzying philosophers can be.

From philosophy, nothing practical can be undertaken directly. You don't optimize a business KPI or an ROI with philosophy.

Philosophy is only a claim to knowledge.

But does it bring anything, this claim, other than dissatisfaction?

And what does it bring, this simple question: what is a thing?

*

Let's indulge in this exercise for a moment, and consider the thing, asking ourselves first what sense the word conveys.

In a narrow sense, is it the tangible thing at hand? Or, more broadly, is it anything one can think about or plan, as in "are you doing anything tomorrow?" Or, more broadly still, is it anything at all that is something rather than nothing?

The question, "what is a thing?" should not be confused with "what is a flint?" or "what is a fern?" or "what is a frog?", but rather what the flint is as a thing, what the fern is as a thing, what the frog is as a thing.

By questioning the thing, one questions the truth. But the truth has many faces.

To a shepherd, the sun rises, and it's time to take out the flock. But the astrophysicist might tell the shepherd that, in reality, the sun never rises or sets. So, who's right, the shepherd or the astrophysicist?

Is there a third truth between these two, a truth that reconciles them?

Things stand in various truths, which is why we must always question and re-question.

But the thing is not the sun. So, what's the truth of the thing?

*

This specific thing I point to exists in space, right where I point it out.

Determining the thing as "this" means that we determine it by pointing it out. Rather than being a characteristic of the thing, "this" is merely an addition of ours, a subjective truth.

Ultimately, the truth of the thing in everyday experience—is it subjective, or objective?

Take this cup on my desk, for example: if it had an identical twin, the twin would exist in a different place.

And if it were at the exact same place, it would be at a different time. Two distinct things cannot coexist in the same place at the same time.

Place and time are what turn absolutely identical things into these specific, distinct things. The thingness of the thing is based on the essence of space and time.

But are space and time in the thing, or are they external additions to the thing?

We even say "space and time," with a handy conjunction, without asking what connects these two concepts. Einstein connected them, and the hyphen in “space-time” is nothing less than the theory of relativity.

Let's try externalizing this truth, which seems so dependent on us—though we'd prefer it to be solely dependent on the thing—by writing on a piece of paper: "Here is the flint." Place the paper next to the flint. We have a truth that no longer depends on us.

But let a gust of wind blow the paper away, and whoever finds it miles away will wonder who could have written such a lie.

If we took a pocketknife and carved into a tree near the frog, "Now there's a frog," and came back to the tree a few days later, we'd wonder which princess's kiss had turned the frog invisible.

Strange truth that turns into untruth in a night or with a gust of wind...

*

The question "What is a thing?" has only one power: to awaken, perhaps, what has fallen asleep, and to set right what has sunk into confusion.

Every question carries in it fundamental positions inherited from history.

The question does not so much ask for a definition of the essence of the thing as point toward these fundamental positions and what they intend to become (Heidegger speaks of "advent").

"The era is guilty if our eyes don't open."

Martin Heidegger

So, data scientists from all backgrounds, rush to your local bookstore and read "What Is a Thing?" by Martin Heidegger (1935), in its English translation.

There's still everything left to invent.

[1] Etienne Klein, "What Do We Know About Time?" (in French, YouTube, 1h18) https://www.youtube.com/watch?v=NDYIdBMLQR0&t

If you think philosophy is a skill that will never be replaced by generative AI, tell me in the comments about a philosophy book that inspired you!
