The main reason we collect and analyse data is to extract information. Data science is the young discipline that gathers the methods, algorithms and systems used to gain knowledge from collected data. But what exactly is knowledge?

Historically, many philosophers have devoted their attention to understanding what knowledge is, how it can be acquired and how much we can know. This gave birth to the branch of philosophy known as epistemology, literally meaning discourse on knowledge, which investigates the origin, limits, nature and methods of human knowledge.

Some philosophers think that we acquire knowledge through experience, relying on our senses, a doctrine known as empiricism. Its founder was the great British philosopher John Locke. In his book An Essay Concerning Human Understanding he states that there are no such things as innate ideas and that the mind of a newborn child is like white paper, void of all characters and without any ideas. Ideas are derived from sensation and perception, therefore none of our knowledge can exist before experience. Our senses are the only source of information about external reality, therefore our knowledge is limited by our perception of the world.

The Irish philosopher George Berkeley, after whom the city of Berkeley, California, home of the eponymous university, is named, pushed Locke’s arguments about the limits of knowledge further, stating that material substance cannot be proven to exist at all. Since what we can apprehend is limited by our consciousness, we can only have an indirect sensory experience of things. A table, for example, as far as we are concerned, is just the sight of the light reflected from its surface reaching our eyes, its hardness when touched and the sound it makes when we knock on it. Material things exist only when experienced. This can lead to weird scenarios. Imagine, for example, what happens when we open the door of our kitchen fridge. We see a light turning on and all our food on the shelves. But how can we be sure the light goes off when we close the door, or that the food is still there when we are not looking at it? If reality exists only through experience, we have no reason to believe that the contents of the fridge exist when we don’t observe them. Another famous example is that of the tree falling in the middle of a forest where nobody is there to observe it. Does it make any noise? Did it actually fall? Berkeley’s doctrine, known as immaterialism, is in my opinion fascinating but probably too extreme. It seems quite reasonable that material things keep on existing when we don’t observe them. Or do they?

Imagine this situation: a chicken receives food from the farmer every day at the same hour. This makes the chicken think that every time the man arrives it will get its meal. But one day the man comes and kills the chicken to cook it. The chicken had no logical reason to believe that the man would always bring it food, therefore it shouldn’t be surprised that the farmer killed it. Similarly, we shouldn’t be surprised if the analysis we performed on our set of data gives unexpected results that don’t make any sense at all!

This is more or less the opinion on knowledge held by the most important of the empiricists, the Scottish philosopher David Hume. While agreeing with Locke that we can acquire knowledge only through experience, he realised that this leads to a big problem. When we observe the pan of a weighing scale being pulled towards the floor after an object is placed on it, we might think that this happens because the weight of the object is exerting a force on the pan, and so moving it. But just because we saw two events happening in sequence, that doesn’t mean that there is a relationship of cause and effect between them. Experience is not enough to justify a relation of cause and effect. Hume’s critique of the principle of causality leads to a lot of trouble for science, and for data science above all. Causal relationships help us to see a structure in the physical world rather than a collection of separate, unrelated events, but they might be just an illusion of our perception, a habit acquired after having seen many events behaving in the same way; an unjustified expectation. Denying causality means we have no way to describe the physical world. Bertrand Russell described Hume’s doctrine as follows:

  1. When we say that A causes B, all that we have a right to say is that, in past experience, A and B have frequently appeared together or in rapid succession.
  2. No matter how many instances of the conjunction of A and B we might have observed, that gives no reason for expecting them to be conjoined again in future.

Just because something has been observed many times in the past, it doesn’t mean that it will always happen. Therefore deriving a general rule from a series of individual observations is not logically valid. Right, so what’s the point in collecting and analysing data then? All that Hume suggests is that induction by simple enumeration is not a valid form of argument. Since we cannot be sure that B will happen after A, we cannot say that A is the cause of B. He would then say that, since we cannot rely on causality, the only result we can get out of inductive methods is a probability.
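One way to make "the only result is a probability" concrete is Laplace's rule of succession. This is Laplace's formula, not something Hume proposed, and it assumes independent trials and a uniform prior on the unknown success rate, which is itself an inductive assumption. The function name below is hypothetical. A minimal sketch, applied to the chicken counting its feeding days:

```python
from fractions import Fraction


def rule_of_succession(successes, trials):
    """Laplace's estimate of the probability that the next trial succeeds.

    Assumes independent trials and a uniform prior on the success rate
    (an illustrative assumption, not part of Hume's argument).
    """
    if not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    return Fraction(successes + 1, trials + 2)


# The chicken has been fed on every day observed so far.
for days in (1, 10, 100, 1000):
    p = rule_of_succession(days, days)
    print(f"after {days} feedings: P(fed tomorrow) = {p} ~ {float(p):.4f}")
```

After 100 feedings the estimate is 101/102: high, but never equal to 1, however long the streak. The number also says nothing about the farmer's intentions, which is exactly the chicken's problem.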

The problem of induction is a large and difficult subject and it leads to conclusions that are difficult to accept. Hume was well aware of that and concluded that we must use our experience to weigh the evidence and make judgements. We must use common sense. And regarding data science, I believe that even if we cannot logically justify the correctness of a prediction, we can always say that, given a sufficiently vast accumulation of cases in which A was followed by B, the likelihood that B will follow A in future instances is high enough to accept the predictions. In summary, induction is widely accepted by the scientific world and for our purposes we can be confident that our predictive algorithms, although based on empirical data with all the connected limitations, will still be valid. Or will they?
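To put a rough number on "a sufficiently vast accumulation of cases", here is a sketch under strong assumptions: each case is treated as an independent trial with the same unknown probability p that B follows A, and B followed A in every observed case. Under those assumptions the exact one-sided lower confidence bound for p after n cases is alpha^(1/n). The function names and the 5% default for alpha are illustrative choices, not taken from the sources above.

```python
import math


def lower_bound_all_successes(n, alpha=0.05):
    """One-sided lower confidence bound for p when all n trials succeeded.

    If the true p were below this bound, a run of n successes would have
    probability below alpha. Assumes independent, identically
    distributed trials.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return alpha ** (1.0 / n)


def cases_needed(p_min, alpha=0.05):
    """Smallest n for which the lower bound reaches p_min."""
    if not 0 < p_min < 1:
        raise ValueError("p_min must lie strictly between 0 and 1")
    return math.ceil(math.log(alpha) / math.log(p_min))


print(cases_needed(0.99))              # cases needed to claim p >= 0.99
print(lower_bound_all_successes(299))  # bound reached after 299 cases
```

With these defaults, 299 unbroken cases are needed before we can claim with 95% confidence that B follows A at least 99% of the time. The independence assumption is precisely what Hume would challenge, so the figure quantifies our common sense rather than justifying it logically.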

Bibliography:

  • D. Gillies and Giulio Giorello, La filosofia della scienza nel XX secolo, Editori Laterza
  • David Hume, An Enquiry Concerning Human Understanding, Oxford World’s Classics
  • John Locke, An Essay Concerning Human Understanding, Oxford World’s Classics
  • Karl Popper, The Logic of Scientific Discovery, Routledge Classics
  • Bertrand Russell, History of Western Philosophy, Routledge
  • Nigel Warburton, A Little History of Philosophy, Yale University Press
  • Marcus Weeks, Philosophy in Minutes, Quercus
