ImageNet is a large crowd-sourced database of coded images, widely used for machine learning. This database can be traced to an idea articulated by Fei-Fei Li in 2006:
“We’re going to map out the entire world of objects.” In a blogpost on the Limitations of Machine Learning, I described this idea as naive optimism.
Such datasets raise both ethical and epistemological issues. One of the ethical problems thrown up by these image databases is that objects are sometimes also subjects. Bodies and body parts are depicted (often without consent) and labelled (sometimes offensively); people are objectified; and the objectification embedded in these datasets is then passed on to the algorithms that use them and learn from them. Crawford and Paglen argue convincingly that categorizing and classifying people is not just a technical process but a political act. And thanks to some great detective work by Vinay Prabhu and Abeba Birhane, MIT has withdrawn Tiny Images, another large image dataset widely used for machine learning.
But in this post, I’m going to focus on the epistemological and metaphysical issues – what constitutes the world, and how we can know about it. Li is quoted as saying “Data will redefine how we think about models.” The reverse should also be true, as I explain in my blogpost on the Co-Production of Data and Knowledge.
What exactly is meant by the phrase “the entire world of objects”, and what would mapping this world really entail? Although I don’t believe that philosophy is either necessary or sufficient to correct all of the patterns of sloppy thinking by computer scientists, even a casual reading of Wittgenstein, Quine and other 20th century philosophers might prompt people to question some simplistic assumptions about the relationship between Word and Object underpinning these projects. According to Donna Haraway, “what counts as an object is precisely what world history turns out to be about.”
The first problem with these image datasets is the assumption that images can be labelled according to the objects that are depicted in them. But as Prabhu and Birhane note, “real-world images often contain multiple objects.” Crawford and Paglen argue that “images are laden with potential meanings, irresolvable questions, and contradictions” and that “ImageNet’s labels often compress and simplify images into deadpan banalities.”
“One photograph shows a dark-skinned toddler wearing tattered and dirty clothes and clutching a soot-stained doll. The child’s mouth is open. The image is completely devoid of context. Who is this child? Where are they? The photograph is simply labeled ‘toy.’” (Crawford and Paglen)
Implicit in the labelling of this photograph is some kind of ontological precedence – that the doll is more significant than the child. As for the emotional and physical state of the child, ImageNet doesn’t seem to regard these states as objects at all. (There are other image databases that do attempt to code emotions – see my post on Affective Computing.)
Given that much of the Internet is funded by companies that want to sell us things, it would not be surprising if there were an ontological bias towards things that can be sold. (This is what the word “everything” means in the Everything Store.) So that might explain why ImageNet chooses to focus on the doll rather than the child. But similar images are also used to sell washing powder. Thus the commercially relevant label might equally have been “dirt.”
But it is not only that concepts themselves (such as toys and dirt) vary between different discourses and cultures (as explored by anthropologists such as Mary Douglas); the ontological precedence between concepts may also vary. People from a different culture, or with a different mindset, will jump to different conclusions about what the main thing depicted in a given image is.
The American philosopher W.V.O. Quine argued that translation was indeterminate. If a rabbit runs past, and a speaker of an unknown language, Arunta, utters the word “gavagai”, we might guess that this word in Arunta corresponds to the word “rabbit” in English. But there are countless other things that the Arunta speaker might have been referring to. And although over time we may be able to eliminate some of these possibilities, we can never be sure we have correctly interpreted the meaning of the word “gavagai”. Quine called this the inscrutability of reference. Similar indeterminacy would seem to apply to our collection of images.
The second problem has to do with the nature of classification. I have talked about this in previous posts – for example on Algorithms and Governmentality – so I won’t repeat all that here.
Instead, I want to jump to the third and final problem, arising from the phrase “the entire world of objects” – what does this really mean? How many objects are there in the entire world, and is it even a finite number? We can’t count objects unless we can agree what counts as an object. What are the implications of what is included in “everything” and what is not?
I occasionally run professional workshops in data modelling. One of the exercises I use is to display a photograph and ask the students to model all the objects they can see in the picture. Students who are new to modelling can always produce a simple model, while more advanced students can produce much more sophisticated models. There doesn’t seem to be any limit to how many objects people can see in my picture.
ImageNet boasts 14 million images, but that doesn’t seem a particularly large number from a big data perspective. For example, I guess there must be around a billion dogs in the world – so how many words and images do you need to represent a billion dogs?
Bruhl found some languages full of detail
Words that half mimic action; but
generalization is beyond them, a white dog is
not, let us say, a dog like a black dog.
Ezra Pound, Canto XXVIII
Kate Crawford and Trevor Paglen, Excavating AI: The Politics of Images in Machine Learning Training Sets (19 September 2019)
Mary Douglas, Purity and Danger (1966)
Dave Gershgorn, The data that transformed AI research—and possibly the world (Quartz, 26 July 2017)
Donna Haraway, Situated Knowledges (Feminist Studies 14/3, 1988) pp 575-99
Vinay Uday Prabhu and Abeba Birhane, Large Image Datasets: A pyrrhic win for computer vision? (Preprint, 1 July 2020)
Katyanna Quach, MIT apologizes, permanently pulls offline huge dataset that taught AI systems to use racist, misogynistic slurs: Top uni takes action after El Reg highlights concerns by academics (The Register, 1 July 2020)
W.V.O. Quine, Word and Object (MIT Press, 1960)
Related posts: Co-Production of Data and Knowledge (November 2012), Have you got big data in your underwear (December 2014), Affective Computing (March 2019), Algorithms and Governmentality (July 2019), Limitations of Machine Learning (July 2020)