10 months, 4 days ago

Buckets and Balls

Link: https://www.strategicstructures.com/?p=1889

Linked Data is still largely unknown, or misunderstood and undervalued. Often, people find it simply too difficult. So I keep looking for new ways to make Linked Data more accessible, and with some success. In my training courses so far, over 60% of the participants had no IT background. I even hope to increase this percentage in the future.

What seems to be most challenging is writing SPARQL queries. The specification is written for IT people. There are some great courses and books, but they also target people with at least some IT experience. If anything, that scares the rest away and keeps SPARQL from the masses.

I keep learning what is challenging. A recurring problem, and an unexpected one, is the concept of a variable.

What is a variable in SPARQL? Just a placeholder. But how can you imagine a placeholder? It’s abstract. We have no way of grasping abstract things unless we associate them with something physical and concrete. It’s difficult to imagine time, but once we draw it in space it gets easier. We can’t picture furniture, but we have no problem with a chair.

The other issue is how a SPARQL query looks. While working with SPARQL helps to understand how a knowledge graph works, a SPARQL query doesn’t look like one. It is like symbols in mathematics: “5 doesn’t look like five, while ||||| is five”. The problem with SPARQL is similar:

You want to query a knowledge graph.
You want to learn new things.
But your query doesn’t look like a knowledge graph.
It looks like lines of strings.

So, how can we tackle both problems together: grasping variables and the look of SPARQL?

My suggestion is to imagine every SPARQL query as a graph of linked buckets and balls.

Variables are placeholders, but placeholders are abstract. We need a physical container to fill with things. We need buckets. And nodes are like balls. So, think of running a query as filling buckets with balls.

A graph pattern then will look like this:

A bucket ?A should be filled with those balls which have a relation R to ball B.

But it looks nicer when we abbreviate it like this:

This is a graph pattern in Buckets’n’Balls notation. The direction of the relation R is not shown but it’s always from left to right.
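For comparison, here is how the same pattern might look as SPARQL text. This is only a minimal sketch: the ex: prefix and the names R and B are placeholders made up for illustration, not real identifiers.

```sparql
# Hypothetical prefix, used only for this illustration
PREFIX ex: <http://example.org/>

# The bucket we want to fill
SELECT ?A
WHERE {
  # Fill ?A with every ball that has relation R to ball B
  ?A ex:R ex:B .
}
```

The bucket is the name starting with a question mark; the relation and the ball are written with their identifiers.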

The process of writing and running a SPARQL query would then go through the following steps:

  1. Select your buckets (in them you are going to gather the balls you want).
  2. Compose your conditions as a graph of buckets and balls.
  3. Run your query to fill your buckets with balls.

Now let’s write a query to get all heads of government of the member states of the European Union, following these steps.

  1. We need two buckets, one for member states, one for heads of government.
  2. The bucket for member states should be linked to the ball “European Union” with the relation “member of”; and it should be linked to the bucket for “Head of Government” with, well, the relation “head of government” which, respecting the direction, should be thought of as “has head of government”.

Let’s make it more interesting and add another bucket for images. Here is the query now in Buckets’n’Balls notation:

When we run the query, our buckets should, hopefully, look like this:



Now let’s go to Wikidata and write this query there.

Start by selecting the buckets to fill in.

SELECT DISTINCT ?EUcountry ?headOfgovernment ?image

Next we need to write the conditions to be met by the balls to fill the buckets with. All conditions in SPARQL are written within curly brackets {}.

When we want a specific relation or a ball, we need to use its identifier. Wikidata provides a nice service that writes the identifier for you, once you select the relation (property) or the ball (item) by its label. The common part of the direct relations in the knowledge graph of Wikidata is abbreviated with wdt, and of the items (our balls) with wd.

Following our Buckets’n’Balls drawing, we start writing the conditions by putting the first bucket, ?EUcountry. Then we need to write the first relation. Since the common part of the identifiers of direct relations is wdt, we write wdt: and then Ctrl+Space to trigger the Wikidata autocomplete service.
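At this point, with only the first condition in place, the query might look like the sketch below. It assumes the identifiers used in the queries in this post (P463 for “member of” and Q458 for the European Union) and relies on the wd: and wdt: prefixes being predefined in the Wikidata query editor.

```sparql
# Only one bucket so far
SELECT DISTINCT ?EUcountry
WHERE {
  # Fill ?EUcountry with the balls that are "member of" the ball "European Union"
  ?EUcountry wdt:P463 wd:Q458 .
}
```

Running it at this stage should fill just the one bucket, which is a handy way to check each condition before adding the next.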

Following this, we reproduce the model from the Buckets’n’Balls drawing into an actual SPARQL query.

When writing a SPARQL query, the common way is to align it so that it resembles a table with three columns: subject, predicate, object, or, in the language of Wikidata, item, property, value.
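As an illustration of that table-like alignment, here is a sketch of the query from our drawing with one triple pattern per row, using the same identifiers as elsewhere in this post:

```sparql
SELECT DISTINCT ?EUcountry ?headOfgovernment ?image
WHERE {
  # subject (item)    predicate (property)    object (value)
  ?EUcountry          wdt:P463                wd:Q458 .
  ?EUcountry          wdt:P6                  ?headOfgovernment .
  ?headOfgovernment   wdt:P18                 ?image .
}
```

Each row reads as a line in a table, and the link between the rows is visible only through the repeated bucket names.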

But that means reinforcing thinking in tables. So maybe it would be better, at least for the first queries you write, to make them resemble a graph. Putting a bucket as the subject of a new graph pattern (aka triple pattern) below the same bucket when it is the object of a previous triple pattern makes it look more like the graph from the Buckets’n’Balls drawing above. This way our query will look like this:

SELECT DISTINCT ?EUcountry ?headOfgovernment ?image
WHERE {
  ?EUcountry wdt:P463 wd:Q458;
             wdt:P6  ?headOfgovernment .
                     ?headOfgovernment  wdt:P18 ?image .
}


Now see how this query looks in Wikidata and run it. Then click on the eye icon on the left and select to view the results as Image Grid.

That’s not bad as a first result but now, on each image, we see only the IDs of the persons and of their countries. If we click on an ID, we’ll get plenty of information about that item. But it would be nicer if, apart from the IDs, we could see their labels in the query result. This basically means putting labels on our buckets.

The label is something each ball is linked to, but Wikidata provides it as a service so you don’t have to think about it. All you need to do is add the word “Label” to the variables and invoke the label service. The latter you do by pressing Ctrl+Space again in a new line inside {}, and when you start typing “label”, the labelling service will appear. By default, you get it with the language of the interface, and with English as an alternative if the label is not available in the language of the chosen Wikidata interface. Wikidata is full of such nice services, and for the final query we’ll use one more. To get the result as an image grid by default, put the following comment somewhere in your query:

#defaultView:ImageGrid

In fact you don’t need to write it all. When you start typing, the autocomplete service will suggest it.

Our final query looks like this:

#defaultView:ImageGrid
SELECT DISTINCT ?EUcountry ?headOfgovernment ?EUcountryLabel
?headOfgovernmentLabel ?image
WHERE {
  ?EUcountry wdt:P463 wd:Q458;
             wdt:P6  ?headOfgovernment .
                     ?headOfgovernment  wdt:P18 ?image .

  SERVICE wikibase:label
  { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}



Try it.

I hope that thinking of SPARQL queries as linked buckets and balls can be helpful, at least in the beginning. And of course, each metaphor has limitations. For example, you can’t put the same ball in two different real buckets, but in these virtual ones you can. Buckets’n’balls can be a useful ladder. Once you have climbed up, you can throw it away.

When reading the query result you may wonder why the UK appears, since it’s no longer in the EU, and why the Netherlands also appears as the Kingdom of the Netherlands. More importantly, you may wonder how to improve the query to get better results. This will probably come in another post.

Before you go, let me share with you a nice query I found recently, which also brings back an image grid of portraits as a result. This one, however, is not for prime ministers but for all ministers, and not only current ones but of all time. The link below will run it for France, but you can select another country. Just press “edit visually” on the right:

What I find particularly nice is that when you scroll down, it feels like travelling back in time.