TECH | Jul 24, 2019

Artificial Intelligence: towards new transparency requirements?

Open source makes code open. But is it really possible to understand (and replicate) Artificial Intelligence algorithms?

I remember the day my parents bought a new refrigerator. I think I was fifteen or sixteen years old. Behind the appliance, the coil was wrapped in a membrane bearing the incomprehensible inscription “cyclo-isopentane”. It turned out to be a new substance, needed to thermally insulate the heat exchange driven by the “refrigerant fluid”. But why was there fluid in my refrigerator? It was then that I realized that I had no idea how a freezer worked, or what exactly my TV contained. And seeing how difficult it was to find answers to my questions, I soon realized that my ignorance was no exception: it was the norm.

A new need: to understand how things work

For decades, much of the Western world lived happily without bothering to know how its refrigerators, dishwashers, computers, washing machines, or cars worked. Yet in recent years, rising levels of education and the increasing pervasiveness of new technologies have brought out an unprecedented need to understand “why” things happen.

Thus, not a day goes by without articles calling for definitive clarity and transparency in the world of algorithms. It is framed as a problem of democracy, and in many cases it really is. As a solution, however, in the name of Open Source we hope for the publication of algorithms’ source code, on the assumption that transparency is synonymous with reproducibility, and therefore with scientific knowledge and new opportunities for all. This is where the problems begin.

For many years, having the source code of a program guaranteed complete mastery of the logic that determined its results, expressed as sequences of instructions and fixed rules. As if to say: if everyone could access the source code of their dishwasher’s washing programs, they would be able to predict their duration without having to read the instruction manual or turn on the machine, and they could even alter the characteristics and duration of a cycle.
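
To make the contrast concrete, here is a minimal sketch of such a fixed-rule program (a hypothetical dishwasher, in plain Python; not an example from the article), whose behavior can be read straight off the code:

```python
# A fixed-rule "washing program": every behavior is written in the code itself.
PROGRAMS = {
    "eco":       {"temperature_c": 50, "duration_min": 180},
    "intensive": {"temperature_c": 70, "duration_min": 120},
    "quick":     {"temperature_c": 45, "duration_min": 30},
}

def cycle_duration(program: str) -> int:
    # Anyone reading this source can predict the result without running it.
    return PROGRAMS[program]["duration_min"]

print(cycle_duration("eco"))  # 180, always, for everyone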

Code transparency and Machine Learning

In Machine Learning, by contrast, the software learns patterns from data, and those patterns determine its choices and behavior. The software is still encoded in a precise language, as lists of instructions, but its output depends on a large number of parameters (the “Model”), learned from data in ways that are difficult to predict. Let’s imagine that the level of dirt on the plates determined how my dishwasher worked, based on the thousands (or millions) of parameters learned during previous washes. Even knowing the source code of my dishwasher, it would be extremely difficult to predict, or intervene on, the duration of a program before a dinner with friends.

In Machine Learning, knowing the source code of an algorithm does not by itself guarantee that the software’s outputs can be reproduced: one also needs complete knowledge of the starting conditions, i.e. the so-called “training data” or, better, the parameters of the “Model”.
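
A minimal sketch makes the point concrete (plain Python with NumPy; the data are entirely hypothetical and not from the article). The training code is identical in both runs, yet the two learned “Models” answer the same question differently:

```python
import numpy as np

def train(X, y):
    # Ordinary least squares: the code is fixed, but the learned
    # parameters (the "Model") depend entirely on the training data.
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def predict(w, x):
    return x @ w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))

# Two different training histories -> two different models.
y_a = X @ np.array([1.0, 0.0, -1.0]) + rng.normal(scale=0.1, size=100)
y_b = X @ np.array([0.0, 2.0, 0.5]) + rng.normal(scale=0.1, size=100)

w_a = train(X, y_a)
w_b = train(X, y_b)

x_new = np.array([1.0, 1.0, 1.0])
print(predict(w_a, x_new))  # differs from the next line,
print(predict(w_b, x_new))  # even though the source code is identical
```

Publishing `train` and `predict` tells you nothing about which answer you will get: that is decided by the data (or by the learned parameters `w`), which the source code alone does not contain.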

GitHub, the world’s most famous site for hosting free-software repositories, has for some years been surveying the most “active” Open Source projects, analyzing their contributors, size, and growth (the Octoverse Report). Among the projects reported, two Artificial Intelligence platforms, TensorFlow and PyTorch, stand out; the first is distributed by Google itself. And it is exciting to learn that github.com hosts a proliferation of very powerful free Artificial Intelligence algorithms: for recognizing images, for artificially generating faces and panoramas, for transcribing sounds and recognizing songs, for analyzing and generating lyrics.
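
As a small illustration of how the published code and the learned “Model” are two separate things, here is a minimal PyTorch sketch (assuming torchvision 0.13 or later for the `weights=` argument; the input tensor stands in for a real image):

```python
import torch
from torchvision import models

# The architecture (the network's "source code") is public and identical
# in both cases; only the parameters differ.
untrained = models.resnet18(weights=None)  # randomly initialized parameters
pretrained = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # learned parameters, downloaded on first use

untrained.eval()
pretrained.eval()

x = torch.randn(1, 3, 224, 224)  # a stand-in for a real image
with torch.no_grad():
    print(untrained(x).argmax(dim=1).item())   # essentially a random class
    print(pretrained(x).argmax(dim=1).item())  # a class determined by the "Model"
```

Both objects run the same open-source code; what separates a toy from a useful classifier is a file of millions of learned parameters, about which the code alone says nothing.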

Is this the victory of free software?

Partly, yes. On the one hand, there has never been an era in which anyone could reuse free software produced and used by large and prestigious American universities (Stanford, Princeton, Brown, etc.), intelligence agencies (NSA, NGA, etc.), or multinational IT companies (Microsoft, Google, etc.), and even modify its source code. It is without doubt the sign of a paradigm shift in the world of software, one that has given an ever wider public access to increasingly complex algorithms, that is, to the basic “building blocks” of Artificial Intelligence.

On the other hand, it is increasingly evident that free software alone is not enough to reproduce the amazing results we know as consumers of Artificial Intelligence, for at least two reasons. First, it is virtually impossible to access all the data needed to reproduce its performance. Second, AI software is built from a large number of orchestrated algorithms that mathematically optimize the value of the output. This creates a dependence between the logic of the software and the nature of the input, which further complicates the understanding of the processes and makes their reuse increasingly difficult. Moreover, these logics generally do not describe sequences of simple operations, but operations that are hard to translate outside the mathematical context in which they were created, and that are therefore almost inaccessible to non-experts.
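
To give a flavor of what “mathematically optimizing the value of the output” means in practice, here is a minimal sketch (plain Python with NumPy, hypothetical data) of gradient descent, the optimization routine at the heart of most Deep Learning systems:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))                            # hypothetical inputs
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

w = np.zeros(2)   # the parameters to be learned
lr = 0.1          # learning rate

for step in range(500):
    pred = X @ w
    grad = 2 * X.T @ (pred - y) / len(y)  # gradient of the mean squared error
    w -= lr * grad                        # step towards lower error

print(w)  # close to [3.0, -2.0]: the logic of the result lives in w, not in the code
```

Even this ten-line loop shows the dependence between the logic of the software and the nature of the input: the final value of `w`, and hence every future prediction, is determined by the data as much as by the code. Real AI systems chain many such optimizations together, which is why reading the source rarely explains the behavior.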

And the transparency of algorithms?

Given these premises, a question arises: what exactly do we mean by the “transparency” of algorithms? Would it really be enough to know the nature of the software to dispel the feeling of opacity towards choices that (it seems to us) others make for us?

Artificial Intelligence and the growing complexity of the world pose new challenges, which require us to better understand ourselves, our culture, the reasons for our purchasing choices and, sometimes, even our political choices. There are at least two ways to meet this challenge: as consumers, by watching over institutions so as to promote practices and regulations that protect the right to knowledge and access to data; and as citizens, by cultivating an education that trains us to analyze complexity through the countless intellectual tools that History has left us.

This will probably not be enough to make everyone understand the complex logic of Deep Learning, but it will reduce the very real risk of a definitive transformation from citizens into mere consumers.

Michele Gabusi