Thursday, 12 March 2015

What does it mean that a software system is fragile, robust, or antifragile? Considerations, ideas, and examples

What follows is my answer to the second question of the Webinar on Antifragility, "Antifragility Webinars: Practice Beyond the Rhetoric!", which I mentioned in my previous post:
How do I envision, and how am I actually translating Professor Taleb's antifragility into practice?

To answer this question, I first need to say a few words about systems made of software.

Software is becoming more and more complex, because it is becoming easier and easier to create complex applications from off-the-shelf components. Complexity can be easily manipulated, combined, and recombined into ever more powerful software systems. On the other hand, such systems become more and more fragile, as there is often little or no guarantee about the quality of the "bricks" used to build them.
Are those bricks robust? Are they fragile? Antifragile? There's no easy way to tell and no standard that helps at the moment.
So what we have today are gigantic sandcastles, precariously built: their solidity and stability depend on a chain of assumptions. A assumes B is going to be reliable and available, B assumes C and D will work as expected, and so on and so forth.
(Regrettably, this chain of dependencies extends beyond software: today's critical infrastructures are based on the same principle and share the same weakness.)
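A back-of-the-envelope calculation shows why such chains are fragile. Assuming (a strong simplification) that every link is needed and that links fail independently, the availability of the whole chain is the product of the availabilities of its links. The 99% figure below is purely hypothetical.

```python
import math
from typing import Iterable


def chain_availability(availabilities: Iterable[float]) -> float:
    """Availability of a series chain A -> B -> C ... in which every link is required.

    Assumes independent failures, so the individual probabilities simply multiply.
    """
    return math.prod(availabilities, start=1.0)


# Hypothetical: every component in the chain is available 99% of the time.
for links in (10, 50, 100):
    print(f"{links} links at 99% each: {chain_availability([0.99] * links):.3f}")
```

With these numbers the chain is up roughly 90%, 61%, and 37% of the time: each brick looks solid, yet the castle does not.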
OK, so what do you do to prevent failures? The typical answer is to use redundant resources. Instead of a single component, you use several replicas. If one fails, you use another one. Or you use them all at once and select the output according to some criterion, for instance a voting scheme: if a majority of the replicas agree, you assume the majority is right.
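As an illustration, the voting variant can be sketched as a generic majority voter over the replicas' outputs. This is a textbook voter, not any particular product's API; representing a replica that failed to answer as None is an assumption of the sketch.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(outputs: Sequence[Optional[Hashable]]) -> Optional[Hashable]:
    """Return the value produced by a strict majority of all replicas, or None.

    `outputs` holds one result per replica; None stands for a replica that
    did not answer (a convention assumed for this sketch).
    """
    answers = [o for o in outputs if o is not None]
    if not answers:
        return None
    value, count = Counter(answers).most_common(1)[0]
    # Strict majority of *all* replicas, not just of those that answered.
    return value if count > len(outputs) // 2 else None


print(majority_vote(["go", "go", "retreat"]))  # go
print(majority_vote(["go", None, "retreat"]))  # None: no majority
```

Counting the majority over all replicas, rather than only over those that answered, is a conservative choice: silent replicas cannot help a minority win.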

The key word here is redundancy. To better understand what this word means, let me describe a videogame.
You play General Grant. You want to send an important message to part of your troops, so that they are informed of the next steps in your war strategy. The message has to cross a battlefield within your enemy's sphere of action. What do you decide to do?
One possibility is to send a single cavalryman with your message. Of course, the carrier of your message may be hit; in other words, this is a fragile scheme.
Grant knows better, so he sends several cavalrymen in the hope that at least one will reach the destination. For instance, he may choose to send three. This is a better scheme, because it tolerates up to two failures. On the other hand, it does not take into account how the situation evolves on the battlefield. You use three cavalrymen because you think this number is big enough, but that judgment rests on an estimate of the current conditions, and conditions may vary. Say the enemy doubles in number, or is joined by an artillery team that considerably increases its firepower. What then? The three cavalrymen may all be wiped out and the message lost. Compared to sending a single cavalryman, this second scheme is much more robust; yet it is not at all sufficient to counterbalance changing conditions: conditions that mutate, possibly unexpectedly and possibly very rapidly. The technical term typically used for such settings is turbulent environments. A simple robust scheme is one that "does not care too much" [as Prof. Taleb says] about the evolution of its environment, and because of this often ends up caring too little.
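To put rough numbers on this, here is a toy calculation. It assumes that each cavalryman is hit independently with the same probability p_hit; the message is then lost only if all riders are hit, so the loss probability is p_hit raised to the number of riders. The three hit probabilities are hypothetical.

```python
def loss_probability(p_hit: float, riders: int) -> float:
    """Probability that the message is lost, i.e. that every rider is hit.

    Assumes each rider is hit independently with probability p_hit.
    """
    return p_hit ** riders


# Hypothetical hit probabilities: a quiet field, a contested one,
# and one where the enemy has been reinforced with artillery.
for p_hit in (0.2, 0.5, 0.8):
    print(f"p_hit={p_hit}: 1 rider -> {loss_probability(p_hit, 1):.3f}, "
          f"3 riders -> {loss_probability(p_hit, 3):.3f}")
```

In the quiet case, three riders cut the loss probability from 0.2 to 0.008; once the hit probability climbs to 0.8, the same fixed scheme loses the message about half the time (0.512). Positively correlated hits, such as a single artillery barrage, would typically make things worse than this independent model suggests.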

What then? Well, if one could track the environment (e.g. the enemy's firepower) and how well our current scheme matches it (basically, how many cavalrymen are left at any point in time), then one could have a more robust scheme: one that is resilient, namely adaptive to changing conditions. New cavalrymen could be added when conditions become dire, and their number could even be reduced when conditions become more relaxed.
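Such an adaptive scheme might be sketched as follows, under the same independent-hit assumption as before. The dispatcher estimates the current hit probability from the outcomes of recent riders and sends the smallest number of riders that keeps the loss probability under a target. The target (1%), the observation window (50 riders), and the cap (30 riders) are hypothetical parameters.

```python
from collections import deque


def riders_needed(p_hit: float, target_loss: float, n_min: int = 1, n_max: int = 30) -> int:
    """Smallest n in [n_min, n_max] with p_hit ** n <= target_loss (independent hits assumed)."""
    n = n_min
    while p_hit ** n > target_loss and n < n_max:
        n += 1
    return n


class AdaptiveDispatcher:
    """Resizes the group of riders according to what it observes on the battlefield."""

    def __init__(self, target_loss: float = 0.01, window: int = 50):
        self.target_loss = target_loss        # hypothetical acceptable loss probability
        self.outcomes = deque(maxlen=window)  # recent rider outcomes: True = hit

    def record(self, hit: bool) -> None:
        self.outcomes.append(hit)

    def hit_estimate(self) -> float:
        # Laplace-smoothed hit frequency over the window (0.5 when nothing is known yet).
        return (sum(self.outcomes) + 1) / (len(self.outcomes) + 2)

    def riders_for_next_message(self) -> int:
        return riders_needed(self.hit_estimate(), self.target_loss)


dispatcher = AdaptiveDispatcher()
for _ in range(48):
    dispatcher.record(False)                 # calm: nobody gets hit
print(dispatcher.riders_for_next_message())  # -> 2
for _ in range(50):
    dispatcher.record(True)                  # the enemy brings in artillery
print(dispatcher.riders_for_next_message())  # -> 30, the cap
```

Note that the dispatcher forgets everything outside its window and never changes its own rule: in the terms used here, it is resilient but not antifragile.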

But this is still not antifragile. In fact, the system stays the same: each time you face the problem, you launch the same solution, a solution that is not changed by the experience. What we are trying to do is to change this: to change the software "DNA" after each "run", taking the past runs into account.
In practice, we use web services (representing our cavalrymen). The system tracks the performance of our group of cavalrymen considering both "the parts" and "the whole": each individual "cavalryman" is tracked (one checks whether, and to what extent, he is loyal and trustworthy), and the ability of the overall group is tracked as well (how close we are to failures and disasters over time).
(For more information about the above schemes, and especially about Distance-To-Failure, please refer to this and this paper.)
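The following is a simplified sketch of tracking both levels around a majority vote. It is a stand-in for illustration, not the Distance-To-Failure formulation of the papers mentioned above. The assumptions are:

* "Trust" in a replica is an exponentially weighted average of how often it agreed with the winning majority.
* The distance to failure of a round is how many more replicas would have to defect from the largest agreeing group before it stops being a strict majority.

The replica names and the smoothing factor are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Optional, Tuple


class VotingMonitor:
    """Tracks 'the parts' (a trust score per replica) and 'the whole'
    (a simple distance-to-failure of the majority vote)."""

    def __init__(self, replica_ids: Iterable[str], alpha: float = 0.2):
        self.alpha = alpha  # hypothetical smoothing factor
        self.trust: Dict[str, float] = {r: 1.0 for r in replica_ids}

    def observe(self, outputs: Dict[str, Optional[Hashable]]) -> Tuple[Optional[Hashable], int]:
        """Vote on one round of outputs; return (winner, distance_to_failure)."""
        n = len(self.trust)
        counts = Counter(v for v in outputs.values() if v is not None)
        value, k = counts.most_common(1)[0] if counts else (None, 0)
        # k - n // 2 = how many more replicas must defect from the largest
        # agreeing group before it is no longer a strict majority;
        # a value <= 0 means the vote has already failed.
        dtof = k - n // 2
        winner = value if dtof > 0 else None
        if winner is not None:  # without a majority we cannot tell who was right
            for r in self.trust:
                agreed = 1.0 if outputs.get(r) == winner else 0.0
                # Exponentially weighted moving average of "agreed with the majority"
                self.trust[r] = (1 - self.alpha) * self.trust[r] + self.alpha * agreed
        return winner, dtof


monitor = VotingMonitor(["a", "b", "c"])
print(monitor.observe({"a": "go", "b": "go", "c": "retreat"}))  # ('go', 1)
```

A distance to failure of 1 says that a single further defection would leave the group without a majority: the kind of "how close are we to disaster" signal the text describes.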
When performance is not satisfactory, the scheme is revised: not just the number of cavalrymen, but even the choice of which "cavalryman" to use is constantly revised.
(For more information, please refer to this and this paper.)
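Revising both the number and the identity of the replicas could look like the toy policy below. The inputs are a dictionary of per-replica trust scores and a list of recent distance-to-failure values, however they were obtained. The thresholds, the step size of two, and the ranking rule are hypothetical; this is not the mechanism described in the referenced papers.

```python
from statistics import mean
from typing import Dict, List, Sequence


def revise_selection(trust: Dict[str, float], recent_dtof: Sequence[float], n: int,
                     low: float = 1.0, high: float = 2.0, n_min: int = 3) -> List[str]:
    """Choose which replicas to use, and how many, for the next run.

    Hypothetical policy: if the group has recently been close to failure
    (mean distance-to-failure below `low`), add two replicas; if it has been
    comfortably safe (mean above `high`), drop two. Steps of two preserve the
    parity of n, so an odd group stays odd unless the pool size caps it.
    """
    if recent_dtof:
        avg = mean(recent_dtof)
        if avg < low:
            n += 2
        elif avg > high and n - 2 >= n_min:
            n -= 2
    n = min(n, len(trust))
    # Prefer the most trusted replicas; ties are broken by name for determinism.
    ranked = sorted(trust, key=lambda r: (-trust[r], r))
    return ranked[:n]


trust = {"a": 0.9, "b": 0.4, "c": 0.95, "d": 0.7, "e": 0.8}
print(revise_selection(trust, [0, 1, 0], n=3))  # close to failure: grow to 5 replicas
print(revise_selection(trust, [3, 3, 3], n=5))  # comfortably safe: shrink to the 3 most trusted
```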
Our next steps will be to include machine learning schemes that tell which solution works better and best matches the foreseen next conditions. We want this match to feed back on the solution itself and to be persisted in future runs. In other words, we want to change the "genetic code" of the solution. For instance, instead of individual cavalrymen (web services), we could learn that the scheme works well with teams of cavalrymen (web service groups). Such teams could work as a specialized "organism", with different roles within each group. Instead of working independently of one another, those teams could... team up into a fractal organization of web services functioning as a Fractal Social Organization.
(For more information about Fractal Social Organizations, please have a look at my ERACLIOS posts [a, b, c] and the papers here and here.)

Creative Commons License
What does it mean that a software system is fragile, robust, or antifragile? Considerations, ideas, and examples by Vincenzo De Florio is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.
Permissions beyond the scope of this license may be available at

Wednesday, 11 March 2015

A Behavioral Interpretation of Antifragility

I have recently been invited to participate in a Webinar on Antifragility: "Antifragility Webinars: Practice Beyond the Rhetoric!"

Quoting from the referenced site, on March 18, 2015, Dr. Russ Miles and Dr. Si Alhir
"will host a panel of practitioners to explore:
  • How these practitioners have interpreted Taleb’s concept of Antifragility,
  • How these practitioners have translated their interpretation into practice, and
  • The results and impacts of their efforts — Practice Beyond the Rhetoric!"

Here I would like to share with you my answer to the first question. If time allows, I will address the other two questions in the coming days.

So, how have I interpreted Professor Taleb's antifragility?

Mine is a behavioral interpretation, meaning that the focus is not on the way a system is structured and constructed, but rather on the way it responds to change [cf. the work of Wiener and others that led to the concept of cybernetics].
This approach focuses on systems and their output, regardless of the nature of those systems. Therefore, it applies to biological systems ("beings"), to artificial systems (cyber-physical "things"...), and also to collective systems made of beings and things.
Wiener and others used behavior to characterize all types of systems, that is, to tell how smart a system was when facing change. In a famous paper, "Behavior, Purpose and Teleology" (Rosenblueth, Wiener, and Bigelow, 1943), they distinguish systems according to their behaviors; they identify
  • systems that react with no concern about the situation (elastic systems)
  • systems that check what's going on and try to adjust to it (adaptive systems)
  • and systems that keep track of what's going on and try to "tell the future", anticipating conditions that could be black swans or maybe gold swans (predictive / extrapolatory systems).
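A toy illustration may help tell the three behaviors apart. Each strategy below decides how much to provision for the next step, seeing only the threat observed so far. The labels come from the classification above; the strategies, the fixed amount of 3.0, and the steadily rising threat are hypothetical.

```python
from typing import Callable, Sequence

Strategy = Callable[[Sequence[float]], float]


def elastic(history: Sequence[float]) -> float:
    """Reacts with no concern about the situation: always provisions the same amount."""
    return 3.0  # hypothetical fixed provisioning


def adaptive(history: Sequence[float]) -> float:
    """Checks what is going on and adjusts to it: matches the latest observation."""
    return history[-1] if history else 3.0


def predictive(history: Sequence[float]) -> float:
    """Tries to 'tell the future': extrapolates the trend of the last two observations."""
    if len(history) < 2:
        return adaptive(history)
    return history[-1] + (history[-1] - history[-2])


def total_shortfall(strategy: Strategy, threat: Sequence[float]) -> float:
    """Sum over steps of how far the provisioning fell short of the actual threat.
    At step t the strategy only sees threat[:t]."""
    return sum(max(0.0, threat[t] - strategy(threat[:t])) for t in range(1, len(threat)))


rising_threat = [3.0, 4.0, 5.0, 6.0, 7.0]  # hypothetical, steadily worsening conditions
for strategy in (elastic, adaptive, predictive):
    print(strategy.__name__, total_shortfall(strategy, rising_threat))
# elastic 10.0, adaptive 4.0, predictive 1.0
```

The predictive strategy wins here only because the threat grows linearly; on a threat that suddenly reverses, naive extrapolation can overshoot. Note also that none of the three strategies changes its own rule as a result of experience.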
Obviously, if you consider resilience, the above classification is somewhat in line with Prof. Taleb's vision of fragile, robust, and antifragile systems. I say "somewhat" because Wiener & co. did not take into account the effect over time of facing change: the genetic feedback produced by experience. In other words, a system's evolvability.

If we want to extend the behavioral classification with Prof. Taleb's antifragile systems, we have to consider an extra dimension. I call this dimension that of evolving feedback behaviors (EFB): behaviors, that is, that leave a trace in the system and actually modify it. It is important to understand that such systems do not preserve the "self", that is, their identity. It is more difficult to make sure that such systems "stay the same", namely that they comply with their specifications, behave as expected, and so forth. Is this a big problem? Yes, it is. Quoting Professor Hawking, an artificial system that can evolve

"would take off on its own, and re-design itself at an ever increasing rate. [..] Humans, who are limited by slow biological evolution, couldn't compete, and would be superseded." (cf. my post "What System Is The Most Resilient?")

Antifragile behaviors may be considered a particular type of EFB: one in which the self-modification improves the system-environment fit, making it more probable for the system to survive in the current (or the hypothesized future) environment. (Note that being able to improve one's system-environment fit has nothing to do with guaranteeing that what the system does is "right". In other words, special care must be taken to make sure that the drift in system identity associated with antifragile behaviors does not translate into "dangerous" or counterproductive behaviors. Some form of safety-enforcing invariants should probably be embedded into systems with antifragile behaviors, cf. Asimov's Laws of Robotics.)
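The two ingredients named above can be made concrete with a toy sketch:

* a self-modification that is kept only when it improves the system-environment fit,
* and an invariant that can veto it even then.

The "genome" (a single redundancy degree), the fit function, and the invariant are all hypothetical. The sketch is not a recipe for building antifragile systems.

```python
from dataclasses import dataclass, field
from typing import Callable, List


def safety_invariant(genome: int) -> bool:
    """Hypothetical invariant: redundancy must stay odd and between 3 and 9."""
    return 3 <= genome <= 9 and genome % 2 == 1


@dataclass
class EvolvingSystem:
    """Toy system whose 'genome' (here just a redundancy degree) is rewritten by experience."""
    genome: int = 3
    history: List[int] = field(default_factory=list)  # trace left by past self-modifications

    def try_modify(self, proposal: int, fit: Callable[[int], float]) -> bool:
        """Adopt `proposal` only if it respects the invariant AND improves the fit."""
        if not safety_invariant(proposal):
            return False  # a better fit alone is not enough
        if fit(proposal) <= fit(self.genome):
            return False  # no improvement: keep the current self
        self.history.append(self.genome)
        self.genome = proposal  # the modification stays in the system
        return True


def make_fit(p_hit: float, cost: float) -> Callable[[int], float]:
    """Hypothetical fit: penalize message loss (independent hits) plus a cost per rider."""
    return lambda g: -(p_hit ** g) - cost * g


system = EvolvingSystem()
print(system.try_modify(5, make_fit(0.5, 0.01)), system.genome)  # True 5

harsh = make_fit(0.9, 0.001)
veteran = EvolvingSystem(genome=9)
print(veteran.try_modify(11, harsh), harsh(11) > harsh(9))  # False True: fitter, but unsafe
```

The second call shows the point of the invariant: the proposed change would improve the fit, yet it is rejected because it leaves the region the designer considered safe.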
A Behavioral Interpretation of Antifragility by Vincenzo De Florio is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.