The following is the text of my original letter sent January 7, 2004 to Jon Udell at InfoWorld
(firstname.lastname@example.org) in follow-up to "A Tale of Two Cultures", InfoWorld, Dec 31, 2003
Hello Jon -
I sincerely believe that you could not be more right about the convergence of traditions between
Unix scripting and Xerox/Apple/Microsoft windowing.
The comment about Unix suffering from multiple legacy "mini-parsers" is well taken, even in the
view of this Unix sympathizer. And as discussed later, I also subscribe to the notion that the GUI is
a side effect of the application, not the other way around as treated in the first generation of
commercial windowing systems basically left over from the 1980s. But I would suggest taking your
analysis a bold step further: the change is much more profound than the intersection of two
cultures alone; it goes to the core of what distributed systems mean and how they are built.
What is happening is no less than a major aspect-oriented interweaving of the three elements that
have existed since von Neumann's 1946 paper pretty much defined the modern computer. While
von Neumann et al introduced the now-standard model of memory, which freely intermixes
program and data, very little use has been made of self-modifying software, and almost none
commercially. For nearly fifty years, data and executable code have been rigidly separated in all
commonly accepted methodologies. One consequence of this separation is the notorious
"impedance mis-match" between the currently dominant 00 programming model and the currently
very dominant Relational Data Model.
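The mismatch above can be seen in miniature in a sketch of my own (none of these names come from the letter): an object naturally holds its children as a nested graph, while a relational store forces the same information into flat tables joined by key, so every boundary crossing pays a flatten/reassemble tax.

```python
# Illustrative sketch of the OO/relational impedance mismatch.
# An Order holds its line items as a nested object graph; a relational
# store demands two flat tables joined on order_id.

class LineItem:
    def __init__(self, sku, qty):
        self.sku = sku
        self.qty = qty

class Order:
    def __init__(self, order_id, items):
        self.order_id = order_id
        self.items = items                  # nested object graph

def to_rows(order):
    """Flatten the object graph into relational-style tuples."""
    orders = [(order.order_id,)]
    items = [(order.order_id, it.sku, it.qty) for it in order.items]
    return orders, items

def from_rows(orders, items):
    """Reassemble the object graph from the two 'tables'."""
    (order_id,) = orders[0]
    return Order(order_id, [LineItem(sku, qty)
                            for oid, sku, qty in items if oid == order_id])

order = Order(42, [LineItem("A1", 2), LineItem("B7", 1)])
tables = to_rows(order)
rebuilt = from_rows(*tables)
```

The translation code in `to_rows`/`from_rows` is exactly the layer that object-relational mappers exist to generate, and it grows with every new association in the object model.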
I believe your column is a piece of a bigger puzzle - the relationship between data and computation
in networked environments. I would suggest there are three units from which any distributed
system is made: the units of persistence, the units of computation, and the units of communication.
Within the total system, all knowledge content represented, stored, manipulated and distributed
must ultimately map to these three units and combinations of these three units.
These basic units have always been mixed together to limited degrees, for instance program code
in a file can be viewed as an executable unit (the object module after loading) contained in a unit of
persistence (the file). Obviously the role of units of communication is to deliver the information
content of the other two types to wherever they need to go. In nearly all situations permitting
modification of code, it happens between levels, as with a compiler, which mediates between source
code and executable code. But modification within a level is strictly forbidden. For example,
the act of compilation generally does not result in modification of the compiler itself. Interpreters
and code generators have much the same relationship between their inputs and outputs.
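A minimal sketch of that between-levels relationship, using Python's built-in `compile` as the mediating level: source text (a unit of persistence, pure data) is turned into a code object (a unit of computation) and executed, while the compiler itself is untouched by the act of compilation.

```python
# Code modification between levels: text -> code object -> execution.
# The compile() step mediates between the levels; nothing modifies
# itself within a level.

source = "result = x * 2"                         # plain text: data at rest
code_obj = compile(source, "<generated>", "exec") # level transition

namespace = {"x": 21}
exec(code_obj, namespace)                         # run at the lower level
# namespace["result"] now holds 42
```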
But starting with HTML and accelerating with XML, units of communications, persistence, and
computation have become intermingled on a massive scale in fairly radical new ways.
It is standard fare now for webservers to take units of persistence such as .jsp and .asp files,
execute the code they contain to build a page for rendering by the browser, and then transmit the
resulting page as a single unit of communication to the client. In a general sense, there has been a
distinct increase in the use of code generation on the fly that was
previously encountered relatively rarely, and a very great increase in the delivery of units of
communication which contain a mixture of data and computation.
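The JSP/ASP pattern described above can be reduced to a few lines, with `string.Template` standing in for a real template engine: a stored page mixing markup and placeholders is "executed" server-side against a data model, and the rendered result is what travels to the browser as one unit of communication.

```python
from string import Template

# Unit of persistence: markup with embedded computation placeholders,
# as a .jsp or .asp file would hold on disk.
page_template = Template(
    "<html><body>Hello, $user! You have $n messages.</body></html>")

def render(template, **model):
    """Server-side 'execution' of the stored page against a data model."""
    return template.substitute(model)

# Unit of communication: the fully rendered page sent to the client.
response_body = render(page_template, user="Jon", n=3)
```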
It is the last part that is particularly significant and your article touched on it. In many ways, the
intentional combining of data and computation is a new paradigm (to use an over extended term).
In traditional data processing, reaching its zenith with the Relational Data Model, the content of
interest is oriented very heavily towards passive units of persistence. In traditional programming,
reaching its zenith in OO design and programming, the emphasis is on units of communication that
carry only data (message oriented "signature" parameters) to be delivered to combined units of
computation and persistence ("objects" composed jointly of methods and attributes) which
intentionally hide both their internal data and algorithms.
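The OO shape just described fits in a few lines (my own toy example, not the letter's): the message carries only data, while the object jointly holds computation (methods) and persistence (attributes) and hides both behind its signatures.

```python
# The classic OO arrangement: data-only messages delivered to an
# object that combines hidden state with hidden algorithms.

class Account:
    def __init__(self, balance):
        self._balance = balance   # hidden state: a unit of persistence

    def deposit(self, amount):    # signature exposes only data parameters
        self._balance += amount   # hidden algorithm: a unit of computation

    def balance(self):
        return self._balance

acct = Account(100)
acct.deposit(25)                  # a data-only message
```

Everything the caller can learn about `Account` is mediated by those two signatures; the semantics behind them are exactly what the letter argues gets lost.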
While the Relational Data Model has no natural extension into the realm of computations (relational
algebra et al aside), OO has trapped itself with two rigidities. First, the messages an object
accepts and emits tend to be very brittle in format, and the resulting protocols spoken between
objects tend to be very complex, usually undocumented, and often simply not understood.
Capturing behavior adequately has always been a weak part of the OO paradigm.
More fundamentally, however, OO made a deal with the devil by willfully accepting "information
hiding". Most of the benefit of information hiding in OO is simply suppressing the details of the
lower level(s) of implementation used to define the internals of the object. But that throws the baby
out with the bathwater. The complete useful semantics of the object are almost certainly lost behind
the facade of syntactic sugar provided by the method signatures and exposed attributes.
Worse than that, this kind of unhealthy information hiding prevents what I believe will become the
next dominant pattern in distributed system design and implementation: the ability of a
computation to dynamically combine both data and "rules" (small units of computation) within itself
and within other units on the fly, and then make use of the new combination(s) directly, or
indirectly by using them as units of communication. The importance of dynamic composition (of
data, computation, and communication) for addressing complex and evolving systems can hardly
be overemphasized. As your article noted, keeping things "upstairs" and "downstairs" is hardly an
advantage, but combining them opens up whole new vistas.
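As a concrete (and entirely illustrative) sketch of that dynamic composition: a unit of communication that carries both data and a small rule, which the receiver composes with its own environment on the fly rather than being limited to a fixed message signature.

```python
# A message carrying both data and a rule; the receiver combines them
# at runtime.  (Real systems would sandbox or validate shipped rules;
# bare exec() is used here only to keep the sketch minimal.)

message = {
    "data": {"subtotal": 200.0},
    "rule": "total = subtotal * 0.9 if subtotal > 100 else subtotal",
}

def receive(msg):
    env = dict(msg["data"])
    exec(msg["rule"], {}, env)   # compose the shipped rule with local data
    return env["total"]

total = receive(message)         # 10% discount applies: 180.0
```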
This became apparent to me while analyzing a large billing system (hundreds of millions of invoices
per month), only to realize that the latest fad of "componentization" did not address and would not
solve the core problem: the need to flow billing rules as well as data between major subsystems.
My (proposed) solution was to treat contracts, user agreements, pricing plans and the like as XML
documents that contained both parametric data and pricing rules, since we had previously proven
that application-specific scripting languages for defining the rules were feasible. The documents
could thus be presented to humans via style sheets and the like, while at the same time being used
for system configuration and execution.
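A hedged reconstruction of that document shape (the element names and the rule syntax are my invention, not the billing system's): one XML contract carrying both parametric data and a pricing rule, readable by a human through a stylesheet yet directly executable by the system.

```python
import xml.etree.ElementTree as ET

# An invented contract document mixing parametric data and a rule.
contract_xml = """
<contract account="12345">
  <parameter name="rate">0.05</parameter>
  <parameter name="minutes">120</parameter>
  <rule>charge = rate * minutes</rule>
</contract>
"""

doc = ET.fromstring(contract_xml)
params = {p.get("name"): float(p.text) for p in doc.findall("parameter")}
exec(doc.find("rule").text, {}, params)   # run the embedded pricing rule

charge = params["charge"]                 # 0.05 * 120 = 6.0
```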
From a computational perspective, the rules were combined on the fly from the multiple documents
related by common account number and the rules could reference data in any of the associated
documents. There was a framework for combining the rules into one virtual set of rules per
transaction step that was part of the architecture of the application-specific scripting language(s).
Even in complex billing systems, the number of related documents needed per transaction step is
only 3 or 4 at a time, and almost certainly less than 10 documents. Besides producing a simplified
high level data model, this approach produced the architectural foundation of a dynamically
assembling transaction system (or at least each step of the transaction).
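The combining step might look like the following sketch (again my own construction, with invented names): rules drawn from the few documents sharing an account number are gathered into one virtual rule set and applied in order within a single transaction step.

```python
# Documents related by account number, each contributing a rule.
documents = [
    {"account": "12345", "kind": "pricing",  "rule": "amount = minutes * rate"},
    {"account": "12345", "kind": "discount", "rule": "amount = amount * 0.8"},
    {"account": "99999", "kind": "pricing",  "rule": "amount = 0"},
]

def virtual_rules(account):
    """Gather the per-account rules into one ordered virtual set."""
    return [d["rule"] for d in documents if d["account"] == account]

def run_step(account, data):
    """One transaction step: apply the combined rules to shared data."""
    env = dict(data)
    for rule in virtual_rules(account):   # typically only 3 or 4 documents
        exec(rule, {}, env)
    return env["amount"]

amount = run_step("12345", {"minutes": 100, "rate": 0.1})
```

Because each rule reads and writes the same environment, rules from different documents can reference one another's data, which is the property the letter's architecture depends on.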
In fact, I believe this architectural pattern is applicable to the vast majority of commercial
transaction systems. It could be argued that webservices define the units of communication and the
transaction steps visible between computational and storage elements in the network.
So ultimately both data and rules need to be accessible and open to manipulation. This certainly
encourages very high level knowledge representations that are very close to the problem domain.
The notion of "composition" within such a language becomes front and central. In the limit, this
approach literally allows domain experts to "program" the system at the highest level using their
own native notation.
Of course, the enabling software and hardware to support this knowledge payload still requires a
great deal of hard slogging. But application-specific knowledge moves out of these lower layers,
which become purely part of the infrastructure supporting the high-level applications. If your
application
logic is hidden inside of objects and buried within the coding of (low level) general purpose
implementation languages, such as C/C++/Java/C#, or stuck in unfathomable data schemas,
something has gone seriously wrong.
If still desired, "information hiding" should be employed for reasons that inherently stem from the
problem domain, generally involving security and privacy, and not from extraneous obscurities
thrown in by the solution architecture. As a practical matter, at least for XML, the necessarily
"hidden" parts can simply be encrypted in order to make them inaccessible to undesirable eyes.
This still allows them to be used as units of persistence and units of communication. ("I don't know
all of what I sent you, but my source says it's really good!")
Well, this got a little longer than expected, but I hope it was of some interest. And once again, good
work.