Next-gen Computers and Post-UNIX Operating Systems: Doing Away with Data Persistence.
We are used to certain products, e.g. cars. We have embraced the self-driving electric car future. We are on the verge of seeing production flying cars. But what if the real problem we should be solving is not better cars, buses, planes, or even Mars-bound spaceships, but transportation itself? One needs to get from point A to point B. Via car? Via plane? How about a teleportation device?
I know, I know… Hear me out. Just like 100+ years of automotive (i.e. self-propelled wheeled carriage) technology, the current 1960s computer architecture with its CPU, RAM, and HDD is showing its age. Laugh all you want at the crude DOS origins of Windows; the pinnacle of modern operating systems, Linux (meaning UNIX), is even older, and the aforementioned computer architecture of the very first IBM mainframes predates even UNIX. For the last 60 years everything has been about permanent (disk, tape, etc.) storage of… right, files.
A hundred years from now the most futuristic flying cars will be in a museum alongside 19th-century horse carriages. HDD, SSD, RAID, and other types of “storage volumes” belong there today, along with the operating systems built around the concept of “block device drivers”.
This is not semi-technical rhetoric. I don’t work for Gartner or Forbes, so I’m going to explain my vision like an engineer. The idea came to me during my early work on Px100. I even wrote a post about third-generation data persistence several years ago, which is no “disk” persistence at all.
Let’s briefly revisit it. Early (IBM) business software stored everything in flat files of various formats. That very first generation of persistence survives in the form of spreadsheets. It’s nothing but files consisting of rows of data, each made up of fixed-width, delimited, encrypted, or other fields.
About 50 years ago IBM let Mr. Ellison commercialize its abandoned research on automatically linked (aka “related”) tables, storing the same data as before: rows of fields. It is known today as the relational database, queried with SQL (a language that also originated at IBM, though Oracle was the first to sell it commercially). Ultimately those tables and, more importantly, their row indexes, along with transaction journals, logs, and other data, are stored in binary files on the same disk as the aforementioned plain old unindexed files. That was the second generation of data persistence, which still rules the enterprise world.
Are post-relational, aka NoSQL, databases conceptually different? NoSQL data can be roughly defined as indexed BLOBs: e.g. the general-purpose (read-write) Document type like Mongo, or the volume/performance-optimized, auto-partitioned Big Table type like Cassandra. If you think that alone represents the third generation of data persistence, you are mistaken. It doesn’t matter how the data is stored and accessed logically (Big Table vs. Document vs. good old SQL tables). And to be fair to the dinosaurs of the industry, the latest version of the Oracle database is not exactly “tabular” internally either, as it once was.
Here’s the main issue with second-generation data persistence. All SQL and NoSQL databases have one thing in common: they are physically stored as a set of disk files, and, just like the plain old files of the glorious IBM past, they operate in coarse “block I/O” mode, reading and writing large chunks of data to compensate for inherently slow “block device” access, not to mention network call latency. Frequent reads and updates of small, fine-grained pieces of information will kill a conventional database server, SQL or NoSQL.
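To make the “coarse block I/O” point concrete, here is a toy sketch of read-modify-write on a device that can only transfer whole blocks. It assumes a 4 KiB block size and deliberately leaves out the buffer caches, write-ahead logs, and write coalescing that real databases use precisely to soften this cost. `ToyBlockDevice` and `update_field` are hypothetical names for illustration only.

```python
# Hypothetical toy model: a device that, like disks and SSDs, only reads and
# writes whole fixed-size blocks. No caching or write coalescing is modeled.
BLOCK_SIZE = 4096  # assumed block size in bytes; real devices vary


class ToyBlockDevice:
    def __init__(self, num_blocks: int) -> None:
        self.blocks = [bytearray(BLOCK_SIZE) for _ in range(num_blocks)]
        self.bytes_read = 0
        self.bytes_written = 0

    def read_block(self, index: int) -> bytearray:
        self.bytes_read += BLOCK_SIZE
        return bytearray(self.blocks[index])  # copy, as if transferred to RAM

    def write_block(self, index: int, data: bytearray) -> None:
        assert len(data) == BLOCK_SIZE
        self.bytes_written += BLOCK_SIZE
        self.blocks[index] = bytearray(data)


def update_field(dev: ToyBlockDevice, offset: int, value: bytes) -> None:
    """Change a few bytes by reading and rewriting the whole enclosing block."""
    index, start = divmod(offset, BLOCK_SIZE)
    assert start + len(value) <= BLOCK_SIZE, "sketch ignores block-spanning writes"
    block = dev.read_block(index)
    block[start:start + len(value)] = value
    dev.write_block(index, block)


dev = ToyBlockDevice(num_blocks=16)
# 1,000 small 8-byte updates, e.g. a status field in many different records
for i in range(1000):
    update_field(dev, (i * 64) % (16 * BLOCK_SIZE), b"\x01" * 8)

payload = 1000 * 8
print(dev.bytes_written // payload)  # write amplification factor: 512
```

Every 8-byte change costs a full 4,096-byte read plus a 4,096-byte write, which is why block-oriented storage prefers large batched transfers over many fine-grained updates.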
RAM to the rescue, right? The precious resource that used to be prohibitively expensive during the Mad Men era of the IBM 370 is a commodity nowadays. Not to mention that SSDs have largely displaced HDDs. If you are familiar with the latest versions of Mongo or Oracle, you must know about the in-memory bandwagon every database vendor jumped on, in the same hurried manner in which government-pushed and image-conscious automotive corporations embraced the electric future. Couldn’t Tesla have been designed from the ground up to take advantage of a flexible battery compartment that can take any shape and of several electric motors that are compact compared to the massive internal combustion engine? Wing-doored as they are, Teslas still look like very conventional sedans and SUVs with a pronounced hood and, until recently, even a radiator grille. Database vendors simply moved their files from disk to RAM and declared it the “in-memory architecture”.
What were they supposed to do with those damn files? Let’s ask a different question. Files are used to store data, correct? Whether directly or through an intermediate (database) layer doesn’t matter. Just as cars are used for transportation, and it is transportation itself we should be revolutionizing rather than its specific mode, in the computer world we should ask: are we worried about files or about the data? Is there an alternative way to keep data in reliable storage? A better question is why one even needs to move the allegedly “ephemeral” data from allegedly “unreliable” RAM to some permanent storage at all.
I don’t want to repeat my 3GP article, yet again pitching meticulous object-oriented modeling of the world around us, be it complex biology or convoluted business processes. Sadly, over the past 40 years of the OOP paradigm, fewer than 1% of programmers (employed at a handful of tech giants like Google and Facebook) have used C++ and later languages like Java for their intended purpose, i.e. beyond what procedural C would call structs with functions. I’ve worked for lots of corporate employers, and that’s been my experience.
The OOP paradigm is based on many interconnected fine-grained objects, each a separate intelligent living entity with its own purpose: to produce/calculate new data, transform it, store it, etc., or any combination thereof. They frequently communicate with other objects by sending and receiving relatively small pieces of data, which can roughly be called “signals”. They bind and form bigger objects as needed. Those bigger objects can grow, shrink, divide, multiply, or die like cells in one’s body. I didn’t pick this analogy to blame the failure to model the neuron cell structure of the human brain on incompetent (underpaid, outsourced, etc.) programmers who never mastered C++. It can, however, explain why we still use computers architected in the 1960s.
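The “objects exchanging signals” model above can be sketched in a few lines. This is a hedged illustration only: `Cell` and `SignalBus` are hypothetical names, and a single-threaded FIFO queue stands in for the natively object-aware hardware the article imagines. Each cell owns its state and reacts to small signals by updating itself and signaling its peers.

```python
from __future__ import annotations

from collections import deque


class Cell:
    """A hypothetical fine-grained object that owns its state and reacts to signals."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.links: list[Cell] = []  # peers this cell signals
        self.total = 0               # private state, never handed out as raw data

    def receive(self, signal: int, bus: SignalBus) -> None:
        self.total += signal         # update own state
        for peer in self.links:      # propagate a transformed signal to peers
            bus.send(peer, signal * 2)


class SignalBus:
    """Delivers small signals between cells, one at a time, in FIFO order."""

    def __init__(self) -> None:
        self.queue: deque[tuple[Cell, int]] = deque()

    def send(self, target: Cell, signal: int) -> None:
        self.queue.append((target, signal))

    def run(self) -> None:
        # Assumes the link graph is acyclic; a cycle would signal forever.
        while self.queue:
            target, signal = self.queue.popleft()
            target.receive(signal, self)


a, b, c = Cell("a"), Cell("b"), Cell("c")
a.links.append(b)
b.links.append(c)

bus = SignalBus()
bus.send(a, 1)
bus.run()
print(a.total, b.total, c.total)  # 1 2 4
```

Note that no cell exposes its `total` for some external function to process; the data only moves as signals between objects.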
Confined to that hardware and OS, OO languages like C++ and Java are ultimately compiled into 100% procedural code, because the current 60-year-old computer architecture is procedural. Things work like this: the application sends structured (dumb) data to, and receives it from, device drivers (sets of functions). The trendy modern take on it, APIs and so-called “microservices”, is as flat and procedural as prehistoric block device drivers.
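The “structs with functions” point is visible directly in Python, where a method call is literally a function call with the object passed explicitly as the first argument (C++ member functions similarly receive an implicit `this` pointer). The `Account` class below is a hypothetical example, not from the article.

```python
class Account:
    """A hypothetical object: some state plus functions that operate on it."""

    def __init__(self, balance: int) -> None:
        self.balance = balance

    def deposit(self, amount: int) -> int:
        self.balance += amount
        return self.balance


acct = Account(100)

# The "object-oriented" call...
acct.deposit(50)

# ...is the same plain function applied to the object's state, much like a
# C function taking a struct pointer: deposit(&acct, 25).
Account.deposit(acct, 25)

print(acct.balance)  # 175
```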
Imagine if the hardware (and subsequently OS) architecture supported infinite OOP networks of interconnected objects natively, with no translation (data transfer) between the higher-level “object” and lower-level “procedural” layers. It does require rethinking the CPU, which, as its name implies, is a very central processing unit, and decomposing it into self-sufficient, neuron-like cells of processing power, each with its own RAM/cache/whatever to dynamically allocate to a fine-grained object’s state. Unconventional as it is, such “decentralization” is not rocket science at the current level of technology.
Just as advanced electric cars did away with transmissions and drivetrains by “decentralizing” the engine into four in-hub motors, decentralizing what used to be the “C” in “CPU” would eliminate many kinds of data buses, making the computer dramatically faster. Just the flexible PU(s) and RAM. No “hard disk” or “solid state” drives. No files. No databases or anything else in the age-old “permanent” storage “drivetrain”. Just endless clustered/distributed/redundant RAM that never dies. Human memory is also redundant, in case you didn’t know. Cloud server farms are like that today: composed of redundant, self-healing servers. Remember to mention Kubernetes when you use my idea to impress your future boss at a job interview.
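A minimal sketch of “redundant RAM that never dies”, assuming a simple scheme where every value is kept in the memory of several nodes so that losing one node loses nothing. `Node` and `ReplicatedRAM` are hypothetical names, not the API of Hazelcast, Ignite, or any other product; real systems add failure detection, rebalancing (the “self-healing” part), and consistency protocols that are omitted here.

```python
from __future__ import annotations

import zlib


class Node:
    """One server's RAM; its contents vanish if the node crashes."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.memory: dict[str, bytes] = {}
        self.alive = True

    def crash(self) -> None:
        self.alive = False
        self.memory.clear()


class ReplicatedRAM:
    """Keeps every value in RAM on `replicas` distinct nodes."""

    def __init__(self, nodes: list[Node], replicas: int = 3) -> None:
        assert 1 <= replicas <= len(nodes)
        self.nodes = nodes
        self.replicas = replicas

    def owners(self, key: str) -> list[Node]:
        # Stable placement via crc32 (Python's hash() is salted per process).
        start = zlib.crc32(key.encode()) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(self.replicas)]

    def put(self, key: str, value: bytes) -> None:
        for node in self.owners(key):
            if node.alive:
                node.memory[key] = value

    def get(self, key: str) -> bytes | None:
        for node in self.owners(key):
            if node.alive and key in node.memory:
                return node.memory[key]
        return None  # every replica is gone


nodes = [Node(f"n{i}") for i in range(5)]
store = ReplicatedRAM(nodes, replicas=3)
store.put("order:42", b"shipped")

store.owners("order:42")[0].crash()  # lose the first copy
print(store.get("order:42"))         # b'shipped'
```

With three replicas, any two nodes holding a key can fail and the value still survives in RAM, without ever touching a disk.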
Simple, isn’t it? Instantiated once (received over the network, from another server, etc.), the set of fine-grained objects self-develops and grows/shrinks like a living organism to ensure optimal performance. Yes, it can be killed if needed. It’s just never loaded or saved like traditional dumb data, because it’s not a set of dumb C functions processing that data, you know. It’s self-contained, fully encapsulating its state (i.e. storing data, in pre-OOP terms). More importantly, it can live in very reliable, distributed, redundant RAM indefinitely.
As I already mentioned, I discovered this idea while working on the early versions of the Px100 platform, which was created mostly out of frustration that no one (I knew in the corporate IT world) used OOP properly. The very first discovery I made after building the proper OO core of my business process automation engine was how worthless and obsolete the traditional object-to-procedural plumbing (often called object-relational mapping, aka ORM) was, along with all the CRUD code to persist (load and save) dumb data between the storage (database) and the network of fine-grained objects. It was simply a matter of binary serialization of thousands (millions, whatever) of such objects as is, in their native form, using either distributed RAM grids (Hazelcast or Ignite) or OOP-friendly Document databases like Mongo. No relational “flattening” into two-dimensional tables necessary. No batching/packing of data into bigger blocks to satisfy slow, coarse “block device” I/O. Naturally, my next question was why I needed that persistence at all if my server never goes down and those objects can live in RAM indefinitely.
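The article doesn’t show how Px100 serializes its objects, so as a hedged stand-in, Python’s standard `pickle` module illustrates the general idea of serializing a graph of fine-grained objects as is: shared references and back-references (cycles) survive the round trip without any mapping to tables. `Process` and `Task` are hypothetical classes. Keep in mind that `pickle` ties the format to Python class definitions and must never be used on untrusted data.

```python
from __future__ import annotations

import pickle


class Task:
    """A hypothetical fine-grained business object."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.next: list[Task] = []          # links to other tasks
        self.owner: Process | None = None   # back-reference to the parent


class Process:
    def __init__(self, name: str) -> None:
        self.name = name
        self.tasks: list[Task] = []

    def add(self, task: Task) -> Task:
        task.owner = self                   # creates a cycle: process <-> task
        self.tasks.append(task)
        return task


proc = Process("invoice-approval")
review = proc.add(Task("review"))
approve = proc.add(Task("approve"))
review.next.append(approve)                 # approve is shared by two referrers

# The whole object graph in one binary blob: no tables, no foreign keys.
blob = pickle.dumps(proc)
restored = pickle.loads(blob)

r_review, r_approve = restored.tasks
print(r_review.next[0] is r_approve)  # True: shared reference preserved
print(r_approve.owner is restored)    # True: cycle preserved
```

An ORM would typically flatten the same graph into a process table, a task table, and a task-link table, then reassemble the references on every load.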
Visualize the technology evolution spiral, if you haven’t already: centralized, decentralized, then centralized computing again… First came very central “mainframes”: floors of powerful (at the time) IBM equipment serving several weak terminals, i.e. dumb display/keyboard stations to enter and view the data. Then came very decentralized, self-contained “personal computers” to edit Word and Excel files and play semi-realistic games. Then PCs were connected via local networks, and soon after via the Internet. Gradually the processing power started to move back to a central location, for the same reason old mainframe terminals were dumb I/O devices: the lack of local resources and processing power. The world yet again embraced “thin” browser front-ends over “rich” (meaning “fat”) desktop programs communicating with the remote database directly.
The 25-year-old centralization trend has reached its peak today in what is commonly known as “the Cloud”. What happens next? Right, decentralization. It has already begun. The Cloud itself is distributed, with Big Data (e.g. Map/Reduce) and ML algorithms running on many servers simultaneously. If that alone isn’t an indication of the direction computers are evolving in, recall explicitly decentralized peer-to-peer mechanisms like blockchain, along with billions of trillions of gazillions of very smart and self-sufficient IoT devices, connected and collaborating more and more with each other (e.g. car-to-car communication) rather than through a central server.
Only one thing’s left: decentralizing the age-old CPU. Imagine the possibilities. Yes, I’m talking about turning a computer into an infinitely complex, self-managed digital cell organism similar to the cellular structure of one’s body and, most importantly, the brain. Not to be confused with today’s set of scientific algorithms, very different from the human brain, commonly known as AI. Why don’t we concentrate on the digital version of the organic “I” rather than the “A”? This will be the topic of my next article.