Did increased computing power break data management?
by Jens on Sep.19, 2011, under Data Management
At the risk of the title having already put you off, and of this post being labelled as a lament for the “good old days”, I am simply offering a few observations.
As part of a data rescue project, I’m currently reading the documentation for a plankton database system from 1994. The documentation covers pretty much everything you would expect to find in more modern documentation, e.g. a data model, workflows, methods and background code. The system was designed to run on an old VAX, and of course there are some technical limitations that I am glad we have left behind.
There are very clear limits on what could be put into tables, and on the number of tables that were created. This is obviously a result of having to be careful about referencing. E.g. you did not reference long species names when an identifier was available, as it would slow your system to a crawl.
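To make the identifier point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names and sample rows are my own assumptions for illustration and are not taken from the 1994 system: the long species name is stored once, and every observation refers to it by a short integer key.

```python
import sqlite3

# Hypothetical schema: names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE species (
    species_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE      -- the long name is stored exactly once
);
CREATE TABLE observation (
    obs_id     INTEGER PRIMARY KEY,
    species_id INTEGER NOT NULL REFERENCES species(species_id),
    abundance  INTEGER NOT NULL
);
""")

conn.executemany(
    "INSERT INTO species (species_id, name) VALUES (?, ?)",
    [(1, "Calanus finmarchicus"), (2, "Oithona similis")],
)
# Observations carry only the short identifier, never the name itself.
conn.executemany(
    "INSERT INTO observation (species_id, abundance) VALUES (?, ?)",
    [(1, 120), (2, 45), (1, 80)],
)

# The name is looked up only when a human-readable result is needed.
rows = conn.execute("""
    SELECT s.name, SUM(o.abundance)
    FROM observation AS o
    JOIN species AS s USING (species_id)
    GROUP BY s.name
    ORDER BY s.name
""").fetchall()
print(rows)
```

Because the reference lives in the tables themselves, a raw dump of these two tables is enough to rebuild the link between observations and species without any of the original software.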
However, looking at how very neatly data are structured in this old system, and comparing it with some of the more “modern” systems, it is actually far easier to re-extract and structure the old data. All of the referential structure is intact even though the software itself is long gone and I am left with nothing but the raw data. The raw data dump of many modern databases leaves something to be desired (IMO). The strict control of field formatting often goes out the window, and the “logic” of a system is frequently moved almost entirely to the front end, meaning that the DB itself sometimes doesn’t even contain all the referential information required.
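As a rough illustration of keeping that “logic” in the database rather than the front end, the sketch below (again Python with sqlite3, and again with hypothetical table and column names of my own) declares the reference and the field format as constraints. Any client that writes to the tables is then held to the same rules, and the rules travel with the schema.

```python
import sqlite3

def make_db():
    """Build a small in-memory database whose rules live in the schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
    conn.executescript("""
    CREATE TABLE station (
        station_id INTEGER PRIMARY KEY,
        code       TEXT NOT NULL UNIQUE
    );
    CREATE TABLE sample (
        sample_id  INTEGER PRIMARY KEY,
        station_id INTEGER NOT NULL REFERENCES station(station_id),
        depth_m    REAL NOT NULL CHECK (depth_m >= 0)
    );
    """)
    conn.execute("INSERT INTO station (station_id, code) VALUES (1, 'ST01')")
    return conn

def try_insert(conn, station_id, depth_m):
    """Return True if the database accepts the row, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO sample (station_id, depth_m) VALUES (?, ?)",
            (station_id, depth_m),
        )
        return True
    except sqlite3.IntegrityError:
        return False

conn = make_db()
results = [
    try_insert(conn, 1, 10.0),   # valid station, valid depth
    try_insert(conn, 99, 10.0),  # station 99 does not exist
    try_insert(conn, 1, -5.0),   # negative depth violates the CHECK
]
print(results)
```

If the same checks existed only in a front-end form, a dump of the tables would carry no trace of them, which is roughly the situation described above.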
Obviously you get what you pay for, and many modern systems do not have these problems, but with the increased speed and power of computers it has become ever so much easier to throw clock cycles at a workflow problem rather than going back to the root causes.
Given the increase in computing power, I’m not sure we have seen an equivalent increase in the performance of databases. Granted, we can store much, much more in there, and in a much wider range of formats, but does that encourage the cutting of corners rather than the improvement of data models?
I’m happy to be proved wrong, and I suspect there isn’t a right or wrong here, but my observation is that when you are forced to be economical with your clock cycles, the data model seems to get more attention.