Flat vs. Relational Databases

When choosing a format (or larger infrastructure) in which to store data, two questions are perhaps of utmost importance: how much space it requires, and how quickly the data can be accessed. Space is often taken to include not only the storage the data itself occupies, but also the transient space required to access it (say, the amount of RAM required at a particular stage of access). In computer science, access speed is most often considered in terms of the asymptotic behavior of the algorithms that access the data. For example, if there are n data points and you are looking for a specific one, does the search take the same amount of time irrespective of the total number of data points, or does it take longer with each additional data point? Does the time grow in proportion to n, or to n squared? These are all important considerations, especially for larger data sets.
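As a rough illustration of the question above, the sketch below counts how many items a front-to-back search examines as the data grows, and contrasts it with a hash-based lookup. The function name and the sample data are made up for this example; it is one simple way to see linear versus roughly constant access cost, not a benchmark.

```python
def linear_search_steps(records, target):
    """Scan a list front to back and return how many items were examined."""
    steps = 0
    for value in records:
        steps += 1
        if value == target:
            break
    return steps


# Hypothetical data sets: one thousand and one million data points.
small = list(range(1_000))
large = list(range(1_000_000))

# Worst case for a scan: the target is the last item, so every item is examined.
small_steps = linear_search_steps(small, small[-1])
large_steps = linear_search_steps(large, large[-1])

# A hash-based index (a Python dict) finds a key without scanning the list,
# so the lookup cost stays roughly the same as the data grows.
index = {value: position for position, value in enumerate(large)}
position = index[large[-1]]
```

Here the scan examines 1,000 items in the small list and 1,000,000 in the large one, so its cost grows in proportion to n, while the dict lookup does not depend on n in the same way. The index does cost extra memory, which is the space side of the trade-off.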

The speed at which data can be accessed also has implications beyond the asymptotic behavior of the access time; it bears on larger questions of accessibility. Data that can be easily modified without proprietary software likely saves time in editing, and avoids the case where data becomes inaccessible. This is a huge benefit of flat databases, like text files.

When the data is highly diverse, a relational database is often the better fit. A file by convention follows a particular format, and not every format can store every kind of data. A relational database can be constructed so that multiple types of data are encoded in one collective database. The more robust nature of these databases can also allow for storage and access-time optimizations that may not otherwise be possible in flat databases, although this depends on one's intended use.
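To make the idea of several kinds of data in one database concrete, here is a minimal sketch using SQLite through Python's standard sqlite3 module. The tables, columns, and rows (authors and letters) are invented for illustration and do not come from any real data set.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Two different kinds of records, each with its own columns and types.
conn.execute(
    "CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT, birth_year INTEGER)"
)
conn.execute(
    "CREATE TABLE letters ("
    "id INTEGER PRIMARY KEY, "
    "author_id INTEGER REFERENCES authors(id), "
    "sent_on TEXT, "
    "word_count INTEGER)"
)

# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO authors VALUES (?, ?, ?)",
    [(1, "Author A", 1800), (2, "Author B", 1820)],
)
conn.executemany(
    "INSERT INTO letters VALUES (?, ?, ?, ?)",
    [(1, 1, "1840-05-01", 320), (2, 1, "1841-02-11", 150), (3, 2, "1850-09-30", 410)],
)

# The relation between the tables lets one query combine both kinds of data.
rows = conn.execute(
    "SELECT authors.name, COUNT(letters.id), SUM(letters.word_count) "
    "FROM authors JOIN letters ON letters.author_id = authors.id "
    "GROUP BY authors.id ORDER BY authors.id"
).fetchall()
```

With this sample data, rows holds one entry per author with that author's letter count and total word count. Storing the same information in a single flat file would mean either repeating the author details on every letter row or keeping two files and joining them by hand.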

Finally, relational databases are well suited to cases in which direct access to a subset of the data is required, whereas flat databases are typically better suited to applications in which an entire dataset is to be processed.
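The contrast can be sketched with a small, made-up table of books: a flat CSV read from start to finish when every row is needed, and an indexed SQLite table queried for just one year. The column names, the values, and the index name idx_books_year are assumptions for this example.

```python
import csv
import io
import sqlite3

# A tiny flat "file", held in a string so the example is self-contained.
flat_text = "id,year,pages\n1,1850,120\n2,1851,300\n3,1850,95\n4,1852,210\n"

# Whole-dataset processing: every row is read once, which suits a flat file.
total_pages = sum(
    int(row["pages"]) for row in csv.DictReader(io.StringIO(flat_text))
)

# Subset access: load the same rows into a relational table with an index on year.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, year INTEGER, pages INTEGER)")
conn.execute("CREATE INDEX idx_books_year ON books (year)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [
        (int(r["id"]), int(r["year"]), int(r["pages"]))
        for r in csv.DictReader(io.StringIO(flat_text))
    ],
)

# The index allows the database to locate matching rows without reading the whole table.
subset = conn.execute(
    "SELECT id, pages FROM books WHERE year = ? ORDER BY id", (1850,)
).fetchall()
```

On four rows the difference is invisible, but the shape of the work is the point: the sum touches every row regardless, while the year query only needs the rows the index points to.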

Ultimately, it is important to consider not only the theoretical pros and cons of each, but also the real-world experience of using these database paradigms. In the end, what matters is that the people who need the data can use it effectively, and sometimes that real-world outcome differs from what a theoretical perspective would suggest.

Hacking the Humanities 2019

Author: John
