Data Details

I’ve interacted with lots of different data structures, both flat and relational, through my professional work. I was never really aware of any of their differences though until I started managing the data for a complex analysis project we were planning in the CCCE. I spent a couple hours trying out different formats for a spreadsheet before I realized that no matter what I tried, I was going to have a problem. That’s because I was trying to use a spreadsheet, a flat data structure, to house not just information about the different aspects of the project, but also to keep track of the relationships between those pieces of information. I eventually realized that my Google Sheet was never going to work the way I envisioned and migrated to an Airtable base. At the time, I did not have any language to describe my problem and solution, but luckily this week’s reading have helped me understand better.

Flat data structures are extremely useful for organizing basic information. There are plenty of free spreadsheet programs available so there are relatively low barriers to access them, the essentials of rows and columns and tables are straightforward and familiar to any who have worked with their non-digital counterparts, and formatting and formula options make them easy to customize and query. However, there are downsides, like the one I discovered when trying to establish relationships between pieces of information. Because there is no built in function for linking cells, flat spreadsheets include a lot of redundant data–for example, every instance of a particular professor’s name appeared separately in my CCCE spreadsheet, meaning any search for their mention would return dozens of results instead of one result that is linked to each place where the name appears. This might not be much of an issue for a small dataset, but can certainly cause issues when computing large amounts of information.

Relational databases address both of these issues but come with some added complexities. Instead of presenting data in an easily manipulable grid interface like Excel, relational databases are built and accessed through a programming language, Structured Query Language (SQL). Using SQL, one can enter pieces of information into a database in their respective locations, and then retrieve specific combinations of data based on that location. To establish relationships between those bits of data, programmers use unique keys to co-reference the rows in each table where the linked information is housed. So not only does using a relational database require knowledge of SQL (or access to a proprietary software that uses SQL) and an understanding of more complex computational concepts, but they also require a lot of maintenance in order to ensure that those entries and relationships are in working order. For example, reconfiguring a database can create empty or “null pointers” when a former relationship is changed and keys no longer provide relational information, or inputting data without controlling for variations can lead to duplicated entries or broken relationships between entries.

Knowing these differences now, I can see why a relational database should have been the obvious choice for my project. Even learning some basic aspects about database options made it much easier to identify tools that would fit my needs–hopefully after learning more about databases, I’ll also get better at identifying other important data considerations for future projects. Such as my (in progress) website! I’ve been trying to change the link decoration using CSS to be dotted, but only within the body of posts, but haven’t quite zeroed in on the right class to use, so please share ideas if you have them!

Author: Michelle

Leave a Reply

Your email address will not be published. Required fields are marked *