Database Debate

Relational databases are extremely useful for holding large amounts of data. “Big Data” is a buzzword that you hear across a variety of disciplines, from historians working with primary-source records of people from the 1800s to biologists and chemists working with genetic data. I’ve had some experience with SQL, the standard query language for relational databases, in my summer internships.

As stated before, relational databases are great for capturing specific data in a large database, usually one with more than 500 cells. Relational databases do a great job of limiting redundancy and making overwhelming amounts of data usable. Another good thing about relational databases is that you can change how data is connected. As explained in the Ramsay article, if multiple tables have similar contents, the user can give each record a primary key (a unique identifier) and link related records in another table with a foreign key, a field that stores the primary key of the record it refers to. Essentially, relational databases are great for analyzing big data from a variety of fields.
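To make the primary key and foreign key idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table names (`authors`, `books`), the column names, and the two sample rows are my own assumptions for illustration; they are not taken from the Ramsay article.

```python
import sqlite3


def build_library(conn):
    """Create two linked tables and load a couple of sample rows."""
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    # author_id is the primary key: one unique value per author.
    conn.execute(
        "CREATE TABLE authors (author_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    # books.author_id is a foreign key: it stores the primary key of an author,
    # so the author's name is written once instead of on every book row.
    conn.execute(
        """CREATE TABLE books (
               book_id   INTEGER PRIMARY KEY,
               title     TEXT NOT NULL,
               author_id INTEGER NOT NULL REFERENCES authors(author_id)
           )"""
    )
    conn.execute("INSERT INTO authors VALUES (1, 'Malcolm Gladwell')")
    conn.executemany(
        "INSERT INTO books VALUES (?, ?, ?)",
        [(1, "Outliers", 1), (2, "Blink", 1)],
    )
    conn.commit()


def titles_by(conn, name):
    """Follow the foreign key from books back to authors and list the titles."""
    rows = conn.execute(
        """SELECT b.title
           FROM books AS b
           JOIN authors AS a ON a.author_id = b.author_id
           WHERE a.name = ?
           ORDER BY b.title""",
        (name,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")  # throwaway in-memory database
build_library(conn)
print(titles_by(conn, "Malcolm Gladwell"))  # ['Blink', 'Outliers']
```

Because the author's name lives in only one row, correcting it there corrects it for every book that points to it, which is the kind of redundancy a relational design avoids.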

Relational databases may be the best we have, but they still have issues. First, data fragmentation is a significant problem for relational database owners. Furthermore, constant manipulation of data by multiple users can make the database less efficient and more complicated. Also, there is a balance to strike between having enough unique fields for data analysis and having so many that the data becomes either harder to understand or slower to retrieve.

Flat structured data like spreadsheets are a great way to analyze small amounts of data, because a spreadsheet lets you manipulate the data in a manageable way. For example, once you find all 20 books written by Malcolm Gladwell in your 100,000-item relational database, it is better to analyze those results in a spreadsheet than to try to interpret them inside the relational database. Also, spreadsheets are more intuitive for most people to grasp than a more complex language like SQL.
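The hand-off described above, from a big relational database to a small spreadsheet, could look something like this sketch. The single `books` table, its columns, the helper name `export_author_books`, and the sample rows are assumptions for illustration only; a real catalog would be far larger and would normally write to a `.csv` file rather than an in-memory buffer.

```python
import csv
import io
import sqlite3


def export_author_books(conn, author, out):
    """Write one author's books to `out` as CSV and return the row count."""
    cursor = conn.execute(
        "SELECT title, year FROM books WHERE author = ? ORDER BY year",
        (author,),
    )
    writer = csv.writer(out)
    # The header row comes from the column names of the query result.
    writer.writerow([column[0] for column in cursor.description])
    rows = cursor.fetchall()
    writer.writerows(rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, author TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [
        ("Blink", "Malcolm Gladwell", 2005),
        ("Outliers", "Malcolm Gladwell", 2008),
        ("Placeholder Title", "Another Author", 2010),  # made-up filler row
    ],
)

buffer = io.StringIO()  # stands in for a file opened with open("books.csv", "w", newline="")
count = export_author_books(conn, "Malcolm Gladwell", buffer)
print(count)
print(buffer.getvalue())
```

The resulting CSV text can be opened directly in a spreadsheet program, where sorting, charting, and annotating a couple dozen rows is easier than writing more queries.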

The issue with flat structures is that they are not very expansive, meaning they can only hold a low volume of data. For example, it would be better to use a spreadsheet to store data on the class and a relational database to store data on the residents of Northfield.

The issue with data collection and metadata that needs to be solved first is security. Large amounts of data make it easy for bad actors to manipulate data unnoticed. Another issue is data literacy: many people do not know how their data is generated and used. Both issues share an underlying theme that is international and political. There needs to be more transparency between private institutions and governments regarding consumer data usage.

Author: Kwaku
