Gephi is open-source software for visualizing and analyzing networks. It is especially useful for link and network analysis, and it supports both in-depth analysis and creating and customizing your own visuals. Gephi represents data points as nodes and the links between them as lines called edges. Many examples online show analyses of website pathways, social media, and biological network data, among other things. I decided to do a network analysis tutorial using The Guardian’s Top 100 Best-Selling Books of All Time. Other datasets for Gephi exist on this site, and for simplicity’s sake, I would recommend following along using one of those files. This basic tutorial will go over how to set up Gephi, import a CSV file, and set up a visualization of the data.
Step 1: Download the Software
The Gephi software is available on the Gephi website. Click Download Free and choose the appropriate download for your computer. Open the downloaded file and follow the prompts to install and launch the software.
Now is a good time to download the CSV file you will be working with. The Top 100 Book file I am using is from the Guardian website. Click DATA: Download the full spreadsheet and save the file to your computer.
Step 2: Import CSV file into Gephi
Go to File > Open > New Project
Select your CSV file. Indicate the Separator (usually a comma) and the format you want to import the file as (Adjacency list for this file).
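If your spreadsheet is not already in adjacency-list form, a short script can reshape it before import. The sketch below assumes that an adjacency-list row is a node followed by the nodes it connects to, and it uses made-up publisher and book names as placeholders. The function names are my own, not part of Gephi.

```python
import csv
import io
from collections import defaultdict

# Hypothetical (publisher, title) pairs; replace with rows from your own spreadsheet.
rows = [
    ("Publisher A", "Book 1"),
    ("Publisher A", "Book 2"),
    ("Publisher B", "Book 3"),
]


def to_adjacency_rows(pairs):
    """Group targets under each source: one output row per source node."""
    neighbors = defaultdict(list)
    for source, target in pairs:
        neighbors[source].append(target)
    return [[source, *targets] for source, targets in neighbors.items()]


def write_adjacency_csv(pairs, handle):
    """Write comma-separated adjacency rows to any file-like object."""
    csv.writer(handle).writerows(to_adjacency_rows(pairs))


# Write to an in-memory buffer here; use open("books.csv", "w", newline="") for a real file.
buffer = io.StringIO()
write_adjacency_csv(rows, buffer)
adjacency_text = buffer.getvalue()
print(adjacency_text)
```

Each publisher ends up on its own row with its books listed after it, which is the shape the Adjacency list import option expects under the assumption above.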
Click Next, then review the remaining import settings. I kept the time representation at Intervals for this file. Click Finish. The next screen lists any potential problems with your CSV file and shows how many nodes and edges it contains. Click OK.
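To sanity-check the node and edge counts in the import report, you can count them yourself. This is a rough sketch that assumes an adjacency-list file with no header row and treats each distinct (source, target) pair as one edge. Gephi's own totals may differ depending on the import options you chose, and the sample text is a placeholder.

```python
import csv
import io

# Hypothetical adjacency-list text: the first cell is a node, the rest are its neighbors.
sample = "Publisher A,Book 1,Book 2\nPublisher B,Book 3\n"


def count_nodes_and_edges(handle):
    """Return (node_count, edge_count) for adjacency-list CSV data."""
    nodes = set()
    edges = set()
    for row in csv.reader(handle):
        # Drop empty cells so trailing commas do not create blank nodes.
        cells = [cell.strip() for cell in row if cell.strip()]
        if not cells:
            continue
        source, targets = cells[0], cells[1:]
        nodes.add(source)
        for target in targets:
            nodes.add(target)
            edges.add((source, target))
    return len(nodes), len(edges)


node_count, edge_count = count_nodes_and_edges(io.StringIO(sample))
print(node_count, edge_count)
```

For a real file, pass `open("books.csv", newline="")` instead of the in-memory sample.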
Step 3: Navigating the Main Screen
After importing your CSV file, you should be taken to the main screen. Each import gets its own window, called a Workspace. The main area is called the Graph module. In this area, you can zoom (two fingers up and down on a trackpad) and move (two-finger click and drag) the visualization. If you lose sight of your visualization, click the magnifying glass to reset the view.
The representation below is random; to fix this we need to adjust the layout in the Layout module.
Step 4: Customize the Layout
Navigate to the Layout module (bottom left corner) and choose a layout. There are many options; shown here are the Fruchterman Reingold and Force Atlas layouts. All layouts are customizable, so you can edit what your graph looks like. For this tutorial I will use the Force Atlas layout.
Within the Force Atlas layout, you can turn on Adjust by Sizes and change the Repulsion strength. Both make the visualization easier to read and more aesthetically pleasing. Check Adjust by Sizes and set Repulsion strength to your desired value (I chose 1000.0).
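To get a feel for what a repulsion setting does, here is a deliberately simplified force-directed sketch. It is not Gephi's Force Atlas implementation: the force formulas, the `force_layout` function, and its parameters are assumptions chosen for illustration. Every pair of nodes pushes apart, every edge pulls its two ends together, and a larger repulsion value spreads the graph out.

```python
import math


def force_layout(positions, edges, repulsion=1000.0, attraction=1.0,
                 step=0.01, iterations=300):
    """Toy force-directed layout: returns new {node: [x, y]} positions."""
    pos = {name: list(point) for name, point in positions.items()}
    names = list(pos)
    for _ in range(iterations):
        move = {name: [0.0, 0.0] for name in names}
        # Repulsion: every pair of nodes pushes apart with strength repulsion / distance.
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9  # avoid dividing by zero
                push = repulsion / dist
                move[a][0] += push * dx / dist
                move[a][1] += push * dy / dist
                move[b][0] -= push * dx / dist
                move[b][1] -= push * dy / dist
        # Attraction: each edge pulls its ends together in proportion to their distance.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            move[a][0] -= attraction * dx
            move[a][1] -= attraction * dy
            move[b][0] += attraction * dx
            move[b][1] += attraction * dy
        # Apply all moves at once, scaled by a small step size.
        for name in names:
            pos[name][0] += step * move[name][0]
            pos[name][1] += step * move[name][1]
    return pos


def distance(pos, a, b):
    return math.hypot(pos[a][0] - pos[b][0], pos[a][1] - pos[b][1])


# Hypothetical two-node graph with a single edge.
start = {"Publisher A": (0.0, 0.0), "Book 1": (1.0, 0.0)}
links = [("Publisher A", "Book 1")]

tight = force_layout(start, links, repulsion=10.0)
loose = force_layout(start, links, repulsion=1000.0)
print(distance(tight, "Publisher A", "Book 1"))
print(distance(loose, "Publisher A", "Book 1"))
```

In this toy model the two connected nodes settle where the push and pull balance, so the higher repulsion value leaves them farther apart, which is the same visual effect you see when raising Repulsion strength in Gephi.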
Step 5: Customize the Appearance
An all-black visual is not practical to read. In the Appearance module, click Nodes > Ranking, then choose your preferred color scheme or create your own. Click Apply to see the result in the Graph module.
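A ranking color scheme maps a numeric value on each node to a point along a color gradient. The sketch below assumes the ranking value is degree (the number of edges touching a node) and uses a white-to-navy gradient. Both choices, the placeholder data, and the function names are illustrative and do not reflect Gephi's internals.

```python
from collections import Counter

# Hypothetical edge list; replace with your own (source, target) pairs.
edges = [
    ("Publisher A", "Book 1"),
    ("Publisher A", "Book 2"),
    ("Publisher B", "Book 3"),
]


def degrees(edge_list):
    """Count how many edges touch each node."""
    counts = Counter()
    for source, target in edge_list:
        counts[source] += 1
        counts[target] += 1
    return dict(counts)


def rank_color(value, low, high, start=(255, 255, 255), end=(0, 0, 128)):
    """Linearly interpolate an RGB color between start (low) and end (high)."""
    t = 0.0 if high == low else (value - low) / (high - low)
    return tuple(round(s + t * (e - s)) for s, e in zip(start, end))


def color_by_degree(edge_list):
    """Return {node: '#rrggbb'} with higher-degree nodes shaded darker."""
    deg = degrees(edge_list)
    low, high = min(deg.values()), max(deg.values())
    return {node: "#%02x%02x%02x" % rank_color(d, low, high)
            for node, d in deg.items()}


colors = color_by_degree(edges)
for node, color in colors.items():
    print(node, color)
```

Nodes with the most connections land at the dark end of the gradient, so well-connected publishers stand out from individual books.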
Step 6: Add Labels
To make the graph easier to interpret, you can add labels to the nodes. Use the toolbar at the bottom of the Graph module to show labels, scale them to node size, and adjust the text size. You can also use this toolbar to adjust the thickness of the edges.
Once all of this is done, you can see your data in a clear visualization. The links between data points (nodes) are represented by lines (edges). Here is one section of the CSV file I used. In this graph, you can see the connections between the books published by Random House.
Here are some guides and instructions that helped me create this tutorial, as well as a link to the CSV data set and the Gephi website.