Preparation

Overview

This project will develop an application that generates one or more visualisations to simplify the presentation of large volumes of rapidly changing data, allowing the user to easily select, contextualise and analyse both live and historical data.


Specification

Below is a list of all of the requirements for this project, which have been split into the following sections:

Overview (based on the project description)

Data collection

Data processing and storage

Output

Graphs

Other


Risk Analysis

Risk | Possible Consequences | Probability | Severity | Overall Risk | Preventative Measures
Twitter disables public access to their API and data. | The dashboard would be unable to function without the Twitter API as its data source. | Low | High | Medium | A few hours of data will be recorded in the format that Twitter provides so that, if access is disabled, the dashboard can still function from stored data.
Twitter changes their API methods. | A substantial amount of connection code may have to be rewritten to use the new methods. | Low | Medium | Medium | The application will be written using a library designed to handle the connection to Twitter (the Tweepy Python library). If Twitter updates their API methods, Tweepy should be updated quickly, as many programs use this library.
Twitter changes their API response format. | A substantial amount of code may have to be rewritten to accommodate the changes. | Low | Medium | Medium | The application will be built as multiple modular sections connected through APIs, so that if one part has to be rewritten the changes will be contained to a single section of the code base rather than spread across all or most of the files.
Tweepy stops updating their repository and the library stops functioning correctly. | Either a different Python Twitter library would have to be implemented or the Twitter connection would have to be managed manually. | Low | Low | Low | As the library is already downloaded (and is stored in this GitHub repository), the current files would still be usable unless Twitter also changed their API. A backup of this repository will be kept (by both myself and GitHub) to ensure that no files are lost.
ChartJS stops updating their repository and the library stops functioning correctly. | Either a different JavaScript graphing library would have to be implemented or the visualisations would have to be created manually. | Low | Medium | Medium | As the library is already downloaded (and is stored in this GitHub repository), the current files would still be usable. A backup of this repository will be kept (by both myself and GitHub) to ensure that no files are lost.
jQuery stops updating their repository and the library stops functioning correctly. | The application would have to be implemented using vanilla JavaScript. | Very Low | Medium | Low | The library is extremely popular, so this is highly unlikely. As the library is already downloaded (and is stored in this GitHub repository), the current files may still be usable. A backup of this repository will be kept (by both myself and GitHub) to ensure that no files are lost.
My development machine breaks and all data is unrecoverable. | All data would be lost, and I may be unable to work on the project until I was able to purchase a new computer. | Low | High | Medium | All files for this project are stored on GitHub and/or Dropbox (excluding the database created by the application). The only work that would be lost is work that had not yet been committed, which is normally at most one hour of progress. I would be able to get hold of a new computer within a maximum of 72 hours, so at most three days would be lost, though it is likely that some progress could be made without a computer (for example, designing a soon-to-be-implemented part of the project).
Internet connection is unavailable for a few hours. | It would not be possible to access live data from Twitter or to retrieve any new libraries required. | Medium | Low | Low | This would have no long-term effects on the project. Although in the short term it would not be possible to work on certain areas of the project, there would almost certainly be progress that could be made without internet access.
Internet access is unavailable for a number of days or weeks. | Without internet access for longer periods of time the rate of development would decrease, and the project specification may have to be re-evaluated. | Extremely Low | High | Low | There are multiple locations where it would be possible for me to stay in the highly unlikely event that internet access was unavailable on a university campus. If internet access becomes unavailable at all of these locations for a long period of time, there will be bigger issues than the completion of this project.
System requirements are not adequately identified. | The application does not meet the expected requirements because of how the requirements were interpreted. | Low | Medium | Low | Requirements will be defined specifically so that little or no interpretation is required.
Project involves the use of technology that hasn't been used in a prior project. | Extra time will be spent learning how to use these new technologies. | Medium | Medium | Medium | Popular libraries and technologies will be chosen so that there are likely to be lots of resources available to provide assistance if required, which will minimise the amount of time that is wasted.
Inadequate estimation of time to complete tasks. | The list of requirements would not be fully implemented to the level that was initially planned. | Medium | Low | Low | When the time to complete a task is uncertain, it will be overestimated slightly rather than underestimated so that, if anything, additional time is available to improve the application.
Source code is lost. | A substantial amount of progress that had been made on the project is lost and needs to be completed again. | Low | High | Medium | Files for this project are stored on GitHub and/or Dropbox (excluding the database created by the application). The only work that would be lost is work that had not yet been committed, which is normally at most one hour of progress.
Some of the source code is accidentally overwritten. | A substantial amount of progress that had been made on the project is lost and needs to be completed again. | Low | High | Medium | Files for this project are stored on GitHub and/or Dropbox (excluding the database created by the application). The only work that would be lost is work that had not yet been committed, which is normally at most one hour of progress.
Some of the source code is corrupted. | A substantial amount of progress that had been made on the project is lost and needs to be completed again. | Low | High | Medium | Files for this project are stored on GitHub and/or Dropbox (excluding the database created by the application). The only work that would be lost is work that had not yet been committed, which is normally at most one hour of progress.
There is poor visibility of project progress. | Important tasks within the project could be missed due to the lack of visibility. | Very Low | High | Medium | A project management system will be used to keep track of both the project as a whole and its individual parts. This will ensure that there is high visibility of the project's status as long as the system is kept up to date.
The project developer becomes seriously ill or injured. | The project would almost certainly not produce the expected application in the same timeframe; either the project specification would have to be re-evaluated or the deadline would have to be extended. | Low | Very High | Medium | Activities that increase the chance of illness or injury, such as extreme sports, will be avoided until the completion of this project.
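
The first preventative measure in the table (recording a few hours of data so the dashboard can run from stored data) could be sketched as below. This is a minimal illustration under stated assumptions, not the project's actual code: the function names, the JSON Lines file layout and the example tweet fields are all hypothetical, and how the raw payloads are obtained from Tweepy is deliberately left out.

```python
import json
from pathlib import Path
from typing import Iterable, Iterator


def record_tweets(raw_tweets: Iterable[str], path: Path) -> int:
    """Append raw tweet payloads (JSON strings) to a JSON Lines file.

    Returns the number of tweets written. The payloads are assumed to be
    the JSON text exactly as received from Twitter.
    """
    count = 0
    with path.open("a", encoding="utf-8") as handle:
        for raw in raw_tweets:
            # Round-trip through json so each record occupies exactly one line.
            handle.write(json.dumps(json.loads(raw)) + "\n")
            count += 1
    return count


def replay_tweets(path: Path) -> Iterator[dict]:
    """Yield the stored tweets in the order they were recorded."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():  # skip blank lines
                yield json.loads(line)
```

Because the stored records keep the structure of the original payloads, the rest of the application could read from `replay_tweets` in place of the live stream without needing a separate code path.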


Proof of Concept

Overview

The proof of concept is a basic, dynamically updating visualisation that makes use of the services that are likely to be used for the main project. It is referred to as the "Proof of Concept" in the rest of the documentation. It includes the following:

Application

Below is a screenshot of the proof of concept that was created:

Screenshot of proof of concept graph.

A video of the proof of concept running is available at this link.

The source files for the proof of concept are available in the 'masters-early' directory that was submitted for the early deliverable.


Proof of Concept Review

Important note: Google App Engine cannot be used with the Tweepy library (or with multiple other Twitter API libraries) due to port restrictions put in place by Google.

Overall the proof of concept was a success. It showed that it was possible to retrieve tweet data from Twitter using an API, read specific data contained within the tweets, process and store that data on a server, and produce a dynamic graph displaying the processed data on a web application.

Although the application worked as a proof of concept, the following major issues were discovered, which will all be fixed in the implementation of the full application:

  1. Twitter data is ignored when not received chronologically. Due to the nature of the Twitter API, tweets occasionally arrive out of chronological order when their timestamps are compared to the second. The proof of concept ignores tweets that arrive out of order and does not process them, so the visualisation shows incomplete data. The next version of the application needs to count these tweets against the correct second.

  2. Front end does not update when no data is received from the back end. On each request, the back end sends its most recent data. If that data is new, the front end fills in the intervening time with zero tweets and then updates the graph. If the front end is sent data that it has already received, it continues to wait for new data. This means that x seconds with no tweets is treated as x seconds with no update, which makes the graph look as if it has broken and is no longer updating.

  3. Data appears to move vertically not horizontally. As the graph updates, each data point is moved along one space as expected, but the graph does not make it clear that this is happening. To an unfamiliar user, it would probably look as if the whole graph is redrawn each time it changes, rather than what is actually happening: one data point is added to the left of the graph, and each of the other points is shifted right one space.

  4. Graph labels do not give the user enough information. The graph currently has the following problems:

    • Both the x and y axes are missing axis titles.
    • Both the x and y axes are missing units of scale.
    • The x axis labels are extremely long and show far more information than is useful. The label currently displayed is "Mon, 27 Jun 2016 05:38:18", in which the day, month and year are all redundant and should be removed. A better axis label would be "05:38:18": shorter labels would be easier to differentiate because they could use a bigger font size, and they would also leave more room for the graph itself.
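
Issues 1, 2 and 4 above share a common cause, which is how tweets are grouped and labelled by time. One possible approach is sketched below. It is an illustration only, not the implementation used in the full application: the `TweetCounter` class and its method names are hypothetical, and it assumes each tweet's creation time is already available as a `datetime`. Tweets are counted against the second in their own timestamp, so late arrivals are not dropped. Every second in the requested window is reported, so quiet periods appear as zeros. Labels are shortened to the time of day.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import List, Tuple

# Format of the long labels shown in the proof of concept,
# e.g. "Mon, 27 Jun 2016 05:38:18" (parsing assumes an English locale).
LONG_FORMAT = "%a, %d %b %Y %H:%M:%S"


class TweetCounter:
    """Counts tweets per second, regardless of the order they arrive in."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def add(self, created_at: datetime) -> None:
        # Bucket by the tweet's own timestamp, so a late arrival still
        # increments the second it belongs to instead of being ignored.
        self._counts[created_at.replace(microsecond=0)] += 1

    def series(self, start: datetime, end: datetime) -> List[Tuple[str, int]]:
        """Return (label, count) for every second from start to end inclusive.

        Seconds in which no tweets were seen are reported with a count of 0,
        so the graph keeps moving even when nothing new has arrived.
        """
        points = []
        current = start.replace(microsecond=0)
        while current <= end:
            # Short "HH:MM:SS" label instead of the full date string.
            points.append((current.strftime("%H:%M:%S"), self._counts[current]))
            current += timedelta(seconds=1)
        return points
```

With this structure the front end would request a fixed time window on each update instead of waiting for data it has not seen before, and would always receive one point per second.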