Please go through all the details and guidelines of this project very carefully and then send your proposal accordingly.
This is a research-oriented assessment.
The project requires you to apply appropriate big data analytics methodologies and techniques
on a specific topic/problem related to large dataset processing, analytics and visualisation. You
need to analyse requirements, formulate a solution, and implement your solution in the form of
software development (preferred programming language is R). Finally, you need to present all
Module Study Guide template – May 2019 7
your work and results in a written technical report (jointly by group members).
1. Project topic
An initial set of proposed topics is listed below. However, you are allowed (and
encouraged) to propose alternative small projects, or variants, which will have to be
discussed with the lecturers.
Music Recommendation System
Music recommendation systems are becoming a hot topic due to the increase in the
number of online listeners on services like Spotify. Recommending relevant songs to
users and predicting which songs a particular user will like is a valuable
feature for any music application. You are to develop a music recommendation system
based on the Million Song Dataset.
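One common starting point for such a system is item-based collaborative filtering: songs are compared by who listened to them, and a user is offered unheard songs that resemble the ones they already play. The following is a minimal sketch only, written in Python for illustration (the brief prefers R, so it would need porting). The play counts, user names and song names are hypothetical toy values, not taken from the Million Song Dataset, and the function names are assumptions made for this example.

```python
from math import sqrt

# Hypothetical toy play counts: user -> {song: play count}.
# Real listening data would be loaded from the dataset files instead.
PLAYS = {
    "u1": {"song_a": 5, "song_b": 3},
    "u2": {"song_a": 4, "song_b": 2, "song_c": 4},
    "u3": {"song_b": 1, "song_c": 5, "song_d": 2},
}


def song_vectors(plays):
    """Invert user -> song counts into song -> user counts."""
    vectors = {}
    for user, songs in plays.items():
        for song, count in songs.items():
            vectors.setdefault(song, {})[user] = count
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(user, plays, top_n=2):
    """Rank unheard songs by play-count-weighted similarity to heard songs."""
    vectors = song_vectors(plays)
    heard = plays[user]
    scores = {}
    for candidate, cand_vec in vectors.items():
        if candidate in heard:
            continue  # never recommend something the user already plays
        scores[candidate] = sum(
            count * cosine(cand_vec, vectors[song])
            for song, count in heard.items()
        )
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [song for song, _ in ranked[:top_n]]


print(recommend("u1", PLAYS))  # prints ['song_c', 'song_d']
```

On a dataset of realistic size the pairwise similarity step would not fit this naive form; a sparse-matrix or distributed implementation would be the more likely choice.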
Predict short term movements in stock prices
The basic assumption is that the stock price largely depends on both inside and outside
factors, where inside factors include company performance (earnings and profits) and
company news (introducing new products, securing a new large contract, etc.), and
outside factors include industry performance, investor sentiment (bull market or bear
market, news sentiments), economic environment (interest rates, economic outlook and
Twitter to predict the next best restaurant
Yelp has a data set that includes restaurant rankings and reviews. One idea for this project
is to use tweets to predict restaurant star ratings. This would enable you to combine Yelp
data with Twitter data.
Community detection in large-scale social networks
A social network is often modelled as a graph. Community detection aims to identify highly
connected groups of individuals or objects inside the graph. You may explore graph
mining techniques to address the issue.
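As one example of a graph mining technique for this task, label propagation starts with every node in its own community and repeatedly lets each node adopt the label most common among its neighbours. The sketch below is written in Python for illustration (the brief prefers R). The toy edge list is hypothetical, and ties are broken deterministically by taking the largest label, which is a simplification assumed here so the result is reproducible; the usual formulation breaks ties at random.

```python
from collections import Counter

# Hypothetical toy graph: two triangles joined by a single bridge edge (2-3).
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]


def build_adjacency(edges):
    """Build an undirected adjacency map: node -> set of neighbours."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def label_propagation(edges, max_rounds=20):
    """Return communities as sorted lists of nodes."""
    adjacency = build_adjacency(edges)
    labels = {node: node for node in adjacency}  # each node starts alone
    for _ in range(max_rounds):
        changed = False
        for node in sorted(adjacency):  # fixed visiting order
            counts = Counter(labels[n] for n in adjacency[node])
            best = max(counts.values())
            # Assumed simplification: break ties by the largest label.
            new_label = max(l for l, c in counts.items() if c == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break  # labels are stable, so the communities are final
    communities = {}
    for node, label in labels.items():
        communities.setdefault(label, set()).add(node)
    return sorted(sorted(c) for c in communities.values())


print(label_propagation(EDGES))  # prints [[0, 1, 2], [3, 4, 5]]
```

Each round touches every edge once, which is part of why this family of methods is often considered for large graphs, although the quality of the communities found can vary with visiting order and tie-breaking.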
Please note the above topics are only suggestions which provide some general guidelines
on what could be the problems to be tackled. It is expected that you will take these
guidelines, and you should feel free to propose a specific topic based on your own research
2. Project tasks
The expected group work of this project will include:
Performing research on the subject topic, identifying project goal and analysing
Formulating technical solutions step by step, e.g. collecting data (public datasets),
exploring and preparing data, choosing appropriate big data models / algorithms and
applying them to your project, practising data visualisation techniques where appropriate;
Implementing the proposed solution as a software system using preferred programming
Conducting experiments with collected datasets to obtain results from your implemented
Analysing results and evaluating performance.
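The experiment and evaluation tasks above can be illustrated with a hold-out split: part of the data is set aside, a model is fitted on the rest, and its error is compared with a trivial baseline. This is a minimal sketch in Python for illustration only (the brief prefers R). The data is synthetic and hypothetical, and the simple least-squares line stands in for whatever model a group actually chooses.

```python
import random
from math import sqrt


def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle with a fixed seed and split into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))


def fit_line(rows):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical synthetic data: (feature, target) pairs on an exact line.
DATA = [(x, 2.0 * x + 1.0) for x in range(20)]
train, test = train_test_split(DATA)

slope, intercept = fit_line(train)
actual = [y for _, y in test]
model_rmse = rmse(actual, [slope * x + intercept for x, _ in test])

# Baseline: always predict the mean target seen in training.
train_mean = sum(y for _, y in train) / len(train)
baseline_rmse = rmse(actual, [train_mean] * len(test))

print(f"model RMSE={model_rmse:.4f}, baseline RMSE={baseline_rmse:.4f}")
```

Reporting the baseline alongside the model makes it easier to judge in the analysis section whether the chosen technique adds anything beyond a naive prediction.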
3. Project report
The group report is recommended to contain the following sections, although you
may customise the structure to suit your own project:
Cover page: where you should state the project topic, the name and student ID of each
member of your group.
Abstract: where you summarise the contents, outlining why and how you go about the
investigation, plus a brief summary of results and findings.
Introduction: where you introduce the topic of the investigation and provide necessary
background, motivating why it is interesting, and referencing all appropriate literature.
Model and implementation: where you go into more technical detail of the techniques
and procedures you have used, justifying the appropriateness of the
models/algorithms/techniques used and explaining the software implementation. Referencing all
Experiments: where you describe the experiments conducted, presenting experimental
results properly (e.g. using data visualisation where appropriate).
Analysis: where you analyse the experimental results, delving into deep technical
explanations of the results, suggesting possible improvements, etc.
Conclusions: where you summarise the investigation, stating the conclusions that the
results and analysis showed, along with suggestions of potential future work (if
Reference: where all the literature you have cited in your report can be listed.
Appendix: where the software source code you developed can be included. It is
important that you comment your code properly.