I require a high-level system architecture for big data and analytics reporting.
This should include Spark 3; I do not want to use Hadoop.
Our current high-level process is as follows:
1. Containerised spiders scrape websites; the resulting CSV text files are imported into MySQL (done in containers).
2. Data is cleaned and processed in a MySQL database.
3. Reporting data is output from MySQL to a MongoDB collection, indexed with Elasticsearch for fast data retrieval by the analytics UI.
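To make the three steps above concrete, here is a minimal sketch of the flow, not our actual code. It assumes sqlite3 as a stand-in for MySQL and a plain list of dicts as a stand-in for the MongoDB documents; the table names, column names and sample data are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical spider output; in the real system this is a CSV file on disk.
SCRAPED_CSV = (
    "url,price\n"
    "http://example.com/a, 10.5 \n"
    "http://example.com/b,\n"
    "http://example.com/c,7\n"
)


def import_csv(conn, csv_text):
    """Step 1: load raw scraped rows into a staging table."""
    conn.execute("CREATE TABLE raw_items (url TEXT, price TEXT)")
    rows = [(r["url"], r["price"]) for r in csv.DictReader(io.StringIO(csv_text))]
    conn.executemany("INSERT INTO raw_items VALUES (?, ?)", rows)


def clean(conn):
    """Step 2: clean inside the database: trim whitespace, drop rows with no price."""
    conn.execute(
        "CREATE TABLE clean_items AS "
        "SELECT url, CAST(TRIM(price) AS REAL) AS price "
        "FROM raw_items WHERE TRIM(price) <> ''"
    )


def export_documents(conn):
    """Step 3: build one document per cleaned row for the reporting store."""
    cur = conn.execute("SELECT url, price FROM clean_items ORDER BY url")
    return [{"url": url, "price": price} for url, price in cur]


def run_pipeline(csv_text):
    """Run all three steps against a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")
    try:
        import_csv(conn, csv_text)
        clean(conn)
        return export_documents(conn)
    finally:
        conn.close()


if __name__ == "__main__":
    for doc in run_pipeline(SCRAPED_CSV):
        print(doc)
```

In the sketch the cleaning happens in SQL inside the database, which mirrors step 2; a proposed design may move that work elsewhere.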
What I am looking for in an architecture:
1. Simple and cheap - the more different components required, the more complex it will be to develop and support.
> My preference is to use Spark 3 without Hadoop.
> Semi-cloud: mostly we will use dedicated servers; some cloud servers may be added.
2. Should include failover in case any database, server or service goes down.
3. The ability to easily scale by adding and configuring new servers (Linux) or, if required, by quickly using a cloud service.
4. Our current system works fine, but we are looking to provide additional services later, so we want a more modern design that is simple to change.
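As one possible reading of the failover requirement in point 2, here is a minimal client-side sketch: try each server in order and use the first one that responds. The host names and the `make_fake_connect` helper are hypothetical and only simulate outages; a real design would more likely rely on replication and health checks in the chosen components.

```python
def first_available(endpoints, connect):
    """Return (endpoint, connection) for the first endpoint that connects.

    `connect` is any callable that raises ConnectionError when a server is down.
    """
    errors = []
    for endpoint in endpoints:
        try:
            return endpoint, connect(endpoint)
        except ConnectionError as exc:
            # Remember why this server was skipped, then try the next one.
            errors.append(f"{endpoint}: {exc}")
    raise ConnectionError("all endpoints failed: " + "; ".join(errors))


def make_fake_connect(down):
    """Hypothetical stand-in for a real driver: hosts listed in `down` refuse connections."""
    def connect(endpoint):
        if endpoint in down:
            raise ConnectionError("unreachable")
        return f"connection to {endpoint}"
    return connect


if __name__ == "__main__":
    servers = ["db1.internal", "db2.internal", "cloud-db.example.com"]
    connect = make_fake_connect(down={"db1.internal"})
    print(first_available(servers, connect))
```

Adding a server under this scheme is just appending to the list, which is the kind of simplicity point 3 asks for.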
I am looking for someone who can provide a modern, simple architecture and can back up the design with reasons for selecting each component over the alternatives. These reasons must be provided.
If you also have the development skills to help develop this, we can also discuss setting up a prototype/poc.
We are a new start-up and new to Truelancer; this is our first job posting.
We are willing to consider hourly or fixed price. If you understand Big Data, this job should be quick.