The task is to build a prototype of a machine learning system that translates statistical data (data that is already pre-processed for data visualization) into natural language descriptions. Given its recent popularity, a system based on TensorFlow is preferred.
Initially, the prototype will focus on one or two types of statistical data: financial data (such as stock market data) and weather data (such as a weather forecast).
The task here is NOT to parse through and analyze large datasets. The source data would be pre-processed data, formatted to be rendered as a graphical chart.
Initially, the purpose of this exercise is to create a natural language description of the data that is displayed in a chart. The resulting description will be used to help people who are blind or visually impaired understand what the chart describes, in a way similar to how a sighted person would describe the chart to someone who is blind (or how a radio journalist would explain the data/statistic).
For example, a financial journalist commenting on the Apple stock price may describe the overall trend, e.g. where there were notable highs and lows in the price, and the overall trajectory of the stock price. It is this kind of editorial natural language explanation that we will use machine learning/AI to create.
A meteorologist on the radio would explain the expected temperature fluctuations throughout the day, precipitation, humidity, wind, etc.
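To make the kind of "editorial" facts mentioned above concrete, here is a minimal sketch of a rule-based baseline that pulls the overall direction, the high, and the low out of a time series and fills a sentence template. It is not the machine learning system itself, only an illustration of the target output. The function name `summarize_series`, the 5% flatness threshold, and the sentence wording are all assumptions made for this example.

```python
from datetime import datetime, timezone


def _day(epoch_ms):
    """Format a millisecond timestamp as an ISO date (UTC)."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc).date().isoformat()


def summarize_series(points, flat_fraction=0.05):
    """Summarize a list of (epoch_ms, value) pairs sorted by time.

    Returns (facts, text). The direction is judged relative to the
    high-low range, so it also works for values at or below zero
    (for example temperatures). The threshold is an arbitrary assumption.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")

    first_t, first_v = points[0]
    last_t, last_v = points[-1]
    high_t, high_v = max(points, key=lambda p: p[1])
    low_t, low_v = min(points, key=lambda p: p[1])

    change = last_v - first_v
    threshold = flat_fraction * (high_v - low_v)
    if change > threshold:
        direction = "rose"
    elif change < -threshold:
        direction = "fell"
    else:
        direction = "ended roughly where it started"

    facts = {
        "direction": direction,
        "first": first_v,
        "last": last_v,
        "high": (_day(high_t), high_v),
        "low": (_day(low_t), low_v),
    }
    text = (
        f"Between {_day(first_t)} and {_day(last_t)}, the value {direction}, "
        f"moving from {first_v:g} to {last_v:g}. "
        f"The high was {high_v:g} on {_day(high_t)}; "
        f"the low was {low_v:g} on {_day(low_t)}."
    )
    return facts, text


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    sample = [
        (1577836800000, 100.0),
        (1577923200000, 120.0),
        (1578009600000, 90.0),
        (1578096000000, 110.0),
    ]
    print(summarize_series(sample)[1])
```

A trained model would be expected to go beyond such a template (varied phrasing, domain vocabulary, picking out what is notable), but a baseline like this could also serve to generate or sanity-check training descriptions.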
By having an ML/AI application for this purpose, one could scale such “editorial descriptions” of data to highly personalized datasets at a high volume.
As an initial data structure for the source data, I would like to use the format of the popular data visualization libraries Highcharts and Highstock. This approach potentially allows the machine learning system to combine machine vision/OCR with the numerical chart data, as the Highcharts API can also return a bitmap image rendered from the source data.
My hope is that, because financial data and weather data are so domain-specific, it will be easy to find both good source data and good training data (natural language descriptions) for each domain.
As a first prototype, the app to be built can be a simple API that takes a Highcharts-formatted JSON file and generates a natural language description from it. Later iterations can allow for a variety of input sources.
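As a rough illustration of the input side of such an API, the sketch below reads a Highcharts-style options object and normalizes each series into (x, y) pairs. It handles the three point formats Highcharts accepts in `series.data` (plain numbers, `[x, y]` arrays, and objects with `x`/`y` keys). The function names `extract_series` and `describe_chart` are hypothetical, using the array index as x for plain numbers is a simplifying assumption (Highcharts itself derives x from `pointStart` and `pointInterval`), and the one-line description is only a placeholder for what the trained model would generate.

```python
import json


def extract_series(options):
    """Return {series name: [(x, y), ...]} from a parsed Highcharts options dict."""
    result = {}
    for i, series in enumerate(options.get("series", [])):
        name = series.get("name", f"Series {i + 1}")
        points = []
        for j, item in enumerate(series.get("data", [])):
            if isinstance(item, dict):
                # Object form: {"x": ..., "y": ...}
                x, y = item.get("x", j), item.get("y")
            elif isinstance(item, (list, tuple)):
                # Array form: [x, y]; extra values (e.g. OHLC) are ignored here.
                x, y = item[0], item[1]
            else:
                # Plain number (or null); index used as x by assumption.
                x, y = j, item
            if y is not None:  # Skip null points (gaps in the chart).
                points.append((x, y))
        result[name] = points
    return result


def describe_chart(chart_json):
    """Placeholder API entry point: JSON text in, description text out."""
    options = json.loads(chart_json)
    title = (options.get("title") or {}).get("text", "Untitled chart")
    sentences = []
    for name, points in extract_series(options).items():
        if not points:
            sentences.append(f"{name} has no data.")
            continue
        values = [y for _, y in points]
        sentences.append(
            f"{name} has {len(points)} points ranging from "
            f"{min(values):g} to {max(values):g}."
        )
    return f"{title}: " + " ".join(sentences)


if __name__ == "__main__":
    # Made-up sample chart, for illustration only.
    sample = json.dumps({
        "title": {"text": "AAPL Stock Price"},
        "series": [{"name": "AAPL", "data": [[1577836800000, 100.0],
                                             [1577923200000, 120.0]]}],
    })
    print(describe_chart(sample))
```

Keeping the parsing step separate from the description step means the same normalized series could later be fed from other input sources without changing the model side.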
Here is an example of a stock chart: https://www.highcharts.com/stock/demo/spline
And a weather chart: https://www.highcharts.com/demo/combo-meteogram#https://www.yr.no/place/United_Kingdom/England/London/forecast_hour_by_hour.xml
What I am looking for is a description of how you would go about doing this, what assets/data you would need to train the system, and what the projected time and cost would be for each stage of development up to a functional prototype.
An example of a company that does this at a much more complex level would be https://narrativescience.com/
Copyright © 2020 | Truelancer.com