Auto-Generating Data Sources

The ask

A marketing and communications company came to us for help finding an easier, more efficient way to produce data visualisations, such as graphs and charts, for use in articles. They wanted to use technology to auto-generate visual statistics and so improve the accuracy and efficiency of their written content.

When journalists write, they often need to create graphs from information obtained from different online data sources. The aim of this project was therefore to detect what the journalist (or any other user) was writing about and automatically create graphs from reliable data sources related to that topic.

Visualisations are created in seconds during the writing process.

What we did

We used state-of-the-art natural language processing (NLP) techniques to classify the client’s written content into predefined topics. Once the text was classified, the software automatically built a chart using key information from reliable data sources. This simplified the research process and the task of creating charts and graphs for articles.



The process

The first step was to define, together with the client, a set of topics that would provide value to the business. We then had to create a dataset from scratch, as there was no pre-existing data to work with. The text was extracted from a range of media sites and labelled according to the predefined topics.

The next step was to use natural language processing techniques to obtain features that we could use as input for the model. After this data preparation and transformation, we framed the problem as supervised classification. We therefore used automatic label extraction and NLP techniques to match each text with its topic label.
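To make the idea of supervised topic classification concrete, here is a minimal sketch. The project write-up does not name the features or model actually used, so everything below is an assumption: bag-of-words counts stand in for the features, a small multinomial Naive Bayes classifier stands in for the model, and the training sentences and label names are invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTopicClassifier:
    """Multinomial Naive Bayes over word counts (an illustrative stand-in model)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # topic -> word -> count
        self.doc_counts = Counter(labels)        # topic -> number of documents
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_topic, best_score = None, -math.inf
        for topic, n_docs in self.doc_counts.items():
            counts = self.word_counts[topic]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus log likelihood of each token, with add-one smoothing.
            score = math.log(n_docs / total_docs)
            for token in tokens:
                score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_topic, best_score = topic, score
        return best_topic


# Toy labelled examples, invented for illustration only.
train_texts = [
    "unemployment rate fell as more jobs were created",
    "the labour market added jobs and wages rose",
    "consumer prices rose as inflation hit food and energy",
    "inflation pushed the price index higher this month",
    "the economy grew as output and gdp expanded",
    "gdp growth slowed as economic output contracted",
]
train_labels = [
    "employment", "employment",
    "consumer_price_index", "consumer_price_index",
    "gdp", "gdp",
]

classifier = NaiveBayesTopicClassifier().fit(train_texts, train_labels)
predicted = classifier.predict("new figures show jobs growth and lower unemployment")
print(predicted)
```

In practice a dataset of this kind would be far larger, and a library model would likely replace the hand-written classifier; the sketch only shows the shape of the task: labelled text in, topic label out.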

We then implemented a service that, once the topic of the text had been identified, linked that topic to the relevant data sources and created the corresponding graphic for the article.
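One way such a service could link a topic to a data source and a chart is a simple lookup table. The sketch below is hypothetical throughout: the registry entries, the source identifiers, the `build_chart_spec` function and the placeholder numbers are assumptions made for illustration, not details of the delivered system.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Series = List[Tuple[str, float]]  # (period label, value) pairs


@dataclass(frozen=True)
class ChartTemplate:
    title: str
    chart_type: str   # e.g. "line" or "bar"
    y_label: str
    source_name: str  # hypothetical identifier of the data provider


# Hypothetical registry: topic label -> how that topic should be charted.
CHART_REGISTRY: Dict[str, ChartTemplate] = {
    "employment": ChartTemplate(
        "Unemployment rate", "line", "% of labour force", "example-labour-stats"),
    "consumer_price_index": ChartTemplate(
        "Consumer price index", "line", "Index", "example-price-stats"),
    "gdp": ChartTemplate(
        "Gross domestic product", "bar", "Growth (%)", "example-national-accounts"),
}


def build_chart_spec(topic: str,
                     fetch_series: Callable[[str], Series]) -> Optional[dict]:
    """Return a renderer-agnostic chart description, or None for an unknown topic."""
    template = CHART_REGISTRY.get(topic)
    if template is None:
        return None
    series = fetch_series(template.source_name)
    return {
        "title": template.title,
        "type": template.chart_type,
        "y_label": template.y_label,
        "source": template.source_name,
        "x": [period for period, _ in series],
        "y": [value for _, value in series],
    }


def fake_fetch(source_name: str) -> Series:
    # Placeholder values, not real statistics; a real fetcher would call the provider.
    return [("Q1", 1.0), ("Q2", 2.0), ("Q3", 3.0)]


spec = build_chart_spec("employment", fake_fetch)
print(spec["title"], spec["x"], spec["y"])
```

Keeping the data fetcher as a parameter means the mapping logic can be exercised without network access, and the resulting dictionary can be handed to whichever charting library renders the final graphic.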



NLP allowed us to automate the research process.


As a result of this project, we created a tool that could generate simple, standard charts on a range of topics, including employment, the consumer price index and gross domestic product, producing an eye-catching visual within the article.

When the writer typed some words into the Google document, a Chrome extension would send an alert signal. This triggered the system to identify the topic being written about and to generate a preview of a graph to suggest to the writer. Once the writer accepted the suggestion, they could add the graph to their text.
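The flow described above (text changes, topic is detected, a preview is suggested, the writer accepts) can be sketched as a small piece of server-side logic. This is a sketch under stated assumptions: the real extension would run in the browser, and the names `on_text_changed`, `accept` and `MIN_WORDS`, the word threshold and the keyword-based classifier are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

MIN_WORDS = 5  # hypothetical threshold: too little text gives no reliable topic


@dataclass
class Suggestion:
    topic: str
    preview: str           # stand-in for a rendered chart preview
    accepted: bool = False


@dataclass
class Document:
    text: str = ""
    charts: List[str] = field(default_factory=list)


def on_text_changed(doc: Document,
                    classify: Callable[[str], Optional[str]]) -> Optional[Suggestion]:
    """Handle the extension's signal; return a chart suggestion or None."""
    if len(doc.text.split()) < MIN_WORDS:
        return None
    topic = classify(doc.text)
    if topic is None:
        return None
    return Suggestion(topic=topic, preview=f"[chart preview: {topic}]")


def accept(doc: Document, suggestion: Suggestion) -> None:
    """Mark the suggestion as accepted and attach the chart to the document."""
    suggestion.accepted = True
    doc.charts.append(suggestion.preview)


def keyword_classify(text: str) -> Optional[str]:
    # Trivial stand-in for the trained topic classifier.
    lowered = text.lower()
    if "unemployment" in lowered or "jobs" in lowered:
        return "employment"
    return None


doc = Document(text="Unemployment fell again")
too_short = on_text_changed(doc, keyword_classify)   # only three words so far

doc.text = "Unemployment fell again in the last quarter"
suggestion = on_text_changed(doc, keyword_classify)
accept(doc, suggestion)
print(doc.charts)
```

Nothing is inserted until the writer accepts, which mirrors the description: the system only proposes a graph, and the writer stays in control of what goes into the article.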



