Bad data could cost Singapore businesses millions
By Mike Davie
Chinese New Year has come and gone, and Singapore is getting back to work after the festivities. Most executives in private companies and public organisations will have a long list of priorities, from hitting sales targets to company expansion and budgeting. Poor-quality data coming out of the Data Economy is unlikely to make that list. But given Singapore’s trajectory and its ambition to position itself as a global data and Artificial Intelligence (AI) leader, there are good reasons to push cleaning up the Data Economy higher on our list of priorities – otherwise we risk our position as a leading data hub.
As a country we are becoming increasingly reliant on data – 93% of organisations in Singapore use data for critical and automated decision-making. Singapore rightly sees AI as one of the next frontiers that will power our growth. But AI relies on data to make decisions, so poor data will result in poor decisions. Private companies are using data, especially location data, at a much greater pace, and are relying on analysis of that data to make serious business decisions on expansion and strategy, or to make purchases worth many millions of dollars.
Singapore’s Personal Data Protection Act (PDPA) requires an organisation ‘to make a reasonable effort to ensure that personal data collected by or on behalf of the organisation is accurate and complete’. Despite this, bad data still makes its way into organisations’ AI algorithms and decision-making thanks largely to a Data Economy that lacks transparency.
Broadly put, the Data Economy is the production, flow, purchase and sale of data. Data is created by data producers such as ride-sharing apps, which emit data on the movements of passengers and drivers. This is then stored in data storage centres and often purchased by a third party that seeks to use the data for its own, separate business purposes.
Currently, if a large consumer firm wanted to know how many consumers bought its products around the world, it would have to go through a series of middlemen to gather this data. The first-layer middleman would come in the form of a data aggregator. The aggregator would then gather data from another layer of middlemen, which could take the form of individual retailers, loyalty card companies, point-of-sale (POS) companies and so on.
Many of these companies, though, try to hide their sources and make the process non-transparent. They do not want to show where this data comes from, either for malicious reasons (the data could be false or replicated) or for non-malicious ones (they either don’t know or want to protect their own sources). This is problematic for the purchaser from a trust point of view, and from a regulatory one.
Without knowing the original source of the data, it is hard for companies to understand its quality and accuracy. Additionally, we are in a world where data can be faked very easily, and inaccurate data can lead to wrong decisions. Linked to this is increasing regulation: as governments continue to crack down on personal data use, they are demanding that companies disclose the provenance of their data.
Data, especially location data, is booming, driven by our increasing use of apps that emit location data, and it is becoming more accurate and usable. But little has been done to make the Data Economy more transparent. Whilst there are conversations surrounding privacy and security, few are talking about data quality.
Imagine you are a large burger chain planning to build a new restaurant in Singapore. You would want to choose your location based on footfall data showing where Singaporeans congregate and eat, and where they would be likely to visit your planned outlet. You would probably spend millions on such data from a specialist firm, which in turn would purchase it from a data aggregator, which would source it from many different firms – firms which make more money the more data they are able to provide.
Some of the data you receive may well be false or replicated. So you may invest many millions in a new restaurant in an area that you think has high foot traffic, only to see less-than-stellar visitor numbers. There may be an internal analysis of why the restaurant is underperforming, no doubt focussing on the branding, pricing or size of the burgers. Few will question the accuracy of the underlying data that helped to dictate the location of the restaurant.
Now apply the same scenario to a new MRT station, a new algorithm or a new health campaign. The negative consequences that poor-quality data can cause are endless, yet few people would identify poor data as the cause of these problems. As we become more reliant on the Data Economy we need to talk about its lack of transparency and potential for fraud, and push for solutions.
Singapore is routinely positioned as one of the best places to do business, thanks partly to its embrace of data and AI. Tech giants such as Alibaba, Baidu, and Tencent have investments here, and Google, Amazon, and Facebook have operations in Singapore. Singapore’s National Research Foundation (NRF) has launched AI Singapore (AISG), a $150m national programme to deepen Singapore’s AI capabilities. Yet AI is only as good as the data it is fed, and unless we take a lead in ensuring data is authentic and accurate, our AI ambitions will remain just that: ambitions.
Luckily, solutions already exist. One example is data authentication technology, which tracks data from its source and uses blockchain to stamp an indelible signature onto it. This guarantees that, from the time of stamping, any change in the data will no longer match the unique signature, signalling to the buyer that the data has been changed.
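To make the idea concrete, here is a minimal sketch in Python of how such stamping could work. It is an illustration under stated assumptions, not a description of any particular product: the `Ledger` class is a toy, in-memory stand-in for a blockchain, and the names `fingerprint`, `stamp`, `matches_stamp` and the sample footfall records are all hypothetical.

```python
import hashlib
import json
import time


def fingerprint(records):
    """Return a SHA-256 hex digest of the records in a canonical JSON form."""
    canonical = json.dumps(records, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class Ledger:
    """Toy append-only ledger (an assumption standing in for a blockchain).

    Each entry stores the hash of the previous entry, so rewriting an old
    entry breaks every later link in the chain.
    """

    def __init__(self):
        self.entries = []

    def stamp(self, source, records):
        """Record the data's fingerprint together with its claimed source."""
        prev_hash = self.entries[-1]["entry_hash"] if self.entries else "0" * 64
        entry = {
            "source": source,
            "data_hash": fingerprint(records),
            "prev_hash": prev_hash,
            "stamped_at": time.time(),
        }
        # The entry hash covers every field above, including the link back.
        entry["entry_hash"] = fingerprint(entry)
        self.entries.append(entry)
        return entry

    def chain_is_intact(self):
        """Check that no stored entry has been altered or reordered."""
        prev_hash = "0" * 64
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "entry_hash"}
            if entry["prev_hash"] != prev_hash:
                return False
            if fingerprint(body) != entry["entry_hash"]:
                return False
            prev_hash = entry["entry_hash"]
        return True


def matches_stamp(entry, records):
    """True only if the records are unchanged since they were stamped."""
    return fingerprint(records) == entry["data_hash"]


# Hypothetical footfall data, stamped at source and later checked by a buyer.
ledger = Ledger()
footfall = [{"zone": "A", "visits": 120}, {"zone": "B", "visits": 45}]
receipt = ledger.stamp("hypothetical-app", footfall)

inflated = [{"zone": "A", "visits": 1200}, {"zone": "B", "visits": 45}]
print(matches_stamp(receipt, footfall))  # True: data is as stamped
print(matches_stamp(receipt, inflated))  # False: data was changed after stamping
```

Note what the check does and does not show: a matching fingerprint tells the buyer the data has not changed since stamping and where it claims to come from, but it cannot by itself prove that the data was truthful when it was first recorded.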
The result is more accountability, trust and confidence in the Data Economy, as poor-quality data can immediately be traced back to its source. That would help curtail click farms and other bad actors, and create more trust, as users would know that the data they have received is unchanged since the time of stamping. It will also make auditing easier -- as a digital trail leads directly to the data source -- and facilitate regulatory compliance.
The importance of good-quality, accurate data cannot be overstated, and innovations such as AI will not work as intended if they are fed poor-quality data. For a country such as Singapore, which has ambitions to be a leader in AI, a transparent Data Economy should be a priority. For organisations in Singapore, whether private companies or government agencies, 2019 should be the year to start looking at the source of the data they use -- and start demanding provenance and authenticity.