The use of artificial intelligence (AI) in companies is advancing, and it is almost always accompanied by high expectations. When the results of an AI project disappoint, the cause is often an underestimated factor: the quality of the source data.
A survey conducted by the Bitkom industry association in spring 2021 underlines how much attention AI systems are attracting from companies. Two-thirds of the companies surveyed consider AI the most important future technology, and one in four wants to invest in this area. One-third expect artificial intelligence to provide expert knowledge that could not otherwise be obtained in this form. Expectations of predictive analytics are high as well. The hope is to use fewer resources, develop new products and solutions for customers, or make processes more efficient.
It is not uncommon for disillusionment to set in after a system has been introduced, when the developments the AI predicted have not materialized or no particular gain in knowledge is apparent.
AI Cannot Perform Miracles
Anyone who deals more intensively with artificial intelligence knows that the systems are only as good as their foundations. The underlying algorithms are developed by people, who are themselves subject to cognitive biases. The ‘bias’ phenomenon has already been discussed in the media, for example, in connection with automated credit checks. Because people build AI, it cannot act impartially: developers’ biases flow into the programming, whether intentionally or unintentionally. The developers’ data ethics, conscious or unconscious, also influence the results.
If an AI process does not achieve the desired results, the quality of the input may also be to blame. In this respect, AI is no different from image processing, which cannot turn a bad photo into a masterpiece. Neither can AI deliver a great result from poor input.
Determine And Improve Data Quality
When using artificial intelligence, convincing results begin with merging the data and the ETL process (Extract, Transform, Load). A good data strategy in terms of data quality is crucial here.
Criteria For Measuring Data Quality Are:
- Completeness: Completeness has several dimensions. Data is considered complete when all content has been fully carried over in an ETL process. The company’s business rules define when 100% completeness is achieved.
- Correctness: Put simply, a dataset is correct if it corresponds to reality.
- Consistency: The properties of a dataset must not have any logical contradictions with each other or with other datasets within a data source.
- Uniqueness: A dataset is unique when the real-world objects it describes are represented only once.
- Conformance: The data must conform to the defined format.
- Validity: The data correspond to the defined value ranges.
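Some of these criteria can be checked mechanically. The following is a minimal sketch, not a definitive implementation: the sample records, the field names (`id`, `email`, `age`), the e-mail pattern, and the age range are all assumptions made for illustration and do not come from any particular system.

```python
import re
from collections import Counter

# Hypothetical customer records; field names and rules are illustrative assumptions.
records = [
    {"id": 1, "email": "anna@example.com", "age": 34},
    {"id": 2, "email": "not-an-email", "age": 51},
    {"id": 3, "email": "ben@example.com", "age": 212},
    {"id": 1, "email": "anna@example.com", "age": 34},
]

EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # assumed format rule
AGE_RANGE = (0, 120)  # assumed value range


def duplicate_ids(rows):
    """Uniqueness: return ids that appear in more than one record."""
    counts = Counter(row["id"] for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


def nonconforming_emails(rows):
    """Conformance: return ids of rows whose email does not match the format."""
    return sorted({row["id"] for row in rows if not EMAIL_PATTERN.match(row["email"])})


def invalid_ages(rows):
    """Validity: return ids of rows whose age lies outside the allowed range."""
    low, high = AGE_RANGE
    return sorted({row["id"] for row in rows if not low <= row["age"] <= high})


print("Duplicate ids:", duplicate_ids(records))
print("Nonconforming emails:", nonconforming_emails(records))
print("Invalid ages:", invalid_ages(records))
```

Correctness is deliberately left out of this sketch: whether a record corresponds to reality usually cannot be decided from the data alone and requires a trusted reference source.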
The quality of the data can be determined and improved by rules. These rules fall into two groups:
- Formal-technical rules and
- content rules.
Formal-technical rules can be implemented easily in the software whose data will form the starting point for analyses. A simple formula is sufficient to check whether a data record is complete (e.g., the number of filled attributes divided by the total number of attributes). Technical support also exists for content-related rules: plausibility checks during data entry prevent incorrect entries and thus automatically help improve data quality.
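The completeness quotient and a plausibility check at data entry might look roughly like this. It is only a sketch under stated assumptions: the attribute names, the definition of "filled" (neither `None` nor an empty string), and the five-digit postal code rule are hypothetical examples rather than rules taken from the text.

```python
# Assumed attribute list for a hypothetical customer record.
ATTRIBUTES = ["name", "email", "city", "postal_code"]


def completeness(record, attributes):
    """Formal-technical rule: filled attributes divided by all attributes."""
    if not attributes:
        raise ValueError("at least one attribute is required")
    # Assumption: an attribute counts as filled if it is neither None nor "".
    filled = sum(1 for name in attributes if record.get(name) not in (None, ""))
    return filled / len(attributes)


def plausible_postal_code(value):
    """Content rule (assumed): a postal code consists of exactly five digits."""
    return (
        isinstance(value, str)
        and len(value) == 5
        and value.isascii()
        and value.isdigit()
    )


record = {"name": "Anna", "email": "", "city": "Berlin", "postal_code": None}
print(completeness(record, ATTRIBUTES))  # 2 of 4 attributes are filled
print(plausible_postal_code("10115"))
print(plausible_postal_code("1011A"))
```

Running the plausibility check at the moment of entry, rather than later in the analysis pipeline, is what lets it reject an incorrect value before it is stored.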
The core issue is that AI systems can only deliver correct (and unbiased) results if the data is available in a clean and suitable format. This is usually where the first fundamental mistake in AI analysis lies. Data quality must be measured against the use cases. The results then feed into a gap analysis that shows where gaps remain and whether additional data needs to be collected.
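One possible way to turn such a measurement into a gap analysis is to compute, for each field a use case requires, the share of records in which that field is filled, and to report every field that falls below a target. The sketch below assumes a hypothetical use case, hypothetical field names, and an arbitrary threshold of 0.95; none of these values come from the text.

```python
def gap_analysis(rows, required_fields, threshold=0.95):
    """Return {field: fill_rate} for required fields whose fill rate is below threshold."""
    gaps = {}
    for field in required_fields:
        # A field counts as filled if it is present and neither None nor "".
        filled = sum(1 for row in rows if row.get(field) not in (None, ""))
        rate = filled / len(rows) if rows else 0.0
        if rate < threshold:
            gaps[field] = rate
    return gaps


# Hypothetical data for an assumed churn-prediction use case.
rows = [
    {"customer_id": 1, "last_purchase": "2021-03-01"},
    {"customer_id": 2, "last_purchase": None},
    {"customer_id": 3, "last_purchase": "2021-02-14"},
    {"customer_id": 4, "last_purchase": ""},
]
required = ["customer_id", "last_purchase", "region"]

print(gap_analysis(rows, required))
```

In this example, `region` is not collected at all and `last_purchase` is only half filled, so both would be reported as gaps, while `customer_id` meets the target.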
AI Projects Are Not Purely IT Projects
A second major mistake, closely linked to poor data quality, is a misunderstanding of what AI projects are. They are simply not pure technology projects. The assumption that IT and contracted service providers set up a turnkey system of technologies and software and that the rest runs by itself is misleading. The need to break down existing data silos is usually understood from a technical perspective. However, succeeding in AI analytics projects takes more than letting the data flow freely. The boundaries implied by the silo metaphor must also be torn down in people’s minds. AI projects are also change projects.
Data Owners Do Not Know What Quality Data Should Have
Data quality is part of this as well. The required data often belongs to a department (the data or information owner), which is also responsible for its quality. Frequently, that department does not appreciate the need for good data quality. There are no bad intentions behind this: the people involved do not see what advantages it brings their department. Data quality work is often seen as an ‘on top’ job that doesn’t add value. As a result, there is no motivation to improve it.
So it is not enough to replace the data silos with data lakes and then leave the data scientists alone with them. The departments, as data owners, must also be convinced that better data quality brings them real added value.
Data Product Team Combines Technology And Corporate Culture
Successful data analytics projects are therefore also about corporate culture and the definition of common goals. This works best when work is data-centric, i.e., when stakeholders from the departments, data scientists, and data engineers act together as a so-called data product team, jointly defining the use cases and developing specific questions. Direct project work conveys the importance of data quality more lastingly, so that improvement measures can be initiated. The focus must always be on the wishes and requirements of the department because, ultimately, it will have to support the ongoing processes and work with the information. High data quality, and thus meaningful AI analyses, can only be achieved as a team.