Data is everywhere
Every day, you come into contact with data without realizing it. You are data yourself: your name, age, gender, and so on are all examples of data. So are the number of shirts you own or cups of coffee you drink daily, and your social media comments, pictures, videos, and likes. The education level of your family members. The dimensions of your house. The color of the car or bike you use to visit friends. The number of trees, hospitals, and schools in a city. The travel time between home and office, and between the earth and the moon. Whoever you are, wherever you go, you will find data everywhere.
What is data?
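The word data comes from the Latin datum, literally "something given" and commonly understood as a fact. Facts have become highly valuable in our digital era: in 2017, The Economist called data, rather than oil, the world's most valuable resource. There are two main data formats: structured and unstructured.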
1 The Economist, May 6th, 2017: "The world's most valuable resource is no longer oil, but data"
Unstructured vs. structured data
Structured data
Let's start with structured data. It is easy to manage and usually lives in a table, as in Excel or Google Sheets, where every column is filled with the same data type. It has a predefined structure, commonly consisting of text and numbers. Customer information and financial transactions are typical examples of structured data.
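As a minimal sketch of what "every column is filled with the same data type" means, the Python snippet below models a small customer table as a list of rows and checks that each column holds a single type. The column names, the values, and the helper functions are invented for illustration and are not part of any specific tool.

```python
# Hypothetical customer table: each row has the same columns,
# and each column holds one data type.
customers = [
    {"customer_id": 1, "name": "Ana", "country": "Spain", "total_spent": 120.50},
    {"customer_id": 2, "name": "Ben", "country": "Kenya", "total_spent": 89.99},
    {"customer_id": 3, "name": "Chen", "country": "China", "total_spent": 240.00},
]


def column_types(rows):
    """Return a mapping from column name to the set of Python types found in it."""
    types = {}
    for row in rows:
        for column, value in row.items():
            types.setdefault(column, set()).add(type(value))
    return types


def is_structured(rows):
    """True if all rows share the same columns and each column has one type."""
    same_columns = all(row.keys() == rows[0].keys() for row in rows)
    one_type_per_column = all(len(t) == 1 for t in column_types(rows).values())
    return same_columns and one_type_per_column


print(is_structured(customers))
```

A row whose customer_id were stored as text instead of a number would make the check fail, which is the kind of inconsistency a predefined structure is meant to rule out.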
Unstructured data
Unstructured data is less organized and can be challenging to manage. It has no specific format or predefined structure and may contain text, images, audio, and video. Examples of unstructured data are reports, email messages, and surveillance videos.
Quantitative vs. qualitative data
Another critical aspect of data is whether its content is quantitative or qualitative. Quantitative data is also called numerical data because we can count, measure, and express it with numbers. A person's height and the temperature of a room are examples of quantitative data. Qualitative data, on the other hand, is also called categorical data because we can group it into categories. Examples are the language you speak and your favorite holiday destination.
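The difference shows up in how each kind of data is summarized. The sketch below, which uses invented survey answers, averages a quantitative column and counts the categories of a qualitative one. It relies only on Python's standard library.

```python
from collections import Counter
from statistics import mean

# Hypothetical survey answers, invented for illustration.
heights_cm = [162.0, 175.5, 181.0, 169.5]             # quantitative: measured
languages = ["Dutch", "English", "Dutch", "Spanish"]  # qualitative: categories

# Quantitative data supports arithmetic such as an average.
average_height = mean(heights_cm)

# Qualitative data is summarized by counting how often each category occurs.
language_counts = Counter(languages)

print(average_height)
print(language_counts.most_common(1))
```

Averaging the languages would be meaningless, while counting categories works for both kinds of data. That is one practical reason to identify the type of a column before analyzing it.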
Data context
Data context refers to the information that gives meaning to data. It includes the characteristics of the data, such as the time frame in which it was collected, its location, and its source. This information is also called metadata. Domain knowledge, whether of an industry, a sport, a research field, or a business, is also vital to interpreting data correctly.
The value of data
Individuals and organizations use data
We all use data in our daily lives when managing our finances or planning what to do on the weekend. Some people also use data to track their health. In the simplest form, this is done by monitoring your weight. Organizations in all industries, including governmental and non-profit organizations, naturally use data to deal with various challenges.
Data in organizations
It might sound obvious, but for a commercial business, the ultimate goal of using data is primarily to increase profitability. Non-profit organizations may use data for social good or to improve research on breakthrough technologies. User surveys or interviews with customers generate customer data, which can be used to improve the product and, ultimately, customer satisfaction.
Similarly, you can collect data from employees to improve employee happiness. Data helps achieve these goals because it enhances decision-making. By having transparency in the return on investment of different products, a business can maximize its profitability and use that information to optimize processes or spot new opportunities.
Data is a competitive advantage
It is crucial to remember that how organizations leverage their data is what truly sets them apart from their competitors. Since every organization's data is unique to that organization, competitors can only unlock the same insights if they have the same data available. Organizations increasingly compete on their ability to use data to make good decisions instead of relying on gut feeling.
The curious case of data growth
The volume of data has grown exponentially over the last decade. In 2010, around two zettabytes of data were created, captured, consumed, and stored, compared to 40 times that amount in 2021.
1 Source: Statista
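To get a feel for these figures, the short calculation below takes the two numbers quoted above (about 2 zettabytes in 2010 and about 40 times that in 2021) and derives the yearly growth rate they imply. It assumes a constant compound rate over the whole period, which is a simplification for illustration rather than something the source states.

```python
# Figures quoted in the text: about 2 zettabytes in 2010, about 40 times that in 2021.
volume_2010_zb = 2.0
growth_factor = 40.0
years = 2021 - 2010

volume_2021_zb = volume_2010_zb * growth_factor

# Constant yearly rate that would multiply the volume by 40 over 11 years
# (an assumed simplification: real growth was not necessarily constant).
annual_growth_rate = growth_factor ** (1 / years) - 1

print(f"2021 volume: about {volume_2021_zb:.0f} ZB")
print(f"Implied average growth: about {annual_growth_rate:.0%} per year")
```

Under that assumption, the volume grows by roughly two fifths every year, which is why the total can multiply fortyfold in just over a decade.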
Data storage is changing
Nature has always preserved complete genetic information in the form of DNA. Thousands of years ago, people recorded their thoughts, ideas, and convictions in cave and wall paintings. Many civilizations, such as the Chinese, Egyptians, and Romans, moved to scrolls and books of papyrus or parchment to keep track of their financial systems. Punch cards were popularized in the 1890s before being replaced by magnetic tape and floppy disks. More recently, technology has allowed considerably more data to be stored on smaller media. The compact disc (CD), the hard drive, and the solid-state drive drove local data storage forward. Since the emergence of the Internet, storage has become more and more centralized in data centers to balance out server utilization. Cloud storage is one of today's most popular data storage methods.
Companies are complex
As the complexity of a company's operations increases, so does the amount of data it generates and stores. Here are two examples. 3D manufacturing companies use sensors and tools to measure beam heat, layer thickness, and structural stability. Financial institutions use data to process mortgage applications and to detect fraudulent transactions.
Data wisdom
Mental models
Mental models give us a framework to understand the world around us and uncover the relationships hidden within the vast data landscape. With so much data being generated every day, we need a mental model to simplify our understanding and focus our attention on uncovering value.
The DIKW pyramid
Enter the Data, information, knowledge, and wisdom pyramid. This pyramid, more commonly known as DIKW, highlights the journey data takes in order to become valuable wisdom. Each step of the pyramid requires refining the level below to extract more and more meaning until we finally achieve wisdom. Let's take a deeper look at each section of the pyramid.
Raw data
Data forms the base of the DIKW pyramid. Data can comprise many different things: words, numbers, dates, images, or even sounds. On its own, data isn't very useful. To get more value from data, we need to add meaning by understanding its context. You probably find yourself automatically trying to add meaning to the data points. We do it all the time, and it is a source of unconscious bias.
Creating information
By taking the raw data and adding some context, we can better understand what the data means and transform it into information. Before, the number 2 meant nothing to us, but now we see that it is an age: 2 years old. Adding context revealed a whole new level of the pyramid and turned our raw data into information. You can probably piece even more together by looking at the other data points.
Knowledge is power
Information on its own isn't enough to make decisions. Instead, we need to tie these pieces of information together to understand their relationship. These links or connections between the different pieces of information allow us to add more meaning and transform our information into knowledge.
Knowledge in action
We're so close to wisdom. Everything is starting to come into focus. There is only one more step until wisdom. This might feel basic, but for more abstract data sources or problems, being able to trace the path from data to wisdom can help break down big problems into manageable steps.
Achieving wisdom
Transforming knowledge into wisdom is arguably the hardest step of the entire pyramid. There are many strategies for achieving this, and covering them in detail could easily fill a course of its own. From a very high level, transforming knowledge into wisdom requires us to add more meaning to the information and understand the relationships between the pieces of information. This final level of meaning allows us to make decisions and apply our knowledge to the world around us. We have achieved wisdom. Let's wrap this up by revisiting our example.
Wisdom achieved!
Because it will probably rain tonight and muddy the park, James' parents should plan alternative games for their fifteen guests. You might have been close to this when you looked at the raw data, but stepping through each level of the pyramid highlights how data alone isn't enough to make a good decision. You need data, a dash of context, a heap of meaning, and a helping of understanding.
Data in decision-making
Becoming data-driven is the ultimate goal for many individuals and organizations. Understanding the role of data in decision-making is essential for becoming better data practitioners.
What is decision-making
People make decisions every day. It is estimated that adults face about 35,000 decisions every day. What should they eat, what should they wear, how should they spend their free time, what major should they pursue, and where should they live? Minor and significant choices are mixed, each with consequences and expectations. Decision-making is the process we all undergo to make the right choices at the right time. For many decisions, employing data can make a tough choice clearer.
From data to decision
Data-driven decision-making is a five-step process, with each step revealing more to drive a well-informed decision. The process begins with asking a question, followed by gathering the correct data, preparing the data, conducting analysis, and finally making the right decision. It is important to note that this process is iterative. As we make decisions, the results can fuel future decision-making. Let's explore each step in detail.
Asking the right question
The journey of a data-driven process starts with identifying the question you want to answer. This may sound easy, but it is often the hardest part of the data-driven decision-making process. A good question will clarify what you are trying to answer and prevent you from drifting into other areas. Taking extra time to define your question clearly improves your chances of success.
Collecting data
With your focused question in mind, the search begins for the correct data to answer this question. Data can often live in multiple locations and forms, so being deliberate about where you source data is essential. Thinking ahead to your analysis can also pay off handsomely. For example, suppose you are deciding between two versions of your data, one in its raw form and one summarized by month. Knowing which is most valuable for your analysis can reduce the cleaning and prep you need to do in the next step.
Preparing data
Preparing data can mean many things. Sometimes, it means converting messy or low-quality data into higher-quality data through skillful manipulation. In other cases, it simply means arranging the data into the format your analysis expects. Many types of analysis have particular requirements for how data should be placed and aligned. A cleaned dataset is ready for analysis without any additional effort or outstanding concerns. The data preparation phase can be the most cumbersome, sometimes taking up to 80% of the overall time for the entire decision-making process.
Analyzing data
Analyzing data is the next step. This step is critical because it transforms our data into something we can make decisions with. Data analysis tools like Python, R, Tableau, Power BI, Excel, and Google Sheets allow us to perform many different kinds of analysis to find insights from data.
Making decisions
The final step is interpreting the results and making a decision. Armed with our analysis, we can balance the outcome with our knowledge of the broader subject matter to arrive at a data-driven decision. It is vital to recognize that the result of the analysis is only part of being data-driven. Our personal experience and knowledge also help drive decision-making, and blending the two can lead to a much stronger decision than relying on gut feeling alone.
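The five steps can be sketched end to end on a toy problem. The question, the sales records, and the variable names below are all invented for illustration. Real projects involve far more data and judgment at each step, so treat this as one possible walk-through rather than a prescribed method.

```python
from collections import defaultdict

# Step 1: ask a focused question (hypothetical).
question = "Which product sold the most units last month?"

# Step 2: collect data. These records are invented for illustration.
raw_sales = [
    {"product": "Mug ", "units": "12"},
    {"product": "mug", "units": "8"},
    {"product": "Poster", "units": "15"},
    {"product": "Poster", "units": None},  # missing value
]

# Step 3: prepare data by normalizing names and dropping rows without units.
prepared = [
    {"product": row["product"].strip().lower(), "units": int(row["units"])}
    for row in raw_sales
    if row["units"] is not None
]

# Step 4: analyze by totaling units per product.
units_per_product = defaultdict(int)
for row in prepared:
    units_per_product[row["product"]] += row["units"]

# Step 5: decide. The analysis suggests an answer; domain knowledge should
# still be weighed before acting on it.
best_seller = max(units_per_product, key=units_per_product.get)

print(question)
print(dict(units_per_product))
print(best_seller)
```

Note that the answer depends on the preparation step: without normalizing the product names, the two mug rows would be counted as different products and the poster would appear to be the best seller.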
Data life cycle
The data life cycle is a framework to manage data from its collection to its use, analysis, and disposal. We will look into each step in more detail, but at a high level, the framework starts with planning and then creating or collecting data. Next is data storage and management, preferably in a secure and organized way, typically using databases or data warehouses. Raw data often needs cleaning and processing to eliminate errors and inconsistencies and improve its quality and usefulness. Cleaned data can then be analyzed and visualized to extract insights and answer questions. The results of the data analysis are then shared with stakeholders so that the findings are communicated effectively. The final stage of the framework depends on the initial plan. Does the data need to be stored for future use, or can it safely be destroyed to ensure data privacy?
Why is the data life cycle important?
The data life cycle is vital because it helps companies ensure that data is managed responsibly. By understanding the data life cycle, organizations can also identify potential areas for improvement in their data management practices, which can help improve the efficiency and effectiveness of their operations. By following the stages of the data life cycle, organizations and researchers can ensure that they properly handle and leverage the data they collect and generate. Let's look at each step in detail.
Plan and collect
During the planning stage, a (business) question should be prepared that answers the needs of your stakeholders. It will affect other phases of the data life cycle since you'll decide on the type of required data, how it will be managed throughout its life cycle, who will be responsible for it, and how to achieve the most effective results. Whether you'll need to collect or create data from various sources, such as surveys, experiments, or sensor readings, will also be determined at this stage.
Store and manage
The collected data needs to be stored in a way that keeps it easily accessible to the right people and manageable over time. Additional concerns about handling personally identifiable information (PII) or other sensitive data types should be addressed here.
Clean and process
Before proper data analysis can start, the data should be cleaned and processed. This may include formatting data, dealing with missing values or errors, or transforming data into a more usable form. Cleaning and processing the data often accounts for much of the effort in the entire data life cycle.
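As a minimal sketch of the three tasks just mentioned (formatting, handling missing values, and transforming data into a more usable form), the snippet below cleans a few invented records. The field names and the cleaning rules are assumptions made for this example. Real rules depend on the data and the question at hand.

```python
# Hypothetical raw records, invented for illustration.
raw_records = [
    {"name": "  alice ", "age": "34", "city": "Paris"},
    {"name": "BOB", "age": "", "city": "berlin"},
    {"name": "Carla", "age": "n/a", "city": None},
]


def clean_record(record):
    """Return a cleaned copy: tidy text, numeric age, None for missing values."""
    name = record["name"].strip().title()
    age_text = (record["age"] or "").strip()
    age = int(age_text) if age_text.isdigit() else None
    city = record["city"].strip().title() if record["city"] else None
    return {"name": name, "age": age, "city": city}


cleaned = [clean_record(r) for r in raw_records]

# Count what is still missing so the gap is visible before analysis.
missing_ages = sum(1 for r in cleaned if r["age"] is None)

print(cleaned)
print(missing_ages)
```

Marking unusable values as missing, rather than guessing them, keeps the gap visible so that it can be handled deliberately during analysis.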
Analyze and visualize
Once data is appropriately cleaned, you can perform analyses. Data analysis is the process of getting new meaningful insights from raw data. Visualizing these insights effectively makes it easier to interpret them. Various methods are used to analyze and visualize data. They may involve statistical methods or machine learning algorithms using multiple programming languages or software tools.
Share
An insightful analysis that nobody else uses has no value. Successfully communicating your results is a vital but often overlooked step in the data life cycle. Examples of sharing insights are publishing dashboards, reports, or papers, presenting findings at conferences, or making data sets available to colleagues or other researchers.
Archive or destroy
Once you've gained and shared the required insights or answered the initial (business) question, the final step is to decide whether the data should be archived or destroyed. Data archiving may involve backing up the data, maintaining proper documentation, or applying digital preservation techniques to keep the data in a usable format. In other cases, destroying the data, for example by permanently deleting it, is critical for protecting private information from accidental disclosure. Deleting data also frees up resources.
Common data mistakes
People make some common mistakes when working with data. Reflecting on the data life cycle framework, the most common ones start with not clearly defining the problem or question the data is intended to answer. Collecting too little data, or the wrong data, makes it impossible to answer the defined question accurately. Another mistake is not using statistical methods or tools that are appropriate for the specific type of data and research question. Lastly, as seen before, the results are often not communicated effectively. Planning helps reduce mistakes, but let's discuss some examples.
Not clearly defining the problem
Say, for example, that you want to know more about the purchase habits of a group of customers. Asking a question such as "Did you buy anything in the last month?" might give you a general idea of whether they bought something, but it is too vague to yield actual insights. "Where did you make your last purchase?" or "Which payment method did you use?" might be better alternatives to get the data you need. With a clear question about the problem, you avoid inappropriate data collection and analysis and, ultimately, incorrect conclusions.
Insufficient or wrong data
Now, say that you're interested in the payment methods of older adults, and you collect data through an online survey. Without you realizing it, most responses come from young adults, and only a few from older adults. This is an example of data bias: the responses you get back do not represent the target audience your question focused on. As you can see, this survey data might give you some insights into the purchase habits of young adults, but it does not allow you to answer the research question. Note that the collected data still needs proper cleaning and processing before analysis.
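A quick check of who actually responded can reveal this kind of bias before any analysis starts. The sketch below uses invented response counts, and the threshold it applies (the target group should make up at least half of the responses) is an assumed rule of thumb for this example, not an established standard.

```python
from collections import Counter

# Hypothetical age groups of survey respondents, invented for illustration.
respondent_age_groups = ["18-34"] * 42 + ["35-64"] * 5 + ["65+"] * 3
target_group = "65+"


def group_shares(groups):
    """Return each group's share of all responses as a fraction between 0 and 1."""
    counts = Counter(groups)
    total = len(groups)
    return {group: count / total for group, count in counts.items()}


shares = group_shares(respondent_age_groups)

# Assumed rule of thumb for this sketch: flag the sample if the target group
# makes up less than half of the responses.
is_biased = shares.get(target_group, 0.0) < 0.5

print(shares[target_group])
print(is_biased)
```

With only 3 of 50 responses coming from the target group, the sample says little about older adults, no matter how carefully the remaining steps are carried out.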
Lack of appropriate analysis
OK, so you asked the right questions and collected the correct data. Jumping to conclusions without proper analysis still won't work. Say, for example, that the data tells you there was a steep decline in contactless payments. You could quickly conclude that older adults are less inclined to use this payment method. However, unknown to the researchers, there were many technical issues with payment terminals in the last week, which might explain the decrease in contactless payments. This is an example of a lack of context, which may lead to misinterpreting the results. There are other sources of data analysis mistakes, such as incorrect aggregations or calculations, or confusing correlation with causation.
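The effect of missing context can be made concrete with a small calculation. In the sketch below, the daily payment counts and the terminal_issue flag are invented for illustration. The same data gives a very different picture once the days with technical problems are set apart.

```python
# Hypothetical daily payment counts, invented for illustration.
# "terminal_issue" is the context that the raw totals alone do not show.
days = [
    {"day": "Mon", "contactless": 80, "total": 100, "terminal_issue": False},
    {"day": "Tue", "contactless": 78, "total": 100, "terminal_issue": False},
    {"day": "Wed", "contactless": 30, "total": 100, "terminal_issue": True},
    {"day": "Thu", "contactless": 25, "total": 100, "terminal_issue": True},
]


def contactless_share(rows):
    """Share of contactless payments across the given days."""
    return sum(r["contactless"] for r in rows) / sum(r["total"] for r in rows)


share_all_days = contactless_share(days)
share_normal_days = contactless_share([d for d in days if not d["terminal_issue"]])

print(f"All days: {share_all_days:.0%}")
print(f"Days without terminal issues: {share_normal_days:.0%}")
```

Excluding the affected days is only one way to account for the outage. The broader point is that the decline cannot be attributed to customer preference until this alternative explanation has been examined.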
No clear communication of results
Finally, as mentioned before, presenting the results clearly is one of the most valuable parts of the data life cycle. Failing to do so not only means that your work has been for nothing; it could, again, lead to misunderstandings or incorrect conclusions. Various things could go wrong at this stage. For example, you could have used complex statistical techniques to analyze payment habits, but your manager lacks the technical knowledge to follow them and cannot see how the analysis is relevant to the business. Or you might cherry-pick specific data points or use misleading chart types to make your case, even if the data doesn't support the argument. Or your visualizations may lack clear labels, legends, axis titles, or colors, increasing the chance of misinterpretation.
Embracing data for a smarter future
Data surrounds us, from our daily lives to the intricate operations of businesses and organizations. It is the foundation of our modern decision-making processes, transforming raw facts into actionable insights through structured frameworks like the DIKW Pyramid and the data life cycle. As we delve deeper into the digital era, leveraging data wisely is becoming not just a competitive advantage but a necessity.
Understanding and applying data, whether structured or unstructured, quantitative or qualitative, can empower individuals and organizations to make more informed decisions, innovate, and solve real-world problems effectively. By mastering tools, frameworks, and principles, you can turn data into knowledge and knowledge into wisdom, ultimately shaping a brighter and more efficient future.
For those eager to dive deeper into the world of data, tools, and methodologies, explore resources like DataCamp to kickstart your journey in data analytics, visualization, and decision-making.