Analytics: Data Science, Data Analysis and Predictive Analytics for Business
Introduction
What defines the success of a business? Is it the number of people employed by the firm? Is it the sales turnover of the business? Or is it the strength of the customer base of the firm?
To answer any of the aforesaid questions, it is important that you have the required data in hand. For instance, you need to know how many staff you have employed first to assess the value contributed by them to the growth of your business. Similarly, you need to have a repository of all the customers along with details of their transactions to understand if your customer base is contributing to the success of your firm.
Why is data important? Data is important for the sustenance of a business. The most basic reason is that you need information to be aware of the state of affairs. For instance, if you don’t know how many units your company sells in a month, you will never be able to say whether your business is doing well. There are several other reasons why data is important, and I deal with them in detail in the upcoming chapters of this book.
Just collecting data is not enough. Analyzing it and putting it to use is what matters. Of course, if you belong to the class of people who are not worried about whether their business has lost a customer, you don’t have to spend time on analyzing data. However, if this attitude persists, you will soon see the end of your business, as growing competitors who care about the expectations of their customers overtake you. This is where predictive analytics comes into play. How you employ predictive analytics in your firm is what distinguishes your firm from the others in the market. Predictive analytics has the capacity to change the way you play your game in the market. It is capable of giving you that little edge over your competitor.
In the first chapter of this book, I highlight the importance of data in business and how data plays an important role in increasing the efficiency of a business. In the second chapter, I describe the different steps involved in the process of data analysis. In the third chapter, I take you through the basics of predictive analytics and the various methods involved. In the fourth chapter, I cover the different techniques used for conducting predictive analytics. In the final chapter, you will see how predictive analytics is being employed in different fields. You will truly appreciate its flexibility, seeing how it can be used in finance as well as in medicine, in operations as well as in marketing.
The fields of data analysis and predictive analytics are vast, and each has many sub-branches that are extensive in their own right. Hence, I have covered only the fundamentals of these fields in this book.
I hope you truly enjoy this book. I am sure you will be motivated to manage your data better and employ predictive analytics in your business and reap the maximum benefits. Thank you for purchasing this book.
Chapter 1: Importance of Data in Business
Even though it is the service provided or the goods manufactured by a company that helps it establish a niche for itself in the market, data plays a crucial role in sustaining that success. In today’s technology-driven world, information can make or break a business. For instance, there are businesses that have disappeared in a short time because they failed to gauge their progress or customer base. On the other hand, there are also start-ups that have been doing extremely well because of the importance they attach to numbers and to the expectations of their customer base.
What is the source of data?
Data could refer to the sales figures, the feedback from customers or the demand for the product or service. Some of the sources of data for a company are as follows:
Transactional data: 
This could be pulled out from your ledger, sales reports and web payment transactions. If you have a customer relationship management system in place, you will also be able to take stock of how your customers are spending on your products.
Online engagement reporting: 
This is data pulled out based on the interaction of customers on your website. There are tools available, such as Crazy Egg and Google Analytics, which can help you collect data from your website.
Social media: 
Social networking sites such as Twitter, Facebook and LinkedIn also provide insights on the customer traffic on your page. You can also use these platforms to conduct a cost-effective survey on the tastes and preferences of your customers and use the results to improve your products or services.
How can data improve your business?
Data can improve the efficiency of your business in many ways. Here is a taste of how data can play an important role in upping your game.
Improving your marketing strategies: 
Based on the data collected, it is easier for the company to come up with innovative and attractive marketing strategies. It is easier for the company to alter existing marketing strategies and policies in such a fashion that it is in line with the current trends and customer expectations.
Identifying pain points
If your business is driven by predetermined processes and patterns, then data can help you identify any deviations from the usual. These small deviations could be the reason behind the sudden decrease in sales or increase in customer complaints or decrease in productivity. With the help of data, you will be able to catch these little mishaps early and take corrective actions.
Detecting fraud 
When you have the numbers in hand, it will be easier for you to detect any fraud that is being committed. For instance, if you have a purchase invoice for 100 tins, your sales reports show that only 90 tins have been sold and the remaining ten tins are missing from your inventory, you know where to look. Many companies are silent victims of fraud because they are not aware of the fraud being committed in the first place. One important reason for that is the absence of proper data management, which could have helped them detect fraud easily in the early stages.
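The tin example above comes down to simple arithmetic: the units purchased minus the units sold should match what is physically in stock. Here is a minimal sketch of that check in Python. The function name `reconcile_stock` and its parameters are hypothetical, and the physical count of zero is an assumption made to match the example.

```python
def reconcile_stock(purchased: int, sold: int, on_hand: int) -> int:
    """Return the number of units that cannot be accounted for.

    A positive result means stock is missing; zero means the books balance.
    """
    expected_on_hand = purchased - sold
    return expected_on_hand - on_hand


# Figures from the example: 100 tins bought, 90 tins sold.
# The physical count of 0 is an assumption made for illustration.
missing = reconcile_stock(purchased=100, sold=90, on_hand=0)
print(f"Unaccounted units: {missing}")
```

A check like this only flags a discrepancy. Finding out whether the cause is fraud, breakage or a recording error still requires investigation.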
Improving customer experience 
As I mentioned before, data also includes the feedback provided by customers. Based on their feedback, you will be able to work on areas that can help you improve the quality of your product or service and thereby satisfy the customer. Similarly, when you have a repository of customer feedback, you will be able to customize your product or service in a better fashion. For instance, there are companies that send out customized personal emails to their customers. This sends out a message that the company genuinely cares about its customers and would like to satisfy them. This is possible only because of effective data management.
Decision making 
Data is very crucial for making important business decisions. For instance, if you want to launch a new product in the market, it is important that you first collect data about the current trends in the market, the size of the consumer base, the pricing of the competitors etc. If the decisions taken by the company are not driven by data, then it could cost the company a lot. For instance, if your company decides to launch a product, without taking into consideration the price of the competitor’s product, then there is a possibility that your product might be overpriced. As is the case with most overpriced products, the company would have trouble in increasing the sales figures.
When I say decisions, I don’t really just refer to the decisions pertaining to the product or service offered by the company. Data can also be useful in taking decisions with respect to the functioning of the departments, manpower management etc. For instance, data can help you assess how many personnel are required for the effective functioning of a department, in line with the business requirements. This information can help you to decide if a certain department is overstaffed or understaffed.
Hence, data is very crucial in aiding businesses to take effective decisions.
These are some of the reasons why data is crucial for the effective functioning of a business. Now that we have had a glance at the importance of data, let us get into the other aspects of data analysis in the upcoming chapters.
Chapter 2: Process of Data Analysis
What is data analysis?
Steps involved in data analysis
Even though the data requirements may not be the same for every company, most of the below steps are common for all companies.
Step 1: Decide on the objectives
The first step in the data analysis process is the setting of objectives. It is important that you set clear, measurable and concise objectives. These objectives can be in the form of questions. For instance, suppose your company’s products are struggling to get off the shelves because of a competitor’s products. The questions that you might ask are, “Is my product overpriced?” “What is unique about the competitor’s product?” “What is the target audience for the competitor’s product?” “Is my process or technology redundant?”
Why is asking these questions upfront important? This is because your data collection depends on the kind of questions you ask. For instance, if your question is, “What is unique about the competitor’s product?” you will have to collect feedback from the consumers about what they like in the product as well as do an analysis on the specifications of the product. On the other hand, if your question is, “Is my process or technology redundant?” you will have to do an audit of the existing processes and technologies used at your establishment as well as do a survey about the technology used by the others in the same industry. As you can see from this, the nature of data collected differs vastly based on the kind of questions you ask. Since data analysis is a tedious process, it is necessary that you do not waste the time of your data science team in collecting useless data. Ask your questions right!
Step 2: Set measurement priorities
Now that you have decided your objectives, you need to establish measurement priorities next. This is done through the following two stages:
Decide what to measure
This is when you have to decide what kind of data you need to answer your question. For example, if your question pertains to reducing the number of jobs, without compromising the quality of your product or service, then the data that you need in hand right now are:
- The number of staff employed by the company.
- The cost of employing the present number of staff.
- The percentage of time and effort spent by the current staff on the existing processes.
Once you have the above data, you will have to ask other questions ancillary to the primary question such as, “Are my staff not being utilized to their fullest potential?” “Is there any process that can be altered to improve the productivity of the staff?” “Will the company be in a position to meet increased demands in the future despite the downsizing of manpower?”
These ancillary questions are as important as the primary objective/question. The data collected in connection with these ancillary questions will help you in taking better decisions.
Decide how to measure
It is highly important that you decide the parameters to measure your data before you begin collecting it. This is because how you measure your data plays an important role in analyzing the collected data in the later stages. Some of the questions you need to ask yourself at this stage are:
- What is the time frame within which I should complete the analysis?
- What is the unit of measure? For instance, if your product has international markets and you are required to determine the pricing of your product, you need to arrive at the base price using a certain currency and convert it for each market accordingly. In this case, choosing that base currency is the decision you need to make.
- What are the factors that you need to include? This again depends on the question you asked in Step 1. In the case of the staff downsizing question, you need to decide which factors to take into consideration with respect to the cost of employment. You need to decide whether you will be taking the gross salary package into consideration or the net annual salary drawn by the employee.
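The unit-of-measure point above can be sketched in a few lines of Python: fix one base currency, then derive each local price from it. Everything named here is hypothetical, including `RATES_PER_BASE`, `local_price`, the choice of USD as the base and the base price. The exchange rates are placeholders, not real market rates.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical exchange rates: units of local currency per 1 unit of the
# base currency. These are placeholders, not real market rates.
RATES_PER_BASE = {
    "USD": Decimal("1.00"),   # assumed base currency in this sketch
    "EUR": Decimal("0.90"),
    "INR": Decimal("80.00"),
}


def local_price(base_price: Decimal, currency: str) -> Decimal:
    """Convert a base-currency price to a local currency, rounded to 2 places."""
    rate = RATES_PER_BASE[currency]
    return (base_price * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


base_price = Decimal("20.00")  # hypothetical base price
for currency in RATES_PER_BASE:
    print(currency, local_price(base_price, currency))
```

`Decimal` is used instead of floating point so that the rounding of money amounts is explicit.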
Step 3: Collection of data
The next step in the data analysis process is the collection of data. Now that you have already set your priorities and measurement parameters, it will be easier for you to collect data in a phased manner. Here are a few pointers that you need to bear in mind before you collect data:
- We already saw the different sources of data in the previous chapter. Before you collect data, take stock of the data available. For example, in the staff downsizing case, to know the number of employees, you can just look at the payroll and get the numbers. This saves you the time of collecting this particular data again. Similarly, collate all available information.
- If you intend to collect information from external sources in the form of a questionnaire, then spend a good amount of time in deciding the questions that you want to ask. Only when you are satisfied that the questionnaire serves your primary objective should you circulate it. If you keep circulating different questionnaires, then you will have heterogeneous data in hand, which cannot be compared.
- Ensure that you keep proper logs as and when you enter the data collected. This could help you analyze the trends in the market. For instance, let us assume you are conducting a survey regarding your product over a period of two months. You will note that the shopping habits of people during holiday seasons differ drastically from those in any other period of the year. If you don’t include the date and time in your data log, you will end up with misleading figures, and this will gravely affect your decisions.
- Check how much budget is allocated for the purpose of data collection. Based on the available budget, you will be able to identify the methods of data collection that are cost effective. For instance, if you have a tight budget and you still have to do a survey to gauge the preferences of your customers, you can opt for free online survey tools as opposed to printed questionnaires that are included in the packages. Similarly, you can make the best use of social networking sites to conduct mini surveys and collect the data required. On the other hand, if you have enough budget, you can go for printed and attractive questionnaires that can be circulated along with the package or can be distributed at retail outlets. You can set up drop boxes at nearby cafes and malls for the customers to drop these filled in questionnaires. You can also organize contests to collect data as well as market your product in one go.
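The pointer above about logging the date and time can be illustrated with a small Python sketch. The survey entries, field names (`recorded_at`, `units_bought`) and the function `average_by_month` are all hypothetical. The point is only that a timestamp on each entry lets you separate holiday-season responses from the rest, instead of blending them into one average.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical survey log: each entry keeps the time the response was recorded.
responses = [
    {"recorded_at": datetime(2023, 11, 20, 10, 15), "units_bought": 2},
    {"recorded_at": datetime(2023, 11, 28, 17, 40), "units_bought": 1},
    {"recorded_at": datetime(2023, 12, 22, 12, 5), "units_bought": 6},
    {"recorded_at": datetime(2023, 12, 23, 9, 30), "units_bought": 5},
]


def average_by_month(entries):
    """Group entries by (year, month) and return the average units bought per group."""
    grouped = defaultdict(list)
    for entry in entries:
        stamp = entry["recorded_at"]
        grouped[(stamp.year, stamp.month)].append(entry["units_bought"])
    return {month: sum(values) / len(values) for month, values in grouped.items()}


for month, average in sorted(average_by_month(responses).items()):
    print(month, average)
```

Without the `recorded_at` field, the only figure available would be the overall average, which hides the difference between the two months.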
Step 4: Data cleaning
The data you have collected may not be readily usable. This is why data cleaning is crucial in this process: it ensures that meaningless data does not find its way into the analysis stage. For example, when you correct the spelling mistakes in the collected questionnaires before feeding them into your system, you are cleaning data. When you have junk data in the system, it will affect the quality of your decisions. For instance, let us assume 50 out of 100 people responded to your questionnaire. However, ten of the forms you received are incomplete. You cannot count these ten forms for the purpose of analysis. In reality, you have a usable response rate of only 40% and not 50%. These numbers make a big difference when the management makes decisions.
Similarly, if you are doing a region-wise survey, you need to be extra careful at this stage. This is because many people tend not to fill in their correct addresses in questionnaires. Hence, unless you have a fair idea about the population of each region, you will never be able to catch these slip-ups. Why is it important to catch these mistakes? Let’s assume that your survey results show that a majority (say 70%) of your customer base is from region X. In reality, the entire population of that region would amount to no more than 30% of your customer base. Now, let us assume that you make a decision based solely on your survey results. You decide to launch an exclusive marketing drive in this region. The marketing drive will not improve your sales as expected, because even if every resident of the region bought your product, they would account for only 30% of your customer base and not 70% as you imagined. Hence, these little numbers play an important role when it comes to big and expensive decisions.
As you can see, improving the quality of data is highly important for taking better decisions. Since this involves a lot of time, you should automate the process where you can. For instance, to detect fake addresses, you can get the computer to flag those entries that have an incorrect or incomplete ZIP code. This can be done easily if you are using Excel to store your data. Alternatively, if you have customized software for feeding and storing data, you can get in touch with the software designer to put certain algorithms in place to take care of such data.
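Returning to the questionnaire example in this step, the effect of discarding incomplete forms on the response rate can be worked out as follows. This is a minimal sketch, and the function name `response_rate` and its parameters are hypothetical.

```python
def response_rate(sent: int, returned: int, incomplete: int) -> float:
    """Return the usable response rate as a percentage of questionnaires sent."""
    usable = returned - incomplete
    return 100 * usable / sent


# 100 questionnaires sent, 50 returned, 10 of those incomplete.
print(response_rate(sent=100, returned=50, incomplete=0))   # before cleaning
print(response_rate(sent=100, returned=50, incomplete=10))  # after cleaning
```

The first figure is what you would report if you counted every returned form. The second is the rate you can actually rely on for analysis.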
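As one way to automate the ZIP code check mentioned above, here is a minimal Python sketch. It assumes US-style ZIP codes (five digits, optionally followed by a hyphen and four more digits) and checks the format only. It cannot tell whether a well-formed ZIP code really exists or belongs to the respondent. The survey rows and the function name `has_valid_zip_format` are hypothetical.

```python
import re

# Assumption: US-style ZIP codes, either 5 digits or ZIP+4 (12345-6789).
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")


def has_valid_zip_format(zip_code: str) -> bool:
    """Check only the format; this does not confirm the ZIP code exists."""
    return ZIP_PATTERN.fullmatch(zip_code.strip()) is not None


# Hypothetical survey rows.
rows = [
    {"respondent": "A", "zip": "90210"},
    {"respondent": "B", "zip": "9021"},        # too short
    {"respondent": "C", "zip": "12345-6789"},
    {"respondent": "D", "zip": "ABCDE"},       # not numeric
]

# Collect the respondents whose entries need a manual review.
flagged = [row["respondent"] for row in rows if not has_valid_zip_format(row["zip"])]
print(flagged)
```

Flagged entries are best set aside for review rather than deleted outright, since an incomplete ZIP code may simply be a typing mistake.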
Step 5: Analysis of data
Now that you have collected the requisite data, it is time to process it. You may resort to different techniques to analyze your data. Some of the techniques are as follows:
Exploratory data analysis:
This is a method by which data sets are analyzed with a view to summarizing their main characteristics. This method was developed by John W. Tukey. According to him, too much importance was being given to statistical hypothesis testing, which is nothing but confirmatory data analysis. He felt that more emphasis should be placed on using data to suggest hypotheses to test. The key objectives of exploratory data analysis are as follows:
- Suggestion of hypotheses in connection with the causes of the phenomena under question.
- Assessing the assumptions on which the statistical inference will be based.
- Supporting the choosing of appropriate statistical techniques and tools.
- Providing a basis for further data collection through modes such as surveys or experiments.
Several techniques prescribed by exploratory data analysis have been put to extensive use in the fields of data mining and data analysis. These techniques also form part of certain curricula to induce statistical thinking in students. As you perform exploratory data analysis, you will be required to clean up more data. In some cases, you will be required to collect more data to complete the analysis. This is to ensure that the analysis is backed by meaningful and complete data.
Descriptive statistics:
This is another method of data analysis. By this method, data is analyzed to identify and describe the main features or characteristics of the data collected. This is different from inferential statistics. Under inferential statistics, a sample is analyzed and the findings are then extrapolated to the general population from which the sample was drawn. Descriptive statistics, on the other hand, aims only to summarize and describe the data collected. These observations about the data collected can be either quantitative or visual. These summaries could just be the beginning of your data analysis process and could form the basis on which further analysis is done. To understand this better, let us look at an example. The shooting percentage in the game of basketball is a descriptive statistic. It indicates the performance of a player or a team and is calculated by dividing the number of shots made by the number of shots taken. For instance, if a basketball player’s shooting percentage is 50%, it means that the player makes one shot in every two. Other tools used under descriptive statistics include the mean, median, mode, range, variance and standard deviation.
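The descriptive statistics named above can be computed with Python’s standard `statistics` module. The sketch below is illustrative only. The list of points is hypothetical data, the helper names `shooting_percentage` and `describe` are assumptions, and the variance and standard deviation shown are the sample versions (dividing by n - 1).

```python
import statistics


def shooting_percentage(shots_made: int, shots_taken: int) -> float:
    """Shots made divided by shots taken, expressed as a percentage."""
    return 100 * shots_made / shots_taken


def describe(values):
    """Return common descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        # Sample variance and standard deviation (divide by n - 1).
        "variance": statistics.variance(values),
        "std_dev": statistics.stdev(values),
    }


# One shot made in every two taken, as in the example.
print(shooting_percentage(shots_made=1, shots_taken=2))

# Hypothetical points scored by a player over seven games.
points = [12, 15, 15, 18, 20, 22, 24]
for name, value in describe(points).items():
    print(f"{name}: {value}")
```

None of these figures says anything beyond the seven games recorded. Drawing conclusions about future games would be a matter for inferential statistics.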
Data visualization:
As the name suggests, data visualization is the representation of data in a visual form. This can be done with the help of tools such as plots, informational graphics, statistical graphics, charts and tables. The objective of data visualization is to communicate the data in an effective fashion. When you are able to represent data effectively in a visual form, it helps you analyze the data and reason about data and evidence. Even complex data can be understood and analyzed by people when it is put in visual form. These visual representations also facilitate easy comparison. For instance, if you are entrusted with the job of reviewing the performance of your product and your competitor’s product, you will be able to do so easily if all the related data are represented in visual form. All your data team needs to do is take the data pertaining to parameters such as price, number of units sold and specifications, and put it in pictorial form. This way, you will be able to assess the raw data easily. You will also be able to spot correlations between the different parameters and make decisions accordingly. For instance, if you notice that your price is higher than your competitor’s and your sales are lower than your competitor’s, then you have a likely explanation for the problem: the lower sales may be attributable to the higher price. If so, this can be set right by reworking your prices.
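To show the idea of the price and sales comparison above without any extra software, here is a minimal Python sketch that prints a plain-text bar chart. In practice you would more likely use a charting tool or a spreadsheet. The figures, the product names and the helper functions `text_bar` and `print_chart` are all hypothetical.

```python
# Hypothetical figures for illustration only.
products = {
    "Our product": {"price": 12.0, "units_sold": 400},
    "Competitor": {"price": 10.0, "units_sold": 650},
}


def text_bar(value: float, max_value: float, width: int = 30) -> str:
    """Return a bar of '#' characters scaled so that max_value fills the width."""
    return "#" * round(width * value / max_value)


def print_chart(metric: str) -> None:
    """Print one horizontal bar per product for the given metric."""
    max_value = max(item[metric] for item in products.values())
    print(metric)
    for name, item in products.items():
        print(f"  {name:<12} {text_bar(item[metric], max_value)} {item[metric]}")


print_chart("price")
print_chart("units_sold")
```

Seen side by side, the longer price bar and the shorter sales bar for the same product make the pattern described in the text easy to spot. The chart alone does not prove that one causes the other.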
Apart from these three major methods, you can also take the help of software available in the market. Some of the prominent software available in the market for the purpose of data analysis are Minitab, Stata and Visio. Let us not forget the multipurpose Excel.
Step 6: Interpreting the results
Once you have analyzed your data, it is time to interpret your results. Here are a few questions that you need to ask:
- Does the analyzed data answer your key question? If yes, how so?
- If there were any objections to begin with, did your data help you defend against them? If yes, how so?
- Do you think there are any limitations to your results? Are there any angles that you haven’t considered while setting priorities?
If the analyzed data satisfies all the above questions, then your analyzed data is final. This information can now be used for the purpose of decision-making.
