CRISP-DM, the Cross-Industry Standard Process for Data Mining, is a common process for tackling data science problems. It has been an industry standard for analyzing data for years and has six major phases:
- Developing Business Understanding
- Developing Data Understanding
- Preparing Your Data for Analysis
- Modeling the Data
- Evaluating the Results to Answer Your Question of Interest
- Deploying Changes Based on the Results of the Analysis
I. Business & Data Understanding
Business understanding means understanding the problem you are trying to solve.
Are you interested in acquiring new customers? Or do you want to show that a new cancer treatment outperforms an existing one?
All of these questions fall under business understanding.
Data understanding means getting to know the data you need to answer your question, whether you have to collect it from scratch or are already facing a huge amount of it.
What kind of data will provide the insight you need?
II. Prepare Data
Data preparation is commonly said to make up 80% of the process. It ranges from handling missing data to finding a way to work with categorical variables and looking for outliers. There is always more you could do to wrangle the data, but don't forget that you have to start somewhere and can always iterate if needed.
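To make the three preparation tasks mentioned here concrete, the following is a minimal sketch using only the Python standard library. The toy records, the column names (`age`, `plan`, `spend`) and the helper functions are all made up for illustration; mean imputation, one-hot encoding and the 1.5 × IQR rule are just one common choice for each task, not the only one.

```python
from statistics import mean, quantiles

# Hypothetical toy records; None marks a missing value.
rows = [
    {"age": 34, "plan": "basic", "spend": 120.0},
    {"age": None, "plan": "premium", "spend": 310.0},
    {"age": 29, "plan": "basic", "spend": 95.0},
    {"age": 41, "plan": "premium", "spend": 5000.0},
    {"age": 38, "plan": "basic", "spend": 130.0},
]


def impute_mean(rows, column):
    """Replace missing values in `column` with the mean of the observed ones."""
    fill = mean(r[column] for r in rows if r[column] is not None)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def one_hot(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for category in categories:
            new_row[f"{column}_{category}"] = int(r[column] == category)
        encoded.append(new_row)
    return encoded


def iqr_outliers(rows, column, k=1.5):
    """Return indices of rows whose value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = [r[column] for r in rows]
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [i for i, v in enumerate(values) if v < low or v > high]


prepared = one_hot(impute_mean(rows, "age"), "plan")
outlier_rows = iqr_outliers(prepared, "spend")

for row in prepared:
    print(row)
print("Rows flagged as outliers:", outlier_rows)
```

Whether a flagged row should be dropped, capped or kept is a judgment call that depends on the business question, which is one reason this phase tends to be revisited several times.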
III. Model Data
Finally, we are able to model the data. There may still be changes that could improve the model we have in place, from additional feature engineering to choosing a more advanced modeling technique.
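As a rough illustration of this phase, the sketch below fits a very simple baseline model and scores it on data it has not seen. Everything in it is an assumption made for the example: the numbers are invented, the variable names (`budget`, `customers`) are hypothetical, and a one-variable least-squares line stands in for whatever model your problem actually calls for.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for the line y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def r_squared(ys, predictions):
    """Share of the variance in `ys` that the predictions explain."""
    y_bar = mean(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data: monthly ad budget vs. number of new customers.
budget = [1, 2, 3, 4, 5, 6, 7, 8]
customers = [12, 19, 31, 42, 48, 62, 69, 81]

# Hold out two observations so the model is scored on data it was not fit on.
test_idx = {2, 6}
pairs = list(zip(budget, customers))
train = [p for i, p in enumerate(pairs) if i not in test_idx]
test = [p for i, p in enumerate(pairs) if i in test_idx]

slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])
predictions = [slope * x + intercept for x, _ in test]
score = r_squared([y for _, y in test], predictions)

print(f"slope={slope:.2f}, intercept={intercept:.2f}, held-out R^2={score:.3f}")
```

Starting with a baseline like this gives you a reference score, so that extra feature engineering or a more advanced technique can be judged by whether it actually improves on it.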
IV. Results
Results are the findings from our wrangling and modeling.
V. Evaluate & Deploy
Deployment can happen by moving your approach into production, or by using your results to persuade others within a company to act on them.