Starting Point: Bridging the Gap Between Data and Processes
At the beginning of 1999, I wrote a research proposal with the title "Process Design by Discovery: Harvesting Workflow Knowledge from Ad-hoc Executions". In the proposal, I defined process mining as "distilling a structured process description from a set of real executions". At the time, there were over 200 workflow management systems that all had problems supporting the dynamism of real-life processes involving people. Only highly structured and stable processes could be supported properly. Therefore, adding flexibility to workflow systems was one of the main research topics. Instead of adding more flexibility mechanisms, I proposed to use information about the actual process executions to infer a workflow model (expressed in terms of a Petri net) and use the model to automatically improve processes while they are running.
Initially, the focus of process mining was on workflow automation. Therefore, we often used the term "workflow mining" in the first couple of years. Only later did we realize that the principles can be applied to any operational process in production, logistics, healthcare, learning, government, finance, etc. However, at the time, event data were available only in administrative or financial settings, i.e., the natural habitat of workflow management systems. Re-reading the 20-year-old research proposal "Process Design by Discovery: Harvesting Workflow Knowledge from Ad-hoc Executions" makes me realize that many of today's ideas in process mining and Robotic Process Automation (RPA) have been around for quite some time.
My main motivation to start working on process mining at the end of the 1990s was my disappointment in the practical use of simulation software and workflow management systems. Simulation and workflow management have in common that they rely on humans to make process models. However, such models typically describe only the "happy flows" and fail to capture the less frequent executions that generate most of the problems. At the time, many workflow management projects failed. Together with Hajo Reijers and several PhD and MSc students, I did a longitudinal study on the effects of workflow management systems. Only half of all implementation projects in the study succeeded in taking the workflow system into operation. When using simulation software, the process of making the model was often more insightful than the actual results, because simulations rely on too many simplifying assumptions. All these experiences show that process orientation is important, but one should connect process management to the actual evidence recorded in databases, audit trails, etc.
The above experiences naturally led to scientific challenges such as discovering process models (e.g., Petri nets) from event data. However, experts in process modeling and analysis (e.g., the Petri net, concurrency, and model checking communities) were not interested in actual data, and experts in data analysis (e.g., the statistics and data mining communities) were not interested in processes. For a very long time, there were very few researchers working on the interplay between "data science" and "process science". The situation changed only recently. The success of the First International Conference on Process Mining (ICPM) in June 2019 in Aachen illustrates this change (see https://icpmconference.org/2019/).