Process mining is a relatively new field of research: its final aim is to bridge the gap between data mining and business process modelling. In particular, the assumption underpinning this discipline is the availability of data coming from business process executions. In business process theory, once the process has been defined, it is possible to have a number of instances of the process running at the same time. Usually, the identification of different instances is referred to a specific “case id” field in the log exploited by process mining techniques. The software systems that support the execution of a business process, however, often do not record explicitly such information. This paper presents an approach that faces the absence of the “case id” information: we have a set of extra fields, decorating each single activity log, that are known to carry the information on the process instance. A framework is addressed, based on simple relational algebra notions, to extract the most promising case ids from the extra fields. The work is a generalization of a real business case.
In Proceedings of IEEE Symposium on Computational Intelligence and Data Mining (IEEE SSCI CIDM 2011); Paris, France; April 11-15, 2011.