Process mining – getting to know your data

July 26 2021
  • Process mining

Before you start any detailed analysis of a dataset, especially if it is a new dataset or one you are not familiar with, it is a good idea to spend a bit of time getting a feel for the data. This will save you time in the longer term and help you get better overall insights from the analysis.

Early on in my career, just after I left clinical practice and started as a management consultant, Excel was my go-to data analysis tool (still very useful!).

When I got hold of a new dataset, I used to (and still do) take some basic steps to get an initial view of the data, including applying some filters, building a pivot table, and doing some basic calculations and simple graphs to spot any obvious trends.

The same applies to process mining. I want to share the four steps that I think are the most useful to take.

[I am using the great open-source R package bupaR – https://www.bupar.net/]

[As an example, I have used the sepsis event log dataset. This dataset is based on real data from a hospital for patients who suffered from severe infections (sepsis)]
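
If you want to follow along, here is a minimal setup sketch. It rests on a few assumptions that are not stated in this post: that the sepsis log ships with the eventdataR package, that the filter and metric functions used below come from edeaR, and that the process map and precedence matrix come from processmapR. Depending on your bupaR version, some of these may already be attached when you load bupaR.

```r
# Packages assumed for the snippets in this post
pkgs <- c("bupaR", "edeaR", "eventdataR", "processmapR")

# Check which of them are installed (installing is left to the reader)
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Install these first: ", paste(missing_pkgs, collapse = ", "))
} else {
  # Attach the packages; eventdataR is assumed to provide the sepsis event log
  invisible(lapply(pkgs, library, character.only = TRUE))
}
```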

  1. Summary

See the high-level overview.

sepsis %>% summary

The extract from the output below highlights some of the key information you would see, including the number of cases (i.e. patients), the number of distinct activities (e.g. register the patient, give IV fluids, give IV antibiotics, admit to hospital, discharge the patient) and the timeframe the dataset covers.

Number of events:  15214
Number of cases:  1050
Number of traces:  846
Number of distinct activities:  16
Average trace length:  14.48952

Start eventlog:  2013-11-07 08:18:29
End eventlog:  2015-06-05 12:25:11
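
To see how these figures relate to each other, here is a rough base R sketch on a made-up toy log (not the sepsis data). It assumes one row per event and treats a trace as the ordered sequence of activities within a case; bupaR's own definitions may differ in detail. The last line simply divides the number of events by the number of cases reported above, which reproduces the average trace length.

```r
# Toy event log, one row per event (hypothetical data, not the sepsis log)
toy_log <- data.frame(
  case_id  = c("p1", "p1", "p1", "p2", "p2", "p2", "p3", "p3"),
  activity = c("Register", "IV fluids", "Discharge",
               "Register", "IV fluids", "Discharge",
               "Register", "Discharge"),
  stringsAsFactors = FALSE
)

# A trace is the ordered sequence of activities within one case
traces <- vapply(split(toy_log$activity, toy_log$case_id),
                 paste, character(1), collapse = " > ")

n_events <- nrow(toy_log)           # rows in the log
n_cases  <- length(traces)          # distinct patients
n_traces <- length(unique(traces))  # distinct activity sequences
avg_trace_length <- n_events / n_cases

# The same ratio using the sepsis figures from the summary output
sepsis_avg_trace_length <- 15214 / 1050
```

Two of the three toy patients follow the same sequence, so there are fewer traces than cases, just as the sepsis log has 846 traces for 1050 cases.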

  2. Default path or happy path

Visualise a basic process flow model that covers only the most frequent traces (I chose those covering 10% of cases below) to get a feel for the common 'variants'.

happy_path <- sepsis %>% filter_trace_frequency(percentage = 0.1)
# Depending on your processmapR version, this value may need to be "absolute-case"
happy_path %>% process_map(type = frequency(value = "absolute_case"))
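
My understanding is that the percentage here is a target coverage of cases: the most frequent traces are kept until together they account for at least that share of cases. The base R sketch below illustrates that idea on made-up traces. The helper keep_frequent_traces is hypothetical and is not how edeaR implements the filter, so check the package documentation for the exact behaviour.

```r
# Toy traces, one per case (hypothetical data)
case_traces <- c(
  p1 = "Register > Antibiotics > Discharge",
  p2 = "Register > Antibiotics > Discharge",
  p3 = "Register > Antibiotics > Discharge",
  p4 = "Register > Fluids > Discharge",
  p5 = "Register > Fluids > Discharge",
  p6 = "Register > Admit > Discharge"
)

# Hypothetical helper: keep the most frequent traces until the
# requested share of cases is covered
keep_frequent_traces <- function(traces, percentage) {
  freq <- sort(table(traces), decreasing = TRUE)
  coverage <- cumsum(freq) / length(traces)
  n_keep <- which(coverage >= percentage)[1]
  kept <- names(freq)[seq_len(n_keep)]
  traces[traces %in% kept]
}

# The single most common trace already covers half of the toy cases
happy_cases <- keep_frequent_traces(case_traces, percentage = 0.5)
names(happy_cases)
```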

  3. Activity presence

This identifies the percentage of cases in which a particular activity occurs.

sepsis %>% activity_presence() %>% plot()

In this example, 78% of patients receive IV antibiotics and 28% return to the Emergency Room.
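
For intuition, activity presence can be sketched in base R as "count each activity once per case, then divide by the number of cases". The toy log below is made up and the code is only an illustration of the idea, not edeaR's implementation.

```r
# Toy event log (hypothetical data); p1 receives IV antibiotics twice
toy_log <- data.frame(
  case_id  = c("p1", "p1", "p1", "p2", "p2", "p3", "p3", "p3", "p4", "p4"),
  activity = c("Register", "IV antibiotics", "IV antibiotics",
               "Register", "IV antibiotics",
               "Register", "IV antibiotics", "Return ER",
               "Register", "Discharge"),
  stringsAsFactors = FALSE
)

# Count each activity once per case, however often it repeats
case_activity <- unique(toy_log[, c("case_id", "activity")])
n_cases <- length(unique(toy_log$case_id))

# Share of cases in which each activity occurs at least once
cases_per_activity <- vapply(split(case_activity$case_id, case_activity$activity),
                             length, integer(1))
presence <- sort(cases_per_activity / n_cases, decreasing = TRUE)
round(100 * presence)
```

The repeated dose for p1 does not inflate the figure: IV antibiotics is present in three of the four toy cases.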

  4. Precedence matrix

This shows which activity (consequent) follows which other activity (antecedent) and how many times that occurs.

sepsis %>% precedence_matrix(type = "absolute") %>% plot()

In this example, it's interesting to note that a white blood cell count (Leucocytes) and a CRP test (a marker for inflammation, i.e. infection) are done almost interchangeably in terms of which test is done first.
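
The idea behind the matrix can be sketched in base R by pairing every event with the next event of the same case and counting the pairs. The toy log is made up, the rows are assumed to be already in time order within each case, and this is an illustration rather than processmapR's implementation.

```r
# Toy event log (hypothetical data), rows already in time order per case
toy_log <- data.frame(
  case_id  = c("p1", "p1", "p1", "p2", "p2", "p2"),
  activity = c("Register", "Leucocytes", "CRP",
               "Register", "CRP", "Leucocytes"),
  stringsAsFactors = FALSE
)

# Pair every event with the next event of the same case
pairs <- do.call(rbind, lapply(split(toy_log$activity, toy_log$case_id), function(acts) {
  data.frame(antecedent = head(acts, -1), consequent = tail(acts, -1),
             stringsAsFactors = FALSE)
}))

# Rows are antecedents, columns are consequents, cells are absolute counts
precedence <- table(pairs$antecedent, pairs$consequent,
                    dnn = c("antecedent", "consequent"))
precedence
```

In the toy data each of the two tests precedes the other once, which is the kind of symmetric pattern described above.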

Hope this is useful.

In terms of overall methodology and approach to process mining, I read a great paper recently and will share more about it in an upcoming post!

[Thank you to Gert Janssenswillen for developing bupaR and for the excellent course on DataCamp.]

Very happy to hear your comments below or feel free to email me to share ideas – janak@usehealthdata.com

    Copyright 2022. Use Health Data | Logo made using https://www.onlinelogomaker.com/