75 changes: 75 additions & 0 deletions docs/source/ocpa.algo.discovery.neo4j_discovery.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,75 @@
Neo4j Discovery Functions
=========================

This document explains the usage of the Neo4j-based discovery functions
available in the `ocpa.algo.discovery.neo4j_discovery` module:

- `discover_proclet_model_neo4j`
- `discover_dfg_neo4j`

These functions enable discovery of process models directly from event logs stored in a Neo4j database.

Usage Example
-------------

.. code-block:: python

    from ocpa.algo.discovery.neo4j_discovery.discover_proclet_model import discover_proclet_model_neo4j
    from ocpa.algo.discovery.neo4j_discovery.discover_dfg import discover_dfg_neo4j

    # Define Neo4j connection URL
    url = 'bolt://neo4j:neo4jpass@localhost:7687'

    # Discover Proclet model from Neo4j
    proclet_neo = discover_proclet_model_neo4j(url)
    print("Proclet model discovered from Neo4j:")
    print(proclet_neo)

    # Discover Directly Follows Graph (DFG) from Neo4j
    dfg_neo = discover_dfg_neo4j(url)
    print("\nDirectly Follows Graph (DFG) discovered from Neo4j:")
    print(dfg_neo)


Neo4j Queries for Visual Inspection
-----------------------------------

After running the discovery functions, you can also execute the following Cypher queries
directly in your Neo4j database to visually inspect the discovered models.

1. Proclet Model Query
~~~~~~~~~~~~~~~~~~~~~~

This query retrieves activity and entity type classes, their directly-follows relationships,
and synchronization relationships from the Neo4j database:

.. code-block:: cypher

    MATCH (c1:Class)
    WHERE c1.Type = "activity,EntityType"
    OPTIONAL MATCH (c1)-[df:DF_C]->(c2)
    WHERE c1.Type = c2.Type
    OPTIONAL MATCH (c1)-[sync:SYNC]->(c3)
    WHERE c1.Type = c3.Type
    RETURN c1, df, c2, sync, c3


2. Directly Follows Graph (DFG) Query
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This query retrieves activity classes and their directly-follows relationships:

.. code-block:: cypher

    MATCH (c1:Class) WHERE c1.Type = "Activity"
    OPTIONAL MATCH (c1)-[df:DF_C]->(c2)
    RETURN c1, df, c2


Notes
-----

- The `url` parameter should point to your running Neo4j instance, including username and password.
- The discovered models are returned as query results that can be further processed or visualized.
- The Cypher queries provided are useful for manual inspection within the Neo4j Browser or other visualization tools.
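
The connection URL bundles credentials, host, and port into one string, so a typo in any part only surfaces when the database call fails. As a minimal sketch that is independent of OCPA, the Python standard library can split such a URL into its parts before it is passed to the discovery functions. The helper name `split_bolt_url` is hypothetical and not part of the module.

.. code-block:: python

    from urllib.parse import urlsplit

    def split_bolt_url(url):
        """Return (user, password, host, port) from a bolt:// URL (hypothetical helper)."""
        parts = urlsplit(url)
        if parts.scheme != "bolt":
            raise ValueError(f"expected a bolt:// URL, got scheme {parts.scheme!r}")
        if parts.username is None or parts.password is None:
            raise ValueError("URL must include username and password")
        return parts.username, parts.password, parts.hostname, parts.port

    # Same URL as in the usage example above
    print(split_bolt_url("bolt://neo4j:neo4jpass@localhost:7687"))

A check like this only validates the shape of the URL; it does not confirm that the Neo4j instance is reachable or that the credentials are accepted.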

204 changes: 192 additions & 12 deletions docs/source/ocpa.algo.filtering.rst
@@ -1,19 +1,199 @@
ocpa.algo.filtering package
===========================

Subpackages
-----------
OCPA offers various filtering techniques for object-centric event logs, allowing you to select subsets of the data based on activities, objects, time, attributes, lifecycle, performance, and variants.

.. toctree::
:maxdepth: 4
Activity Filtering
__________________

ocpa.algo.filtering.graph
ocpa.algo.filtering.log
Filters an Object-Centric Event Log to retain only events corresponding to specified activities, preserving related objects and event-object relationships while removing all unrelated data. In the following example, only events for 'Create Purchase Requisition', 'Receive Goods', and 'Issue Goods Receipt' are retained.

Module contents
---------------
.. code-block:: python

    from ocpa.objects.log.importer.ocel import factory as ocel_import_factory
    from ocpa.algo.util.filtering.log.index_based_filtering import activity_filtering

    filename = "sample_logs/jsonocel/exported-p2p-normal.jsonocel"
    ocel = ocel_import_factory.apply(filename)

    filtered_using_list_of_activities = activity_filtering(
        ocel,
        ['Create Purchase Requisition', 'Receive Goods', 'Issue Goods Receipt']
    )

Activity Frequency Filtering
____________________________

Filters an Object-Centric Event Log by retaining only the most frequent activities until the specified cumulative frequency threshold is met. In the following example, activities are kept until they account for 80% of all events, and the rest are removed.

.. code-block:: python

    from ocpa.objects.log.importer.ocel import factory as ocel_import_factory
    from ocpa.algo.util.filtering.log.index_based_filtering import activity_freq_filtering

    filename = "sample_logs/jsonocel/exported-p2p-normal.jsonocel"
    ocel = ocel_import_factory.apply(filename)

    filtered_using_activity_frequencies = activity_freq_filtering(ocel, 0.8)
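
To make the cumulative threshold concrete, the following sketch shows one way such a selection could work on a plain list of activity names. It is not OCPA's implementation: the function name `activities_within_threshold` is hypothetical, and whether the activity that crosses the threshold is kept, and how ties are ordered, are assumptions.

.. code-block:: python

    from collections import Counter

    def activities_within_threshold(activities, threshold):
        """Pick the most frequent activities until their cumulative share reaches `threshold`."""
        counts = Counter(activities)
        total = sum(counts.values())
        kept, covered = [], 0
        for activity, count in counts.most_common():
            # Stop once the activities kept so far already cover the threshold
            if covered / total >= threshold:
                break
            kept.append(activity)
            covered += count
        return kept

    # Toy log: A occurs 5 times, B 3 times, C and D once each (10 events in total)
    events = ["A"] * 5 + ["B"] * 3 + ["C"] + ["D"]
    print(activities_within_threshold(events, 0.8))

In this toy log, A and B together account for 8 of 10 events, so with a threshold of 0.8 the sketch keeps exactly those two activities.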

Object Type Filtering
_____________________

Filters an Object-Centric Event Log by retaining only specified object types and all events related to them. In the following example, only objects of types 'PURCHORD' and 'INVOICE' and their associated events are kept; all other object types and unrelated events are removed.

.. code-block:: python

    from ocpa.objects.log.importer.ocel import factory as ocel_import_factory
    from ocpa.algo.util.filtering.log.index_based_filtering import object_type_filtering

    filename = "sample_logs/jsonocel/exported-p2p-normal.jsonocel"
    ocel = ocel_import_factory.apply(filename)

    filtered_using_list_of_object_types = object_type_filtering(
        ocel,
        ['PURCHORD', 'INVOICE']
    )

Object Frequency Filtering
__________________________

Filters object types in an Object-Centric Event Log based on their frequency of participation in events, removing those whose involvement falls below a given threshold. In the example below, object types participating in less than 20% of events are filtered out.

.. code-block:: python

    from ocpa.objects.log.importer.ocel import factory as ocel_import_factory
    from ocpa.algo.util.filtering.log.index_based_filtering import object_freq_filtering

    filename = "sample_logs/jsonocel/exported-p2p-normal.jsonocel"
    ocel = ocel_import_factory.apply(filename)

    filtered_using_object_type_frequencies = object_freq_filtering(ocel, 0.2)

Time-based Filtering
____________________

Filters cases in an Object-Centric Event Log based on specified time intervals using different strategies, such as filtering by case start time, end time, full containment within the interval, or cases spanning the interval. In the example, cases starting between May 4 and July 6, 2021, are retained.

.. code-block:: python

    from ocpa.objects.log.importer.ocel import factory as ocel_import_factory
    from datetime import datetime
    from ocpa.algo.util.filtering.log.index_based_filtering import time_filtering

    filename = "sample_logs/jsonocel/exported-p2p-normal.jsonocel"
    ocel = ocel_import_factory.apply(filename)

    start = datetime.fromisoformat('2021-05-04 09:02:00+01:00')
    end = datetime.fromisoformat('2021-07-06 09:00:00+01:00')

    filtered_based_on_time = time_filtering(
        ocel,
        start,
        end,
        strategy_name="start"  # Alternatives: "end", "contained", "spanning"
    )
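
The four strategy names can be read as four different tests on a case's time span. The sketch below spells out one plausible interpretation on plain `datetime` values. The function name `case_matches`, the inclusive boundaries, and the exact meaning of each strategy are assumptions for illustration and may differ from what `time_filtering` does.

.. code-block:: python

    from datetime import datetime

    def case_matches(case_start, case_end, start, end, strategy_name="start"):
        """Decide whether a case with the given time span passes the filter (assumed semantics)."""
        if strategy_name == "start":      # the case begins inside the interval
            return start <= case_start <= end
        if strategy_name == "end":        # the case finishes inside the interval
            return start <= case_end <= end
        if strategy_name == "contained":  # the whole case lies inside the interval
            return start <= case_start and case_end <= end
        if strategy_name == "spanning":   # the case covers the whole interval
            return case_start <= start and end <= case_end
        raise ValueError(f"unknown strategy: {strategy_name!r}")

    window = (datetime(2021, 5, 4), datetime(2021, 7, 6))
    case = (datetime(2021, 5, 10), datetime(2021, 8, 1))
    for name in ("start", "end", "contained", "spanning"):
        print(name, case_matches(*case, *window, strategy_name=name))

The toy case starts inside the window but ends after it, so under these assumptions only the "start" strategy accepts it.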

Event Attribute Filtering
_________________________

Filters an Object-Centric Event Log by retaining only events that match specified attribute values. In the following example, only events with the activity 'Create Purchase Order' or 'Create Purchase Requisition' are retained.

.. code-block:: python

    from ocpa.objects.log.importer.ocel import factory as ocel_import_factory
    from ocpa.algo.util.filtering.log.index_based_filtering import event_attribute_filtering

    filename = "sample_logs/jsonocel/exported-p2p-normal.jsonocel"
    ocel = ocel_import_factory.apply(filename)

    attr_filter = {"event_activity": ["Create Purchase Order", "Create Purchase Requisition"]}
    filtered_based_on_event_attributes = event_attribute_filtering(ocel, attr_filter)

Object Attribute Filtering
__________________________

Filters an Object-Centric Event Log by retaining only events linked to objects that meet specified attribute cardinality conditions. In the example below, only events associated with more than two 'MATERIAL' objects and exactly one 'PURCHORD' object are retained.

.. code-block:: python

    from ocpa.objects.log.importer.ocel import factory as ocel_import_factory
    from ocpa.algo.util.filtering.log.index_based_filtering import object_attribute_filtering

    filename = "sample_logs/jsonocel/exported-p2p-normal.jsonocel"
    ocel = ocel_import_factory.apply(filename)

    vmap = {'MATERIAL': ['more than', 2], 'PURCHORD': ['exactly', 1]}
    filtered_based_on_object_attributes = object_attribute_filtering(ocel, vmap)
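
The `vmap` dictionary pairs each object type with a condition and a number. As a rough sketch of how such conditions could be evaluated for a single event, the code below maps the two condition strings used above to comparison operators. The names `CONDITIONS` and `event_matches` are hypothetical, and only the two conditions that appear in the example are covered; the set of conditions OCPA accepts is not shown here.

.. code-block:: python

    import operator

    # Only the two conditions from the example above; others are not assumed
    CONDITIONS = {"more than": operator.gt, "exactly": operator.eq}

    def event_matches(object_counts, vmap):
        """Check the per-type object counts of one event against every condition in `vmap`."""
        return all(
            CONDITIONS[condition](object_counts.get(object_type, 0), number)
            for object_type, (condition, number) in vmap.items()
        )

    vmap = {'MATERIAL': ['more than', 2], 'PURCHORD': ['exactly', 1]}
    print(event_matches({'MATERIAL': 3, 'PURCHORD': 1}, vmap))  # three materials, one order
    print(event_matches({'MATERIAL': 2, 'PURCHORD': 1}, vmap))  # only two materials

An event must satisfy every entry in `vmap` to be kept, so an object type that is missing from an event counts as zero in this sketch.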

Object Lifecycle Filtering
__________________________

Filters an Object-Centric Event Log to retain only objects of a specified type that follow a given sequence of activities. In the following example, only 'PURCHORD' objects that go through 'Create Purchase Order', 'Receive Invoice', and 'Clear Invoice' in that order are retained.

.. code-block:: python

    from ocpa.objects.log.importer.ocel import factory as ocel_import_factory
    from ocpa.algo.util.filtering.log.index_based_filtering import object_lifecycle_filtering

    filename = "sample_logs/jsonocel/exported-p2p-normal.jsonocel"
    ocel = ocel_import_factory.apply(filename)

    filtered_using_control_flow_of_objects = object_lifecycle_filtering(
        ocel,
        object_type="PURCHORD",
        list_of_activities=["Create Purchase Order", "Receive Invoice", "Clear Invoice"]
    )
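
"In that order" can be read in more than one way. The sketch below assumes the loosest reading: the listed activities must appear in the object's lifecycle in the given order, but other activities may occur in between. The function name `follows_in_order` is hypothetical, and `object_lifecycle_filtering` may apply a stricter rule, such as an exact match.

.. code-block:: python

    def follows_in_order(lifecycle, required):
        """True if `required` is an ordered (not necessarily contiguous) subsequence of `lifecycle`."""
        remaining = iter(lifecycle)
        # Each membership test consumes the iterator up to the match,
        # so later activities can only match later positions.
        return all(activity in remaining for activity in required)

    lifecycle = ["Create Purchase Order", "Receive Goods", "Receive Invoice", "Clear Invoice"]
    required = ["Create Purchase Order", "Receive Invoice", "Clear Invoice"]
    print(follows_in_order(lifecycle, required))
    print(follows_in_order(lifecycle, list(reversed(required))))

The first call succeeds because the three activities occur in order with 'Receive Goods' in between; the reversed list fails because the order is violated.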

Event Performance-based Filtering
_________________________________

Filters an Object-Centric Event Log based on performance measures (e.g., synchronization, flow, or sojourn time), retaining only events that meet a specified condition. In the following example, only 'Create Purchase Order' events with a synchronization time of less than 24 hours are kept.

.. code-block:: python

    from ocpa.objects.log.importer.ocel import factory as ocel_import_factory
    from ocpa.algo.util.filtering.log.index_based_filtering import event_performance_based_filtering

    filename = "sample_logs/jsonocel/exported-p2p-normal.jsonocel"
    ocel = ocel_import_factory.apply(filename)

    parameters = {
        'measure': 'synchronization',
        'activity': 'Create Purchase Order',
        'condition': lambda x: x < 86400  # 24-hour threshold
    }
    filtered_using_event_performance = event_performance_based_filtering(ocel, parameters)

Variant Frequency Filtering
____________________________

Filters an Object-Centric Event Log by removing infrequent variants based on the given cumulative frequency threshold. In the following example, only the most common variants that together make up 80% of the total cases are retained.

.. code-block:: python

    from ocpa.objects.log.importer.ocel import factory as ocel_import_factory
    from ocpa.algo.util.filtering.log.index_based_filtering import variant_frequency_filtering

    filename = "sample_logs/jsonocel/exported-p2p-normal.jsonocel"
    ocel = ocel_import_factory.apply(filename)

    filtered_ocel_variant_freq = variant_frequency_filtering(ocel, 0.8)

Variant Activity Sequence Filtering
___________________________________

Filters an Object-Centric Event Log to retain only process executions (variants) that contain specific activity transitions. In the following example, only executions that include the transition from 'Verify Material' to 'Plan Goods Issue' are kept.

.. code-block:: python

    from ocpa.objects.log.importer.ocel import factory as ocel_import_factory
    from ocpa.algo.util.filtering.log.index_based_filtering import variant_activity_sequence_filtering

    filename = "sample_logs/jsonocel/exported-p2p-normal.jsonocel"
    ocel = ocel_import_factory.apply(filename)

    filtered_ocel_with_act_to_act = variant_activity_sequence_filtering(
        ocel,
        [('Verify Material', 'Plan Goods Issue')]
    )
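
A transition is given as a pair of activities, and the filter takes a list of such pairs. The sketch below checks transitions on a simplified model in which an execution is a flat list of activity names and a transition means that one activity is immediately followed by the other. Both the flat-list model and the function name `contains_transitions` are assumptions; OCPA may represent executions and transitions differently.

.. code-block:: python

    def contains_transitions(sequence, transitions):
        """True if every (source, target) pair occurs as consecutive activities in `sequence`."""
        observed = set(zip(sequence, sequence[1:]))  # all directly-follows pairs
        return all(pair in observed for pair in transitions)

    wanted = [('Verify Material', 'Plan Goods Issue')]
    print(contains_transitions(
        ['Create Purchase Requisition', 'Verify Material', 'Plan Goods Issue'], wanted))
    print(contains_transitions(
        ['Verify Material', 'Create Purchase Order', 'Plan Goods Issue'], wanted))

In the second toy execution another activity sits between the two, so under this strict directly-follows reading the transition is not present.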

.. automodule:: ocpa.algo.filtering
:members:
:undoc-members:
:show-inheritance:
23 changes: 13 additions & 10 deletions example-scripts/event-log-management/filtering.py
@@ -10,33 +10,36 @@
object_attribute_filtering,
object_lifecycle_filtering,
event_performance_based_filtering,
variant_infrequent_filtering,
variant_frequency_filtering,
variant_activity_sequence_filtering
)

filename = "../../sample_logs/jsonocel/exported-p2p-normal.jsonocel"
ocel = ocel_import_factory.apply(filename)

# 1. Filter by explicitly removing specific activities from the log
# Removes all events related to 'Create Purchase Requisition', 'Receive Goods', and 'Issue Goods Receipt'
# 1. Inclusive activity filter: Retain all events belonging to specified activities
# Preserves only 'Create Purchase Requisition', 'Receive Goods', and 'Issue Goods Receipt' activities
# along with their associated objects and relationships. All other activities are permanently removed.
filtered_using_list_of_activities = activity_filtering(
    ocel,
    ['Create Purchase Requisition', 'Receive Goods', 'Issue Goods Receipt']
)

# 2. Filter activities by frequency - keep most frequent activities until cumulative 20% threshold
# Retains activities that together account for ≥20% of total activity occurrences
# 2. Frequency-based activity filter: keep the most common activities up to a 20% cumulative threshold
# Keeps the highest-frequency activities until their cumulative frequency reaches 20% of all events
# and removes the remaining, less frequent activities
filtered_using_activity_frequencies = activity_freq_filtering(ocel, 0.2)

# 3. Filter by removing specific object types and their related events
# Removes all PURCHORD and INVOICE objects and their associated events
# 3. Object-centric filter: Preserve complete lifecycle of specified object types
# Retains all events and relationships involving 'PURCHORD' (Purchase Orders) and 'INVOICE' objects
filtered_using_list_of_object_types = object_type_filtering(
    ocel,
    ['PURCHORD', 'INVOICE']
)

# 4. Filter object types by participation frequency - remove types with <20% relative frequency
# Eliminates object types that participate in less than 20% of total object-event relationships
# 4. Participation threshold filter: Remove infrequently involved object types
# Eliminates object types participating in <20% of object-event relationships
# Maintains only object types with significant process involvement (≥20% relative frequency)
filtered_using_object_type_frequencies = object_freq_filtering(ocel, 0.2)

# 5. Temporal filtering using "start" strategy between 2021-05-04 and 2021-07-06
@@ -85,7 +88,7 @@

# 10. Filter infrequent variants based on cumulative frequency threshold (e.g., top 80%)
# Retains only the most frequent behavioral variants whose combined frequency reaches ≥80% of total variant occurrences
filtered_ocel_variant_freq = variant_infrequent_filtering(ocel, 0.8)
filtered_ocel_variant_freq = variant_frequency_filtering(ocel, 0.8)

# 11. Filter log by keeping only process executions that include specified activity transitions
# Retains only executions where the activity sequence ('Verify Material' → 'Plan Goods Issue') occurs
14 changes: 14 additions & 0 deletions example-scripts/event-log-management/neo4j_to_ocel_converter.py
@@ -0,0 +1,14 @@
# Import the function to convert Neo4j data back into an OCEL log
from ocpa.objects.log.converter.versions.neo4j_to_ocel import neo4j_to_ocel

# Specify the Neo4j connection URL
# Format: 'bolt://<username>:<password>@<host>:<port>'
# Example assumes Neo4j is running locally with username 'neo4j' and password 'password'
url = 'bolt://neo4j:password@localhost:7687'

# Retrieve the OCEL event log from the Neo4j database
# This function queries the graph stored in Neo4j and reconstructs the OCEL event log,
# including events, objects, and their relationships.
ocel_from_neo4j = neo4j_to_ocel(url)

# The 'ocel_from_neo4j' object now holds the event log in OCEL format.
22 changes: 22 additions & 0 deletions example-scripts/event-log-management/ocel_to_neo4j_converter.py
@@ -0,0 +1,22 @@
# Import necessary functions for loading OCEL logs and uploading to Neo4j
from ocpa.objects.log.importer.ocel import factory as ocel_import_factory
from ocpa.objects.log.converter.versions.ocel_to_neo4j import ocel_to_neo4j

# Specify the path to the OCEL log file
# This file contains the event data and object relationships in JSON-OCEL format
filename = "../../sample_logs/jsonocel/exported-p2p-normal.jsonocel"

# Load the OCEL log into a Python object
# The 'ocel' object now holds events, objects, and their relations for further processing
ocel = ocel_import_factory.apply(filename)

# Define the connection URL to the Neo4j database
# Format: 'bolt://<username>:<password>@<host>:<port>'
# Example: 'bolt://neo4j:neo4jpass@localhost:7687'
url = 'bolt://neo4j:neo4jpass@localhost:7687'

# Upload the OCEL event log into the Neo4j database
# This converts the OCEL structure into nodes (events, objects) and relationships in the graph database
# The returned 'db' object represents the connection to Neo4j, allowing further queries if needed.
# Importing classes Entity and Events is necessary for cypher querying.
db = ocel_to_neo4j(url, ocel)
17 changes: 17 additions & 0 deletions example-scripts/process-discovery/neo4j_discovery.py
@@ -0,0 +1,17 @@
from ocpa.algo.discovery.neo4j_discovery.discover_proclet_model import discover_proclet_model_neo4j
from ocpa.algo.discovery.neo4j_discovery.discover_dfg import discover_dfg_neo4j

# Example usage script for Neo4j-based discovery functions

# Define Neo4j connection URL
url = 'bolt://neo4j:neo4jpass@localhost:7687'

# Discover Proclet model from Neo4j
proclet_neo = discover_proclet_model_neo4j(url)
print("Proclet model discovered from Neo4j:")
print(proclet_neo)

# Discover Directly Follows Graph (DFG) from Neo4j
dfg_neo = discover_dfg_neo4j(url)
print("\nDirectly Follows Graph (DFG) discovered from Neo4j:")
print(dfg_neo)