Extract Solr operations into dedicated SolrService class #34

Draft
Copilot wants to merge 2 commits into index_creators from copilot/refactor-solr-operations

Copilot AI commented Mar 6, 2026

First step in decomposing the monolithic main.py (1,446 lines): pulls all Solr-related logic out of ArcFlow into a standalone SolrService class to improve testability and separation of concerns.

New: arcflow/services/solr_service.py

Extracted from ArcFlow:

Old method                         New method
_get_target_agent_criteria()       SolrService.get_target_agent_criteria()
_get_nontarget_agent_criteria()    SolrService.get_nontarget_agent_criteria()
_execute_solr_query()              SolrService.execute_query()
get_all_agents()                   SolrService.get_all_agents()
delete_arclight_solr_record()      SolrService.delete_record()

All methods gain type hints and docstrings. The extraction also fixes two latent bugs carried over from the original: a spurious period in a log message and inconsistent capitalization (Arclight corrected to ArcLight). It additionally adds the missing return False on the exception path of delete_record(), so the method no longer returns None when the request raises.
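The exception-path fix in delete_record() can be illustrated with a minimal sketch. This is not the code from the PR: it is written as a free function, the `session` parameter (any object with a requests-style `post()` method, such as the `requests` module or a `requests.Session`) is an assumption made so the example runs without a live Solr, and the log wording and 30-second timeout are illustrative only.

```python
import logging
from typing import Any


def delete_record(session: Any, solr_url: str, solr_record_id: str,
                  log: logging.Logger, indent_size: int = 0) -> bool:
    """Delete one document by ID. Returns True on success, False otherwise."""
    indent = " " * indent_size
    try:
        # Solr's JSON update handler accepts a body of {"delete": {"id": ...}}.
        response = session.post(
            f"{solr_url}/update",
            params={"commit": "true"},
            json={"delete": {"id": solr_record_id}},
            timeout=30,  # assumed value, not taken from the PR
        )
        if response.status_code == 200:
            log.info(f"{indent}Deleted ArcLight Solr record {solr_record_id}")
            return True
        log.error(f"{indent}Failed to delete {solr_record_id}: "
                  f"HTTP {response.status_code}")
        return False
    except Exception as exc:
        log.error(f"{indent}Error deleting {solr_record_id}: {exc}")
        # Explicit False: without this line the function would fall through
        # and return None, which breaks callers that compare against False.
        return False
```

With the explicit return, every path yields a bool, which matches the `-> bool` annotation in the class skeleton.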

Changes to main.py

ArcFlow.__init__() now instantiates the service:

self.solr_service = SolrService(
    solr_url=self.solr_url,
    aspace_solr_url=self.aspace_solr_url,
    logger=self.log,
    force_update=self.force_update
)

Call sites updated to self.solr_service.get_all_agents(...) and self.solr_service.delete_record(...). Net result: ~141 lines removed from main.py.
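One practical effect of routing call sites through self.solr_service is that callers can be tested without a running Solr. The sketch below is hypothetical throughout: `ArcFlowSketch` and `remove_records()` are stand-ins invented for illustration, and the service is injected through the constructor here, whereas the real ArcFlow.__init__() builds it itself.

```python
from unittest.mock import Mock


class ArcFlowSketch:
    """Hypothetical stand-in for ArcFlow; only the delegation is shown."""

    def __init__(self, solr_service):
        self.solr_service = solr_service

    def remove_records(self, record_ids):
        """Hypothetical caller: returns the IDs whose deletion failed."""
        return [
            record_id
            for record_id in record_ids
            if not self.solr_service.delete_record(record_id, indent_size=2)
        ]


# A Mock replaces the real service; the first delete succeeds, the second fails.
fake_service = Mock()
fake_service.delete_record.side_effect = [True, False]

flow = ArcFlowSketch(fake_service)
failed = flow.remove_records(["a", "b"])
```

Because delete_record() reports failure as False rather than raising, the caller can collect failed IDs with a plain boolean check.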

Original prompt

Overview

Refactor the monolithic arcflow/main.py (1,446 lines) by extracting Solr-related operations into a dedicated SolrService class. This is the first step in a larger refactoring effort to improve code maintainability and testability.

What to Change

1. Create Service Infrastructure

Create the services package structure:

arcflow/
├── services/
│   ├── __init__.py       (new, empty file)
│   └── solr_service.py   (new)

2. Create arcflow/services/solr_service.py

Extract the following methods from the ArcFlow class into a new SolrService class:

Methods to move (with current line numbers for reference):

  • _get_target_agent_criteria() - lines 697-714
  • _get_nontarget_agent_criteria() - lines 716-737
  • _execute_solr_query() - lines 739-791
  • get_all_agents() - lines 793-826
  • delete_arclight_solr_record() - lines 1188-1204

SolrService class structure:

import requests
import logging
from datetime import datetime, timezone
from typing import List, Dict, Optional


class SolrService:
    """
    Handles all Solr operations including querying, filtering, and record deletion.
    Supports both ArcLight Solr and ArchivesSpace Solr instances.
    """
    
    def __init__(self, solr_url: str, aspace_solr_url: str, logger: logging.Logger, force_update: bool = False):
        """
        Initialize the Solr service.
        
        Args:
            solr_url: URL of the ArcLight Solr core
            aspace_solr_url: URL of the ArchivesSpace Solr core
            logger: Logger instance for logging operations
            force_update: If True, ignore modified_since timestamps in queries
        """
        self.solr_url = solr_url
        self.aspace_solr_url = aspace_solr_url
        self.log = logger
        self.force_update = force_update
    
    def get_target_agent_criteria(self, modified_since: int = 0) -> List[str]:
        """
        Defines the Solr query criteria for "target" agents.
        These are agents we want to process.
        
        Args:
            modified_since: Unix timestamp to filter by modification time
            
        Returns:
            List of query criteria strings
        """
        # Move implementation from _get_target_agent_criteria
        pass
    
    def get_nontarget_agent_criteria(self, modified_since: int = 0) -> List[str]:
        """
        Defines the Solr query criteria for "non-target" (excluded) agents.
        This is the logical inverse of the target criteria.
        
        Args:
            modified_since: Unix timestamp to filter by modification time
            
        Returns:
            List of query criteria strings
        """
        # Move implementation from _get_nontarget_agent_criteria
        pass
    
    def execute_query(self, query_parts: List[str], solr_url: Optional[str] = None,
                     fields: Optional[List[str]] = None, indent_size: int = 0) -> List[Dict]:
        """
        A generic function to execute a query against the Solr index.
        
        Args:
            query_parts: A list of strings that will be joined with " AND "
            solr_url: Solr URL to use (defaults to self.solr_url if not provided)
            fields: List of Solr fields to return in the response
            indent_size: Indentation size for logging
            
        Returns:
            List of dictionaries, where each dictionary contains the requested fields.
            Returns an empty list on failure.
        """
        # Move implementation from _execute_solr_query
        # Default fields to ['id'] if None
        pass
    
    def get_all_agents(self, agent_types: Optional[List[str]] = None, 
                      modified_since: int = 0, indent_size: int = 0) -> List[str]:
        """
        Fetch target agent URIs from the Solr index and log non-target agents.
        
        Args:
            agent_types: List of agent types to query (defaults to person, corporate_entity, family)
            modified_since: Unix timestamp to filter by modification time
            indent_size: Indentation size for logging
            
        Returns:
            List of agent URIs for target agents
        """
        # Move implementation from get_all_agents
        pass
    
    def delete_record(self, solr_record_id: str, indent_size: int = 0) -> bool:
        """
        Delete a record from ArcLight Solr by ID.
        
        Args:
            solr_record_id: The Solr document ID to delete
            indent_size: Indentation size for logging
            
        Returns:
            True if deletion was successful, False otherwise
        """
        # Move implementation from delete_arclight_solr_record
        pass
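The docstrings above pin down three behaviours that a short sketch can make concrete: query parts joined with " AND ", fields defaulting to ['id'], and an empty list on failure, plus force_update overriding modified_since. The code below is a sketch under stated assumptions, not the moved implementation: it uses free functions instead of methods, `session` is any object with a requests-style `get()`, and the `system_mtime` field name, the `rows` limit, and the timeout are assumptions.

```python
import logging
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


def modified_since_clause(modified_since: int, force_update: bool = False,
                          field: str = "system_mtime") -> Optional[str]:
    """Turn a Unix timestamp into a Solr range clause, or None when unused."""
    if force_update or modified_since <= 0:
        return None  # force_update means timestamps are ignored
    stamp = datetime.fromtimestamp(modified_since, tz=timezone.utc)
    # Solr date fields expect ISO 8601 in UTC with a trailing Z.
    return f"{field}:[{stamp.strftime('%Y-%m-%dT%H:%M:%SZ')} TO *]"


def execute_query(session: Any, solr_url: str, query_parts: List[str],
                  log: logging.Logger, fields: Optional[List[str]] = None,
                  rows: int = 1000) -> List[Dict]:
    """Run a select query; returns the matching docs, or [] on any failure."""
    if fields is None:
        fields = ["id"]  # default taken from the skeleton's comment
    params = {
        "q": " AND ".join(query_parts),
        "fl": ",".join(fields),
        "rows": rows,  # assumed page size, not from the PR
        "wt": "json",
    }
    try:
        response = session.get(f"{solr_url}/select", params=params, timeout=30)
        response.raise_for_status()
        return response.json()["response"]["docs"]
    except Exception as exc:
        log.error(f"Solr query failed: {exc}")
        return []
```

A criteria method could append the clause only when it is not None, so a forced update and a zero timestamp both produce an unfiltered query.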

3. Update arcflow/main.py

Import the service:

from services.solr_service import SolrService

In ArcFlow.__init__() (around line 44):

After initializing instance variables, add:

# Initialize Solr servi...

</details>



*This pull request was created from Copilot chat.*

Co-authored-by: alexdryden <47127862+alexdryden@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Refactor arcflow/main.py by extracting Solr operations" to "Extract Solr operations into dedicated SolrService class" on Mar 6, 2026