USE 455 - Skip libguides sub-pages that are root page #271

Merged
ghukill merged 3 commits into main from USE-455-improve-libguide-sub-page-handling
Mar 17, 2026
ghukill commented Mar 17, 2026

Purpose and background context

Why these changes are being introduced:

Since adding sub-pages, we have had duplicates in LibGuides. For guides that have sub-pages, the sub-page with position = 1 is effectively the root guide, just reached through a non-friendly URL.

How this addresses that need:

We can skip these sub-pages quite easily by detecting position = 1 and name = 'Home' when analyzing the sub-page metadata from the LibGuides API.
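
The check described above might look roughly like the following. This is a minimal sketch, not the code in this PR: the record shape (dicts with `position` and `name` keys), the `id` values, and the function names `is_root_duplicate` and `filter_sub_pages` are assumptions made for illustration.

```python
from typing import Any


def is_root_duplicate(page: dict[str, Any]) -> bool:
    """Return True when a sub-page record looks like the guide's root page.

    Assumption: each record has "position" and "name" keys. The position is
    compared as a string so that both 1 and "1" are matched.
    """
    return str(page.get("position")) == "1" and page.get("name") == "Home"


def filter_sub_pages(pages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Drop sub-pages that merely duplicate the root guide."""
    return [page for page in pages if not is_root_duplicate(page)]


# Hypothetical sub-page metadata, for illustration only.
pages = [
    {"id": 101, "name": "Home", "position": 1},       # root duplicate, skipped
    {"id": 102, "name": "Databases", "position": 2},  # ordinary sub-page, kept
    {"id": 103, "name": "Home", "position": 3},       # named Home but not first, kept
]

kept = filter_sub_pages(pages)
print([page["id"] for page in kept])  # prints [102, 103]
```

Requiring both conditions keeps the filter conservative: a sub-page named 'Home' that is not in the first position is still kept.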

In addition, this includes a bit of cleanup of regular expressions and converts one logging statement from INFO to DEBUG.

How can a reviewer manually see the effects of these changes?

Compare the previous v2 tab against the current v3 tab from this spreadsheet: https://docs.google.com/spreadsheets/d/1G0ci9V-SMvqGuT3p0xwI6u63Hr1VTy2e-cLOu60LfAk/edit?gid=1391449591#gid=1391449591.

While the total number of rows has increased, this is due to an expanded crawl scope for an unrelated issue.

What's important to note is the dramatic reduction, nearly to zero, in duplicated title column values (a good indicator of a true duplicate).

A few examples that are now de-duped:

  • "MIT in Popular Culture"
  • "MIT Buildings"
  • "Mathematics"
  • and so forth...
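
For reviewers who export the tabs, the duplicated-title comparison could be approximated with a small script like this one. It is only a sketch: the function name `duplicated_titles` is hypothetical, and the sample lists are invented for illustration rather than taken from the spreadsheet.

```python
from collections import Counter


def duplicated_titles(titles: list[str]) -> dict[str, int]:
    """Return each title that appears more than once, with its count."""
    # Ignore empty cells and surrounding whitespace before counting.
    counts = Counter(title.strip() for title in titles if title and title.strip())
    return {title: count for title, count in counts.items() if count > 1}


# Illustrative rows only, not real data from the v2 or v3 tab.
v2_titles = ["MIT Buildings", "MIT Buildings", "Mathematics", "Mathematics", "Maps"]
v3_titles = ["MIT Buildings", "Mathematics", "Maps"]

print(duplicated_titles(v2_titles))  # prints {'MIT Buildings': 2, 'Mathematics': 2}
print(duplicated_titles(v3_titles))  # prints {}
```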

Includes new or updated dependencies?

NO

Changes expectations for external applications?

NO

What are the relevant tickets?

  • https://mitlibraries.atlassian.net/browse/USE-455

Code review

  • Code review best practices are documented here and you are encouraged to have a constructive dialogue with your reviewers about their preferences and expectations.

Why these changes are being introduced:

It was discovered that we had duplicates in LibGuides since adding sub-pages.  This
was because for guides that have sub-pages, the sub-page with "position = 1" is
effectively the root guide, just by a non-friendly URL.

How this addresses that need:

We can quite easily skip these guides by detecting "position = 1" and
"name = 'Home'" when analyzing the sub-page metadata from the LibGuides API.

Side effects of this change:
* Reduce duplication

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/USE-455

ghukill marked this pull request as ready for review March 17, 2026 14:14
ghukill requested a review from a team as a code owner March 17, 2026 14:14
ghukill merged commit 3d5db86 into main Mar 17, 2026
3 checks passed