Issue
Our web ecosystem spans many sites, but our main form of analysis is currently siloed at the host-domain level. How these domains connect to one another, and how users move once they are inside a site, is less clear.
Look into how to "map" user journeys, and explore other methods for visualizing and connecting sites. This will likely be some form of node graph, but other possibilities should be explored as well.
Proposed solution
- Parse directories/pages to compute similarity scores between pages within an individual site, and use those scores to build forked/branching paths
- Create bigrams of page referrers to measure site "distance" and connection points
- Build a node graph with stretchy connections, where each connection is weighted by a TF-IDF-style measure
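
To make the three ideas above concrete, here is a minimal Python sketch. It is one possible reading of the proposal, not a settled design. Everything named in it is an assumption for illustration: the function names (`path_similarity`, `referrer_bigrams`, `edge_weights`), the input shape (a list of `(referrer_url, page_url)` pairs), the use of shared leading path segments as the similarity score, and the particular TF-IDF-style formula (a source's share of outbound traffic, multiplied by a smoothed inverse count of how many sources point at the destination).

```python
import math
from collections import Counter
from urllib.parse import urlsplit


def path_segments(url):
    """Split a URL path into its directory/page segments."""
    return [s for s in urlsplit(url).path.split("/") if s]


def path_similarity(url_a, url_b):
    """Fraction of leading path segments two pages share (0.0 to 1.0).

    Assumption: directory structure is a usable proxy for page similarity.
    """
    a, b = path_segments(url_a), path_segments(url_b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # both URLs point at the site root
    shared = 0
    for seg_a, seg_b in zip(a, b):
        if seg_a != seg_b:
            break
        shared += 1
    return shared / longest


def referrer_bigrams(hits):
    """Count (referrer host, destination host) pairs.

    `hits` is a hypothetical list of (referrer_url, page_url) tuples.
    Same-host hops and hits without a referrer host are skipped.
    """
    counts = Counter()
    for referrer, url in hits:
        src = urlsplit(referrer).hostname
        dst = urlsplit(url).hostname
        if src and dst and src != dst:
            counts[(src, dst)] += 1
    return counts


def edge_weights(bigrams):
    """Weight each edge with a TF-IDF-style score (an assumed formula).

    tf  = share of the source's outbound traffic going to the destination
    idf = smoothed log penalty for destinations that many sources link to
    """
    out_total = Counter()        # total outbound hits per source host
    sources_per_dst = Counter()  # distinct sources per destination host
    for (src, dst), n in bigrams.items():
        out_total[src] += n
        sources_per_dst[dst] += 1
    n_sources = len(out_total)

    weights = {}
    for (src, dst), n in bigrams.items():
        tf = n / out_total[src]
        idf = math.log((1 + n_sources) / (1 + sources_per_dst[dst])) + 1
        weights[(src, dst)] = tf * idf
    return weights
```

The resulting `weights` dictionary maps each `(source, destination)` host pair to a number that could serve as the edge strength in a node graph, so that heavily weighted pairs are drawn closer together. Whether this formula gives a useful notion of "distance" is something to check against real referrer data.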