Adds webarchive support #358

Open
ianrumac wants to merge 1 commit into develop from ir/feat/archive-v2

Conversation

@ianrumac (Collaborator) commented Feb 11, 2026

Note: this is a migration from the old PR due to stale downstream

Changes in this pull request

  • Adds webarchive support for instant paywall loading

Checklist

  • All unit tests pass.
  • All UI tests pass.
  • Demo project builds and runs.
  • I added/updated tests or detailed why my change isn't tested.
  • I added an entry to the CHANGELOG.md for any breaking changes, enhancements, or bug fixes.
  • I have run ktlint in the main directory and fixed any issues.
  • I have updated the SDK documentation as well as the online docs.
  • I have reviewed the contributing guide.

Greptile Overview

Greptile Summary

This PR implements webarchive support for instant paywall loading by downloading, compressing, and caching HTML paywalls with their dependencies as MHTML-style archives. The implementation includes a manifest-based resource discovery system that uses regex to find relative and absolute resources in HTML, downloads them in parallel using a 64-thread pool, and compresses them into multipart archives stored on disk. When a paywall is displayed, the SDK checks if a cached archive exists and loads it directly through a custom WebViewClient that intercepts requests and serves content from the decompressed archive.

Key changes:

  • Adds ManifestDownloader with regex-based resource discovery and parallel download system
  • Implements StreamArchiveCompressor and StringArchiveCompressor for MHTML-style compression/decompression
  • Creates CachedArchiveLibrary to manage archive lifecycle with download queue
  • Extends PaywallView and SWWebView to conditionally load from archive with fallback to URL loading
  • Adds file stream operations to Storage interface for archive persistence
  • Includes comprehensive unit tests for compression/decompression logic
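
The regex-based resource discovery mentioned above could look roughly like the following sketch. The PR's actual patterns are not shown in this summary, so the regex, the `DiscoveredResources` type, and the `discoverResources` name are all hypothetical; the sketch only illustrates splitting `src`/`href` values into relative and absolute URLs.

```kotlin
// Hypothetical pattern: captures the value of any src="..." or href="..." attribute.
// The PR's real regexes are not visible here and may differ.
val attributeUrl = Regex("""(?:src|href)\s*=\s*["']([^"']+)["']""", RegexOption.IGNORE_CASE)

// Hypothetical result type: URLs split by whether they carry a scheme/host.
data class DiscoveredResources(val relative: List<String>, val absolute: List<String>)

fun discoverResources(html: String): DiscoveredResources {
    // Collect attribute values in document order, dropping blanks and duplicates.
    val urls = attributeUrl.findAll(html)
        .map { it.groupValues[1].trim() }
        .filter { it.isNotEmpty() }
        .distinct()
        .toList()
    // partition() returns (matching, non-matching): absolute URLs come first.
    val (absolute, relative) = urls.partition {
        it.startsWith("http://") || it.startsWith("https://") || it.startsWith("//")
    }
    return DiscoveredResources(relative = relative, absolute = absolute)
}
```

A regex like this will miss resources referenced from CSS (`url(...)`) or injected by scripts, which is one reason regex-based discovery is usually treated as best-effort.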

Issues found:

  • Syntax error in ManifestDownloader.kt:124-128 - missing return statement in if-expression will cause compilation failure
  • Logic issue in ArchiveWebClient.kt:71-76 - URL matching uses fragile .contains() logic that could match incorrect resources
  • Security concern in SWWebView.kt:199-203 - enabling universal file access creates potential vulnerability
  • Performance concern - 64-thread pool may be excessive for resource-constrained devices

Confidence Score: 2/5

  • This PR has a critical syntax error that will prevent compilation
  • Score reflects a compilation-blocking syntax error in ManifestDownloader.kt:124-128 where an if-expression is missing a return statement. Additionally, the fragile URL matching logic in ArchiveWebClient could cause runtime failures when loading archived paywalls, and the security implications of enabling universal file access need careful review.
  • Pay close attention to ManifestDownloader.kt (syntax error), ArchiveWebClient.kt (URL matching logic), and SWWebView.kt (security permissions)

Important Files Changed

| Filename | Overview |
|---|---|
| superwall/src/main/java/com/superwall/sdk/paywall/archive/ManifestDownloader.kt | Downloads webarchive resources using 64-thread pool and regex-based resource discovery; has syntax error and performance concerns |
| superwall/src/main/java/com/superwall/sdk/paywall/archive/ArchiveWebClient.kt | Intercepts WebView requests to serve webarchive content; URL matching uses fragile .contains() logic |
| superwall/src/main/java/com/superwall/sdk/paywall/view/webview/SWWebView.kt | Adds webarchive loading with broad file access permissions enabled for archive playback |
| superwall/src/main/java/com/superwall/sdk/paywall/archive/StreamArchiveCompressor.kt | Implements MHTML-style multipart compression/decompression with streaming support for webarchives |
| superwall/src/main/java/com/superwall/sdk/config/PaywallPreload.kt | Updated to support conditional webarchive caching before paywall preloading based on manifest availability |
| superwall/src/main/java/com/superwall/sdk/paywall/view/PaywallView.kt | Adds conditional archive loading with graceful fallback to standard URL loading |

Sequence Diagram

sequenceDiagram
    participant SDK as Superwall SDK
    participant PM as PaywallManager
    participant PW as PaywallView
    participant AL as CachedArchiveLibrary
    participant MD as ManifestDownloader
    participant Net as Network/ArchiveService
    participant WV as SWWebView
    participant AC as ArchiveWebClient
    
    Note over SDK,PM: Paywall Preload Flow
    SDK->>PM: preloadAllPaywalls(config)
    PM->>AL: downloadManifest(paywallId, url, manifest)
    AL->>MD: downloadArchiveForManifest(id, manifest)
    MD->>Net: fetchRemoteFile(mainDocumentUrl)
    Net-->>MD: main HTML content
    MD->>MD: discoverRelativeResources(html)
    MD->>MD: discoverAbsoluteResources(html)
    par Parallel Downloads (64 threads)
        MD->>Net: fetchRemoteFile(resource1)
        MD->>Net: fetchRemoteFile(resource2)
        MD->>Net: fetchRemoteFile(resourceN)
    end
    Net-->>MD: all resources
    MD-->>AL: List<ArchivePart>
    AL->>AL: compressToStream(url, parts, fileStream)
    AL-->>PM: archive saved to disk
    
    Note over PW,WV: Paywall Display Flow
    SDK->>PM: getPaywallView(request)
    PM->>PW: create PaywallView
    PW->>AL: checkIfArchived(paywallId)
    alt Archive exists
        AL-->>PW: true
        PW->>AL: loadArchive(paywallId)
        AL-->>PW: DecompressedWebArchive
        PW->>WV: loadFromArchive(archive)
        WV->>AC: create ArchiveWebClient(archive)
        WV->>WV: enable file access permissions
        WV->>WV: loadUrl(OVERRIDE_PATH)
        WV->>AC: shouldInterceptRequest(url)
        AC->>AC: resolveUrlFromArchive(archive, url)
        AC-->>WV: WebResourceResponse(content)
    else No archive
        PW->>WV: setup(url)
        WV->>WV: loadUrl(paywall.url)
    end
    WV-->>PW: paywall rendered

@greptile-apps greptile-apps bot left a comment

45 files reviewed, 5 comments

Comment on lines +29 to +30
val dispatcher =
    Executors.newFixedThreadPool(64).asCoroutineDispatcher()

A 64-thread pool may be excessive for parallel downloads; consider a smaller pool (e.g., 16-32 threads) to avoid resource exhaustion on devices with limited capabilities.

Path: superwall/src/main/java/com/superwall/sdk/paywall/archive/ManifestDownloader.kt
Line: 29:30
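
As a rough illustration of the smaller-pool idea, the sketch below caps parallelism with a plain `java.util.concurrent` executor. The `fetchAllBounded` helper and its `maxThreads` default of 16 are assumptions for illustration, not code from the PR, which wraps its pool in a coroutine dispatcher instead.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.Executors

// Hypothetical helper: runs `fetch` for every URL on a bounded pool and
// returns the results in the same order as the input list.
fun <T> fetchAllBounded(
    urls: List<String>,
    maxThreads: Int = 16, // assumed default, taken from the low end of the reviewer's 16-32 range
    fetch: (String) -> T,
): List<T> {
    require(maxThreads > 0) { "maxThreads must be positive" }
    if (urls.isEmpty()) return emptyList()
    // Never create more threads than there are downloads.
    val pool = Executors.newFixedThreadPool(minOf(maxThreads, urls.size))
    try {
        val tasks = urls.map { url -> Callable { fetch(url) } }
        // invokeAll blocks until every task finishes; get() rethrows failures.
        return pool.invokeAll(tasks).map { it.get() }
    } finally {
        pool.shutdown()
    }
}
```

In coroutine-based code, `Dispatchers.IO.limitedParallelism(n)` from kotlinx.coroutines may be a lighter-weight way to get a similar cap without owning a dedicated pool; whether it fits this SDK's minimum library version is not something the PR text establishes.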

Comment on lines +71 to +76
archiveFile.content.find { part ->
    if (url.contains("index.html")) {
        part is ArchivePart.Document
    } else {
        part.url.contains(url)
    }

URL matching with .contains() is fragile - a URL like /about/index.html would incorrectly match when searching for /index.html. Use exact URL matching or path normalization.

Suggested change

```suggestion
            archiveFile.content.find { part ->
                if (url.contains("index.html")) {
                    part is ArchivePart.Document
                } else {
                    part.url == url || part.url.endsWith(url)
                }
            }
```

Path: superwall/src/main/java/com/superwall/sdk/paywall/archive/ArchiveWebClient.kt
Line: 71:76
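
Note that `endsWith` is still a suffix match: `"/about/index.html".endsWith("/index.html")` is true. One possible way to get the "exact URL matching or path normalization" the comment asks for is to compare normalized paths, as in this sketch. `ArchivedPart`, `normalizedPath`, and `findPart` are hypothetical names, and ignoring the host when comparing is a simplifying assumption that may not suit archives with resources from several origins.

```kotlin
import java.net.URI
import java.net.URISyntaxException

// Hypothetical stand-in for the PR's ArchivePart; only the url field matters here.
data class ArchivedPart(val url: String)

// Reduces a URL or bare path to a comparable path: scheme, host, query and
// fragment are dropped, and a leading slash is added if missing.
fun normalizedPath(url: String): String {
    val parsed: String? = try {
        URI(url).path
    } catch (e: URISyntaxException) {
        null
    }
    // Fall back to simple trimming when the input is not a parseable URI.
    val path = parsed ?: url.substringBefore('?').substringBefore('#')
    return if (path.startsWith("/")) path else "/$path"
}

// Returns the first part whose normalized path equals the requested one exactly.
fun findPart(parts: List<ArchivedPart>, requestUrl: String): ArchivedPart? {
    val wanted = normalizedPath(requestUrl)
    return parts.firstOrNull { normalizedPath(it.url) == wanted }
}
```

With this approach a request for `/index.html` no longer resolves to a part stored as `/about/index.html`, and a query string on the request does not prevent a match.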

Comment on lines +199 to +203
// Enable file access only for archive loading
settings.allowFileAccess = true
settings.allowFileAccessFromFileURLs = true
settings.allowUniversalAccessFromFileURLs = true
settings.allowContentAccess = true

Enabling universal file access (allowFileAccessFromFileURLs, allowUniversalAccessFromFileURLs) creates security risks - malicious web content could potentially read local files. Ensure webarchive content is from trusted sources only.

Path: superwall/src/main/java/com/superwall/sdk/paywall/view/webview/SWWebView.kt
Line: 199:203

Comment on lines +36 to +46
override fun encode(content: ByteArray): String {
    return if (content.size > 1) {
        return try {
            return Base64.encodeToString(content, Base64.CRLF)
        } catch (e: Throwable) {
            "ICAgIA=="
        }
    } else {
        "ICAgIA=="
    }
}

Multiple redundant return statements - simplify logic

Suggested change

```suggestion
    override fun encode(content: ByteArray): String {
        return if (content.size > 1) {
            try {
                Base64.encodeToString(content, Base64.CRLF)
            } catch (e: Throwable) {
                "ICAgIA=="
            }
        } else {
            "ICAgIA=="
        }
    }
```

Path: superwall/src/main/java/com/superwall/sdk/paywall/archive/Base64ArchiveEncoder.kt
Line: 36:46
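
A further simplification might name the fallback literal and collapse the branches, as sketched below. This is a JVM-only stand-in: it uses `java.util.Base64`'s MIME encoder in place of `android.util.Base64` with the `CRLF` flag, so line-wrapping details may differ from the PR's encoder, and `FALLBACK_BASE64` is a hypothetical name. The literal `"ICAgIA=="` decodes to four ASCII spaces. The sketch keeps the original `content.size > 1` guard, which also sends one-byte inputs to the fallback; whether that is intended is not clear from the snippet.

```kotlin
import java.util.Base64

// Base64 encoding of four ASCII spaces; the same literal the PR uses as its fallback.
val FALLBACK_BASE64 = "ICAgIA=="

// JVM stand-in for the PR's encoder (assumption: MIME encoding is an acceptable
// substitute for android.util.Base64 with the CRLF flag in this illustration).
fun encode(content: ByteArray): String {
    // Mirrors the original guard: empty and single-byte inputs use the fallback.
    if (content.size <= 1) return FALLBACK_BASE64
    // runCatching catches Throwable, matching the original catch clause.
    return runCatching { Base64.getMimeEncoder().encodeToString(content) }
        .getOrDefault(FALLBACK_BASE64)
}
```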

Comment on lines +124 to +128
if (relativeUrlsOnly.contains(it.url)) {
    if (it.url.contains("favicon.ico")) {
        "favicon.ico"
    }
    it.copy(url = it.url.removePrefix("https://${host.host}"))

Missing return statement causes compilation error

Suggested change

```suggestion
                    if (relativeUrlsOnly.contains(it.url)) {
                        if (it.url.contains("favicon.ico")) {
                            it.copy(url = "favicon.ico")
                        } else {
                            it.copy(url = it.url.removePrefix("https://${host.host}"))
                        }
                    } else {
                        it
                    }
```

Path: superwall/src/main/java/com/superwall/sdk/paywall/archive/ManifestDownloader.kt
Line: 124:128
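
In the snippet as quoted, the inner `if` evaluates the literal `"favicon.ico"` and then discards it, so the favicon case falls through to the prefix-stripping `copy`. The suggested rewrite can be expressed as a single `when`, sketched here in self-contained form. `ResourcePart` and `normalizePartUrls` are hypothetical stand-ins for the PR's types, and `host` is taken as a plain string rather than the URL object the original appears to use.

```kotlin
// Hypothetical stand-in for the PR's archive part type.
data class ResourcePart(val url: String)

// Rewrites URLs of parts that were discovered as relative resources:
// favicons collapse to "favicon.ico", others lose the "https://<host>" prefix.
// Parts not in relativeUrlsOnly are returned unchanged.
fun normalizePartUrls(
    parts: List<ResourcePart>,
    relativeUrlsOnly: Set<String>,
    host: String, // assumed to be a bare host name such as "example.com"
): List<ResourcePart> =
    parts.map { part ->
        when {
            part.url !in relativeUrlsOnly -> part
            part.url.contains("favicon.ico") -> part.copy(url = "favicon.ico")
            else -> part.copy(url = part.url.removePrefix("https://$host"))
        }
    }
```

Every branch of the `when` yields a `ResourcePart`, so the lambda's value is well-defined in all three cases.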
