`archive` pseudo-vcs driver: indexing code in archives (e.g. zip, tar) without extracting files by muravjov · Pull Request #484 · hound-search/hound

muravjov · 2024-05-21T22:32:43Z

What kind of change does this PR introduce? (check at least one)

The PR fulfills these requirements:

All tests are passing?
New/updated tests are included?
If any static assets have been updated, has ui/bindata.go been regenerated?
Are there doc blocks for functions that I updated/created?

If adding a new feature, the PR's description includes:

A convincing reason for adding this feature (to avoid wasting your time, it's best to open a suggestion issue first and wait for approval before working on it)

Description:

This PR adds a new driver, `archive`, which indexes source code inside archives (e.g. zip, tar; any format supported by https://github.com/mholt/archiver) without extracting files to disk. During indexing, files are walked through the archive API; during search, matches are verified and snippets are generated from files extracted on the fly.
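To illustrate the idea, here is a minimal sketch of the two phases. It is not the code from this PR: it uses Go's standard `archive/zip` package instead of mholt/archiver, handles zip only, and the helper names (`buildZip`, `walkZip`, `readFromZip`, `isIgnored`) are hypothetical. Treating `ignored-files` as a match on any path component is also an assumption.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"strings"
)

// buildZip creates a small in-memory zip so the example needs no files on disk.
func buildZip(files map[string]string) ([]byte, error) {
	var buf bytes.Buffer
	w := zip.NewWriter(&buf)
	for name, body := range files {
		f, err := w.Create(name)
		if err != nil {
			return nil, err
		}
		if _, err := io.WriteString(f, body); err != nil {
			return nil, err
		}
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// isIgnored reports whether any path component equals an ignored name
// (assumed semantics, not taken from the PR).
func isIgnored(name string, ignored []string) bool {
	for _, part := range strings.Split(name, "/") {
		for _, ig := range ignored {
			if part == ig {
				return true
			}
		}
	}
	return false
}

// walkZip is the indexing phase: it visits each regular file through the
// archive API and hands its content to visit, without writing to disk.
func walkZip(data []byte, ignored []string, visit func(name string, content []byte) error) error {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return err
	}
	for _, f := range zr.File {
		if f.FileInfo().IsDir() || isIgnored(f.Name, ignored) {
			continue
		}
		rc, err := f.Open()
		if err != nil {
			return err
		}
		content, err := io.ReadAll(rc)
		rc.Close()
		if err != nil {
			return err
		}
		if err := visit(f.Name, content); err != nil {
			return err
		}
	}
	return nil
}

// readFromZip is the search phase: it extracts a single file on the fly.
func readFromZip(data []byte, name string) (string, error) {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return "", err
	}
	f, err := zr.Open(name)
	if err != nil {
		return "", err
	}
	defer f.Close()
	b, err := io.ReadAll(f)
	return string(b), err
}

func main() {
	data, err := buildZip(map[string]string{
		"src/main.go": "package main\n",
		".git/config": "[core]\n",
	})
	if err != nil {
		panic(err)
	}
	err = walkZip(data, []string{".git"}, func(name string, content []byte) error {
		fmt.Printf("indexing %s (%d bytes)\n", name, len(content))
		return nil
	})
	if err != nil {
		panic(err)
	}
	body, err := readFromZip(data, "src/main.go")
	if err != nil {
		panic(err)
	}
	fmt.Printf("snippet source: %q\n", body)
}
```

Reopening the archive for every lookup keeps the sketch simple; a real driver would more likely keep the archive handle open between searches.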

A config example:

{
  "dbpath" : "db",
  "vcs-config" : {
    "git": {
      "ref" : "main"
    }
  },
  "repos" : {
    "video" : {
      "url" : "/Volumes/1tb-ext4/twitch/video.zip",
      "vcs" : "archive",
      "vcs-config" : {
        "ignored-files" : [".git"]
      },
      "url-pattern" : {
        "base-url" : "file:///Volumes/1tb-ext4/src/twitch/{path}"
      }
    }
  }
}
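As a rough illustration of how such a config could be read, the sketch below decodes the JSON above and collects the repos that use the `archive` driver. The struct and function names (`config`, `repoConfig`, `archiveOptions`, `archiveRepos`) are hypothetical and are not Hound's actual config types; only the JSON keys come from the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// repoConfig mirrors the per-repo keys used in the example (hypothetical type).
type repoConfig struct {
	URL       string          `json:"url"`
	VCS       string          `json:"vcs"`
	VCSConfig json.RawMessage `json:"vcs-config"`
}

// archiveOptions holds the driver-specific options shown in the example.
type archiveOptions struct {
	IgnoredFiles []string `json:"ignored-files"`
}

// config covers only the top-level keys this sketch needs; others are ignored.
type config struct {
	DBPath string                `json:"dbpath"`
	Repos  map[string]repoConfig `json:"repos"`
}

const sample = `{
  "dbpath" : "db",
  "vcs-config" : { "git": { "ref" : "main" } },
  "repos" : {
    "video" : {
      "url" : "/Volumes/1tb-ext4/twitch/video.zip",
      "vcs" : "archive",
      "vcs-config" : { "ignored-files" : [".git"] },
      "url-pattern" : { "base-url" : "file:///Volumes/1tb-ext4/src/twitch/{path}" }
    }
  }
}`

// archiveRepos returns the options of every repo whose "vcs" is "archive".
func archiveRepos(raw []byte) (map[string]archiveOptions, error) {
	var cfg config
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, err
	}
	out := make(map[string]archiveOptions)
	for name, r := range cfg.Repos {
		if r.VCS != "archive" {
			continue
		}
		var opts archiveOptions
		if len(r.VCSConfig) > 0 {
			if err := json.Unmarshal(r.VCSConfig, &opts); err != nil {
				return nil, fmt.Errorf("repo %q: %w", name, err)
			}
		}
		out[name] = opts
	}
	return out, nil
}

func main() {
	repos, err := archiveRepos([]byte(sample))
	if err != nil {
		panic(err)
	}
	for name, opts := range repos {
		fmt.Printf("%s: ignored-files=%v\n", name, opts.IgnoredFiles)
	}
}
```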

Some metrics:

For 160 zip files totaling 126 GB, the indexes take up 3 GB.
A search request takes about 13 seconds to execute.

muravjov · 2024-06-01T21:56:16Z

@salemhilal
Would you mind reviewing the PR?

muravjov added 6 commits May 20, 2024 03:36

vcs zip: initial commit

a9bf937

pass repo instead of repo.Url

523d8c1

indexing for archives

06d24e2

snippetting from archives

a85200e

driver naming: zip => archive

dcdab7c

unit test: index/archive_test.go

e8d4178

zslucero approved these changes Oct 7, 2024

muravjov wants to merge 6 commits into hound-search:main from muravjov:archive
