Skip to content

Main v2#30

Open
tgodzik wants to merge 306 commits intomainfrom
main-v2
Open

Main v2#30
tgodzik wants to merge 306 commits intomainfrom
main-v2

Conversation

@tgodzik
Copy link
Copy Markdown
Owner

@tgodzik tgodzik commented Jan 21, 2026

No description provided.

olaf-geirsson_data and others added 30 commits September 10, 2025 11:39
I use Mise to manage my Java installations and this file makes Mise use
the right JVM in the Metals repo.

Co-authored-by: Olafur Geirsson <olafurpg@gmail.com>
The publish job didn't find any suitable workers because of forgotten
labels field.
…calameta#70)

Previously, workspace/symbol only searched for symbols of synced targets
via BSP. This meant that the feature didn't work at all until you
established a BSP connection, and until Metals had finished indexing the
synced targets.

Now, Metals implements workspace/symbols with a new index that is built
directly from reading Scala/Java files that are checked into Git. It
uses a new LMDB persistent key-value store to make this indexing step
incremental so that it's reasonably fast even in a large monorepo.

This change makes Metals workspace/symbol behave more similarly to other
LSP servers
like tsserver that only search for symbols that defined in the
*workspace* (not from external deps like node_modules/). Users who want
the old behavior can still access it through the custom "Metals: Search
symbol in workspace" command, which we should probably rename to
"Metals: Search Symbol From BSP" or something like that.

## Test Plan
There are e2e integration tests that run the indexer against an actual
git repo to assert the indexing is truly incremental, it invalidates
cache entries from deleted files, etc.

The manual testing plan for me is to add the following
`.vscode/settings.json`
```jsonc
  "metals.serverVersion": "1.5.1-SNAPSHOT",
  "metals.serverProperties": [
    // The final index is "only" ~300mb in memory, but the indexing step can use up more memory temporarily.
    // Metals must have more than 2gb ram even if only temporarily, we're dealing with a large build.
    "-Xmx6G",
    "-XX:+UseZGC",
    "-XX:ZUncommitDelay=30",
    "-XX:ZCollectionInterval=5",
    "-XX:+IgnoreUnrecognizedVMOptions",
    "-Dmetals.statistics=workspace-symbol",
    "-Djol.magicFieldOffset=true",
    "-Djol.tryWithSudo=true",
    "-Djdk.attach.allowAttachSelf",
    "-Dmetals.workspace-symbol-provider=mbt",
    "--add-opens=java.base/java.nio=ALL-UNNAMED",
    "--add-opens=java.base/sun.nio.ch=ALL-UNNAMED",
  ],
```

The `--add-opens` settings are necessary for LMDB to work, and we have
to update metals-vscode to always include them. The jol settings are
needed if you enable `-Dmetals.statistics=memory`, which I used to
confirm that the index consumes ~270mb of memory to hold the git OIDs
and the bloom filters.

After publishing locally, open Metals in the universe repo in Cursor,
open the integrated terminal and run
```
$ tail -f .metals/metals.log | grep workspace/symbol
```
The command above shows logs related to indexing
```
$ tail -f .metals/metals.log | grep workspace/symbol
2025.09.11 13:27:06 INFO  workspace/symbol indexing started
2025.09.11 13:27:08 INFO  time: workspace/symbol discovered 85,391 files in 2.08s
2025.09.11 13:27:10 INFO  time: workspace/symbol 0 cold files, 85391 hot files, discovered in 1.17s
```
Use "Go to Symbol in Workspace" after it has completed indexing and
confirm that it searches the full repo, Java (classes) and Scala
(classes, vals, defs, vars).

## Release Plan

This feature is disabled by default. To start with, users have to add
the server property `-Dmetals.workspace-symbol-provider=mbt` along with
the `--add-opens` and `-Xmx` settings above, and reload Cursor. This
allows us to slowly roll out the functionality, first test it with the
team before enabling by default for the entire organization.

We need to update metals-vscode to include the necessary VM options by
default before we do a bigger rollout.

## Known Limitations - Followups

- We only index the workspace once when the server starts. We should add
a custom `metals/git/didChangeCurrentCommit` notification that triggers
a re-index when the user commits or rebases, etc.
- Incorporate dirty files into the index by listening to
`textDocument/didChange`
- Improve ranking. For example, we could use the imports in the current
file to boost symbols with matching package names.

---------

Co-authored-by: Olafur Geirsson <olafurpg@gmail.com>
I am cleaning up my workspace/symbol PR for review and spotted several
unrelated warnings that make it hard to know which warnings come from my
PR or existed before.

---------

Co-authored-by: Olafur Geirsson <olafurpg@gmail.com>
I'm not sure why the CI was green in the PR
https://github.com/REDACTED_ORG/metals/pull/70 when the YAML config
had problems. Either way, this PR should get it green again.

Co-authored-by: Olafur Geirsson <olafurpg@gmail.com>
I just setup another clone of Metals and re-did these steps so it felt
like a good time to document it.

---------

Co-authored-by: Olafur Geirsson <olafurpg@gmail.com>
The CI is releasing versions like 0.0.0-blah and I suspect it's because
we're not fetching the full git repo, and sbt-dynver needs more git tags
to compute the correct version.

Co-authored-by: Olafur Geirsson <olafurpg@gmail.com>
Previously, there was no easy way to gate new functionality behind
feature flags. It was also not easy to track metrics like the time spent
on indexing.

Now, there are new `scala.meta.infra.{MonitoringClient,
FeatureFlagProvider}` Java interfaces that anyone can implement and
dynamically register via JDK service loaders.

This PR inlines the internal implementations of the Databricks clients
for feature-flags and monitoring so that they are always enabled.
However, none of the code in Metals directly depends on the Databricks
classes, all requests go via the agnostic Java interfaces. This means we
can later move the `com.databricks.*` code out of the repo and
dynamically register it somehow via the metals-vscode extension.

We can upstream the `scala.meta.infra` code and document on
scalameta.org/metals for how companies can dynamically register their
own clients for feature flags and metrics monitoring.

To demonstrate how the clients work in action, this PR adds one metric
and one feature flag:

- Metric: the time it took to index the workspace
- Feature flag: whether to follow symlinks

The combination of these allows us to measure the performance impact of
enabling/disabling the performance optimization from
https://github.com/REDACTED_ORG/metals/pull/61

---------

Co-authored-by: Olafur Geirsson <olafurpg@gmail.com>
Claude Code helped me figure out why the pre-push hook isn't running on
arca for me.
The call to mkString is inherited from generic collections and performs
an expensive iteration over each character, instead of using
`StringBuilder` directly. Additionally, the keyword logic can be spedup
by a hash map lookup instead of a pattern match. Also removed unused
positions and made Enum and Record "real" tokens.

Naively interpreting the existing benchmark (4.9 mLOC of Java), this
gives a speedup of over 30%.

```
Before:

[info] Benchmark                        Mode  Cnt  Score   Error  Units
[info] MetalsBench.toplevelJavaMtags      ss    2  3.248           s/op
[info] MetalsBench.toplevelsScalaIndex    ss    2  0.448           s/op


After:

[info] Benchmark                        Mode  Cnt  Score   Error  Units
[info] MetalsBench.toplevelJavaMtags      ss    2  2.233           s/op
[info] MetalsBench.toplevelsScalaIndex    ss    2  0.304           s/op
```
Don't box a boolean for every character in the scanner and remove the
tuple creation. This removes about 10% of runtime:

```
Before:

[info] Benchmark                        Mode  Cnt  Score   Error  Units
[info] MetalsBench.toplevelJavaMtags      ss    2  2.233           s/op

After:

[info] Benchmark                              Mode  Cnt  Score   Error  Units
[info] MetalsBench.toplevelJavaMtags            ss    2  2.074           s/op
This prints one line per indexed jars, but sometimes we index thousands of jars, so this generate a lot a logs that are not really useful to users. Downgrade to debug level
Previously, Metals only indexed Scala and Java files for fuzzy symbol
search (aka. workspace/symbol). Now, we additionally index `*.proto`
files, which is especially helpful when you want to navigate from Scala
code using generated Protobuf classes but you actually want to navigate
to the Protobuf code instead. Instead of using Goto-definition, you can
pu your cursor over the symbol and "Go to symbol in workspace" and it
will pre-populate you query with the word at your cursor.

*DISCLAIMER*: almost all the protobuf tokenizing and outlining code is
AI generated. I only wrote the first basic tests and told it to mirror
the Scala/Java indexers.

---------

Co-authored-by: Olafur Geirsson <olafurpg@gmail.com>
It took me a while to figure out the steps to fetch the releases that we
publish on every commit. I am planning to do e2e tests tomorrow with
these releases instead of locally build jars.

---------

Co-authored-by: Olafur Geirsson <olafurpg@gmail.com>
Previously, we only reported a single metric "index workspace". Now, we
report a lot more metrics especially related to indexing performance.
We're also tracking workspace/symbol usage since I want/hope to see an
uptick in usage.

Also:

- More descriptive labels for the long-running "Expanding working set"
task
- A proper integration tests for telemetry (feature flags and metrics)

---------

Co-authored-by: Olafur Geirsson <olafurpg@gmail.com>
…calameta#85)

Previously, when the Metals was started with the environment variable
`COURSIER_REPOSITORIES`, Metals would correctly parse the repos and
associate credentials from `~/.config/coursier/credentials.properties`
*but* resolution still failed because we used `.addRepositories()`
instead of `.withRepositories()`. The root problem is that Coursier adds
the same repos but with no credentials and they appeared at the front of
the list. Now, we flip the order so that our custom repos with valid
credentials appear at the front of the list allowing the resolution to
work as expected.

Here are the steps I used to reproduce the issue,
```diff
--- a/build.sbt
+++ b/build.sbt
@@ -538,7 +538,7 @@ lazy val metals = project
     buildInfoKeys := Seq[BuildInfoKey](
       "localSnapshotVersion" -> localSnapshotVersion,
-      "metalsVersion" -> version.value,
+      "metalsVersion" -> "1.5.1-DATABRICKS-8-8-d26262d7", //  version.value,
       "mdocVersion" -> V.mdoc,
       "bspVersion" -> V.bsp,
```

This diff allows you to run a local build of Metals but it will try to
resolve mtags from the remote. I had these settings.json
```jsonc
  "metals.serverVersion": "1.5.1-SNAPSHOT",
  "metals.customRepositories": [
    "central",
    "ivy2local",
    "https://maven.pkg.github.com/REDACTED_ORG/metals"
  ],
```

Co-authored-by: Olafur Geirsson <olafurpg@gmail.com>
Previously, we submitted metrics, got a 200 OK response, but the server
was erroring because the data we posted was invalid (missing `buckets`
and `description` fields). Now, Metals submits an `Event` instead, which
can be queried from go/cl like this
```sql
select
  avg(tags['durationMillis']) as avg_duration,
  percentile(tags['durationMillis'], 0.9) as p90_duration,
  percentile(tags['durationMillis'], 0.5) as p50_duration
from
  main.eng_devtools.raw_metrics
where
  date >= '2025-09-16'
  and service = 'metals'
```

---------

Co-authored-by: Olafur Geirsson <olafurpg@gmail.com>
These together bring down the indexing/importing time close to 15s for a
large project. Further optimization is possible but we reached the point
of diminishing returns. Each of the items below amount to a couple of
seconds for the repro in the ticket, but I don't have individual
benchmarks.

- avoid expensive URI parsing for file URIs
- avoid expensive Json parsing by caching type adapters for maven
dependency info
- avoid expensive TrieMap updates when the value didn't change while
building inverted indexes for source jars

Example log from importing the repro project:
```
2025.09.19 11:02:12 INFO  time: indexed library sources for main in 7.52s
..
2025.09.19 11:02:13 INFO  time: indexed workspace in 15s
```

---------

Co-authored-by: Olaf Geirsson <olaf.geirsson@databricks.com>
The feature flag provider flipped the boolean logic, we test if
`FOLLOW_SYMLINKS` was enabled while the actual flag name was
`databricks.internaltools.metals-no-follow-symlinks`. Now, we can
declare a flag as "negated" to flip the final boolean result.

Co-authored-by: Olafur Geirsson <olafurpg@gmail.com>
Previously, local 1.5.1-SNAPSHOT builds failed on the latest canary
image in arca because we set the repo `ivy2Local` instead of `ivy2local`
(all lowercase). I'm not sure where exactly Coursier compares it in a
case-insensitive way because I copied the original code from how it
parses repos, but I suspect it's somewhere in the bootstrap launcher.
Either way, we now handle this special case.

Co-authored-by: Olafur Geirsson <olafurpg@gmail.com>
Previously, the mbt-based workspace symbol solution was gated behind a
system property `-Dmetals.workspace-symbol-provider=mbt`. Now, it's a
user-setting that can be edited in `settings.json`, which Metals
dynamically reacts to.
```jsonc
  "metals.workspaceSymbolProvider": "mbt", // or "bsp"
```
Also, if the config is not set, then Metals now defaults to mbt if the
`MBT_WORKSPACE_SYMBOL_PROVIDER` feature flag is enabled.

Also:
- If the process does not have the necessary VM options to use LMDB,
Metals now logs a helpful warning message with instructions on how to
fix the problem, and it defaults to bsp.
- The old system property still works thanks to how Metals parses
config.

## Testing plan

- With no custom settings or VM options, it defaults to bsp and logs a
warning
- With correct VM options and you're in the feature flag, it defaults to
mbt
- You can always explicitly override the setting
- Metals dynamically reacts to updates to the setting (it will even
re-index with mbt when it changes)

---------

Co-authored-by: Olafur Geirsson <olafurpg@gmail.com>
Makes it easier to add a distinguisher without having to know which oss version this release is based on.
…ig (scalameta#104)

When the user config specifies the preferred build server, use it when multiple build servers are detected but one of them is the preferred one.
olafurpg and others added 29 commits January 7, 2026 19:48
…calameta#342)

Previously, I would regularly see spurious errors in code using type
variables when using turbine-classpath mode. This was frequent enough to
be a blocker for the initial rollout. This PR should fix the issue. I
spent a long time trying to get a minimized reproduction going but
couldn't. However, the ManualSuite shows a production reproduction of
the issue, and I manually confirmed it failed before and passed after.
I also manually confirmed that TurbineManualSuite still passes (runs in
5 XXL repos).
Previously, the only way to add navigation for external dependencies was
to connect with a BSP server. This introduced a high barrier for people
to integrate with Metals. For example, just to support basic jump-to-def
for JBang scripts, JBang would have to implement the full BSP spec via
JSON-RPC. Now, with this change, JBang only needs to create a basic
`.metals/mbt.json` file and all the code nav will more or less work out
of the box.

The format of the `mbt.json` will eventually have to evolve to support
more advanced features like generated code on the sourcepath. For now,
it only supports declaring external dependencies like this:
```
{
  "dependencyModules": [
    {
      "id": "com.google.guava:guava:33.5.0-jre",
      "jar": "/path/to/guava.jar",
      "sources": "/path/to/guava-sources.jar"
    }
  ]
}
```
Previously, to test Metals v2, you had to add about 20 lines of JSON
config to use the new features in v2. This commit flips the defaults for
most of them so that you only need to specify the desired Metals
version and everything else works out of the box.
Now that we flipped the defaults, some tests started failing if they
relied on the old defaults. Nothing serious there, we can later fix the
tests.
Fixes scalameta#8099

Previouly, ImplementationProvider assumed that `TextDocument.uri` was a
relative path (`a/b/c/D.scala`) because that's what the scala-2
SemanticDB provider does. The new Java SemanticDB provider in v2 uses
absolute URIs. This commit is a workaround the root cause of the
problem, and the long-term fix is to emit consistent absolute (or
relative) URIs across Java and Scala. However, we don't use
ImplementationProvider anymore in v2 so we should also consider removing
it eventually.
Previously, Metals 2.0.0-M2 supported reading build metadata from
`.metals/mbt.json` but there was no straightforward way to generate this
file for Gradle/Maven codebases. This commit adds a script to the repo
that seems to work OK. This is not a super robust long-term solution but
it unblocks further progress for now. Ideally, I expect users to
manually download this script from the repo and run it themselves, and
use AI to extend it to fix bugs etc.
This replaces the current Find-References implementation, that works
only for synced targets, with a workspace-wide find-refs. It integrates
using existing "mtags" indexers and is less efficient than it could be,
but it fills a huge gap until we can rework it on firmer bases.

This does not yet implement "Find implementations" but this should
straight forward in a next PR.

It has a few known limitations:
- it does not find references to renamed imports
- it does not find references to constructor calls from secondary
constructors

It is currently enabled by the existing feature flag
`metals-mbt-reference-provider` which is already at 100% -- I think
that's ok, given that this feature was always documented as "missing" or
not working, and there's very little chance it impacts anything else. It
gives us a way to ramp down or pull the plug, though we will immediately
enable it for every one.
…lameta#347)

Previously, when doing workspace symbol search, it would sometimes crash
with
"Comparison method violates its general contract" when the query
returned a
mix of Protobuf and non-Protobuf files, and users would not see results.
The
crash happened because we had a sort() method call with a non-transitive
comparator: if a.proto compared equal to b.scala (returning 0) and
b.scala
compared equal to c.proto (returning 0), Java's TimSort expected a.proto
vs
c.proto to also return 0, but it returned the URI comparison instead.

Now, it no longer crashes because the comparator is transitive. Proto
files are
consistently grouped first and sorted among themselves by URI.

<!-- jj-stack-begin -->
- → **scalameta#347** ←
[[Commit](https://github.com/databricks-eng/metals/pull/347/commits/d1f11888847e5882e991d658d1d997683328a55d)]
<!-- jj-stack-end -->
Previously, you couldn't use workspace/symbol to find rpc methods on
proto2 gRPC services. This PR fixes the issue.

## Test Plan

See updated tests. I also manually confirmed this fixes the issue
reported by the user, and the original bug report in the JIRA ticket.
This release includes upstream bugfixes for OrganizeImports so that it
no longer removes entire lines of grouped imports when a single import
is unused. This should hopefully unblock us from being to claim that
OrganizeImports works reliably.

## Test Plan

See newly added test case that asserts the range of the unused import
warning, and that OrganizeImports actually removes an unused import.

<!-- jj-stack-begin -->
- → **scalameta#343** ←
[[Commit](https://github.com/databricks-eng/metals/pull/343/commits/632dbcf07c95d3ac27e373099a1c34a085b0524f)]
<!-- jj-stack-end -->
Implement navigation from super class to all subclasses, and from
methods to all overridden methods.

Implementations are found even when a def is implemented via a val, and
conversely, a val via a getter/setter pair. However, the setter is not
currently found, only the getter.

Closes the last big gap compared to IntelliJ
Previously, we would get multiple connections with the same ID, since
Bloop starts multiple connections for meta builds.

Now, I reverted the additional mapping since it doesn't seem to be used.
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Jan 21, 2026

Important

Review skipped

Too many files!

150 files out of 300 files are above the max files limit of 150.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands and usage tips.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants