Name	Name	Last commit message	Last commit date
parent directory ..
.mvn/wrapper	.mvn/wrapper
core	core
dev	dev
.gitignore	.gitignore
.scalafmt.conf	.scalafmt.conf
CLAUDE.md	CLAUDE.md
README.md	README.md
jni_license_header.txt	jni_license_header.txt
license_header.txt	license_header.txt
mvnw	mvnw
pom.xml	pom.xml

Java bindings and SDK for Lance Data Format

⚠️ Under heavy development

Lance is a new columnar data format for data science and machine learning

Why you should use Lance

It is an order of magnitude faster than Parquet for point queries and nested data structures common to DS/ML
It comes with a fast vector index that delivers sub-millisecond nearest neighbor search performance
It is automatically versioned and supports lineage and time-travel for full reproducibility
It is integrated with duckdb/pandas/polars already. Easily convert from/to Parquet in 2 lines of code

Quick start

Introduce the Lance SDK Java Maven dependency(It is recommended to choose the latest version.):

<dependency>
    <groupId>com.lancedb</groupId>
    <artifactId>lance-core</artifactId>
    <version>0.18.0</version>
</dependency>

Basic I/O

create empty dataset

void createDataset() throws IOException, URISyntaxException {
    String datasetPath = tempDir.resolve("write_stream").toString();
    Schema schema =
            new Schema(
                    Arrays.asList(
                            Field.nullable("id", new ArrowType.Int(32, true)),
                            Field.nullable("name", new ArrowType.Utf8())),
                    null);
    try (BufferAllocator allocator = new RootAllocator();) {
        Dataset.create(allocator, datasetPath, schema, new WriteParams.Builder().build());
        try (Dataset dataset = Dataset.create(allocator, datasetPath, schema, new WriteParams.Builder().build());) {
            dataset.version();
            dataset.latestVersion();
        }
    }
}

create and write a Lance dataset

void createAndWriteDataset() throws IOException, URISyntaxException {
    Path path = "";     // the original source path
    String datasetPath = "";    // specify a path point to a dataset
    try (BufferAllocator allocator = new RootAllocator();
        ArrowFileReader reader =
            new ArrowFileReader(
                new SeekableReadChannel(
                    new ByteArrayReadableSeekableByteChannel(Files.readAllBytes(path))), allocator);
        ArrowArrayStream arrowStream = ArrowArrayStream.allocateNew(allocator)) {
        Data.exportArrayStream(allocator, reader, arrowStream);
        try (Dataset dataset =
                     Dataset.create(
                             allocator,
                             arrowStream,
                             datasetPath,
                             new WriteParams.Builder()
                                     .withMaxRowsPerFile(10)
                                     .withMaxRowsPerGroup(20)
                                     .withMode(WriteParams.WriteMode.CREATE)
                                     .withStorageOptions(new HashMap<>())
                                     .build())) {
            // access dataset
        }
    }
}

read dataset

void readDataset() {
    String datasetPath = ""; // specify a path point to a dataset
    try (BufferAllocator allocator = new RootAllocator()) {
        try (Dataset dataset = Dataset.open(datasetPath, allocator)) {
            dataset.countRows();
            dataset.getSchema();
            dataset.version();
            dataset.latestVersion();
            // access more information
        }
    }
}

drop dataset

void dropDataset() {
    String datasetPath = tempDir.resolve("drop_stream").toString();
    Dataset.drop(datasetPath, new HashMap<>());
}

Random Access

void randomAccess() {
    String datasetPath = ""; // specify a path point to a dataset
    try (BufferAllocator allocator = new RootAllocator()) {
        try (Dataset dataset = Dataset.open(datasetPath, allocator)) {
            List<Long> indices = Arrays.asList(1L, 4L);
            List<String> columns = Arrays.asList("id", "name");
            try (ArrowReader reader = dataset.take(indices, columns)) {
                while (reader.loadNextBatch()) {
                    VectorSchemaRoot result = reader.getVectorSchemaRoot();
                    result.getRowCount();

                    for (int i = 0; i < indices.size(); i++) {
                        result.getVector("id").getObject(i);
                        result.getVector("name").getObject(i);
                    }
                }
            }
        }
    }
}

Schema evolution

add columns

void addColumnsByExpressions() {
    String datasetPath = ""; // specify a path point to a dataset
    try (BufferAllocator allocator = new RootAllocator()) {
        try (Dataset dataset = Dataset.open(datasetPath, allocator)) {
            SqlExpressions sqlExpressions = new SqlExpressions.Builder().withExpression("double_id", "id * 2").build();
            dataset.addColumns(sqlExpressions, Optional.empty());
        }
    }
}

void addColumnsBySchema() {
  String datasetPath = ""; // specify a path point to a dataset
  try (BufferAllocator allocator = new RootAllocator()) {
    try (Dataset dataset = Dataset.open(datasetPath, allocator)) {
      SqlExpressions sqlExpressions = new SqlExpressions.Builder().withExpression("double_id", "id * 2").build();
      dataset.addColumns(new Schema(
          Arrays.asList(
              Field.nullable("id", new ArrowType.Int(32, true)),
              Field.nullable("name", new ArrowType.Utf8()),
              Field.nullable("age", new ArrowType.Int(32, true)))), Optional.empty());
    }
  }
}

alter columns

void alterColumns() {
    String datasetPath = ""; // specify a path point to a dataset
    try (BufferAllocator allocator = new RootAllocator()) {
        try (Dataset dataset = Dataset.open(datasetPath, allocator)) {
            ColumnAlteration nameColumnAlteration =
                    new ColumnAlteration.Builder("name")
                            .rename("new_name")
                            .nullable(true)
                            .castTo(new ArrowType.Utf8())
                            .build();

            dataset.alterColumns(Collections.singletonList(nameColumnAlteration));
        }
    }
}

drop columns

void dropColumns() {
    String datasetPath = ""; // specify a path point to a dataset
    try (BufferAllocator allocator = new RootAllocator()) {
        try (Dataset dataset = Dataset.open(datasetPath, allocator)) {
            dataset.dropColumns(Collections.singletonList("name"));
        }
    }
}

JVM Engine Connectors

JVM engine connectors can be built using the Lance Java SDK. Here are some connectors maintained in lancedb Github organization:

Contributing

From the codebase dimension, the lance project is a multiple-lang project. All Java-related code is located in the java directory. And the whole java dir is a standard maven project(named lance-parent) can be imported into any IDEs support java project.

Overall, it contains one Maven sub-module:

lance-core: the core module of Lance Java binding, including lance-jni.

To build the project, you can run the following command:

mvn clean package

if you only want to build rust code(lance-jni), you can run the following command:

cargo build

The java module uses spotless maven plugin to format the code and check the license header. And it is applied in the validate phase automatically.

Environment(IDE) setup

Firstly, clone the repository into your local machine:

git clone https://github.com/lancedb/lance.git

Then, import the java directory into your favorite IDEs, such as IntelliJ IDEA, Eclipse, etc.

Due to the java module depends on the features provided by rust module. So, you also need to make sure you have installed rust in your local.

To install rust, please refer to the official documentation.

And you also need to install the rust plugin for your IDE.

Then, you can build the whole java module:

mvn clean package

Running these commands, it builds the rust jni binding codes automatically.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

Java bindings and SDK for Lance Data Format

Quick start

Basic I/O

Random Access

Schema evolution

JVM Engine Connectors

Contributing

Environment(IDE) setup

FilesExpand file tree

java

Directory actions

More options

Directory actions

More options

Latest commit

History

java

Folders and files

parent directory

README.md

Java bindings and SDK for Lance Data Format

Quick start

Basic I/O

Random Access

Schema evolution

JVM Engine Connectors

Contributing

Environment(IDE) setup