Some questions for clarification #401

@frotsch

After reviewing the code and experimenting a little with nrtsearch, I have a few findings and questions I would like to clarify:

  1. Is it correct that there is no communication between replica nodes (like forwarding search requests from clients)?

  2. Is it correct that the client needs to know which index is on which replica and then connect to that replica to execute a search request?

  3. There seems to be no built-in load-balancing mechanism that routes client search requests to the replica with the lowest utilization (or in round-robin fashion), true? Is load balancing something that should be done by the surrounding infrastructure, like Kubernetes? Are there any plans to leverage the gRPC-LB protocol (https://grpc.io/blog/grpc-load-balancing/)?

  4. It looks like HA (high availability) is not a primary goal (which is fine), because there seems to be no automatic failover mechanism, especially for primary nodes. Is this something that should be handled by the surrounding infrastructure, like Kubernetes, or by the client (queuing up index requests until the primary is back)?

  5. It seems possible to issue search requests against primary nodes. Is this simply not recommended, or are there plans to prevent it technically?

  6. From what I saw in the code, there is always only one shard per index (shard0). Is it possible (or planned) to have more than one shard per index? If not, all limits of Lucene indices also apply to "nrtsearch indices", and an index cannot exceed the resources of a single physical machine (disk, max open files, and memory).
    To let an "nrtsearch index" use resources from multiple physical machines, it would be necessary to split the index into multiple shards (= multiple Lucene indices) and distribute them across different machines. From my current understanding, that is not something nrtsearch was developed for, right? But to serve as a replacement candidate for Elasticsearch or Solr, that would be necessary functionality. I would just like to better understand what goals nrtsearch aims to achieve and which use cases it is built to support.

  7. Can you briefly elaborate on what "virtual sharding" is and what problem it solves? Is it related to 6)?

  8. I understand the purpose of primary and replica nodes and how they interact. But there is also a "STANDALONE" mode, and I am not sure why. Is it for testing purposes? To me it looks like a single primary node would also be sufficient, given that a client can issue search requests against a primary node as well, see 5).
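
To make the round-robin option from 3) concrete, here is a minimal client-side sketch. It is only an illustration of the selection logic: the class name `RoundRobinPicker` and the replica addresses are hypothetical, nothing here is part of nrtsearch, and no gRPC calls are made.

```python
import itertools
from typing import Iterable, Iterator


class RoundRobinPicker:
    """Hypothetical client-side helper that cycles through replica addresses."""

    def __init__(self, replicas: Iterable[str]) -> None:
        self._replicas = list(replicas)
        if not self._replicas:
            raise ValueError("at least one replica address is required")
        # itertools.cycle repeats the list endlessly in its original order.
        self._cycle: Iterator[str] = itertools.cycle(self._replicas)

    def next_replica(self) -> str:
        """Return the address the next search request would be sent to."""
        return next(self._cycle)


# Hypothetical addresses; a real client would open a gRPC channel to each one.
picker = RoundRobinPicker(["replica-0:8000", "replica-1:8000", "replica-2:8000"])
targets = [picker.next_replica() for _ in range(4)]
```

After three requests the picker wraps around to the first replica again. A picker like this ignores replica utilization and health, which is why I am asking whether this belongs in the client, in nrtsearch, or in the infrastructure.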
