# Compatibility between MyScaleDB 1.x and ClickHouse 23.3

# Overview

This document describes the compatibility between MyScaleDB and ClickHouse 23.3, and lists the features that are new or enhanced in MyScaleDB and not present in that version of ClickHouse.

# New Features

# Vector Index

# Description

MyScaleDB provides vector indexes for efficient processing and querying of vector data. This feature is analogous to the Distance table function in ClickHouse 23.3, but MyScaleDB's implementation offers better performance and accuracy. For more details, see the documentation: Basic Vector Search.
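
To make concrete what a vector search computes, the following is a minimal Python sketch of an exact (brute-force) nearest-neighbor scan. It is an illustration only and not MyScaleDB's implementation: the function names, the toy rows, and the choice of Euclidean distance are assumptions made for this example. A vector index exists to avoid this full scan on large tables.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_top_k(rows, query, k):
    """Return the k (row_id, distance) pairs closest to the query vector.

    Every row is scanned, so the result is exact but the cost grows
    linearly with the number of rows.
    """
    scored = [(row_id, l2_distance(vec, query)) for row_id, vec in rows]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Hypothetical toy data: (row_id, vector)
rows = [(1, [0.0, 0.0]), (2, [1.0, 1.0]), (3, [0.9, 1.2]), (4, [5.0, 5.0])]
nearest = brute_force_top_k(rows, [1.0, 1.0], k=2)
print(nearest)
```

Here the two closest rows are row 2 (an exact match) followed by row 3.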

# Inverted Index

# Description

MyScaleDB provides a more efficient and user-friendly inverted index for full-text search, with relevance scoring based on the BM25 algorithm. Building on the inverted and vector indexes, MyScaleDB also introduces fused search, which lets users combine vector search and full-text search in a single query. For more details, see the documentation: Full-Text Search.
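
As an illustration of how a vector ranking and a full-text ranking can be combined, the sketch below applies reciprocal rank fusion (RRF), a commonly used fusion technique. This page does not state which fusion method MyScaleDB applies, so treat the function name, the constant `k=60`, and the document ids as assumptions for this example rather than a description of MyScaleDB's behavior.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document receives 1 / (k + rank) from every list it appears in
    (rank starts at 1); documents are returned by descending total score,
    with ties broken by document id for a deterministic order.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists, best match first
vector_hits = ["doc3", "doc1", "doc7"]  # from a vector search
text_hits = ["doc1", "doc9", "doc3"]    # from a full-text search
fused = reciprocal_rank_fusion([vector_hits, text_hits])
print(fused)
```

Documents that rank well in both lists (`doc1`, `doc3`) rise above documents that appear in only one of them.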

# Features or Fixes Introduced from Other Versions

# Experimental Features Enabled by Default

# Table Engine Configuration

  • allow_experimental_database_replicated

# Session Configuration

  • allow_experimental_object_type

# Modified Default Configurations

For a detailed explanation of each configuration parameter, refer to the official ClickHouse documentation. Each entry below is shown as ClickHouse default -> MyScaleDB default.

# Server Configuration

  • max_connections: 1024 -> 4096
  • max_concurrent_queries: 0 (unlimited) -> 1000
  • disable_internal_dns_cache: 0 -> 1
  • max_table_size_to_drop: 50000000000 -> 1000000000000
  • uncompressed_cache_size: 0 (disabled) -> DYNAMIC_SETTING (dynamically adjusted by server memory)
  • mark_cache_size: 0 (disabled) -> DYNAMIC_SETTING (dynamically adjusted by server memory)

# Table Engine Configuration

  • index_granularity: 8192 -> 128
  • merge_max_block_size: 8192 -> 256
  • max_bytes_to_merge_at_max_space_in_pool: 161061273600 -> 5368709120
  • number_of_free_entries_in_pool_to_lower_max_size_of_merge: 8 -> 2
  • number_of_free_entries_in_pool_to_execute_mutation: 20 -> 2
  • old_parts_lifetime: 480 -> 5
  • simple_merge_selector_base: 5 -> 1.2
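
The change to `index_granularity` has a direct effect on how many granules (and therefore index marks) a data part contains. The following sketch works through the arithmetic for a hypothetical part of one million rows. It assumes fixed-size granules and ignores adaptive granularity, so the numbers are an approximation rather than what a real part will report.

```python
def granule_count(rows, index_granularity):
    """Number of granules needed to cover `rows` rows (ceiling division)."""
    if index_granularity <= 0:
        raise ValueError("index_granularity must be positive")
    return -(-rows // index_granularity)


rows_in_part = 1_000_000  # hypothetical part size

clickhouse_granules = granule_count(rows_in_part, 8192)  # ClickHouse default
myscale_granules = granule_count(rows_in_part, 128)      # MyScaleDB default

print(clickhouse_granules, myscale_granules)
```

A smaller granule lets a query read fewer unneeded rows around each match, at the cost of a larger primary index for the same amount of data.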

# Session Configuration

  • min_insert_block_size_bytes: 268402944 -> 33554432
  • max_query_size: 262144 -> 262144000
  • connect_timeout_with_failover_ms: 50 -> 5000
  • use_uncompressed_cache: 0 -> 1
  • distributed_directory_monitor_batch_inserts: 0 -> 1
  • distributed_product_mode: DENY (disable) -> GLOBAL
  • send_progress_in_http_headers: 0 -> 1
  • join_use_nulls: 0 -> 1
  • prefer_global_in_and_join: 0 -> 1
  • max_result_rows: 0 (unlimited) -> 10000
  • default_table_engine: None (disabled) -> ReplicatedMergeTree
  • mutations_sync: 0 -> 1
  • allow_experimental_database_replicated: 0 -> 1
  • database_replicated_allow_replicated_engine_arguments: 1 -> 0
  • async_insert: 0 -> 1
  • allow_experimental_object_type: 0 -> 1
  • background_pool_size: 16 -> 4
  • default_database_engine: Atomic -> Replicated
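
Queries written against stock ClickHouse may depend on the original defaults, for example `join_use_nulls = 0` or an unlimited `max_result_rows`. The sketch below renders `SET` statements that restore a few of the ClickHouse values from the list above for the current session. The helper name and the selection of settings are illustrative, and whether a given MyScaleDB deployment permits overriding each of these at session level is an assumption to verify.

```python
# ClickHouse 23.3 values, taken from the left-hand side of the list above.
# The subset chosen here is illustrative, not a recommendation.
CLICKHOUSE_DEFAULTS = {
    "join_use_nulls": 0,
    "max_result_rows": 0,
    "mutations_sync": 0,
    "async_insert": 0,
    "distributed_product_mode": "deny",
}


def render_set_statements(settings):
    """Render one `SET name = value;` statement per setting.

    String values are single-quoted with backslashes and quotes escaped;
    numeric values are emitted as-is.
    """
    statements = []
    for name, value in settings.items():
        if isinstance(value, str):
            escaped = value.replace("\\", "\\\\").replace("'", "\\'")
            literal = f"'{escaped}'"
        else:
            literal = str(value)
        statements.append(f"SET {name} = {literal};")
    return statements


statements = render_set_statements(CLICKHOUSE_DEFAULTS)
for statement in statements:
    print(statement)
```

The generated statements can be run at the start of a session before executing queries that were developed against ClickHouse defaults.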

# Added Configurations

# Server Configuration

  • primary_key_cache_size
    • Limit on the size of the primary key cache
  • vector_index_cache_size
    • Limit on the size of the vector index cache
  • vector_index_cache_size_ratio_of_memory
    • Memory limit for vector index cache (as a ratio of total memory)
  • vector_index_build_size_ratio_of_memory
    • Memory limit for vector index build (as a ratio of total memory)
  • enable_brute_force_vector_search
    • Enable brute force search for vector search

# Table Engine Configuration

  • enable_primary_key_cache
    • Enable primary key cache for vector search
  • enable_decouple_vector_index
    • Enable use of the old vector index during part merges and vector search
  • enable_rebuild_for_decouple
    • Enable building a new vector index on decoupled parts
  • min_rows_to_build_vector_index
    • Minimum number of rows to build vector index
  • min_bytes_to_build_vector_index
    • Minimum number of bytes to build vector index
  • float_vector_search_metric_type
    • Default metric type for float vector search
  • binary_vector_search_metric_type
    • Default metric type for binary vector search
  • max_rows_for_slow_mode_single_vector_index_build
    • Maximum number of rows for slow mode vector index build for data parts
  • default_mstg_disk_mode
    • Default disk mode used for MSTG vector indexes
  • vector_index_parameter_check
    • Enable parameter check for vector index
  • vidx_zk_update_period
    • Time interval for background update of vector index information on ZooKeeper
  • vector_index_cache_recheck_interval_seconds
    • Interval, in seconds, between background runs that delete legacy vector index cache entries
  • build_vector_index_on_random_single_replica
    • Randomly build vector index on different replicas

# Session Configuration

  • database_replicated_always_execute_with_on_cluster
    • Always create or drop replicated databases on all replicas of the cluster
  • database_replicated_default_cluster_name
    • Name of the cluster for creating or dropping replicated databases
  • database_replicated_allow_explicit_arguments
    • Allow explicit arguments for creating replicated databases
  • database_replicated_always_convert_table_to_replicated
    • Always convert tables in the database to replicated tables using Replicated engine
  • database_replicated_default_zk_path_prefix
    • Prefix to be used when filling in the zk_path of the replicated database engine when creating a database. If empty, zk_path will not be automatically set.
  • optimize_move_to_prewhere_for_vector_search
    • Enable or disable special PREWHERE optimization for vector search in SELECT queries, moving all feasible WHERE conditions to PREWHERE.
  • two_stage_search_option
    • Enable two-stage search for vector search
  • enable_brute_force_vector_search
    • Enable brute force search for vector search
  • max_build_index_train_block_size
    • Maximum block size (in bytes) for building index training
  • max_build_binary_vector_index_train_block_size
    • Maximum block size (in bytes) for building index training for binary vectors
  • max_build_index_add_block_size
    • Maximum block size (in bytes) for adding vectors in one round of index build

# Other Configurations

  • vector_index_event_log
    • Configuration for vector index event table
  • vector_index_cache_path
    • Directory for vector index cache
  • tantivy_index_cache_path
    • Directory for full-text search index cache

Last Updated: Fri Nov 01 2024 09:02:06 GMT+0000