# Compatibility between MyScaleDB 1.x and ClickHouse 23.3

## Overview

This document describes the compatibility between MyScaleDB 1.x and ClickHouse 23.3. It lists the features that MyScaleDB adds or enhances and that ClickHouse 23.3 does not provide, the features and fixes introduced from other versions, and the configuration defaults that MyScaleDB modifies or adds.

## New Features

### Vector Index

#### Description

MyScaleDB provides vector indexes for efficient processing and querying of vector data. This feature is analogous to the Distance table function in ClickHouse 23.3, but MyScaleDB's vector index implementation offers better performance and accuracy. For details, see the documentation: Basic Vector Search.

### Inverted Index

#### Description

MyScaleDB provides a more efficient and user-friendly inverted index, based on the BM25 algorithm, for full-text search. Building on the inverted and vector indexes, MyScaleDB also introduces fused search, which lets users combine vector search and full-text search. For details, see the documentation: Full-Text Search.

## Features or Fixes Introduced from Other Versions

- Support reading empty files from S3: PR52733, PR52763, PR49519
- Fix the issue of writing append files to incremental backup: PR49725
- Report the correct status (FAILED) of Executable Dict loading failure: PR48775
- Add different behavior when stderr stream of external command has data: PR43210
- Properly destroy tasks in ShellCommandSource: PR53573
- Fix data race in shell command: PR53631
- Properly clean up in case of exception in ShellCommandSource constructor: PR55103
- Use default {replica} and {shard} parameters in ReplicatedMergeTree: PR48961
- Optimize execution of ALTER statements on a single shard of a Replicated database: PR51049
- Consider deleted rows when selecting parts to merge: PR58223

## Experimental Features Enabled by Default

### Table Engine Configuration

- allow_experimental_database_replicated

### Session Configuration

- allow_experimental_object_type

## Modified Default Configurations

Each entry below is written as `setting: ClickHouse default -> MyScaleDB default`. For a detailed explanation of the configuration parameters, see the official ClickHouse documentation.

### Server Configuration

- max_connections: 1024 -> 4096
- max_concurrent_queries: 0 (unlimited) -> 1000
- disable_internal_dns_cache: 0 -> 1
- max_table_size_to_drop: 50000000000 -> 1000000000000
- uncompressed_cache_size: 0 (disabled) -> DYNAMIC_SETTING (dynamically adjusted by server memory)
- mark_cache_size: 0 (disabled) -> DYNAMIC_SETTING (dynamically adjusted by server memory)

### Table Engine Configuration

- index_granularity: 8192 -> 128
- merge_max_block_size: 8192 -> 256
- max_bytes_to_merge_at_max_space_in_pool: 161061273600 -> 5368709120
- number_of_free_entries_in_pool_to_lower_max_size_of_merge: 8 -> 2
- number_of_free_entries_in_pool_to_execute_mutation: 20 -> 2
- old_parts_lifetime: 480 -> 5
- simple_merge_selector_base: 5 -> 1.2
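
To make the size of two of these changes concrete, the sketch below converts the byte-valued merge limit to GiB and counts the granules a data part would hold at each index_granularity. It assumes the usual ClickHouse meaning of index_granularity (rows per granule). The 1,000,000-row part and the helper names are illustrative assumptions, not values or APIs from MyScaleDB.

```python
import math

GIB = 1024 ** 3

# Defaults quoted in the list above: (ClickHouse 23.3, MyScaleDB).
INDEX_GRANULARITY = (8192, 128)
MAX_BYTES_TO_MERGE = (161061273600, 5368709120)


def granules_per_part(rows: int, index_granularity: int) -> int:
    """Granules needed to cover `rows` rows; the last granule may be partial."""
    return math.ceil(rows / index_granularity)


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB (2**30 bytes)."""
    return num_bytes / GIB


if __name__ == "__main__":
    rows = 1_000_000  # hypothetical part size, chosen only for illustration
    labels = ("ClickHouse 23.3", "MyScaleDB")
    for label, gran, limit in zip(labels, INDEX_GRANULARITY, MAX_BYTES_TO_MERGE):
        print(f"{label}: {granules_per_part(rows, gran)} granules, "
              f"merge limit {to_gib(limit):g} GiB")
```

Under these assumptions the merge limit drops from 150 GiB to 5 GiB, and the same part is split into 64 times finer granules (8192 / 128 = 64).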

### Session Configuration

- min_insert_block_size_bytes: 268402944 -> 33554432
- max_query_size: 262144 -> 262144000
- connect_timeout_with_failover_ms: 50 -> 5000
- use_uncompressed_cache: 0 -> 1
- distributed_directory_monitor_batch_inserts: 0 -> 1
- distributed_product_mode: DENY (disable) -> GLOBAL
- send_progress_in_http_headers: 0 -> 1
- join_use_nulls: 0 -> 1
- prefer_global_in_and_join: 0 -> 1
- max_result_rows: 0 (unlimited) -> 10000
- default_table_engine: None (disabled) -> ReplicatedMergeTree
- mutations_sync: 0 -> 1
- allow_experimental_database_replicated: 0 -> 1
- database_replicated_allow_replicated_engine_arguments: 1 -> 0
- async_insert: 0 -> 1
- allow_experimental_object_type: 0 -> 1
- background_pool_size: 16 -> 4
- default_database_engine: Atomic -> Replicated
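
One way to check which of these session defaults a given server actually uses is to compare the rows of ClickHouse's `system.settings` table with the values listed above. The sketch below only builds the query and diffs the result; executing the query is left to whichever client you use. The helper names and the chosen subset of settings are illustrative assumptions, not part of MyScaleDB.

```python
# Subset of the session defaults listed above. Values are strings because
# system.settings exposes its `value` column as a string.
EXPECTED = {
    "join_use_nulls": "1",
    "mutations_sync": "1",
    "async_insert": "1",
    "max_result_rows": "10000",
}


def build_query(names):
    """Build a SELECT over system.settings for the given setting names.

    The names are trusted constants here, so plain quoting is sufficient.
    """
    quoted = ", ".join("'" + name + "'" for name in sorted(names))
    return f"SELECT name, value FROM system.settings WHERE name IN ({quoted})"


def diff_settings(rows, expected=EXPECTED):
    """Return {name: (expected, actual)} for every setting that differs.

    `rows` is an iterable of (name, value) pairs as returned by the query;
    `actual` is None when the server did not report the setting at all.
    """
    actual = dict(rows)
    return {
        name: (want, actual.get(name))
        for name, want in expected.items()
        if actual.get(name) != want
    }


if __name__ == "__main__":
    print(build_query(EXPECTED))
    # Made-up rows, standing in for a real query result.
    sample_rows = [("join_use_nulls", "1"), ("mutations_sync", "0"), ("async_insert", "1")]
    print(diff_settings(sample_rows))
```

An empty result from `diff_settings` means every checked setting matches the value listed in this document.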

## Added Configurations

### Server Configuration

- primary_key_cache_size
  - Size limit of the primary key cache
- vector_index_cache_size
  - Size limit of the vector index cache
- vector_index_cache_size_ratio_of_memory
  - Memory limit for the vector index cache (as a ratio of total memory)
- vector_index_build_size_ratio_of_memory
  - Memory limit for vector index builds (as a ratio of total memory)
- enable_brute_force_vector_search
  - Enable brute-force search for vector search

### Table Engine Configuration

- enable_primary_key_cache
  - Enable the primary key cache for vector search
- enable_decouple_vector_index
  - Enable use of the old vector index during part merges and vector search
- enable_rebuild_for_decouple
  - Enable rebuilding a new vector index on decoupled parts
- min_rows_to_build_vector_index
  - Minimum number of rows required to build a vector index
- min_bytes_to_build_vector_index
  - Minimum number of bytes required to build a vector index
- float_vector_search_metric_type
  - Default metric type for float vector search
- binary_vector_search_metric_type
  - Default metric type for binary vector search
- max_rows_for_slow_mode_single_vector_index_build
  - Maximum number of rows in a data part for a slow-mode vector index build
- default_mstg_disk_mode
  - Default disk mode used
- vector_index_parameter_check
  - Enable parameter checks for vector indexes
- vidx_zk_update_period
  - Time interval for the background update of vector index information in ZooKeeper
- vector_index_cache_recheck_interval_seconds
  - Time interval, in seconds, for the background operation that deletes the legacy vector index cache
- build_vector_index_on_random_single_replica
  - Randomly build vector index on different replicas

### Session Configuration

- database_replicated_always_execute_with_on_cluster
  - Always create or drop replicated databases on all replicas of the cluster
- database_replicated_default_cluster_name
  - Name of the cluster used when creating or dropping replicated databases
- database_replicated_allow_explicit_arguments
  - Allow explicit arguments when creating replicated databases
- database_replicated_always_convert_table_to_replicated
  - Always convert tables in the database to replicated tables using the Replicated engine
- database_replicated_default_zk_path_prefix
  - Prefix used to fill in the zk_path of the Replicated database engine when creating a database. If empty, zk_path is not set automatically.
- optimize_move_to_prewhere_for_vector_search
  - Enable or disable the special PREWHERE optimization for vector search in SELECT queries, which moves all feasible WHERE conditions to PREWHERE
- two_stage_search_option
  - Enable two-stage search for vector search
- enable_brute_force_vector_search
  - Enable brute-force search for vector search
- max_build_index_train_block_size
  - Maximum block size (in bytes) for index training during an index build
- max_build_binary_vector_index_train_block_size
  - Maximum block size (in bytes) for index training during a binary vector index build
- max_build_index_add_block_size
  - Maximum block size (in bytes) for adding vectors in one round of an index build

### Other Configurations

- vector_index_event_log
  - Configuration for the vector index event table
- vector_index_cache_path
  - Directory for the vector index cache
- tantivy_index_cache_path
  - Directory for the full-text search index cache