# Quickstart

This guide will show you how to launch a cluster, import data, and execute SQL queries in a few simple steps. More information on other developer tools, such as the Python Client, can be found in the Developer Tools section.

# How to Launch Your First Cluster

Launch a first cluster

Before performing any data operations, you need to launch a cluster. The Clusters page lets you create a cluster and configure computing and storage resources to meet your requirements.

Follow these steps to launch a new cluster:

  • Go to the Clusters page and click the +New Cluster button to launch a new cluster.
  • Name your cluster.
  • Click Launch to run the cluster.

TIP

The development tier of MyScale is limited to the default configuration and does not support multiple replicas. Refer to the Changing Your Billing Plan section and upgrade to the standard plan for a more powerful configuration.

# Environment Setup

You have the option of connecting to a MyScale database using one of the following developer tools:

However, let's use Python to get started.

# Using Python

Before getting started with Python, you need to install the ClickHouse client (opens new window), as described in the following shell script:

pip install clickhouse-connect

Once the ClickHouse client has been installed successfully, the next step is to connect to your MyScale cluster from within a Python app by providing the following details:

  • Cluster host
  • Username
  • Password

To find these details, navigate to the MyScale Clusters page, click the Action dropdown link, and select Connection Details.

Actions

As seen in the following image, a Connection Details dialog will display with the necessary code required to access MyScale. Click on the copy icon to copy the corresponding code and paste it into your Python app.

Python

TIP

See Connection Details for more information on connecting to your MyScale cluster.

# Using the MyScale Console

To use the MyScale console to import data into your database and to run queries, navigate to the SQL Workspace page. Your cluster will be selected automatically, as shown below:

Enter the SQL Workspace page

# How to Import Data into the Database

Follow these steps to import data into MyScale:

# Create a Table

Creating a database table in MyScale before importing any data is mandatory.

TIP

To learn more, navigate to the document describing how to Create a Database and how to Create a Table in the SQL Reference section.

Using the following code samples, let's write a SQL statement (both in Python and SQL) to create a new table named default.myscale_categorical_search.

  • Python
  • SQL
# Create a table with 128 dimensional vectors.
client.command("""
CREATE TABLE default.myscale_categorical_search
(
    id    UInt32,
    data  Array(Float32),
    CONSTRAINT check_length CHECK length(data) = 128,
    date  Date,
    label Enum8('person' = 1, 'building' = 2, 'animal' = 3)
)
ORDER BY id""")

# Insert Data Into the Table

TIP

MyScale supports data imports from Amazon s3 and other cloud services using s3-compatible APIs. See the s3 Table Function for more information on importing data from Amazon s3.

As described in the following code snippets, let's use SQL to import data into the default.myscale_categorical_search table.

Supported file formats include:

TIP

See Formats for Input and Output Data (opens new window) for a detailed description of all supported formats.

  • Python
  • SQL
client.command("""
INSERT INTO default.myscale_categorical_search
    SELECT * FROM s3(
        'https://d3lhz231q7ogjd.cloudfront.net/sample-datasets/quick-start/categorical-search.csv',
        'CSVWithNames',
        'id UInt32, data Array(Float32), date Date, label Enum8(''person'' = 1, ''building'' = 2, ''animal'' = 3)'
    )""")

# Build a Vector Index

In addition to creating traditional indexes on structured data, you can also create a Vector Index in MyScale for vector embeddings. Follow this step-by-step guide to see how:

# Create MSTG Vector Index

As the following code snippets show, the first step is to create an MSTG vector index, a vector index using our proprietary algorithm, MSTG.

  • Python
  • SQL
client.command("""
ALTER TABLE default.myscale_categorical_search
    ADD VECTOR INDEX categorical_vector_idx data
    TYPE MSTG
""")

TIP

The index's build time depends on the size of your data import.

# Check Vector Index Build Status

The following code snippets describe how to use SQL to check the vector index's build status.

  • Python
  • SQL
# Query the 'vector_indices' system table to check the status of the index creation.
get_index_status="SELECT status FROM system.vector_indices WHERE table='myscale_categorical_search'"
# Print the status of the index creation.  The status will be 'Built' if the index was created successfully.
print(f"index build status is {client.command(get_index_status)}")

The output is seen in the following examples:

  • Python
  • SQL
index build status is Built

TIP

See the Vector Search for further information on vector indexes.

# Executing SQL Queries

Once you have imported data into a MyScale table and built a vector index, you can query the data using the following search types:

TIP

The most significant benefit of building an MSTG vector index is its lightning-quick search speed.

Conventionally, text or images, such as “a blue car” or images of a blue car, is queried. However, MyScale treats all queries as vectors and returns a response to the query based on the similarity (“distance“) between the query and the existing data within the table.

Use the following code snippets to retrieve data using a vector as the query:

  • Python
  • SQL
 # pick a random row from the table as the target
random_row = client.query("SELECT * FROM default.myscale_categorical_search ORDER BY rand() LIMIT 1")
assert random_row.row_count == 1
target_row_id = random_row.first_item["id"]
target_row_label = random_row.first_item["label"]
target_row_date = random_row.first_item["date"]
target_row_data = random_row.first_item["data"]
print("currently selected item id={}, label={}, date={}".format(target_row_id, target_row_label, target_row_date))
# Fetch the result of the query.
result = client.query(f"""
SELECT id, date, label, 
    distance(data, {target_row_data}) as dist FROM default.myscale_categorical_search ORDER BY dist LIMIT 10
""")
# Iterate through the rows of the query result and print the 'id', 'date',
# 'label', and distance for each row.
print("Top 10 candidates:")
for row in result.named_results():
    print(row["id"], row["date"], row["label"], row["dist"])

The result set containing the ten most similar results is as follows:

id date label dist
0 2030-09-26 person 0
2 1975-10-07 animal 60,088
395,686 1975-05-04 animal 70,682
203,483 1982-11-28 building 72,585
597,767 2020-09-10 building 72,743
794,777 2015-04-03 person 74,797
591,738 2008-07-15 person 75,256
209,719 1978-06-13 building 76,462
608,767 1970-12-19 building 79,107
591,816 1995-03-20 building 79,390

TIP

These results are in vector embeddings, which you can use to retrieve the original data by referencing the results' id.

Not only can we query data using a vector search (using vector embeddings), but we can also execute SQL queries using a combination of structured and vector data, as the following code snippets describe:

  • Python
  • SQL
 # Fetch the result of the query.
result = client.query(f"""
SELECT id, date, label, 
    distance(data, {target_row_data}) as dist
    FROM default.myscale_categorical_search WHERE toYear(date) >= 2000 AND label = 'animal'
    ORDER BY dist LIMIT 10
""")
# Iterate through the rows of the query result and print the 'id', 'date',
# 'label', and distance for each row.
for row in result.named_results():
    print(row["id"], row["date"], row["label"], row["dist"])

The result set containing the ten most similar results is as follows:

id date label dist
601,326 2001-05-09 animal 83,481
406,181 2004-12-18 animal 93,655
13,369 2003-01-31 animal 95,158
209,834 2031-01-24 animal 97,258
10,216 2011-08-02 animal 103,297
605,180 2009-04-20 animal 103,839
21,768 2021-01-27 animal 105,764
1,988 2000-03-02 animal 107,305
598,464 2003-01-06 animal 109,670
200,525 2024-11-06 animal 110,029
Last Updated: Sat Apr 13 2024 10:45:55 GMT+0000