
Everything You Can Do with 5 Million Vectors

Vectors form the backbone of modern AI systems, allowing algorithms to understand and manipulate data in sophisticated ways. In areas such as machine learning, data analysis, computer vision, and the most discussed domain of the current era, Large Language Models (LLMs), vectors represent information in a format that computers can process and analyze efficiently.

With the advent of LLMs, the application scope of vectors has expanded dramatically. Many vector databases have emerged to meet the demands of application development. While vector databases have a wide range of use cases, they also come with limitations in cost, scalability, and accuracy.

MyScale is a vector database developed especially for AI applications, with cost, scalability, and accuracy in mind. It lets you store up to 5 million vectors for free, so that anyone who wants to build an AI app can try it and explore its features thoroughly.

In this blog, let’s look at what kinds of applications you can develop with those 5 million free vectors in MyScale.

Related Article: Getting started with MyScale

# Understanding Vectors

Vectors in computational contexts are arrays of numbers that represent data points in multi-dimensional spaces. Each number corresponds to a feature or attribute, which makes vectors a natural medium for representing complex data. Their power lies in condensing complex information into a structured, manageable form. Computational systems can then process vast amounts of data by performing operations on these numerical arrays, from simple calculations to complex transformations.

Vectors are important because they provide a standardized way for algorithms to interpret and process data. In machine learning, a model learns patterns from vectors that represent training data, so the results of the model depend heavily on the quality and structure of those vectors. Vectors also make it possible to measure similarity in high-dimensional spaces. For instance, calculating the distance between vectors can help determine how similar two pieces of text are, or identify nearly identical images. This ability to quantify similarity and difference is crucial for classification, clustering, recommendation systems, and more.
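To make the idea of "distance between vectors" concrete, here is a minimal sketch of cosine similarity, one common similarity measure. The three-dimensional vectors and their names are made-up values for illustration; real embeddings come from a model and typically have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy "embeddings" (hypothetical values, not produced by any real model).
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

print(cosine_similarity(cat, kitten))   # close to 1: similar direction
print(cosine_similarity(cat, invoice))  # close to 0: nearly unrelated
```

A value near 1 means the two vectors point in almost the same direction, which is usually read as "semantically similar"; a value near 0 means they have little in common.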

Text to vectors

Vector databases are specialized storage and retrieval systems designed to handle high-dimensional vector data efficiently. They differ from traditional databases in their ability to perform operations relevant to vectors, such as nearest neighbor searches, which identify the vectors closest to a given query vector in the database.
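As a rough illustration of what a nearest neighbor search computes, the sketch below does an exact, brute-force scan over a tiny in-memory collection. It is not how a vector database works internally: the scan compares the query against every stored vector, which is exactly the cost that vector indexes are designed to avoid. The ids and vectors are hypothetical.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(query, vectors, k=2):
    """Return the k (distance, id) pairs closest to `query`, nearest first."""
    scored = ((euclidean(query, vec), vec_id) for vec_id, vec in vectors.items())
    return heapq.nsmallest(k, scored)


# Hypothetical stored vectors keyed by record id.
vectors = {
    "a": [0.0, 0.0],
    "b": [1.0, 1.0],
    "c": [5.0, 5.0],
    "d": [0.9, 1.2],
}

result = nearest_neighbors([1.0, 1.0], vectors, k=2)
print([vec_id for _, vec_id in result])  # ['b', 'd']
```

With n stored vectors of d dimensions, this scan costs on the order of n × d operations per query, which is why it does not scale to millions of vectors without an index.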

The efficiency of vector databases comes from their optimization for vector operations. They use indexing techniques and algorithms that are specifically designed for high-dimensional spaces, overcoming the challenges posed by the "curse of dimensionality," which traditional databases struggle with.

Related Article: What is Vector Search

# Exploring MyScale Vector Database

MyScale is a SQL vector database built on top of ClickHouse. It lets you interact with vectors and perform vector operations using SQL.
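To give a feel for what "vector search in SQL" can look like, here is a small sketch that builds a top-k similarity query as a string. The table name `default.articles` and the column names `id` and `embedding` are hypothetical. The `distance()` function is an assumption about MyScale's SQL dialect that this article does not itself confirm, so check the current MyScale documentation before relying on it.

```python
def build_vector_search_sql(table, vector_column, query_vector, top_k=5):
    """Build a top-k similarity query as a SQL string.

    Assumes a `distance(column, vector)` SQL function is available;
    table and column names are placeholders chosen for this example.
    """
    # Reject identifiers with unexpected characters, since they are
    # interpolated directly into the SQL text.
    for name in (table, vector_column):
        if not name.replace("_", "").replace(".", "").isalnum():
            raise ValueError(f"unsafe identifier: {name!r}")
    vector_literal = "[" + ", ".join(repr(float(x)) for x in query_vector) + "]"
    return (
        f"SELECT id, distance({vector_column}, {vector_literal}) AS dist "
        f"FROM {table} "
        f"ORDER BY dist ASC "
        f"LIMIT {int(top_k)}"
    )


sql = build_vector_search_sql("default.articles", "embedding", [0.1, 0.2, 0.3], top_k=3)
print(sql)
```

Because the result is ordinary SQL, the same statement could also carry `WHERE` filters or joins alongside the vector ordering, which is the main appeal of a SQL-based vector database.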

At its core, MyScale facilitates efficient storage, retrieval, and management of vector data, making it a perfect fit for AI-driven projects that require high-speed similarity searches and data analysis.

MyScale Architecture

MyScale is built to be secure and easy to use, and it runs on a shared Kubernetes setup. It is fully managed on AWS. It keeps each customer's data isolated in separate containers, enforces strict rules about who can see the data, and exposes the data only through API service calls.

MyScale allows users to store up to 5 million 768-dimensional vectors for free, so early adopters can explore the database's potential before paying. On the free tier, you can access almost all of the features available to premium customers. This includes MSTG, MyScale's optimized retrieval algorithm, which is designed to deliver higher accuracy and better performance. You can also integrate MyScale with AI frameworks like LangChain and LlamaIndex, which makes it easier to fit into your AI project and helps you keep costs down while making the most of your AI tools.
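For a sense of scale, the following back-of-the-envelope calculation estimates the raw size of 5 million 768-dimensional vectors. It assumes each component is stored as a 4-byte float32, which is a common choice but an assumption here; it ignores index overhead, metadata columns, and any compression the database may apply.

```python
NUM_VECTORS = 5_000_000
DIMENSIONS = 768
BYTES_PER_FLOAT32 = 4  # assumption: one float32 per vector component

raw_bytes = NUM_VECTORS * DIMENSIONS * BYTES_PER_FLOAT32
raw_gb = raw_bytes / 1e9      # decimal gigabytes
raw_gib = raw_bytes / 2**30   # binary gibibytes

print(f"{raw_bytes:,} bytes = {raw_gb:.2f} GB ({raw_gib:.2f} GiB)")
# 15,360,000,000 bytes = 15.36 GB (14.31 GiB)
```

In other words, the free allowance corresponds to roughly 15 GB of raw vector data under these assumptions.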

For larger datasets, MyScale reports 110 QPS (queries per second) on the LAION 5M dataset with a 99.1% recall rate and an average query latency of 15 ms on the x1 pod. The free tier gives you the opportunity to test this performance yourself.
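Recall measures how many of the true nearest neighbors an approximate search actually returns. The sketch below shows the usual definition; the article does not say which cutoff k or evaluation procedure was used for the 99.1% figure, so the ids here are purely illustrative.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true nearest neighbors found by the approximate search."""
    if not exact_ids:
        raise ValueError("exact_ids must not be empty")
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


# Hypothetical top-10 results for a single query.
exact = [3, 7, 12, 18, 25, 31, 40, 44, 52, 60]    # from an exact scan
approx = [3, 7, 12, 18, 25, 31, 40, 44, 52, 99]   # from an index

print(recall_at_k(approx, exact))  # 0.9
```

A benchmark typically averages this value over many queries, so a 99.1% recall rate means that, on average, about 99 of every 100 true neighbors are returned.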

MyScale Performance

Note: See the detailed comparison in which MyScale outperformed other vector databases in terms of speed and accuracy.

Let’s explore some of the applications you can develop for free using 5 million vectors in MyScale.

Related Article: A Deep Dive into SQL Vector Databases

# Utilizing 5 Million Vectors for Application Development

If you are new to vector databases or want to build an MVP of your application, 5 million vectors are more than enough. Typically, each record or image can be represented by a single vector in a vector database, especially when using embeddings from deep learning models. With MyScale's free tier, you could therefore store representations for up to 5 million records or images. That is enough to prototype all kinds of large applications, or even to run a full-fledged small application.

Let's look at some of the possible applications that you can develop using MyScale.

  • Image search application: You can develop a versatile image search application with MyScale. It lets users search for an image either by writing a description or by uploading an image, which makes finding images more flexible and quick.
  • Recommendation system: You can develop a recommendation system by combining OpenAI's text embeddings with MyScale. The embeddings capture the semantics of your data, which improves the accuracy and relevance of recommendations. The system can be scaled easily and adapted to various recommendation scenarios.
  • Data analysis application: You can develop various data analysis applications using MyScale's free storage. Combining vector data with SQL lets you perform more precise and efficient analysis, gain deeper insights, and accommodate a wide range of analysis needs.
  • Chatbot: You can develop an advanced chatbot equipped with Retrieval-Augmented Generation (RAG) to improve conversation quality and relevance. This enables scalable chat solutions capable of complex, nuanced interactions and personalized chat experiences.
  • Anomaly detection: You can use MyScale to identify unusual activity. Once both normal and atypical behaviors are converted into vectors, anomalies become faster and more efficient to spot and track, which helps maintain system integrity and performance.
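As one illustration of the anomaly detection idea above, the sketch below flags vectors that lie far from the centroid of known-normal behavior. This is a deliberately simple technique chosen for the example, not a description of a MyScale feature; the event names, vectors, and threshold are all hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_anomalies(normal_vectors, candidates, threshold):
    """Return the names of candidates farther than `threshold` from the centroid."""
    center = centroid(normal_vectors)
    return [
        name for name, vec in candidates.items()
        if euclidean(vec, center) > threshold
    ]


# Hypothetical 2-dimensional behavior vectors.
normal = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.0, 1.0]]
candidates = {"login_ok": [1.1, 1.0], "login_odd": [4.0, 5.0]}

print(find_anomalies(normal, candidates, threshold=1.0))  # ['login_odd']
```

With a vector database, the same idea is usually expressed as a nearest neighbor query: if the closest known-normal vectors are all farther away than some threshold, the new event is treated as suspicious.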

If you plan to scale these applications, or to build a large application from the start, MyScale offers very competitive pricing. The recently released capacity-optimized pods provide double the capacity with a 15% cost saving compared to other vector databases.

MyScale Price

# Conclusion

The use of vector databases has increased significantly since the rise of large language models, and the market is now filled with options. Using these databases typically means learning a new system from scratch and dealing with ongoing usability challenges. They also often come with scalability and cost issues.

MyScale addresses these common problems. There is no new query language to learn; you interact with it using SQL syntax, and it reports better speed and accuracy than its competitors. MyScale also provides free storage for all developers, so you can explore it and assess its suitability for your next application.

If you have any suggestions, reach out to us on Twitter or Discord.