June 6, 2023

Dialogue with Zilliz Xingjue: The vector database is the “memory” of the large model | Geek Park

While large models advance with “daily updates”, they also quietly breed anxiety: Grammarly, an AI writing tool valued at US$13 billion, saw its website traffic plummet after the release of ChatGPT; and Character.AI, the AI chatbot unicorn, has faced questions about whether its self-built large model can form a sufficient competitive barrier as ChatGPT advances…

After the release of ChatGPT Plugins, more entrepreneurs began to worry that the progress of large models would pull them into the “strike range”, instantly erasing their technical accumulation and hard-won advantages in their own fields.

We seem to have fallen into the “WTF syndrome” brought on by large models: amid the roller-coaster acceleration of the technology, people bounce repeatedly between the shock of “what the fuck” and the question of “what’s the future”.

Calming down to think about it: besides the wave of homegrown foundation models and the prosperity of the application layer, what else in the large model field deserves attention?

The vector database has come up repeatedly in our exchanges with industry insiders. Regarded as a key link in AI Infra, it is a database system dedicated to storing, indexing, and querying embedding vectors. It allows large models to store and read knowledge bases more efficiently and at a lower cost than finetune (model fine-tuning), and it is expected to play an important role in the evolution of AI Native applications.

  • What are the value and significance of vector databases for large models? Will they be swallowed up by the progress of large models themselves?
  • How will the software development paradigm be restructured around large models? What role can vector databases play in it?

With these questions in mind, Geek Park approached Zilliz founder & CEO Xingjue for a conversation. Zilliz was founded in Shanghai, China, and is headquartered in Silicon Valley, USA. In 2019 it open-sourced Milvus, the world’s first vector database product, which has earned more than 18,000 stars on GitHub and serves more than 1,000 enterprise users worldwide, making it the world’s most popular open-source vector database. As early as 2022, Zilliz had completed cumulative financing of more than US$103 million through its Series B, at a valuation of an astonishing US$600 million.


Figure | Zilliz’s paper published at SIGMOD ’21; Milvus is the world’s first true vector database product

Before the large model boom, the entire vector database market was worth only a few hundred million dollars a year. It was not until the launch of ChatGPT late last year, a killer app, that the market’s ceiling was raised across the board and large models and vector databases truly broke into the mainstream.

At the NVIDIA GTC conference in March this year, Huang Renxun mentioned vector databases for the first time, stressing their importance for organizations building proprietary large language models, and Zilliz was officially announced as NVIDIA’s vector storage partner. Shortly afterwards, in OpenAI’s official article announcing ChatGPT plugins, Milvus and Zilliz Cloud were both mentioned among the first batch of plugin partners, making Zilliz the only vector database company selected for both its open-source project and its commercial cloud product. In the past month, vector databases have seen a wave of financing: Qdrant, Chroma, and Weaviate have all raised funding, and Pinecone officially announced a new US$100 million round at a valuation of US$750 million…


Figure | Zilliz Founder & CEO Xingjue

From March to May, we watched alongside Xingjue as the vector database went from obscurity to industry heat, and discussed with him a series of questions about the evolution of large models, the value and significance of vector databases, and the evolution of AI Native applications.

The following is a selection from the conversations, edited by Geek Park:

Geek Park: In your view, what is a large model?

Xingjue: A large model is an intelligent processor, a brain. Traditional processors have their circuits laid out by hand; large models lay out their circuits with neural networks.

Large models will keep growing more powerful along two paths. On one hand, there will be centralized large models, like ChatGPT’s cloud brain backed by tens of thousands of GPUs, developing toward ever larger scale and stronger capability, though with obvious drawbacks in energy consumption and cost. On the other hand, models will move toward higher efficiency and lower energy consumption, producing small models, such as the Dolly model (note: Databricks’ open-source ChatGPT-like model with 12 billion parameters), so that everyone carries a “brain” of their own.

Geek Park: How did you come to this conclusion?

Xingjue: I look at it from the perspective of the history of human technological development, not from the large model itself.

The essence of the large model is that it opens an era of intelligent computing for humanity, but computing power will inevitably differentiate; it is impossible for there to be only one form.

Human computing always differentiates from large to small, and “big” may not suit every product. Computers began in the mainframe era, one mainframe in a large machine room, and ChatGPT today is essentially the same. The drawbacks of this mode of computing are obvious: responses are slow and throughput is low.

The trend toward miniaturization stems from the differentiation of functional requirements. If you just want to work, a laptop with internet access is enough; you don’t need a supercomputer.

Geek Park: Will there be a clear stage division like the evolution from mainframes to microcomputers? Or will it be an era of mixing large and small models?

Xingjue: The evolution from the emergence of large models to small models took only about six months; the evolution of human civilization has sped up. Now, if you want to run a small model like Microsoft’s DeepSpeed Chat, you can do it on a laptop.

You can think that the small model is equivalent to the current PC, and the large model is the current supercomputer.

I think large and small models will exist side by side. Behind the differentiation of demand lies not a technical problem but a problem of how to optimally allocate resources. If centralized management were optimal, then everything would certainly be centralized.

Geek Park: What is the relationship between large models and vector databases? Most vector database companies existed before the large model craze. What I’m curious about is: what does the large model bring to the vector database?

Xingjue: In my view, the large model is a new generation of AI processor providing data-processing capability, while the vector database provides the memory, serving as its storage.

Applications of vector databases used to be fairly scattered. The emergence of ChatGPT was the killer-app moment for the vector database: all of a sudden the industry’s ceiling was raised by several orders of magnitude, and the developer audience for vector databases grew from tens of thousands to tens of millions worldwide. Just as almost every mobile or application developer once ran a MongoDB database, in the future everyone will also run a vector database.

Geek Park: At present, the development of large models is very rapid. Will the value provided by the vector database be directly swallowed into the system of large models?

Xingjue: Whether large or small, a model is essentially an encoding of the world’s knowledge and operating rules, a compression of all human data.

But it is hard to truly pack all data into a large model. One view holds that ChatGPT is a highly efficient compression encoding, but it is not lossless compression and cannot contain all knowledge; the process inevitably brings entropy reduction and information loss. If all information were encoded into the neural network, the network would become bloated, its parameter scale enormous, and inference slow. Since it cannot fit everything in, it needs external storage.

There is a similar situation in computer architecture: a CPU contains on-chip SRAM, which is generally kept small because on-chip storage is about 100 times more expensive than DRAM memory and about 10,000 times more expensive than disk. The neural network is the large model’s on-chip storage, and bigger models have more of it. But storing data in a neural network is expensive and balloons the network’s size, so large models also need a more efficient way to store data: off-chip storage outside the neural network, and that is the vector database. When the model finds at runtime that there is information it does not know, it can fetch it from the vector database. The storage cost of a vector database is 2 to 4 orders of magnitude lower than that of a neural network.
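The “off-chip storage” idea can be sketched as a toy external memory: facts live outside the model, and the model looks them up by vector similarity when it lacks information. The embeddings and stored facts below are made-up toy values purely for illustration; a real system would use an embedding model and a vector database such as Milvus.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

class ToyVectorStore:
    """A stand-in for 'off-chip storage': knowledge kept outside the model."""
    def __init__(self):
        self.items = []  # list of (vector, payload)

    def add(self, vector, payload):
        self.items.append((vector, payload))

    def search(self, query, top_k=1):
        # Rank stored facts by similarity to the query vector.
        ranked = sorted(self.items, key=lambda it: cosine(query, it[0]), reverse=True)
        return [payload for _, payload in ranked[:top_k]]

store = ToyVectorStore()
store.add([1.0, 0.0, 0.1], "Milvus was open-sourced by Zilliz in 2019.")
store.add([0.0, 1.0, 0.2], "SRAM is on-chip storage inside a CPU.")

# The "model" retrieves what it doesn't know instead of memorizing it.
print(store.search([0.9, 0.1, 0.0], top_k=1)[0])
```

The design point is exactly the one in the analogy: keeping a fact in the store costs a vector and a payload, instead of extra parameters inside the network.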

Geek Park: You drew an analogy with hardware, but we can also see some chip companies building compute-in-memory architectures, where storage and compute are put together.

Xingjue: In essence, every kind of computing needs storage. Compute and storage are the two most basic abstractions, but they can be traded for each other: storage can be exchanged for compute, and compute for storage. Achieving a better input-output ratio requires striking a balance between them.

Humanity’s first computers did integrate storage and compute. Why were they later separated? Efficiency and cost. The reason the large model cannot completely replace the vector database is that the vector database’s storage cost is 1,000 to 10,000 times lower, so it comes down to a trade-off. Historically, storage has always been cheaper than compute; compute keeps getting cheaper too, but storage stays cheaper still.

Geek Park: But this logic is actually the logic of hardware, will software be the same?

Xingjue: Software and hardware complement and support each other. Why does software save intermediate results instead of recomputing them? Why cache? Because saving means less computation: trading space for time, storage for compute. This is the most classic design pattern in software, and software is designed this way precisely because of a property of the hardware: storage is cheaper than compute.

Geek Park: There has been a joke recently that most of the VCs who backed the last wave of AIGC have come to regret it, meaning that under the large model, the barriers of many application-layer companies have been erased. Will private domain data become the core of competitiveness, and can vector databases play a role in that direction?

Xingjue: Helping users manage private domain data is indeed the core application scenario of today’s vector databases. Many companies and individuals are unwilling to hand their data over to large models.

So how do we use a vector database to exchange data with large model vendors like OpenAI? First, the large model itself can crawl all public domain data on the internet. Private domain data, by contrast, can be placed in a vector database and converted into vectors there. To answer questions from a private knowledge base, you use the vector database’s similarity search to precisely locate the relevant records, and then compile those pieces of information into a prompt.

Although the capacity of a prompt could in theory be unlimited, that would be far too inefficient and hard to achieve. With the method above, it is easy to keep the prompt to 2,000 or 8,000 tokens and pass it to the large model to produce an answer. In this way, through the vector database, private domain data and large models enhance and complement each other.

Geek Park: Will the big model take away all private data?

Xingjue: One particularly good thing to come out of this wave of AI abroad is that they have worked out the protection of private data.

Why do so many developers dare to use it, and why are so many companies worth tens of billions of dollars willing to integrate their services with OpenAI? Because OpenAI has guaranteed that the prompt can only be used as input; it cannot be stored, trained on, or learned from. Otherwise, if I hand over all my data, what happens if you kick me out once you’ve learned from it? Foreign players have drawn a clear boundary between public data and private data, and I believe China will eventually legislate its way to the same point.

Geek Park: What are the other applications of vector databases in large models?

Xingjue: In the short term there is another application: using the vector database to keep the large model’s data up to date.

This too is essentially a cost consideration. The cost of updating a model through finetune (fine-tuning) is far greater than that of storing the data in a vector database.

Whether large or small, models are slow to finetune. You could build a super-invincible computer that ingests new data and updates the model in real time, but it simply isn’t necessary. The data used to train ChatGPT cuts off at September 2021; it does not know what happened afterwards and will give wrong answers. So using a vector database to update the large model’s data can also, to a certain extent, alleviate the large model’s habit of “talking nonsense”.

Geek Park: Zilliz has also launched GPTCache, a caching layer for ChatGPT. How do you think about this?

Xingjue: Caching is another good application scenario. We believe the global CDN and caching stack will get a chance to be rebuilt. In future public AI scenarios, the way information is exchanged will change: it will be more intelligent and semantics-oriented, digesting more unstructured data. CDNs used to match on exact conditions; now they can match on approximate ones. The future needs smarter CDNs and smarter caches.
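The approximate matching idea behind a semantic cache of this kind can be sketched as follows: instead of an exact key lookup, a cached answer is reused whenever a new query’s embedding is similar enough to a stored one. The bag-of-letters embedding and the 0.95 threshold here are toy assumptions, not how GPTCache itself works internally.

```python
import math

def embed(text):
    # Toy bag-of-letters embedding; a real cache would call an embedding model.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(y * y for y in b)) or 1.0
    return dot / (na * nb)

class SemanticCache:
    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []  # (embedding, answer)

    def put(self, query, answer):
        self.entries.append((embed(query), answer))

    def get(self, query):
        # Return a cached answer if some stored query is similar enough.
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best and cosine(q, best[0]) >= self.threshold:
            return best[1]
        return None

cache = SemanticCache(threshold=0.95)
cache.put("What is a vector database?", "A database for embeddings.")
print(cache.get("what is a vector database"))  # near-identical wording: hit
print(cache.get("How do CPUs work?"))          # dissimilar query: None
```

A hit avoids a round trip to the large model entirely, which is the cost argument for putting a semantic cache in front of it.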

Geek Park: The recently popular AutoGPT is also related to the vector database. What role does the vector database play in it?

Xingjue: The vector database is one of the core components of AutoGPT. Our product Milvus has been integrated into AutoGPT, which has brought us a lot of traffic. You can understand it this way: AutoGPT equals ChatGPT plus a vector database. Through the vector database, AutoGPT gains long-term memory: it knows what it has searched before and records its entire history; otherwise every query would start without context.
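The long-term memory loop described here can be sketched as: each step’s result is embedded and recorded, and before the next step the agent recalls its most relevant past actions as context. The embedding below is the same toy letter-count stand-in, and the recorded steps are invented examples; an actual agent would store embeddings in a vector database such as Milvus.

```python
import math

def embed(text):
    # Toy letter-count embedding; a real agent would use an embedding model.
    v = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            v[ord(ch) - ord("a")] += 1.0
    return v

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(y * y for y in b)) or 1.0
    return dot / (na * nb)

class AgentMemory:
    """Long-term memory: every step is recorded and retrievable later."""
    def __init__(self):
        self.log = []  # (embedding, text) in chronological order

    def record(self, text):
        self.log.append((embed(text), text))

    def recall(self, query, top_k=2):
        # Fetch the most relevant past steps to use as context.
        q = embed(query)
        ranked = sorted(self.log, key=lambda e: cosine(q, e[0]), reverse=True)
        return [text for _, text in ranked[:top_k]]

memory = AgentMemory()
memory.record("Searched the web for vector database vendors.")
memory.record("Summarized pricing pages for three vendors.")
memory.record("Drafted an email to the team.")

# Before the next step, the agent recalls what it already did.
print(memory.recall("vector database vendor search", top_k=1)[0])
```

Without the `recall` step, each query really would start from scratch, which is exactly the “no context” failure mode described above.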

Geek Park: The paper “Generative Agents: Interactive Simulacra of Human Behavior”, jointly released by Stanford University and Google, has also drawn strong attention from the industry. The experiment constructed a virtual town inhabited by 25 virtual people with “memory”, who combine their own “personas” to make different decisions and behave differently in social activities, letting us glimpse the possibility of building a real-life “Westworld”. The crucial element in this experiment is the virtual humans’ memory mechanism. Can the vector database provide it?


Figure | The virtual town

Xingjue: It can. Adding a persona to AutoGPT yields the simplest intelligent avatar. In the future, every kind of intelligent agent will need a memory, and that memory will be provided by the vector database. The imagination space here is enormous. What opportunity exactly will it become? It is genuinely hard to define. For the first time in human history, virtual humans with independent memories are appearing. This is a historic opportunity, and demand for vector databases will grow tens of thousands of times.

This is, in essence, using the vector database as the large model’s memory, applied to the virtual agent scenario. I think a consensus is gradually forming: wherever large model applications are built, vector databases will serve as the memory.

Geek Park: For a large model to handle complex tasks, it must first have a memory and then reason over that memory (the context), and vector data is that memory. Is a vector database alone enough, or does something else need to be added?

Xingjue: Basically, it is enough. All data exchange between large models and the rest of an AI system happens in vectors; in essence, the data exchange format between on-chip and off-chip storage has been unified. This is why Oracle cannot serve as an agent’s memory, and why ChatGPT chose neither MongoDB nor Oracle: they are not suited to large models.

Geek Park: Can we then understand the large model as a new type of computer programmed in natural language, with vector data as its most primitive data and the vector database as its most primitive database? That would actually make a complete system.

Xingjue: Yes. The neural network is indeed vectors through and through: all the information it transmits is vectors, and the parameters of every layer are vectors too. So it can be seen as a new computing architecture based on vector embeddings.

This architecture can be summed up as the “CVP Stack”: “C” is a large model such as ChatGPT, responsible for vector computation; “V” is the vector database, responsible for vector storage; “P” is prompt engineering, responsible for vector interaction.
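The wiring of the three layers can be sketched in a few lines. Everything here is a stand-in: `chat_model` stubs the large model “C”, a keyword lookup stands in for the vector database “V”, and a fixed template stands in for the prompt engineering “P”. A real stack would call an LLM API and run similarity search against a vector database.

```python
# "C": a stubbed large model; a real system would call an LLM API.
def chat_model(prompt):
    return f"[model answer based on {prompt.count(chr(10))} prompt lines]"

# "V": the vector store, reduced here to a keyword-lookup stand-in
# (a real "V" would be embedding similarity search, e.g. in Milvus).
knowledge = {
    "milvus": "Milvus is an open-source vector database.",
    "zilliz": "Zilliz open-sourced Milvus in 2019.",
}

def retrieve(question):
    return [fact for key, fact in knowledge.items() if key in question.lower()]

# "P": prompt engineering ties C and V together.
def cvp_answer(question):
    context = "\n".join(retrieve(question))
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return chat_model(prompt)

print(cvp_answer("Who created Milvus?"))
```

The shape is the point: the application code is only the thin “P” layer; “C” and “V” are swappable infrastructure behind it.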

Geek Park: For application development in the AI Native era, what has become more important, and what no longer matters?

Xingjue: For development in the large model era, I was the first to propose the “CVP Stack” concept mentioned above, and it is now widely accepted.

What I want to point out is that my definition and understanding of “P” is more open-ended, not limited to prompt engineering. It is essentially a creative process rather than simply writing a prompt. The core of “P” is the ability to ask questions, or the ability to find demand and find the market: how do you design your user flow, and how do you find a good commercial landing scenario? All of that is included in “P”.

For today’s developers, whether you are a product manager or a front-end engineer, with ChatGPT plus a vector database, a well-written prompt, and LangChain to string the pieces together, you can basically build an application in a week. This greatly reduces the development cost of large model applications.

Geek Park: Facing the era of large models, what is your more accurate definition of Zilliz?

Xingjue: We are a “DB for AI” company. We proposed this concept about three years ago, when we published the first SIGMOD paper in this field, though that is the academic way of putting it. Put more plainly, we are an AI Native database company.

The biggest opportunity of the past 10 years was building Cloud Native (cloud-native) Data Infra, which produced giants like Snowflake; the biggest opportunity of the next 10 years is building AI Native (AI-native) Data Infra, and vector database companies like Zilliz will have a historic opportunity.

Geek Park: In this wave of large model entrepreneurship, the question application-layer entrepreneurs get asked most often is how to build their own competitive barriers without being “drowned” by the progress of the large model itself. How do you think about barriers?

Xingjue: The application layer does have barriers. When technical barriers get lower, other barriers get higher; I think creativity and operations become more important.

Geek Park: Unlike China’s focus on the application layer and the large model layer, the United States has seen a flourishing of open source in AI Infra. What will the competitive landscape of AI Infra look like? Is there anything to look forward to next for AI Infra in China?

Xingjue: Infra is a winner-takes-all market, and Snowflake is the leader in cloud-native databases.

I think China has not really started in this area yet. China’s large models are about half a year behind those abroad, and in Infra China may lag by half a year to a year. I expect AI Infra to see unprecedentedly rapid development over the next six months.

Geek Park: At the NVIDIA GTC conference in March, Huang Renxun stressed that vector databases are essential for large models, and announced that you are NVIDIA’s vector database partner. What is the story behind this?


Figure | At the NVIDIA GTC conference, Huang Renxun mentioned vector databases for the first time and stressed that they are crucial for organizations building proprietary large language models

Xingjue: From the very beginning of the company, we firmly believed that computing in the AI era would be heterogeneous, and we have stuck to a heterogeneous computing route that includes GPU acceleration.

In the second half of last year, NVIDIA saw that as AI develops, the processing of vector data will matter more and more, and it hoped more enterprises would adopt GPU acceleration. They researched and contacted vector database companies and teams around the world, and found that we were the only one with a real layout and real strength in heterogeneous computing.

After getting to know us deeply, NVIDIA also became a very important contributor to our Milvus open source community. Besides jointly releasing the GPU-accelerated vector database, NVIDIA sent several engineers to contribute code to Milvus.

In addition, NVIDIA has Merlin, its GPU-accelerated open-source recommendation system framework, and has made Milvus a key component of Merlin to help recommendation systems manage their data. NVIDIA is now not only our partner but also a major user of ours.

Geek Park: What is the story with OpenAI? I saw that in the official article announcing ChatGPT plugins, Milvus and Zilliz were among the first batch of partners, contributing vector database plugins.


Figure | In OpenAI’s official article announcing ChatGPT plugins, Zilliz’s products are mentioned twice

Xingjue: The OpenAI story is even simpler; we had already cooperated on some things a year earlier. They told us they were building a platform, which turned out to be ChatGPT plugins. They had seen that our vector database was the most popular and influential in developer communities worldwide, so they hoped we would join. At first we took it calmly, treating it as basic open-source compatibility work. But after the plugins launched, we received unprecedented attention from large model developers and GPT users, which I had not expected.

Geek Park: Since vector databases are so important, are you worried that others in China will build another vector database?

Xingjue: Honestly, we welcome it; vector databases are still at an early stage of development and need more market education.

Geek Park: How do you think about the question “Why you?”

Xingjue: I, too, did not expect to have the chance to talk about vector databases with everyone this year. I used to tell people every year that the vector database was the next big thing, and most of them would listen and say: interesting; if you have that idea and vision, go do it well.

Looking back now, without a firm vision and long-term persistence, we could not have gotten to where we are.

Of course, a successful business also needs some good luck. For example, at every fundraising round in our history we met investors who shared our vision and insisted on long-termism. We began preparing for commercialization this year and ran straight into the large model’s “iPhone moment”, which pushed us into the spotlight. Some people used to doubt our commercialization potential; now some tell me it would be very hard for Zilliz not to make money.

Often you persist in doing hard things yet fail to catch the wave, and you may die. Many excellent companies have in fact died that way: their products and technology were very good, but they did not coincide with the trend of the market or with the moment user demand erupted.

For what can be controlled, we do everything necessary for success; for what cannot, we are simply very grateful for the luck.

Geek Park: So how do you see yourself? As a winner of long-termism?

Xingjue: “Survivor” is more apt.

Geek Park: To what extent does the large model accelerate the vector database?

Xingjue: Exponential growth. The developer growth of the past six months probably matches our growth over the previous three years.

This is the era in which the vector database begins its wild growth, and also the beginning of ten years of rapid growth for the AI-native database.

