"AI technology, as represented by ChatGPT, will fundamentally change every category of software services." This assertion by Microsoft CEO Satya Nadella has become the consensus of most technology practitioners in the world today.
But while the overseas technology industry has been enthusiastically investing in this wave, Chinese practitioners have made a disheartening discovery: the field of large models in China is almost blank. Only a few large companies have sporadically announced plans to launch their own large models, and a handful of star entrepreneurs have revealed that they will start businesses in this field.
Against this background, the emergence of the startup MiniMax is an unexpected surprise. Founded a year and a half ago, the company took large models as its main research and development direction from day one. Today it already has three foundation models covering different modalities, generating voice, image, and text content.
On top of its self-developed large models, the company launched Glow, a platform for generating intelligent dialogue agents, which currently has nearly five million users and handles hundreds of millions of user calls per day.
Before ChatGPT became popular, building a large model was seen as a "foolish business": huge investment, off the beaten track, and very unclear commercialization prospects. Not only ordinary entrepreneurs but even internet giants with abundant resources rarely dabbled in it, or invested only lightly. This is the direct reason why China's large-model field is blank today.
Because of this, MiniMax's existence is a curiosity. After chatting with several early members and technical backbones, we found a group of experienced people from different backgrounds, all technical idealists who have continuously thought about and explored AI. They came together because of their shared belief in AGI (Artificial General Intelligence).
At a time when people lament that long-termism in technology is hard to sustain, the emergence of such a team seems to be exactly what people have been hoping for.
Three large models
From the very first day of its establishment, MiniMax has chosen large-scale models as its main research and development direction.
Currently, MiniMax has foundation models with different capabilities across three modalities: Text-to-Text, Text-to-Visual, and Text-to-Audio.
These three models correspond to converting and generating content between different forms. Text-to-Text handles text-to-text conversion (for example, answering a question with generated text), Text-to-Visual converts text into visual images (for example, generating an image from a text description), and Text-to-Audio generates audio from text.
A large model is a complex piece of systems engineering. Allen (Yang Bin), co-founder of MiniMax, compares it to building a rocket: the technologies and papers involved are public, but that does not mean the rocket can be built. As a startup, the company must achieve its goals within limited time and resources.
Ge Wen, an early member of the team, put it this way: "Every technical judgment directly affects the final result, and every step is chained to the next, so every decision matters." The team members' varied technical backgrounds allow them to complement each other's perspectives and discuss issues fully.
Allen told Geek Park that the team's first milestone was to bring all three large models to a world-leading level within half a year. This required them to make the right call at every technology choice, and it also pushed them to explore more fundamental, low-level technologies. "We have done things in the underlying technology that startups usually don't do," Allen said.
At the bottom of MiniMax's self-developed stack is the hardware infrastructure built to support the large models: an efficient GPU cluster that provides stable, reliable parallel computing, supports multi-modal computation across voice, text, and vision, and offers strong self-training capability and adaptability. Through this infrastructure layer, data and computing power feed the large models like nutrients.
Beyond advancing the technology, the ultimate goal of a large model is to deliver services to the outside world. Last November the company released its first product, Glow. Four months on, the app has nearly five million users.
Some users have described Glow as "a first-person open world," a description the team finds apt. Players build their own worlds through dialogue with AI-driven intelligent agents. Glow offers the experience of talking with multiple agents that have different "personalities." Players can choose an existing agent, such as a character from the novel The Three-Body Problem, or describe a personality in words to "sculpt" an agent of their own.
The significance of Glow for MiniMax is that it closes the loop between the large models and the real world. Through this product, the models' capabilities serve users in concrete forms. For example, a user can generate an agent's avatar from a language description, which draws on Text-to-Visual's image generation; different agents have different timbres and voice qualities, which draws on Text-to-Audio's speech generation.
You can create your own agents on Glow｜Source: Glow
Glow currently handles hundreds of millions of user calls per day. Delivering large-model capabilities to so many people means solving the engineering challenges of low cost, high efficiency, and stability. So, on top of the models, MiniMax built an inference platform (Computing Platform).
As Allen described it: "How do you make a very heavy thing light? That is actually a very hard engineering problem." In the future, this inference platform will support more applications. Through them, the models will interact extensively with real-world user behavior, and that data will guide the models' continuous iteration.
A team that believes in AGI
MiniMax was established in December 2021. Most of the team's core technical backbones come from well-known domestic and foreign AI companies and technology giants.
Ge Wen (a nickname) graduated from Johns Hopkins University, where he studied natural language processing in the university's laboratory for 10 years. His last internship before graduation was at Microsoft headquarters in the United States, where he was exposed to generative dialogue systems; the possibilities of the technology excited him.
"What I want to build in natural language processing is an algorithm, model, or agent that can understand human speech and communicate with people. That is why I chose this field in the first place." The chance to build language models that interact with, gather feedback from, and iterate alongside large numbers of real-world users is what attracted him to a startup like MiniMax.
Founding employee Scallion (a nickname) previously worked at SenseTime. He is convinced of AI's possibilities, but having lived through the last AI wave, he is also deeply aware of the limitations of the previous generation's technical paradigm.
In the past, AI teams worked by customizing models one by one for specific application scenarios. The models multiplied but could never truly connect, and maintaining thousands of models over the long term is unrealistic. Even with all the effort spent pushing the state of the art, AI technology was having less and less impact on the real world. He had followed the progress of language models since GPT-1 appeared in 2018, and gradually realized that language could serve as an interactive interface to integrate technologies across different modalities.
Allen holds a Ph.D. in computer vision. During his years overseas he was a founding member of Uber's ATG research institute, witnessed its establishment, saw Uber's self-driving team packaged and sold, and then joined the self-driving startup Waabi as a founding member, gaining experience with data-driven end-to-end systems. In 2021, Allen met his current partner, and from time to time they exchanged breakthroughs from their latest papers. Those step-by-step breakthroughs made him feel that AGI was getting closer.
For the team, three small events in different industries between 2020 and 2021 solidified their judgment that AGI was coming.
The first was the release of GPT-3 in June 2020. Model parameter counts had grown from the millions and billions of the past to hundreds of billions, and training had shifted from labeled data to learning from all kinds of corpora. The scale of both parameters and data produced an almost magical qualitative change, giving GPT-3 reasoning ability and a broad generalization capability that past AI models lacked.
The second was the arrival of the cross-modal model CLIP half a year later, in January 2021. CLIP can relate natural-language descriptions to pictures, bridging two different media forms: language and images. OpenAI's subsequent text-to-image generation tool DALL·E 2 builds on CLIP's technology.
The significance of this is that, where different proprietary models used to be designed for each modality, a single technical framework can now handle data across modalities and achieve very good cross-modal generation and transformation.
The third happened about half a year after that. At Tesla's AI Day in August 2021, the company demonstrated its latest self-driving technology, proving for the first time that an end-to-end, fully data-driven approach could be successfully applied in real-world self-driving cars. After that, most autonomous-driving companies around the world gradually came to believe that an end-to-end deep learning stack could really work in the real world.
Allen said these three events in different industries were connected by a group of people who had always dreamed of AGI. They believe AI technology will undergo a qualitative leap in the next two to three years, and that on the back of that leap, AGI may arrive within this generation's lifetime.
So, four months after Tesla's AI Day ended, MiniMax was officially established. According to the team, MiniMax may have been the first all-in-AGI company in China.
Another interesting detail: while preparing to start the company, several team members were fond of playing Detroit: Become Human. In Allen's view, the game depicts an era of human-machine symbiosis after AGI is realized.
He believes human-machine symbiosis will eventually be realized. Robots may be physical or virtual, but their intelligence and completeness will allow them to form real relationships with humans, whether by providing productivity or emotional companionship.
Users share stories co-created on Glow｜Source: shared on Xiaohongshu
"After ChatGPT became popular, we were actually quite happy; it saved us a lot of effort educating the market." So said a founding member of MiniMax to reporters at a small media communication meeting, the company's first official, small-scale public appearance. In the preceding 14 months the company had rarely spoken publicly, quietly developing its technology and products.
ChatGPT opened paid accounts and surpassed 100 million users in just two months, making it something entirely new. It is a large model in itself, but its popularity and the frequency with which people use it also make it a "product"-like existence.
"The biggest revelation of the ChatGPT phenomenon is that the things we are doing are indeed in demand." Ge Wen sees this as a great encouragement.
In Allen's view, this is the most amazing thing about today's large models: "When it is general enough and its generalization ability is strong enough, it has enough multi-task capability that in many cases it can be used directly."
Today, many people use ChatGPT to fix code bugs, look up information, write articles, and even generate reports. People use it according to their own needs. Its threshold of use is low enough for all kinds of people, so large models naturally take on certain product attributes.
"An AGI company is actually a brand-new type of company." Allen explained at the communication meeting that large-model companies no longer use AI technology to provide targeted solutions; instead, they let more people interact with the technology directly, dynamically, and in real time through various means.
Under this system, the old concepts of toB and toC no longer matter. Scallion said, "We don't really draw that distinction. What matters is how many user groups we can cover, and how much efficiency improvement or other value we can bring them."
It is easy to imagine how, when MiniMax was founded in 2021, this logic made the team hit wall after wall in the early days while looking for investors, partners, and even employees. "There was no way to convince investors, because no one could understand it. We explained it many times, and few people believed us," a founding member said.
At one end is the core technology; at the other are concrete users. Between the two ends, truly smooth feedback and linkage can be realized. This is one of MiniMax's core lines of thinking, which the team sums up as "User-in-the-Loop."
The inspiration, Allen said, came from Tesla's 2021 AI Day. The first academic prototypes of many technologies shown at AI Day originated with him and some former collaborators, but it was Tesla that put those technologies on countless cars to interact with users in the real world and iterate on the feedback.
"I think it taught me one thing: when you have a very cutting-edge technology, how do you, from the perspective of a commercial company, put it into the real world and make a real impact for everyone."
Asked about future plans, the team members' favorite answer is "go at our own pace." They said the model's API will open this year, and new products will then be developed according to the model's capabilities.