After ChatGPT exploded, a new consensus has gradually formed: large language models will be the super accelerators of the next technology era. China also needs its own AI large model. Not many people are capable of building one. Li Zhifei is one of them.
As one of the most senior natural language scientists in the technology startup circle, Li Zhifei came out of the Language and Speech Processing Laboratory at Johns Hopkins University in the United States, where he researched machine translation and natural language processing during his Ph.D. After leaving the university, he joined the Google AI team and led the development of a series of products, including offline translation for the mobile version of Google. In 2012, he chose to return to China to start a business in the field of voice interaction and founded Mobvoi.
After ChatGPT became popular, he traveled to Silicon Valley twice in a month and compared notes with engineers and scientists from OpenAI, Google, DeepMind and elsewhere. “The first year of AI large models” has arrived: that was his most immediate impression. After some research, he concluded that the contestants in this large-model battle are not limited to the giants, and that it will not be a “survival game” with only one or two survivors.
Language models and human-computer interaction: the changes are happening in the very fields where Li Zhifei has studied, researched, and worked for more than ten years. He told Geek Park, “I have made up my mind to devote myself to this and build a Chinese large language model.”
“I have always wanted to do something that I can do, that I like to do, and that also has profound value.” He said that as early as 2020, when GPT-3 was first released, he argued that GPT-3 was a victory for “brute-force aesthetics,” saw in it a possible path to AGI, and was the first to develop a Chinese version of GPT-3, UCLAI.
Recently, Li Zhifei had a nearly two-hour exchange with Geek Park, sharing his views on large models and ChatGPT, and his next plan to start a business in this field.
The following is a transcript of the conversation, organized by Geek Park.
“This Is the Opening of the AI Large-Model Era”
Geek Park: How should we understand the frenzy brought about by ChatGPT? What’s new? Why is everyone so excited right now?
Li Zhifei: ChatGPT starts from a statistical language model. Trained with hundreds of billions of parameters, it has acquired a wide range of abilities and can quickly learn new tasks. This time, ordinary people have experienced for themselves that ChatGPT’s language expression, its answers to knowledge-based questions, and its handling of context and logic across multiple rounds of dialogue exceed everyone’s expectations. It can program, do arithmetic, and write poetry, and to some extent it does these better than real people.
Geek Park: In addition to the shocking experience itself, how do entrepreneurs or capital see it as a transformative business opportunity?
Li Zhifei: I went to the United States again on the second day of the Chinese New Year. My original intention was to find out if anyone knew how it was made and why the large model was so powerful.
I chatted with people from Google, OpenAI, DeepMind, Meta, and Amazon. Even the tech giants still don’t know how ChatGPT has this ability. But you can see many phenomena. First, the users are too crazy; second, the United States is too crazy. In Y Combinator, a well-known American incubator, more than one-third of the projects may be based on large-scale models.
Everyone thinks this is the opening of the era of AI large models, just like the mobile Internet era in 2010. Had I not gone to Silicon Valley and felt the AIGC frenzy firsthand, I would not have concluded that this is the “era of AI large models.” I had seven or eight meetings every day from morning to night, my throat went hoarse, and the discussion was always about this one thing. It makes you feel that this is the beginning of an era.
Geek Park: For this time point, many people compare it to the iPhone moment of the mobile Internet. Have you already figured out what kind of change this is?
Li Zhifei: Why does everyone look back to the mobile Internet of 2010 or 2011 now that AI large models have arrived? I was in Silicon Valley in 2010. At that time, everyone thought the mobile Internet was a big deal, but everyone also felt that the business model was unclear: the screen was too small, advertising did not work, and the network signal was poor. But everyone also saw a few things. The screen experience was already very good, and 3G was available. Sending email or looking up restaurants on the road was slow, but GPS was already very accurate.
I feel the same way now; my gut feeling is that this is a big deal. American venture capital had already woken up by October 2022. When I was in the United States then, a Sequoia investor told me, “Li Zhifei, your time has come.” I asked why. He said, “You know what? In the United States, Sequoia’s managing partners discuss only AIGC and have stopped looking at other projects.”
Geek Park: Investing in AIGC is still the focus of many VCs, but recently enthusiasm for the underlying AI large models has risen as well. What is the reason?
Li Zhifei: Many people now lump AIGC, ChatGPT, and AI large models together. It is necessary to clarify that these are three different concepts. The bottom layer is the general-purpose AI model. AIGC can be built on top of it: for example, Midjourney draws and Jasper writes marketing copy. Dialogue robots such as ChatGPT can also be built on the underlying model. In the United States, in addition to OpenAI and the giants, three or four other start-ups are also working on general-purpose AI models, and all of them have invested hundreds of millions of dollars.
But in China, you suddenly find that as we enter the era of AI large models, there is no capable foundation model to build on. So how do you build applications? If this is the beginning of the mobile Internet, aren’t Android and iOS super important? Today China lacks the Android and iOS of the large-model era, so applications cannot be developed at all. Even if China and the United States were completely connected, given China’s technological development and capital strength today, the AI large model would still be necessary infrastructure.
Geek Park: Is the AI large model the operating-system-level layer of the AI era?
Li Zhifei: I don’t really want to compare it to an operating system, and I don’t want to think of this as an iPhone moment or a Netscape moment, because I think all these metaphors will lead us to misjudge the matter. If we compare it to an operating system, history would lead us to conclude that China has no chance. If we regard the present as the Netscape moment or the iPhone moment, then entrepreneurs should choose to build a website or a mobile app, but what China lacks now is the opposite: the browser or the iPhone of the large-model era.
In addition, in terms of form, a browser, a piece of hardware, Android, or iOS is an offline, static thing.
The AI large model is highly integrated with data and business and requires dynamic, iterative development. It is a service: the underlying layer is constantly changing, and it is deeply integrated with the application. It is far more diverse and holds more possibilities than the static things of the past.
Geek Park: So it is closer to a new-era cloud.
Li Zhifei: I think it is better to compare it to an intelligent cloud OS, an integrated and continuously flowing service. Analogies to offline, hardware, or physical things will distort our picture of AI large models. Any simple generalization from historical forms may limit a correct understanding of them.
“It May Reshape the Entire Value Chain”
Geek Park: Since we can’t pin down a definition, how should we understand the extraordinary capability and potential that AI large models are showing?
Li Zhifei: In my opinion it is a “universal cognitive engine.” First of all, it has superb language ability. In the process of learning language, it has also learned a great deal of knowledge and logic. With these basic abilities, it can quickly become capable of all kinds of tasks.
For example, you only need to give it a small amount of data for it to do translation. Suppose it originally understands only Chinese; if you give it 10,000 examples of Chinese-English translation, it can quickly translate well. It is like opening up the Ren and Du meridians: it quickly connects the new task to the abilities it already has.
So the cognitive ability of a large model will open up many possibilities. For example, if you take the current large model and add some protein structure data, its ability to predict structures is likely to be much better than that of models not based on large language models.
Geek Park: Why does the general-purpose model have such a powerful potential?
Li Zhifei: The model learns very low-level structures and mechanisms. Everything that arises naturally, whether language or biological structure, must conform to laws that we currently cannot explain. After the model has been trained on all the data on the Internet, it acquires some kind of interpretation of its own.
Geek Park: Is this ability acquired through language learning?
Li Zhifei: Language is the underlying breakthrough, and the versatility now shows in the fact that this system can do all kinds of tasks through the language model. Previous language models could do only one specific task. For example, a well-trained pre-trained model had the potential to do many tasks, but once it was fine-tuned, it could do only one. Fine-tuning made it more accurate at that task, but at the cost of losing the ability to multitask.
The current general-purpose large model can still do multiple tasks even after fine-tuning. The core of pre-training is to allow it to have basic cognitive and logical abilities. Through fine-tuning guidance, it can handle various tasks better and know how to use existing knowledge.
Geek Park: It’s like letting a person acquire basic abilities by finishing college, after which he can work in different positions and do different things, rather than training him to tighten screws starting in kindergarten.
Li Zhifei: That metaphor is quite apt. Training for a single task, such as machine translation, used to be like teaching someone to tighten screws from day one. Of course, even that requires a certain amount of language logic and intellectual ability. But rather than teaching only screw-tightening from the first day, it may be better to send the person to college first and teach screw-tightening afterward. First, learning becomes fast and efficient: what used to take 5 years to teach may now take only 5 days. Second, the person can not only tighten screws but can also be taught to write papers and become a professor. You can make the model learn quickly with just a small number of examples.
Geek Park: What does the emergence of such a general-purpose large model mean for AGI (General Artificial Intelligence)?
Li Zhifei: This year can be called the first year of the general-purpose AI large model. As for AGI, I think there is now a clear light ahead: we keep getting closer, though we may never reach it. Today’s human intelligence may not be fully tapped. The ceiling of AGI may be the sum of humanity’s collective intelligence. If you aggregate the abilities and unique qualities of everyone in the world into one abstract whole, you get collective intelligence.
If we assume that, now is the starting point of this phase.
Geek Park: If we are now seeing AI as a new kind of productivity, how will it affect the real world?
Li Zhifei: For now, ChatGPT and AIGC still exist in the virtual world, helping humans improve efficiency: automating some steps, doing repetitive work, or offering ideas for brainstorming. In the next three to five years, they will be people’s right-hand men.
Why do we think it is powerful? The applications that follow may far exceed those of the Internet, because this is a “universal cognitive model.” Once this methodology and foundation are brought into different fields, many things may be reshaped.
I think it might really reshape the entire value chain. For example, programmers in the future may communicate in natural language, provide data, and have the model write the program directly. This may lead to a huge change in the computing paradigm. The operating system, distributed computing, and even most of the work of the chip itself will shift from program-driven to data-driven. After this change occurs, some companies that are in business today may no longer be in business 10 years from now.
“The General-Purpose AI Model Is a Nuclear Weapon, and It Has a Time Window”
Geek Park: When GPT-3 came out in 2020, everyone was shocked, and there was a wave of enthusiasm in China. But after that change happened, nobody followed through properly. Why?
Li Zhifei: Abstractly speaking: first, there was no belief in AGI. Second, even with belief, as in my case (I already had it, and I trained UCLAI, a Chinese version of GPT-3, at the time), the conviction was still not firm enough. Third, at the implementation level, there was not enough money. Fourth, without a large model running online with real users, there was no closed loop of product and data. GPT-3 has been running online since 2020, collecting data, and then being re-optimized and iterated every week.
GPT-3 Chinese version UCLAI | Image source: Mobvoi
Geek Park: In China, functional, special-purpose AI has long been applied in all kinds of scenarios, but people think mostly about application scenarios and have shown less imagination about the revolutionary power of large models.
Li Zhifei: Yes, what I just described was fairly abstract. The soil is different, and the people who grow out of it are different too. People and money matter. For example, in the United States there is a group of financially free people who always want to do something different from everyone else and highly uncertain. They can even be very paranoid at times. In its first three years, OpenAI burned 500 million US dollars every year with no revenue.
We are naturally more afraid of uncertainty, but things are much better than 10 years ago. What was China like 10 years ago? When I returned to work on a Chinese version of Siri and a voice app, everyone thought it was godlike. Today, if I merely wrapped an API (application programming interface) and called it a ChatGPT, everyone would see it as a knockoff, because people have begun to realize that building a large model takes an investment of 1 billion US dollars.
Geek Park: If today is like the moment after Columbus discovered the New World, how difficult is it for us in China to reproduce our own AI large model?
Li Zhifei: First, we know that there must be gold in the New World. Second, we roughly know what the route looks like, but we don’t have a particularly accurate map. We know that LLMs can be built, and we roughly know the principle, but along the way there will be endless storms, and many decisions must be made before we reach the other shore.
It is now rumored in the market that the gap between China and the United States is two years, or less. I think that if we had enough money, computing power, and people, we could start developing a Chinese ChatGPT today whose performance would be similar to or somewhat worse than the original. Think of it as a college student we have trained: other people’s college student already scores 80 points, and ours scores 60. As long as we work hard, we will approach 80 points faster and faster.
Geek Park: Comparatively speaking, what are the advantages and disadvantages for China in building AI large models today?
Li Zhifei: Let me start with the shortcomings. We have very few people with experience in AI large models, because China has not yet trained a good one. In the past, our models may have had many parameters, but they were not general-purpose. Speech recognition, TTS, and face recognition were all done separately, so the methodology is somewhat different. To continue the college student metaphor, the large models currently trained in China may score only 40 points, not 60. Only after first building a 60-point model that can learn on its own can we raise it to 80 through hard work.
At the same time, we also have advantages. First, at the data level, we can label and refine massive amounts of data. Second, once the direction is clear, China is very good at “brute-force aesthetics.”
Geek Park: With AI large models, no one can say what the business model and the final product form will be. Given that uncertainty, is it more appropriate in China to build something smaller that leads directly to a specific goal?
Li Zhifei: I think the most first-rate, cutting-edge investors will most likely choose the biggest thing at this moment, because there is no hurry for things in vertical fields. Everyone knows the general-purpose AI large model is a nuclear weapon, and it has a time window. Once talent barriers, time barriers, data barriers, and financial barriers are established, small teams will be out of the game.
In the U.S., the financing window for general-purpose AI large models has now closed. Besides OpenAI, several companies already have hundreds of millions of dollars. Unless super-talented people come in, no more VC money will go in.
Geek Park: If a large-scale model similar to OpenAI and a subdivision model in vertical fields appear in China in the future, what will the future industrial form be like?
Li Zhifei: There will definitely not be just one large model. In the United States, Amazon may have one, whether self-built or acquired; Microsoft and OpenAI will have one; Google will have one; and startups will have one or two. Each is a general cognitive model, with various business models built on top, such as an application model for the financial field. But the premise is that you first have a college student who scores 60 before it can learn finance through hard work.
I don’t think there will be only two large models, as there were with operating systems, which are static things. We cannot yet imagine everything a general-purpose large model can do, or whether it will reshape forms in other application fields. For example, the manufacturing industry may have one too, but the premise is having the ability to use general-purpose large models.
“It’s Time to Race Against the Clock for a Ticket”
Geek Park: Now the environment is changing, and determined people are also appearing, such as Wang Huiwen. There are people who are rich enough to take risks, and there are investors. With these conditions in place, what are the difficulties and uncertainties in doing this?
Li Zhifei: There are factors beyond technology. For example, can the CEO and the chief scientist reach timely agreement on key decisions and expectations? Suppose the chief scientist needs to spend 100 million RMB on 1,000 graphics cards, and a model with 100 billion parameters can be trained in three months. If the CEO is very impatient and says, “That’s 100 million yuan, and we still won’t know for three months whether it works; can’t we build a 50-billion-parameter model in one month instead?” Such things seem simple, but if the two sides judge them differently, the result may be an inability to commit, or distorted execution. It is difficult to find artificial intelligence scientists, and it is even more difficult to make good use of them.
Senior leadership must be closely aligned on timing, pace, and investment, including how much money, how much data, how many people, and how much computing power to commit. The same is true within teams, where there are many engineering choices. When building the model, do you use more pre-training data or label more data? In the model structure, should the context length be made longer, or should the embedding vectors be made wider? There are at least dozens of hyperparameters to deal with. A change in any one of them may affect your time, your money, and the GPUs you need, and the final result is uncertain.
Geek Park: In the face of a huge amount of engineering, there are many uncertain factors that affect the results.
Li Zhifei: OpenAI will not tell you these answers. It may have tried many combinations to find which data, methodology, and budget are optimal. Even if it gave you the parameters, you might not be able to get it right.
All things considered, there is a risk if senior leadership does not communicate well with the engineers. Then there is the implementation level: whether the data is cleaned well; whether GPU parallel training is handled well or leaves GPU utilization low; whether the data labeling quality is good enough. Each module may have hundreds of factors, and if one is not handled well, you either waste money or fail to train the model.
Geek Park: Dealing with these problems is a huge test for the CEO of such a technology company.
Li Zhifei: Absolutely. At this starting point, the chief scientist is definitely the most important.
Geek Park: To be such a leader, in addition to understanding technology, you must also be able to make decisions and gather talents. Besides, what other characteristics are important?
Li Zhifei:It is difficult to generalize, but it can be compared. For example, in Silicon Valley, how to judge whether a person is very technical or very Silicon Valley, you only need to ask him a few questions.
This is the efficiency of communication brought about by cognitive experience. The same is true for masters in academia. For example, I have been thinking about some problems for a long time, and I have tried various methods. I know that the other party is also doing this. We may only spend 5 minutes communicating to align the answers. He may say a noun, which paper can do this, or which part of a paper can solve this problem, and you will immediately know the person’s ability level. It must be a very long process for him to think to this extent. Even if we have different definitions of the problem, at least the two sides have really thought deeply about this place.
Geek Park: How do you see the current competitive environment for large models? What is the timing like?
Li Zhifei:How to compete with the giants is still unanswered. Top investors are more concerned about how much money can be spent to get it done, and whether it can be done.
In my imagination, by next June, as long as you can make this 60-point basic model, even if there are 5 on the market, you can enter the next round of competition. Now you have to race against time to qualify for the competition. Thinking too much will only make you hesitate and feel that the risk is too great. If you are doing a large vertical model or application, don’t worry at all, just take your time.
“This Is My Main Battlefield, and I Must Participate”
Geek Park: So how did you make your own decision?
Li Zhifei: This is my main battlefield, and I must participate. I spent many years on a Ph.D. in NLP, worked on language translation at Google, and have been working on voice interaction and generative AI for ten years. Now that something this big is happening in the NLP field, and China also needs its own general-purpose large model, if not now, then when?
This is my profession and my passion, and I also believe that deep barriers and far-reaching value can be built here. Regarding barriers: before I went to the United States this time, I kept wondering whether it would be easy for Google to make a ChatGPT. But after talking with many people, I realized that many barriers can be built here, and that it is not easy for Google to reach the level of ChatGPT right away.
Geek Park: Do you also want to make the Chinese version of OpenAI?
Li Zhifei: “The Chinese version of OpenAI” is just an easy way to describe this to the public. At the core, what I am optimistic about is the “universal cognitive model” itself. I started building large models two years ago, and I was among the first group of people in China to do so seriously.
In 2020, as soon as GPT-3 came out, we trained a large model, UCLAI, a Chinese version of GPT-3. On that basis, we worked on translation between classical and vernacular Chinese, generation of ancient-style paintings, music synthesis, and more, and successfully built products such as “Magic Sound Workshop,” the industry’s top dubbing product. Our AIGC user numbers and revenue are the best in China and second only to Midjourney and Jasper worldwide.
Li Zhifei sharing on GPT-3 at the IF Innovation Conference 2021 | Source: Geek Park
Geek Park: What new ideas do you have when making large-scale models now?
Li Zhifei: If I build a general-purpose large model now, I have to make the skeleton stable and highly malleable first, and only then do the fine carving. It is like building the Leshan Giant Buddha: once the skeleton is in place, the nose, eyes, and hands can be finished beautifully. Once the model really has the ability of a 60-point college student, we can train that student very well through diligence.
On this basis, I have to innovate. It is meaningless to follow OpenAI, and we may not be able to keep up. We must innovate.
Geek Park: You have been in business for quite a long time. Does your past experience have any significance for doing what you are doing today?
Li Zhifei: All past experience is valuable. First, it gives me more precise judgment. Second, it gives me richer engineering practice and broader all-round ability.
To do this now, I will recruit only the strongest people to work on the core technology. And I will keep a longer-term focus, instead of doing things that yield progress in the short term but are a drain in the long term.
Geek Park: You used to be a scientist. After starting your business these years, has anything changed? How do you position yourself?
Li Zhifei: I am a scientist CEO. I can communicate with scientists and engineers in depth, formulate routes with scientists, build beliefs, and let the whole team work in one direction. This is also a very important factor for OpenAI’s success.
Geek Park: Wang Huiwen’s momentum is also very strong. After getting funds, he can always recruit excellent people. Will you care about this matter?
Li Zhifei: People are the most important factor, and every entrepreneurial team will have its own core competitiveness. But the most important thing in the first stage is to find talent who truly understand the core technology and to work with them in the right way and at the right pace.
Geek Park: How did you plan it?
Li Zhifei: The short-term goal is to build a 60-point general-purpose model. In the medium and long term, once I have that 60-point base model, I will work hard to polish it to 80 points so that it can be used reliably in real business scenarios. My advantage is that I have a very strong interest in general-purpose AI technology, and my own judgment and grasp of how the technology will evolve, which will let me run the long distance on this track.
I already have a clear road map in mind, and I can see the endgame.