What was bound to come has finally arrived.
On March 14, US time, OpenAI officially launched its latest work, GPT-4. After ChatGPT reignited the imagination of the entire tech world, GPT-4 has unsurprisingly become the focus of the whole industry.
According to OpenAI's official website, the biggest evolution of this generation of large model, GPT-4, compared with its predecessor lies in "multimodality" and long-form content generation.
With the previous ChatGPT, users could only enter text, but GPT-4 can now recognize the content of images and answer questions about them; it can even recognize common internet "memes" and explain to users where the joke lies. On the content side, GPT-4 can handle up to 25,000 words, a significant improvement over ChatGPT.
At the same time, GPT-4 makes fewer mistakes in its answers than its predecessor, and responds more "safely" to ethical and sensitive questions.
Can GPT-4 take the tech world by storm again, as its "sibling" ChatGPT did? And what impact will it have on the direction of the AI industry next?
An extra pair of “eyes”, more intelligent
According to OpenAI, compared to ChatGPT, GPT-4 has three main improvements.
1 Ability to read pictures
You can ask questions with images directly in the dialogue, and it can give logical answers based on its understanding of the image content. For example, give it a picture of milk, eggs, and flour and ask it, "What can these ingredients be used for?" It will offer a series of options:
pancakes or waffles
crepes or French toast
omelette or frittata
custard or pudding
cake or cupcakes
muffins or bread
cookies or biscuits
When the ability to read images is combined with the ability to write code, it feels like magic. At the launch event, OpenAI showed how GPT-4 can help you build a web page:
Just draw a sketch in a notebook, take a photo, and tell it: "Turn this sketch into a colorful website using short HTML/JS." Within seconds, a complete web page appears.
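The sketch-to-website demo above boils down to sending the model one photo plus one instruction. A minimal sketch in Python of how such a request could be assembled, assuming OpenAI's chat-message format for image input; the helper name and the placeholder bytes are invented for illustration, and no actual API call is made:

```python
import base64


def build_sketch_to_site_request(image_bytes: bytes, model: str = "gpt-4"):
    """Assemble a chat-completion payload asking the model to turn a
    hand-drawn sketch into a small HTML/JS page.

    The message shape follows OpenAI's image-input chat format; the
    model name and prompt wording are illustrative.
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Turn this sketch into a colorful website "
                                "using short HTML/JS.",
                    },
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{b64}"},
                    },
                ],
            }
        ],
    }


# Fake photo bytes stand in for the real notebook snapshot.
payload = build_sketch_to_site_request(b"\x89PNG-placeholder")
print(payload["messages"][0]["content"][1]["type"])
```

The payload would then be passed to a chat-completions endpoint; the point is only that image and text ride together in a single user message.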
At present, OpenAI has not opened up GPT-4's image recognition capability. To better refine the image input feature, OpenAI is working closely with Be My Eyes, a Danish company whose software lets visually impaired people interact remotely with volunteers, who act as their eyes and help them with everyday tasks.
With the new image recognition capability, the visually impaired effectively gain a "smarter camera", dubbed a "virtual volunteer". Previous algorithms could not reason logically over visual information, and this is the biggest difference between GPT-4 and earlier vision algorithms. The company's CTO, Jesper Hvirring Henriksen, explained that the new functionality "is not only about the ability to talk, but also about the analysis capabilities that the technology provides." Basic image recognition applications can only tell people what is in front of them. Now the algorithm can judge what is happening on the ground: "it is not just a ball, but a tripping hazard, and it communicates that."
2 Answers are more reliable
It can give more reliable and creative answers to more complex, logic-heavy tasks.
OpenAI put GPT-4 through a series of exams, and its performance improved; on some, it went straight from being GPT-3.5's poor student to a top student. For example, on a mock bar exam it scores around the top 10% of test takers, while GPT-3.5 scores around the bottom 10%. On SAT Math, out of a full score of 800, it improved from 590 to 700.
"In casual conversation, the difference between GPT-3.5 and GPT-4 can be subtle. When the complexity of the task reaches a sufficient threshold, the difference emerges: GPT-4 is more reliable and creative than GPT-3.5, and able to handle much more nuanced instructions," said OpenAI.
3 The content is longer and richer
In addition, the amount of information a single question and answer can handle has also improved: when asking GPT-4 questions, you can enter up to 25,000 words at a time. Previously, some screenwriters told Geek Park that when they tried ChatGPT, they had to break a script apart before showing it to the model. Such troubles should now be greatly reduced.
In other words, it can read an article for you directly. You can also throw it a link and ask it to do the reading comprehension.
For example, give it the Wikipedia link for the American singer Rihanna and ask it, "Describe her Super Bowl performance and why it is so special."
The answer given is this:
Rihanna’s Super Bowl LVII halftime performance was special for a number of reasons. For starters, it was her first live performance in more than five years, and her first Super Bowl halftime appearance. Plus, she made history by becoming the first person to perform at the Super Bowl while pregnant. The show itself was critically acclaimed, with many praising her energetic performance and impressive vocals. It also garnered a massive audience, becoming the second-most-watched halftime show in history.
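Before these longer limits, workflows like the screenwriters' meant splitting a document by hand before feeding it to the model. A minimal sketch of that kind of word-count chunking; the function name is invented, and the 25,000-word figure simply echoes the article:

```python
def chunk_words(text: str, max_words: int = 25_000) -> list[str]:
    """Split a long document into pieces that each fit within a
    model's per-request input limit (GPT-4's launch limit was
    described as roughly 25,000 words)."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


# A 60,000-word "script" needs three requests under a 25,000-word cap.
doc = "word " * 60_000
chunks = chunk_words(doc, max_words=25_000)
print(len(chunks))  # 3
```

Each chunk would then be sent as a separate prompt, optionally with a running summary carried between requests to preserve context.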
4 About "making things up" and "harmful answers"
Of course, GPT-4 still has the "fabrication" problem common to current large models. However, OpenAI says that in its internal factuality evaluations, GPT-4 scored 40% higher than GPT-3.5. GPT-4's knowledge also has a cutoff: September 2021.
OpenAI also highlights the risk of the model generating harmful ideas: "GPT-4 and its successor models have the potential to significantly influence society in both beneficial and harmful ways. We are collaborating with external researchers to improve how we understand and assess potential impacts, as well as our ability to evaluate dangerous capabilities in future systems."
Thanks to an additional safety reward signal added to reinforcement learning from human feedback (RLHF), GPT-4's current safety performance is somewhat better. In addition, OpenAI hired more than 50 experts from fields including AI alignment risk, cybersecurity, biorisk, trust and safety, and international security to conduct adversarial testing of the model, and gives two examples of the findings.
GPT-4 lands faster
Regarding GPT-4's performance, OpenAI concluded: "We spent 6 months iteratively aligning GPT-4, using lessons from ChatGPT and our adversarial testing program, resulting in our best-ever results on factuality, steerability, and refusing to go outside of guardrails."
Clearly, the iteration speed of OpenAI's GPT models has accelerated: GPT-4 arrived less than four months after ChatGPT's release. This is related to OpenAI's move to open ChatGPT to individual users and corporate customers, which speeds up reinforcement learning from human feedback (RLHF) and creates the advantage of a data flywheel.
What is faster than the iteration of GPT-4 is the speed of its application.
Although GPT-4's API is not yet fully open, OpenAI's major shareholder Microsoft has in fact been running GPT-4 in the new Bing for more than a month. After OpenAI officially announced GPT-4, Microsoft immediately confirmed the news on its official blog. Five days earlier, Microsoft Germany CTO Andreas Braun had even broken the news for OpenAI, the only source on the internet to do so.
Perhaps the reason people did not notice GPT-4 in the new Bing earlier is that GPT-4's progress is subtle. As OpenAI elaborates, "In casual conversation, the difference between GPT-3.5 and GPT-4 can be subtle. When the complexity of the task reaches a sufficient threshold, the difference emerges: GPT-4 is more reliable and creative than GPT-3.5, and able to handle much more nuanced instructions."
Microsoft’s Bing search already uses GPT-4｜Microsoft
Microsoft's backing of OpenAI's large-model iteration is also reflected in the underlying infrastructure. Over the past two years, OpenAI and Azure have co-designed a supercomputer for these workloads from the ground up, rebuilding the entire deep learning stack.
Beyond that, more applications have joined the ranks of early GPT-4 adopters:
- Stripe announced it is using GPT-4 to scan business websites and deliver summaries to customer support;
- Language-learning app Duolingo is building GPT-4 into a new language-learning subscription;
- Morgan Stanley is creating a GPT-4-powered system that will retrieve information from company filings and provide it to financial analysts;
- Khan Academy is using GPT-4 to build an automated tutor.
There is no doubt that OpenAI will keep accelerating. Besides opening up to customers to spin the data flywheel, OpenAI also open-sourced OpenAI Evals alongside GPT-4: its framework for automatically evaluating AI model performance, which lets anyone report shortcomings in its models and help guide further improvements.
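An eval in the spirit of OpenAI Evals pairs model inputs with ideal answers and scores the matches. A minimal offline sketch of the simplest "match" template; the JSONL field names (`input`, `ideal`) follow the Evals sample format, while `dummy_model` is a stand-in so the example runs without any API key:

```python
import json

# Two eval samples in Evals-style JSONL: chat "input" plus an "ideal" answer.
samples_jsonl = """\
{"input": [{"role": "user", "content": "2+2=?"}], "ideal": "4"}
{"input": [{"role": "user", "content": "Capital of France?"}], "ideal": "Paris"}
"""


def dummy_model(messages):
    """Stand-in for a model call, hard-coded so the example is offline."""
    canned = {"2+2=?": "4", "Capital of France?": "Lyon"}
    return canned[messages[-1]["content"]]


def run_match_eval(jsonl: str, model) -> float:
    """Score exact-match accuracy, the simplest Evals-style check."""
    samples = [json.loads(line) for line in jsonl.splitlines() if line]
    hits = sum(model(s["input"]).strip() == s["ideal"] for s in samples)
    return hits / len(samples)


print(run_match_eval(samples_jsonl, dummy_model))  # 0.5
```

A real eval would swap `dummy_model` for an actual model call and report failures upstream; the crowdsourcing value is precisely in collecting many such sample files from many contributors.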
Mobvoi founder Li Zhifei calls this a "crowdsourced evaluation": the task of finding faults in the system is farmed out to developers and enthusiasts, which gives everyone a sense of participation while letting everyone help evaluate and improve the system for free, killing two birds with one stone.
On November 30, 2022, when OpenAI launched the ChatGPT beta, it may not have expected that this dialogue bot backed by a large language model would become the fastest product in tech history to surpass 100 million users. Technology practitioners around the world were once again galvanized by AI's progress.
Only about three months later, OpenAI launched the next-generation GPT-4, iterating at true "Silicon Valley speed". The update frequency alone shows the team's determination to win the large-language-model race.
Although GPT-4's evolution is not "revolutionary" compared with its predecessor, double-digit percentage improvements across a range of metrics will still add fuel to the already hot AI race.
At the same time, by teaming up with Microsoft, OpenAI has landed ChatGPT in Office, the world's largest business software suite, and on the Azure cloud infrastructure. Through its API, OpenAI has also successfully transformed itself into a platform business in the style of a cloud provider, charting a path to commercializing large models and pioneering the monetization of research results.
Whether GPT is the holy grail of artificial intelligence—the correct path to general artificial intelligence—is not yet clear. But what is certain is that the success of GPT has made people want to use AI to “reinvent everything” just like the Internet revolution back then.
The timely launch of GPT-4 has given people who are eager for transformation and change a shot in the arm.
The following is Mobvoi founder Li Zhifei's assessment of GPT-4:
- Astonishing capability: If the GPT-3 series proved to everyone that AI can do many tasks in one model (so-called general purpose), GPT-4 is already human-level on many tasks, outperforming 90% of humans on academic exams. How should primary and secondary schools, universities, and professional education respond?
- Efficient alchemy: The GPT-4 model is enormous and each training run is very expensive, yet training a model is like alchemy in that it requires many experiments. If every experiment had to run at full scale, no one could afford it. To this end, OpenAI developed so-called predictable scaling, which lets it forecast each experiment's results (loss and human evals) at one ten-thousandth of the cost. This upgrades large-model training from alchemy-by-luck to "semi-scientific" alchemy.
- Crowdsourced evaluation: This release also open-sources OpenAI Evals, which crowdsources the task of systematically finding faults in the system to developers and enthusiasts. This not only gives everyone a sense of participation, but also lets everyone help evaluate and improve the system for free, killing two birds with one stone.
- Engineering patchwork: A system card was also released this time. It suggests that, to mitigate the serious confabulation problem, the system applies all sorts of pre-processing and post-processing patches, and the code will later be opened up so the patching can be crowdsourced too. This marks LLMs' passage from an elegant, simple next-token-prediction task into all manner of messy engineering hacks.
- Multimodality: The much-anticipated multimodality is actually not very different from the multimodal capabilities described in many published papers. The main difference is that it combines the text model's few-shot prompting with chain-of-thought (CoT) reasoning: a text LLM with strong base capabilities, plus the benefits of multimodality. (Other multimodal models feel hampered by LLMs that are too weak.)
- A planned bombshell: The GPT-4 model finished training in August last year but was only released now; the explanation is that a great deal of time was spent on testing and on plugging various holes. Google's engineers probably have to pull all-nighters to catch up?
- No longer open: The paper discusses neither model parameters and data scale nor any technical principles, with the explanation that this is for everyone's benefit, for fear that people would learn to use GPT-4 for evil. Personally, I do not buy this protest-too-much justification.
- Unity: The paper spends three pages listing the contributors to each part of the system, an estimated 100-plus people, which once again reflects how united and highly collaborative OpenAI's team is.
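The "predictable scaling" point above can be illustrated with a toy power-law fit: train a few cheap runs, fit log-loss against log-compute, and extrapolate to the full run. A minimal sketch with synthetic numbers (OpenAI has not published its actual fitting procedure, and the constants here are made up purely to show the mechanics):

```python
import math

# Hypothetical small-run data obeying a known power law L(C) = a * C**b.
# GPT-4's report describes predicting the full run from runs using as
# little as 1/10,000 of the final compute.
compute = [1e-4, 1e-3, 1e-2, 1e-1]           # fraction of full-run compute
loss = [2.0 * c ** -0.05 for c in compute]   # synthetic losses (a=2, b=-0.05)

# Least-squares fit of log L = log a + b * log C.
xs = [math.log(c) for c in compute]
ys = [math.log(l) for l in loss]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
    / sum((x - mx) ** 2 for x in xs)
log_a = my - b * mx

# Extrapolate to the full run (C = 1): here 1.0**b is 1, so we recover a.
predicted_loss = math.exp(log_a) * 1.0 ** b
print(round(predicted_loss, 3))  # 2.0
```

Because the synthetic data follows the power law exactly, the fit recovers it exactly; with real, noisy runs the extrapolation carries error bars, which is why the result is "semi-scientific" rather than fully predictable.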