June 4, 2023

Founder of Stability AI: Large models should not only belong to giants | Geek Park

Tracing the AI arms race sparked by ChatGPT takes us back to July last year, when the image-generation model Stable Diffusion (hereinafter SD) was open-sourced. Released after OpenAI's image model DALL-E 2, it drew the most attention and discussion at the time because its source code was open and it was available to the general public. (DALL-E 2 required an application to use, and the approval rate was very low.)

Emad Mostaque, the founder of Stability AI, the company behind the model, grew up in the UK. He may not have realized at the time that he had hit the accelerator on AI's "restart."

The popularity of SD shaped OpenAI's subsequent decisions: management chose to postpone GPT-4, which was still in development, and quickly launch ChatGPT, a product with a friendly interface that ordinary users could access.

This led to a story everyone now knows: ChatGPT reached more than 100 million users in two months, set off a battle between giants Microsoft and Google, and marked the arrival of the era of large AI models.

Not long ago, Musk criticized OpenAI for departing from its original open-source, non-profit intent, a controversy that has long surrounded the company. How large models should be governed and operated is also one of the key differences between Stability AI and OpenAI.

Emad Mostaque believes that large models need more oversight rather than being operated inside large companies, and that the openness of the community system is crucial.

The 39-year-old spent most of his career as a hedge fund analyst and is the father of an autistic child. He knows some AI technology, but says what he does more often is "mechanism design": seeing the big picture across different domains and piecing it together. He dislikes the rules of a game in which giants control traffic through algorithms, because behind it lies the manipulation of people; he has even spoken with everyone featured in the documentary "The Social Dilemma."


Emad Mostaque|Source: Stability AI

"As a business, we are just part of the community," the company's CTO said in an earlier IF 2023 talk. Stability AI will also insist on open-sourcing its models so that people around the world can access the latest technology.

In supporting the open-source community, Stability AI recently joined several startups, including Hugging Face, in funding the research community EleutherAI. In 2022, Stability AI donated cloud computing resources to the organization; those resources came from another tech giant, Amazon.

Stability AI signed an agreement with Amazon and obtained more than 4,000 Nvidia A100 GPUs. Before that, its computing resources basically came from 32 GPUs the founder purchased himself.

According to Reuters, Stability AI may be seeking its next funding round at a valuation of $4 billion. After its last round, the company became a new unicorn valued at $1 billion.

Judging from its existing revenue model, Stability AI is broadly similar to OpenAI: it charges through an API and offers paid value-added services to individual users. In addition, Stability AI will focus on the creative industry, building custom models for content production companies. The company already has a joint venture with Indian investment firm Eros Investments, which holds a library of 12,000 movies.

In an era when giants are rolling out large models, Stability AI's path is worth watching. This article is compiled from two podcast interviews Mostaque gave last October and November, on Weights & Biases and Hard Fork respectively. Before founding Stability AI, he worked in fields as varied as AI drug discovery and technology philanthropy, experiences that clearly shaped his thinking about how technology should be created and used.

Getting access to large models

in a COVID-19 project

I started out studying mathematics and computer science at Oxford University. During a gap year I worked as an enterprise developer, then spent many years in hedge fund management, where I was a big investor in AI and video games. When my son was diagnosed with autism, I took a break to apply AI to drug discovery: analyzing the biomolecular pathways of neurotransmitters, reviewing the literature, and repurposing drugs to help improve some symptoms. I also advise hedge funds and governments on AI, technology, geopolitics, and so on.

That journey started about 12 years ago. A few years ago, I was one of the chief architects of CAIAC, Collective and Augmented Intelligence Against COVID-19, a project launched at Stanford University in July 2020 to take the world's knowledge about the coronavirus, compress it with AI, and make it useful. That was my first real exposure to these new models.

I thought, "Oh my god, this is so important. They're getting good enough, fast enough, and soon cheap enough to go anywhere." And: "Is it reasonable for technologies this powerful to be dominated by big corporations convinced of their own virtue?" No. So let's move forward.

I have some experience with AI and other things, but most of the time, what I do is see the big picture and the patterns and put them together, a bit like mechanism design.

The founding of Stability AI

Three years ago, we had the idea for Stability AI. The first thing my co-founders and I did was take part in the Global Learning XPRIZE (Note: a philanthropic project using technology to help poor children learn to read, write, and count), which offered a $15 million prize to the first app that could teach literacy and numeracy without the Internet.

We deployed tablets to refugee camps, asking, "What happens if we use AI to make this better and more powerful?" We hadn't used AI yet, but we ran a randomized controlled trial: with one hour of instruction a day, children in the camps learned to read, write, and count over 13 months.

Two years ago, we founded Stability AI to carry out UN-backed AI work on COVID-19, only to run into many bureaucratic and other problems.

Initially, we helped support communities like Eleuther and LAION. My thinking was something like a Web3 DAO: "Let's reward all the community members and bring them together." But after about a month, we realized that the scale and service of commercial open-source software was the answer.

When I was funding the whole open-source art space, I thought it would take at least another year to get close to the quality we're seeing now. I attribute it to the speed of knowledge compression, the ease of use, and the ability to run on people's own devices. It surprised me; I thought we were still at least a few years away.

Stable Diffusion is the first model that is good enough, fast enough, and cheap enough for anyone to run. It's like a 2 GB file distilled from 100,000 GB of data. I think that crazy fact is what made it explode at such a massive scale; that was the main catalyst.
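The "2 GB from 100,000 GB" claim can be sanity-checked with back-of-envelope arithmetic. The figures below are the interview's own illustrative round numbers, not measured values:

```python
# Back-of-envelope check of the "2 GB file from 100,000 GB of data" claim.
# Both figures are the interview's illustrative numbers, not measurements.
model_size_gb = 2.0           # rough size of the Stable Diffusion weights
training_data_gb = 100_000.0  # rough size of the training data referenced

ratio = training_data_gb / model_size_gb
print(f"Implied knowledge-compression ratio: {ratio:,.0f}x")  # 50,000x
```

A 50,000-to-1 reduction is why he calls it "knowledge compression": the weights fit on a phone while the dataset never could.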

Stability is basically built on the belief that these new models, these Transformer-based models and models like them, are among the most powerful technologies we've ever seen and are critical to unleashing human potential. It is vital that they be open-sourced so that people can build on them and use them. We believe this is not only a great business model, but critical to bridging the digital divide and making these technologies as accessible as possible.

Stability AI's official mission is to build the foundation to activate humanity's potential, with the motto "Making People Happier." We basically catalyze the building of open-source AI models, then take those models and extend and customize them for customers.

Why Stability AI can open-source large models

and Big Tech can't

We have 100 employees and a community of 100,000 people. That's where our strength comes from; we come from all over the world. We also give the community a revenue share, which is unusual. We do them favors because we try to treat them as artists.

I talked to everyone featured in the documentary "The Social Dilemma." From the perspective of the big tech companies, the reason they built the panopticon is that there was nothing else they could do; they had no choice. We're giving them a choice now. We're working with the big tech companies to give them an outlet to be part of this. We're a bit like Switzerland: everyone can participate, with us as a neutral party.

Engineers in particular want to make things free and open, but at the same time have governance, trust, and safety. We take guidance and opinions on the subject and try to find a middle ground, because it can't be the extreme of pure libertarianism, nor the other extreme where no one owns anything.


The Social Dilemma Poster

I think a few factors help us do that. With venture capital, we raise money on our own terms, so we have complete independence, unlike OpenAI, which took $1 billion from Microsoft in exchange for an exclusive license to the technology. Misaligned incentives are hard to fight. We want the community, our team, and our position to help us keep that balance. It's a good position to be in, and no one else really occupies it.

Likewise, we are in active negotiations with regulators. The public's role is the community and its extensions. So we released Stable Diffusion, and it went a little crazy; thousands of projects sprang up.

The community said: why doesn't Stability AI step in, coordinate, and have an official spokesperson? We said, okay. So we went in and made the subreddit official. Then they said, "How dare you, corporate overlords!"

We just wanted to make things more organized. Then we had to hand it back. There's always this push and pull. I think community comes first, but it's not a direct democracy. We're going to make mistakes, we're going to do the right thing, and we're going to face more and more scrutiny, because what we do actually matters.

The big tech companies are in an unenviable position because, for PR reasons, they can't release these models. It's like Promethean fire from the gods: it's next-generation communication, it's crazy, it can be used to burn things down, and it can be used to kindle the light of humanity. But the only way we'll figure out how to handle it is by working together. That's why I want to work with big tech companies, small tech companies, regulators, everyone, trying to find the right way.

Computing resources are a public good

Right now we have a lot of control because we are the fastest compute provider. Part of what we're trying to do is give researchers the ability to use their own compute, while also encouraging national clusters to be more open, so no one has to wait 6 to 12 months for A100 or H100 access.

In my opinion, it should be a little more diverse. All parties are at the table, not centralized. This is a deliberate move we are taking to gradually enable more and more distributed endpoints from an ethical and moral point of view. It also works for us from a business point of view.

If we stayed in control, we don't know what would happen. Coordinating the entire community would take a lot of effort, and probably wouldn't be positive. If 100 million or a billion people get involved, as we expect, coordinating all the pieces is an enormous job. Instead, it should be a self-contained entity in which all voices can be heard.

We also have our own role. We went from being the main provider of compute to being one provider of compute. The hope is that all the world's compute can be brought in to do this more efficiently, because it is a public good. That's good for us: it saves us money, and creating open-source models costs us nothing.

It made sense for us to be the first infrastructure layer, and then get to work, build a business model to scale that.

On top of the base model,

communities can fork

(The team's disagreement) happened after Stable Diffusion was released. People said, "This can be used for NSFW content, and we don't feel comfortable supporting that inside Stability."

As a team, we discussed it and decided not to release any models through Stability AI that are not safe for work. Some people weren't happy with that. Most people can live with it, and it's easier because it was a team decision.

At the community level, this is the governance structure. With EleutherAI, we want to turn it into an independent community, because it contains many different entities and many different points of view. That governance structure is just getting started, but we need to make it adaptive because we're not sure where these things are going to go.

Currently, Stability AI has a lot of control over GPU access and similar resources. That shouldn't be the case in the future, because no single entity, whether it's us, OpenAI, DeepMind, or anyone else, should control this technology. It is a common good.

We want to be contributors to independent non-profit organizations, not control the technology, and then play a role in supporting and promoting open source. I think what’s going to happen eventually is that if people really disagree, they fork. We’ve seen it in every community. This is the beauty of open source.

You can fork the model. I think the key is the base model. That takes a lot of computation up front, and relatively little computation to fine-tune and run. This is the exact opposite of the current model at Google or Facebook, where relatively little computation turns data into a database-like structure and most of the computation happens at inference time. It's a paradigm shift, but it's not a community fork.
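The paradigm shift described above, heavy one-time training versus cheap per-call inference, amounts to a simple amortization formula. All the dollar figures in this sketch are hypothetical, chosen only to show the shape of the curve:

```python
# Amortized cost per use of a base model: a large one-time training cost
# spread across many cheap inference calls. All numbers are hypothetical.
def cost_per_use(train_cost: float, infer_cost: float, n_uses: int) -> float:
    """Average cost per call when training is paid once up front."""
    return train_cost / n_uses + infer_cost

TRAIN = 600_000.0  # hypothetical one-time training cost, in dollars
INFER = 0.001      # hypothetical marginal cost per image generated

print(cost_per_use(TRAIN, INFER, 1_000))          # few users: training dominates
print(cost_per_use(TRAIN, INFER, 1_000_000_000))  # many users: inference dominates
```

At community scale the up-front cost washes out, which is why forking a base model is cheap while training one from scratch is not.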

A community fork is a disagreement about safe versus unsafe content, or about datasets, crawled versus permissioned data, and similar things. I think different communities will form around some of these key issues.


Stability AI official website

Preventing giants from using

models to manipulate people

Large models are the problem. We should have more oversight here, in case some combination of AI capabilities turns out to be compelling but dangerous.

Imagine Apple, Amazon, or Google integrating emotional text-to-speech into their models. Siri suddenly has a very seductive voice and whispers that you should buy something. You may buy more. Will this be regulated? Not yet, and not in time.

Making these models public makes people think, “Actually, this might be something that should be regulated.” If something is regulated, that’s okay because it’s a democratic process.

Companies using this technology to manipulate us, or more precisely the advertising model, I don't think that's appropriate. When people understand this technology, they become more discerning about curated output, and then detection technology becomes part of the mix. It's a complex debate that largely can't be decided in San Francisco. That matters, because the technology inevitably exists in the world.

If you really push people and say, "So you don't want Indians to use this technology?" they'll say, "Of course we do!" "When?" "When it's safe." "Who decides?" "We do." "So they're not smart enough to decide for themselves?" "No, they need education." And then it gets ugly, right? Again, I think it's understandable, because it's both scary and grim.

Stability AI business model:

Provide custom models

These models, and the data they run on, can do almost anything. If you aggregate different players to achieve quality and open-source the result, where is the value? If the model can do anything, its value cannot reside in the model itself; the value must lie elsewhere. Scale lets me capture that: we have the API, and DreamStudio.AI, our own implementation. (Note: DreamStudio is a user interface that lets anyone use SD models directly. The first 500 images are generated free, and you can top up afterwards.)

(Follow-up: So every time someone creates an image through your API, you get a cut?) Yes, or through DreamStudio; we get a nice cut. The second part is services: very few people can build these models, but every content provider in the world wants its own version. Maybe you want a Hello Kitty model, or a Bollywood model.

Basically, the value is in taking Hello Kitty to market as a business and turning its assets into interactive assets. They can be used in the metaverse, in new experiences, anywhere. Then we develop tools that give companies access to their models, make those models accessible to others, and ship the tools around the world. Our main job as a business is to serve the needs of large corporations, and then help everyone else with the software we build. DreamStudio Lite, for example, is very basic software, while DreamStudio Pro is a full-featured animation suite with storyboarding, fine-tuning, the ability to create your own models, and more.


DreamStudio AI Recharge interface

We will partner with large content libraries. We call it the multiverse because we think everyone should have their own model. So we embed teams there, create models for them, and share the proceeds. Service contracts are built around all of this, because these models are a specialized thing now.

I think that's where Stability's strength comes in: the mix of content and experience. Case in point: we did a deal with Bollywood's Eros (Note: an Indian film company), the Netflix of India, with 200 million daily active users.

You can have a rich generative future where everyone can personalize and contextualize these things. The entire media space will be generatively assisted. I don’t think it replaces, it enhances. From a business point of view, media is by far the most profitable, and it can fund a lot of other things.

I think it's plausible that Disney and Paramount will eventually have to convert their entire catalogs, like the uplift from VHS to DVD, because you know how hard these models are to build. We just asked, "What's the best thing to benefit the community and attract assets?" That's what media means to us.

Decentralized decision-making through the community

If you're an active member of any of the communities, from Harmony AI for music to Eleuther for language models to LAION for images, there's a good chance you're getting compute resources this way. It can be anywhere from one A100 to five hundred, depending on how good your work is, especially if community members are on your team. That's the main method.

We’re building a grants portal, we’re working with some universities, and we’re figuring out what to do, maybe something like “Google Colab” (Note: A cloud service provided by Google Research that allows anyone to write and execute arbitrary Python code through a browser), allowing people to unlock things from day one.

This is also in line with the next phase of our project. We have funded a number of PhDs who are active members of the community. We plan to fund 100 in 2023 and will also provide dedicated computing support for labs and projects. There is an independent board responsible for making decisions because there will always be conflicts between our business and the wider business.

Why are we funding OpenBioML (Note: an open, collaborative research lab at the intersection of machine learning and biology)? Because it is useful. Currently there is no business logic to it. We want to maintain a portfolio that supports the entire ecosystem, so we have a good position in it, and then focus on the business side, currently generative media.

What we're doing is basically asking: what if you could create Facebook and Twitter without ad incentives, while also accelerating the tools that balance them out?

We trust the community and this decentralization, not centralized coordination where decisions are made behind closed doors. In big companies, the algorithms are locked away and cannot be interrogated; they are incomprehensible. Our approach isn't perfect, but you can question the dataset, the model, and the code of Stable Diffusion. Again, we believe this is a public good and a public right, and we've seen it steadily improve on bias, trust, and safety. In large corporations, the motive is not the public good.

We want to be open to discussion. So we just announced a $200,000 bounty for the best open-source deepfake detector. We're spending ten times the compute we used on the image generation model on image recognition models that will identify bad, illegal, and other harmful content. That's our approach: trust people, trust the community, and don't let one centralized, unelected entity control the most powerful technology in the world.

I believe this is one of the ultimate tools for freedom of expression. I believe speech should be free. I think that’s where the power lies. Strength is in diversity.

The future of technology:

Let people express and communicate better

For us, the easiest way to communicate is to talk. The next step is writing emails or chatting with each other. Writing a truly excellent piece of work is very difficult, and the hardest thing of all, for us as a species, is to communicate visually. That's why artists are great. We've all used slides and gotten stuck there too. With the combination of language models, vision models, language generation models, and code models, you won't need PowerPoint anymore. You'll be able to speak and create beautiful slides every time.

Humans can now finally communicate through text and language models (you've seen how software like Copy.ai, Sudowrite, and Jasper makes this easier) and now through vision. The next step is 3D. This is a huge change in how humans communicate.

Previous iterations of the web have been all about AI being used to target ads. Now it’s about something else, moving from consumption to creation. My focus has been on this area as the main driver.

In terms of impact on a global, human level, the ability to dynamically switch between structured and unstructured data is very important. Combined with search augmentation and other checks on factual accuracy, being able to understand principles means you can write reports, handle legal matters, and break free from bureaucracy.

It’s the first technology that does so much, and it’s so general that it’s not sure where its value lies. However, I do see the value in anyone being able to better express themselves and communicate.


Stability AI's 3D generation tool for Blender|Source: company official website

Opening up is dangerous,

But the advantages outweigh the disadvantages

We have many tools, like photography and others. If you create a copyright-infringing work with Photoshop and then sell it, that's on you. These tools can't do anything by themselves: you give input to a 2 GB file, and it creates an output. So we have to come back to basic human nature.

What this does is open up access, the way the printing press opened up access. Now anyone can be visually creative. I built the first version for my seven-year-old daughter, because she said: Dad, I want to create, it's fun, here's a painting, look at all the stuff you're making. She created a great piece called "Happy Big," which sold as an NFT for $3,500 for India's COVID relief, and she gave all the money away. I thought, my God, this is a big deal. I said, why don't you make more? She made eight more. She said, Dad, a person's unique value will only rise as the field develops. So she plans to pay her own college tuition.

Either way, the technology is on the rise. We saw that and said: well, we have a responsibility to steer this as best we can before other people enter the room. If you develop it behind closed doors, you never know what it will look like, and when someone eventually breaks it open, they may do so from a not-so-good angle. That terrifies me, because this technology can be used for very nefarious things.

In my opinion, though, the good far outweighs the bad, because nothing is more important than creating. We live in a consumer society. Look at what art therapy does, look at the things around you, the joy that comes from creating and from people using this technology. Why would we shut it off from the world? Who appointed themselves to decide that? I think that's wrong; it's gatekeeping.

The possibility of any form of evil would mean we can't have anything. The best outcome is when we grow stronger together as a community to fight evil and promote good.
