“Artist Dali + Robot WALL-E”, Metaverse and Zero Marginal Content

AI-generated images are here; can AI-generated video be far behind?

Shenyi Bureau is a translation team under 36氪 (36Kr), covering technology, business, the workplace, life, and other fields, with a focus on introducing new foreign technologies, ideas, and trends.

Editor’s note: OpenAI recently made a splash with DALL-E 2, which can create images from text descriptions. Judging from the examples shared online, the results are striking; some even capture the soul of the prompt. So what is the significance of such an AI? A well-known tech blogger analyzed how the means of content creation have co-evolved across different media, summed up the corresponding economic impact, and argued that this kind of AI will supply the economics the Metaverse has been missing. The future of the internet will be closer to us, and even stranger, when virtual worlds can be filled with virtual content fully customized for individuals at near-zero cost. This article is a translation.

Key points:

Gaming is at the forefront of technological development, leading the way from text to images to video to 3D

Social networks have undergone a similar medium evolution to gaming, but delayed by two decades

TikTok’s zero-cost UGC plus a purely algorithmic feed has network effects

Zero-marginal-cost content from DALL-E 2 supplies the economics for the Metaverse’s future

Last week, OpenAI released its text-to-image tool DALL-E 2 (the name “DALL-E” combines the artist “Dalí” and the robot “WALL-E”); this Twitter thread from @BecomingCritter collects a large number of generated examples, including this one for “Teddy bears working on new AI research on the moon in the 1980s”:


Teddy bears working on new AI research on the moon in the 1980s

Image generated by the text “A photo of a quaint florist’s storefront with a clean white façade full of greenery, an open door, and a large window”:


A photo of the storefront of a quaint flower shop with a clean white facade with greenery, an open door and a large window

The most fitting, however, is this one, “a human being bathed in the sun of AGI utopia”:


A human being bathed in the sun of AGI utopia

OpenAI has a video describing DALL-E on its website. Although the promotional video does mention some of DALL-E’s shortcomings, it is very optimistic about the possibilities. Some excerpts:

DALL-E 2 is a new AI system from OpenAI that turns simple text descriptions, such as “a koala dunking a basketball,” into lifelike images never seen before. DALL-E 2 can also edit and retouch photos with realistic results…

DALL-E was created by training a neural network on images and their text descriptions. Through deep learning, it comes to understand not only individual objects, like koalas and motorcycles, but also the relationships between objects; when you ask DALL-E for an image of “a koala riding a motorcycle,” it knows how to create one, and likewise for any other combination of objects and actions.

The DALL-E research has three main takeaways. First, it can help people express themselves visually in ways they never could before. Second, AI-generated images can tell us a lot about whether a system truly understands us or is merely repeating what it was taught. Third, DALL-E helps humans understand how AI systems see and understand our world, which is a critical part of developing AI that is useful and safe…

What is exciting about the method used to train DALL-E is that it can take what it learned from a variety of other labeled images and apply it to new ones. Given a picture of a monkey, DALL-E can infer what that monkey would look like doing something it has never done before, like paying its taxes while wearing a funny hat. DALL-E is a powerful example of how imaginative humans and clever systems can work together to make new things, amplifying our creative potential.

The phrase “human-machine collaboration” may raise some eyebrows: at first glance, DALL-E seems to be in direct competition with artists and illustrators. There is another way to look at it, however: DALL-E points to a major missing piece of the Metaverse’s future.

Game and Media Evolution

Gaming has long been at the forefront of technological development, and as far as media goes this has certainly been the case. The earliest computer games were nothing more than text:


Screenshot from The Oregon Trail

Next came graphical games, generally bitmap-based; I remember playing Where in the World Is Carmen Sandiego? many times in the library:


Games quickly began introducing action, letting you guide sprites through a 2D world; then came 3D, and for the better part of the last 25 years the industry has been making 3D games ever more realistic. Almost all of these games, however, are 3D images projected onto a 2D screen; virtual reality promises the illusion of actually being inside the game.

Still, this evolution comes with challenges: creating more realistic 3D games means creating more realistic textures to paint over all those polygons, and in virtual reality the problem is only magnified. This is one reason even open-world games are ultimately limited in scope, and why gameplay is largely deterministic: by knowing everywhere players can go and every option for getting there, developers can create all the necessary assets ahead of time to deliver an immersive experience.

That is not to say games have no random elements outside of procedurally generated roguelikes (a subgenre of role-playing games featuring randomly generated dungeon levels, turn-based combat, tile-based graphics, and character permadeath). One of the most obvious ways to provide unpredictability is to have humans play against each other, albeit within a well-defined and controlled environment.

Social Networks and User-Generated Content

Social networking has undergone a media evolution similar to gaming’s, but delayed by two decades. The earliest forms of social networking on the Web were text-based bulletin boards and user groups (USENET). Later, e-mail, AOL chat rooms, and forums became popular. Facebook came along in the mid-2000s; part of its appeal was the addition of images. Instagram started as a pictures-only social network but soon added video, which is TikTok’s entire premise. Now, especially in the past few years, video conferencing via apps like Zoom or FaceTime has begun providing 3D images on 2D screens.

Still, media has always mattered less for social networking, simply because the social part is inherently compelling. Humans love communicating with other people, even when that meant dialing in to a BBS, downloading messages, writing replies, and dialing back in to send them. Games may be largely deterministic, but humans are full of surprises.

Moreover, this means social networks are much cheaper to run: the platform does not need to generate the content itself; its users do. That makes it harder for new platforms to emerge, because you need users to attract users, but it also makes such platforms stickier than any game (or, put another way, the stickiest games have inherent network effects).

News Feeds and Algorithms

The first iterations of social networks had no algorithmic component beyond time: newer posts appeared at the top (or bottom). That began to change with the introduction of Facebook’s News Feed in 2006. Instead of visiting each of your friends’ pages, you could simply browse the News Feed, which determined from the start both what to include and in what order.

Over time, News Feed evolved from a relatively simple algorithm into one powered by machine learning, with results so inscrutable that it took Facebook six months to fix a recent ranking bug. The impact has been enormous: as the algorithm-driven News Feed got better, not only Facebook but also Instagram saw massive increases in engagement and growth. News Feed is also great for monetization, because the same kinds of signals that determine what content you see can also shape the ads shown to you.
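The core mechanic can be sketched in a few lines. This is a toy illustration, not Facebook's actual system: the `predicted_engagement` scores are invented stand-ins for the output of a real machine learning model.

```python
# Toy sketch of an algorithmic feed: each post carries a predicted-
# engagement score (in a real system, the output of an ML model), and
# the feed is simply the posts sorted by that score, highest first.
from dataclasses import dataclass

@dataclass
class Post:
    post_id: str
    author: str
    predicted_engagement: float  # stand-in for a model's prediction

def rank_feed(posts):
    """Return posts ordered by predicted engagement, highest first."""
    return sorted(posts, key=lambda p: p.predicted_engagement, reverse=True)

feed = rank_feed([
    Post("a", "alice", 0.12),
    Post("b", "bob", 0.87),
    Post("c", "carol", 0.45),
])
# feed is now ordered: b, c, a
```

The same scoring signals can rank ads as well as posts, which is why the feed and monetization improve together.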

However, the reason algorithm-driven feeds deserve discussion apart from social networks is that the ultimate example of their power is not a social network at all: it is TikTok. Sure, TikTok is all user-generated content, but the key difference from Facebook is that the content is not limited to your network of connections: TikTok pulls the videos it thinks will interest you most from across the entire service. I explained in 2020 why this was Facebook’s blind spot:

Interestingly, Facebook missed this, and for understandable reasons: first, Facebook sees itself as a social network, so it was reluctant to view that network as a liability. Second, Facebook’s success against Snapchat reinforced this view. The point of my article was that Facebook’s use of Instagram’s social network to halt Snapchat’s growth only reinforced the idea that “the network is Facebook’s greatest asset,” and made TikTok an even bigger blind spot.

TikTok combines two things: user-generated content that costs nothing to acquire, and a purely algorithmic feed decoupled from the social graph. This combination has network effects of its own, because TikTok needs an enormous pool of content to choose from, but it does not require any particular network.
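The difference can be made concrete with a toy example (the names and scores are invented for illustration): a social feed ranks only content from accounts you follow, while a TikTok-style feed ranks content drawn from the entire pool.

```python
# Toy contrast between a network-limited feed and a global algorithmic
# feed. "score" stands in for a model's predicted interest for this user.
def social_feed(user, follows, posts):
    # Candidates are restricted to the user's own network.
    candidates = [p for p in posts if p["author"] in follows[user]]
    return sorted(candidates, key=lambda p: p["score"], reverse=True)

def algorithmic_feed(posts):
    # Candidates are every post on the platform, network or not.
    return sorted(posts, key=lambda p: p["score"], reverse=True)

posts = [
    {"author": "alice", "score": 0.2},
    {"author": "dana",  "score": 0.9},  # outside this user's network
    {"author": "bob",   "score": 0.5},
]
follows = {"me": {"alice", "bob"}}

top_social = social_feed("me", follows, posts)[0]  # bob's post (0.5)
top_global = algorithmic_feed(posts)[0]            # dana's post (0.9)
```

The larger the candidate pool, the better the best match can be, which is why abundant content, not a particular social graph, is what drives the network effect here.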

The Metaverse of Machine Learning

I know the Metaverse is very 2021, but what strikes me is that the examples in science fiction, including Snow Crash and Ready Player One, are actually very game-like in their implementation. Their virtual worlds are created by a visionary company or a visionary developer who, in effect, builds a deterministic game with ultimate ownership of the virtual world in question. Yes, third parties can and do build experiences with strong social components, most notably Da5id’s Black Sun club in Snow Crash, but the core mechanics, and the core economics, are closer to a multiplayer game than to anything else.

In the real world, however, this is extremely challenging: keep in mind that game development is expensive, game art especially so, and costs only rise the more immersive the experience. Social media, by contrast, is cheap because it runs on user-generated content, but that content has generally come in more basic media, such as text and pictures; video is a relatively recent addition. And of course content need not be restricted to your network: algorithms can surface any content on the service to any user.

What is fascinating about DALL-E is that it points to a future where these three trends converge. DALL-E is, in the end, a product of human-generated content, just like its cousin GPT-3; the latter generates text, while DALL-E generates images. Note, though, the progression from text to images: machine-learning-generated video is next. Granted, that could take a few years; video is a harder problem, and a responsive 3D environment harder still, but this is a road the industry has traveled before:

  • Game developers pushed the limits of text, then images, then video, then 3D
  • Social media reduced the cost of creating content to zero, first for text, then images, then video
  • Machine learning models can now create text and images at zero marginal cost

In the long run, this points to a Metaverse far less deterministic than a typical video game, yet far richer in generated content than social media. Imagine an environment drawn not by an artist but created by artificial intelligence: this not only expands what is possible but, crucially, reduces what it costs.

Zero Marginal Content

We can also think about DALL-E, GPT, and similar machine learning models in a different way, returning to a point I have long advocated: the Internet is a transformative technology matched only by the printing press. The latter was revolutionary because it dramatically reduced the marginal cost of producing and distributing the written word. The following is from The Internet and the Third Estate:

At the same time, the economics of printing books are fundamentally different from the economics of copying them by hand. The latter is purely an operating expense: output is entirely a function of labor input. The former, though, is mostly a capital expense: first you must build the printing press, and second you must set the movable type for a book. The best way to recoup these significant up-front costs is to sell as many copies of the book as possible.
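The economics in the excerpt above can be captured in a toy cost model (the numbers are invented for illustration): hand-copying has no fixed cost but a high marginal cost, while printing has a high fixed cost that is amortized over the size of the print run.

```python
# Toy model: average cost per copy = fixed cost spread over the run,
# plus the marginal cost of each additional copy.
def average_cost(fixed_cost, marginal_cost, copies):
    return fixed_cost / copies + marginal_cost

# Hand-copying: labor only, so the cost per copy never falls.
hand = average_cost(0, 50.0, 100)                # 50.0 per copy at any volume
# Printing press: building the press and setting type is expensive,
# but the per-copy cost collapses as the run grows.
press_small = average_cost(10_000, 1.0, 100)     # 101.0 per copy
press_large = average_cost(10_000, 1.0, 10_000)  # 2.0 per copy
```

This is why printers were pushed to maximize the number of copies sold; the Internet takes the same logic to its limit by driving the marginal cost itself toward zero.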

So how do you maximize the number of copies sold? The answer: print in the most widely spoken dialect of a given language, which in turn incentivized the adoption and standardization of languages across Europe. This deepened the affinity among city-states that shared a common language, especially as a common culture developed over decades around books and, later, newspapers. The pace of this consolidation varied; England and France preceded Germany and Italy by hundreds of years, but in nearly every case the first estate was no longer the clergy of the Catholic Church but the monarch of the state, even as those monarchs ceded power to a new kind of aristocratic elite typified by Burke.

The Internet has had two effects. First, the marginal cost of consumption has been reduced to zero. Even with a printing press you still have to print the physical object and distribute it, which costs money; meanwhile, delivering the article you are reading right now to anyone in the world who is interested costs effectively nothing. This completely upended the publishing industry and destroyed the power of gatekeepers.

A second effect, however, occurred on the supply side. I wrote about TikTok in Mistakes and Memes:

“Facebook may also be attractive for the content it surfaces, regardless of who surfaces it.” That sentence actually describes TikTok, and it describes exactly where I was wrong about TikTok: its appeal lies in what it surfaces, and it does not matter who created it… In other words, I was so focused on demand (the crux of Aggregation Theory) that I gave insufficient thought to the evolution of the supply side. User-generated content need not be pictures of cats and dogs and political rants from people in one’s network. It could also be the foundation of a new kind of network, in which the payoff of Metcalfe’s Law is not the number of connections available to any one node, but the number of inputs into a customized feed.

Machine-learning-generated content is simply the next step after TikTok: instead of sourcing content from across the entire network, models like GPT and DALL-E generate new content from that content, at zero marginal cost. This is how the economics of the Metaverse will finally work: virtual worlds need virtual content that can be fully customized to the individual and created at near-zero cost.

Of course, DALL-E raises many other questions, many of them philosophical. There was plenty of discussion of them last week, and there will be more to come. Still, the economic implications matter: after last week’s DALL-E 2 launch, the future of the internet is at once closer and weirder than ever.

Posted by: CoinYuppie. Reprinted with attribution to: https://coinyuppie.com/artist-dali-robot-wall-e-metaverse-and-zero-marginal-content/