The Race For the Best Stock Footage
Yesterday marked the “public release” of Sora, OpenAI’s video generator. Of course, they had to shut down signups almost immediately, so was it really released? We’ll need a team of philosophers to weigh in on that.
The release also came five days after Google made their video generator, Veo, “available” by launching a private preview. I thought it was already in private preview for Donald Glover, who they said was making something with it back at Google I/O.
Hilariously, The Verge published this waaaaay back last week:
With Google’s video model now in the wild, OpenAI is notably behind its competitors and running out of time to make good on its promise to release Sora by the end of 2024. We’re already seeing AI-generated content appearing in ads like Coca-Cola’s recent holiday campaign, and companies have an incentive not to wait around for Sora – according to Google, 86 percent of organizations already using generative AI are seeing an increase in revenue.
Oh no! Everyone who can’t be in the Veo preview, or who missed the few minutes that Sora signups were open, is missing out on the random factoid that 86% of organizations using generative AI are seeing an increase in revenue! That’s a big percentage of things that are definitely related!
I’m concerned by the breathless way that people discuss these products: the assumption that applying these technologies will, by itself, have a positive monetary impact, and that there’s a race in which people are already behind. This kind of talk pushes the people involved with the money side of things (like producers) to consider these unreliable tools as replacements for shooting video, or doing effects work. Like we’ve seen in that awful Toys”R”Us video, made with Sora, visual effects, and lies, and the recent Coca-Cola ad.
These things make stock footage from other stock footage and whatever other material they scraped, licensed, or were fed. The models didn’t go to film school, they don’t have conflicted feelings about Steven Spielberg’s later career, they can’t go shoot their first movie with a 27mm lens, they just mush stock footage together to make new stock footage.
The ad spots (with VFX intervention) still look like sizzle reels for a pitch, not a finished product. Even that Coca-Cola one, which was based on a previous ad, now just has random moments inserted.
There’s nothing wrong with stock footage, but you have to be pretty incompetent to assume that Sora and Veo are currently any more of a replacement for material shot for a particular purpose than stock footage is. You use stock footage as supplementary assets, not the whole enchilada.
Morally and creatively bankrupt people might excuse these stock footage montages by saying that the public doesn’t mind them, and can’t tell what’s real and what’s not. That critics are just looking for faults (disclosure: I’m absolutely looking for faults, but I don’t have to look very hard). They might correctly surmise that the tools will improve, like all “AI” tools have improved, and will require less artist intervention. However, that improvement is in temporal stability, or better modeling of physics, not in creativity or originality.
The final result of this endeavor is nothing more than flooding the market with very similar, and indistinct, ads of slow-motion smiles.
As for narratives longer than a typical ad? I’d send you right back to what I wrote initially about Sora because I see nothing in these demos that changes my mind about that at all.
Marques Brownlee has a YouTube video with his thoughts, where he notes the areas where he feels it performs well, and where it doesn’t (like the leg-swapping thing still happening, and object permanence). He is somehow wowed by the garbled footage of two news anchors discussing a “TRAVEL ADDIAVISTOfRIEY” for “CARA NEWS NEWS” but … I don’t know why? From his Threads thread:
This video has a bunch of garbled text, the telltale signs of AI generated videos. But the cutaways, the moving text ticker, the news-style shots… those were all things SORA decided to do on its own, and those news anchors looked very… real
Sora didn’t decide to do them; the footage Sora sourced news anchors from likely had those elements. It’s pattern matching, and those things are part of the pattern. You’re unlikely to ever reverse engineer exactly what material went into making the Sora news anchor video, but ask yourself how it’s better than the stock footage of news anchors from iStock or Shutterstock. You can even get those with assets to make your own specific pieces if you need them for storytelling, like if you need specific text or the client wants to change the color of the graphics. Is Sora better because it’s technology?
Remember that this kind of news stock footage is the stuff that goes on an out-of-focus TV in the background of a shot, or gets tiled on some TV wall with the audio muted. We’ve all seen that sort of thing used in TV and film, mixed in with news footage that was shot specifically for the story being told. Something fun, like the intro to a dystopia, or what have you.
These kinds of stock elements cost $60, and you can have them in any resolution you like without having to wait for anything other than a download. AI isn’t really saving money, and all those graphics need to be replaced, so it’s not like it made something uniquely suited to your needs.
Potentially, some future version will have audio synced to synthetic voices and non-garbled text that exactly matches the prompt, and then it wouldn’t be used as stock footage. That future, purpose-built performance would take the place of news anchors who would have been filmed specifically for a project, and of motion graphics put together by an artist. It also assumes a whole other level of this technology that is not being shown at all, and has many other ramifications I’ll discuss later.
Right now, when tech reporters and finance journalists write about the impact of video generators, it’s as if we’re in a mad rush to get to that state of labor-less money generation. That the end goal is replacing actors with smiling simulacra. A grinning kid assembled from the finest training data of other grinning kids they would ordinarily have had to pay.
The reality is that this is a race to make more expensive stock footage that might malfunction and need to be repaired under time and budgetary constraints dictated by what someone reckoned the technology could do. Then the money people will need to find room in a project budgeted around Sora to pay for very expensive, last-minute work under a time crunch.
Oh, the director wanted to change the color of something, which made the model associate it with a different-colored object it was trained on, and now they don’t like the new shape of the stuff in the output even though it’s the right color?
There’s no file to open and edit. All the work has to be done on top of the Sora output as if it were photography, or as a total replacement. Maybe they can extract the element from the prior version and just change the color.
How could anyone have foreseen difficulty in making a black-box product spit out final imagery to exact specifications? No one could have known! They watched that MKBHD video where he added a golf course to the cliffs, and that worked in that instance.
From my prior Sora post:
OpenAI and Google are both selling these video generators as technological breakthroughs in filmmaking. The reality is that it’s artifacting stock footage.
Bad clients and bad producers will tell their editors to put Sora or Veo output into the initial edit, then they’ll turn to a VFX house and say that the shots are “90% there” and they “just need someone to take it across the finish line.”
How do I know this? Because that happens with stock footage and weird composites and retimes that editors make in Avid when clients want to have something in the edit so they can figure it out. Even if the client agrees to replace it, they can get married to how the stock footage or temp looked, or how it was timed (remember that playback speed is a factor).
Ill-conceived ideas about what this technology is currently capable of, based on news coverage or the financier messing around with Sora for a few minutes, are not only a threat to the people who work on film, TV, and commercials, but also to the very bozos who want to push hard into these tools as total replacements.
Sure, But It’ll Get Better
I would implore the bozos to look no further than how the majority of the movie-making industry (and self-proclaimed “film nerd” dipshits with social media accounts) has trained the public to devalue “CGI” (ironically, computer-generated imagery is generated by people).
Herculean effort has gone into marketing materials about how a movie really built a quarter of the set, and even into silly things like turning bluescreen and greenscreen gray in those materials to try and obfuscate anything artificial (great job with that, by the way).
From that ridiculous Guardian piece last year, “‘It’s exactly as they’d have done it in the 1910s’: how Barbenheimer is leading the anti-CGI backlash”. The one that opens with a bluescreen photo:
For the past 12 months, Hollywood has been facing a serious case of CGI fatigue, with critics tearing into would-be blockbusters for their over-reliance on it. In the New Yorker, Richard Brody wrote that heavy effects work in Ant-Man 3 “instead of endowing the inanimate with life, subtract it”, while Ellen E Jones wrote in the Guardian that Little Mermaid was “rendered lifeless” by CGI. The Netflix rom-com You People, starring Jonah Hill, made headlines when it was revealed that the final kiss in the film was done with CGI and the actor Christian Bale didn’t mince words when he said working exclusively in front of green screens on Thor: Love & Thunder was “the definition of monotony”.
As if in response, 2023 has delivered a buffet of practical-effects-driven films to the multiplex. Greta Gerwig used techniques dating back to silent film and soundstage musicals to bring her fantastical, hot-pink vision of Barbieland to life, Christopher Nolan reconstructed Oppenheimer’s Trinity test using miniatures, and Christopher McQuarrie hoisted a train carriage 80ft into the air in order to film Mission: Impossible – Dead Reckoning Part One’s stomach-churning final stunt. Indie films have been getting in on the fun, too: Wes Anderson turned a piece of Spanish farmland into a real town, complete with plumbing and electricity, for Asteroid City; the “penis monster” in Ari Aster’s Beau Is Afraid was made entirely with prosthetics; and the buzzy horror film Talk to Me has been praised for its gory and “disturbingly real” prosthetics.
Never mind that there’s VFX used in every one of those movies (you can check the credits if you don’t believe me); the backlash in public perception is real. The ability to leverage that “discerning” moviegoer to your own project’s benefit has been deemed valuable.
If there’s ever a perfectly stable, perfectly editable, perfectly lip-synced synthetic performance, instead of mushy stock footage, why would the public embrace such a thing when they won’t embrace CGI?
Riddle me this, bozos: what advantage could synthetic performances have in any form of movie marketing, or in winning any awards, which are often about knowledge of the actor outside of their performance, or appearance, in the specific project they worked on?
Andy Serkis, who is definitely a real person, has been after an Oscar for years for his “motion capture” roles, and no one can stomach the thought of it.
Humans want to see humans perform. They want them to do well. They want to be attracted, or repulsed by them. All our stars start with small roles, and if those small roles are synthetic, how can we have stars? Do the bozos want to market synthetic stars? Good luck. The same goes for TV, and there’s never been more crossover between TV and film than there’s been in the last few years.
Commercially Viable
I don’t think the public is willing to go along with a sea of ads that are all like the Toys”R”Us and Coca-Cola commercials: brand marketing montages with a music bed. However, if this gets super-duper stable and editable, then the place I most see it being used is endorsements from dead celebrities.
Not living celebrities, mind you, but people who no longer have control over their own image. Here’s a local Fox station talking about the Audrey Hepburn Dove chocolate commercial from 10 years ago:
They really capture the full spectrum of responses in that local news coverage, don’t they?
What if it was really cheap to make those ads with Sora or Veo, and didn’t require shooting anything? Just the generation fees and the licensing payments to the cash-strapped heirs. All the booing and hissing from the people who think it’s creepy won’t really matter. It’ll be “worth a shot” because the barrier to entry will be so low, and if people hate it, at least they’ll circulate it widely on social media. What’s the worst that can happen? Free publicity?
AIttention Shoppers
People would probably either ignore influencers, or place them at the bottom of the stack, under brand marketing, as the most replaceable form of video, but that’s not true at all. If anything, influencers are the hardest thing to replace, since their whole business proposition is their “authentic selves” as a brand. They live a life beyond any particular brand endorsement, which they either perform for their audience, or maybe they’re a comedian with a schtick and they pick brands that align with that.
There’s a more direct retail side of things, with accounts that chop up videos from other influencers, rip some product photography, and make scammy sites to sell drop-shipped items. These are run by people who are not influencers, but who take advantage of stolen likenesses. They have no issue acquiring those likenesses, or being generally duplicitous, with today’s technology. They don’t even need Sora or Veo, but think about how they could more precisely tailor their duplicity.
Safiya Nygaard has a good video on this that I’m embedding below, because I really think you should watch it, and not dismiss it out of hand. Influencers are the best at critiquing this kind of commerce.
Platforms are disincentivized to sniff out scams because the scammers pay them, and the people who have been wronged don’t. It seems like the perfect place to apply complex pattern-matching software, instead of turning a blind eye in order to collect money from sponsored placements.
Will Sora and Veo make a big difference here? The ability to scam people with copied work hinges on what was copied seeming somewhat authentic, because authenticity is the thing being stolen. Sora and Veo can’t generate authenticity, but they could be used to obfuscate where video clips are sourced from, because they are designed to obfuscate their sources. The overall quality of the video spots from fake influencers is of very little consequence if they can already make sales at a certain level with the rubbish they use now.
OpenAI has a bunch of checkboxes where you agree that you have the rights to use what you’re uploading, but unless someone is trying to run a scam on intellectual property from a large company, there’s no way to catch it. Fingerprinting the output would also require that the social networks care about those fingerprints. It really is mostly the honor system, and it would benefit a fly-by-night company if they decided they wanted to take advantage of these tools.
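For the unfamiliar, fingerprinting here means deriving a signature from the video content itself, so a re-encoded or lightly edited copy can still be matched. OpenAI has said Sora output carries C2PA provenance metadata, but metadata is trivially stripped, which is why content-based approaches like perceptual hashing come up. Below is a minimal sketch of that idea, not anything OpenAI or the platforms actually run; it assumes the opencv-python, Pillow, and imagehash packages, and the file paths are hypothetical.

```python
# Minimal sketch: perceptual-hash fingerprint of a video's sampled frames.
# Assumes: pip install opencv-python pillow imagehash
import cv2
import imagehash
from PIL import Image

def video_fingerprint(path: str, every_n_frames: int = 30) -> list[imagehash.ImageHash]:
    """Return perceptual hashes for every Nth frame of the video."""
    hashes = []
    cap = cv2.VideoCapture(path)
    frame_index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break  # end of video
        if frame_index % every_n_frames == 0:
            # OpenCV decodes frames as BGR; convert to RGB for PIL.
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            hashes.append(imagehash.phash(Image.fromarray(rgb)))
        frame_index += 1
    cap.release()
    return hashes

def similarity(a: list, b: list) -> float:
    """Fraction of paired frame hashes within a small Hamming distance."""
    if not a or not b:
        return 0.0
    # Subtracting two ImageHash objects yields their Hamming distance.
    matches = sum(1 for ha, hb in zip(a, b) if ha - hb <= 8)
    return matches / min(len(a), len(b))

# Hypothetical usage: compare an upload against a known clip's fingerprint.
# original = video_fingerprint("clip.mp4")
# suspect = video_fingerprint("reuploaded_clip.mp4")
# print(similarity(original, suspect))
```

The catch is exactly the one above: a fingerprint like this only flags anything if the platforms bother to compute it and compare it against uploads, and they have little incentive to do so.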
It remains to be seen if the level of work required to mush together stuff from Sora or Veo is less effort than the current way that they exploit the system.
There’s also the possibility that these “legit” tech companies will fully integrate the falsified shopping experience into their system under some kind of safe harbor excuse.
One of the things to come out of Google I/O 2024 was Product Studio, which showed off online merchants generating product photography, videos, and 3D assets, and linking to relevant social accounts. From Google’s blog post in May:
Product Studio will also give you the ability to generate videos from just one photo. So, with just the click of a button, you can animate components of still product images to create short videos or playful product GIFs for social media. Product Studio is now available in Australia, Canada, U.K. and U.S. in Merchant Center Next and the Google & YouTube app on Shopify and coming to India and Japan in the next few weeks.
Yay, (awkward laugh) the future we’ve all dreamed of.
Eliminate the labor from entertainment. Eliminate the labor from commerce. Eliminate the labor from lifestyle as entertainment and commerce. Let’s try to slim down the pipeline to just be people with a twinkle in their eye, and a scheme in their hearts.
Here’s to the Idea Guys
Those are the three things I see bozos using Sora and Veo to generate entire videos for, once the technology gets stable enough: stock footage montages, dead celebrities, and masquerading as a real retail company.
There are many other applications for AI in video, and just like I wrote before, it’s far more attractive when it’s applied as a step in a process that can be adjusted, instead of as a final result. Where I really see the AI bubble is in the misconception that executives can sit in offices, dash off a prompt to make an ad, and then go grab some lunch. It appalls me that people want a way to optimize our whole world for “idea guys”.
We’ll just need to continue our breathless coverage of how behind everyone is in getting to these unpalatable futures geared solely towards bozos.