
The Generative AI Industrial Revolution

Generative AI may prove even more divisive to society than the Industrial Revolution. There are many questions well worth examining regarding gen AI’s impact on the streaming industry, the media and entertainment world, and more. In this article, I’m going to focus on a few streaming use cases in the hope of providing a broader perspective on the changes happening in our world now and those we can expect going forward.

At Streaming Media Connect in August 2023, I moderated a session titled “How Generative AI Will Impact Media Technologies.” A year later, the conversations are becoming much more grounded and nuanced. Media customers are now willing to build prototypes to test out ideas for the projects in which they want to use gen AI. This is a large step forward, says a rep from one midsize enterprise.

But let’s start out by defining what gen AI is (and what it’s not) and discussing how users typically interact with gen AI models.

AI vs. Generative AI

There’s generally some confusion about the differences between AI/machine learning and gen AI. We have to be clear about separating machine learning from gen AI since they are very different. The distinction is that gen AI is trained on extensive amounts of data to duplicate human output. Whether the output is code, graphics, text, video, or audio, the user interacts with the system through natural language, and the system creates output based on what it learned in training. Machine learning, by contrast, is used for understanding and predicting based on an existing body of data.

Welcome to Your Chatbot

The interaction layer for many, if not all, gen AI projects is a chatbot, in which input is either speech or text. This prompt engineering (structuring instructions to produce the required output) is now replacing the traditional user interface on existing products. Configuring a workflow in this way can save time and, in theory, requires less expertise in using a specific application. However, producing the desired output still requires skills, just different skills. In other instances, the use cases create an entirely new way of doing something. Meanwhile, the gen AI model learns from the data you’re providing and has the potential to suggest the best way to get from A to B, handling the many nuances along the way.

“The foundational apps I see coming out now are really guided assistance, augmenting the use of a piece of software,” says a rep from one of the larger vendors. “The next step would be automating automation, whether it’s specific tasks that you can do conversationally or whether it’s a complete workflow.”

Accuracy

Before launching into use cases, I wanted to find out what types of questions vendors were receiving from media companies.

Globant has engineering staff and media technology products in use with many of the well-known large media companies. According to Luciano Marcos Escudero, the company’s VP of media engineering, customers want to know how gen AI works with their data and how accurate the output will be. For example, they might use the metadata associated with a content library to create a recommendation service. An inaccurate result might be an improper classification of a movie genre, a failure to associate an actor with all of their films, or some other hallucination in which the system makes something up. “We try to build some sort of correlation with the input and the output, but it’s not 100%,” says Escudero. “We throw things to a model that has been training and self-training again. The reasoning based on the data is not always accurate.” Nor is it clear how that response was generated. Both of these things can make customers very uncomfortable. In all of the use cases I discuss here, vendors put a rule engine (or prompt engineering to restrict output) on top to filter out inaccurate responses.
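To make the rule-engine idea concrete, here is a minimal Python sketch of a check that runs after the model responds. It is an illustration only, not any vendor’s implementation: the genre vocabulary, the cast lookup, and names such as `ModelOutput` and `validate` are assumptions made up for this example.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabulary and cast list; a real system
# would load these from the customer's own catalog.
ALLOWED_GENRES = {"drama", "comedy", "documentary", "thriller"}
KNOWN_CAST = {
    "movie-001": {"Actor A", "Actor B"},
}


@dataclass
class ModelOutput:
    asset_id: str
    genre: str
    actors: list


def validate(output):
    """Return a list of rule violations; an empty list means the output passes."""
    problems = []
    if output.genre.lower() not in ALLOWED_GENRES:
        problems.append(f"unknown genre: {output.genre}")
    known = KNOWN_CAST.get(output.asset_id, set())
    for actor in output.actors:
        if actor not in known:
            problems.append(f"actor not in catalog for {output.asset_id}: {actor}")
    return problems


good = ModelOutput("movie-001", "Drama", ["Actor A"])
bad = ModelOutput("movie-001", "space opera", ["Actor A", "Actor Z"])
```

A response that fails any rule could be dropped, retried, or routed to a human reviewer, depending on the workflow.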

Large language models (LLMs) out in the wild today are trained on general data that’s not specific to media and entertainment. Naveen Narayanan, Quickplay’s head of product innovation and strategy, says his company “bound [its LLM] to questions related to media and entertainment catalogs, fine-tuning the gen AI large language model.” The result, he notes, was that the model was able “to answer on a specific question and not answer certain other questions that were not on media. Fine-tuning took us close to 3 months of different prompts and different boundaries. Some of the problems that we [encountered] when dealing with these LLMs were [how they] handle things like consistency, accuracy. … LLMs are not consistent. Sometimes they give you [responses] in certain formats, sometimes they don’t. There is some randomness to it.”

After 3 months, this fine-tuning process yielded a first version that was fairly usable, with a chat-based interface that understood the history of what the user was trying to find. This AI-based Quickplay Media Companion helped users search for and discover content through a voice-based natural language interface.

Quickplay Media Companion

But It’s My Data

A question that comes up often, Narayanan says, is “‘How do I ensure that what you’re doing is not feeding back to the model and getting the benefit of all of my data?’ The proprietary models that the AWSs and the Googles of the world have launched have a master model that is available. They instantiate that model into a local instance, and all of the training for that particular model is residing within that local instance. It does not go back to the core base model.” The base model, Narayanan maintains, is not learning from the proprietary data that any of the customers have.

“You can easily upgrade the model in your local project, still maintain all of your fine-tuning, and get the benefit” of that without having to worry that you’re “giving any of the proprietary information back to the core base model,” Narayanan states. “That is a big architectural decision that these large cloud vendors have made to tackle this enterprise concern.” The theory is that the base model is getting smarter from data from the open internet, but that enterprise data, for better or worse, is not contributing to this knowledge. Quickplay’s instance of the model, trained within its enterprise, is operating within a closed loop of information.

Do clients always insist on closed-loop gen AI models? Sports organizations seem more ready to jump into using gen AI and are more flexible about how their data is used. “[Sports] are a little bit more flexible on using external models that are already trained,” says Escudero. “We are actually training models with their data too.”

By contrast, Escudero says, “Media [companies] are completely the opposite. Media groups are not that open to external models. They want to run models with their specific IP. Putting their IP into a model has security and governance processes, and we need [to provide] a lot of documentation about how that is going to be processed.”

Cost

There are two ways to look at gen AI costs: what you save and what you spend. “Right now, a lot of our big customers are looking to leverage gen AI at scale to do things that they’ve traditionally been doing, but with a lot more cost optimization and better go-to-market speed,” says Amit Tank, senior director of solutions architecture at MediaKind. “For medium-to-smaller-size customers and prospective customers, what we’re seeing is that they want us to help them leverage [gen AI] to differentiate by offering certain features that are otherwise not feasible.” This includes more easily managing configurations for all encoding devices or providing custom clips, depending on user preference.

While one conversation is about what you can save, another concerns model management and oversight, which is especially relevant if you’re planning on replacing a person with a gen AI function. This has been an underlying theme in many of these conversations.

“One of the initial questions we get is, ‘How much is it going to cost me’” to run an AI project, says Escudero. Globant clients typically want to know, “How much is it going to cost me running one asset for 2 hours, on a daily basis, multiple times?” Other questions he hears are, “What’s the quick time to value? What are the things that can be implemented within a month or 2 that can really generate value?”

Magic Content Creation

Our first use case involves virtual production of highlight videos for live or VOD sports content. Using AI-enhanced or AI-generated metadata, Linius searches through video, sorts clips that are typically 10 or 20 seconds long, and builds a video from them for either fans or internal content curation teams. The company specializes in AI-enhanced personalized video experiences, particularly for sports. “We’re using a combination of machine vision to identify players, actions, noise analysis, and commentator sentiment to rank and prioritize moments in the game, and OCR to interpret on-screen data,” says Linius CEO James Brennan.

You could literally ask the model to produce a video clip of all of the left-handed pitches in a baseball game or the goals a particular hockey player scored in Toronto only in the month of March. “Our system accesses the original videos,” notes Brennan. “We don’t ever copy, move, process, or restore those videos. We’re just creating a lightweight data representation and then using the AI to manipulate that data.”

Explaining how the process works, Brennan continues, “Gen AI interprets all data to determine what should be in a highlight reel, writes a summary, creates a commentator script (for text or converted to speech), and then generates the API search queries, which creates the video on-the-fly.” Linius uses standard HLS, MP4 content, and AI to standardize size (16:9, 9:16, 1:1), resolution, etc. “To publish, I copy this URL, paste that into my web CMS, and the video is live.”

But the video viewers see is not a discrete video file. “I’m publishing an instruction set to tell the system, ‘Go pull these different [clips] on-the-fly,’” notes Brennan. Linius’ Whizzard Captivate solution is available by UI or API.

Linius Captivate in action
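The idea of publishing an instruction set rather than a rendered file can be sketched in a few lines of Python. This is a hypothetical illustration, not Linius’ actual format or API: `ClipRef`, `build_instruction_set`, the JSON layout, and the example.com URLs are all assumptions.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ClipRef:
    source_url: str  # points at the original video; nothing is copied
    start_s: float   # clip start, in seconds
    end_s: float     # clip end, in seconds


def build_instruction_set(title, clips):
    """Serialize an ordered list of clip references as a JSON document."""
    for clip in clips:
        if clip.end_s <= clip.start_s:
            raise ValueError(f"clip ends before it starts: {clip}")
    total = sum(clip.end_s - clip.start_s for clip in clips)
    return json.dumps(
        {"title": title, "duration_s": total, "clips": [asdict(c) for c in clips]}
    )


clips = [
    ClipRef("https://example.com/game1.m3u8", 120.0, 135.0),
    ClipRef("https://example.com/game1.m3u8", 900.0, 920.0),
]
manifest = json.loads(build_instruction_set("Goals", clips))
```

A player or server would then resolve each reference at playback time, which is why the published artifact stays small even when the highlight reel is long.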

Stringr provides WeatherGen, a platform that deploys gen AI models to create customized news and weather programming. “About a year ago, we began developing a suite of solutions that leveraged gen AI to create automated video content. Our thinking was that there’s too much risk to call it hard news if there are hallucinations, but if you’re focused on things that have very, very clean data, then there’s an opportunity,” says Stringr COO and chief product officer Brian McNeill.

Stringr began with weather forecasts using content from the National Weather Service. “We parsed that through our own systems, and then we sent that into a variety of large language models to generate a script. That script comes back and goes into a different gen AI model to make a voiceover. We stitch that together into a video with motion graphics to make a weather forecast. Because this is entirely automated, it allows our customers [local media companies] to create voiced video weather forecasts for markets where it might not have made economic sense to actually have a reporter.”

Stringr WeatherGen

The typical length for these forecasts is 30, 60, or 90 seconds at 720p. Stringr is producing content in English and Spanish. Cox Media Group has a FAST channel that uses this, and in August, three other larger station groups signed contracts to use Stringr for their digital properties. Much as with Linius’ product, customers can either use a UI to configure their streams or do bulk programming via API.

Another use case involves live content. “The big problem with live is the processing time. We are not talking about seconds; we’re talking about minutes or even hours. Because of that, it is not worth it to do live at this stage,” says Globant’s Escudero. “If we [process the video] to create metadata of all content, audio, etc., it’s going to take time and money. If we go live, it’s even worse, because the latency is much worse than any 4K transcoding today.”

One issue with attempting to deploy gen AI to customize live streams, Escudero explains, is that it means “trying to contextualize key moments not based on the video processing or image processing, but more on metadata.” Sports metadata, created in the stadium in real time by a real person, will track key moments: a fight, a yellow card, a goal, or a time-out. “We’re trying to develop a key and identification based on that metadata without analyzing the video,” he says. This can trigger the creation of a “high-interest” automated clip without requiring human video crew members to produce it.
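As a rough sketch of how event metadata alone could drive clip creation, the Python below turns a feed of timestamped events into clip windows without touching the video. It is not Globant’s system: the event labels, the 10-second padding, and the `clip_windows` helper are assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Event kinds treated as high interest (labels are hypothetical,
# based on the examples named in the article).
HIGH_INTEREST = {"goal", "yellow_card", "fight", "time_out"}


@dataclass
class Event:
    kind: str
    t_s: float  # seconds from the start of the stream


def clip_windows(events, pre_s=10.0, post_s=10.0):
    """Return (start, end) windows around high-interest events, merging overlaps."""
    windows = []
    for event in sorted(events, key=lambda e: e.t_s):
        if event.kind not in HIGH_INTEREST:
            continue
        start = max(0.0, event.t_s - pre_s)
        end = event.t_s + post_s
        if windows and start <= windows[-1][1]:
            # Overlaps the previous window, so extend it instead of adding a new one.
            windows[-1] = (windows[-1][0], max(windows[-1][1], end))
        else:
            windows.append((start, end))
    return windows


events = [
    Event("goal", 305.0),
    Event("throw_in", 400.0),
    Event("yellow_card", 312.0),
    Event("goal", 5.0),
]
windows = clip_windows(events)
```

Because the windows come from metadata timestamps, the expensive video analysis Escudero describes is skipped entirely; the trade-off is that clip quality depends on how accurately the person in the stadium logs each event.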

Your Network Wants to Talk to You About Something Serious

One gen AI application that offers a boon to streaming users, according to Jon Alexander, Akamai’s VP of products, is the ability to have more natural, conversational interactions with the data in their systems. “We have a reporting application today where we’ve introduced a chat interface where you can ask questions of your data,” he says. “We’re primarily using gen AI as a natural language interface for a customer to say, ‘Hey, tell me what’s going on in this segment of my network,’ or ‘What type of threats am I seeing over here?’ One of the areas personally that I feel is most interesting about gen AI is actually a user interface improvement. You can have more conversational interaction with your data with a system versus pushing buttons or writing very arcane instructions in some kind of predefined language.”

The output, much like Excel, can display the data in a number of formats, from table to line chart to bar chart. If you don’t like one output, you can request another.

This gen AI implementation took an open source model and fine-tuned it based on Akamai’s security solution. It can understand prompts in English, French, German, and Spanish, for starters. “It’s a chat interface that allows you to interact with your security data in the Akamai products,” says Alexander. This was designed for security engineers or site reliability engineers (technical users who are running infrastructure) or application developers who are accountable for the service that’s being supported. “They understand our web application firewall, bot management, or our microsegmentation services, and they’re using this gen AI capability to understand it better,” he notes.

How to Glow Up a Pipeline

“For most of the large sports customers or TV operators, one of the biggest pain points is setting up the flows that take their video through that entire live production video pipeline and then ultimately send it out to the end consumers,” says MediaKind’s Tank. “It is costly in terms of labor, it is error-prone, and the risk is that a mistake can take you offline.”

To address this problem, at NAB 2024, MediaKind introduced a new AI-driven pipeline management tool called Fleets & Flows. It has a natural language processing layer to enable video engineers to talk to their encoders. “Previously, it used to take an operator’s team 50 to maybe 60 clicks on five different screens to configure five or 10 different devices,” Tank says. “Now they can simply type, ‘I’m located in New Jersey, and I want to get this channel up and running.’”

MediaKind Fleets & Flows

What’s more, Tank adds, “An operator customer can manage all encoding devices, like the devices which accept the contribution feed, and then in turn send that feed uplink to the platform with the right amount of encoding, etc.”

As great as this sounds, it raises the question, “How do we make sure that the actions taken by the NLP or gen AI capability are accurate and aligned with what the human intent was? Our design and architecture is more human-intent based,” Tank explains. “We train and fine-tune our AI capabilities to understand the intent of that operator.” The AI model populates whatever needs to be configured, and then the operator is able to check to ensure they are happy with the result.

Talking to Your Media Supply Chain

The next gen AI use case involves a company replacing its entire user interface with a chatbot. The premise is that enterprise applications have a bigger learning curve than consumer products, and what better way to flatten that curve than by letting users talk directly to the media supply chain platform?

“Instead of having a requirement for highly trained operators or video engineers to operate our platform, now a very entry-level person can sit down and accomplish the same thing just by typing out what needs to happen,” says Dan Goman, CEO of media supply chain platform provider Ateliere. This seems to be easier said than done. Who will ensure that the right questions are being asked about what needs to happen?

Typical questions might be: Do I have this content? Do I have all of the right variations? Is it in a state that’s monetizable? Once those issues are resolved, Goman says, the operator can choose to send this to a platform. Before doing so, the system can check whether anything needs to be modified, asking, “Would you like to fix this, or can I fix it for you?”

The other use case Ateliere sees is an executive being able to query the costs for a piece of content to be processed. “In studio land, they always ask us what the operational overhead is so they know if they made the right deal or not,” says Goman. “We have all of the costs associated with fulfilling an order, and they could include things like transcoding costs, the cost of packaging that content, and delivery costs to provide a more accurate, more complete cost to business owners.”

Connect AI, the gen AI-augmented, natural language version of Ateliere’s Connect solution, is only an option; API access and the interface for the platform will still be available for customers who want a more advanced or traditional way of interacting with the platform.

Uploading assets in Ateliere Connect AI

Gen AI Ad Context

Advertising-centric gen AI use cases take contextual insight based on video, audio, and transcript analysis (among other things) and find the right placement and the right contextual ad. Previously, FAST programming had many irregular approaches to ad insertion, with some channels running ads every X minutes. “We figured out that there’s a much better way to identify key points where ads can be inserted, and we can also provide the context of the segment before the break point,” says Quickplay’s Narayanan. “We have done a pilot with one of the largest Spanish operators in Latin America.”

The Quickplay AI model performs analysis to determine the right place to put an ad break and requests a contextually matching ad.

Globant has been working on the same use case. “We built a solution where we receive an asset in a scene and then, after the scene, we break it down in frames and process them to identify what’s in that frame,” says Escudero. Customers use this metadata to look at the context of the content 30 seconds before an ad break so they can pass labels related to what’s happened in the content to the ad server.

I was curious to find out how this data is stored. The metadata lives in Globant’s CMS. Its data model for this can be configured to expand from 10 traditional fields to thousands of different vectors stored along with metadata of that particular asset, Narayanan says. “Some is user-readable, a set of predefined topic keywords in plain text. There is also a process to summarize a fingerprint for every video, and all of the information is stored in vector format. Think of it as a series of numbers that represent what the content is, and it’s easily stored as a set of numbers in our database,” he says.
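To show why storing a fingerprint as “a series of numbers” is useful, here is a small Python sketch that compares fingerprints with cosine similarity, a common way to measure how alike two vectors are. The three-number vectors and asset names are invented for the example; real fingerprints would have far more dimensions, and nothing here reflects how any vendor actually computes or stores them.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical fingerprints keyed by asset name.
fingerprints = {
    "cooking-show": [0.9, 0.1, 0.0],
    "baking-special": [0.8, 0.2, 0.1],
    "hockey-final": [0.0, 0.1, 0.9],
}


def most_similar(asset_id):
    """Return the other asset whose fingerprint is closest to the given one."""
    query = fingerprints[asset_id]
    scored = [
        (other, cosine_similarity(query, vector))
        for other, vector in fingerprints.items()
        if other != asset_id
    ]
    return max(scored, key=lambda pair: pair[1])[0]
```

Calling `most_similar("cooking-show")` picks the asset whose numbers point in nearly the same direction, which is the basic operation behind similarity search over this kind of metadata.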

This metadata can then provide the basis for building out consumer recommendations, transcription, translation, and other use cases that need deeper insight into the video content.

Buy and Sell Side

This next use case is an idea I was always skeptical about for bigger brands, because I figured there were too many brand guidelines to let gen AI create an ad. However, Waymark is targeting local and SMB advertisers and offering its self-serve tech to give smaller advertisers an option for easily designing creative, with multiple versions, in 5 minutes or less. First, it will use any content that exists on the advertiser’s website (the logo, the site copy, and style) and then let users pick a color palette and provide instructions about what kind of storyline to promote.

This product has been white-labeled by Paramount, Scripps, Spectrum Reach, and Fox, and now there are a few international contracts, including with Australia’s Nine and U.K. newspaper publisher National World, PLC. It helped garner more than $200 million in new revenue, giving a much-needed boost to ad revenue in streaming. Spectrum Reach and Waymark generated more than $27 million in revenue alone last year from AI-generated ads for local clients.

“We’ve built a ton of components that create video, like animations, transitions, end cards, and all of those pieces that make a complete video,” says Waymark CEO Alex Persky-Stern. Assets are merged with an AI-generated script, and customers can choose from 150 different voices for narration. The customers can revise as needed or revert to an older output. Waymark delivers to two different HD formats: one for broadcast and streaming, the other for social.

How Waymark works

On the sell side, Operative provides ad management solutions for larger-dollar monetization campaigns. One of the complex puzzles for anybody sitting on a lot of inventory is, “How much of that inventory do I sell direct through my direct sales team? How much do I move to programmatic? How do I optimize inventory for different channels for different advertisers? How do I respond to an unstructured RFP?” says Ben Tatta, chief commercial officer at Operative, whose platform helps many large media companies package and sell advertising.

Media planners can compose an entire plan on where to buy advertising, whom to target, budgeting, and many more details in a conversation with Operative’s chatbot, Adeline, and it will populate requirements within the software. You can even upload a voicemail. After a plan is generated, the planner goes through and makes modifications.

“A really good plan for BMW starts by pulling all of the historical plans for BMW and/or [similar] manufacturers,” says Tatta. This increases the hit rate from the pre-AI 50%–60% average to 80% or above and puts companies with a lot of data in a much better position. “We have a very structured taxonomy for tens of thousands of data elements, which could be how you define an audience, a campaign,” notes Tatta.

When a campaign hits delivery targets, it can be shut down, and the next campaign can begin.

Recommendations

What’s the most common request from customers in the streaming world for what gen AI can deliver? As you might expect, it’s content recommendations.

“We are close to deploying recommendations for our customers in Q4,” says Narayanan. “The way it works is, you have the catalog information that is passed to the gen AI model. You capture the usage information: this person watched this content. The catalog and the consumption information are merged together, based on that mapping, and now you are able to figure out what is the likely content that the user is going to engage next.” Customers can optimize for either watch time or completion rates for what’s up next.
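The merge of catalog and consumption data that Narayanan describes can be illustrated with a deliberately simple Python sketch. It swaps the gen AI model for a plain genre-overlap score, so it shows only the data flow and is not Quickplay’s method; the catalog entries, the watch log, and the `recommend` function are all hypothetical.

```python
from collections import Counter

# Hypothetical catalog: title ID -> metadata.
catalog = {
    "t1": {"genres": {"drama", "crime"}},
    "t2": {"genres": {"comedy"}},
    "t3": {"genres": {"crime", "thriller"}},
    "t4": {"genres": {"drama"}},
}

# Hypothetical consumption log: (user ID, title ID) pairs.
watch_log = [("user-1", "t1")]


def recommend(user_id, k=2):
    """Rank unwatched titles by how many genres they share with the user's history."""
    watched = {title for user, title in watch_log if user == user_id}
    taste = Counter(g for title in watched for g in catalog[title]["genres"])
    scored = []
    for title, meta in catalog.items():
        if title in watched:
            continue
        score = sum(taste[g] for g in meta["genres"])
        if score > 0:
            scored.append((score, title))
    # Highest score first; ties broken by title ID for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:k]]
```

Optimizing for watch time or completion rate, as mentioned above, would mean replacing the genre-overlap score with a prediction of that metric.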

Being Responsible

What has become clearer in the gen AI era is that talking to your technology will now become much more prevalent. What also is clear is that not all customers understand how AI operates. They believe you put an algorithm in with sample data, and it works. But you have to do A/B testing with AI and expect to iterate through approaches. “You need to test multiple samples that are multiple gen AI algorithms and then find that one with the proper results,” says Escudero. “We need to educate customers about how to think about AI.”

Other important issues (beyond the scope of this article) are copyright within the data that LLMs are trained on and fair representation on a wide range of topics, such as race, sex, age, location, and so many more. Unfortunately, AI models are no less biased than the humans who trained them. Ethics is another important topic I didn’t look into here, because we needed to first look at use cases. But these other areas are just as important, if not more so.

Use cases are what will show the media and entertainment industry what some of the possibilities are, and there are many more than what I covered here (like localization). One company identified five areas in which to bring value to its customers:

  • Increasing monetization opportunities
  • Driving user engagement
  • Enhancing discovery
  • Increasing reach
  • Lowering costs

Another company focused on making its product easier to use and controlling costs. A third focused on allowing its customers to make data queries that are specific to their problem.

As in the Industrial Revolution, opinions are sharply divided on what gen AI will do to the nature of work. One thing is for sure: You need to both closely parse what your chatbots assume and have wide-ranging conversations about how gen AI will work for your particular use case.
