WEBVTT

00:00:00.000 --> 00:00:02.720
Imagine just for a second that you are the CEO

00:00:02.720 --> 00:00:06.750
of a massive multinational company. OK. Or maybe

00:00:06.750 --> 00:00:09.009
you're an A-list celebrity, right? You just

00:00:09.009 --> 00:00:11.429
starred in the biggest summer blockbuster of

00:00:11.429 --> 00:00:13.750
the year. You wake up, you pour your morning

00:00:13.750 --> 00:00:16.829
coffee and you sit down with one burning question.

00:00:17.129 --> 00:00:19.390
What is everyone saying about me? Exactly. What

00:00:19.390 --> 00:00:21.969
is the entire world saying about me today? Now,

00:00:21.989 --> 00:00:24.370
you obviously don't just you don't pull out your

00:00:24.370 --> 00:00:26.649
phone and Google yourself. No, you don't have

00:00:26.649 --> 00:00:28.010
the time for that. You don't have the time. And

00:00:28.010 --> 00:00:30.850
honestly, the sheer volume of information out

00:00:30.850 --> 00:00:33.229
there would be completely paralyzing. You'd never.

00:00:33.289 --> 00:00:36.250
Right. It's just too much data. So instead, you

00:00:36.250 --> 00:00:40.329
rely on an invisible, massive, and highly sophisticated

00:00:40.329 --> 00:00:44.390
industry to do that scrolling for you. Welcome

00:00:44.390 --> 00:00:48.909
to today's deep dive. We are exploring the hidden,

00:00:48.969 --> 00:00:52.170
surprisingly high-stakes world of media monitoring

00:00:52.170 --> 00:00:54.149
services. Or if you want to use the historical

00:00:54.149 --> 00:00:56.030
term, press clipping bureaus. Press clipping

00:00:56.030 --> 00:01:00.789
bureaus. I love that. It is a massive, invisible

00:01:00.789 --> 00:01:03.780
infrastructure. Right. And what makes this topic

00:01:03.780 --> 00:01:06.260
so compelling is that while the underlying technology

00:01:06.260 --> 00:01:09.239
has completely transformed, I mean, we're talking

00:01:09.239 --> 00:01:13.040
a shift from literal scissors and glue to highly

00:01:13.040 --> 00:01:15.760
complex artificial intelligence driven scraping

00:01:15.760 --> 00:01:19.379
algorithms, the core human desire driving it

00:01:19.379 --> 00:01:21.680
hasn't changed a bit. It really hasn't. We all

00:01:21.680 --> 00:01:24.840
have this inherent need to understand, and ideally

00:01:24.840 --> 00:01:27.439
control our own narrative out in the wild. Okay,

00:01:27.500 --> 00:01:30.379
let's unpack this. For this deep dive, our map

00:01:30.379 --> 00:01:33.280
is the Wikipedia article detailing the history,

00:01:33.519 --> 00:01:35.640
the technological leaps, and the rather intense

00:01:35.640 --> 00:01:38.299
legal battles surrounding media monitoring. Yes,

00:01:38.420 --> 00:01:40.319
the legal side gets very interesting. Oh, it

00:01:40.319 --> 00:01:42.400
really does. And our mission today is to trace

00:01:42.400 --> 00:01:44.540
this evolution. We are going to track how the

00:01:44.540 --> 00:01:47.340
flow of information went from 19th century Parisian

00:01:47.340 --> 00:01:50.180
stage actors buying physical paper cutouts of

00:01:50.180 --> 00:01:52.219
their own reviews. Which is such a great story.

00:01:52.400 --> 00:01:54.280
Right, all the way to the modern AI-driven

00:01:54.280 --> 00:01:57.180
media intelligence bots of today. Bots that we

00:01:57.180 --> 00:01:59.000
should probably add right up front are currently

00:01:59.000 --> 00:02:01.439
sparking international copyright lawsuits. It

00:02:01.439 --> 00:02:03.439
is quite the journey. We're really looking at

00:02:03.439 --> 00:02:06.400
a fundamental shift in how human beings interact

00:02:06.400 --> 00:02:09.639
with mass media. And as we dig into this, you'll

00:02:09.639 --> 00:02:12.180
see how the methods we use to track information

00:02:12.180 --> 00:02:14.900
actually end up shaping the architecture of the

00:02:14.900 --> 00:02:16.719
information itself. I want to start with the

00:02:16.719 --> 00:02:19.699
basics, though. If we strip away the modern tech

00:02:19.699 --> 00:02:23.620
talk, what exactly is a media monitoring service

00:02:23.620 --> 00:02:26.060
at its core? Because it seems like it could mean

00:02:26.060 --> 00:02:28.159
a lot of different things depending on who is

00:02:28.159 --> 00:02:30.419
paying the bill. At its foundation, it's simply

00:02:30.419 --> 00:02:32.419
a service that provides clients with copies of

00:02:32.419 --> 00:02:34.919
media content that is of specific interest to

00:02:34.919 --> 00:02:37.280
them. Simple enough. But you hit the nail on

00:02:37.280 --> 00:02:39.860
the head regarding the demand. The service has

00:02:39.860 --> 00:02:42.900
to be incredibly malleable. For one client, it

00:02:42.900 --> 00:02:45.419
might be providing raw documentation of every

00:02:45.419 --> 00:02:47.479
single time their company is mentioned. Just

00:02:47.479 --> 00:02:50.330
a data dump. Exactly. But for another, it might

00:02:50.330 --> 00:02:53.069
be a detailed statistical analysis of their brand

00:02:53.069 --> 00:02:55.650
sentiment. It could involve tracking editorial

00:02:55.650 --> 00:02:58.990
opinion across an entire geographic region, or

00:02:58.990 --> 00:03:01.629
simply keeping tabs on what one specific competitor

00:03:01.629 --> 00:03:04.120
is doing in trade journals. And it's not just

00:03:04.120 --> 00:03:07.340
PR news, is it? I imagine advertisers are heavily

00:03:07.340 --> 00:03:09.419
reliant on this kind of tracking as well. Absolutely.

00:03:09.520 --> 00:03:11.159
Tracking advertising content, you know, where

00:03:11.159 --> 00:03:13.659
it's placed, how often it runs, what the competitors

00:03:13.659 --> 00:03:15.979
are spending. That is a huge sector of this industry.

00:03:16.300 --> 00:03:19.400
But to really understand how massive this undertaking

00:03:19.400 --> 00:03:21.960
is today, we have to look at how it started.

00:03:22.180 --> 00:03:25.219
The analog era. Right. And that means going back

00:03:25.219 --> 00:03:27.840
to a time when information was entirely physical.

00:03:28.840 --> 00:03:32.219
We are talking about the mid to late 19th century.

00:03:32.810 --> 00:03:35.229
Paint a picture for us. What does the media landscape

00:03:35.229 --> 00:03:37.710
look like at this point? And who actually realizes

00:03:37.710 --> 00:03:40.270
there's a business opportunity here? Well, put

00:03:40.270 --> 00:03:42.930
yourself in London in the 1850s. Mass media is

00:03:42.930 --> 00:03:45.490
essentially limited to print. But the volume

00:03:45.490 --> 00:03:47.289
of that print is suddenly exploding. Because

00:03:47.289 --> 00:03:49.229
of the telegraph, right? Yes, you have the invention

00:03:49.229 --> 00:03:51.289
of the telegraph, the laying of submarine cables.

00:03:51.569 --> 00:03:53.969
News that used to take weeks to arrive by ship

00:03:53.969 --> 00:03:56.669
is now crossing oceans in minutes. Consequently,

00:03:56.669 --> 00:03:58.810
newspapers are just booming. Right. There is

00:03:58.810 --> 00:04:01.750
so much being printed that no single person or

00:04:01.750 --> 00:04:04.490
even a dedicated team at a company could possibly

00:04:04.490 --> 00:04:06.930
read it all. It's just an overwhelming flood

00:04:06.930 --> 00:04:10.400
of paper. Exactly. And that overwhelming flood

00:04:10.400 --> 00:04:13.620
is what prompted a man named Henry Romeike and

00:04:13.620 --> 00:04:16.500
a news dealer named Curtice to establish the very

00:04:16.500 --> 00:04:19.740
first press clipping agency in London in 1852.

00:04:20.300 --> 00:04:22.220
So they essentially looked at this mountain of

00:04:22.220 --> 00:04:25.279
broadsheets and realized people would pay to

00:04:25.279 --> 00:04:27.560
have someone else do the reading. Precisely.

00:04:27.560 --> 00:04:29.699
But the story that I find absolutely hilarious

00:04:29.699 --> 00:04:32.720
happens a little later over in France. 1879,

00:04:32.860 --> 00:04:36.060
Paris. A guy named Alfred Cherie starts an agency

00:04:36.060 --> 00:04:38.620
called L'Argus de la Presse. And his target

00:04:38.620 --> 00:04:41.360
demographic wasn't big business. No, his business

00:04:41.360 --> 00:04:44.300
model was brilliant and it leaned entirely into

00:04:44.300 --> 00:04:47.220
human ego. He offered a press clipping service

00:04:47.220 --> 00:04:49.860
specifically targeted at Parisian actors. Which

00:04:49.860 --> 00:04:52.459
makes perfect sense. Think about it. You are

00:04:52.459 --> 00:04:55.180
the star of a play in Paris. You want to read

00:04:55.180 --> 00:04:57.220
your reviews, but you don't want to buy 10 different

00:04:57.220 --> 00:04:59.360
bulky newspapers every morning and comb through

00:04:59.360 --> 00:05:02.120
pages of politics and grain prices just to find

00:05:02.120 --> 00:05:04.740
the one paragraph about your performance. Cherie's

00:05:04.740 --> 00:05:07.319
pitch was essentially, pay me and I will hand

00:05:07.319 --> 00:05:09.620
you a neat little stack of paper that is entirely

00:05:09.620 --> 00:05:12.509
about you. It's the 19th century equivalent of

00:05:12.509 --> 00:05:14.550
doom-scrolling your own Twitter mentions. That

00:05:14.550 --> 00:05:17.610
is exactly what it was, a pure vanity service.

00:05:18.189 --> 00:05:21.329
And that vanity aspect was the engine that

00:05:21.329 --> 00:05:24.290
kick-started the early industry. Initially, these

00:05:24.290 --> 00:05:26.250
clipping services catered almost exclusively

00:05:26.250 --> 00:05:29.949
to actors, wealthy tycoons, and socialites. It

00:05:29.949 --> 00:05:33.290
was entirely about personal ego, public image,

00:05:33.430 --> 00:05:36.269
and high society gossip. But obviously a business

00:05:36.269 --> 00:05:39.470
model that good doesn't stay confined to Parisian

00:05:39.470 --> 00:05:42.350
theaters. How fast did this spread? Very quickly.

00:05:42.730 --> 00:05:45.449
By 1885, the National Press Intelligence Company

00:05:45.449 --> 00:05:47.550
launched in New York. Just a few years later.

00:05:47.670 --> 00:05:50.189
Yeah. And by 1899, you have over a dozen of these

00:05:50.189 --> 00:05:51.870
clipping services operating across the United

00:05:51.870 --> 00:05:53.910
States alone. What's really interesting is that

00:05:53.910 --> 00:05:55.569
they quickly realized they couldn't possibly

00:05:55.569 --> 00:05:57.970
buy every local paper in the country. So they

00:05:57.970 --> 00:05:59.970
actually formed a cooperative network. Oh, that's

00:05:59.970 --> 00:06:02.430
smart. A bureau in New York would trade clippings

00:06:02.430 --> 00:06:04.709
with a bureau in Chicago to increase their geographic

00:06:04.709 --> 00:06:07.579
range. But the clientele must have evolved, right?

00:06:07.800 --> 00:06:10.360
You can't build a massive global industry just

00:06:10.360 --> 00:06:13.360
by feeding the egos of theater actors and tycoons.

00:06:13.560 --> 00:06:16.540
When do the real heavy hitters enter the picture?

00:06:16.740 --> 00:06:19.660
The shift happens as the 20th century progresses.

00:06:20.199 --> 00:06:23.480
By the 1930s, the primary users of these services

00:06:23.480 --> 00:06:26.629
underwent a dramatic change. The vanity crowd

00:06:26.629 --> 00:06:28.889
was still there, but the bulk of the subscriptions

00:06:28.889 --> 00:06:31.350
were now coming from big businesses. Makes sense.

00:06:31.529 --> 00:06:33.970
Government agencies started subscribing to track

00:06:33.970 --> 00:06:37.279
public sentiment on policies. Even other newspapers

00:06:37.279 --> 00:06:39.379
subscribed to these bureaus just to track what

00:06:39.379 --> 00:06:41.360
their rival papers in different cities were reporting.

00:06:41.560 --> 00:06:43.879
It became foundational corporate intelligence.

00:06:44.360 --> 00:06:46.699
And the consolidation of that power was staggering.

00:06:46.920 --> 00:06:50.480
By 1932, just two companies, the Romeike Company,

00:06:50.660 --> 00:06:53.360
which had expanded from London, and Luce's Press

00:06:53.360 --> 00:06:57.100
Clipping Bureau, they monopolized 80% of the

00:06:57.100 --> 00:07:00.399
entire U.S. market. 80%. That is a massive amount

00:07:00.399 --> 00:07:02.360
of intelligence flowing through just two corporate

00:07:02.360 --> 00:07:05.000
bottlenecks. It is. Which brings us to the actual

00:07:05.000 --> 00:07:07.220
physical logistics of this operation. Because

00:07:07.220 --> 00:07:09.399
when we say clipping business, we mean that completely

00:07:09.399 --> 00:07:11.500
literally. Right. I want you to really visualize

00:07:11.500 --> 00:07:13.699
this for a second. Imagine you are running a

00:07:13.699 --> 00:07:16.699
business in the 1930s. You want to know if your

00:07:16.699 --> 00:07:19.740
new product is landing well in the Midwest. Today,

00:07:19.860 --> 00:07:22.660
you would perform a Ctrl+F keyword search

00:07:22.660 --> 00:07:25.199
on your computer and software would aggregate

00:07:25.199 --> 00:07:28.480
thousands of articles in a millisecond. In the

00:07:28.480 --> 00:07:31.120
1930s, you were relying on a room full of human

00:07:31.120 --> 00:07:33.720
beings armed with literal scissors and paste.

00:07:34.319 --> 00:07:36.899
Walk us through how this massive human search

00:07:36.899 --> 00:07:40.540
engine actually functioned. It was a highly regimented,

00:07:40.560 --> 00:07:44.220
factory-like operation with a very strict division

00:07:44.220 --> 00:07:47.160
of labor. If you walked into one of these bureaus,

00:07:47.180 --> 00:07:48.620
the first thing you'd see is the reading department.

00:07:49.439 --> 00:07:52.019
Women were primarily employed to sit at large

00:07:52.019 --> 00:07:54.519
desks, meticulously scanning through mountains

00:07:54.519 --> 00:07:57.480
of daily periodicals, newspapers, and trade journals.

00:07:57.680 --> 00:08:00.019
They had to memorize what exactly? Hundreds of

00:08:00.019 --> 00:08:02.920
client names, specific brands, and keywords. When

00:08:02.920 --> 00:08:04.920
they spotted a mention, they would mark it with

00:08:04.920 --> 00:08:07.240
a colored pencil. And that's just step one. What

00:08:07.240 --> 00:08:10.040
happens to the marked paper? The marked periodicals

00:08:10.040 --> 00:08:11.699
were then passed to the next department, which

00:08:11.699 --> 00:08:14.480
was typically staffed by men. Their entire job

00:08:14.480 --> 00:08:17.379
was to physically take shears, cut out the marked

00:08:17.379 --> 00:08:19.480
articles, and paste them onto specific slips

00:08:19.480 --> 00:08:22.360
of paper. And these slips had to be labeled so

00:08:22.360 --> 00:08:24.860
they didn't lose the context, right? Yes. These

00:08:24.860 --> 00:08:27.040
slips were pre -stamped with the date and the

00:08:27.040 --> 00:08:29.220
publication's name so the context wasn't lost.

00:08:29.600 --> 00:08:31.879
Imagine the sensory experience of that room.

00:08:32.340 --> 00:08:35.240
The constant rustling of giant broadsheet pages,

00:08:35.399 --> 00:08:38.259
the nonstop snipping of hundreds of scissors,

00:08:38.559 --> 00:08:41.620
the heavy smell of the glue. It was

00:08:41.620 --> 00:08:44.360
industrial-scale reading. And after the pasting, the slips

00:08:44.360 --> 00:08:46.740
went back to another department of women who

00:08:46.740 --> 00:08:49.139
would sort these individual clippings into specific

00:08:49.139 --> 00:08:52.500
client batches, package them into manila envelopes,

00:08:52.500 --> 00:08:54.460
and physically mail them out. It's just wild

00:08:54.460 --> 00:08:56.740
to think about. You are waiting days, sometimes

00:08:56.740 --> 00:08:59.440
weeks, for a physical envelope to arrive in the

00:08:59.440 --> 00:09:01.559
mail just to know what happened in the world

00:09:01.559 --> 00:09:03.919
last Tuesday. And that highly manual physical

00:09:03.919 --> 00:09:07.019
process remained the standard for decades. It

00:09:07.019 --> 00:09:09.019
was entirely constrained by the physical nature

00:09:09.019 --> 00:09:11.779
of print media. Until it wasn't. Exactly. Yeah.

00:09:11.919 --> 00:09:14.039
As we know, the media landscape was about to

00:09:14.039 --> 00:09:16.179
undergo a seismic shift. Here's where it gets

00:09:16.179 --> 00:09:18.340
really interesting. You move further into the

00:09:18.340 --> 00:09:21.539
20th century and suddenly news isn't just inked

00:09:21.539 --> 00:09:25.120
on a page. We enter the broadcast era. Information

00:09:25.120 --> 00:09:27.720
is coming through the radio and shortly after

00:09:27.720 --> 00:09:30.340
beaming directly into living rooms via television.

00:09:30.620 --> 00:09:33.340
Right. How on earth does a clipping bureau adapt

00:09:33.340 --> 00:09:36.139
to that? You can't take scissors to a radio wave.

00:09:36.340 --> 00:09:38.480
That was the existential crisis for the industry.

00:09:39.559 --> 00:09:42.860
Broadcast media is ephemeral. Once the words

00:09:42.860 --> 00:09:46.580
are spoken on the radio, they vanish. Press clipping

00:09:46.580 --> 00:09:48.740
agencies had to figure out how to capture lightning

00:09:48.740 --> 00:09:51.320
in a bottle. So what did they do? This transition

00:09:51.320 --> 00:09:53.559
was only made possible by the invention and widespread

00:09:53.559 --> 00:09:55.779
commercialization of audio and videotape recording

00:09:55.779 --> 00:09:59.159
systems in the 1950s and 1960s. So instead of

00:09:59.159 --> 00:10:00.799
a room full of people reading papers, you have

00:10:00.799 --> 00:10:02.299
a room full of people listening to tape decks.

00:10:02.720 --> 00:10:05.259
Precisely. They would record the broadcasts,

00:10:05.259 --> 00:10:07.740
have human monitors sit with headphones and listen

00:10:07.740 --> 00:10:10.299
for client mentions, and then physically type

00:10:10.299 --> 00:10:12.139
out transcripts or summaries of the relevant

00:10:12.139 --> 00:10:15.500
sections. That sounds incredibly tedious, but

00:10:15.500 --> 00:10:17.740
I imagine it fundamentally changed the way people

00:10:17.740 --> 00:10:20.000
viewed information. Once you start isolating

00:10:20.000 --> 00:10:22.299
a spoken quote from a 30-minute news broadcast,

00:10:22.700 --> 00:10:25.220
you are treating information differently. If

00:10:25.220 --> 00:10:27.559
we connect this to the bigger picture, this is

00:10:27.559 --> 00:10:29.539
actually one of the most crucial turning points

00:10:29.539 --> 00:10:33.200
in the history of information technology. The

00:10:33.200 --> 00:10:35.940
very concept of clipping, this fundamental idea

00:10:35.940 --> 00:10:39.220
that a specific piece of text or data could be

00:10:39.220 --> 00:10:41.899
isolated, extracted from its original publication

00:10:41.899 --> 00:10:45.320
context, and evaluated entirely on its own, was

00:10:45.320 --> 00:10:48.360
revolutionary. This conceptual leap directly

00:10:48.360 --> 00:10:51.379
influenced the design of early digital news databases.

00:10:51.820 --> 00:10:53.700
Wait, really? So the physical act of clipping

00:10:53.700 --> 00:10:56.620
a newspaper paved the way for things like LexisNexis?

00:10:56.759 --> 00:10:59.580
Exactly. The mindset of a press clipping bureau

00:10:59.580 --> 00:11:01.620
introduced the corporate world to the concept

00:11:01.620 --> 00:11:04.019
of keyword searching long before computers were

00:11:04.019 --> 00:11:06.360
doing it. Before you could search a digital database

00:11:06.360 --> 00:11:09.059
for a term, these bureaus were manually performing

00:11:09.059 --> 00:11:12.019
keyword searches across physical media and compiling

00:11:12.019 --> 00:11:14.519
the results. That makes total sense. They essentially

00:11:14.519 --> 00:11:17.399
trained the business world to expect information

00:11:17.399 --> 00:11:20.679
to be searchable and extractable. So when digital

00:11:20.679 --> 00:11:23.320
architecture was being built, it was designed

00:11:23.320 --> 00:11:25.580
to mimic the output of a press clipping bureau.

00:11:25.740 --> 00:11:28.419
That is fascinating. The analog scissors literally

00:11:28.419 --> 00:11:31.480
shaped the digital search bar, which brings us

00:11:31.480 --> 00:11:33.600
right to the doorstep of the internet boom in

00:11:33.600 --> 00:11:36.460
the 1990s. This had to be a do-or-die moment

00:11:36.460 --> 00:11:39.460
for those legacy clipping companies. If you were

00:11:39.460 --> 00:11:42.360
still mailing out pasted slips of paper in 1996,

00:11:42.940 --> 00:11:45.080
you were in trouble. It was an adapt-or-perish

00:11:45.080 --> 00:11:48.539
decade. A great example of this evolution is

00:11:48.539 --> 00:11:51.000
the Universal Press Clipping Bureau out in Omaha,

00:11:51.139 --> 00:11:53.820
Nebraska. They had been a staple of the industry

00:11:53.820 --> 00:11:57.700
operating since 1908. But in the 1990s, to survive

00:11:57.700 --> 00:12:00.279
and reflect the new digital reality, they had

00:12:00.279 --> 00:12:02.580
to entirely rebrand. What did they change it

00:12:02.580 --> 00:12:04.840
to? They changed their name to Universal Information

00:12:04.840 --> 00:12:07.480
Services. It was a signal to the market that

00:12:07.480 --> 00:12:09.159
they weren't just clipping anymore. They were

00:12:09.159 --> 00:12:12.360
managing digital data streams. And the pace of

00:12:12.360 --> 00:12:15.919
that digital shift is staggering. By 1998, websites

00:12:15.919 --> 00:12:18.240
like WebClipping launched, becoming some of the

00:12:18.240 --> 00:12:21.240
very first to monitor Internet-based news. And

00:12:21.240 --> 00:12:23.539
from there, it's an absolute gold rush. It really

00:12:23.539 --> 00:12:26.519
was. To give you an idea of how fast this exploded:

00:12:26.519 --> 00:12:29.440
by 2012, researchers at Gartner estimated there

00:12:29.440 --> 00:12:32.399
were more than 250 different vendors doing just

00:12:32.399 --> 00:12:35.559
social media monitoring. It went from a few legacy

00:12:35.559 --> 00:12:38.580
companies with scissors to hundreds of tech startups

00:12:38.580 --> 00:12:41.450
overnight. The barriers to entry dropped significantly

00:12:41.450 --> 00:12:44.029
once the physical element was removed, and the

00:12:44.029 --> 00:12:46.190
sheer volume of data on the Internet demanded

00:12:46.190 --> 00:12:49.169
automated solutions. So what does this all mean

00:12:49.169 --> 00:12:51.929
for how the machine actually works today? We've

00:12:51.929 --> 00:12:54.649
gone from cut-and-clip sweatshops to massive

00:12:54.649 --> 00:12:57.669
server farms. How does a modern media monitoring

00:12:57.669 --> 00:13:00.450
service operate? Well, first, the industry has

00:13:00.450 --> 00:13:02.870
entirely rebranded itself to match its technological

00:13:02.870 --> 00:13:05.649
capabilities. You will rarely hear the term clipping

00:13:05.649 --> 00:13:08.019
bureau anymore. Today, it's referred to as information

00:13:08.019 --> 00:13:11.019
logistics or media intelligence. Media intelligence.

00:13:11.080 --> 00:13:14.580
It sounds so covert. It does. And the delivery

00:13:14.580 --> 00:13:17.360
methods are completely transformed. Clients aren't

00:13:17.360 --> 00:13:19.379
waiting for the mailman. We are talking about

00:13:19.379 --> 00:13:21.659
customized dashboards, instant email alerts,

00:13:21.779 --> 00:13:24.120
keeping clients updated minute by minute as news

00:13:24.120 --> 00:13:26.600
breaks and auto analysis formats, where vast

00:13:26.600 --> 00:13:28.940
amounts of data are visually graphed to show

00:13:28.940 --> 00:13:31.740
sentiment trends over time. And the tools driving

00:13:31.740 --> 00:13:33.940
this are names a lot of people might recognize.

00:13:34.259 --> 00:13:36.519
On the consumer side, you have Google Alerts.

00:13:36.519 --> 00:13:38.500
But on the massive enterprise side, you have

00:13:38.500 --> 00:13:41.480
platforms like Cision, Meltwater, Medinet, and

00:13:41.480 --> 00:13:44.580
Muck Rack. Yes. And the mechanics of how these

00:13:44.580 --> 00:13:47.299
systems capture the mentions are incredibly sophisticated,

00:13:47.580 --> 00:13:50.960
though they vary by medium. Let's look at television

00:13:50.960 --> 00:13:53.590
today. In the United States, television monitoring

00:13:53.590 --> 00:13:56.610
companies use a really clever workaround. They

00:13:56.610 --> 00:13:59.029
aren't just having people watch TV all day. No.

00:13:59.129 --> 00:14:01.470
Instead of having humans watch hundreds of channels,

00:14:01.669 --> 00:14:04.289
they capture and index the closed captioning

00:14:04.289 --> 00:14:07.370
text of the broadcasts. Once you turn the audio

00:14:07.370 --> 00:14:09.970
into a text file, it's easily searchable for

00:14:09.970 --> 00:14:12.250
client references. But surely they don't rely

00:14:12.250 --> 00:14:15.409
completely on the automated text. Closed captioning

00:14:15.409 --> 00:14:17.830
can be notoriously messy. That's a great point,

00:14:17.909 --> 00:14:19.789
and it's why the industry is actually split on

00:14:19.789 --> 00:14:22.350
this. Some TV monitoring companies still employ

00:14:22.350 --> 00:14:25.009
human monitors who review the video and write

00:14:25.009 --> 00:14:28.029
abstracts of the program content to ensure context

00:14:28.029 --> 00:14:30.389
is captured. Okay, that makes sense. But others

00:14:30.389 --> 00:14:32.870
rely entirely on those automated search programs

00:14:32.870 --> 00:14:35.870
to index the stories. It often comes down to

00:14:35.870 --> 00:14:38.190
what the client is willing to pay for speed versus

00:14:38.190 --> 00:14:40.649
nuanced accuracy. And what about the Internet?

00:14:41.129 --> 00:14:44.309
The sheer volume of online media is mind-boggling.

00:14:44.370 --> 00:14:46.350
You can't have humans reading every blog and

00:14:46.350 --> 00:14:48.649
news site. For online monitoring, it is entirely

00:14:48.649 --> 00:14:51.350
automated. These services utilize software known

00:14:51.350 --> 00:14:54.470
as spiders or bots. These bots are programmed

00:14:54.470 --> 00:14:57.370
to constantly crawl the web, monitoring the content

00:14:57.370 --> 00:14:59.990
of free online news sources: newspapers, magazines,

00:15:00.269 --> 00:15:03.230
trade journals, TV station websites, news syndication

00:15:03.230 --> 00:15:05.769
feeds. The bots grab the links and often pull

00:15:05.769 --> 00:15:08.309
the text versions of the articles directly into

00:15:08.309 --> 00:15:10.899
the client's dashboard in real time. That sounds

00:15:10.899 --> 00:15:13.120
incredibly efficient, but there's no way a bot

00:15:13.120 --> 00:15:15.700
understands context, right? I mean, if I'm tracking

00:15:15.700 --> 00:15:17.899
the tech company Apple, how does the bot know

00:15:17.899 --> 00:15:19.980
an article isn't just about a local farmer's

00:15:19.980 --> 00:15:21.940
market having a great apple harvest? That is

00:15:21.940 --> 00:15:25.039
the exact flaw in the modern system. The automated

00:15:25.039 --> 00:15:27.100
bot results are pulled in at lightning speed,

00:15:27.259 --> 00:15:29.519
but they aren't always verified for accuracy

00:15:29.519 --> 00:15:32.980
by the monitoring service. A bot lacks human

00:15:32.980 --> 00:15:35.960
discernment. Furthermore, there is a massive

00:15:35.960 --> 00:15:39.240
structural blind spot. What's that? Not all print

00:15:39.240 --> 00:15:42.460
content makes it online. And conversely, a lot

00:15:42.460 --> 00:15:44.759
of web content never appears in physical print.

00:15:45.299 --> 00:15:48.620
If you rely solely on a digital bot, scraping

00:15:48.620 --> 00:15:52.440
the web, you might miss a crucial piece of

00:15:52.440 --> 00:15:54.639
physical journalism and vice versa. Which is

00:15:54.639 --> 00:15:57.500
a huge deal when you consider who is relying

00:15:57.500 --> 00:15:59.940
on this data. If you are a listener thinking,

00:16:00.100 --> 00:16:02.620
why does this matter to me? Just look at the

00:16:02.620 --> 00:16:05.580
use cases today. Every single public relations

00:16:05.580 --> 00:16:08.639
team on earth uses this. They use it to track

00:16:08.639 --> 00:16:11.279
their own publicity, to monitor what their competitors

00:16:11.279 --> 00:16:13.620
are doing, and to prove to their bosses that

00:16:13.620 --> 00:16:15.600
their marketing campaigns are actually reaching

00:16:15.600 --> 00:16:18.299
the target audience. And it goes way beyond corporate

00:16:18.299 --> 00:16:21.019
PR. Imagine you're a local mayor or a federal

00:16:21.019 --> 00:16:23.559
agency director. City, state, and federal governments

00:16:23.559 --> 00:16:26.059
are massive clients of these services. Wow, I

00:16:26.059 --> 00:16:28.230
didn't even think about that. They rely on media

00:16:28.230 --> 00:16:30.230
intelligence to stay informed about regional

00:16:30.230 --> 00:16:32.450
issues they couldn't possibly monitor manually.

00:16:32.669 --> 00:16:35.149
They use the data to verify that the public safety

00:16:35.149 --> 00:16:37.169
information they are putting out is accurate,

00:16:37.269 --> 00:16:39.470
accessible in multiple formats, and actually

00:16:39.470 --> 00:16:41.580
reaching the communities that need it. It's such

00:16:41.580 --> 00:16:44.379
a vital infrastructure that the industry actually

00:16:44.379 --> 00:16:47.500
has its own global trade associations to establish

00:16:47.500 --> 00:16:50.539
best practices. You have the North American Conference

00:16:50.539 --> 00:16:52.519
of Press Clipping Services, the International

00:16:52.519 --> 00:16:55.960
Association of Broadcast Monitors. This is a

00:16:55.960 --> 00:16:58.840
highly formalized, heavily entrenched global

00:16:58.840 --> 00:17:01.340
enterprise. But as with any industry that scales

00:17:01.340 --> 00:17:03.940
this massively by utilizing automated technology,

00:17:04.319 --> 00:17:07.059
it eventually runs headfirst into a brick wall.

00:17:07.259 --> 00:17:10.190
In this case, that wall was the law. Specifically,

00:17:10.210 --> 00:17:12.970
the intense friction between automated web scraping

00:17:12.970 --> 00:17:15.670
technology and existing copyright legislation.

00:17:15.970 --> 00:17:18.869
The legal minefield. Because this raises a fascinating

00:17:18.869 --> 00:17:21.130
question. Who actually owns a snippet of news?

00:17:21.369 --> 00:17:23.529
If a bot reads an article and shows me a

00:17:23.529 --> 00:17:26.009
two-sentence summary, is that theft? In 2012, this

00:17:26.009 --> 00:17:28.569
tension exploded into two parallel lawsuits that

00:17:28.569 --> 00:17:30.789
hit the industry simultaneously. One in the United

00:17:30.789 --> 00:17:32.789
States and one in the United Kingdom. Right.

00:17:32.890 --> 00:17:35.150
And what's wild is that both cases involved the

00:17:35.150 --> 00:17:37.589
exact same defendant, the media monitoring giant

00:17:37.589 --> 00:17:40.480
Meltwater Group. This raises an important question

00:17:40.480 --> 00:17:42.900
about how intellectual property is handled in

00:17:42.900 --> 00:17:46.220
the digital age, where copying data is effortless.

00:17:46.619 --> 00:17:50.200
The core legal battle in both jurisdictions revolved

00:17:50.200 --> 00:17:53.299
around two things, the concept of temporary copies

00:17:53.299 --> 00:17:56.420
and the actual business model of generating snippets

00:17:56.420 --> 00:17:58.450
for clients. Let's break that down, starting

00:17:58.450 --> 00:18:01.029
with the U.S. case. This was Associated Press

00:18:01.029 --> 00:18:04.089
v. Meltwater. The Associated Press essentially

00:18:04.089 --> 00:18:06.849
argued you can't just have your bots scrape our

00:18:06.849 --> 00:18:09.190
journalists' hard work, pull out the most important

00:18:09.190 --> 00:18:11.910
sentences, and sell that summary to your clients.

00:18:12.089 --> 00:18:14.910
And the court agreed, right? They did. The U.S.

00:18:14.930 --> 00:18:16.970
court ruled that the activity of scraping and

00:18:16.970 --> 00:18:19.130
showing those specific snippets to clients was

00:18:19.130 --> 00:18:22.269
unlawful. Crucially, they determined that Meltwater's

00:18:22.269 --> 00:18:24.990
automated service did not fall under the protection

00:18:24.990 --> 00:18:27.670
of the fair use doctrine. It was deemed a substitute

00:18:27.670 --> 00:18:29.650
for the original work. Okay, so that's the U.S.

00:17:29.650 --> 00:17:32.289
But the U.K. case, Public Relations Consultants

00:18:32.289 --> 00:18:34.569
Association v. the Newspaper Licensing Agency,

00:18:34.890 --> 00:18:37.950
dealt with this idea of temporary copies. Can

00:18:37.950 --> 00:18:39.730
you explain that for me? Because on the surface,

00:18:39.750 --> 00:18:42.230
it sounds like the court was regulating how a

00:18:42.230 --> 00:18:44.609
computer cache works. It does sound incredibly

00:18:44.609 --> 00:18:48.509
technical, but it's vital. Under U.K. and E.U.

00:18:48.509 --> 00:18:51.930
copyright law, the nuance is very specific

00:18:51.930 --> 00:18:55.799
regarding what constitutes a copy. Okay. If you,

00:18:55.920 --> 00:18:59.039
as a user, go to a news website and simply view

00:18:59.039 --> 00:19:01.519
the original source, meaning you just look at

00:19:01.519 --> 00:19:04.539
the article online without printing it or downloading

00:19:04.539 --> 00:19:07.460
a permanent file, that is not copyright infringement.

00:19:07.539 --> 00:19:10.920
So my cache saving it is fine? Yes. The temporary

00:19:10.920 --> 00:19:12.859
transient copies your computer automatically

00:19:12.859 --> 00:19:15.700
makes in its cache just to display the web page

00:19:15.700 --> 00:19:18.279
on your screen are considered lawful. Okay, that

00:19:18.279 --> 00:19:20.220
makes sense. Otherwise, browsing the internet

00:19:20.220 --> 00:19:22.619
at all would be illegal. So why were the media

00:19:22.619 --> 00:19:24.539
monitoring companies in trouble? Because the

00:19:24.539 --> 00:19:26.299
media monitoring business model doesn't work

00:19:26.299 --> 00:19:28.609
that way. Clients aren't paying Meltwater or

00:19:28.609 --> 00:19:30.990
Cision to just go read original websites one

00:19:30.990 --> 00:19:33.269
by one. They're paying the service to actively

00:19:33.269 --> 00:19:36.170
generate, extract, and share customized headlines,

00:19:36.390 --> 00:19:38.730
summaries, and snippets in an aggregated

00:19:38.730 --> 00:19:41.509
dashboard. Ah, I see. Because the service inherently

00:19:41.509 --> 00:19:44.650
relies on generating and distributing those extracted

00:19:44.650 --> 00:19:46.890
pieces of the original work, not just temporarily

00:19:46.890 --> 00:19:49.609
viewing them, the courts determined that both

00:19:49.609 --> 00:19:51.930
the service providers and the end users of those

00:19:51.930 --> 00:19:55.069
services now require explicit licenses from the

00:19:55.069 --> 00:19:57.990
publishers. We are stating this purely factually

00:19:57.990 --> 00:20:00.130
based on the outcomes provided in the history,

00:20:00.309 --> 00:20:02.930
but it completely forced the industry to adapt.

00:20:03.210 --> 00:20:05.490
They had to restructure their licensing agreements

00:20:05.490 --> 00:20:08.569
globally to comply with these rulings. It perfectly

00:20:08.569 --> 00:20:11.549
highlights a persistent truth: technology almost

00:20:11.549 --> 00:20:14.529
always outpaces legislation. The media intelligence

00:20:14.529 --> 00:20:16.849
firms built the sweeping spiders and bots first

00:20:16.849 --> 00:20:19.880
because the technology allowed it. But the legal

00:20:19.880 --> 00:20:22.240
framework regarding digital fair use and temporary

00:20:22.240 --> 00:20:24.920
copies had to painfully catch up to the reality

00:20:24.920 --> 00:20:26.720
of how business was actually being conducted

00:20:26.720 --> 00:20:29.319
on the ground. It has been a wild journey. We

00:20:29.319 --> 00:20:32.059
started with Alfred Cherie in 1879, smelling

00:20:32.059 --> 00:20:34.480
like glue, cutting out physical newspaper clippings

00:20:34.480 --> 00:20:36.579
so Parisian stage actors could read their own

00:20:36.579 --> 00:20:38.480
reviews without having to buy the whole paper.

00:20:38.720 --> 00:20:41.480
A pure, simple vanity project. A long way from

00:20:41.480 --> 00:20:43.599
where we are now. And today we are looking at

00:20:43.599 --> 00:20:46.500
a global, spider-bot-driven intelligence network.

00:20:46.920 --> 00:20:49.599
It is an invisible infrastructure of information

00:20:49.599 --> 00:20:52.539
logistics that local governments, federal agencies

00:20:52.539 --> 00:20:55.680
and massive corporations rely on every single

00:20:55.680 --> 00:20:58.720
day just to function and understand the world

00:20:58.720 --> 00:21:01.180
around them. And yet it all stems from that same

00:21:01.180 --> 00:21:04.799
fundamental human impulse. The tools change from

00:21:04.799 --> 00:21:08.059
steel scissors to scraping algorithms. But the

00:21:08.059 --> 00:21:10.980
desire to isolate, track and analyze what the

00:21:10.980 --> 00:21:13.420
world is saying about us remains exactly the

00:21:13.420 --> 00:21:16.190
same. It is incredible to think about. As we

00:21:16.190 --> 00:21:17.690
wrap up this deep dive, we want to leave you

00:21:17.690 --> 00:21:20.170
with a final thought to mull over. Consider this.

00:21:20.490 --> 00:21:23.589
If media monitoring spiders and bots are constantly

00:21:23.589 --> 00:21:25.529
scraping the internet for brand mentions and

00:21:25.529 --> 00:21:28.470
PR departments use that automated bot-generated

00:21:28.470 --> 00:21:30.769
data to write and distribute new press releases,

00:21:31.069 --> 00:21:33.049
are we approaching a closed loop? Exactly. A

00:21:33.049 --> 00:21:35.289
world where bots are simply writing media for

00:21:35.289 --> 00:21:37.710
other bots to monitor. Where does actual nuanced

00:21:37.710 --> 00:21:40.089
human understanding fit into that automated feedback

00:21:40.089 --> 00:21:42.559
loop? It is certainly a profound question to

00:21:42.559 --> 00:21:44.220
ponder the next time you read a piece of corporate

00:21:44.220 --> 00:21:46.480
news. Thank you so much for taking this deep

00:21:46.480 --> 00:21:49.259
dive with us. Stay curious and always look for

00:21:49.259 --> 00:21:51.920
the hidden systems and stories behind the information

00:21:51.920 --> 00:21:53.619
you consume every day.
