SAN DIEGO, CALIFORNIA
ELECTRONIC MUSIC TESTING
ANALOG vs. DIGITAL
MAY 2005
In 1988, long before the launch of Pinnacle Media, Frank Cody and Owen Leach sat in a conference room in Princeton, NJ, and began examining the flaws, impracticalities, and numerous problems that programmers had uncovered through traditional paper & pencil music testing. Before describing what emerged from those meetings (including the development of electronic data collection, or interactive digital methodology, and eventually the birth of Pinnacle Media Worldwide), it is important to outline what those inherent flaws were at the time and, in fact, still are today.
It was actually programmers within the industry, not researchers, who began to notice some consistently troublesome occurrences during their paper & pencil music tests. The following summarizes those issues and then examines how technological advances have remedied them.
1- Intellectual Responses
For years, music had been tested intellectually. We asked people to score songs on a 5- or 7-point scale, which made respondents feel as if they were taking an SAT-style test and forced them to put a number on a product they simply did not use that way. Music is not a product of intellect; it is a product of emotion and passion. In fact, most researchers and programmers agree that the station generating the most emotion earns the distinction of having the most compelling product. The task was to identify statistically reliable, accurate methods that could measure and harness that emotion. Simply put, people listen to the radio and react emotionally, changing stations when they dislike a song and turning it up when they love it!
2- Too Many Questions in Seven Seconds
Once a hook begins, there are a couple of seconds of reaction time, much like when driving a car. That leaves only about five seconds to make the several calculations and intellectual decisions required. Within those five seconds, the first question asked in a paper & pencil test is “Do you know that song?” or “Is it familiar?”
Next, respondents must score the song, first translating their emotional response (“I love it, I like it, it’s just OK, I don’t like it, or I hate it”) into a number from 1 to 5 and then transferring that score onto a Scantron sheet by filling in an SAT-style bubble. Then, still within the same five seconds, they must decide whether they are “tired of the song” (and quite often “to what degree”). Finally, in some cases they are asked, “On what station would you most expect to hear that song?”
It was George Gallup, the “King of Pollsters,” who pointed out that the best research asks only one question at a time. There was simply too much information to process intellectually in a seven-second period.
After studying and interviewing hundreds of respondents following these tests, we learned that they fell behind very quickly and, as a result, often filled in bubbles arbitrarily or copied a neighbor’s responses just to keep up with the count, which leads to:
3- Keeping Track of the Slate Numbers
With paper & pencil music tests, a spoken song slate must precede each title so respondents can be certain the scores on their Scantron sheet match the correct songs. One of the greatest threats to accurate research was that respondents often left bubbles empty, or reached the end and realized they had mismatched the last hundred songs, with every score off by one. Respondents usually perceived slate numbers as yet another piece of information to process, which also led to:
4- Extreme Fatigue
Six or seven hundred titles are inarguably a great deal of material to present to respondents, regardless of methodology. All research companies agree that a top priority is getting what is needed with as little fatigue as possible. The number one complaint we consistently received during this long period of “method testing” was the sheer volume of material. Traditional music testing earned the worst grades and added insult to injury, not only by inserting two seconds before each cut but by reminding respondents every seven seconds of just how many titles they had heard (e.g., “song number five hundred and seventy-three”). With up to four responses per song, fatigue builds quickly: seven hundred songs can demand as many as 2,800 separate judgments within a two-hour period.
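The workload described above can be checked with simple arithmetic. The following is a minimal sketch, assuming 700 titles, four responses per song, a seven-second hook, and a two-second slate (figures taken from this section); the function name `paper_test_workload` is hypothetical.

```python
# Back-of-the-envelope workload for a paper & pencil music test.
# Default values are the figures quoted in the text; adjust as needed.

def paper_test_workload(titles=700, responses_per_song=4,
                        hook_seconds=7, slate_seconds=2):
    """Return (total judgments, minutes of hooks, minutes of slates)."""
    judgments = titles * responses_per_song
    hook_minutes = titles * hook_seconds / 60
    slate_minutes = titles * slate_seconds / 60
    return judgments, hook_minutes, slate_minutes


judgments, hook_min, slate_min = paper_test_workload()
print(judgments)                                # 2800
print(round(hook_min, 1), round(slate_min, 1))  # 81.7 23.3
```

Under these assumptions, hooks and slates together run about 105 minutes, consistent with the two-hour session described above, and the slates alone take roughly 23 of those minutes.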
5- Position Bias
Many firms using paper & pencil methodology conducted only a single session, with all respondents in the room at one time. There is, without a doubt, a definite correlation between fatigue and test position. With any methodology, songs tested at the beginning and end of a session show statistical variances relative to songs tested in the middle.
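The way titles are reordered between sessions determines whether this position effect is actually spread out. The following is a minimal sketch, assuming three hypothetical blocks (each standing for fifty titles) and three sessions. It contrasts simply reversing the order with one simple rotation scheme, assumed here for illustration, that shifts the blocks by one position per session. The function names are illustrative and not part of any Pinnacle tool.

```python
# Compare two ways of reordering song blocks across sessions.
# Each letter stands for a hypothetical block of 50 titles; slot 0 is the
# start of a session, the last slot is the end.

def reversed_orders(blocks, sessions):
    """Alternate between the original order and its reverse."""
    return [blocks if s % 2 == 0 else blocks[::-1] for s in range(sessions)]


def rotated_orders(blocks, sessions):
    """Shift the block order by one position in each session."""
    n = len(blocks)
    return [blocks[s % n:] + blocks[:s % n] for s in range(sessions)]


def positions_by_block(orders):
    """Map each block to the list of slots it occupied, session by session."""
    seen = {}
    for order in orders:
        for slot, block in enumerate(order):
            seen.setdefault(block, []).append(slot)
    return seen


blocks = ["A", "B", "C"]  # e.g. titles 1-50, 51-100, 101-150
print(positions_by_block(reversed_orders(blocks, 3)))
# {'A': [0, 2, 0], 'B': [1, 1, 1], 'C': [2, 0, 2]}
print(positions_by_block(rotated_orders(blocks, 3)))
# {'A': [0, 2, 1], 'B': [1, 0, 2], 'C': [2, 1, 0]}
```

Under reversal the middle block never leaves the middle, while under rotation every block takes one turn at the beginning, middle, and end. With more blocks, the same rotation gives each block one turn in every position once the number of sessions equals the number of blocks.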
In second and third sessions, some firms simply reversed the order, assuming this would adjust for the inherent bias; however, moving titles from the middle to the beginning and end has become the preferred method. To maintain the best possible adjustment, Pinnacle’s DMT™ tests songs in groups of fifty, an approach most firms now use as the industry standard regardless of methodology. There is yet another bias that single-session testing does not consider:
6- Lifestyle Variables and Sample Quotas
While single sessions may appear more efficient and are certainly more cost-effective, they do not account for the varied lifestyles and availability of the sample. A single session limits the ability to randomize the sample properly, because it includes only people who can attend at one specific time. Research is much better served by offering respondents a choice of session times, which also makes it more practical to meet the quotas outlined in the screener.
However, conducting one music test over a three- to four-week period to avoid any “single night” bias is itself flawed, because the “time and age” of each song changes during the study. A new song tested in the first or second week of its life cycle, and then again four weeks later, will skew the final scores. The song’s age becomes a variable that can no longer be quantified, especially by firms testing burn and familiarity, since time is the variable with the greatest impact on those responses.
Along with other quality firms, Pinnacle Media guarantees samples to within ±10% and will supplement a project with an additional session when necessary to meet the study’s strategic and tactical goals. As a high-quality research firm, Pinnacle Media measures its own success by delivering the correct sample and converting it into improved ratings.
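Deciding when a supplemental session is needed amounts to comparing seated respondents against screener quotas. The following is a minimal sketch, assuming the ±10% guarantee applies to each quota cell’s target count and that only shortfalls (not overages) call for a supplement. The cell names, counts, and the function `cells_needing_supplement` are invented for illustration.

```python
# Hypothetical quota check: which screener cells fell short of their target
# by more than the tolerance and may need a supplemental session?
# Assumes the +/-10% guarantee applies to each cell's target count.

def cells_needing_supplement(targets, recruited, tolerance=0.10):
    """Return {cell: respondents still needed} for cells below tolerance."""
    shortfalls = {}
    for cell, target in targets.items():
        seated = recruited.get(cell, 0)
        if seated < target * (1 - tolerance):
            shortfalls[cell] = target - seated
    return shortfalls


targets = {"women 25-34": 40, "men 25-34": 40, "women 35-44": 20}
seated = {"women 25-34": 38, "men 25-34": 33, "women 35-44": 21}
print(cells_needing_supplement(targets, seated))  # {'men 25-34': 7}
```

Here 38 of 40 is within tolerance, while 33 of 40 falls below the 36-respondent floor, so only that cell is flagged.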
The Biases of ALL Music Research
It must be said that all research has some inherent bias. The German physicist Werner Heisenberg’s “Uncertainty Principle” is often loosely summarized as the idea that you cannot truly observe something without disturbing it. Here is a list of variables that affect ALL research, regardless of methodology:
* No test environment can exactly match the settings in which listeners use the product.
* Only people willing to participate are included in a research project. (The silver lining, of course, is that these are the same people who agree to fill out a diary.)
* Only people willing to answer the phone during recruiting can participate.
* No one ever listens to only the hook of a song.
* Respondents’ moods, shaped by their day, activities, stresses, and so on, are variables beyond our control.
The list goes on, but the best research minimizes these variables and attempts to “level the playing field” so that all things are equal. In the imperfect world of research, that must remain a primary goal.
The Digital Solution
Twenty-five years ago, Madison Avenue developed an alternative to traditional focus groups that enabled advertisers to capture the emotional appeal of television commercial concepts and TV pilots. Shortly thereafter, film studios picked up on the growing trend to test “rushes” of films in production, and political campaigns began using the technology in focus groups and auditorium studies to test candidates. This new methodology came with the advent of digital, interactive data collection. In 1988 we began exploring how best to use this technology to test music and soon discovered its ability to capture true emotional appeal. Measured against all the flaws outlined above, the dials quickly made it apparent that they did not merely minimize those biases; they eliminated them.
1- Intellectual Responses
Pinnacle’s Digital Music Test™ uses a 0-100 scale, yet we do not force respondents to actually “score” songs. They are instructed to do what they do when listening to the radio: when they like a song, they turn it up; when they don’t, they turn it down. How far they turn the dial up or down tells us how much they like or dislike the song. We learned to ask respondents to tell us how they “feel” about the song today.
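To make the dial concrete, here is a minimal sketch, assuming each respondent’s dial is sampled once per second during a hook and that a song’s score is the simple average of those readings. The averaging rule, the neutral starting value of 50, the traces, and the function names are all hypothetical; the text does not specify how DMT™ scores are computed. `moment_curve` corresponds loosely to the moment-to-moment line clients watch in real time.

```python
# Hypothetical scoring of second-by-second dial traces on the 0-100 scale.
# Simple averaging is an assumption for illustration only.
from statistics import mean


def song_score(traces):
    """Average each respondent's trace, then average across respondents."""
    return mean(mean(trace) for trace in traces if trace)


def moment_curve(traces):
    """Average across respondents at each second of the hook."""
    seconds = min(len(trace) for trace in traces)
    return [mean(trace[i] for trace in traces) for i in range(seconds)]


traces = [
    [50, 60, 75, 80, 85, 85, 90],  # turns the dial up: loves it
    [50, 45, 40, 30, 25, 20, 20],  # turns it down: dislikes it
    [50, 55, 60, 60, 65, 65, 65],  # mildly positive
]
print(round(song_score(traces), 1))                 # 56.0
print([round(v, 1) for v in moment_curve(traces)])
```

Averaging within each respondent first keeps one respondent’s long trace from outweighing another’s; a real system might weight or trim readings differently.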
While interactive dials can measure “burn” and “familiarity,” Pinnacle Media chooses to live by the Gallup axiom: ask only one question at a time. Note that “burn” is truly a function of current music, and we do measure it in OnlineTRACKER™. Burn was developed to track the life cycle of new music, which usually lasts 20-25 weeks, after which a song either disappears or moves into the library. In a music test, therefore, respondents are asked to use their dial to reflect how they feel about the song today; how they used to feel is irrelevant. What programmers need to determine is how people feel about songs, not why, without forcing respondents through too many hoops.
We have learned that asking about “familiarity” likewise clouds the issue and calls for an intellectual response. The answer to that question, like burn, is already built into the emotional response to every song. We therefore leave “burn” and “unfamiliarity” to call-out research and OnlineTRACKER™, in which the next hook does not begin playing until the previous song’s responses have been recorded.
2- Testing Relative Product Quality
Digital technology brought another tremendous advantage: the ability to measure the actual on-air music mixes of the client station and its competitors. By scoping down two hours of music from each station, we could measure the “relative product quality” of one station against several others. This became one of the most compelling points of differentiation between dials and paper & pencil. How do a station’s core and cume listeners respond to each station? What is their “intent to listen” to each station’s mix? And can they correctly attribute each mix to the proper station?
3- Less Fatigue
After several side-by-side tests of dials against paper & pencil, we learned that respondents felt the dial was more like listening to the radio, more emotional, more accurate, and far less fatiguing; complaints about the amount of test material were far lower. Because the dial is read second by second and the data is recorded in sync with every song, respondents no longer need to keep track of the songs. There are therefore no slates, so participants work less and complete the same number of songs in less time. As a result, Pinnacle’s DMT™ gains nearly 25 more minutes than paper & pencil to ask perceptual questions and test other types of material:
4- Digital Content Analysis™
Digital dial technology also gives Pinnacle clients the option of testing morning shows and personalities, as well as TV spot campaigns (including competitors’), keeping in mind that the technology was originally developed for exactly this kind of content. Together with testing on-air music mixes (and even prototype pods), this creates valuable opportunities to turn a simple “music test” into something far more useful, while still recognizing the limits of a tactical sample.
5- Immediate Results
Clients enjoy two further advantages: the viewing room and next-day results. During a Pinnacle Media Digital Music Test™, clients sit in a hidden adjacent room, watching the results on screen in real time. They can “see” how the sample “feels” moment to moment, giving them an actual snapshot of their audience during any song. You can literally see, on the graph, when respondents tune out and when they turn it up. Data from all sessions is then processed overnight, and final results are presented to the client the next morning, less than 12 hours after testing is complete.
Winner of the “Most Asked Question” Award
The most common question from first-time users is whether the previous song affects how one responds to the next song. A few points must be made here:
A) If previous songs affect how one feels about the next song, then all music testing is superfluous: once we put songs on the radio next to other songs, what we learned about each song in the test becomes meaningless, because the previous song listeners heard would affect how they feel about the song on the air right now.
B) If it is true with dial technology, then it is true with paper & pencil, since neither method asks respondents to forget the previous song before scoring the next one. However…
C) Paper & pencil testing actually makes it more likely that previous songs will influence subsequent ones, through what is called the “rule of 3’s.” A paper & pencil test keeps a running history of how respondents scored every previous song; it is right in front of them, and they begin deciding what to score next based on how their previous answers look. If a respondent has scored three straight 5’s, the temptation to vary the answer for upcoming songs grows. We have all experienced this on standardized tests: when too many answers in a row looked the same, we were tempted to change things up. With dials, there is no visible history to bias subsequent songs. All respondents see in front of them is the current second.
D) If previous material affected subsequent material, it would also affect personality testing. Many research firms use dial methodology for that purpose and notice no such variance in their content testing.
E) We test this time and again simply by placing the same song in the same study multiple times. These songs always test within acceptable statistical variance. Paper & pencil tests never check this, but it happens at least four or five times in every music test we conduct, mostly because as-is pods end up playing a few of the same songs in head-to-head battles.
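One way to check whether a repeated placement falls “within acceptable statistical variance” is a paired comparison. The following is a minimal sketch, assuming the same respondents rate both placements (so a paired t statistic applies) and using an illustrative cutoff of 2.0. The scores are invented, and the text does not say which statistical test Pinnacle applies.

```python
# Compare one song's scores from two placements in the same study.
# Assumes the same respondents rated both placements, so a paired
# comparison is used; the cutoff of 2.0 is an illustrative choice.
import math
from statistics import mean, stdev


def paired_t(first, second):
    """Paired t statistic for per-respondent scores from two placements."""
    diffs = [a - b for a, b in zip(first, second)]
    m, sd = mean(diffs), stdev(diffs)
    if sd == 0:
        # Identical differences: no spread, so the result is 0 or unbounded.
        return 0.0 if m == 0 else math.copysign(math.inf, m)
    return m / (sd / math.sqrt(len(diffs)))


def within_variance(first, second, cutoff=2.0):
    """True when the two placements do not differ beyond the cutoff."""
    return abs(paired_t(first, second)) < cutoff


first_play = [72, 80, 65, 90, 70, 85, 78, 60]   # hypothetical 0-100 scores
second_play = [75, 78, 68, 88, 66, 82, 80, 62]
print(within_variance(first_play, second_play))  # True
```

With only eight respondents, the conventional two-sided 5% critical value is larger (about 2.36 for seven degrees of freedom), so a real study would choose the cutoff from its sample size.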
Pinnacle Media Worldwide
Pinnacle Media Worldwide was conceived, created, and developed by programmers for programmers. Each member of Pinnacle’s team has been in the trenches fighting the battles, and each is dedicated to creating new, reliable methods that help our clients drive ratings and revenue. We, like many others, have learned that “insisting on doing things the same way means you could be doing them wrong.” Otherwise, we would all still be watching VHS tapes, playing vinyl records, and using cassette players in our cars, and, heaven help us all, talking on pay phones!
Creating the most advanced and accurate research methods remains Pinnacle Media’s goal and mission as we continue to help empower our clients in the coming years.