There are topics the industry believes it has already settled. Sonic branding is one of them. Everyone understands it, everyone agrees with it, yet very few actually implement it at the level it requires. It is precisely this gap between declarative understanding and real action that opens up one of the key questions of today’s communications industry: why does sound, despite its proven effectiveness, still remain on the margins of brand strategy?
At this year’s Dani komunikacija, this gap is addressed by Steve Keller, Sonic Strategy Director at SiriusXM Media and Studio Resonate, whose work systematically challenges the way the industry understands and uses sound. His perspective does not start from the assumption that brands don’t believe in audio, but rather that they are structurally organized in a way that prevents them from treating it as a strategic tool.
Keller raises a question that goes beyond creative execution and into the very architecture of the industry: from how briefs are structured, to what is measured and valued, to the fact that sound most often has no “owner” in the decision-making process. In such a system, audio can only remain an add-on, regardless of its potential.
In the conversation that follows, Keller does not offer another argument for why sound matters. Instead, he precisely maps why, despite everything, the industry still does not act accordingly.
The argument for sonic branding is not new, and by now it has been made persuasively enough that most people in the room in Rovinj will nod along. The more interesting question is why nodding has not translated into action at scale. What is the actual resistance, not the stated one, that keeps brands from treating sound with the same structural seriousness they give to visual identity?
There are all the predictable reasons: budget constraints, lack of expertise, unclear ROI. But when I push past those, what I usually find underneath is something far more fundamental and, to me, far more interesting: a systemic bias that keeps audio-first solutions from having a seat at the problem-solving table.
Most brands have built their internal structures, their agency relationships, their creative governance, and their brand standards around a single, implicit assumption: that branding is a visual discipline. The brand manager was trained to look at things. The brand is preoccupied with visual expression, and “tone of voice” is often limited to narrative structures, with no consideration for how that voice literally sounds. The advent of television advertising led agencies of record to think less sonically and more visually, pursuing the major television campaign or film. As a result, sound lives, at best, in the production budget and, at worst, in the hands of someone who was handed a set of headphones and told to “pick something that sounds like us.” That’s not a sound strategy.
Here’s the uncomfortable truth: it’s not that brands don’t believe in sound. Most of them do, on some level, and the evidence is overwhelming enough that it’s hard to argue otherwise. Audio assets are 8.5 times more likely to appear in high-performing ads, according to Ipsos. Recent research on creative dividends from System1 continually places audio assets, from sonic logos to jingles to brand voices and characters, at the top of the list of distinctive brand assets that produce high attention and fast fluency. And yet the industry continues to treat audio as decoration rather than architecture. All that belief in the power of sound hasn’t been enough to transform our behavior and our spend.
Why? Perhaps it’s because the deeper resistance is that taking sound seriously would require reorganizing something: the agency relationship, the brand governance process, the creative brief, the way we define what a “brand standard” even is. Nobody wants to admit that the beautiful brand book they spent eighteen months and three agencies building is incomplete. Incomplete isn’t a popular word in boardrooms.
What I try to help clients understand is that this isn’t an indictment of everything they’ve already done. It’s an invitation. Your visual identity is the foundation. Now let’s build the whole house. Because right now, most brands have an extraordinarily well-designed front door and no idea what the place sounds like when you walk inside.
Sound is consistently the last element addressed in brand identity work, after visual design, naming, messaging, and often even packaging. You have spent your career inside that gap. What structural reason, not a superficial one, explains why the industry continues to underinvest in its most emotionally direct sensory channel?
If you want the structural explanation, follow the brief.
The creative brief is the architecture of everything that comes after it. It determines what gets resourced, what gets measured, what gets celebrated, and what gets cut when the timeline compresses and the budget shrinks. And in the overwhelming majority of agencies, the brief has no sonic dimension. There is no “audio brief.” There is no sonic equivalent of a visual moodboard. There is no moment in the creative process, whether at strategy, concepting, production, or evaluation, where someone is explicitly accountable for the sound of the brand.
When I look at the development cycle of a typical brand campaign, audio is bolted on at the end of the process. It’s an afterthought. The structural issue is that audio has no champion in the room where decisions get made. Visual design has an Art Director. Copy has a Copywriter. Strategy has a Planner. But sound? Rarely is sound considered as a core asset that drives creative from inception. Audio-first thinking isn’t audio-only thinking. It just asks different questions. Instead of only focusing on what the brand or the campaign looks like, it asks what the brand sounds like. It translates Consumer Entry Points into Consumer Ear Points. It looks at brand perception through an experiential lens, considering how sound drives both the experience and the perception that follows.
There’s also a measurement problem that feeds into the structural one. We’ve built an industry that optimizes for what’s easy to quantify. Clicks, reach, ROAS. But creative quality, particularly audio creative quality, is harder to quantify in the short term. A Marketing Week/Kantar study found that a third of marketers don’t measure creative performance at all. If you don’t measure it, you don’t value it. If you don’t value it, you don’t resource it. If you don’t resource it, you don’t get good at it. When a brand can point to specific, measurable evidence that a sonic investment drove a real business outcome, the conversation in the room changes.
Your research at Oxford’s Crossmodal Research Laboratory has explored the relationship between sound and taste, the way one sense can alter the experience of another. For a communication industry that is primarily visual in its self-conception, what does crossmodal science reveal about the incomplete model brands are using to think about audience experience?
What crossmodal science reveals, at its core, is that the brain is not organized the way the marketing industry is.
Marketing tends to operate in sensory silos. We have a visual team and a content team and an experiential team, and those groups often work in sequence rather than in concert. But the brain doesn’t experience a brand that way. When a consumer encounters your brand in a store, in an ad, on a phone, or at an event, their brain is integrating signals from multiple sensory channels simultaneously, and those signals are actively influencing each other. What you hear changes what you see. What you smell changes what you taste. The sound of a product can literally alter its perceived value, and how much you’re willing to pay for it.
My work with Professor Charles Spence at Oxford’s Crossmodal Research Laboratory, along with colleagues like Janice Wang, has documented this in ways that are both scientifically rigorous and practically useful. We demonstrated, for example, that specific auditory attributes correspond to specific taste qualities. Saltiness has sonic characteristics. Spiciness has sonic characteristics. You can use that knowledge to amplify or modify a flavor experience. We’ve taken those findings out of the laboratory and into real-world activations, helping brands like Propel, Cadbury, Chivas, and Sprite use “sonic seasonings” to make their products taste different, literally, depending on what the listener hears when they’re consuming a beverage or eating a meal.
For the marketing industry, this can appear on the surface as an abstract curiosity, when in reality, it’s a direct challenge to a fundamental assumption about how brand experiences work. The assumption is this: design each channel for its own modality, then deploy them together. What crossmodal science says is: those channels aren’t separate. They are in constant dialogue in the brain. Every time your brand communicates without sound, or communicates with sound that hasn’t been designed to work with your visual and experiential elements, you are leaving the crossmodal associations to chance.
The practical implication is that brands are routinely underpowering their investments in experience design because they’re thinking about the senses as independent inputs rather than as a system. A well-designed sonic identity doesn’t just make your brand more recognizable. It can change how your product feels, how your space is perceived, how trustworthy your spokesperson sounds. The incomplete model brands are using isn’t wrong, exactly. It’s just limited. It’s single-channel thinking in a multisensory world.
The audio landscape has been substantially restructured by podcasts, voice interfaces, and streaming, environments in which the relationship between a brand and sound is ambient and contextual rather than interruptive and deliberate. How does sonic brand strategy need to evolve to remain coherent in an environment it did not originally design for?
In the days of radio, before the advent of television, audio advertising ruled. We had jingles, we had branded content, we had talk shows and sponsored listening events. The thirty-second radio spot gave brands a defined container, a captured audience, and a clear beginning, middle, and end. You could target locally, but the real reward was reaching a broad audience and building mental availability, even before we knew to call it that.
Fast forward to today, and we’re seeing an audio renaissance. Screen fatigue has consumers moving away from visually dominant platforms. While terrestrial radio is still a viable media channel, other channels have extended audio’s reach and power. Podcasts are intimate and conversational. Streaming playlists are deeply personal. Voice interfaces have no visual support at all. Smart speakers sit in people’s homes like ambient members of the household. In all of these environments, a brand should consider how it is heard. Is it contributing to the sonic experience, or is it interrupting it?
The brands that are navigating this well have done something important: they’ve stopped thinking about sonic identity as a single asset and started thinking about it as a system. It’s not simply about having a distinctive audio asset. It’s about having multiple assets and multiple ways of expressing them consistently across a sonic ecosystem. You have a sonic logo that works as a brief identifier. You have a melodic leitmotif that carries emotional weight across longer contexts. You have a brand voice that builds familiarity and trust over repeated interactions. You have soundscapes or sonic environments that work at ambient scale, below conscious attention but above zero. You have functional sounds tied to the sonic DNA of a brand embedded in UX/UI systems. These assets are designed to be modular and flexible, deployable across formats and contexts, but coherent in their combined impression.
The key design principle is what I call “congruency at the system level.” Each sonic asset should be individually recognizable and collectively coherent. McDonald’s has been doing a version of this for two decades. Their five-note signature has appeared in thousands of configurations, from full orchestral arrangements to solo piano, from somber pandemic-era tones to celebratory activations, and it remains immediately and unmistakably them. That’s not luck. That’s a disciplined approach to a sonic experience, with the development of a sonic architecture to support it.
For brands entering audio environments that they didn’t originally design, or for sonic environments they might encounter in the future, the question isn’t “how do we adapt our existing sonic assets to new contexts?” It’s “have we built our sonic identity with sufficient flexibility to live anywhere our audience is, without losing its essential character?” The difference in those questions is that the first is usually born from a place of panic. The second is born from a desire to be prepared, where future adaptation is seen as an opportunity you’ve already planned for.
Sonic identity carries cultural specificity. Musical scales, tonal associations, and rhythmic patterns carry different meanings in different regions. For smaller markets where global sonic conventions are imported rather than developed organically, what is the actual work of building a distinct and locally resonant audio identity?
This is a question I feel particularly passionately about, and it connects to some work I care deeply about: Sonic Diversity.
Let me start with a challenge to the premise. When we talk about “global sonic conventions,” we should be honest about what we usually mean: Western, often American or Northern European, harmonic and rhythmic structures that became globally distributed through the dominance of certain media and entertainment industries. This is not a neutral aesthetic heritage. It’s a cultural export that got mistaken for a universal language.
Let me be clear. Research has shown us that there are generalized universal responses and perceptions related to music and sound. But music is not a universal language. Cultural nuances exist, and they need to be considered.
The actual work of building a distinct and locally resonant audio identity starts with a kind of sonic archaeology. What does this culture’s music history actually sound like? What scales, modes, rhythmic patterns, timbres, and sonic textures are genuinely native to this place? What sounds carry emotional meaning that locals would recognize not because they’ve been told to, but because it’s already in their bones? This is not nostalgia or folk music pastiche. It’s source material.
From that source material, the craft is finding the universal dimensions I mentioned earlier, and expressing them through a genuinely local vocabulary. There’s a meaningful difference between a sonic identity that uses local music as a decorative flavoring and one that is built, at its structural level, from local sonic sensibility. Audiences can feel that difference even if they can’t articulate it.
The advantage smaller markets have, which is often overlooked, is that they are not fighting over the same sonic territory as the global giants. When your competitive frame is local or regional, the opportunity to claim a truly distinctive sonic identity is actually larger, not smaller. You’re not competing with McDonald’s for the right to own five notes and the melody around them. You have the entire sonic landscape of your own cultural heritage to draw from.
The risk, and I see this constantly, is that ambition to look “international” leads brands in smaller markets to sound like diluted versions of global brands rather than authentic expressions of their own identity. That’s the path to invisibility. If you sound like everyone else, you’re not a brand. You’re just noise in a familiar key.
The work is to resist that pressure, and to trust that the most powerful sonic identity is one that sounds unmistakably like itself.
