
Before You Bet the Shelf, Check Virtual Research Reliability

Most FMCG launches fail within two years. The ones that survive tend to have one thing in common: someone tested the decision properly before committing the budget.

There is a number that should keep every category manager in Australia awake at night. According to Euromonitor International, a quarter of all new FMCG product launches tracked across key categories in 2023 and 2024 were inactive by the end of that same period. Gone from shelves. Delisted. Money spent on development, packaging, trade marketing and distribution, written off.

The broader picture is no more comforting. Industry research consistently places the failure rate for new grocery products somewhere between 70 and 80 per cent. A 12-year longitudinal study of FMCG launches found that just four per cent of new products were still on shelves after five years. And 75 per cent of consumer packaged goods fail to earn even $7.5 million in their first year on the market.


These are not fringe products from underfunded startups. They include launches from major manufacturers with sophisticated marketing teams, established retail relationships and significant R&D budgets. The problem is rarely the product itself. More often, it is the shelf decision: where the product sits, how it is presented, whether shoppers can find it, and whether the surrounding layout helps or hinders the purchase moment.

This is where virtual store research enters the conversation. It is a powerful method for testing shelf-level decisions under controlled conditions before real money is committed to real stores.

The Question Behind the Question

When a senior leader asks whether virtual store research is reliable, what they are really asking is something more pointed: can I trust this enough to make a decision worth six or seven figures?

It is a fair question. The answer depends not on the technology, but on how the study is designed. A photorealistic virtual aisle is just a container. The reliability comes from what happens inside it: how the question is framed, what is controlled, what is measured, and how the results are interpreted.

Strong virtual research is closer to experimental design than to concept testing. The aim is not to produce a compelling visual for a pitch deck. It is to isolate a specific decision, test it under consistent conditions, and deliver a finding that holds up when someone in procurement or category management pushes back on the numbers.


One Decision at a Time

Most research briefs arrive as a bundle. Improve visibility. Lift conversion. Reduce range friction. Make the shelf easier to shop. A reliable study strips that back to a single, testable question:

  • Does a revised shelf layout improve findability for a target shopper segment?
  • Does a new pack design change what shoppers notice first?
  • Does added signage help navigation, or introduce confusion?
  • Does moving a product from mid-shelf to eye level shift purchase behaviour?

If the decision is vague, the findings will be too. The discipline of narrowing the brief to one variable is where most of the analytical value is created, and it is the step that gets skipped most often when teams are under time pressure.

Control Beats Realism

Virtual environments can be strikingly realistic. StoreLab maintains more than 150 virtual retail environments and a library of over 40,000 individual 3D product models, producing simulations that closely replicate the look and feel of Australian retail aisles. That realism helps participants behave naturally, which matters. But realism is not what makes a study reliable. Control is.

Reliability depends on keeping the experience consistent for every participant within each test condition. That means holding constant the shelf layout within each test cell, the navigation rules, the task framing and the exposure flow. When these elements are locked down, any difference in behaviour between test conditions can be attributed to the variable being tested, rather than to some uncontrolled artefact of the experience.
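The control principle described above can be sketched in a few lines: every element of the experience is locked down per test cell, only the variable under test differs between cells, and participants are randomly but reproducibly assigned. This is a minimal illustration; the field names, task wording and two-cell design are hypothetical, not StoreLab's actual schema.

```python
# Sketch: hold everything constant per cell except the tested variable,
# and randomise only the participant-to-cell assignment.
import random

# Locked-down configuration shared by every cell.
BASE_CONDITION = {
    "navigation": "free_roam",
    "task": "Buy a week's worth of breakfast cereal for your household.",
    "exposure_seconds": 180,
}

# The ONLY difference between cells is the variable under test.
CELLS = {
    "control": {**BASE_CONDITION, "shelf_layout": "current_planogram"},
    "test":    {**BASE_CONDITION, "shelf_layout": "revised_planogram"},
}

def assign(participant_id: str, seed: int = 42):
    """Deterministic random assignment, so allocation is auditable."""
    rng = random.Random(f"{seed}:{participant_id}")
    cell = rng.choice(sorted(CELLS))
    return cell, CELLS[cell]

cell, condition = assign("P-0001")
```

Because the assignment is seeded per participant, the same shopper always lands in the same cell, which makes the allocation auditable after fieldwork; and because the cells share every field except `shelf_layout`, any behavioural difference between them can be attributed to the layout change.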

StoreLab offers two broad research formats: online simulated shop testing supported by surveys and comparisons, and in-person testing using VR headsets with proprietary 3D eye tracking in a central location format. Regardless of which format is used, the control principles are the same.

What a Reliable Study Looks Like

A well-designed virtual store study follows a three-stage structure: screen, shop, then explain.

  • Screen confirms the participant matches the target shopper profile before they enter the virtual environment. There is no value in testing a premium skincare layout on someone who has never bought skincare.
  • Shop places the participant in the virtual aisle with a realistic shopping task and captures what they do: where they look, what they pick up, how long they spend, and what they choose. Behaviour first.
  • Explain follows the shopping task with a post-shop survey that captures the reasoning behind decisions. This is where qualitative insight adds context to the behavioural data, but it comes second, not first.

This sequence matters because people are unreliable narrators of their own attention. A shopper might report noticing a product they walked straight past. Eye-tracking data and behavioural observation catch what self-reporting misses.
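The screen, shop, explain sequence can be sketched as a simple pipeline in which behavioural capture strictly precedes the self-report stage. The function names, the screening question and the stand-in data below are hypothetical illustrations of the ordering, not StoreLab's actual pipeline.

```python
# Sketch: the screen -> shop -> explain sequence, with behaviour
# captured before any self-reported opinion is collected.
def run_participant(participant, screener, shop_task, survey):
    # 1. Screen: reject non-target shoppers before they enter the aisle.
    if not screener(participant):
        return {"status": "screened_out"}
    # 2. Shop: behavioural capture comes first (gaze, pickups, dwell, choice).
    behaviour = shop_task(participant)
    # 3. Explain: the post-shop survey adds reasoning to the behaviour,
    #    never the other way round.
    stated = survey(participant, behaviour)
    return {"status": "complete", "behaviour": behaviour, "stated": stated}

# Illustrative stand-ins for the three stages.
screener = lambda p: p["bought_category_last_3_months"]
shop_task = lambda p: {"chose": "SKU-A", "dwell_seconds": 41}
survey = lambda p, b: {"reason": "recognised the pack"}

result = run_participant(
    {"bought_category_last_3_months": True}, screener, shop_task, survey
)
```

The design choice the ordering protects is simple: the survey can see the behavioural record, but the behavioural record can never be contaminated by the survey.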

The Mistakes That Undermine Confidence

Even with strong technology, certain design choices will weaken a study to the point where the results become difficult to defend. These are the most common:

Leading the task. Asking participants to “choose the best display” instead of giving them a realistic shopping mission with constraints. The framing shapes the behaviour.

Changing too many variables at once. Testing a new layout, revised range, different pricing and added signage in the same cell makes it impossible to attribute any observed effect to a single cause.

Over-relying on recall. People report seeing things they did not look at. Behavioural measurement needs to come before self-reported opinion.

Skipping the baseline. Showing only the proposed new shelf, with no control condition representing current reality, removes the basis for comparison. Without a baseline, there is no way to measure whether the change improved anything.

Allowing post-hoc interpretation. If success measures are defined after the results come in, the findings will be shaped to fit the preferred narrative. Defining what counts as success before the study runs is the single most important safeguard against stakeholder bias.
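The last two safeguards, a baseline condition and success measures fixed before fieldwork, can be sketched together: define the metric, the minimum lift and the significance threshold up front, then compare the test cell against the control with a standard two-proportion z-test. The cell counts, the 5-point threshold and the findability metric below are hypothetical examples, not StoreLab's benchmarks.

```python
# Sketch: pre-registered success criterion plus a baseline comparison,
# using a two-sided two-proportion z-test (pure stdlib, no SciPy).
from math import erf, sqrt

# Defined BEFORE fieldwork: what counts as success.
SUCCESS_CRITERION = {
    "metric": "findability",   # share of shoppers who located the target SKU
    "min_lift_points": 5.0,    # revised layout must beat baseline by >= 5 pts
    "alpha": 0.05,             # significance threshold for the comparison
}

def two_proportion_z(found_control, n_control, found_test, n_test):
    """Return (lift, two-sided p-value) for a difference in proportions."""
    p_c = found_control / n_control
    p_t = found_test / n_test
    pooled = (found_control + found_test) / (n_control + n_test)
    se = sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_test))
    z = (p_t - p_c) / se
    # Normal CDF via the error function: Phi(z) = 0.5 * (1 + erf(z / sqrt 2))
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p_t - p_c, p_value

# Hypothetical results: baseline (current shelf) vs test (revised layout).
lift, p = two_proportion_z(found_control=96, n_control=160,
                           found_test=118, n_test=160)
meets_criterion = (lift * 100 >= SUCCESS_CRITERION["min_lift_points"]
                   and p < SUCCESS_CRITERION["alpha"])
```

Because the criterion is written down before a single participant shops, the finding is the same whichever stakeholder reads it: either the lift clears the pre-agreed bar at the pre-agreed significance level, or it does not.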


When Online Simulation Is Enough, and When VR Earns Its Place

Not every question needs a headset. StoreLab positions its research toolkit across a spectrum, and choosing the right format is part of designing a reliable study.

Online simulation tends to suit direction-setting questions where speed and scale matter. It is effective for narrowing options, eliminating weak concepts early, and testing range or layout decisions where shopper attention is not the primary variable.

In-person VR with eye tracking earns its place when visibility and attention are the core questions, or when the decision carries enough financial weight that stakeholders need the highest level of confidence in the findings. Eye tracking captures what online simulation cannot: precisely where a shopper’s gaze lands, for how long, and in what sequence.

Many teams use a staged approach. Online simulation first, to reduce the field of options. In-person VR second, for the high-stakes decisions where attention data will make or break the business case.

Why This Matters More Than It Used To

Australian retail is operating in a period of concentrated market power, tighter margins and intensifying competition for shelf space. As we explored in The Rehearsal Room, the gap between strategic intent and in-store execution is one of the most expensive problems in the sector. But execution only matters if the underlying decision was sound.

A perfectly executed planogram based on untested assumptions is still a bet. Virtual store research, done properly, converts that bet into an informed decision. It gives category teams, brand managers and commercial directors the evidence they need to walk into a buyer meeting at Woolworths or Coles with something more persuasive than instinct and a mood board.


In a market where a quarter of new launches go inactive within two years and the cost of a failed shelf reset ripples through supply chains, agencies and retail relationships, the question is no longer whether to test. It is whether your testing is rigorous enough to trust.

The methodology is not glamorous. Baselines, control conditions, single-variable testing and pre-defined success measures do not make for exciting presentations. But they are what separate research that changes decisions from research that confirms what someone already wanted to believe.

And in a sector where the margin between a successful launch and a warehouse full of delisted stock can come down to shelf position, pack visibility or a signage decision made in a meeting room six months earlier, that distinction is worth protecting.

References & Further Reading

  • Euromonitor International: Spotting Failed New Product Launches: Why It Matters (2025). This independent global market research report tracks the 25 per cent “inactive” rate of new FMCG product launches across digital and physical shelves in 2023 and 2024.
  • Harvard Business School / University of Toronto: The 70 to 80 per cent grocery failure rate is a widely accepted academic benchmark. Foundational research by Harvard Business School Professor Clayton Christensen (creator of the Jobs to Be Done framework) places the broader consumer product failure rate at up to 80 per cent, while specific grocery sector research by Inez Blackburn at the University of Toronto affirms the 70 to 80 per cent failure benchmark for supermarket shelves.
  • The Ehrenberg-Bass Institute for Marketing Science: This independent academic research institute (based at the University of South Australia) conducts large-scale longitudinal studies on FMCG brand survival, brand loyalty, and the high attrition rates of new product introductions over multi-year periods.
  • NielsenIQ Breakthrough Innovation Report: A landmark ongoing study by the global leader in FMCG measurement and data. This research established the benchmark that 75 per cent of consumer packaged goods fail to earn $7.5 million during their first year on the market due to a lack of genuine innovation or poor shelf execution.