Vox Pop: Social data isn't representative of the general population, so how do you ensure that your research isn't biased?
This week, we asked leading players in the research industry to discuss the unrepresentative nature of social data and how they ensure that their research isn't biased. Here is a selection of the best responses...
Steven Ginnis, Research Director, Head of Innovation, Ipsos
…By understanding the context in which the social media conversation has taken place, by having an honest and open dialogue with the client, by exploring the dynamics of the dataset, and by combining humans and machines.
During data collection, we’re careful not to introduce bias into query development. This includes being open about whether we want our query to be more specific (and lose content) or more open (and introduce noise), and iteratively testing queries to improve accuracy.
During analysis and interpretation, we are clear about which members of the public had the opportunity to contribute by tracking trends in social media use across platforms. We also explore the user dynamics within any dataset, considering the extent to which the conversation is driven by a small number of high-frequency users or whether the distribution is more equal between a larger number of voices. These dynamics will change between projects. Best practice would also seek to remove bots and distinguish between individuals and institutions.
Finally, we utilize the power of ‘humanized AI’, harnessing the best of technology without relying on fully automated solutions, so that we don’t introduce algorithmic bias.
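The check on whether a conversation is driven by a small number of high-frequency users can be approximated with a simple concentration measure. The Python sketch below is illustrative only and is not Ipsos's method: the function name `top_author_share`, the 10% default cut-off, and the assumption that each post carries an author identifier are all hypothetical.

```python
import math
from collections import Counter


def top_author_share(authors, top_fraction=0.1):
    """Return the share of posts written by the most active authors.

    `authors` holds one author identifier per post (an assumed input format).
    `top_fraction` is the slice of authors treated as "high-frequency";
    the 10% default is an arbitrary illustration, not an industry standard.
    """
    # Posts per author, most active first.
    counts = sorted(Counter(authors).values(), reverse=True)
    if not counts:
        return 0.0
    # Always include at least one author in the "top" group.
    k = max(1, math.ceil(len(counts) * top_fraction))
    return sum(counts[:k]) / sum(counts)


# Hypothetical dataset: one account writes 8 of 10 posts.
skewed = ["a"] * 8 + ["b", "c"]
print(top_author_share(skewed))
```

A value close to 1 suggests a handful of voices dominate the dataset, while a value close to `top_fraction` suggests posts are spread evenly across authors. Where to draw the line is a judgment call that, as noted above, will change between projects.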
Margaret Amein, Founder, Blue Java Insights
It’s true that Social Media appears to be a quantitative data source which should give us black and white numbers.
Yet, it isn’t. Social Media simply can’t deliver representative results the way a survey can.
Instead, the real value of Social Media is that it’s a rare, high-volume resource of contextual, descriptive information. In other words - it’s an incredible source of strategic audience and market insights.
Where else would you ever be able to so easily:
1. See authentic audience conversations - in that volume - about your vertical?
2. Understand how buying decisions are made by your audience over time?
3. Learn about audience pain points?
4. Learn about your actual (or target) competitors?
As with any other type of contextual, descriptive information, these types of social findings can be validated with a statistically representative survey - if needed.
Otherwise, as an added bonus on top of its real-world insights advantage, Social Media Intelligence is one of the quickest and least expensive audience insights methods available.
Eric Michelson, Social Listening Insights Development Expert, Aetna
Social listening never suggests itself to be representative. So, in that regard, it’s not an issue. I go where the discussion leads me. And it can take me far afield.
There is a bias in research towards looking at larger communities because, under the pressure of time, or just sheer laziness, the pickings are easiest. Often it’s the tangential discussion, which is harder to find, that has the tastiest fruit. I advise that a complete review always takes time. I’m lucky and grateful to work with people that understand.
Business culture bias also has to be dealt with. Businesses have their own language, priorities, and way of deconstructing their environments. The taxonomy and priorities of genuine humans and their discussions are different. There is some overlap, of course. But the bias of business culture can be hard to crack when trying to get the voice of the people we serve understood.
Matt Dodd, Managing Director, Analytics, Media & Digital, UK & Ireland, Kantar
The assumption that panels are representative of the general population and social isn’t is a red herring. As has been reported in recent years, panels have increasingly struggled to represent the general feeling of a population, so any data source has biases that need to be taken into consideration when doing research.
If, as a researcher or purchaser of social research, you focus on representativeness, then you miss the bigger ‘watch out’ for social data, which is that it’s text and images, which are much harder to count properly than survey responses; models are never 100% accurate. This means that even if you’re able to be 100% representative, your insights can still be meaningless if the counting is wrong. We handle both these issues in the following ways:
1/Data Curation: Data curation is more important than data analysis. If you are not spending at least 50% of your time on this stage in the process, your work will fail, as per the old adage “garbage in, garbage out”. As such, we ensure all social projects meet certain standards for relevancy (70-90%), descriptor accuracy (70%+), and representativeness, serving as the basis for confident analyses and the discovery of insights.
2/Data Triangulation: To identify and handle biases, it’s our role as insights specialists to triangulate social data with other sources to ensure greater representativity. A great example of this was work we were doing for a global snacking client, where we saw hidden consumption moments in our World Panel data set which were not appearing in the social space. This was because no young mum would tweet/post about having an indulgent chocolate occasion at the school gates at pick-up time.
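One way to check a dataset against a relevancy standard like the 70% floor mentioned above is to hand-label a random sample of posts and estimate the relevant share. The Python sketch below is an assumption-laden illustration, not Kantar's process: the function names, the hand-labelled sample, and the use of a Wilson score lower bound as a cautious estimate are all choices made for this example.

```python
import math


def relevancy_rate(labels):
    """Share of hand-labelled posts marked relevant (True)."""
    return sum(labels) / len(labels)


def wilson_lower_bound(labels, z=1.96):
    """Lower end of the Wilson score interval for the relevant share.

    z=1.96 corresponds to roughly 95% confidence. Using the lower bound
    rather than the raw rate is an assumption made here to be cautious
    with small samples.
    """
    n = len(labels)
    p = sum(labels) / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width


# Hypothetical sample: 80 of 100 hand-checked posts were on topic.
sample = [True] * 80 + [False] * 20
THRESHOLD = 0.70  # the lower relevancy figure quoted in the text

print(relevancy_rate(sample))
print(wilson_lower_bound(sample) >= THRESHOLD)
```

If the lower bound falls under the threshold, the query or filters would need another round of curation before analysis begins. The sample must be drawn at random from the collected posts for the estimate to mean anything.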
Dr Jillian Ney, Digital Behavioural Scientist, The Social Intelligence Lab
It is true that social data is not representative of the underlying general population. This means that social data may not be suitable for research where representativeness is important to understanding population behavior, for example in politics and election forecasting, or in gaining insight into the attitudes, sentiments or activities of large populations. Several studies suggest that social media users are younger, more highly educated and more liberal than non-users (see OxIS Survey; Mellon and Prosser, 2017), which can lead to bias in generalised population studies. However, social data can be used in combination with other sources to understand general population behavior.
It is more likely that researchers will be using social data to understand preferences around brands, customer journeys, consumption preferences, and patterns of consumption. While bias still exists in the data sets, and collecting more social data will not help reduce the bias, you should explore the data source and audience composition. It is possible to answer questions using social data. When analyzed qualitatively, you start to find more emotion in the data set compared to other methods, like focus groups.
It's also important to understand what is being discussed online because of interpersonal influence. There is research exploring the impact of interpersonal influence in social data on social media users' attitudes and behaviors. In what could be described as an unethical study, Facebook reported that emotional contagion existed on their platform (Kramer et al., 2014). This study suggests that while social data is not representative of the population as a whole, the opinions and stories have the potential to sway other people's behaviors and decisions - making social data analysis important for brands.
This interview was recorded via LinkedIn Live.