Towards more diversity in movies – Finding content that breaks the norm
By Angnis Schmidt-May, Head of Insights at Ceretai
At Ceretai we love movies. And we love diversity. Our personal utopia is a wide selection of films in which all kinds of stories, characters and perspectives are portrayed. All members of society should have the chance to access movies in which they can recognize part of themselves. Everyone should be able to find screenings that they leave feeling empowered by the messages the movie has sent. Not every movie can do this for everyone – but everyone should always find enough movies that can!
We believe that this is possible – but we are not there yet. In fact, the majority of movie content reinforces stereotypes and norms that have become entrenched in our society. Ceretai’s goal is to identify movies whose content questions stereotypes and norms – and to make this information accessible to audiences.
To do this, we must first develop a definition of ‘the norm’ in today’s selection of movies. We can then label movie content that challenges or even breaks these norms as ‘norm-breaking’. Of course, norms depend very much on the particular society we live in, and all of our analyses happen from a Northern-European perspective (for now). It is particularly important to us that the criteria for norms and for norm-breaking that come out of our development process are as objective as possible.
The first basic criteria for movie content that breaks the norm
In order to get a feeling for what could classify as ‘norm-breaking’ film content, we started by reading through hundreds of movie plot summaries. This gave us a first idea of which types of stories appear often and which narratives could be considered exceptions to these norms.
Based on these initial insights, we set up a first list of criteria, taking our very first step towards identifying norm-breaking movies. The phrasing of these criteria is based on the grounds for discrimination, as defined in Swedish law.
A movie is labelled as ‘norm-breaking’ if it satisfies at least one of the following demands:
- In the centre of the story is a female character and the movie focuses neither on her relation to men nor on her role as a mother.
(Example: Pippi Långstrump directed by Olle Hellbom)
- In the centre of the story is a male character who struggles with masculinity or with his role as a man in society.
(Example: Billy Elliot directed by Stephen Daldry)
- In the centre of the story is a character from the LGBTQIA+ community.
(Example: The Danish Girl directed by Tom Hooper)
- In the centre of the story is a character who is of non-white ethnicity or who is portrayed by a person of color.
(Example: Black Panther directed by Ryan Coogler)
- In the centre of the story is a character belonging to a religious group other than Christian or atheist.
(Example: My Name Is Khan directed by Karan Johar)
- In the centre of the story is a person who is older than 45 years.
(Example: Hundraåringen directed by Felix Herngren)
- In the centre of the story is a character who is disabled or who is portrayed by a person with a disability.
(Example: Les Intouchables directed by Olivier Nakache and Éric Toledano)
Using these very basic criteria, we have manually labelled more than 1000 international and Swedish cinema movies and found that about 25% of all films classify as ‘norm-breaking’ according to our basic definition. Consequently, we consider the remaining 75%, which do not satisfy any of the criteria, as ‘the norm’.
We have also found that the share of norm-breaking films among the 5 best-performing movies in Sweden has increased from 15% to 56% during the last 20 years. This is a first indication that movies that break norms and defy stereotypes are becoming more popular among audiences!
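The basic labelling rule (a movie is norm-breaking as soon as it satisfies at least one criterion) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the criterion keys, the `Movie` class, the helper functions and the placeholder films with their made-up labels are all hypothetical and are not taken from the manual labelling described above.

```python
from dataclasses import dataclass, field

# Hypothetical keys, one per criterion in the list above.
CRITERIA = frozenset({
    "female_lead_beyond_men_or_motherhood",
    "male_lead_struggling_with_masculinity",
    "lgbtqia_lead",
    "non_white_lead",
    "non_christian_non_atheist_religious_lead",
    "lead_older_than_45",
    "lead_with_disability",
})


@dataclass
class Movie:
    title: str
    # Criteria a human annotator judged to be satisfied (hypothetical field).
    satisfied: set = field(default_factory=set)


def is_norm_breaking(movie: Movie) -> bool:
    """A movie is norm-breaking if it satisfies at least one criterion."""
    unknown = movie.satisfied - CRITERIA
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    return bool(movie.satisfied)


def norm_breaking_share(movies: list) -> float:
    """Fraction of the given movies that are labelled norm-breaking."""
    if not movies:
        raise ValueError("need at least one movie")
    return sum(is_norm_breaking(m) for m in movies) / len(movies)


# Placeholder films with made-up labels, for illustration only.
movies = [
    Movie("Film A", {"lead_older_than_45"}),
    Movie("Film B"),
    Movie("Film C", {"lgbtqia_lead", "non_white_lead"}),
    Movie("Film D"),
]
print(norm_breaking_share(movies))
```

Everything that is not norm-breaking under this rule counts as ‘the norm’, so the two shares always add up to 100%.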
The next steps – refining the criteria and making norm-breaking movies available to audiences
You may have noticed that our above notion of ‘breaking the norm’ is still pretty fluffy. We need to improve it and define more precise criteria. And we can do this by using a magic tool called science – but first we need to collect lots of data…
We want to go even deeper into the on-screen content of the movie and into the portrayal of characters. We want to detect imbalances that are sometimes hard to see for the potentially biased human eye. And our software tools based on machine learning will help us with this. They will also act as our data deliverers – you can think of them as robots who do the boring work for us and don’t have a personal opinion about any of the movies.
The data delivered by our software tools will be used to fill a huge database, similar to IMDb but with additional information on diversity, equality and breaking the norm in the movie content.
This data will enable a larger audience to find the movies they want to see more often!
But how? I want more detail…
So our aim is to obtain a more precise and refined version of norm-breaking film content, in addition to the above basic criteria. We also need to perform the analyses in an automated way instead of having to do time-consuming manual work. Our methodology will therefore be based on the scientific evaluation of a large set of movie data.
But let’s take this step by step.
1) The diversity parameters
For each movie, we define various diversity parameters, such as:
- the screen times devoted to women, people older than 45 and persons of color
- the speaking time of women
- imbalances among genders in emotions / facial expressions (e.g. the ‘smile factor’ = how much more do women smile on screen than men do?)
- imbalances among genders in filming angles (e.g. how often are people filmed from below/above to appear dominant/submissive?)
- imbalances among genders in content of dialogues (e.g. which words are used most frequently by men and by women?)
- the appearance of LGBTQIA+ characters, disabled people and members of other underrepresented groups of society
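To make a few of these parameters concrete, here is a minimal sketch of how they could be computed for a single movie once the raw timings are available. It rests on several assumptions that are not spelled out in the text: that the timings come as total seconds per gender, that screen and speaking time are expressed as shares of the combined total, and that the smile factor compares smiling time per second of screen time. The function names and the numbers are made up for illustration.

```python
def share(part: float, whole: float) -> float:
    """Fraction of `whole` taken up by `part`."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


def smile_factor(smile_w: float, screen_w: float,
                 smile_m: float, screen_m: float) -> float:
    """Women's smiling rate divided by men's smiling rate (1.0 = equal).

    Assumption: smiling is normalised by each group's own screen time.
    """
    rate_w = share(smile_w, screen_w)
    rate_m = share(smile_m, screen_m)
    if rate_m == 0:
        raise ValueError("men never smile in this movie; ratio undefined")
    return rate_w / rate_m


# Made-up totals in seconds for one hypothetical movie.
totals = {
    "screen": {"women": 2000.0, "men": 4000.0},
    "speaking": {"women": 1800.0, "men": 3600.0},
    "smiling": {"women": 300.0, "men": 400.0},
}

parameters = {
    "female_screen_share": share(
        totals["screen"]["women"], sum(totals["screen"].values())
    ),
    "female_speaking_share": share(
        totals["speaking"]["women"], sum(totals["speaking"].values())
    ),
    "smile_factor": smile_factor(
        totals["smiling"]["women"], totals["screen"]["women"],
        totals["smiling"]["men"], totals["screen"]["men"],
    ),
}
print(parameters)
```

In this toy movie women get about a third of the screen and speaking time and smile roughly one and a half times as often as men per second on screen.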
2) The statistical evaluation
Next, we determine ‘the norm’ for these parameters by finding their average values over a large set of movies. In case this norm suggests an imbalance (for example, an average of less than 50% speaking time for women), there will be a well-defined ‘direction of equality’ (in this example: more female speaking time). Movies whose diversity parameter significantly deviates* from the norm in the ‘direction of equality’ will be labelled norm-breaking in regard to this particular diversity parameter.
Let’s give you a more detailed example.
Suppose that by analysing 2000+ movies we find that – on average – women smile twice as much as men in these films. We call this parameter the ‘smile factor’; its average value would be 2 in this example. The ‘direction of equality’ points to values for the smile factor that are smaller than 2, because full equality would correspond to a value of 1 (for which women and men smile equally often). So a movie with smile factor 1.5 lies in the ‘direction of equality’.
Now, whether this movie is labelled as having a norm-breaking smile factor depends on the whole data set: only if its value of 1.5 lies more than one ‘standard deviation’ away from the average do we define it to be norm-breaking, because in this case its distance from the average is larger than the typical spread between movies and not just an ordinary fluctuation. For instance, if the standard deviation for the smile factor turns out to be 0.2 in our data set, then any value smaller than 1.8 would significantly differ from the average value of 2.
Notice that the value of 1.5 in our example (which states that women smile 50% more than men) still does not correspond to full equality. But it is sufficiently far away from the average movie to be considered norm-breaking, according to our new definition!
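The rule from this example can be written down as a short Python sketch. The function names are hypothetical, and the choice of the population standard deviation (`statistics.pstdev`) is an assumption, since the text does not say which variant is used. The sketch also does not treat values that overshoot full equality in any special way, because the text does not address that case.

```python
from statistics import mean, pstdev


def norm_and_spread(values: list) -> tuple:
    """Average ('the norm') and standard deviation over all movies.

    Assumption: population standard deviation; the sample version
    (statistics.stdev) would be an equally plausible choice.
    """
    return mean(values), pstdev(values)


def breaks_norm(value: float, norm: float, spread: float,
                equality_value: float) -> bool:
    """True if `value` deviates from the norm by more than one standard
    deviation AND lies on the same side of the norm as full equality."""
    toward_equality = (value - norm) * (equality_value - norm) > 0
    return toward_equality and abs(value - norm) > spread


# Numbers from the smile-factor example: norm 2, standard deviation 0.2,
# full equality at 1.
print(breaks_norm(1.5, norm=2.0, spread=0.2, equality_value=1.0))  # True
print(breaks_norm(1.9, norm=2.0, spread=0.2, equality_value=1.0))  # False
print(breaks_norm(2.5, norm=2.0, spread=0.2, equality_value=1.0))  # False
```

The third call shows why the direction matters: a smile factor of 2.5 is far from the average, but it moves away from equality, so it is not labelled norm-breaking.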
* Significant deviation is a well-defined concept in data science. Here it requires the parameter value under consideration to differ from the average by more than an amount called the ‘standard deviation’. Just like the average, this number is computed from the entire data set of all movies. The standard deviation is very relevant because it measures how widely the values of all movies are spread around the average, and therefore tells you whether a statistical claim is justified (such as “the female speaking time of movie XYZ is 55%, which is significantly higher than that of most other movies”). It is crucial for objectivity!