Monthly Report December 2024

This month, we analysed our six baselines (Sexism, Anti-LGBTQ+, Anti-Muslim, Anti-Refugees/Migrants, Antisemitism, and Anti-Roma) across four European regions (Western Europe, Central and Eastern Europe, Northern Europe, and Southern Europe). This regional focus provides an opportunity to address localised issues, with data enabling comparative insights across these regions.

 

Data for December was based on 1.1 million messages in 24 languages, across seven social media platforms: Reddit, X, 4chan, Gab, YouTube, TikTok, and Facebook.

 
    • Western Europe 

    This channel contains 343,947 posts, representing a substantial volume of social media content across multiple platforms. The distribution of data shows a clear predominance of content from X (formerly Twitter), accounting for nearly half (49%) of all analysed content, while TikTok contributes one-third (33%) of the dataset. 4chan represents a smaller but significant portion at 6% of the total posts.

    The toxicity analysis reveals notable patterns in content quality and user behaviour across these platforms. While the overall average toxicity score remains relatively low at 0.17, there are 2,876 posts that exceed a high toxicity threshold of 0.8, representing approximately 0.8% of total content. It is particularly noteworthy that these high-toxicity posts show a disproportionate concentration on 4chan compared to other platforms, suggesting this platform may serve as a focal point for problematic content.

    Thematic analysis of the dataset reveals several significant patterns in content distribution. Violence-related content features prominently, appearing in approximately one-fifth (21%) of all messages, indicating a concerning level of violent themes across the monitored channels. Additionally, our analysis shows that toxic content frequently intersects with specific thematic areas, most notably religious discussions (11%) and political discourse (10%). This clustering suggests these topics may serve as particular flashpoints for problematic user interactions.

    The linguistic analysis of toxic content revealed several frequently occurring harmful keywords, including multiple slurs targeting racial, ethnic, and LGBTQ+ communities. The presence and persistence of such terminology, particularly in conjunction with the high prevalence of violent content and the concentration on religious and political topics, suggests potential coordinated patterns of hate speech and extremist rhetoric within this channel.

    • Central and Eastern Europe

    This channel contains 39,040 posts, representing a substantial volume of social media content gathered from multiple platforms. The data collection demonstrates a relatively balanced distribution across major social networks, with X accounting for 31% of the content, followed closely by Facebook at 30%, and Reddit contributing 20% of the posts. This diverse platform representation provides a comprehensive view of online discussions and user behaviour across different social media environments.

    The toxicity analysis reveals generally healthy communication patterns, with an average toxicity score of 0.09, indicating that most interactions remain within acceptable bounds. However, there are areas of concern, as 100 posts exceeded the high toxicity threshold of 0.8. These problematic posts show a disproportionate concentration on 4chan, suggesting platform-specific patterns in controversial content.

    Content analysis reveals that political discussions correlate with elevated toxicity levels, accounting for 8% of concerning content. This pattern aligns with broader trends in social media where political discourse often generates more heated exchanges. Additionally, approximately 5% of messages contain references to violence, representing a notable subset of potentially problematic content.

    The presence of toxic keywords in multiple languages (including terms like kurva, pedofila, meghalt, cata, and zabić) indicates that harmful content crosses linguistic boundaries and suggests a multinational or multicultural user base. This linguistic diversity adds complexity to content moderation challenges and highlights the need for multilingual monitoring approaches.

    When comparing these findings to typical social media behaviour patterns, the overall toxicity levels remain relatively contained. The 0.26% rate of highly toxic posts (100 out of 39,040) suggests that while concerning content exists, it represents a small fraction of total interactions. This provides a baseline for monitoring trends and identifying potential increases in problematic content.
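
    The rate quoted above is the number of highly toxic posts divided by the total number of posts. A minimal Python sketch of that arithmetic follows, using the counts reported for the first two regions. The function name `high_toxicity_rate` is an illustrative assumption and not part of the monitoring pipeline behind this report.

```python
def high_toxicity_rate(high_toxicity_posts: int, total_posts: int) -> float:
    """Return the percentage of posts that exceed the high-toxicity threshold."""
    if total_posts <= 0:
        raise ValueError("total_posts must be positive")
    return 100.0 * high_toxicity_posts / total_posts


# (highly toxic posts, total posts) as reported in the text above.
channels = {
    "Western Europe": (2_876, 343_947),
    "Central and Eastern Europe": (100, 39_040),
}

for region, (high, total) in channels.items():
    print(f"{region}: {high_toxicity_rate(high, total):.2f}%")
```

    Expressing the counts as rates makes regions of very different size directly comparable.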

    • Northern Europe


    This channel contains 21,480 posts, representing a focused dataset predominantly sourced from Reddit, which accounts for 62% of the content. The platform distribution shows a clear preference for Reddit-based discussions, followed by X at 22% and Facebook contributing 13% of the posts. This concentration on Reddit suggests the content may reflect the platform's characteristic discussion patterns and community dynamics.

    The toxicity analysis reveals notably healthy communication patterns, with a low average toxicity score of 0.05, significantly below typical social media benchmarks. Only 11 posts exceeded the high toxicity threshold of 0.8, indicating exceptional content quality across the dataset. However, it's worth noting that these highly toxic posts show a disproportionate presence on Facebook, suggesting that platform-specific factors may influence content toxicity.

    Of particular interest is the 6% of messages containing references to violence, representing a moderate level of potentially concerning content. This percentage warrants attention as it indicates a consistent presence of violent themes across the dataset, despite the overall low toxicity scores.

    The identified toxic keywords (etnisk rensning, lort, kuken, analfabeter, and knulla) appear to be predominantly Scandinavian, particularly Swedish, suggesting a geographically or culturally focused user base. This linguistic pattern provides valuable context for understanding the community dynamics and potential cultural factors influencing content patterns.

    The stark contrast between the very low average toxicity score (0.05) and the presence of violent content (6%) presents an interesting pattern. It suggests that while discussions may touch on violent themes, they generally maintain a measured and non-toxic tone. This could indicate either well-moderated discussions of sensitive topics or content that references violence without engaging in toxic behavior.

    The platform distribution, heavily weighted toward Reddit, may explain some of these patterns, as Reddit's community-based moderation system and subforum structure often lead to more regulated discussions. However, the concentration of highly toxic content on Facebook, despite its smaller share of total posts, highlights platform-specific challenges in maintaining content quality.
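
    One way to make "disproportionate" concrete is to compare a platform's share of highly toxic posts with its share of all posts. The sketch below is illustrative only: Facebook's 13% share and the total of 11 highly toxic posts come from the text above, but the number of those posts that appeared on Facebook is not reported, so the value used here is a hypothetical placeholder. The function name `overrepresentation` is likewise an assumption.

```python
def overrepresentation(platform_toxic: int, all_toxic: int, platform_share: float) -> float:
    """Ratio of a platform's share of highly toxic posts to its share of all posts.

    A value above 1.0 means the platform is over-represented among highly
    toxic posts; a value below 1.0 means it is under-represented.
    """
    if all_toxic <= 0:
        raise ValueError("all_toxic must be positive")
    if not 0 < platform_share <= 1:
        raise ValueError("platform_share must be in (0, 1]")
    return (platform_toxic / all_toxic) / platform_share


# 11 highly toxic posts and a 13% Facebook share are reported above.
# HYPOTHETICAL: 5 of the 11 posts on Facebook (not a figure from the report).
ratio = overrepresentation(platform_toxic=5, all_toxic=11, platform_share=0.13)
print(f"Facebook over-representation: {ratio:.1f}x")
```

    With so few highly toxic posts in this channel, a ratio of this kind is sensitive to one or two posts and is best read as indicative.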

    • Southern Europe

    This channel contains 28,468 posts, collected across multiple major social media platforms. The data distribution shows X (formerly Twitter) as the primary source, accounting for more than half (54%) of all monitored content, while TikTok contributes nearly one-third (32%) of the dataset. Reddit represents a smaller but notable portion at 6% of the total posts.

    The toxicity analysis reveals relatively moderate patterns in content quality and user behaviour across these platforms. The overall average toxicity score is notably low at 0.12, with only 120 posts exceeding a high toxicity threshold of 0.8, representing approximately 0.4% of total content. Interestingly, these high-toxicity posts show a disproportionate concentration on YouTube compared to other platforms, suggesting this platform may serve as a particular focal point for problematic content within this dataset.

    Thematic analysis of the dataset indicates that violence-related content appears in 9% of all messages, representing a notable but not predominant presence of violent themes across the monitored channels. This level of violent content, while concerning, is less than half the 21% observed in the Western Europe channel above.

    The linguistic analysis of toxic content revealed several frequently occurring harmful keywords in Portuguese and Spanish, including terms targeting Roma communities ("gitano", "gitanos"), religious groups ("mahometano"), and racial and LGBTQ+ slurs ("preto", "viado"). The presence of these terms in multiple languages suggests the monitored content spans different geographic and linguistic regions, particularly focusing on Iberian and Latin American contexts.

    • Western Europe 

    This channel contains 36,236 posts, collected across multiple social media platforms with X being the primary source at 60%, followed by TikTok at 22%, and 4chan at 10%. The data reveals significantly concerning patterns in content toxicity and harmful behaviour.

    The average toxicity score of 0.33 is notably high, indicating substantial problematic content across the dataset. This is further emphasized by the 743 posts that exceeded the high toxicity threshold of 0.8, representing approximately 2% of all content. The concentration of highly toxic content on Gab, despite its smaller representation in the dataset, suggests platform-specific patterns in extremist content.

    Content analysis reveals concerning patterns in discriminatory behavior, with 14% of toxic content involving sexism and 10% relating to political discourse. Most notably, 20% of messages contain references to violence, indicating a significant presence of potentially harmful content. The identified toxic keywords demonstrate explicit hate speech and extremist symbols, suggesting coordinated harmful behavior.

    The platform distribution pattern, heavily weighted toward X, indicates that while harmful content appears across all platforms, mainstream social media remains the primary vector for distribution. TikTok's sizeable share of the dataset (22%) is particularly noteworthy given the platform's younger user demographic.

    • Central and Eastern Europe 

    This channel contains 4,182 posts, representing a moderate-sized dataset predominantly sourced from X, which accounts for 61% of the content. The platform distribution shows a clear preference for X-based discussions, with Reddit and Facebook contributing smaller but similar proportions at 15% and 14% respectively. This concentration on X suggests the content largely reflects that platform's characteristic discussion patterns and community dynamics.

    The toxicity analysis reveals mixed communication patterns, with an average toxicity score of 0.18, which falls within moderate ranges for social media discourse. Twenty-one posts exceeded the high toxicity threshold of 0.8, representing approximately 0.5% of total content. These highly toxic posts show a disproportionate presence on Reddit, suggesting platform-specific factors may influence content toxicity patterns despite Reddit's smaller share of the overall dataset.

    A notable concern emerges in the form of sexist content, which accounts for 15% of toxic interactions. This relatively high percentage of gender-based discriminatory content presents a significant pattern that warrants attention. In contrast, violence-related content appears in only 4% of messages, indicating a lower prevalence of violent themes compared to discriminatory behavior.

    The identified toxic keywords (kurva, csalás, robot, csúcs, and mizogin) appear to be predominantly Hungarian, suggesting a geographically or culturally focused user base. The presence of "mizogin" (likely related to misogyny) aligns with the high percentage of sexism-related toxic content, indicating a consistent pattern of gender-based hostile behavior within the community.

    The platform distribution dynamics present an interesting contrast, where X dominates the overall content volume but Reddit shows a higher concentration of toxic posts. This suggests that while the majority of discussions occur on X, the community dynamics or moderation practices on Reddit may be contributing to more problematic interactions within that subset of content.

    • Northern Europe

    This channel contains 774 posts, representing a relatively small dataset with content predominantly sourced from X, which accounts for 72% of the collected data. Reddit contributes 22% of the content, while TikTok maintains a minimal presence at 3%. This strong concentration on X suggests the dataset primarily captures discussions and interactions characteristic of that platform's user base.

    The toxicity analysis reveals generally healthy communication patterns, with an average toxicity score of 0.13, indicating relatively mild levels of problematic content. Notably, no posts exceeded the high toxicity threshold of 0.8, suggesting effective content moderation or community self-regulation. However, among the posts with elevated toxicity scores (though below 0.8), TikTok shows a disproportionate concentration relative to its small share of the overall dataset.

    Content analysis reveals concerning patterns regarding gender-based discrimination, with 14% of toxic content involving sexist themes. This relatively high percentage of sexism-related content presents a notable pattern within the community's discourse. Additionally, approximately 6% of messages contain references to violence, representing a moderate level of potentially concerning content.

    The identified toxic keywords (bøsse, transu, bögklubb, pedofiler, and islamism) appear to be predominantly Scandinavian, particularly Swedish and Danish, suggesting a Nordic-focused user base. The nature of these terms indicates that discriminatory content extends beyond gender-based issues to include sexual orientation, religion, and other forms of bias.

    The platform distribution dynamics present an interesting pattern where X dominates the overall content volume, while TikTok, despite its minimal presence, shows a higher concentration of problematic content. This suggests that while the majority of discussions maintain acceptable standards on X and Reddit, the small subset of TikTok content may require closer monitoring.

    • Southern Europe

    This channel contains 7,119 posts, with data collection heavily concentrated on X, which accounts for an overwhelming 89% of the content. The remaining content is distributed between TikTok at 7% and YouTube with a minor 2% share. This strong dominance of X suggests the dataset primarily reflects the discourse patterns and community dynamics specific to that platform.

    The toxicity analysis reveals moderately elevated levels of problematic content, with an average toxicity score of 0.23. Fifty-six posts exceeded the high toxicity threshold of 0.8, representing approximately 0.8% of the total content. While these highly toxic posts appear across platforms, they show a disproportionate concentration on Gab, despite this platform not being among the primary sources of content. This suggests potential cross-platform content sharing patterns, particularly regarding problematic material.

    Gender-based discrimination emerges as a significant concern, with sexist content accounting for 13% of toxic interactions. This relatively high percentage indicates a persistent pattern of gender-based hostile behavior within the community. Additionally, 5% of messages contain references to violence, representing a moderate level of potentially concerning content.

    The identified toxic keywords (viado, matar, puto, putas, and preto) are predominantly Portuguese, suggesting a Brazilian or Portuguese-speaking user base. The nature of these terms indicates that discriminatory content spans multiple forms of bias, including gender-based discrimination and general hostile behavior.

    The platform distribution presents an interesting dynamic where X dominates the overall content volume, while Gab, despite not being a primary content source, shows higher concentrations of toxic posts. This suggests that while the majority of discussions occur on mainstream platforms, more problematic content may be shared or referenced from alternative platforms.

    • Western Europe 

    This channel contains 120,037 posts. Data collection predominantly focused on X (69%), followed by TikTok (22%), and 4chan (3%). Analysis of content toxicity revealed an average score of 0.26 across all platforms, with 1,232 posts exceeding a toxicity threshold of 0.8. The distribution of highly toxic content showed a notable concentration on 4chan compared to other platforms.

    Content analysis identified several concerning trends in the toxic posts. Religious intolerance emerged as the primary theme, accounting for 23% of toxic content, while racially discriminatory content represented 11%. A significant portion of the messages (31%) contained references to violence.

    The presence of harmful language and slurs was documented across platforms, with certain derogatory terms appearing frequently in toxic posts. This pattern suggests coordinated messaging or the amplification of harmful narratives within specific community segments.

    This observation points to the persistence of toxic behavior across social media platforms, though with varying intensity levels. The higher concentration of toxic content on 4chan compared to mainstream platforms like X and TikTok indicates platform-specific behavioral patterns, possibly influenced by different content moderation approaches and community standards.

    The relatively low overall toxicity score (0.26) suggests that, while concerning content exists, it represents a minority of total communications. However, the presence of highly toxic content (toxicity ≥0.8) in 1,232 posts indicates persistent challenges in maintaining healthy online discourse across these platforms.

    • Central and Eastern Europe 

    This channel contains 4,488 posts, with data primarily sourced from X (62%), while TikTok (19%) and Facebook (13%) represent secondary data sources. The analysis reveals a relatively low average toxicity score of 0.14 across all platforms, with only 21 posts exceeding the high toxicity threshold of 0.8.

    An interesting pattern emerged in the platform distribution of toxic content, with Reddit showing a higher frequency of highly toxic posts despite not being among the top three data sources. This suggests that while Reddit represents a smaller portion of the dataset, it contains a disproportionate amount of toxic content.

    Content analysis reveals that toxic posts primarily cluster around two main themes: racism (12%) and political discourse (11%). The intersection of these themes suggests a connection between political discussions and discriminatory content. Additionally, references to violence appear in 10% of the messages, indicating a concerning overlap between ideological content and aggressive rhetoric.

    Linguistic analysis of toxic content revealed recurring terms related to conspiracy theories and xenophobic sentiments, particularly focusing on anti-masonic narratives and anti-immigration rhetoric. The presence of terms in multiple languages (including Polish terms like "masonów" and "nachodźców") suggests this discourse spans across different linguistic communities.

    The relatively low average toxicity score (0.14) indicates that the majority of conversations maintain a civil tone. The small number of highly toxic posts (21 with scores ≥0.8) suggests that extreme content represents a minimal portion of the overall discourse. However, the concentration of these posts on specific platforms and their thematic focus on racism and politics points to potential echo chambers where such content may be amplified.

    • Northern Europe

    This channel contains 3,072 posts, with X emerging as the primary data source (71%), complemented by Facebook (17%) and Reddit (10%). The analysis reveals a notably low average toxicity score of 0.11, with only 16 posts exceeding the high toxicity threshold of 0.8, indicating generally healthy discourse across the platforms.

    Despite Reddit's relatively small share of the overall dataset (10%), it shows a disproportionate concentration of highly toxic content. This pattern suggests that while Reddit represents a minor portion of the data, it may harbor more intense discussions that tend toward toxic discourse.

    The thematic analysis of toxic content reveals two predominant categories: religious-themed content (15%) and political discussions (10%). The presence of violence-related content in 12% of messages indicates a concerning overlap between ideological discussions and aggressive rhetoric. This intersection suggests a complex relationship between religious discourse, political debate, and violent narratives.

    Linguistic analysis of toxic content revealed several recurring terms in Danish, including derogatory language and references to ethnic cleansing ("etnisk rensning"). The presence of terms related to Islamism and ethnic discrimination suggests an ongoing narrative focusing on religious and ethnic tensions. The recurrence of these keywords (including Danish terms like "lort" and "trækker") suggests that these discussions may be part of broader regional discourse patterns.

    The relatively low average toxicity score (0.11) suggests that most interactions maintain a civil tone. The limited number of highly toxic posts (16 with scores ≥0.8) indicates that extreme content represents a small fraction of the overall discourse. However, the concentration of these posts on Reddit and their focus on religious and political themes points to potential echo chambers where such content may find reinforcement.

    • Southern Europe

    This channel contains 7,494 posts, with X serving as the dominant data source (79%), followed by TikTok (15%) and YouTube (5%). The analysis reveals a moderate average toxicity score of 0.28, with 135 posts surpassing the high toxicity threshold of 0.8, indicating pockets of concerning content within the dataset.

    An interesting pattern emerges regarding platform distribution: while Gab is not among the top three data sources, it shows a higher concentration of highly toxic posts. This suggests that although Gab represents a smaller portion of the dataset, it harbors more extreme content compared to mainstream platforms.

    The thematic analysis reveals that religious-themed toxic content accounts for 10% of flagged posts. However, the most striking finding is the high prevalence of violence-related content, present in 21% of messages. This significant proportion of violent content suggests a concerning trend in the discourse.

    Linguistic analysis of toxic content revealed several recurring terms in Spanish and Portuguese, including derogatory language targeting religious and racial identities. The presence of terms like "mahometano" suggests an anti-Islamic narrative, while other toxic keywords indicate racial prejudice and general hostility. The multilingual nature of these toxic terms (including "moro," "negro," "balle," and "puto") points to harmful narratives circulating across Spanish and Portuguese-speaking communities.

    The relatively high number of extremely toxic posts (135 with scores ≥0.8) compared to the overall message volume suggests a more significant challenge with content moderation in this channel. While these posts represent a small percentage of total content, their concentration on alternative platforms like Gab indicates potential echo chambers where extreme viewpoints may be amplified.
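
    The comparison between the number of highly toxic posts and overall message volume can be checked by turning the counts for the four regions above into rates. The following Python sketch uses only the figures reported in this section; the variable names are illustrative assumptions.

```python
# (highly toxic posts, total posts) as reported for the four regions above.
counts = {
    "Western Europe": (1_232, 120_037),
    "Central and Eastern Europe": (21, 4_488),
    "Northern Europe": (16, 3_072),
    "Southern Europe": (135, 7_494),
}

# Percentage of posts above the high-toxicity threshold, per region.
rates = {region: 100.0 * high / total for region, (high, total) in counts.items()}

# Regions ordered from the highest to the lowest rate.
ranked = sorted(rates, key=rates.get, reverse=True)

for region in ranked:
    high, _total = counts[region]
    print(f"{region}: {high} posts, {rates[region]:.2f}%")
```

    On these figures, Western Europe has the most highly toxic posts in absolute terms, while Southern Europe has the highest rate (about 1.8%), which is consistent with the observation above.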

    • Western Europe 

    This channel contains 28,395 posts, with X representing the primary data source (72%), while Facebook (7%) and TikTok (6%) contribute smaller portions of the dataset. Analysis reveals a moderate average toxicity score of 0.24, with 272 posts exceeding the high toxicity threshold of 0.8.

    Platform distribution analysis shows that while 4chan is not among the top three data sources, it exhibits a disproportionately high concentration of toxic content. This suggests that although the platform represents a smaller share of the overall dataset, it contains more extreme content compared to mainstream social media platforms.

    The thematic analysis reveals that politically-oriented toxic content accounts for 16% of flagged posts, while racially discriminatory content represents 10%. Most notably, violence-related content appears in 38% of messages, indicating a significant prevalence of aggressive rhetoric within the discourse.

    Linguistic analysis of toxic content identified recurring derogatory terms targeting racial and ethnic groups. The use of quotation marks around "migrants" suggests a pattern of delegitimizing or hostile rhetoric toward immigrant communities. The presence of hashtags like "#endtimes" indicates an apocalyptic narrative framework potentially being used to justify or amplify hostile content.

    The high proportion of violence-related content (38%) combined with the significant number of extremely toxic posts (272 with scores ≥0.8) points to concerning patterns in this channel. While these highly toxic posts represent a small percentage of total content, their concentration on 4chan suggests the presence of echo chambers where extreme viewpoints may be reinforced.

    • Central and Eastern Europe 

    This channel contains 3,686 posts, with X serving as the primary data source (52%), while Facebook (20%) and TikTok (12%) represent secondary sources. The analysis reveals a relatively low average toxicity score of 0.19, with only 16 posts exceeding the high toxicity threshold of 0.8.

    A notable pattern emerges in the platform distribution of toxic content, with Facebook showing the highest frequency of highly toxic posts among all platforms. This is particularly interesting given Facebook's position as the second-largest data source (20%), suggesting a correlation between platform usage and toxic content in this case.

    The thematic analysis reveals that politically oriented toxic content accounts for 12% of flagged posts. The presence of violence-related content in 11% of messages indicates a moderate level of aggressive rhetoric, well below the 38% observed in the Western Europe channel above.

    Linguistic analysis of toxic content revealed a multilingual pattern of concerning terms, spanning Polish ("masoneria," "terrorysta") and Ukrainian/Russian ("спецоперация," "ухилянт," "ухылянт"). The presence of terms related to Masonic conspiracy theories, terrorism, and military operations ("спецоперация" - special operation) suggests a focus on geopolitical tensions and conspiracy narratives in Eastern Europe.

    The relatively low number of highly toxic posts (16 with scores ≥0.8) indicates that extreme content represents a minimal portion of the overall discourse. However, their concentration on Facebook, combined with the political nature of toxic content, suggests potential echo chambers where such narratives may find reinforcement.

    • Northern Europe

    This channel contains 1,412 posts, with X emerging as the predominant data source (78%), complemented by Reddit (12%) and Facebook (8%). The analysis reveals a relatively low average toxicity score of 0.19, with only a single post exceeding the high toxicity threshold of 0.8, indicating generally healthy discourse across the platforms.

    Notably, X shows the highest frequency of toxic content, which aligns with its position as the primary data source. This suggests that while the platform hosts the majority of conversations, it also contains most of the problematic content, though at a very low rate given the single highly toxic post.

    The thematic analysis identifies two main categories of toxic content: racially-oriented content (16%) and political discussions (15%). The close percentage between these categories suggests a potential intersection between racial and political discourse. Violence-related content appears in 13% of messages, indicating a moderate presence of aggressive rhetoric.

    Linguistic analysis of toxic content revealed terms primarily in Finnish and Swedish, including derogatory language and references to sexual violence ("joukkoraiskaus" - gang rape). The presence of terms like "valboskap" (voting cattle) and "analfabeter" (illiterates) suggests a narrative that combines political criticism with derogatory characterizations. The Finnish expletive "vittu" appears frequently in hostile exchanges.

    The remarkably low number of highly toxic posts (just one with a score ≥0.8) suggests that while concerning topics are discussed, the conversation generally maintains a civil tone. However, the presence of racially and politically charged content, combined with references to violence, indicates underlying tensions in the discourse.

    • Southern Europe

    This channel contains 6,463 posts, with X serving as the primary data source (54%), followed by significant contributions from TikTok (32%) and YouTube (11%). The analysis reveals a relatively low average toxicity score of 0.19, with 31 posts exceeding the high toxicity threshold of 0.8.

    An interesting pattern emerges in the platform distribution: while Reddit is not among the top three data sources, it shows a higher concentration of highly toxic posts. This suggests that although Reddit represents a smaller portion of the dataset, it contains more extreme content compared to mainstream platforms.

    Violence-related content appears in 13% of messages, indicating a moderate level of aggressive rhetoric within the discourse. This percentage, while concerning, is roughly a third of the 38% observed in the Western Europe channel above.

    Linguistic analysis of toxic content revealed a notably diverse multilingual pattern, spanning Italian ("cazzo," "immigrati clandestini"), Spanish ("sinpapeles"), and Greek ("λάθρο"). The recurring themes in these toxic keywords center around unauthorized immigration, with terms specifically targeting undocumented migrants in different languages ("immigrati clandestini," "sinpapeles" - both referring to undocumented immigrants). The presence of general insults like "idiota" across linguistic boundaries suggests common patterns of hostile discourse.

    The relatively low average toxicity score combined with the small number of highly toxic posts (31 with scores ≥0.8) indicates that extreme content represents a minimal portion of the overall discourse. However, the concentration of these posts on Reddit, along with the multilingual nature of anti-immigration rhetoric, suggests coordinated narrative patterns across Southern European language communities.

    • Western Europe 

    This channel contains 90,207 posts, with X representing the dominant data source (70%), followed by 4chan (16%) and TikTok (5%). The analysis reveals a concerning average toxicity score of 0.33, with a substantial 3,458 posts exceeding the high toxicity threshold of 0.8.

    Platform distribution analysis shows that 4chan, while representing the second-largest data source, exhibits a disproportionately high concentration of toxic content. This suggests that the platform plays a significant role in hosting and potentially amplifying extreme content compared to mainstream social media platforms.

    The thematic analysis reveals that racially discriminatory content accounts for 17% of flagged posts, while politically-oriented toxic content represents 11%. Most alarmingly, violence-related content appears in 45% of messages, indicating a very high prevalence of aggressive rhetoric within the discourse.

    Linguistic analysis of toxic content identified recurring derogatory terms targeting specific ethnic and religious groups. The pattern of hostile rhetoric appears to be particularly focused on antisemitic content (as evidenced by phrases like "jews hate") and racial discrimination.

    The high proportion of violence-related content (45%) combined with the significant number of extremely toxic posts (3,458 with scores ≥0.8) points to concerning patterns in this channel. While these highly toxic posts represent only about 3.8% of total content, their volume and concentration on 4chan suggest the presence of established networks where extreme viewpoints are normalized.

    • Central and Eastern Europe 

    This channel contains 9,422 posts, with X being the predominant data source (82%), followed by Facebook (10%) and YouTube (4%). The analysis reveals a moderately elevated average toxicity score of 0.30, with 153 posts exceeding the high toxicity threshold of 0.8.

    Platform distribution analysis indicates that Facebook, despite representing only 10% of the dataset, shows the highest frequency of highly toxic content. This suggests that while Facebook hosts a smaller portion of the overall discourse, it contains more concentrated pockets of extreme content compared to other platforms.

    The thematic analysis identifies racial discrimination as a significant component, accounting for 9% of toxic content. Violence-related content appears in 13% of messages, indicating a moderate level of aggressive rhetoric within the discussions.

    Linguistic analysis of toxic content revealed a pattern of concerning terms across multiple languages, most of them Slavic: Polish ("zdrajców" - traitors, "żydy"), Czech/Slovak ("židi"), and Russian ("капитализм" - capitalism), alongside German ("jude"). The recurring presence of terms targeting Jewish people in multiple languages suggests a coordinated cross-linguistic pattern of antisemitic discourse. The inclusion of terms related to "traitors" and "capitalism" indicates a narrative that combines antisemitism with political and economic grievances.

    The relatively high number of extremely toxic posts (153 with scores ≥0.8) combined with their concentration on Facebook suggests the presence of echo chambers where such content may find reinforcement. While these posts represent only about 1.6% of total content, their multilingual nature points to coordinated narrative patterns across Eastern European communities.

    • Northern Europe

    This channel contains 1,675 posts, with X dominating the data sources (90%), followed by much smaller contributions from Reddit (5%) and Facebook (3%). The analysis reveals a moderate average toxicity score of 0.26, with only 2 posts exceeding the high toxicity threshold of 0.8.

    Despite Reddit's minimal share of the dataset (5%), it shows a disproportionate concentration of highly toxic content. This pattern suggests that while Reddit represents a small portion of the conversations, it may harbor more intense discussions that tend toward extreme content.

    The thematic analysis reveals three overlapping categories of toxic content: religious-themed content (17%), racial discrimination (13%), and political discussions (12%). The presence of violence-related content in 17% of messages indicates a concerning intersection between ideological discussions and aggressive rhetoric.

    Linguistic analysis of toxic content revealed several recurring terms in Swedish, including references to ethnic cleansing ("etnisk rensning"), religious extremism ("islamism"), and political criticism ("sossarna" - derogatory term for Social Democrats). The presence of terms like "pedofiler" (pedophiles) and "slaktade" (slaughtered) suggests narratives that combine accusations of sexual deviancy with violent rhetoric.

    The remarkably low number of highly toxic posts (only 2 with scores ≥0.8) indicates that while concerning topics are discussed, the conversation generally maintains a relatively civil tone. However, the overlap between religious, racial, and political toxicity, combined with the presence of violent rhetoric, suggests underlying tensions in the discourse.

    • Southern Europe

    This channel contains 7,136 posts, with X serving as the primary data source (80%), complemented by YouTube (12%) and TikTok (7%). The analysis reveals a moderate average toxicity score of 0.28, with 72 posts exceeding the high toxicity threshold of 0.8.

    Platform distribution analysis shows an interesting pattern: while 4chan is not among the top three data sources, it exhibits the highest concentration of highly toxic posts. This suggests that although 4chan represents a smaller portion of the dataset, it contains more extreme content compared to mainstream platforms.

    The presence of violence-related content in 18% of messages indicates a concerning level of aggressive rhetoric within the discourse. This percentage, while significant, is lower than seen in some other monitored channels.

    Linguistic analysis of toxic content revealed a pattern of Spanish, Portuguese, and Italian terms focused on religious and ethnic discrimination. The recurring terms include anti-Islamic rhetoric ("mahometano," "sunita"), derogatory references to North Africans ("moros"), and antisemitic content framed through anti-Zionist narratives (the Italian "ebreo sionista"). The presence of terms like "genocidas" suggests a narrative framework that associates certain groups with extreme violence.

    The relatively high number of highly toxic posts (72 with scores ≥0.8) combined with their concentration on 4chan indicates the presence of spaces where extreme viewpoints may be amplified. While these posts represent only about 1% of total content, their thematic focus on religious and ethnic discrimination suggests coordinated narrative patterns.

    • Western Europe 

    This channel contains 28,395 posts, with content heavily concentrated on X, which accounts for 72% of the dataset, while Facebook and TikTok contribute smaller proportions at 7% and 6% respectively. This strong dominance of X suggests the dataset primarily reflects discourse patterns specific to that platform's user base.

    The toxicity analysis reveals concerning patterns, with a relatively high average toxicity score of 0.24. A total of 272 posts exceeded the high toxicity threshold of 0.8, representing approximately 1% of total content. These highly toxic posts show a disproportionate concentration on 4chan, despite this platform not being among the top three sources of content, suggesting strategic platform selection for more extreme content.

    Content analysis reveals multiple concerning patterns. Political content features prominently in toxic messages, accounting for 16% of problematic content, while racist content comprises 10% of toxic interactions. Most notably, 38% of messages contain references to violence, representing an alarmingly high level of hostile content that warrants immediate attention.

    The identified toxic keywords combine general extremist terminology with specific anti-migrant rhetoric, including explicit racial slurs, dehumanizing terms targeting migrants, and apocalyptic hashtags. This combination suggests organized harmful behavior targeting multiple groups, with a particular focus on racial and migrant-related discrimination.

    The platform distribution dynamics present an interesting pattern where X dominates the overall content volume, while 4chan shows higher concentrations of toxic posts. This suggests a deliberate cross-platform strategy where mainstream platforms are used for broader content distribution while alternative platforms host more extreme content.

    • Central and Eastern Europe 

    This channel contains 3,124 posts, with content distribution spread across multiple platforms, though X remains the primary source at 52% of the dataset. TikTok and Facebook share equal representation at 14% each, indicating a more balanced platform distribution compared to typical social media datasets where X tends to dominate more heavily.

    The toxicity analysis shows moderate levels of concerning content, with an average toxicity score of 0.19. Twenty-seven posts exceeded the high toxicity threshold of 0.8, representing approximately 0.9% of total content. These highly toxic posts show a disproportionate concentration on Reddit, despite this platform not being among the top three sources of content. This suggests platform-specific factors may influence content toxicity patterns.

    Content analysis reveals that racist content accounts for 14% of toxic interactions, indicating a persistent pattern of racial discrimination within the dataset. The presence of violence-related content is relatively low at 5% of messages, suggesting that while discriminatory content exists, it rarely escalates to violent themes.

    The identified toxic keywords (țigan, tigan, czarni, țigancă, and cyganom) appear in multiple Eastern European languages, including Romanian (țigan, țigancă, and the diacritic-free spelling variant tigan) and Polish (czarni, cyganom). These terms predominantly target Roma communities, indicating focused ethnic discrimination in Eastern European social media spaces.

    The platform distribution dynamics present an interesting pattern where content is spread across multiple platforms rather than being heavily concentrated on one. However, the emergence of Reddit as a hotspot for highly toxic content, despite not being among the primary content sources, suggests strategic platform selection for more extreme content.

    • Northern Europe

    This channel contains 682 posts, representing a relatively small dataset with content heavily concentrated on X, which accounts for an overwhelming 89% of the content. Reddit contributes a minor 8% share, while TikTok maintains a minimal presence at just 1% of posts. This strong dominance of X suggests the dataset primarily captures discussions specific to that platform's user base.

    The toxicity analysis reveals notably healthy communication patterns, with a remarkably low average toxicity score of 0.06, indicating minimal problematic content across the dataset. Only a single post exceeded the high toxicity threshold of 0.8, representing just 0.15% of the total content. While this highly toxic post appeared on X, this aligns with the platform's dominant share of overall content rather than suggesting platform-specific issues.

    Content analysis reveals an interesting contrast: despite the low overall toxicity score, racist content accounts for 21% of toxic interactions when they do occur. This relatively high percentage suggests that while problematic content is rare, it tends to manifest specifically in racial discrimination when it appears. However, violence-related content is notably minimal, present in only 1% of messages, indicating that hostile content rarely escalates to threats or violent themes.

    The identified toxic keywords (mustalaiset, sigøjnere, pili, pohui, and mustalainen) appear predominantly in Finnish (mustalaiset, mustalainen) and Danish (sigøjnere), suggesting a Nordic focus with particular emphasis on terms related to Roma communities. This linguistic pattern indicates that when discriminatory content does appear, it often targets specific ethnic groups using local language terminology.

    The platform distribution presents an interesting dynamic where X overwhelmingly dominates the conversation, while other platforms play minimal roles. This concentration on X, combined with the very low toxicity scores, suggests effective content moderation or community self-regulation within this specific discourse community.

    • Southern Europe

    This channel contains 3,580 posts, demonstrating a relatively balanced distribution across major social media platforms, with X accounting for 45% of the content, followed by TikTok at 30%, and YouTube contributing a significant 21% of the posts. This more even platform distribution differs from typical patterns where X tends to dominate content volume more heavily.

    The toxicity analysis reveals concerning patterns, with an elevated average toxicity score of 0.31, indicating substantial problematic content across the dataset. Of particular note, 88 posts exceeded the high toxicity threshold of 0.8, representing approximately 2.5% of total content. These highly toxic posts show a disproportionate concentration on YouTube, despite it being the third most common platform in the dataset, suggesting platform-specific factors may influence content toxicity.

    Content analysis identifies racism as a significant concern, with 15% of toxic content involving racial discrimination. This relatively high percentage indicates a persistent pattern of race-based hostile behavior within the community. Additionally, approximately 10% of messages contain references to violence, representing a moderate level of potentially concerning content that warrants monitoring.

    The identified toxic keywords (gitanos, os ciganos, concha, μαϊμού, and invasores) appear in multiple languages, including Spanish, Portuguese, and Greek, suggesting a Southern European focus with particular emphasis on anti-Roma sentiment. The nature of these terms indicates that discriminatory content specifically targets certain ethnic communities, with terms related to both ethnicity and migration status.

    The platform distribution dynamics present an interesting pattern where content is more evenly spread across platforms compared to typical social media datasets. However, the concentration of highly toxic content on YouTube, despite its smaller share of overall posts, suggests that video content may play a particular role in the spread of discriminatory messaging. This could indicate the strategic use of platform-specific features for content dissemination.
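
    The high-toxicity shares quoted for the four regions above can be reproduced directly from the reported post counts. The following is a minimal sketch, not the pipeline used to produce this report: the function name `high_toxicity_share` and the `REGIONS` dictionary are hypothetical, and the figures are simply copied from the text of the four preceding sections.

```python
# Illustrative only: totals and counts of posts scoring >= 0.8 are copied
# from the report text; the names used here are hypothetical.
REGIONS = {
    "Western Europe": (28_395, 272),
    "Central and Eastern Europe": (3_124, 27),
    "Northern Europe": (682, 1),
    "Southern Europe": (3_580, 88),
}


def high_toxicity_share(total_posts: int, toxic_posts: int) -> float:
    """Return the percentage of posts above the high-toxicity threshold."""
    if total_posts <= 0:
        raise ValueError("total_posts must be positive")
    return 100.0 * toxic_posts / total_posts


# Compute the share for every region in one pass.
shares = {
    region: high_toxicity_share(total, toxic)
    for region, (total, toxic) in REGIONS.items()
}

for region, share in shares.items():
    print(f"{region}: {share:.2f}%")
```

    The computed shares round to the approximate figures given in the text (about 1%, 0.9%, 0.15%, and 2.5% respectively), which makes the regional comparison easier to check at a glance.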

 