Spring 2026 · Graduate Course · UCR

Computational Cybersafety

This course introduces computational methods for understanding and addressing safety issues in Web-based systems, with emphasis on online behavior, harmful content, platform dynamics, and data-driven mitigation.

Instructor
Emiliano De Cristofaro
Sessions
MW 9:30-10:50
Location
Skye 170
Office Hours
TBD

Course overview

The class aims to provide graduate computer science students and those in related programs with proficiency in computational methods to understand and address safety issues in Web-based systems.

Students will develop familiarity with Web ecosystems, social networks, and online behavior; definitions and dynamics of safety issues; qualitative and quantitative research methods; and the use of advanced data structures, data science, and machine learning for analysis and mitigation.

The course blends conceptual foundations, case studies, discussion, and project-oriented work in cybersafety.

Grading

  • Active Participation: 20%
  • Presentations: 40%
  • Final Project: 40%

Course details

  • Course: CS 260 – Computational Cybersafety
  • Prerequisites: CS 141 (or equivalent). Also recommended: CS 170 and/or CS 171 (or equivalent)
  • Additional support: Academic Resources Center (ARC), 156 Skye Hall

Learning goals

  • Understand ecosystems: Study Web platforms, social networks, and online behavior through a computational lens.
  • Analyze safety issues: Examine how harmful behaviors emerge, spread, and evolve in networked systems.
  • Apply methods: Use qualitative, quantitative, and machine learning methods for cybersafety research.
  • Design mitigations: Evaluate and propose data-driven approaches to reduce online harm.

Schedule

All entries are currently tentative.

Project Examples

1. Do Aggression Classifiers Age Well? Reproducing Mean Birds and Benchmarking Modern Methods

Core papers: Chatzakou et al., "Mean Birds: Detecting Aggression and Bullying on Twitter" (WebSci 2017); Chatzakou et al., "Detecting Cyberbullying and Cyberaggression in Social Media" (TWEB 2019)

Reproduce the feature engineering pipeline (lexical, network, and user-level features) and classifier from Mean Birds. Then train the same model on the Twitter Abusive Behavior dataset and evaluate cross-dataset: how well does a model trained on 2016 data generalize to a later dataset? Finally, swap the classical classifier for a zero-shot LLM baseline (e.g., prompted GPT-4o-mini or a locally run Llama-3 model via Ollama) and compare F1 across all three conditions.
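
A minimal sketch of the cross-dataset step, assuming both corpora are exported as CSVs with text and label columns (file names are placeholders, and TF-IDF stands in for the paper's full feature set; the LLM condition is omitted here):

```python
# Sketch: cross-dataset evaluation of a classical aggression classifier.
# Assumes CSVs with "text" and "label" columns; both paths are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

train = pd.read_csv("mean_birds_2016.csv")   # hypothetical 2016 export
later = pd.read_csv("abusive_2018.csv")      # hypothetical later dataset

# TF-IDF is a stand-in for the paper's lexical/network/user-level features.
clf = make_pipeline(TfidfVectorizer(min_df=5),
                    RandomForestClassifier(n_estimators=200, random_state=0))

X_tr, X_te, y_tr, y_te = train_test_split(
    train["text"], train["label"],
    test_size=0.2, random_state=0, stratify=train["label"])
clf.fit(X_tr, y_tr)

print("in-domain macro-F1:    ",
      f1_score(y_te, clf.predict(X_te), average="macro"))
print("cross-dataset macro-F1:",
      f1_score(later["label"], clf.predict(later["text"]), average="macro"))
```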

2. On the Origins of Memes: Reproducing and Comparing Hashing Methods

Core paper: Zannettou et al., "On the Origins of Memes by Means of Fringe Web Communities" (IMC 2018)

The paper uses perceptual hashing (pHash) to track how the same image propagates across platforms. Students reproduce the platform-influence analysis: which platforms act as meme "sources" and which are primarily "sinks"? The extension replaces pHash with CLIP embeddings and measures how often the two methods agree. No new image crawling is needed; the dataset already contains the hashes and metadata.
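
One way to measure pHash/CLIP agreement is pairwise: decide under each method whether two images are "the same meme," then count matching verdicts. A sketch, assuming a local directory of images; both thresholds are illustrative choices, not values from the paper:

```python
# Sketch: do pHash and CLIP agree on which image pairs are "the same meme"?
from pathlib import Path
from PIL import Image
import imagehash
from sentence_transformers import SentenceTransformer, util

paths = sorted(Path("memes/").glob("*.jpg"))   # hypothetical image directory
images = [Image.open(p).convert("RGB") for p in paths]

phashes = [imagehash.phash(im) for im in images]
clip = SentenceTransformer("clip-ViT-B-32")
embs = clip.encode(images, convert_to_tensor=True, normalize_embeddings=True)
cos = util.cos_sim(embs, embs)

agree = total = 0
for i in range(len(images)):
    for j in range(i + 1, len(images)):
        same_phash = (phashes[i] - phashes[j]) <= 8   # Hamming distance, 64-bit hash
        same_clip = cos[i][j].item() >= 0.9           # illustrative cosine cutoff
        agree += int(same_phash == same_clip)
        total += 1
print(f"pairwise agreement: {agree / total:.2%}")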

3. What is Gab? Reproducing the Echo Chamber Analysis

Core paper: Zannettou et al., "What is Gab? A Bastion of Free Speech or an Alt-Right Echo Chamber?" (CyberSafety 2018)

Reproduce the paper's three main analyses: (a) basic community characterization (post volume, user activity distribution, topics via LDA), (b) hate speech prevalence using the HatEval lexicon or Perspective API, and (c) network structure (follow graph modularity, echo chambers via community detection). The extension compares Gab's hate speech prevalence and network structure to the 4chan /pol/ dataset from the same period.
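
For part (c), a minimal modularity check on the follow graph, assuming the graph is available as an edge list (file name and column names are assumptions):

```python
# Sketch: follow-graph modularity as an echo-chamber signal.
import networkx as nx
import pandas as pd
from networkx.algorithms import community

edges = pd.read_csv("follows.csv")                 # hypothetical edge list: src,dst
G = nx.from_pandas_edgelist(edges, "src", "dst")   # treated as undirected here

comms = community.greedy_modularity_communities(G)
Q = community.modularity(G, comms)
print(f"{len(comms)} communities, modularity Q = {Q:.3f}")
# High Q with a few dominant communities is consistent with echo-chamber structure.
```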

4. Is it a Qoincidence? Reproducing QAnon Community Analysis Across Two Platforms

Core papers: Papasavva et al., "'Is it a Qoincidence?': An Exploratory Study of QAnon on Voat" (WWW 2021); Papasavva et al., "The Gospel According to Q" (ICWSM 2022)

The two papers study QAnon on Voat and 4chan using different methods. Bridge them: (a) reproduce the Voat paper's topic and sentiment characterization of QAnon subverses, (b) apply the same pipeline to the 4chan dataset filtered by Q-related keywords, and (c) compare how fast and faithfully each platform reacts to canonical "Q drops."
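
A sketch of the shared pipeline for (a) and (b): keyword filtering plus a compound sentiment score via VADER. The keyword list, file names, and column names are illustrative assumptions:

```python
# Sketch: filter posts by Q-related keywords and score sentiment per platform.
import pandas as pd
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

Q_KEYWORDS = ["qanon", "q drop", "wwg1wga", "the storm"]   # hypothetical seed list

def q_related(text: str) -> bool:
    t = text.lower()
    return any(k in t for k in Q_KEYWORDS)

analyzer = SentimentIntensityAnalyzer()
for path in ("voat_posts.csv", "pol_posts.csv"):           # hypothetical dumps
    df = pd.read_csv(path)
    q_posts = df[df["body"].astype(str).map(q_related)]
    scores = q_posts["body"].astype(str).map(
        lambda t: analyzer.polarity_scores(t)["compound"])
    print(path, f"{len(q_posts)} Q-posts, mean sentiment {scores.mean():+.3f}")
```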

5. Reproducing the Parler Dataset Characterization and Comparing to Gab

Core paper: Aliapoulios et al., "A Large Open Dataset from the Parler Social Network" (ICWSM 2021)

The Parler paper characterizes the platform but leaves many analytical questions open. Students (a) reproduce baseline statistics (activity over time, user/post distributions, top hashtags, hate speech prevalence), then (b) perform a structured comparison with Gab: hate speech rates, activity distributions, shared news domains, and timelines relative to shared political events (e.g., the 2020 election).
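
Part (a) is largely groupby-and-count work. A sketch of two baseline statistics, assuming a post dump with created_at and body columns (the schema is an assumption):

```python
# Sketch: weekly activity and top hashtags from a Parler post dump.
import pandas as pd

df = pd.read_csv("parler_posts.csv", parse_dates=["created_at"])  # hypothetical path

weekly = df.set_index("created_at").resample("W").size()          # posts per week
print(weekly.tail())

hashtags = df["body"].astype(str).str.findall(r"#\w+").explode().str.lower()
print(hashtags.value_counts().head(10))                           # top 10 hashtags
```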

6. Raiders of /pol/: Fringe Influence and Longitudinal Shifts in 4chan

Core papers: Hine et al., "Kek, Cucks, and God Emperor Trump" (ICWSM 2017); Papasavva et al., "Raiders of the Lost Kek: 3.5 Years of Augmented 4chan Posts" (ICWSM 2020)

The 2017 paper characterized /pol/ over 2.5 months; the 2020 dataset covers 3.5 years but did not re-run the full analysis. Students (a) reproduce the 2017 core measurements on the 2016 slice, (b) run the same measurements on later 1-year slices, and (c) test whether toxicity and political focus changed over time — e.g., did /pol/ become more or less toxic after the 2016 election?
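
A sketch of the slicing logic for (b) and (c), with a tiny term list standing in for a real toxicity measure (a full run would score slices with Perspective or HateBERT; paths and terms are illustrative):

```python
# Sketch: per-year slices of the dataset and a simple lexicon-hit trend.
import pandas as pd

df = pd.read_csv("pol_posts.csv", parse_dates=["timestamp"])  # hypothetical path
LEXICON = {"cuck", "kek"}   # illustrative tokens only, not a toxicity lexicon

def hits(text) -> bool:
    return bool(LEXICON & set(str(text).lower().split()))

df["year"] = df["timestamp"].dt.year
rate = df.groupby("year")["body"].apply(lambda s: s.map(hits).mean())
print(rate)   # per-year fraction of posts matching the term list
```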

7. Auditing Algorithmic Visibility on Reddit

Core paper: Galeazzi et al., "Revealing The Secret Power: How Algorithms Can Influence Content Visibility on Twitter/X" (NDSS 2026)
Related work: Hounsel et al., "Analyzing the Impact of Video Quality on User Engagement" (NSDI 2020)

Adapt the "p-score" visibility metric from the core paper to Reddit. Compare the reach of posts containing external news links versus internal self-posts across a matched set of subreddits. Determine whether Reddit's ranking algorithm penalizes external URLs similarly to the findings on X, and whether the effect varies by subreddit type.
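
A first-pass version of the comparison, before any p-score adaptation: test whether link posts and self-posts differ in score within a single subreddit. Column names follow a Pushshift-style schema and are assumptions:

```python
# Sketch: compare reach of external-link posts vs. self-posts in one subreddit.
import pandas as pd
from scipy.stats import mannwhitneyu

posts = pd.read_csv("subreddit_posts.csv")   # hypothetical Pushshift-style dump
link = posts[~posts["is_self"]]              # posts with an external URL
self_ = posts[posts["is_self"]]              # internal text posts

stat, p = mannwhitneyu(link["score"], self_["score"], alternative="two-sided")
print(f"median score: link={link['score'].median()}, "
      f"self={self_['score'].median()}, p={p:.3g}")
```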

8. Longitudinal Toxicity Trends in Fringe Communities

Core paper: Papasavva et al., "Raiders of the Lost Kek: 3.5 Years of Augmented 4chan Posts" (ICWSM 2020)
Related work: Hine et al., "Kek, Cucks, and God Emperor Trump" (ICWSM 2017)

Apply HateBERT or Perspective API to sampled blocks from 2017 and 2020 slices of the dataset. Compare the evolution of toxicity baselines and test whether radicalization follows a steady upward trend or is driven by specific event spikes (elections, platform bans). Reproduce the temporal activity patterns from the 2017 paper as a baseline comparison.
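
A sketch of the scoring loop, with the open-source Detoxify model standing in for HateBERT or Perspective (path, column names, and sample size are assumptions):

```python
# Sketch: score sampled posts from two temporal slices with a toxicity model.
import pandas as pd
from detoxify import Detoxify

df = pd.read_csv("pol_posts.csv", parse_dates=["timestamp"])  # hypothetical path
model = Detoxify("original")

for year in (2017, 2020):
    sample = df[df["timestamp"].dt.year == year]["body"].dropna()
    sample = sample.sample(1000, random_state=0)              # illustrative size
    scores = model.predict(sample.tolist())["toxicity"]
    print(year, "mean toxicity:", sum(scores) / len(scores))
```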

9. Federated Learning for Multi-Platform Toxicity Detection

Core paper: Chennoufi et al., "PROTEAN: Federated Intrusion Detection in Non-IID Environments" (ESORICS 2025)
Related work: McMahan et al., "Communication-Efficient Learning of Deep Networks from Decentralized Data" (FedAvg, ICML 2017)

Partition the Jigsaw dataset into silos representing different platforms (Reddit-style, Twitter-style, forum-style splits by comment length and vocabulary). Use the Flower library to train a federated toxicity classifier under FedAvg, then implement prototype-sharing from PROTEAN and measure whether it improves detection on silos with highly imbalanced or out-of-distribution toxic content.
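
The FedAvg loop itself is simple enough to write by hand before moving to Flower. A minimal sketch over three hypothetical silo files, using a linear model whose weights can be averaged directly:

```python
# Sketch: hand-rolled FedAvg over non-IID text silos (Flower would replace this).
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

vec = HashingVectorizer(n_features=2**16)   # shared, stateless featurizer
silos = [pd.read_csv(f) for f in ("reddit.csv", "twitter.csv", "forum.csv")]
sizes = [len(df) for df in silos]
global_coef = global_intercept = None

for rnd in range(5):                        # FedAvg communication rounds
    coefs, intercepts = [], []
    for df in silos:                        # each silo trains locally for one pass
        clf = SGDClassifier(loss="log_loss", max_iter=1, tol=None)
        X, y = vec.transform(df["text"]), df["toxic"]
        if global_coef is None:
            clf.fit(X, y)
        else:                               # warm-start from the global model
            clf.fit(X, y, coef_init=global_coef, intercept_init=global_intercept)
        coefs.append(clf.coef_)
        intercepts.append(clf.intercept_)
    # Server step: size-weighted average of local updates (FedAvg).
    global_coef = np.average(coefs, axis=0, weights=sizes)
    global_intercept = np.average(intercepts, axis=0, weights=sizes)
# global_coef/global_intercept now hold the federated model's parameters.
```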

10. Characterizing the Echo Chamber Effect

Core paper: Cinelli et al., "The Echo Chamber Effect on Social Media" (PNAS 2021)
Related work: Zannettou et al., "The Web of False Information: Rumors, Fake News, Hoaxes and Clickbait" (TWEB 2019)

Extract URLs shared in two ideologically opposed subreddits over a 6-month window. Categorize linked domains into news, social, and fringe tiers. Quantify the overlap and isolation of information sources, reproducing the echo-chamber operationalization from Cinelli et al. The extension runs the same analysis on a third "neutral" subreddit and tests whether it bridges or mirrors one of the two camps.
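
A sketch of the source-overlap measurement, using Jaccard similarity over shared domains (file names and the overlap measure are illustrative choices):

```python
# Sketch: domain extraction and source overlap between two subreddits.
import pandas as pd
from urllib.parse import urlparse

def domains(path: str) -> set:
    df = pd.read_csv(path)
    urls = df["body"].astype(str).str.findall(r"https?://\S+").explode().dropna()
    return set(urls.map(lambda u: urlparse(u).netloc.lower().removeprefix("www.")))

a = domains("left_sub.csv")    # hypothetical dumps of the two subreddits
b = domains("right_sub.csv")
print("Jaccard overlap of shared domains:", len(a & b) / len(a | b))
```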

11. Reproducing Cyberaggression Detection Metrics

Core paper: Chatzakou et al., "Who Let the Trolls Out? Towards Understanding and Detecting Problematic Users on Twitter" (WebSci 2019)
Related work: Davidson et al., "Automated Hate Speech Detection and the Problem of Offensive Language" (ICWSM 2017)

Implement linguistic and structural features — tweet frequency, mention patterns, account age, retweet ratio — alongside text-only features (TF-IDF, sentiment). Train classifiers under both feature sets and reproduce the finding on whether user-behavioral features outperform text-only features. The extension evaluates whether this advantage holds on the Davidson et al. hate speech dataset with a different label taxonomy.
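
A sketch of the head-to-head comparison, assuming a user-level CSV whose behavioral columns mirror the features listed above (the schema is an assumption):

```python
# Sketch: text-only vs. behavioral feature sets on the same labeled users.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

df = pd.read_csv("labeled_users.csv")   # hypothetical path
behavioral = ["tweets_per_day", "mention_ratio",
              "account_age_days", "retweet_ratio"]   # assumed column names

text_clf = make_pipeline(TfidfVectorizer(min_df=5),
                         RandomForestClassifier(random_state=0))
behav_clf = RandomForestClassifier(random_state=0)

print("text-only macro-F1: ",
      cross_val_score(text_clf, df["text"], df["label"], scoring="f1_macro").mean())
print("behavioral macro-F1:",
      cross_val_score(behav_clf, df[behavioral], df["label"], scoring="f1_macro").mean())
```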

12. Adversarial Attacks on Toxicity Classifiers

Core paper: Hosseini et al., "Deceiving Google's Perspective API Built for Detecting Toxic Comments" (arXiv 2017)
Related work: Gröndahl et al., "All You Need is 'Love': Evading Hate Speech Detection" (AISec 2018)

Apply a battery of adversarial perturbations to Perspective API and a fine-tuned BERT model: inserting low-toxicity words, substituting characters with homoglyphs or leet-speak, and adding innocuous suffixes. Measure the drop in recall for each attack type. The extension tests whether adversarially augmenting the training set with a small fraction of perturbed examples restores robustness.
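
A sketch of the perturbation battery; the substitution tables are small illustrative samples, and wiring the attacks into the target classifier's scoring loop is left as the integration step:

```python
# Sketch: three perturbation families for the adversarial battery.
import random

HOMOGLYPHS = {"a": "а", "e": "е", "o": "о"}   # Latin -> look-alike Cyrillic
LEET = {"a": "4", "e": "3", "i": "1", "s": "5"}

def perturb_chars(text: str, table: dict, rate: float = 0.3) -> str:
    # Randomly substitute a fraction of eligible characters.
    return "".join(table[c] if c in table and random.random() < rate else c
                   for c in text)

def add_suffix(text: str) -> str:
    return text + " love and peace"   # innocuous low-toxicity suffix

attacks = {
    "homoglyph": lambda t: perturb_chars(t, HOMOGLYPHS),
    "leet":      lambda t: perturb_chars(t, LEET),
    "suffix":    add_suffix,
}
# For each attack, re-score perturbed toxic examples with the target classifier
# and report recall relative to the clean baseline.
```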

13. When LLMs Label Hate: Comparing Automated and Human Annotations

Core paper: Founta et al., "Large Scale Crowdsourcing and Characterization of Twitter Abusive Behavior" (ICWSM 2018)
Related work: Davidson et al., "Automated Hate Speech Detection and the Problem of Offensive Language" (ICWSM 2017)
Dataset: Twitter Abusive Behavior Dataset — 100K tweets with crowdsourced labels (normal, spam, hateful, abusive)

The Founta et al. dataset provides 100K tweets labeled by multiple crowdworkers, making it ideal for studying annotation disagreement. Students (a) reproduce the paper's inter-annotator agreement analysis, (b) prompt an LLM (GPT-4o-mini or Llama-3 via Ollama) to label the same tweets using the identical taxonomy, and (c) systematically compare where LLM labels diverge from the human majority label — broken down by content category, ambiguity level, and linguistic framing. Do LLMs align more with human consensus or with minority annotator views? Where do they consistently diverge, and what does that reveal about the implicit definition of "hate speech" encoded in each labeling source?