Program

The 1st Workshop on "Large Language Models for Cross-Temporal Research", 10 October 2025, co-located with COLM 2025

Program Schedule

📺 YouTube Replay: Watch the recorded session here

Room: 520C

The workshop will take place in a hybrid format (online and in-person). The time slots below are in Montreal local time (UTC-4).

  • 9:00 - 9:10 Opening
    • Welcome remarks
  • 9:10 - 9:50 Keynote 1: Jose Camacho Collados (Cardiff University, UK)
    • Title: Temporal Awareness in NLP: The Case of Social Media and Specialized Language Models
    • Abstract: Despite the large amount of information generated in social media platforms, understanding what is going on is not an easy task, even after the significant progress in NLP and LLMs in recent years. In particular, social media relies on immediacy, which requires automatic tools to monitor and analyse data in real time. Moreover, given the large amount of data generated, efficient and specialised solutions are often necessary. In this talk, I will focus on research that addresses the temporal and dynamic nature of social media, with evaluation studies involving language models and their application in popular NLP tasks such as sentiment analysis, hate speech detection or topic classification. For the last few years, I’ve been interested in understanding the effect of temporal shifts between the language models and specific applications. To this end, I’ve been trying to answer some of the following questions: Do we require up-to-date language models for successfully solving NLP tasks? Shall we annotate recent data for our task, or is “old” training data enough? Finally, while the main focus is on social media, most of the topics discussed can be relevant to tasks from other domains.
  • 9:50 - 10:30 Keynote 2: Alexis Huet (Huawei, France)
    • Title: Episodic memories generation and evaluation benchmark for LLMs: can LLMs remember what actually happened?
    • Abstract: Episodic memory – the ability to recall specific events grounded in time and space – is a cornerstone of human cognition, enabling not only coherent storytelling, but also planning and decision-making. Despite their remarkable capabilities, LLMs lack a robust mechanism for episodic memory: we argue that integrating episodic memory capabilities into LLMs is essential for advancing AI towards human-like cognition, increasing their potential to reason consistently and ground their output in real-world episodic events, hence avoiding confabulations. To address this challenge, we introduce a comprehensive framework to model and evaluate LLM episodic memory capabilities. Drawing inspiration from cognitive science, we develop a structured approach to represent episodic events, encapsulating temporal and spatial contexts, involved entities, and detailed descriptions. We synthesize a unique episodic memory benchmark and release open source code and datasets to assess LLM performance across various recall and episodic reasoning tasks. Finally, we evaluate how well current LLMs handle three types of memory: in-context, RAG, and parametric, highlighting progress from this year and the challenges that remain.
  • 10:30 - 11:00 Coffee break
  • 11:00 - 12:20 Oral Talks (4 presentations):
    • TiMoE: Time-Aware Mixture of Language Experts
    • LLLMs: A Data-Driven Survey of Evolving Research on Limitations of Large Language Models
    • Date Fragments: A Hidden Bottleneck of Tokenization for Temporal Reasoning
    • Around the World in 24 Hours: Probing LLM Knowledge of Time and Place
  • 12:20 - 13:45 Lunch break
  • 13:45 - 14:25 Keynote 3: Bahare Fatemi (Google Research, Canada)
    • Title: Evaluating Temporal Reasoning in Large Language Models
    • Abstract: Large Language Models (LLMs) struggle with complex temporal reasoning. Standard benchmarks are often inadequate because LLMs may have seen the data before. To truly assess their reasoning skills, we developed novel synthetic datasets specifically designed to test temporal logic in diverse, unseen scenarios. Our work systematically investigates the impact of problem structure on performance, providing clear diagnostic insights into the current limitations of LLMs.
  • 14:25 - 15:05 Keynote 4: Ali Emami (Emory University, USA)
    • Title: LLMs: the Object or the Lens?
    • Abstract: LLMs are often treated as the object of study: we benchmark their reasoning, measure their biases, and probe their internal mechanisms. Yet these systems are trained on the totality of our linguistic record: our histories, our beliefs, and our contradictions. In this talk, I argue that LLMs should be seen not only as products of culture, but as instruments for studying it. By fine-tuning models on temporally or culturally specific corpora, we may be able to transform them into “time capsules” that reveal how norms, values, and worldviews evolve. Drawing on our recent work fine-tuning LLMs on seven decades of fiction to trace shifting social biases, I’ll show how this approach opens a new research frontier: using language models to study societal change itself. I’ll discuss how such models bridge qualitative depth and quantitative scale, the limits of traditional methods, and the challenges of interpreting machines that now mirror our collective past.
  • 15:05 - 15:45 Keynote 5: Vivek Gupta (Arizona State University, USA)
    • Title: Temporal Reasoning over Evolving Semi-Structured Data
    • Abstract: Temporal tabular information permeates the real world, from evolving Wikipedia infoboxes and quarterly ledgers to longitudinal clinical records, yet today’s strongest large language models (LLMs) struggle as soon as time in tables becomes a first-class concern. This talk examines why temporality in tables trips up LLMs and what recent advances are doing to fix it. I will synthesize empirical insights from recent benchmarks and probes that reveal three recurring failure modes: brittle grounding, counterfactual sensitivity, and retrieval/extraction misalignment. Building on these, I’ll highlight emerging solutions spanning adaptive prompting (SEAR), cross-structural training (CLEAR), symbolic pipelines that transform semi-structured tables into normalized relational schemas, and finally reinforcement-learning loops that verify and correct intermediate decisions. The talk concludes with open challenges and opportunities for hybrid neuro-symbolic and RL-driven approaches that treat evolving tables as timestamped relational systems, pointing toward a new generation of agents capable of reliable reasoning over dynamic, semi-structured data.
  • 15:45 - 16:15 Coffee break
  • 16:15 - 17:25 Oral Talks (4 presentations):
    • Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning
    • Fine-Tuned LLMs are “Time Capsules” for Tracking Societal Bias Through Books
    • Context is Key: A Benchmark for Forecasting with Essential Textual Information
    • Is Your LLM Outdated? A Deep Look at Temporal Generalization
  • 17:25 Wrap-up and closing