Graduation Portfolio 2025-2026
Idiap Research Institute · MedAI

Beyond the Last-Frame

Temporal deep learning for the automated grading of retinal inflammation in fluorescein angiography.

Author: Tymo van Rijn
Student no.: 1057297
Host: Idiap, Martigny, CH
Programme: HBO-ICT · Software Eng.

Foreword

Reader's Guide

A single, ordered path through the work — from clinical and technical background, through professional skills and process, into analysis, advice, design and realisation.

This is the graduation portfolio for the project Temporal Deep Learning for Automated Grading of Retinal Inflammation in Fluorescein Angiography, carried out at the Idiap Research Institute with the MedAI group.

The chapters are meant to be read as a whole story, not a loose pile of artifacts. Early chapters set context; Analysis explains what was investigated and why; Advice shows how recommendations were refined after feedback; Design and Realisation record how those choices were made concrete in software. Paths to notebooks, PDFs, scripts and diagrams are given explicitly — so you can open the evidence right next to the narrative.

Abstract

Internship work at Idiap (MedAI) on temporal modelling for fluorescein angiography — what was built, and what is honestly still open.
01 Context

The group's pipeline treated each examination as a single static frame.

Clinically, diagnosis depends on how fluorescence evolves over time. This portfolio summarises internship work on giving the fluorescein-angiography pipeline a genuine temporal representation — rather than collapsing a minutes-long examination into one image.

02 Work to date

A controlled, benchmark-driven path from raw frames to a temporal model.

On the public Aptos-style benchmark, labels and temporal signal were verified, timing was recovered from on-image text where metadata was missing, and an LDA-based Temporal Separability Benchmark compared binning configurations before any expensive GRU training.

Downstream experiments use a frozen RETFound-family backbone, temporal binning, and a two-layer GRU. A separate PCA study showed no benefit from compressing frozen embeddings before the recurrent core.

03 Outcomes & status

A working pipeline and a defensible method — not a finished clinical product.

The narrative and artefacts here document analysis, advice, design and a working evaluation script. Reported metrics and recommendations reflect the state at the time of writing.

Implementation of full time-embedding tracks, HOJG-scale validation, and integration with production workflows remain for subsequent project phases — and are described honestly as such.

Problem & research questions

The Problem

Fluorescein angiography is a dynamic, time-based procedure — yet the state-of-the-art pipelines treated every examination as a single, isolated still.

The Idiap Research Institute, in Martigny, Switzerland, is a non-profit specialising in artificial intelligence, machine learning and computer vision. Within it, the MedAI group develops algorithmic methods for medical imaging in close collaboration with clinical partners such as the Hôpital Ophtalmique Jules-Gonin in Lausanne.

Fluorescein angiography tracks blood flow and vascular leakage in the retina over several minutes. But the existing deep-learning pipelines relied on single-frame Vision Transformer backbones — discarding the temporal evolution of the dye, information that is clinically indispensable.

Problem statement

Current fluorescein angiography classification pipelines insufficiently utilise the temporal information present in FA examinations — limiting diagnostic performance, robustness and interpretability.

Five sub-questions

The chapters that follow are organised around answering these.

Q1

Does the available benchmark data — labels, class balance, frame-level dynamics, timing metadata — support temporal modelling, and what limitations follow from how time is recorded?

Q2

When metadata lacks reliable timing, can we recover usable elapsed time and decide which frames are safe to use?

Q3

How should we group frames in time — binning, aggregation — and can a lightweight probe thin out options before expensive training?

Q4

Does unsupervised dimensionality reduction (PCA) of frozen embeddings improve sequence classification, or should embeddings stay at backbone dimension?

Q5

Does model performance transfer from Aptos to the external hospital (HOJG) dataset?

The objective

Engineer a temporal data representation and implement a sequential deep-learning architecture able to process variable-length FA examinations — improving diagnostic performance, robustness and interpretability.

Clinical & technical context

Clinical & Technical

The minimum context needed to understand why time is not optional — what clinicians actually read, and what the host pipeline allows.
Uveitis & fluorescein angiography

Patterns defined by behaviour over time — not by any single moment.

An FA examination is closer to a short film than a photograph: a dye is injected and tracked as it moves through the retinal vasculature over ten to fifteen minutes. Clinicians separate hyperfluorescence patterns by how they behave.

  • Leakage (grows and blurs): Dye escapes from vessels; the bright area tends to grow and blur as the sequence runs.
  • Pooling (fills and sharpens): Dye collects in a defined space, filling a cavity, often with a sharper border than leakage.
  • Staining (brightens): Tissue takes up dye and gets brighter, without the expanding, spreading pattern of leakage.
  • Window defect (stays stable): Missing masking pigment reveals brighter choroidal signal early, geographically stable throughout.

Foundation models

A frozen RETFound-Green backbone turns each frame into meaning.

Training deep networks from scratch on specialised medical imagery overfits badly. The MedAI group instead leans on foundation models — RETFound and RETFound-Green, a compact ViT-Small pre-trained via a novel token-reconstruction task.

Passed through the backbone, each FA frame becomes a highly compressed feature vector encapsulating its medical semantics. The technical mandate of this project: sequence those vectors chronologically to represent the transit of the dye over time.

Dimensions per frame vector: 384

The software environment

MedNet and a shared GPU grid — the constraints behind every choice.

MedNet is the group's in-house, PyTorch-based deep-learning framework: a shared, standardised structure for training, evaluation and experiment outputs, so one researcher's work can be inspected and reproduced by another. The second constraint is the shared GPU grid — compute is finite, and a wasteful experiment is a cost to the whole group. This gate reappears in every later chapter.

The two datasets

One public benchmark, one private clinical set.

The project draws on two distinct datasets. The difference between them is not only clinical — it decides what each one is allowed to be used for.

Dataset | Clinical focus | Role in this project
HOJG | Uveitis-related retinal inflammation: 543 patients, 1,042 eyes, ASUWOG grading. | Main clinical target dataset for grading inflammatory signs.
3rd Aptos | Broader angiographic abnormalities and fluorescence patterns. | Public dataset for broad FA analysis and temporal exploration.

Data governance

Why those two datasets cannot be handled the same way.

The public / private split was treated as a hard boundary, not an accident of availability.

Private — HOJG

Identifiable clinical patient data from the Hôpital Ophtalmique Jules-Gonin, governed by Swiss law and the GDPR. It cannot leave Idiap's infrastructure or be redistributed — so it is pointed to, never shipped.

Public — Aptos

A public benchmark — used for open exploration and anything that touches external tools or services, carrying none of the disclosure risk of patient imagery, and reproducible for a future paper.

0.82
Single-frame baseline — ROC-AUC

From the thesis of Roberto Pulvirenti, using only the last frame of each examination. A genuinely strong score.

This project did not start from a blank page. “Temporal” is not competing against a weak straw-man — it has to earn its place against an already-capable single-frame model. Any temporal approach must justify its added complexity and compute by improving on that 0.82, under the same constraints.

Graded competency

Professional Skills · Manage & Control

Communication, collaboration and a structured, traceable, quality-oriented process — built inside a multidisciplinary research environment.
Communication

Adapting both content and form to the audience.

The project sat between technical researchers and clinical stakeholders. Bi-weekly calls with the Jules-Gonin hospital needed machine-learning ideas in plain, accessible language; discussions with medical-AI researchers needed precise methodological justification.

Asking for feedback after presentations exposed a clear lesson: being complete is not the same as creating the right emphasis. I began stating the central takeaway earlier and cutting detail from the first slides — visible by comparing an early weekly deck with a later one, and brought to a head in the institute-wide TAM talk.

Collaboration

Proactive alignment over working in isolation.

Rather than working alone, I regularly shared intermediate findings and discussed uncertainties before major implementation decisions. The clearest example: I had built an LSTM pipeline to process frame sequences — and treating the feedback on it as an opportunity, not a setback, became the model for how I worked.

Feedback received — André · weekly presentation

A GRU architecture would be more efficient than the LSTM for our requirements. I researched GRUs, validated the advantages of the switch, and refactored the pipeline — identifying the bottleneck, opening a dialogue, and proactively pivoting to a better approach.

Manage & Control

A structured, traceable, quality-oriented process.

Manage & Control here was not just a schedule — it was keeping the software and research process structured, traceable and reliable enough to support valid conclusions, on top of the company’s established standards.

  • Structure (a logbook as a control instrument): A daily logbook with weekly reflections for planning, monitoring and adjustment. Honestly: I once thought I could hold the structure in my head. I could not, and returned to written goals.
  • Development street (branch-based GitLab workflow): New features, scripts and experiments developed in separate branches before integration, keeping main stable and linking results to specific code versions.
  • Quality assurance (stepwise validation): Intermediate outputs checked for logical correctness before scaling to expensive grid runs, catching errors early and saving shared compute.
  • Flexibility (replanning against evidence): Short planning cycles; when a direction produced weak results or another looked more promising, priorities were adjusted and documented.

Graded competency

Analysis

Not a straight-line success narrative — a controlled sequence of revisions in which weak assumptions were exposed, corrected, and converted into defensible downstream decisions.
The question behind the question

Was the last-frame shortcut clinically wrong — and could the data support a different choice?

The official goal was to improve automated grading of retinal inflammation. Underneath it sat a more basic issue: the existing pipeline treated each examination as a single static image — the last frame. Before writing any temporal model code, that assumption had to be tested.

A window defect appears early and stays the same size; leakage starts small and progressively blurs; staining accumulates brightness without expanding. A model that sees only the last frame is not missing some information — it is missing the entire basis of the clinical diagnosis for several pathology types. That is a structural flaw worth fixing.

Stakeholder & requirement analysis

Five stakeholders, translated into the quality-attribute gates every later chapter answers to.

Stakeholder | What they need | Quality attribute
HOJG clinicians | Decisions they can trust and interpret. | Clinical validity · interpretability
MedAI researchers | Beat the 0.82 baseline; fit the existing stack. | Performance · compatibility · reproducibility
Shared-grid users | Experiments that don't monopolise compute. | Scalability · compute efficiency
Future researchers | Code they can reuse and extend. | Maintainability · reproducibility
Patients (HOJG) | Their data handled lawfully and safely. | Security · privacy

Data analysis

Does the data even support temporal modelling?

The 3rd Aptos dataset, with 1,921 examinations across 16 metadata fields, was worked through systematically. Labels were informative, but class imbalance was non-trivial: a finding that drove class-weighted training and a commitment to per-class metrics rather than averages.
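
One common way to turn class imbalance into class-weighted training is inverse-frequency weighting. The sketch below is illustrative only: the label counts are hypothetical placeholders (the real Aptos class counts are not reproduced here), and `inverse_frequency_weights` is a name introduced for this example, not a MedNet function.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one weight per class: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical, deliberately imbalanced label list (not the real Aptos counts).
example_labels = ["leakage"] * 60 + ["staining"] * 30 + ["pooling"] * 10
weights = inverse_frequency_weights(example_labels)

for cls in sorted(weights):
    print(f"{cls}: {weights[cls]:.3f}")
```

Rare classes receive larger weights, so a loss weighted this way no longer rewards a model for ignoring them. Reporting per-class metrics alongside it makes the same failure visible at evaluation time.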

Comparing first and last frames within examinations confirmed genuine visual progression across the sequence. The temporal signal was there — the question became how to extract it.

1,921 FA examinations
5 hyperfluorescence classes

The timestamp pivot

A preprocessing footnote that became a core quality gate.

Aptos has no structured per-frame timing, so elapsed time had to be reconstructed from timestamp text burned into each image. The first OCR approach was not robust enough — so the methods were benchmarked explicitly against 150 manually-checked frames.

Method | Accuracy | Correct frames
EasyOCR | 57.3% | 86 / 150
Tesseract | 50.7% | 76 / 150
Gemini 2.5 Flash | 96% | 144 / 150

The six remaining failures turned out to be Indocyanine Green Angiography frames — not FA frames at all. On genuine FA frames, Gemini reached 144 / 144 — a perfect score — and the discovery doubled as a way to discard non-FA frames automatically.
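
The percentages above follow directly from the reported counts. The short sketch below redoes that arithmetic, including the effect of removing the six non-FA frames from the denominator. The variable and function names are introduced for illustration only.

```python
def accuracy(correct, total):
    """Fraction of frames whose timestamp was read correctly."""
    return correct / total


# Counts reported in the benchmark above (150 manually checked frames).
results = {"EasyOCR": 86, "Tesseract": 76, "Gemini 2.5 Flash": 144}
total_frames = 150
non_fa_frames = 6  # the six ICGA frames identified among the failures

for method, correct in results.items():
    print(f"{method}: {accuracy(correct, total_frames):.1%}")

# Restrict the denominator to genuine FA frames.
fa_frames = total_frames - non_fa_frames
gemini_fa_accuracy = accuracy(results["Gemini 2.5 Flash"], fa_frames)
print(f"Gemini 2.5 Flash on FA frames only: {gemini_fa_accuracy:.1%}")
```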

32,844 validated timestamps added to the dataset

A genuine enrichment of the Aptos dataset that the whole MedAI group can reuse.

The timeline showed huge variability: some examinations run up to 90 minutes, although most last 14 minutes or less. Frames cluster near the beginning and end of an exam and are sparse in the middle. With the when established, the next question was the what: were there distinguishable phases?

Phases in time — the LDA probe

Calibrated to the data, not copied from a textbook.

A one-dimensional Linear Discriminant Analysis fitted on frozen embeddings became the project’s reusable measurement instrument — read through three metrics: balanced accuracy, distribution overlap, and the Fisher score. It tested whether a clinical phase convention actually transferred to Aptos.

Paper-derived boundaries (47.5 s / 197.5 s): heavily skewed phase counts; balanced accuracy 0.643.
Data-driven boundaries (103 s / 518 s): balanced phases; balanced accuracy 0.694. Carried downstream.

Fig. 5: LD1 cluster geometry under the paper-derived boundaries. Early and Mid sit on top of each other; only Late is clearly displaced.

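
Two of the probe metrics named above can be sketched in a few lines. This is a minimal illustration on one-dimensional projected values, not the project's benchmark code. The Fisher score is assumed here to be the ratio of between-class to within-class variance, and the toy values are invented for the example.

```python
from statistics import mean, pvariance


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return mean(recalls)


def fisher_score(values, labels):
    """Between-class variance over within-class variance for 1-D projections."""
    overall = mean(values)
    between = 0.0
    within = 0.0
    for cls in set(labels):
        group = [v for v, y in zip(values, labels) if y == cls]
        between += len(group) * (mean(group) - overall) ** 2
        within += len(group) * pvariance(group)
    return between / within


# Toy LD1 projections: same class means, different spread (invented values).
labels = ["early", "early", "mid", "mid", "late", "late"]
tight = [0.0, 1.0, 4.0, 5.0, 8.0, 9.0]
loose = [-1.0, 2.0, 3.0, 6.0, 7.0, 10.0]
print(fisher_score(tight, labels), fisher_score(loose, labels))

# Skewed toy predictions: plain accuracy would be 7/8, balanced accuracy is lower.
y_true = ["early"] * 4 + ["mid"] * 2 + ["late"] * 2
y_pred = ["early"] * 4 + ["mid", "early"] + ["late"] * 2
print(balanced_accuracy(y_true, y_pred))
```

Balanced accuracy matters here because skewed phase counts, as under the paper-derived boundaries, would otherwise let the largest phase dominate the score.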
Benchmarking before training

Thinning the option space cheaply, before expensive runs.

The Temporal Separability Benchmark used that same LDA probe to compare backbone families and frame-picking strategies — showing temporal structure was present, and that finer binning gave diminishing returns. RETFound-Green was kept: efficient, robust, and continuous with the group’s prior work.

A separate controlled study asked whether PCA on frozen embeddings would help. It did not improve macro-ROC-AUC or AP. Negative — but decision-relevant: it removed an attractive, non-beneficial branch and kept the implementation path simpler.

Operational feasibility & error analysis

Where the compute goes — and where the model goes wrong.

A question-driven MedNet analysis showed activation storage dominates parameter storage, that scaling is driven by batch size and temporal depth, and why out-of-memory errors persist even with gradient checkpointing. The bottleneck is not a large model — it is a per-frame backbone multiplying work in one step.
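
A back-of-envelope calculation shows why temporal depth multiplies the backbone's work. The sketch below is a rough lower bound under stated assumptions, not a measurement: the patch size of 16, the 12 transformer blocks, the batch size of 8 and float32 storage are all hypothetical, and only one token tensor per block is counted. It applies to steps where the backbone takes part in the backward pass.

```python
def tokens_per_frame(image_size=224, patch_size=16):
    """Patch tokens plus one class token (patch size is an assumption)."""
    return (image_size // patch_size) ** 2 + 1


def activation_bytes(batch_size, frames_per_exam, embed_dim=384,
                     n_blocks=12, bytes_per_value=4):
    """Rough lower bound: one token tensor kept per block."""
    frames_in_step = batch_size * frames_per_exam  # backbone sees B * T images
    values = frames_in_step * tokens_per_frame() * embed_dim * n_blocks
    return values * bytes_per_value


single_frame = activation_bytes(batch_size=8, frames_per_exam=1)
twelve_frames = activation_bytes(batch_size=8, frames_per_exam=12)

print(f"single frame: {single_frame / 2**20:.1f} MiB")
print(f"12 frames:    {twelve_frames / 2**20:.1f} MiB")
print(f"ratio:        {twelve_frames / single_frame:.0f}x")
```

Real footprints are considerably higher, since attention maps and MLP intermediates are also stored. The point of the sketch is only the scaling: parameters stay constant while activations grow with batch size times frames per examination.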

An interactive error-analysis viewer separated actionable mistakes from out-of-reach ones. The model struggled with very small pathologies — traced to FA frames being resized from 334×334 to 224×224, discarding exactly the subtle cues that matter. That finding became a piece of advice.

HOJG — scope & transfer

An honest boundary on what the internship could reach.

Running anything against HOJG required the codebase to be mirrored into a separate, access-controlled environment. The privacy safeguards that protect patients are exactly what made the data hard to reach — and that process outlasted the project window. Q5 remains open. The whole method was deliberately built on Aptos as a transferable, re-runnable benchmark so the MedAI team can carry out that external validation.

Weak assumptions, exposed and corrected — converted into more defensible downstream decisions.

Analysis — conclusions
Graded competency

Advice

Translating the analysis into recommendations that are actually actionable — gated against compute, infrastructure compatibility and interpretability.
What the customer needed

Three gates that constrained every recommendation.

The MedAI group needed a temporal pipeline that runs reproducibly on existing GPU infrastructure, integrates cleanly with the RETFound family already in use, and stays interpretable enough to discuss with clinical partners.

  • Gate 01, compute cost & scalability: The cluster is shared, so proposals that multiply training cost without a believable gain are not actionable.
  • Gate 02, infrastructure compatibility: Replacing the backbone breaks continuity with the group's established pipeline, so it is out of scope.
  • Gate 03, interpretability: An opaque bump in a summary metric is weak evidence if it can't be connected to how clinicians reason.

The first attempt — and what was wrong

Stating criteria, then failing to apply them.

The first advisory document compared three options — but treated the existing binning-plus-LSTM approach as the natural reference point, and named the project criteria without ever using them in the comparison. Stating requirements and then dropping them weakens the whole exercise.

Feedback received — Oscar · week 5–6 supervision

The advisory was focused too narrowly on optimising binning parameters and had not sufficiently considered alternative temporal-encoding strategies from recent literature. The recommendation: broaden the scope before committing to a design direction. This produced a dedicated literature review and a revised advisory report.

Alternatives, assessed against the gates

Not “which paper sounds newest” — which strategy survives.

Strategy class | Compute & scalability | Interpretability | Verdict
Temporal binning | Passes: minimal preprocessing overhead. | Moderate: abstracts away elapsed time. | Track 1 (baseline)
Explicit time-embedding | Strong: minimal added training cost. | Best: aligns with a continuous, clinical notion of time. | Track 2 (primary)
Heavier temporal objectives | Fails: substantial complexity, longer cycles. | Fails: much harder to explain to clinical partners. | Deferred

The recommendation

A two-track path — start small, end big.

Track 1 keeps temporal binning as a controlled baseline — it already satisfies the three gates and matches what was empirically exercised in depth. Track 2 implements a time-embedding strategy — the only option scoring acceptably on all three criteria at once.

The MedAI group’s feedback was unanimous — research is iterative, so Track 1 was picked first. Their other point: I had interpreted the interpretability gate as closeness to real-world practice, when it meant the model showing where it drew its information from — tools like Grad-CAM. A genuine lesson in confirming requirements rather than assuming them.

MedNet framework advice

Conservative by design — extend, don’t replace.

Two recommendations: cache frozen ViT features so the temporal component trains on stored embeddings rather than recomputing the backbone every epoch; and extend MedNet so temporal batches become an explicit contract rather than ad-hoc local modifications. Switching the whole stack to MONAI would imply re-validation and integration work — extending MedNet is the realistic path.
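
The caching recommendation can be illustrated with a small in-memory sketch. Everything named here (`EmbeddingCache`, `fake_backbone`, the frame identifiers) is hypothetical and stands in for the frozen ViT and the real data loader. A practical version would persist embeddings to disk, and the idea only holds while the backbone is frozen and the input transform is deterministic.

```python
class EmbeddingCache:
    """Compute each frame embedding once, then serve it from memory."""

    def __init__(self, encode_fn):
        self._encode_fn = encode_fn
        self._store = {}
        self.backbone_calls = 0

    def get(self, frame_id):
        if frame_id not in self._store:
            self._store[frame_id] = self._encode_fn(frame_id)
            self.backbone_calls += 1
        return self._store[frame_id]


def fake_backbone(frame_id):
    """Stand-in for the frozen ViT: returns a deterministic toy vector."""
    return [float(len(frame_id)), 0.0, 1.0]


cache = EmbeddingCache(fake_backbone)
exam_frames = ["exam01_f001", "exam01_f002", "exam01_f003"]

# Five "epochs" over the same examination: the backbone runs only on the first.
for epoch in range(5):
    sequence = [cache.get(frame_id) for frame_id in exam_frames]

print(cache.backbone_calls)
```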

Dissemination & a concrete win

Advising beyond the MedAI group.

Presenting the work at the institute-wide TAM turned into advising other groups on their temporal problems — temporality, it turned out, is not unique to MedAI. The audience gave feedback too: one suggestion was to change how ViT tokens are pooled.

0.85 → 0.86

Test AUC, after changing ViT token pooling from average to max — a piece of TAM feedback, applied a week later.
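
The difference between the two pooling choices is easy to see on a toy example. The sketch below uses plain Python lists in place of tensors and invented token values. It offers one possible intuition (a small, localised signal survives max pooling but is diluted by averaging), not a verified explanation of the measured gain.

```python
def mean_pool(tokens):
    """Average each embedding dimension over all tokens."""
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]


def max_pool(tokens):
    """Keep the largest value of each embedding dimension over all tokens."""
    return [max(col) for col in zip(*tokens)]


# Toy example: 4 tokens with 3 dimensions; one token carries a strong local signal.
tokens = [
    [0.1, 0.0, 0.2],
    [0.1, 0.0, 0.2],
    [0.1, 4.0, 0.2],
    [0.1, 0.0, 0.2],
]

print(mean_pool(tokens))
print(max_pool(tokens))
```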

Mitigating information loss

A trade-off, raised honestly with the pipeline owner.

The error analysis suggested that downscaling inputs loses subtle pathological cues. I raised this directly with Roberto, who owns the preprocessing pipeline. His reply confirmed the concern was real — but it is always a trade-off: gains in performance against losses in memory and inference latency. Weighed against the efficiency requirement, the idea was set aside in favour of higher-return work.

Graded competency

Design

Turning Analysis and Advice into architecture choices, explicit quality trade-offs, and a validation strategy — before any integration work.
Architecture

Extend the existing pipeline — don’t replace it.

The RETFound-Green backbone and the existing linear head were kept unchanged, with a temporal block inserted between them. The constraint was deliberate: if performance changes, it should be attributable to temporal modelling — not to a swapped backbone or classifier.

Input: variable-length FA examination
  • RETFound-Green backbone (unchanged, frozen): Each frame is encoded into a 384-dimensional embedding.
  • Temporal stack (new, the contribution): Bin-then-select, then a two-layer GRU summarises the progression into a single vector.
  • Classification head (unchanged): The existing linear head maps that vector to five HyperF-Type classes.
Output: exam-level prediction

Three blocks in sequence: the temporal stack is a drop-in layer, so any change in performance is attributable to temporal modelling alone.

Security & privacy by design

A strict boundary between public development and clinical use.

The core sequence logic and GRU model were built dataset-agnostic, so the public Aptos dataset could serve as a sandbox with no risk of leaking sensitive information. The external OCR service was applied only to public data — when the model is eventually run on HOJG it will use the hospital’s own metadata, so no patient information ever touches an outside server.

Design decisions & rationale

Trade-offs made explicit.

A recurrent model was preferred over a Transformer — sequence lengths after binning are short, and the recurrent option fit the available data and compute better. The design review specified an LSTM; implementation switched to a GRU without changing the tensor contract.

Feeding more frames helped — but full sequences can reach 200 frames. The resolution: keep the data-driven phases, pick four frames per phase, and feed an ordered sequence of twelve embeddings to the GRU.

Bin-then-select: three data-driven phases (Early, Mid, Late), four frames each, twelve ordered embeddings into the GRU, one vector out. Enough temporal fidelity, without running out of memory.

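
The bin-then-select step can be sketched as follows. The phase boundaries (103 s and 518 s) and the four-frames-per-phase budget come from the text. The rest is assumed for illustration: picking frames evenly spaced within a phase, keeping all frames when a phase has fewer than four, and assigning a frame exactly on a boundary to the later phase. All function names are hypothetical.

```python
PHASE_BOUNDARIES_S = (103.0, 518.0)  # data-driven boundaries from the Analysis chapter
FRAMES_PER_PHASE = 4


def assign_phase(elapsed_s):
    """Map elapsed seconds to 0 (Early), 1 (Mid) or 2 (Late)."""
    early_end, mid_end = PHASE_BOUNDARIES_S
    if elapsed_s < early_end:
        return 0
    if elapsed_s < mid_end:
        return 1
    return 2


def pick_evenly(items, k):
    """Pick up to k items spread evenly across the list, keeping order."""
    if len(items) <= k:
        return list(items)
    step = (len(items) - 1) / (k - 1)
    return [items[round(i * step)] for i in range(k)]


def bin_then_select(frames):
    """frames: list of (frame_id, elapsed_s). Returns a time-ordered selection."""
    ordered = sorted(frames, key=lambda f: f[1])
    phases = [[], [], []]
    for frame in ordered:
        phases[assign_phase(frame[1])].append(frame)
    selected = []
    for phase_frames in phases:
        selected.extend(pick_evenly(phase_frames, FRAMES_PER_PHASE))
    return selected


# Hypothetical examination: 30 frames, one every 30 seconds, given out of order.
frames = [(f"f{i:03d}", 30.0 * i) for i in reversed(range(30))]
selected = bin_then_select(frames)
print(len(selected), [t for _, t in selected])
```

Because the selection is capped per phase, the sequence length seen by the GRU stays bounded regardless of how many raw frames an examination contains.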
How success would be measured

Aligned with evaluation from the start.

All configurations use the same fixed train / validation / test split, with a held-out test set of 211 examinations. The primary metrics match the baseline work — ROC-AUC, to validate ranking of disease severity across all thresholds, and F1, to ground the evaluation in clinical utility and penalise majority-class defaulting. Blindly borrowing metrics is a fast track to misleading results — so each was chosen for a reason.
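
A small worked example shows why F1 penalises majority-class defaulting where accuracy does not. The data below are hypothetical, and macro averaging over classes is an assumption made for the sketch, since the averaging mode is not stated above.

```python
def f1_per_class(y_true, y_pred):
    """Return {class: F1} using F1 = 2*TP / (2*TP + FP + FN)."""
    scores = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        scores[cls] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical imbalanced test set: 8 exams of class "A", 2 of class "B".
y_true = ["A"] * 8 + ["B"] * 2
always_majority = ["A"] * 10

accuracy = sum(t == p for t, p in zip(y_true, always_majority)) / len(y_true)
print(f"accuracy: {accuracy:.2f}")
print(f"macro F1: {macro_f1(y_true, always_majority):.2f}")
```

A model that always predicts the majority class keeps a high accuracy here, while its macro F1 drops sharply because the minority class scores zero.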

Quality characteristics

Architecture in service of maintainable research software.

Characteristic | How the design supports it
Reproducibility | Fixed split file; backbone frozen; embeddings computed once and reused.
Scalability | Cost scales with phase count, not with raw frames per examination.
Interpretability | Bin structure and recurrent hidden states are easier to discuss than full fine-tuned backbones.
Compatibility | Backbone and head interfaces unchanged; the temporal block is a drop-in layer.
Debuggability | A prototype script prints tensor shapes through the pipeline before integration.
Security & privacy | Public / private dataset segregation; external OCR isolated to public data.

Prototype & test strategy

Two layers, because two kinds of bug fail in different places.

A self-contained prototype validated end-to-end tensor flow and sequence reshaping — and caught an early reshaping mismatch before integration, reducing debugging cost later.

The test strategy splits in two: a model layer (test_vitgru) covering freeze / unfreeze policy and a full forward pass, and a data layer (test_seq_angioreport) covering frame-selection boundaries, the split parser, and variable-length padding — where a silent error would corrupt the GRU signal without raising an exception.
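
The kind of silent error the data layer guards against can be illustrated with a padding check. This is a self-contained sketch in plain Python, not the contents of test_seq_angioreport. `pad_sequences` and the test name are hypothetical, and sequences are assumed to be non-empty.

```python
def pad_sequences(sequences, pad_value=0.0):
    """Pad variable-length sequences of vectors to a common length.

    Returns (padded, lengths, mask) where mask[i][t] is True for real steps.
    """
    lengths = [len(seq) for seq in sequences]
    max_len = max(lengths)
    dim = len(sequences[0][0])
    padded, mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        padding = [[pad_value] * dim for _ in range(n_pad)]
        padded.append([list(v) for v in seq] + padding)
        mask.append([True] * len(seq) + [False] * n_pad)
    return padded, lengths, mask


def test_padding_keeps_real_steps_and_marks_padding():
    short = [[1.0, 1.0], [2.0, 2.0]]
    long = [[3.0, 3.0], [4.0, 4.0], [5.0, 5.0]]
    padded, lengths, mask = pad_sequences([short, long])
    assert lengths == [2, 3]
    assert padded[0][:2] == short        # real steps untouched
    assert padded[0][2] == [0.0, 0.0]    # padding appended at the end
    assert mask[0] == [True, True, False]
    assert mask[1] == [True, True, True]


test_padding_keeps_real_steps_and_marks_padding()
```

Returning the true lengths alongside the padded batch is what lets a recurrent model skip the padded steps. If they were dropped, the GRU would read zeros as real frames without any exception being raised.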

Feedback received — André · week 3–4 meeting

Mean-pooling frames within a bin risks averaging away exactly the temporal variation the model is meant to learn. The recommendation: use a single representative frame per bin instead, and evaluate both strategies systematically rather than assuming one is always better.

Graded competency

Realisation

Not just working code — an implementation that follows the design, connects to existing systems, and can be reused, inspected, tested and evaluated.
From script to pipeline

Working code is not automatically good professional software.

First — exploratory

run_downstream.py

Proved the temporal idea end-to-end: precomputed embeddings arranged as short sequences, a GRU, classification from the final hidden state. Fast to modify — but configuration, data prep, model and training were mixed in one file.

Final — integrated

MedNet temporal pipeline

A full image-to-prediction pipeline inside the MedNet project structure: image folders grouped by exam, sequences loaded and preprocessed, a ViT backbone per frame, a GRU, and class probabilities for the whole examination.

The implemented system

Responsibilities separated across the project structure.

The final pipeline is reusable because each concern has a home — so a future experiment can change one layer without disturbing the rest.

  • Data (grouping & sequence construction): Temporal data code groups frames by exam; the raw loader builds sequences; a custom collate function handles variable-length batching.
  • Model (the ViT-GRU model): The image-to-sequence-to-classification logic itself, with transforms and hyperparameters lifted into configuration files.
  • Runner (training & auditable output): The experiment runner handles training, prediction and evaluation; every run produces a folder of settings, checkpoints, logs and plots.

Existing frameworks, not reinvented

Consistent with the Idiap codebase.

  • PyTorch: tensors, padding, packing, the GRU, loss
  • Torchvision: preprocessing & augmentation
  • timm: the ViT backbone
  • MedNet: training & evaluation framework

Testing & evaluation readiness

Validated before any expensive training job.

The two-layer unit-test strategy designed in the previous phase was implemented as planned, both modules living in the MedNet library. They confirm the model-level and data-level components behave correctly in isolation — and continue to serve as a regression guard during refactoring. Structural validation traces the complete flow, from split JSON to evaluation output.

Feedback received — André · week 5–6 meeting

Use a GRU instead of the specified LSTM. It achieves equivalent representational power with fewer parameters and lower computational cost — relevant when the sequence after binning is short and experiments run on a shared cluster. The change was incorporated before implementation.
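
The parameter saving can be checked with simple arithmetic. The sketch below counts weights for stacked recurrent layers using the layout PyTorch documents for nn.GRU and nn.LSTM (three versus four gate blocks, two bias vectors per layer). The hidden size of 256 is a hypothetical value chosen for illustration; only the 384-dimensional input comes from the text.

```python
def recurrent_params(input_size, hidden_size, num_layers, n_gates):
    """Count weights and biases for a stacked, unidirectional recurrent net.

    Per layer: n_gates * (hidden*input + hidden*hidden + 2*hidden),
    i.e. input weights, recurrent weights and two bias vectors.
    """
    total = 0
    layer_input = input_size
    for _ in range(num_layers):
        total += n_gates * (hidden_size * layer_input
                            + hidden_size * hidden_size
                            + 2 * hidden_size)
        layer_input = hidden_size  # deeper layers read the previous hidden state
    return total


EMBED_DIM = 384    # per-frame embedding size stated in the Design chapter
HIDDEN_SIZE = 256  # hypothetical; the real hidden size is not given here

gru_params = recurrent_params(EMBED_DIM, HIDDEN_SIZE, num_layers=2, n_gates=3)
lstm_params = recurrent_params(EMBED_DIM, HIDDEN_SIZE, num_layers=2, n_gates=4)

print(gru_params, lstm_params, gru_params / lstm_params)
```

Whatever hidden size is chosen, the GRU carries three quarters of the LSTM's recurrent parameters, because the count is proportional to the number of gate blocks.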

Results & status

Results & Status

A status snapshot, told honestly — the portfolio documents a working method and pipeline, and is candid about what is not yet finished.
The numbers as they stand

Reported metrics reflect the state at the time of writing.

  • Single-frame baseline: 0.82 ROC-AUC, last frame only (Pulvirenti)
  • Temporal model with max-pooled tokens: 0.86 test AUC, after the average → max pooling change

The single-frame baseline set the bar at 0.82. Changing ViT token pooling from average to max lifted the temporal model’s test AUC from 0.85 to 0.86. These are real, measured results — but they are a snapshot, not a finished clinical claim, and should be read as such.

Delivered vs. open

What this internship produced — and what it deliberately leaves to the next phase.

Delivered
  • A complete analysis, advice and design chain, gated against explicit quality attributes.
  • A working, MedNet-integrated temporal evaluation pipeline.
  • A two-layer automated unit-test suite in the MedNet library.
  • 32,844 validated frame timestamps, reusable by the whole MedAI group.
  • A reproducible LDA phase-calibration probe and Temporal Separability Benchmark.
Open for subsequent phases
  • Full implementation of the explicit time-embedding track (Advice — Track 2).
  • HOJG-scale external validation — research question Q5.
  • Integration with production clinical workflows.
  • Grad-CAM-style interpretability, as clarified by the supervisors.

The honest framing is the point. The artefacts here document analysis, advice, design and a working evaluation script — they do not claim a finished clinical product. Reporting the state accurately, and naming what remains, is itself part of the professional contribution.

Portfolio map

Portfolio Map

Every notebook, document, script, figure and slide deck referenced across the narrative — gathered in one place, one click from opening.
18 artefacts

Analysis

Notebook

Aptos data exploration

Systematic check of labels, class balance and frame-level temporal change.

Notebook

Data exploration (supporting)

Supporting exploratory notebook for the benchmark data.

PDF

OCR benchmark analysis

EasyOCR vs Tesseract vs Gemini 2.5 Flash on 150 hand-checked frames.

PDF

Timeline distribution analysis

Where stamped frames sit on the FA timeline; sampling density per exam.

PDF

Timestamp metadata quality

Exam-duration variability and timestamp reliability after extraction.

PDF

Probing temporal separability (TSB)

The LDA-probe benchmark ranking backbones and binning strategies.

PDF

TSB study proposal

The benchmark plan sent to André for sign-off before running it.

PDF

PCA embedding / GRU study

Controlled study: does PCA on frozen embeddings help? (Decision-relevant no.)

PDF

MedNet framework analysis

Where GPU memory goes; why OOM persists even with gradient checkpointing.

PDF

Literature — temporal encoding

Literature review of temporal-encoding strategies (triggered by Oscar's feedback).

PDF

Studied concepts notebook

Handwritten/sketched notes building genuine understanding of the methods.

PDF

LSTM bin-count evaluation

Early evaluation of bin counts for the recurrent model.

Interactive

Model error analysis (interactive)

Interactive exam viewer — ground truth vs prediction, frame by frame.

Source

analyse_timestamps.py

Timestamp analysis script used after OCR extraction.

Figure

Figure 1 — Notion paper database

The literature trail: titles, dates, keywords kept organised in Notion.

Figure

Figure 2 — paper notes

Reading notes written in own words, unfamiliar terms pinned down.

Figure

Figure 3 — paper notes

A second example of the same dense-abstract note-taking habit.

Figure

Figure 5 — LD1 histogram (manual boundaries)

Cluster geometry under paper-derived phase boundaries — Early and Mid overlap.

Where something cannot be redistributed — for example the exact train / validation / test ID list used on the cluster — the appendices point to where the file lives on Idiap storage and what it contains, rather than shipping it.

Thank you for taking the time.

This portfolio documents a real research contribution — built carefully, reviewed honestly, and made reusable for the team that continues it.
