Methodology & Notes – Cefic Matomo Analysis

Definitions

Key metrics and how they are computed in this dashboard.

New vs Returning visitors

New Visitor: a visitor coming to the site for the very first time.
Returning Visitor: a visitor who has visited the site before and is coming back.

Important (common sources of confusion):

The distinction does not depend on the selected date range — it depends on the visitor's global history.
Recognition relies on cookies / User ID. If cookies are deleted or blocked, a returning visitor may be counted as new.
Recognition is per device / browser (unless User ID is enabled).

Matomo vs GA4 — do not compare directly: In Matomo, the New / Returning distinction relies on the visitor's global history (independent of the analysed period). In GA4, the classification depends on events detected within the selected period (first_visit / first_open), which can lead to interpretation differences between the two tools.

How this dashboard computes it: via Matomo's VisitFrequency.get API, which returns nb_visits_new and nb_visits_returning per day based on the visitor's cookie history. When a date filter is applied, new and returning visits are summed across daily rows, preserving Matomo's global-history definition.
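The summing step described above can be sketched as follows. This is a minimal illustration, not the dashboard's actual code: the function name `sum_new_returning` and the shape of `rows` (a dict keyed by ISO date) are assumptions; only the field names `nb_visits_new` and `nb_visits_returning` come from the description above.

```python
from datetime import date


def sum_new_returning(daily_rows, start, end):
    """Sum new/returning visits over the daily rows that fall inside [start, end]."""
    new = returning = 0
    for day, row in daily_rows.items():
        if start <= date.fromisoformat(day) <= end:
            new += int(row.get("nb_visits_new", 0))
            returning += int(row.get("nb_visits_returning", 0))
    return {"nb_visits_new": new, "nb_visits_returning": returning}


# Hypothetical sample data, shaped like one row per day (assumed layout).
rows = {
    "2025-06-01": {"nb_visits_new": 120, "nb_visits_returning": 30},
    "2025-06-02": {"nb_visits_new": 90, "nb_visits_returning": 45},
    "2025-06-03": {"nb_visits_new": 100, "nb_visits_returning": 50},
}

# Date filter covering the first two days only.
totals = sum_new_returning(rows, date(2025, 6, 1), date(2025, 6, 2))
print(totals)
```

Because each daily row already carries Matomo's own new/returning split, summing rows never reclassifies a visit: the filter only selects which days contribute.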

Visits, Unique Visitors, Pageviews

Visit (session): a series of consecutive actions by the same visitor, ended after 30 minutes of inactivity (Matomo default).
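Unique Visitor: a distinct visitor identified by cookies/fingerprint within the analysed period. Summing daily unique-visitor counts across several days overcounts, because a visitor active on more than one day is counted once per day. Treat multi-day totals with caution.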
Pageview: one page load recorded by the tracker. Does not deduplicate reloads of the same URL within a visit.
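The overcounting caveat for Unique Visitors can be illustrated with a toy example. The visitor IDs below are invented for illustration; if only daily counts are stored, the overlap between days cannot be recovered after the fact.

```python
# Hypothetical visitor IDs seen on two consecutive days.
daily_visitors = {
    "2025-06-01": {"a", "b", "c"},
    "2025-06-02": {"b", "c", "d"},
}

# Summing the daily unique counts counts "b" and "c" twice.
summed_daily_uniques = sum(len(ids) for ids in daily_visitors.values())

# The true number of distinct visitors over both days.
distinct_over_period = len(set().union(*daily_visitors.values()))

print(summed_daily_uniques, distinct_over_period)  # 6 4
```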

Bounce rate, Avg. visit duration

Bounce rate: percentage of visits with a single pageview. A high bounce rate is not inherently bad: for a "read-one-article" news page it is normal.
Avg. visit duration: average time between the first and last recorded action of a visit. Bounced visits contribute 0 seconds (no second action to measure from).
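The two definitions above can be worked through on a small sample. This is a simplified sketch rather than Matomo's implementation: each visit is represented as a list of action timestamps in seconds, every action is assumed to be a pageview, and the function name `visit_stats` is hypothetical.

```python
def visit_stats(visits):
    """Return (bounce_rate_percent, avg_duration_seconds) for a list of visits.

    Each visit is a list of action timestamps in seconds (assumed layout).
    """
    if not visits:
        return 0.0, 0.0
    # A bounce is a visit with exactly one recorded action.
    bounces = sum(1 for actions in visits if len(actions) == 1)
    # Duration is last action minus first action, so a bounce contributes 0.
    total_duration = sum(max(actions) - min(actions) for actions in visits)
    return bounces / len(visits) * 100, total_duration / len(visits)


# Four hypothetical visits: two bounces, one of 180 s, one of 30 s.
sample = [[0], [0, 60, 180], [10, 40], [5]]
bounce_rate, avg_duration = visit_stats(sample)
print(bounce_rate, avg_duration)  # 50.0 52.5
```

The sample shows why a high bounce rate pulls the average duration down: the two bounced visits add nothing to the numerator but still count in the denominator.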

Conversions

Conversion: a visit that triggered a configured Goal (e.g. download, event registration, video play).
Attribution model: last non-direct touch — the campaign of the converting visit receives the conversion. Multi-touch attribution is not available.

Page type classification

Pages on cefic.org are grouped into categories (News, Policy, Guidance, Case Studies, Events, Science, Industry Data, Sectors, Highlights, Resources, About, Home, Other) by URL-prefix matching rules defined in the ingestion script. Pages that don't match any prefix are labelled “Other”.
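A minimal sketch of URL-prefix classification is shown below. The prefixes in `PREFIX_RULES` and the function name `classify_page` are hypothetical placeholders; the real rules live in the ingestion script and may differ in both paths and ordering.

```python
from urllib.parse import urlparse

# Hypothetical prefix rules: (path prefix, category). First match wins.
PREFIX_RULES = [
    ("/news/", "News"),
    ("/policy/", "Policy"),
    ("/events/", "Events"),
]


def classify_page(url):
    """Map a page URL to a category by path prefix, falling back to 'Other'."""
    path = urlparse(url).path or "/"
    if path == "/":
        return "Home"
    for prefix, category in PREFIX_RULES:
        if path.startswith(prefix):
            return category
    return "Other"


print(classify_page("https://cefic.org/news/some-article/"))  # News
print(classify_page("https://cefic.org/"))                    # Home
print(classify_page("https://cefic.org/unlisted-section/"))   # Other
```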

Matomo Cloud API constraints

Limitation	Impact	Status
Segment pre-processing required	Some Matomo Cloud plans require custom segments to be created in the Segment Editor before the API returns data for them. Initially `visitorType==returning` was not available via `VisitsSummary.get`.	Resolved — switched to `VisitFrequency.get` which returns both new & returning in a single call without segments.
API rate limits & chunking	Matomo Cloud enforces rate limits on API calls. Requesting large date ranges in a single call may fail or return partial data.	Mitigated — the pipeline splits requests into 30-day chunks with automatic retry.
Period-snapshot tables overwrite	Tables ending in `_period` (keywords, websites, socials, exit pages, etc.) are fully refreshed on each run. Historical period-level data is not preserved — only the latest snapshot exists.	By design — these tables hold current-state summaries, not time series.
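The chunking and retry mitigation listed under "API rate limits & chunking" can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's code: the helper names `date_chunks` and `with_retry`, the three attempts, and the exponential back-off are all hypothetical choices; only the 30-day chunk size comes from the table above.

```python
import time
from datetime import date, timedelta


def date_chunks(start, end, days=30):
    """Yield inclusive (chunk_start, chunk_end) pairs covering [start, end]."""
    current = start
    while current <= end:
        chunk_end = min(current + timedelta(days=days - 1), end)
        yield current, chunk_end
        current = chunk_end + timedelta(days=1)


def with_retry(fetch, attempts=3, base_delay=1.0):
    """Call fetch(); on failure wait and retry, re-raising after the last attempt."""
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                raise
            # Back-off doubles on each retry (assumed policy).
            time.sleep(base_delay * 2 ** attempt)


chunks = list(date_chunks(date(2025, 5, 19), date(2025, 7, 31)))
for chunk_start, chunk_end in chunks:
    print(chunk_start, chunk_end)
```

In use, each chunk would be passed to whatever function performs the API request, wrapped in `with_retry` so that a transient rate-limit error on one chunk does not abort the whole run.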

Data coverage gaps

Gap	Detail	Workaround
Data starts 2025-05-19	The Matomo tracking tag was installed on cefic.org on that date. No analytics data exists before this point.	Year-over-year comparisons will only be available from May 2026 onwards.
Page-type classification is rule-based	Pages are classified (News, Policy, Guidance, etc.) based on URL prefix matching. Pages that don't match any known prefix are labelled “Other”.	New sections on cefic.org require adding rules in `ingest_matomo.py`. Currently ~27% of pageviews fall into “Other”.
Campaign audience parsing	Audiences (members, staff, anyone, non_members) are inferred by parsing the campaign name string. Non-standard naming produces “undefined”.	Standardise UTM campaign names following the pattern `topic_audience_variant`.
Conversion attribution	Conversions are attributed to the campaign of the visit session. No multi-touch or cross-session attribution is available.	This is a Matomo-level limitation (last-touch model). Consider Matomo's Multi Channel Conversion Attribution plugin for advanced needs.
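The "Campaign audience parsing" row can be illustrated with a small parser for the `topic_audience_variant` pattern. This is a sketch, not the pipeline's implementation: the function name `parse_audience` is hypothetical, and it assumes the topic is a single token so the audience is searched from the second token onwards. Because `non_members` itself contains an underscore, it is matched as a token pair before the single-token audiences.

```python
# Audiences that occupy exactly one underscore-separated token.
SINGLE_TOKEN_AUDIENCES = {"members", "staff", "anyone"}


def parse_audience(campaign_name):
    """Infer the audience from a campaign name, or return 'undefined'."""
    tokens = campaign_name.strip().lower().split("_")
    # Skip the first token, assumed to be the topic.
    for i in range(1, len(tokens)):
        # "non_members" spans two tokens, so check the pair first.
        if tokens[i] == "non" and i + 1 < len(tokens) and tokens[i + 1] == "members":
            return "non_members"
        if tokens[i] in SINGLE_TOKEN_AUDIENCES:
            return tokens[i]
    return "undefined"


print(parse_audience("reach_members_a"))      # members
print(parse_audience("reach_non_members_b"))  # non_members
print(parse_audience("Summer Newsletter"))    # undefined
```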

Dashboard & display limitations

Limitation	Detail
One-screen layout	Main dashboards are designed for desktop screens (1280×800 minimum). On smaller screens or mobile devices, some panels may overlap or require scrolling.
Date filter scope	The date filter applies to daily-granularity data. Period-snapshot tables (keywords, websites, socials, exit pages, page performance) always show the full-period view regardless of the selected date range.
Plotly chart rendering	Charts require the Plotly.js library (~4.5 MB). A CDN is used with a local fallback. In fully offline/air-gapped environments, ensure `plotly.min.js` is present in `site/assets/`.
Static generation	All data is embedded at build time. There is no live connection to Matomo — dashboards reflect the state at last pipeline run (daily 06:00 CET).

Pipeline & infrastructure

Item	Detail
Single-machine pipeline	The pipeline runs on a single server via cron. There is no high availability (HA), no retry scheduler, and no alerting beyond log inspection. If the machine is down at 06:00 CET, that day's update is skipped; the next day's run catches up automatically.
DuckDB single-writer	DuckDB allows only one writer at a time. Running the pipeline concurrently (e.g. manual + cron) may cause lock errors.
GitHub → Azure deployment	The pipeline pushes to GitHub, which triggers Azure Static Web Apps deployment. If the GitHub push succeeds but Azure deployment fails, the live site will be stale until the next successful deployment.
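One possible way to avoid the DuckDB lock errors mentioned above is to refuse to start a second pipeline run while one is in progress. The sketch below is a suggestion only and is not part of the current pipeline: the `single_run_lock` helper and the lock-file path are hypothetical, and a crashed run would leave a stale lock file that has to be removed by hand.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def single_run_lock(path):
    """Hold an exclusive lock file for the duration of a pipeline run."""
    try:
        # O_EXCL makes creation fail if the file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise RuntimeError(f"another pipeline run holds {path}") from None
    try:
        os.write(fd, str(os.getpid()).encode())
        yield
    finally:
        os.close(fd)
        os.remove(path)


# Hypothetical lock location; a real deployment would pick a fixed path.
lock_path = os.path.join(tempfile.mkdtemp(), "pipeline.lock")

with single_run_lock(lock_path):
    # The DuckDB write step would run here.
    print("pipeline running with lock", os.path.exists(lock_path))
```

A manual run started while cron holds the lock would then fail fast with a clear message instead of hitting a database lock error partway through.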