Overview

This site is generated by a daily Python pipeline that pulls analytics from Matomo Cloud (cefic.matomo.cloud, site ID 5), stores them in a local DuckDB database, and renders a set of static HTML dashboards. The pipeline is scheduled via cron at 06:00 CET every day (run_matomo_cron.sh); the resulting site is then pushed to GitHub and deployed through Netlify. Runs are idempotent: campaign daily metrics are upserted incrementally, while period-snapshot tables (fact_*_period) are truncated and re-inserted on every run.
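The schedule could be expressed as a crontab entry along these lines (the script path and log file name are assumptions; cron fires in the server's local time, so 06:00 CET holds only if the host clock is set to CET):

```
0 6 * * * /home/deploy/run_matomo_cron.sh >> /home/deploy/logs/matomo_cron.log 2>&1
```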

Product views

Each main dashboard fits on one screen and answers one business question.

Legacy drill-down pages (Dashboard, Campaigns, Audiences, Trends, Source/Medium, Chord, Monthly Ranking) remain available for deeper exploration on specific slices.

Data sources — Matomo endpoints

Endpoints called during each daily pipeline run, the tables they populate, and the views that consume them.

Endpoint | Data pulled | Tables populated | Used by
Referrers.getCampaigns | Per-campaign daily metrics (visits, page views, bounce, conversions, avg time) | fact_campaign_daily | Overview, Campaigns, Campaign Intel
MarketingCampaignsReporting.getName (expanded) | mtm_content sub-campaigns, daily | fact_campaign_content_daily | Overview (campaign expand)
MarketingCampaignsReporting.getSourceMedium | Per-campaign source / medium breakdown | in-memory only | Campaign Intel (channel diversity score)
Actions.getPageUrls (flat) | Page-level daily stats + page-type classification | fact_page_daily | Overview, Content
VisitFrequency.get | Daily new vs returning totals | fact_visit_frequency_daily | Overview, Audience
Referrers.getReferrerType | Channel rollup (search, direct, referral, campaign, social, AI) | fact_channel_daily | Traffic
Referrers.getKeywords | Top organic search keywords | fact_keyword_period | Traffic
Referrers.getWebsites | Top referring websites | fact_website_period | Traffic
Referrers.getSocials | Traffic from social networks | fact_social_period | Traffic
UserCountry.getCountry | Visits by country | fact_country_daily | Traffic (map + list)
UserCountry.getCity | Visits by city | fact_city_period | Traffic
DevicesDetection.getType | Desktop / smartphone / tablet split | fact_device_daily | Audience
DevicesDetection.getBrowsers | Browser breakdown with engagement | fact_browser_daily | Audience
Actions.getEntryPageUrls | Top landing pages with entry bounce | fact_entry_page_period | Audience
Actions.getExitPageUrls | Top exit pages with exit rate | fact_exit_page_period | Audience
Actions.getSiteSearchKeywords | Internal search terms, hits, exit rate | fact_site_search_period | Audience
Events.getCategory | Event categories (CTAs, downloads, etc.) | fact_event_category_daily | Audience
Events.getName | Individual event names (used for downloads) | fact_event_name_period | Content (downloads list)
MediaAnalytics.get | Video plays, impressions, play rate | dim_media_summary | Audience (video play rate)

Marts (SQL views)

Views derived from fact tables, computed on demand when the site is built.
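As an illustration of what such an on-demand view can look like, the snippet below builds a hypothetical mart over fact_campaign_daily. sqlite3 stands in for DuckDB (the SQL is the point), and the column names are assumptions based on the metrics listed above, not the real schema.

```python
import sqlite3

# Hypothetical mart: roll campaign daily rows up to one row per campaign.
MART_CAMPAIGN_TOTALS = """
CREATE VIEW IF NOT EXISTS mart_campaign_totals AS
SELECT campaign,
       SUM(visits)      AS visits,
       SUM(page_views)  AS page_views,
       AVG(bounce_rate) AS avg_bounce_rate
FROM fact_campaign_daily
GROUP BY campaign
"""

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE fact_campaign_daily (
    date TEXT, campaign TEXT, visits INTEGER,
    page_views INTEGER, bounce_rate REAL)""")
con.executemany(
    "INSERT INTO fact_campaign_daily VALUES (?, ?, ?, ?, ?)",
    [("2024-01-01", "spring_launch", 10, 25, 0.4),
     ("2024-01-02", "spring_launch", 20, 55, 0.2)],
)
con.execute(MART_CAMPAIGN_TOTALS)

# The view is computed on demand at query time, never materialized:
rows = con.execute("SELECT * FROM mart_campaign_totals").fetchall()
```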

Page taxonomy

How every URL gets classified into a single page_type.

The canonical taxonomy is defined in docs/cefic_site_structure.md in the repository. It is the single source of truth for PAGE_TYPE_RULES, PAGE_TYPE_LABELS, and PAGE_TYPE_ORDER, consumed by both ingest_matomo.py (to tag each page when the daily feed is written) and build_site.py (to label and order the charts). The 13 canonical types are: news, policy, guidance, case_studies, events, science, industry_data, sectors, highlights, resources, about, home, other.
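A classifier over such rules might look like the sketch below. The patterns here are invented stand-ins (the real rules live in docs/cefic_site_structure.md), and first-match-wins ordering with an "other" fallback is an assumption about how PAGE_TYPE_RULES is applied.

```python
import re

# Ordered (pattern, page_type) pairs; first match wins.
# These patterns are illustrative only, NOT the canonical taxonomy.
PAGE_TYPE_RULES = [
    (re.compile(r"^/news/"),     "news"),
    (re.compile(r"^/policy"),    "policy"),
    (re.compile(r"^/guidance/"), "guidance"),
    (re.compile(r"^/?$"),        "home"),
]


def classify(url_path: str) -> str:
    """Return the single page_type for one URL path."""
    for pattern, page_type in PAGE_TYPE_RULES:
        if pattern.search(url_path):
            return page_type
    return "other"   # fallback bucket for anything unmatched

# classify("/news/2024/update") -> "news"
# classify("/careers")          -> "other"
```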

Known limitations

What is not tracked — and why.

Glossary & methodology

New vs. Returning visitors

Key points often misunderstood:

Matomo vs. GA4: Matomo classifies new/returning based on the visitor's full global history (independent of the selected period). GA4 uses events detected within the selected period (first_visit / first_open), which can lead to different numbers between the two tools.

Refresh schedule

The full pipeline (ingest → marts → site → git push) runs every day at 06:00 CET via run_matomo_cron.sh. Logs are written to logs/ and re-runs are idempotent: campaign daily rows are upserted incrementally, and period-snapshot tables are fully truncated and re-inserted. If ingestion fails partway through, a re-run simply picks up from MAX(date) + 1 for daily data and rebuilds the period tables from scratch.