One Person Many Roles: Designing a Unified Person Dimension in Google BigQuery

    Mark Rittman
    Jan 26, 2026
    Data Engineering
    BigQuery
    Modern Data Stack
    Analytics Engineering
    Looker

    When building our internal data warehouse, we quickly noticed that people and companies sit at the centre of most business processes and that the same person or company often plays multiple roles over time.

    A consultant might be a staff member one day, a project lead the next and a contract signatory when closing a deal. Marketing contacts become clients, clients become referral partners and candidates sometimes become employees.

    While these roles share common attributes such as name, title and contact details, others are role-specific. It is tempting to model each role separately with dimensions for staff, contacts or candidates but this quickly leads to silos, duplication and the inevitable question of which table holds the correct email address.

    Instead, we modelled people in BigQuery using a single unified person dimension that captures every role a person plays across the business.

    The problem with role-specific tables

    Most analytics warehouses grow organically. You start with a CRM and build a contacts dimension. HR adds employee data so you create an employees dimension. Recruiting wants candidates, finance wants signatories and before long you are maintaining several tables that all store the same basic fields.

    This fragmentation causes predictable problems:

    • Duplicate records with conflicting data
    • Identity confusion across systems
    • Complex queries that require unions instead of joins
    • Higher maintenance as changes cascade across models
    • An incomplete view of how a person relates to the business over time

    Most importantly, it breaks the relationship history. When a prospect becomes a client, their marketing journey disappears. When a contractor becomes an employee, their project history is left behind.

    A unified approach using BigQuery STRUCTs

    We use Google BigQuery as our data warehousing managed service and instead created a single persons_dim table that accommodates every role a person could play in our business. To do this in an elegant and BigQuery-optimized way we used a design approach that:

    Uses boolean flags for common filters. Queries like WHERE is_staff = TRUE are cleaner and faster than WHERE staff_attributes IS NOT NULL. The flags also work in Looker filters without requiring users to understand STRUCT syntax.

    Stores current roles as an array. The current_roles field enables queries like WHERE 'PARTNER_CONTACT' IN UNNEST(current_roles) while also supporting aggregations on multi-role individuals.

    Groups related attributes in STRUCTs. Rather than 20 columns prefixed with staff_, nest them in staff_attributes. This keeps the schema organised and makes it obvious which fields belong together.

    CREATE TABLE `analytics.persons_dim` (
    -- Core identity (common to all persons)
    person_pk STRING NOT NULL,
    person_name STRING,
    person_phone STRING,
    all_emails ARRAY<STRING>,
    person_linkedin_url STRING,

    -- Role flags for simple filtering
    is_staff BOOL DEFAULT FALSE,
    is_contractor BOOL DEFAULT FALSE,
    is_client_contact BOOL DEFAULT FALSE,
    is_partner_contact BOOL DEFAULT FALSE,
    is_prospect BOOL DEFAULT FALSE,
    is_candidate BOOL DEFAULT FALSE,

    -- Current roles array for multi-role queries
    current_roles ARRAY<STRING>,
    primary_role STRING,

    -- Staff-specific attributes (populated when is_staff = TRUE)
    staff_attributes STRUCT<
    job_title STRING,
    team_name STRING,
    employment_start_date DATE,
    weekly_capacity INT64,
    default_hourly_rate INT64,
    cost_rate INT64,
    manager_person_fk STRING
    >,

    -- CRM/Marketing attributes
    crm_attributes STRUCT<
    lifecycle_stage STRING,
    lead_status STRING,
    lead_source STRING,
    conversion_event_name STRING
    >,

    -- Candidate attributes (populated when is_candidate = TRUE)
    candidate_attributes STRUCT<
    resume STRING,
    current_application_stage STRING,
    is_referred BOOL
    >,

    -- Partner attributes
    partner_attributes STRUCT<
    partner_company_fk STRING,
    is_primary_contact BOOL,
    has_signatory_authority BOOL
    >
    );

    Uses bridge tables for contextual relationships. A person’s role on a specific project or deal isn’t an attribute of the person — it’s a relationship. Bridge tables model this correctly and support temporal tracking of when assignments started and ended.

    Maintains a primary role. When a person has multiple roles, which one matters most? We calculate primary_role based on a precedence hierarchy (staff > contractor > partner > client > prospect), giving reports a sensible default.

    This type of table schema lets you use boolean flags, for example WHERE is_staff = TRUE to return all persons who are staff members and for more complex analysis you can refer to the STRUCT fields for role-specific attributes. A person with multiple roles simply has multiple flags set to TRUE, with their respective STRUCT fields populated.

    How the person dimension joins to fact tables

    With a unified person dimension, every fact table that involves people can join through a single, consistent foreign key.

    This consistency transforms cross-functional analysis. Instead of building complex unions or maintaining role-specific join logic, every query follows the same pattern.

    Bridge tables for contextual roles

    Some roles however require more context recorded for them in-order for them to be useful in making decisions; a person isn’t just “a project team member” in the abstract; they’re a team member on specific projects with specific allocations. We handle this with bridge tables that link the person to the relevant context.

    These bridge tables let us track the same person across multiple projects or deals with different roles on each. A senior consultant might lead one project while providing support on another — the bridge captures both relationships without duplicating their core identity.

    A unified person view for more effective messaging

    Looker, through its tight integration with Google BigQuery, makes it easy to import this unified person dimension table schema into your LookML project and connect it, either directly or through its bridge tables, to the activities that person has participated in across your business operations.

    The look below, for example, is filtered to only show people where has_multiple_roles = Yes and sorted by role count descending, so the most interconnected relationships appear first.

    This helps us spot people like:

    • Partners who are former staff, and who would know our culture and working style intimately
    • Clients who became referral sources, our strongest advocates worth nurturing
    • Prospects who are also community members, who are already engaged with our content and forums such as the dbt Labs Slack
    • Candidates who are also prospects, who have a dual interest in working with/for us

    and would be something we’d make use of before any outreach, in-order to personalise our messaging and avoid treating warm contacts like cold leads.

    Altogether: a foundation for people analytics

    A unified person dimension such as this one transforms what’s possible with your analytics. Instead of “how many clients do we have?” you can ask “how many clients started as inbound leads versus referrals from existing partners?” Instead of “who’s on this project?” you can ask “which team members have worked with this client before, in any capacity?”

    The investment in consolidating person data pays dividends across every function as sales gets complete contact history, delivery sees full project context and finance can track the same individual from prospect through to paid invoice. Building on this foundation, you can layer predictive models that identify which prospects are most likely to convert based on engagement patterns across all touchpoints.

    If your warehouse has scattered person data across multiple dimensions, consider as we have whether a unified approach might simplify your architecture and unlock new analytical possibilities.

    Interested? Find out more!

    Rittman Analytics is a boutique data analytics consultancy that helps ambitious, digital-native businesses scale-up their approach to data, analytics and generative AI.

    We’re authorised delivery partners for Google Cloud along with dbt Labs, Cube, Dagster, Embeddable, SnowCatCloud and Fivetran and are experts at helping you design an analytics solution that’s right for your organisation’s needs, use-cases and budget.

    If you’re looking for some help and assistance with your data or AI initiative or would just like to talk shop and share ideas and thoughts on what’s going on in your organisation and the wider data analytics world, contact us now to organise a 100%-free, no-obligation call — we’d love to hear from you!

    Share:

    Recommended Posts

    One Person Many Roles: Designing a Unified Person Dimension in Google BigQuery

    One Person Many Roles: Designing a Unified Person Dimension in Google BigQuery

    Jan 26, 2026
    Data Engineering
    BigQuery
    +3
    Adventures in Aggregate Awareness (and Level-Specific Measures) with Looker

    Adventures in Aggregate Awareness (and Level-Specific Measures) with Looker

    Aug 25, 2025
    Looker
    Data Engineering
    +3
    IQR-Based Website Event Anomaly Detection using Looker and Google BigQuery &mdash; Rittman Analytics

    IQR-Based Website Event Anomaly Detection using Looker and Google BigQuery — Rittman Analytics

    Jul 28, 2025
    Analytics Engineering
    BigQuery
    +3