25.7 Variable Selection

Variable Selection identifies key variables shaping information flow and system dynamics in Cybernetic Communication Theory.

Variable selection in cybernetic communication methodology is the process of deciding which quantities, states, and conditions to represent in a system model: what to name, measure, and track as the components whose change over time constitutes the system's dynamic behavior. The variables a model includes determine what it can represent, what dynamics it can reveal, and what questions it can answer. Omitted variables leave the corresponding dynamics invisible; poorly defined or poorly measured variables introduce distortions that propagate through all subsequent analysis. Choosing variables is therefore not a preliminary step that precedes the analysis but a substantive analytical act, with major consequences for what the model reveals about how the system works.

Properties of Well-Selected Variables

Variables in cybernetic communication models should meet several criteria that make them useful for representing system dynamics:

Quantitative continuity: Variables should represent quantities that can take a range of values and can increase or decrease over time, rather than binary states or categorical labels. "User trust" works as a variable because it can be higher or lower, can increase or decrease; "user type = active/passive" does not work well because it is a category that must be decomposed into underlying continuous quantities to represent dynamics. This does not require that variables be precisely measurable in practice — qualitative models can use variables that are conceptually continuous even when data collection methods are imprecise — but the variable must be conceptually scalable for causal loop notation to apply.

Theoretical meaningfulness: Variables should correspond to theoretically meaningful concepts that play genuine roles in the dynamics being modeled, rather than being operational proxies that may or may not track the underlying concepts of interest. Choosing "click-through rate" as a variable may be operationally convenient because it is directly measurable, but if "content relevance to user needs" is the theoretically meaningful variable and click-through rate is only a fallible proxy for it, the model may reveal how click-through rate dynamics work while obscuring how content relevance dynamics work — which may be quite different.

Level of aggregation: Variables should represent the right level of aggregation for the questions being asked. "Total platform engagement" is a highly aggregated variable that hides the distribution of engagement across user groups, content types, and time periods — distributions that may be essential to understanding the dynamics of interest. More disaggregated variables (engagement of high-following vs. low-following accounts; engagement with information content vs. entertainment content) may better represent the dynamics that matter, at the cost of model complexity.

Quantitative continuity: Can increase or decrease; not just binary or categorical
Theoretical meaningfulness: Tracks concepts, not just convenient proxies
Appropriate aggregation: Neither too coarse nor too fine for the question
Causal centrality: Part of important feedback loops in the system
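The aggregation criterion can be illustrated with a small toy calculation. The sketch below assumes two hypothetical account groups and invented weekly engagement counts; the names `high_following`, `low_following`, and `net_change` are illustrative and do not come from any real platform or dataset. It shows how an aggregate can stay flat while the groups it combines move in opposite directions.

```python
# Hypothetical weekly engagement counts for two account groups.
# All numbers are invented purely for illustration.
high_following = [100, 120, 140, 160]
low_following = [100, 80, 60, 40]


def net_change(series):
    """Return the change from the first to the last observation."""
    return series[-1] - series[0]


# The aggregated variable: total engagement per week across both groups.
total = [h + l for h, l in zip(high_following, low_following)]

print("total engagement per week:", total)
print("net change, total:", net_change(total))
print("net change, high-following:", net_change(high_following))
print("net change, low-following:", net_change(low_following))
```

In this toy case the aggregate reports no change at all, while engagement is steadily shifting from one group to the other. Whether that shift matters depends on the question being asked, which is why the level of aggregation is a modeling decision rather than a default.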

Variable Selection and What Gets Left Out

Every model omits variables that are present in the real system but treated as outside the scope of the model. What is omitted matters as much as what is included, because omission determines what dynamics the model cannot represent and what questions it cannot answer.

A model of content recommendation dynamics that includes engagement metrics and algorithmic ranking scores but omits user wellbeing and user satisfaction treats the relationship between engagement and wellbeing as if it were given rather than as a dynamic to be analyzed — implicitly treating high engagement as equivalent to high wellbeing. If engagement and wellbeing diverge systematically (which research suggests they often do), the model misrepresents the system's effects on the people it serves. Including wellbeing as a variable — even one that is harder to measure than engagement — produces a model that can represent the engagement-wellbeing divergence and analyze how system design choices affect it.

The political dimension of variable selection concerns whose interests are represented in the model. Models that include metrics measuring outcomes important to platform operators (engagement, revenue, retention) but exclude metrics measuring outcomes important to users (wellbeing, information quality, autonomy) or to society (democratic discourse quality, public health information distribution) are biased toward representing the system from the operator's perspective. Variable selection that systematically excludes measures of harm to affected communities produces models that cannot analyze those harms — and therefore cannot support design interventions or governance decisions aimed at reducing them.

Operationalization: From Variable to Measurement

Variable selection involves two related but distinct tasks: conceptual variable selection (deciding what concepts should be represented in the model) and operationalization (deciding how to measure or observe each selected variable). These tasks can come apart: a concept can be well chosen yet poorly measured, and a clean measurement can track something other than the concept it stands for.

Conceptual variables like "user autonomy," "content quality," "community trust," or "information diversity" are theoretically important but resist simple operationalization. The difficulty of measurement does not imply that these variables should be excluded from models — their exclusion would mean that the dynamics they participate in could not be analyzed — but it does mean that the operationalization choices made when moving from concept to measurement require careful attention and honest acknowledgment of what the measurement captures and what it misses.
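One way to keep the concept and its measurement separate, and to record what the measurement misses, is to store both alongside each variable. The sketch below is only an illustration under stated assumptions: the `ModelVariable` class, its field names, and the example measurement for "information diversity" are all hypothetical and are not a standard schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ModelVariable:
    """A conceptual variable paired with its (hypothetical) operationalization."""
    concept: str                                        # what the model is meant to represent
    measurement: str                                    # how it is observed in practice
    captures: List[str] = field(default_factory=list)   # aspects the measurement reflects
    misses: List[str] = field(default_factory=list)     # aspects it is known to leave out

    def is_documented(self) -> bool:
        """True only if both the coverage and the gaps have been written down."""
        return bool(self.captures) and bool(self.misses)


# Illustrative entry; the measurement and its gaps are assumptions made
# for the sake of the example, not validated claims.
information_diversity = ModelVariable(
    concept="information diversity",
    measurement="distinct sources appearing in a user's feed per week",
    captures=["breadth of sources shown"],
    misses=["diversity of viewpoints among those sources",
            "whether the user actually reads the items shown"],
)

print(information_diversity.is_documented())
```

The design choice here is that a variable without a recorded list of gaps counts as undocumented, which makes the acknowledgment of measurement limits a required part of the model rather than an afterthought.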

When research and system design use only easily measurable variables, the result is not methodological rigor but methodological bias toward what is easy to measure at the expense of what is important. Cybernetic communication analysis that takes its normative obligations seriously must resist this bias by selecting variables that are conceptually central to the questions that matter — including equity, wellbeing, and autonomy — and then working on the harder problem of how to measure them adequately, rather than substituting convenient proxies and treating the substitution as unproblematic.

Common Variable Selection Errors

Several common errors in variable selection impair the quality of cybernetic communication models:

Omitting stocks in favor of flows: modeling rates of change (new user acquisition, churn) without including the accumulated stocks they fill and drain (total user base) produces models that misrepresent the relationship between short-run rates and long-run levels.

Confusing levels with rates: treating engagement (a rate) as a stock-like concept, or treating reputation (a stock) as if it responds instantaneously to rate changes, produces models with incorrect dynamic behavior.

Including highly correlated variables separately: when two variables move together so consistently that they carry the same information, including both adds complexity without adding analytical clarity and can produce the appearance of independent dynamics where there are none.

Aggregating across important heterogeneity: when user groups, content types, or platform contexts differ in ways that are analytically important, collapsing them into a single aggregate variable hides dynamics that matter for the questions being analyzed.
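The first two errors can be made concrete with a minimal stock-and-flow sketch. It assumes a constant acquisition inflow and a churn outflow proportional to the current user base; the function name `simulate_user_base` and all parameter values are hypothetical, chosen only to show how a stock accumulates its flows.

```python
def simulate_user_base(initial_users, acquisition_per_step, churn_fraction, steps):
    """Accumulate a stock from its flows: stock += inflow - outflow each step.

    initial_users        -- starting level of the stock (total user base)
    acquisition_per_step -- inflow: new users gained per step (assumed constant)
    churn_fraction       -- share of current users lost per step (assumed constant)
    """
    users = float(initial_users)
    history = [users]
    for _ in range(steps):
        inflow = acquisition_per_step
        outflow = churn_fraction * users  # the outflow depends on the stock itself
        users = users + inflow - outflow
        history.append(users)
    return history


# Invented parameters: 1000 users, 100 new users per step, 20% churn per step.
history = simulate_user_base(1000, 100, 0.2, 30)

for step in (0, 1, 5, 30):
    print(f"step {step}: user base = {history[step]:.1f}")
```

Under these assumed parameters the acquisition rate never changes, yet the user base falls at every step and settles toward the level where inflow equals outflow (100 / 0.2 = 500). A model that tracked only the acquisition rate would show nothing happening, which is the kind of misreading that omitting the stock invites.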