Tokens as Technology: AI Inference, Labor Augmentation, and Long-Run Software Productivity Growth

Inference Economics

Develops a macroeconomic growth theory in which AI inference tokens enter the software production function as a Harrod-neutral labor-augmenting technology. Treating the AI inference frontier as an exogenous technology stock, we derive five formal results: token-augmented balanced growth path, token divide theorem, vibe coding transition, non-monotone labor share, and endogenous efficiency. A calibration implies per-developer software productivity growth of approximately 15–20 percent per year.

Author: Ibrahim Niankara

Affiliation: Al Ain University, College of Business; Brass Digital Lab, Abu Dhabi, UAE

Published: 2 May 2026

Working Paper — This article is a working paper. Content reflects research in progress and has not yet undergone formal peer review.

Abstract

We develop a macroeconomic growth theory in which AI inference tokens enter the software production function as a Harrod-neutral labor-augmenting technology. Treating the AI inference frontier as an exogenous technology stock—in the tradition of the Solow–Swan model—we derive five formal results for the software sector. (i) Token-Augmented Balanced Growth Path: per-developer software productivity grows at rate g_s^* = \varphi(g_T - n), where g_T is the exogenous growth rate of the AI inference frontier and n is developer population growth; the BGP exists uniquely, with convergence speed \lambda = (1-\beta)[\delta_K + (1-\varphi)n + \varphi g_T]. (ii) Token Divide Theorem: early token adopters maintain a permanent productivity advantage \varphi g_T(t_j - t_i) over late adopters. (iii) Vibe Coding Transition: there exists a critical elasticity of substitution \sigma^* = 1 at which the human–token production relationship shifts from complementarity to substitutability, providing the formal microfoundation of the paradigm shift to natural-language goal specification. (iv) Non-Monotone Labor Share: within the software sector, factor shares trace an inverted-U as the elasticity of substitution evolves with AI capability. (v) Endogenous Efficiency: optimal investment in prompt engineering and context management generates a second engine of growth, raising the BGP rate to g_s^{**} = g_E^* + \varphi(g_T - n). A calibration using recent AI productivity evidence implies per-developer software productivity growth of approximately 15–20 percent per year under plausible parameter values.

JEL Classification: O41, O33, J24, L86, D24

Keywords: token production function, labor-augmenting technology, balanced growth path, AI inference, software productivity, vibe coding, factor shares, token divide, semi-endogenous growth


1. Introduction

The theory of economic growth has long been concerned with identifying the key factors that drive sustained increases in productivity. Solow (1956) demonstrated that capital accumulation alone cannot generate permanent growth in per-capita output and that sustained growth requires ongoing technological progress. Subsequent work by Romer (1990) and Jones (1995) endogenized technological progress, but the fundamental insight—that growth requires a continuously expanding frontier of technique—remains the bedrock of the field.

We argue that AI inference tokens constitute a new and quantitatively important instance of this mechanism. Tokens are the atomic computational units purchased by developers from frontier model providers (OpenAI, Anthropic, Google DeepMind, and others) or consumed from self-hosted models; they represent the medium through which large language models deliver their productive services. A developer who buys tokens buys effective AI assistance: code completions, debugging, architecture suggestions, automated testing, documentation, and increasingly, autonomous agentic execution of entire software workflows.

The central theoretical claim of this paper is that tokens are best understood as Harrod-neutral (labor-augmenting) technology in the software production function. A developer who consumes T tokens per period is effectively a more productive developer: the same human effort, augmented by AI assistance, produces more software output. This is precisely the structure of labor-augmenting technical progress in the Solow model, with the AI inference frontier playing the role of the technology level A(t). Following the Solow–Swan convention, we treat the long-run growth rate of the AI inference frontier as exogenous—driven by the cumulative R&D investments and scale economies of frontier model providers operating outside the individual developer’s optimization problem. This assumption is analogous to the standard treatment of disembodied technical progress in neoclassical growth theory.

We make five main contributions. First, we characterize the Balanced Growth Path (BGP) of the token-augmented growth model, proving existence and uniqueness and deriving both the steady-state growth rate and the convergence speed as functions of model parameters. Second, we prove the Token Divide Theorem: differences in token adoption timing generate permanent, non-converging productivity gaps proportional to \varphi g_T(t_j - t_i). Third, we characterize the Vibe Coding Transition using a CES production function—an extension of the baseline model that allows the elasticity of substitution to vary—identifying \sigma^* = 1 as the threshold at which the human–token relationship shifts from complementarity to substitutability. Fourth, we derive a non-monotone labor share path within the software sector: rising during the assistance regime and falling in the automation regime. Fifth, we extend the model to endogenize token efficiency, showing that optimal investment in prompt engineering and retrieval systems constitutes a second independent engine of growth.

The paper connects to several strands of the literature. The formal apparatus follows the Solow–Swan–Romer growth tradition (Jones, 1995; Romer, 1990; Solow, 1956). The automation and task-based labor-market implications connect to Acemoglu & Restrepo (2018), Acemoglu & Restrepo (2019), Acemoglu & Restrepo (2020), and Zeira (1998). The factor share dynamics connect to Karabarbounis & Neiman (2014) and Acemoglu (2002). The macroeconomics of AI as a technology connects to Acemoglu (2024) and Brynjolfsson et al. (2021). The empirical motivation draws on Peng et al. (2023), Eloundou et al. (2024), and Brynjolfsson et al. (2019). Our contribution relative to Acemoglu (2024) is to embed the token analogy explicitly within the Solow BGP framework, derive level-path convergence (the Token Divide), and characterize the semi-endogenous efficiency channel absent from his analysis. Brynjolfsson et al. (2021) document the productivity J-curve for general-purpose technologies; our model provides a growth-theoretic rationale for the eventual productivity lift in the software sector once token intensity crosses the high-intensity threshold.

Section 2 presents six motivating stylized facts. Section 3 introduces the model. Section 4 derives the BGP and includes a quantitative calibration. Section 5 proves the Token Divide and derives the convergence rate. Section 6 develops the CES extension and Vibe Coding Transition. Section 7 derives factor shares and wage dynamics. Section 8 presents the endogenous efficiency extension. Section 9 discusses policy implications. Section 10 concludes. Proofs are in the Appendix.


2. Stylized Facts

Six stylized facts motivate our modeling choices and constrain key parameter values.

Stylized Fact 1: Token Supply Is Growing at Exponential Rates

Global AI platforms processed an estimated several trillion tokens per week by late 2024, with leading providers reporting year-over-year usage growth of 200–500 percent (AI, 2024b; OpenAI, 2024). This rate of growth far exceeds that of any conventional factor of production and motivates modeling the AI inference frontier as a stock that grows at a sustained exogenous rate g_T.

Stylized Fact 2: Token Prices Have Declined by 80–90 Percent in Two Years

Frontier model API prices per million tokens fell from $10–60 in 2023 to $0.50–3.00 by late 2024, driven by model efficiency improvements (e.g., speculative decoding, mixture-of-experts architectures), hardware improvements, and competitive pressure (AI, 2024a). This price decline is analogous to the historical decline in the price of computing capital documented by Nordhaus (2021) and motivates the analysis of token price effects on the BGP level path in Corollary 1.

Stylized Fact 3: Token Consumption Is Highly Productive

The first randomized controlled study of AI-assisted coding by Peng et al. (2023) documents that AI-assisted developers complete a standard JavaScript task 55.8 percent faster than unassisted counterparts. GitHub (2023) reports consistent productivity gains across enterprise deployments. Eloundou et al. (2024) estimate that large language models have the potential to affect over 80 percent of software development occupational tasks. These findings are consistent with a large and positive output elasticity of token consumption in the software production function.

Stylized Fact 4: The Production Paradigm Is Shifting Toward Orchestration

The emergence of “vibe coding”—natural language goal specification as the primary developer input, with the AI agent generating and iterating on code—represents a qualitative shift in the human–AI production relationship (Karpathy, 2025). Developers increasingly serve as system architects and goal specifiers rather than manual coders (GitHub, 2024), consistent with a rise in the elasticity of substitution between human labor and AI tokens over time.

Stylized Fact 5: AI Adoption Is Highly Unequal Across Developers and Geographies

OECD (2024) documents substantial inequality in frontier AI model access across income groups and geographies. Access to frontier model APIs, the ability to optimize token consumption through prompt engineering, and the skills to orchestrate agentic workflows are concentrated among high-income economies and high-skill workers. This motivates the Token Divide analysis in Section 5.

Stylized Fact 6: Developer Optimization of Token Consumption Is Pervasive

Firms actively minimize token usage through prompt compression, retrieval-augmented generation (RAG), caching, and fine-tuning smaller specialized models (Gartner, 2024; Lewis et al., 2020). This cost-minimizing behavior is consistent with agents facing a binding token budget constraint and motivates the endogenous efficiency extension in Section 8.


3. The Token-Augmented Production Model

3.1 Setup and Aggregation

We model a software-producing economy populated by a unit mass of symmetric developers. Individual developer i has human labor H_i and physical computing capital K_i, and accesses AI inference tokens from the exogenously growing frontier. Under symmetry, all developers make identical choices, so individual-level variables (S_i, H_i, K_i, T_i) equal their aggregate counterparts (S, H, K, T) after integration over the unit mass. We work with aggregate variables throughout and drop individual subscripts. All markets are competitive: developers are price-takers in output and factor markets, and factor payments equal marginal products.

Human labor grows exogenously at rate n: H(t) = H_0 e^{nt}.

3.2 Pre-AI Baseline

In the pre-AI baseline, software output is produced from human labor H and computing capital K: S = K^{\beta} \cdot H^{1-\beta}, \quad \beta \in (0,1). \tag{1} This is the standard Cobb-Douglas production function with constant returns to scale. Labor’s share of output is (1-\beta) and capital’s share is \beta. Without technological progress, per-developer output is constant in steady state.

3.3 The AI Inference Frontier and Token Augmentation

We introduce the AI inference frontier as an exogenous technology stock, in the Solow–Swan tradition.

Assumption A2 (Exogenous AI Inference Frontier). The AI inference frontier F(t) grows at a constant exogenous rate g_T > 0 determined by the R&D investments and scale economies of frontier model providers (OpenAI, Anthropic, Google DeepMind, and others) operating outside any individual developer’s or economy’s optimization problem: F(t) = F_0 \, e^{g_T t}. This treatment is exactly analogous to the Solow–Swan treatment of disembodied technical progress A(t) = A_0 e^{g_A t}. Abstracting from the endogenous determination of g_T is appropriate for our focus on developer-level and economy-level steady states conditional on the technology frontier.

Individual token access per developer \tau(t) = T(t)/H(t) is proportional to the frontier F(t) net of depreciation and developer-population dilution. Specifically, a developer who devotes investment share s_T of output to token expenditure at real price P_T(t) achieves: T(t) = \frac{s_T \cdot S(t)}{P_T(t) \cdot (\delta_T + g_T)}, \tag{2} where \delta_T is the rate at which token-generating capacity becomes obsolete (model depreciation, contract expiry). Taking growth rates of Equation 2 with s_T constant gives g_T = g_S - g_{P_T}, where g_{P_T} denotes the growth rate of the token price: T can grow faster than S only if P_T falls at rate g_T - g_S. In the exogenous-frontier interpretation, the relevant parameter is the frontier growth rate g_T, which drives the effective token intensity \tau(t) = T(t)/H(t) through this mechanism.

Remark. The key distinction from a fully endogenous capital-accumulation model is that g_T is not determined by the developer’s saving rate s_T. Rather, g_T is the frontier’s growth rate; s_T determines the level of token intensity \tau^* on the BGP, but not its growth rate. This is precisely the Solow model structure: the capital savings rate determines the level of the BGP, but not the BGP growth rate (which is pinned by g_A).

Let T denote token access (under symmetry, individual access T_i equals the aggregate) and define the token-augmented effective labor input: \tilde{H} \equiv A(T, H) \cdot H, \tag{3} where A(T, H) is the token augmentation factor. We adopt: A(T, H) = 1 + \alpha \cdot \left(\frac{T}{H}\right)^{\!\varphi}, \quad \alpha > 0,\; \varphi \in (0,1). \tag{4} Here \alpha scales the productivity effect of tokens and \varphi governs returns to token intensity \tau = T/H (Equation 4). The restriction \varphi < 1 ensures diminishing returns: doubling tokens per developer less than doubles effective labor. The “+1” ensures A \geq 1: in the absence of tokens (T = 0), the developer operates at baseline human productivity, recovering Equation 1.

3.4 The Token Production Function

Substituting the augmented labor input (Equation 3) into the production function (Equation 1): \boxed{S = K^{\beta} \cdot H^{1-\beta} \cdot \left[1 + \alpha\!\left(\frac{T}{H}\right)^{\!\varphi}\right]^{1-\beta}} \tag{5} This is the Token Production Function. The term \bigl[1 + \alpha(T/H)^{\varphi}\bigr]^{1-\beta} is the token augmentation multiplier. As T/H \to 0, it approaches unity and we recover the pre-AI baseline. As T/H \to \infty, it grows without bound at a diminishing rate.
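The limiting behavior of Equation 5 can be checked numerically. The following is a minimal sketch, not part of the formal model: the levels of K, H, T and the value \alpha = 1.0 are illustrative assumptions, while \varphi = 0.30 and \beta = 0.33 follow the calibration used later in the paper.

```python
def token_augmentation(T, H, alpha, phi):
    """Augmentation factor A(T, H) = 1 + alpha * (T / H) ** phi (Equation 4)."""
    return 1.0 + alpha * (T / H) ** phi


def software_output(K, H, T, alpha, phi, beta):
    """Token Production Function (Equation 5)."""
    effective_labor = H * token_augmentation(T, H, alpha, phi)
    return K ** beta * effective_labor ** (1.0 - beta)


# Illustrative values only: K, H, T and alpha are assumptions, not estimates.
K, H, alpha, phi, beta = 100.0, 10.0, 1.0, 0.30, 0.33

baseline = K ** beta * H ** (1.0 - beta)                # pre-AI output, Equation 1
no_tokens = software_output(K, H, 0.0, alpha, phi, beta)
some_tokens = software_output(K, H, 50.0, alpha, phi, beta)
double_tokens = software_output(K, H, 100.0, alpha, phi, beta)

print(f"baseline: {baseline:.3f}, T = 0: {no_tokens:.3f}")
print(f"T = 50: {some_tokens:.3f}, T = 100: {double_tokens:.3f}")
```

With T = 0 the function returns the pre-AI baseline, and doubling T raises output by less than a factor of two, which is the diminishing-returns property implied by \varphi < 1.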

3.5 Capital Accumulation

Physical computing capital accumulates according to: \frac{dK}{dt} = s_K \cdot S - \delta_K \cdot K, \tag{6} where s_K is the investment rate and \delta_K is capital depreciation.


4. Balanced Growth Path

4.1 High Token Intensity Assumption and BGP Derivation

All formal results in this section are derived under the following approximation, which holds in the long-run high-adoption regime.

Assumption A1 (High Token Intensity). The economy has entered the high-token-intensity regime: \alpha(T/H)^{\varphi} \gg 1. Under this condition, the augmentation factor simplifies to A(T,H) \approx \alpha(T/H)^{\varphi}, and the token production function becomes: S \approx K^{\beta} \cdot \alpha^{1-\beta} \cdot T^{\varphi(1-\beta)} \cdot H^{(1-\varphi)(1-\beta)}. Assumption A1 is appropriate for economies that have already passed the initial adoption threshold—for instance, organizations where all developers routinely use AI assistance tools. For economies near zero token adoption (low T/H), the transition dynamics differ from the BGP characterization below.

A balanced growth path (BGP) is a trajectory on which S, K, and T all grow at constant (possibly different) rates, with all ratios K/S, T/S, and T/H constant. Denote growth rates by g_S, g_K, g_H = n; the token frontier grows at the exogenous rate g_T (Assumption A2).

Under Assumption A1, taking growth rates of the production function: g_S = \beta g_K + (1-\beta)(1-\varphi)n + \varphi(1-\beta)g_T. \tag{7} From capital accumulation Equation 6, on the BGP g_K = g_S. Solving for g_S: g_S = n + \varphi(g_T - n). \tag{8} Per-developer output growth is therefore: \boxed{g_s^* \equiv g_S - n = \varphi \cdot (g_T - n).} \tag{9}

Proposition 1 (Balanced Growth Path). Under Assumptions A1 and A2, the token-augmented growth model admits a unique balanced growth path on which per-developer software output grows at: g_s^* = \varphi \cdot (g_T - n), where g_T > n is required for positive BGP growth. The BGP growth rate is: (i) increasing in the exogenous token frontier growth rate g_T; (ii) decreasing in developer population growth n; (iii) increasing in the token return parameter \varphi; (iv) independent of the capital share \beta (a consequence of Assumption A1). Uniqueness follows from the linearity of the BGP condition Equation 9 in g_s^*, given that g_T is exogenous (Assumption A2).
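The step from Equation 7 to Equation 9 can be reproduced mechanically. The sketch below imposes g_K = g_S in Equation 7, solves for g_S, and compares the result with the closed form \varphi(g_T - n) for several capital shares; the parameter values are the illustrative ones from the calibration section, and the function names are our own.

```python
def bgp_output_growth(g_T, n, phi, beta):
    """Solve Equation 7 for g_S after imposing g_K = g_S on the BGP.

    g_S = beta * g_S + (1 - beta) * (1 - phi) * n + phi * (1 - beta) * g_T
    """
    rhs_without_capital = (1.0 - beta) * (1.0 - phi) * n + phi * (1.0 - beta) * g_T
    return rhs_without_capital / (1.0 - beta)


def per_developer_growth(g_T, n, phi):
    """Closed form of Equation 9."""
    return phi * (g_T - n)


g_T, n, phi = 0.65, 0.025, 0.30
for beta in (0.20, 0.33, 0.50):
    g_S = bgp_output_growth(g_T, n, phi, beta)
    print(f"beta = {beta:.2f}: g_S - n = {g_S - n:.4f}")
print(f"closed form: {per_developer_growth(g_T, n, phi):.4f}")
```

The per-developer growth rate is the same for every \beta, which illustrates part (iv) of Proposition 1.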

4.2 Steady-State Token Intensity and Convergence Speed

Define the token intensity ratio \tau = T/H. On the BGP, \tau is constant at: \tau^* = \frac{s_T \cdot s^*}{P_T^* \cdot (\delta_T + n)}, \tag{10} where s^* = S^*/H^* is steady-state per-developer output and P_T^* is the BGP token price. The steady-state intensity is increasing in the token investment share s_T and per-developer output s^*, and decreasing in the token price P_T^*, the depreciation rate \delta_T, and developer population growth n.

The convergence speed to the BGP is derived by linearizing the capital dynamics around the steady state. Define \hat{k}(t) = K(t)/[e^{(n + g_s^*)t} K^*] and let \varepsilon(t) = \ln\hat{k}(t). Linearizing \dot{\hat{k}} = s_K f(\hat{k}) - (\delta_K + n + g_s^*)\hat{k} around \hat{k} = 1: \frac{d\varepsilon}{dt} = -\lambda \cdot \varepsilon(t), \tag{11} where the convergence rate is: \boxed{\lambda = (1-\beta)\bigl[\delta_K + n + g_s^*\bigr] = (1-\beta)\bigl[\delta_K + (1-\varphi)n + \varphi g_T\bigr].} \tag{12} The half-life of deviations from the BGP is \ln(2)/\lambda. Convergence is faster in economies with lower capital shares \beta, higher depreciation \delta_K, faster frontier growth g_T, and larger \varphi.
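The linearization behind Equation 12 can be verified numerically. The sketch below assumes the intensive-form technology f(\hat{k}) = \hat{k}^{\beta} (the high-intensity Cobb-Douglas case) and normalizes the steady state to \hat{k} = 1; both are simplifying assumptions made for illustration. It then compares the finite-difference slope of the log-deviation dynamics at the steady state with the analytic \lambda.

```python
import math


def log_deviation_drift(eps, beta, outflow):
    """d(eps)/dt for eps = ln(k_hat), with k_hat = 1 at the steady state.

    Assumes f(k_hat) = k_hat ** beta and s_K chosen so that k_hat = 1 is
    stationary, which gives d ln(k_hat)/dt = outflow * (exp(-(1 - beta) * eps) - 1).
    """
    return outflow * (math.exp(-(1.0 - beta) * eps) - 1.0)


# Illustrative parameters taken from the calibration section.
beta, delta_K, n, phi, g_T = 0.33, 0.05, 0.025, 0.30, 0.65
g_s_star = phi * (g_T - n)
outflow = delta_K + n + g_s_star

# Central finite difference of the drift around eps = 0.
h = 1e-6
slope = (log_deviation_drift(h, beta, outflow)
         - log_deviation_drift(-h, beta, outflow)) / (2.0 * h)
numerical_lambda = -slope

analytic_lambda = (1.0 - beta) * (delta_K + (1.0 - phi) * n + phi * g_T)
half_life = math.log(2.0) / analytic_lambda

print(f"numerical lambda: {numerical_lambda:.6f}")
print(f"analytic lambda:  {analytic_lambda:.6f}")
print(f"half-life (years): {half_life:.2f}")
```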

4.3 Token Price and the BGP Level Path

Corollary 1 (Token Price and BGP Level). Let P_T(t) = P_{T0} \, e^{-g_p t} decline at rate g_p > 0. Under this exponential specification, the 80 percent two-year decline in Stylized Fact 2 corresponds to g_p = \ln(5)/2 \approx 0.80 per year. A lower real token price P_T raises steady-state token intensity \tau^* via Equation 10: \partial \tau^*/\partial P_T < 0. This increases the BGP level of per-developer output s^* but does not affect the BGP growth rate g_s^*, which is determined solely by the frontier growth rate g_T and developer population growth n. Formally: the price-decline channel operates as a level effect on the BGP, shifting the productivity trajectory upward without changing its slope. The observed 80–90 percent decline in token prices since 2023 corresponds to a one-time upward shift in the BGP level path whose magnitude, under the calibration of Section 4.4, is approximately \Delta \ln s^* = (1-\beta)\varphi \cdot \Delta \ln \tau^* \approx 0.30.
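The size of the level effect can be sketched as follows. The calculation assumes that \tau^* moves inversely one-for-one with P_T, that is, \Delta \ln \tau^* = -\Delta \ln P_T with s_T, s^*, \delta_T, and n held fixed in Equation 10; this ignores any feedback through s^* and is meant only as a first-order illustration.

```python
import math


def level_shift(price_decline, beta, phi):
    """Delta ln s* = (1 - beta) * phi * Delta ln tau*.

    Assumes Delta ln tau* = -Delta ln P_T (other terms in Equation 10 held fixed).
    price_decline is the fractional fall in P_T, e.g. 0.80 for 80 percent.
    """
    d_ln_tau = -math.log(1.0 - price_decline)
    return (1.0 - beta) * phi * d_ln_tau


beta, phi = 0.33, 0.30
for decline in (0.80, 0.90):
    shift = level_shift(decline, beta, phi)
    print(f"price decline {decline:.0%}: Delta ln s* = {shift:.3f}")
```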

Remark. The distinction between price level effects and growth rate effects on the BGP is important for policy. Public subsidies that permanently lower P_T produce a one-time but permanent upward shift in the productivity level path. Policy that accelerates the growth of the AI inference frontier (e.g., public R&D in AI infrastructure) raises the BGP growth rate g_s^* directly.

4.4 Quantitative Calibration

Table 1 presents an illustrative calibration of the model to recent AI productivity data. The parameter \beta is set to 0.33, the standard capital share. The token return parameter \varphi is calibrated from the Peng et al. (2023) controlled experiment: a 55.8 percent productivity gain implies a multiplier [1 + \alpha\tau^{\varphi}]^{1-\beta} = 1.558; with \beta = 0.33, inverting gives 1 + \alpha\tau^{\varphi} = 1.558^{1/0.67} \approx 1.94, so \alpha\tau^{\varphi} \approx 0.94 at the current mean token intensity, consistent with \varphi \approx 0.30 under plausible \alpha. The frontier growth rate g_T is estimated from the approximately 200–400 percent year-on-year growth in reported AI API usage during 2023–2024 (AI, 2024b; OpenAI, 2024); we use a conservative central estimate of g_T = 0.65. Developer population growth n is set to 0.025, consistent with Bureau of Labor Statistics projections for software developer employment. Capital depreciation is set to \delta_K = 0.05.
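The inversion of the augmentation multiplier used in this calibration step can be written out explicitly. The sketch below follows the text in reading the 55.8 percent gain as a multiplier of 1.558; the function name is our own.

```python
def implied_token_term(gain, beta):
    """Invert [1 + alpha * tau**phi] ** (1 - beta) = 1 + gain for alpha * tau**phi."""
    return (1.0 + gain) ** (1.0 / (1.0 - beta)) - 1.0


beta = 0.33
token_term = implied_token_term(gain=0.558, beta=beta)

# Round trip: plugging the implied term back in should reproduce the multiplier.
recovered_multiplier = (1.0 + token_term) ** (1.0 - beta)

print(f"implied alpha * tau**phi: {token_term:.3f}")
print(f"recovered multiplier: {recovered_multiplier:.3f}")
```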

Table 1: Illustrative calibration of the token-augmented growth model. All rates are annual. “Historical software productivity growth” refers to total factor productivity estimates for the software sector from Brynjolfsson et al. (2019).
| Parameter / Quantity | Symbol | Value |
|---|---|---|
| Calibrated parameters | | |
| Capital share | \beta | 0.33 |
| Token return parameter | \varphi | 0.30 |
| Token frontier growth (central) | g_T | 0.65 |
| Developer labor force growth | n | 0.025 |
| Capital depreciation | \delta_K | 0.05 |
| Derived BGP quantities | | |
| BGP per-developer growth rate | g_s^* | 0.188 (18.8% p.a.) |
| Convergence speed | \lambda | 0.176 |
| BGP half-life (years) | \ln(2)/\lambda | 3.9 |
| Historical software TFP growth | | 0.020–0.030 |
| Sensitivity (g_T range: 0.50–0.80) | | |
| Low-g_T BGP growth | g_s^*\vert_{g_T=0.50} | 0.143 (14.3% p.a.) |
| High-g_T BGP growth | g_s^*\vert_{g_T=0.80} | 0.233 (23.3% p.a.) |

The central estimate implies per-developer software productivity growing at approximately 19 percent per year on the BGP, compared with historical growth of 2–3 percent per year in the pre-AI era. The sensitivity analysis shows that BGP growth stays between roughly 14 and 23 percent per year across the g_T range considered. Evaluating Equation 12 at the calibrated parameters gives \lambda = 0.67 \times (0.05 + 0.0175 + 0.195) \approx 0.176 and a half-life of about 3.9 years, so developers converge quickly to the new BGP once they adopt tokens, consistent with the rapid deployment timelines observed empirically.
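The derived quantities in Table 1 follow directly from Equations 9 and 12. A short sketch of the sensitivity sweep over g_T, using only the calibrated parameters stated above (function names are our own), is:

```python
import math


def bgp_growth(phi, g_T, n):
    """Per-developer BGP growth rate (Equation 9)."""
    return phi * (g_T - n)


def convergence_speed(beta, delta_K, phi, n, g_T):
    """Convergence rate lambda (Equation 12)."""
    return (1.0 - beta) * (delta_K + (1.0 - phi) * n + phi * g_T)


beta, phi, n, delta_K = 0.33, 0.30, 0.025, 0.05

rows = []
for g_T in (0.50, 0.65, 0.80):
    growth = bgp_growth(phi, g_T, n)
    lam = convergence_speed(beta, delta_K, phi, n, g_T)
    rows.append((g_T, growth, lam, math.log(2.0) / lam))

for g_T, growth, lam, half_life in rows:
    print(f"g_T = {g_T:.2f}: g_s* = {growth:.4f}, "
          f"lambda = {lam:.4f}, half-life = {half_life:.2f} years")
```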


5. Convergence and the Token Divide

5.1 Transition Dynamics

Consider two developers (or economies) i and j with identical technology parameters (\alpha, \varphi, \beta), investment rates (s_T, s_K), and labor growth n, but different token adoption dates t_i < t_j. Define the deviation from the BGP level: \varepsilon_i(t) = \ln s_i(t) - \ln s_i^{*}(t). From the linearized dynamics Equation 11: \varepsilon_i(t) = \varepsilon_i(0) \cdot e^{-\lambda t}, with \lambda given by Equation 12. Both developers converge to their respective BGP level paths at the same speed \lambda, but they converge to different level paths because their token endowments at adoption differ.

5.2 The Token Divide Theorem

Proposition 2 (Token Divide Theorem). Two developers (economies) with identical technology and preferences but different token adoption dates t_i < t_j maintain a permanent productivity gap on the balanced growth path: \ln s_i^*(\infty) - \ln s_j^*(\infty) = \varphi \cdot g_T \cdot (t_j - t_i) > 0. The permanent gap: (i) does not shrink over time; (ii) is proportional to the adoption delay (t_j - t_i); (iii) is amplified by the token return parameter \varphi and the frontier growth rate g_T; (iv) implies that late-adopting economies face a permanent downward shift in their productivity trajectory, not merely a temporary disadvantage.

Remark. The permanent gap \varphi g_T(t_j - t_i) can be expressed in terms of the BGP growth rate g_s^* = \varphi(g_T - n) as: \varphi g_T(t_j - t_i) = \left(g_s^* + \varphi n\right)(t_j - t_i), so the gap exceeds g_s^*(t_j - t_i) by the term \varphi n(t_j - t_i), which reflects the compounding of the demographic dilution term.

Proposition 2 has strong policy implications. Unlike standard growth models where convergence implies all economies reach the same BGP level path eventually, the Token Divide Theorem implies that early movers in AI token adoption secure a permanent first-mover advantage. For a representative economy with \varphi = 0.30 and g_T = 0.65, a 3-year adoption delay generates a permanent gap of 0.30 \times 0.65 \times 3 \approx 0.59 log points: the late adopter's productivity level is permanently e^{-0.585} \approx 0.56 times the early adopter's, a level disadvantage of about 44 percent.
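Because the gap in Proposition 2 is stated in logs, converting it to a percentage level difference requires exponentiation. A minimal sketch of that conversion, using the illustrative values \varphi = 0.30, g_T = 0.65 and a 3-year delay, is:

```python
import math


def token_divide_gap(phi, g_T, delay_years):
    """Permanent log productivity gap phi * g_T * (t_j - t_i) from Proposition 2."""
    return phi * g_T * delay_years


gap = token_divide_gap(phi=0.30, g_T=0.65, delay_years=3.0)

# Late adopter's level relative to the early adopter, and the early adopter's premium.
late_relative_level = math.exp(-gap)
early_premium = math.exp(gap) - 1.0

print(f"log gap: {gap:.3f}")
print(f"late adopter level relative to early adopter: {late_relative_level:.3f}")
print(f"early adopter premium over late adopter: {early_premium:.1%}")
```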

5.3 Conditional Convergence and Testable Implications

Within the group of token adopters, the model predicts conditional convergence: developers with lower initial token intensity \tau(0) grow faster than those near the BGP, at rate \lambda proportional to their distance from steady state. This generates the testable regression: g_{s,i} = \sigma_0 - \lambda \cdot \ln s_{i,0} + \sigma_1 \ln\!\tau_{i,0} + \sigma_2 \ln s_{T,i} + \sum_{k \geq 3} \sigma_k X_{k,i} + \varepsilon_i, \tag{13} where X_{k,i} includes model capability access, developer human capital, and economy-level AI infrastructure investment. The coefficient -\lambda on the initial log productivity level is the conditional convergence coefficient, directly interpretable from Equation 12: its magnitude \lambda = (1-\beta)[\delta_K + (1-\varphi)n + \varphi g_T] is identifiable from joint estimation of the convergence regression and cross-developer productivity data. Separate identification of \varphi and g_T requires an additional moment condition linking token expenditure to observed productivity growth. Developer-level panel data linking API expenditure to software output per developer—increasingly available from enterprise productivity platforms—provides the natural data source.
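To illustrate how the convergence coefficient in Equation 13 would be recovered, the sketch below runs a stripped-down version of the regression on synthetic data. Everything here is hypothetical: the data are simulated, the controls (\tau_{i,0}, s_{T,i}, X_{k,i}) are omitted, and the intercept of 0.40 and noise level are arbitrary choices. It is not an estimate from real developer data.

```python
import random


def ols_slope_intercept(x, y):
    """Ordinary least squares for a single regressor with an intercept."""
    n_obs = len(x)
    mean_x = sum(x) / n_obs
    mean_y = sum(y) / n_obs
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# "True" lambda from Equation 12 at the illustrative calibration.
beta, delta_K, n, phi, g_T = 0.33, 0.05, 0.025, 0.30, 0.65
true_lambda = (1.0 - beta) * (delta_K + (1.0 - phi) * n + phi * g_T)

# Synthetic cross-section: growth falls linearly in initial log productivity.
rng = random.Random(0)
log_s0 = [rng.uniform(0.0, 2.0) for _ in range(200)]
growth = [0.40 - true_lambda * x + rng.gauss(0.0, 0.005) for x in log_s0]

slope, intercept = ols_slope_intercept(log_s0, growth)
lambda_hat = -slope

print(f"true lambda: {true_lambda:.4f}, estimated lambda: {lambda_hat:.4f}")
```

In an actual application the omitted controls would be included and the estimate of \lambda would have to be combined with a further moment condition to separate \varphi from g_T, as noted above.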


6. Elasticity of Substitution and the Vibe Coding Transition

6.1 Framework Extension: From Augmented Cobb-Douglas to CES

The analysis in Sections 3–5 is built on the augmented Cobb-Douglas production function Equation 5, which restricts the elasticity of substitution between human labor and token-augmented effort to unity in the high-intensity limit. This restriction is useful for deriving clean BGP results but prevents analysis of the substitutability dimension that is central to understanding how AI capability improvements change the nature of human–AI interaction.

In this section we extend the baseline model to a CES specification that allows the elasticity of substitution \sigma to vary. The CES nests the Cobb-Douglas as a special case (\sigma = 1) and allows both the complementarity regime (\sigma < 1) and the substitutability regime (\sigma > 1). The BGP results of Proposition 1 continue to hold in the CES framework when \sigma = 1; what changes is the distributional structure. Results from Section 7 onward are derived from the CES framework.

6.2 The CES Production Function

We adopt a CES specification for the human–token composite: X(H,T) = \left[\gamma H^{(\sigma-1)/\sigma} + (1-\gamma) T^{(\sigma-1)/\sigma}\right]^{\sigma/(\sigma-1)}, \tag{14} where \sigma \geq 0 is the elasticity of substitution between H and T, and \gamma \in (0,1) is the factor distribution parameter. Full output is: S = K^{\beta} \cdot X(H,T)^{1-\beta}. \tag{15} The marginal rate of technical substitution is: \mathrm{MRTS}_{HT} = \frac{\gamma}{1-\gamma} \cdot \left(\frac{T}{H}\right)^{1/\sigma}. \tag{16}
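Equation 16 can be checked against a numerical derivative of the CES composite in Equation 14. The sketch below is illustrative only: the values of H, T, and \gamma are assumptions, the code covers \sigma > 0, and the \sigma = 1 case is handled by the Cobb-Douglas limit.

```python
def ces_composite(H, T, gamma, sigma):
    """CES human-token composite X(H, T) from Equation 14 (sigma > 0)."""
    if abs(sigma - 1.0) < 1e-12:
        return H ** gamma * T ** (1.0 - gamma)   # Cobb-Douglas limit
    rho = (sigma - 1.0) / sigma
    return (gamma * H ** rho + (1.0 - gamma) * T ** rho) ** (1.0 / rho)


def mrts_closed_form(H, T, gamma, sigma):
    """Marginal rate of technical substitution from Equation 16."""
    return gamma / (1.0 - gamma) * (T / H) ** (1.0 / sigma)


def mrts_numerical(H, T, gamma, sigma, h=1e-5):
    """Ratio of central-difference marginal products, (dX/dH) / (dX/dT)."""
    dX_dH = (ces_composite(H + h, T, gamma, sigma)
             - ces_composite(H - h, T, gamma, sigma)) / (2.0 * h)
    dX_dT = (ces_composite(H, T + h, gamma, sigma)
             - ces_composite(H, T - h, gamma, sigma)) / (2.0 * h)
    return dX_dH / dX_dT


# Illustrative inputs (assumptions, not calibrated values).
H, T, gamma = 10.0, 50.0, 0.6
for sigma in (0.5, 1.0, 2.0):
    closed = mrts_closed_form(H, T, gamma, sigma)
    numeric = mrts_numerical(H, T, gamma, sigma)
    print(f"sigma = {sigma}: closed form = {closed:.4f}, numerical = {numeric:.4f}")
```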

6.3 Complementarity, Neutrality, and Substitutability

Table 2: Elasticity of substitution regimes and their economic interpretation. The critical threshold \sigma^* = 1 follows from the standard CES cross-derivative property: \partial^2 S / \partial H\,\partial T \gtrless 0 as \sigma \lessgtr 1.
| Regime | Condition and Interpretation |
|---|---|
| \sigma \to 0 (Leontief) | H and T are perfect complements: each token requires a corresponding human to be productive. |
| \sigma \in (0,1) | Assistance regime: H and T are gross complements; lower token prices raise labor’s marginal product. |
| \sigma = 1 (Cobb-Douglas) | Transition point: constant factor shares; tokens and labor are allocatively neutral. |
| \sigma > 1 | Automation regime: H and T are gross substitutes; lower token prices reduce labor demand. |
| \sigma \to \infty | Perfect substitutes: human labor is fully replaceable by tokens at a constant MRTS. |

Proposition 3 (Vibe Coding Transition). Applying the standard CES cross-derivative property to the software production context, there exists a critical elasticity \sigma^* = 1 such that: (i) for \sigma < \sigma^* (assistance regime), token adoption raises the marginal product of human labor and wages; (ii) for \sigma > \sigma^* (automation regime), token adoption reduces labor demand and wages; (iii) the dynamic transition from \sigma < 1 to \sigma > 1, driven by rising agent autonomy \mathcal{A} and improved model capability q, constitutes the formal microfoundation of the shift to vibe coding as the dominant software production paradigm (Karpathy, 2025).

6.4 Dynamic Evolution of the Elasticity of Substitution

As agent autonomy \mathcal{A} rises and model quality q improves, the range of tasks delegable to tokens expands. We model the dynamic evolution of \sigma as:¹ \sigma(\mathcal{A}, q) = \sigma_0 + \mu_{\mathcal{A}} \cdot \mathcal{A} + \mu_q \cdot q, \quad \mu_{\mathcal{A}}, \mu_q > 0. \tag{17} The economy begins in the assistance regime (\sigma_0 < 1). The transition date t^* is defined by \sigma(\mathcal{A}(t^*), q(t^*)) = 1; it exists and is unique provided \mathcal{A} and q are strictly increasing in t and \sigma eventually exceeds 1.
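Given paths for \mathcal{A}(t) and q(t), the transition date t^* can be located by bisection on \sigma(t) - 1. The sketch below uses purely hypothetical linear paths and coefficients (\sigma_0 = 0.6, \mu_{\mathcal{A}} = 0.5, \mu_q = 1.0, \mathcal{A}(t) = 0.1t, q(t) = 0.05t); none of these values come from the paper, and they are chosen only so that the crossing is easy to verify by hand.

```python
def sigma_path(t, sigma_0, mu_A, mu_q, autonomy, quality):
    """Elasticity of substitution over time, following Equation 17."""
    return sigma_0 + mu_A * autonomy(t) + mu_q * quality(t)


def transition_date(sigma_of_t, t_low, t_high, tol=1e-9):
    """Bisection for sigma(t*) = 1, assuming sigma is increasing on [t_low, t_high]."""
    if sigma_of_t(t_low) >= 1.0 or sigma_of_t(t_high) <= 1.0:
        raise ValueError("sigma must cross 1 inside [t_low, t_high]")
    while t_high - t_low > tol:
        mid = 0.5 * (t_low + t_high)
        if sigma_of_t(mid) < 1.0:
            t_low = mid
        else:
            t_high = mid
    return 0.5 * (t_low + t_high)


# Hypothetical paths and coefficients, for illustration only.
def sigma_of_t(t):
    return sigma_path(t, sigma_0=0.6, mu_A=0.5, mu_q=1.0,
                      autonomy=lambda s: 0.10 * s,
                      quality=lambda s: 0.05 * s)


t_star = transition_date(sigma_of_t, t_low=0.0, t_high=50.0)
print(f"transition date t* = {t_star:.4f} years")
```

Under these assumed paths \sigma(t) = 0.6 + 0.1t, so the bisection should return a value close to t^* = 4.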


7. Factor Shares and Distributional Implications

7.1 Scope Qualification

The factor share analysis in this section pertains to the software sector specifically. The aggregate labor share has been declining since the 1980s—a trend documented by Karabarbounis & Neiman (2014) and attributed primarily to the decline in the relative price of capital equipment. Our model does not predict the aggregate labor share; it characterizes dynamics within the software production technology for developers using AI tokens. The inverted-U prediction of Proposition 4 is testable with software-sector data and is consistent with the aggregate trend provided that the software sector is currently in the early assistance regime (\sigma < 1), while the aggregate economy has already crossed \sigma = 1 in capital-intensive industries through decades of automation.

7.2 Labor Share Under Token Augmentation

Define the labor share of software output as \pi_H = w \cdot H / S, where under competitive markets w = \partial S / \partial H. From the CES production function (Equations 14 and 15): w = (1-\beta) \cdot \frac{S}{X} \cdot \gamma \cdot \left(\frac{X}{H}\right)^{1/\sigma}, \tag{18} and the labor share is: \pi_H = \frac{w \cdot H}{S} = (1-\beta) \cdot \gamma \cdot \left(\frac{H}{X}\right)^{(\sigma-1)/\sigma}. \tag{19} As token intensity T/H rises, H/X falls. The labor share falls if \sigma > 1 and rises if \sigma < 1, with constant factor shares at \sigma = 1.
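The comparative statics of Equation 19 can be illustrated numerically. In the sketch below the values of H, T, and \gamma are assumptions chosen for illustration; \beta = 0.33 follows the calibration. The labor share is evaluated at a low and a high token intensity for \sigma below, at, and above one.

```python
def ces_composite(H, T, gamma, sigma):
    """CES human-token composite X(H, T) from Equation 14 (sigma > 0)."""
    if abs(sigma - 1.0) < 1e-12:
        return H ** gamma * T ** (1.0 - gamma)   # Cobb-Douglas limit
    rho = (sigma - 1.0) / sigma
    return (gamma * H ** rho + (1.0 - gamma) * T ** rho) ** (1.0 / rho)


def labor_share(H, T, gamma, sigma, beta):
    """Labor share pi_H = (1 - beta) * gamma * (H / X) ** ((sigma - 1) / sigma)."""
    X = ces_composite(H, T, gamma, sigma)
    return (1.0 - beta) * gamma * (H / X) ** ((sigma - 1.0) / sigma)


# Illustrative inputs (assumptions): H fixed, token intensity T/H rising from 2 to 20.
H, gamma, beta = 10.0, 0.6, 0.33
shares = {}
for sigma in (0.5, 1.0, 2.0):
    low = labor_share(H, 20.0, gamma, sigma, beta)
    high = labor_share(H, 200.0, gamma, sigma, beta)
    shares[sigma] = (low, high)
    print(f"sigma = {sigma}: share at T/H = 2 is {low:.3f}, at T/H = 20 is {high:.3f}")
```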

Proposition 4 (Non-Monotone Labor Share in Software). Within the software sector, under the dynamic elasticity specification Equation 17, the labor share \pi_H follows a non-monotone path: (i) during the assistance regime (\sigma < 1), rising token intensity raises the marginal product of human labor, increasing \pi_H; (ii) at the transition (\sigma = 1), the labor share is momentarily stationary; (iii) in the automation regime (\sigma > 1), rising token intensity compresses \pi_H. The labor share traces an inverted-U. This prediction applies to the software sector and is consistent with the observed aggregate labor share decline when the software sector is currently near or below the transition point \sigma^* = 1.

7.3 Wage Dynamics

Even in the automation regime where the software-sector labor share falls, wages need not fall in absolute terms if output rises sufficiently. Substituting S/X = K^{\beta} X^{-\beta} into Equation 18, the wage level is: w(t) = (1-\beta) \cdot \gamma \cdot K(t)^{\beta} \cdot X(t)^{(1/\sigma-\beta)} \cdot H(t)^{-1/\sigma}. \tag{20} Along the BGP, g_w = \beta g_K + (1/\sigma-\beta)g_X - (1/\sigma)n. Whether absolute wages rise depends on whether output growth dominates the labor-diluting effect of rising T/H.

7.4 Within-Economy Inequality

Beyond aggregate factor shares, the model generates predictions for within-economy developer inequality. High-skill developers (high \theta_i) are more complementary with tokens—their creative problem-solving, system architecture, and domain expertise are precisely the inputs that tokens currently augment rather than replace. This suggests a within-economy heterogeneity in the effective \sigma across developer types. Formally, consider two developer types: high-skill developers with \sigma_H < 1 (firmly in the assistance regime) and low-skill developers with \sigma_L closer to or above 1 (more substitutable). As token intensity rises: (a) high-skill developers experience rising labor share and wages; (b) low-skill developers face a falling labor share. The net effect is rising within-sector wage inequality as a function of token intensity. This heterogeneous-agent extension is consistent with the empirical finding in Eloundou et al. (2024) that occupational AI exposure is highest for mid-routine software development tasks. Formalizing the two-type model is a natural next step for empirical work.


8. Endogenous Token Efficiency

8.1 The Token Efficiency Investment Decision

In the baseline model, token productivity \alpha and return parameter \varphi are fixed. We now endogenize efficiency by allowing developers to invest in prompt engineering, context optimization, and retrieval systems that raise the effective productivity of each token consumed. Let E denote the accumulated stock of token efficiency knowledge, and R_E the investment in efficiency improvement. Efficiency accumulates as: \frac{dE}{dt} = \zeta \cdot R_E - \delta_E \cdot E, \tag{21} where \zeta > 0 is the productivity of efficiency investment and \delta_E is knowledge depreciation. The efficiency-adjusted augmentation factor becomes: A(T, H, E) = 1 + \alpha \cdot E \cdot \left(\frac{T}{H}\right)^{\!\varphi}, \tag{22} so that efficiency E multiplicatively scales the productivity of each token consumed (Equation 22), creating a complementarity between token quantity and efficiency investment.
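As an illustration of Equations 21 and 22, the sketch below solves the accumulation equation in closed form and evaluates the augmentation factor. It assumes a constant investment flow R_E, which is a simplification for the example rather than the model's own investment rule. All parameter values and function names are hypothetical.

```python
import math


def efficiency_path(E0, R_E, zeta, delta_E, t):
    """Closed-form solution of dE/dt = zeta * R_E - delta_E * E (constant R_E)."""
    E_ss = zeta * R_E / delta_E          # stationary stock where dE/dt = 0
    return E_ss + (E0 - E_ss) * math.exp(-delta_E * t)


def augmentation(T, H, E, alpha, phi):
    """Equation 22: A(T, H, E) = 1 + alpha * E * (T / H)^phi."""
    return 1.0 + alpha * E * (T / H) ** phi


def marginal_effect_of_E(T, H, alpha, phi):
    """dA/dE = alpha * (T / H)^phi: larger at higher token intensity."""
    return alpha * (T / H) ** phi


if __name__ == "__main__":
    # Hypothetical values: zeta = 0.5, delta_E = 0.1, R_E = 1, E0 = 1.
    for t in (0.0, 10.0, 50.0):
        print(t, round(efficiency_path(1.0, 1.0, 0.5, 0.1, t), 3))
    # Complementarity: the payoff to one more unit of E grows with T/H.
    print(marginal_effect_of_E(4.0, 1.0, 0.5, 0.5),
          marginal_effect_of_E(16.0, 1.0, 0.5, 0.5))
```

With a constant investment flow the stock converges to \zeta R_E / \delta_E, so sustained growth in E requires investment that itself grows, which is what tying R_E to output delivers in the next subsection.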

Remark. Whether the efficiency stock E is private (firm-specific, excludable) or a public good (non-rival, partially spillable across firms) determines the scope for underinvestment externalities. If E is a public good, the model generates a social return to efficiency investment that exceeds the private return, analogous to the knowledge externality in Romer (1990). Under competition, private investment in efficiency is below the social optimum, providing a rationale for industry consortia, open-source prompt libraries, and public support for AI safety research that improves model reliability (a form of efficiency investment). We treat E as a private asset in the baseline analysis; incorporating the public-good case is a natural extension.

8.2 The Semi-Endogenous Efficiency Path

Following Jones (1995), if the productivity of efficiency investment is constant (d\zeta/dt = 0), efficiency grows semi-endogenously at a rate determined by research effort. Let s_E be the fraction of output invested in efficiency improvement (R_E = s_E S). On the BGP, dividing the accumulation equation (Equation 21) by E gives: g_E^* = \zeta \cdot s_E \cdot s^* / E^* - \delta_E, \tag{23} where E^* is the BGP efficiency level. In the high-intensity limit of Assumption A1, A \approx \alpha E (T/H)^{\varphi}, so log-differentiating with respect to time gives: g_A = g_E + \varphi(g_T - n). From the BGP condition g_S = g_A + n (the same derivation as in Proposition 1, with \alpha E in place of the constant \alpha): g_S - n = g_A = g_E^* + \varphi(g_T - n). Therefore: \boxed{g_s^{**} = g_E^* + \varphi \cdot (g_T - n).} \tag{24}
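The role of the high-intensity limit in the step g_A = g_E + \varphi(g_T - n) can be seen numerically. The sketch below is a rough check, not part of the model: it evaluates d\ln A/dt by finite differences for the exact A = 1 + \alpha E (T/H)^{\varphi}, once at high and once at low token intensity. The parameter values (\varphi = 0.4, g_E = 0.04, g_T - n = 0.47, \alpha = 1) and all names are hypothetical.

```python
import math


def log_A(t, alpha, E0, g_E, tau0, g_tau, phi):
    """ln A(t) with E = E0 * exp(g_E t) and T/H = tau0 * exp(g_tau t),
    where g_tau stands for g_T - n."""
    E = E0 * math.exp(g_E * t)
    tau = tau0 * math.exp(g_tau * t)
    return math.log(1.0 + alpha * E * tau ** phi)


def growth_of_A(t, h=1e-4, **params):
    """Central finite-difference estimate of g_A = d ln A / dt."""
    return (log_A(t + h, **params) - log_A(t - h, **params)) / (2.0 * h)


if __name__ == "__main__":
    base = dict(alpha=1.0, E0=1.0, g_E=0.04, g_tau=0.47, phi=0.4)
    limit = base["g_E"] + base["phi"] * base["g_tau"]
    high = growth_of_A(0.0, tau0=1e9, **base)    # alpha * E * (T/H)^phi >> 1
    low = growth_of_A(0.0, tau0=1e-6, **base)    # alpha * E * (T/H)^phi << 1
    print("limit formula:", limit)
    print("high intensity:", high)
    print("low intensity:", low)
```

At high intensity the finite-difference rate is close to the limit formula; at low intensity A is near 1 and grows far more slowly, which is why the result is stated under Assumption A1.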

Proposition 5 (Endogenous Efficiency Growth). In the model with endogenous token efficiency, the balanced growth path growth rate is: g_s^{**} = g_E^* + \varphi(g_T - n), where g_E^* = \zeta s_E s^* / E^* - \delta_E is the endogenous efficiency growth rate. This decomposes the BGP growth rate into two independent engines: (a) the token quantity engine \varphi(g_T - n), driven by the frontier growth rate; and (b) the token efficiency engine g_E^*, driven by investment in prompt engineering, retrieval optimization, and context management. Optimal investment in token efficiency s_E^* satisfies the modified golden rule: \text{MPE} = \delta_E + r, where \text{MPE} = \partial g_s^{**}/\partial g_E^* \cdot \partial g_E^*/\partial s_E = \zeta s^*/E^* and r is the discount rate. Economies that underinvest in efficiency (by neglecting prompt engineering, RAG infrastructure, or model fine-tuning) operate below their BGP potential even with high token expenditure. Notably, the efficiency engine contributes one-for-one to the BGP growth rate (\partial g_s^{**}/\partial g_E^* = 1, not \varphi), so a one-point increase in g_E^* raises BGP growth by more than a one-point increase in g_T whenever \varphi < 1.

The full BGP growth rate Equation 24 exceeds the baseline Equation 9 by g_E^*: endogenous efficiency adds a second, independently valuable engine of productivity growth. Under the calibration of Table 1, with g_E^* = 0.04 (modest efficiency improvements of 4 percent per year), the BGP growth rate rises from 18.8 percent to 22.8 percent per year.
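The arithmetic behind this comparison can be sketched in a few lines. Table 1 is not reproduced in this section, so the values \varphi = 0.4, g_T = 0.50, and n = 0.03 below are hypothetical, chosen only because they reproduce the stated 18.8 percent baseline. Only g_E^* = 0.04 is taken from the text.

```python
def bgp_growth(phi, g_T, n, g_E=0.0):
    """Equation 24: g_s** = g_E* + phi * (g_T - n); g_E = 0 gives the baseline."""
    return g_E + phi * (g_T - n)


if __name__ == "__main__":
    phi, g_T, n = 0.4, 0.50, 0.03        # hypothetical, not Table 1
    baseline = bgp_growth(phi, g_T, n)
    augmented = bgp_growth(phi, g_T, n, g_E=0.04)
    print("baseline:", round(baseline, 3))
    print("with efficiency engine:", round(augmented, 3))

    # One-for-one versus phi-for-one: raise each driver by one point.
    gain_from_g_E = bgp_growth(phi, g_T, n, g_E=0.05) - augmented
    gain_from_g_T = bgp_growth(phi, g_T + 0.01, n, g_E=0.04) - augmented
    print("gain from +0.01 in g_E:", round(gain_from_g_E, 4))
    print("gain from +0.01 in g_T:", round(gain_from_g_T, 4))
```

The last two lines illustrate the final claim of Proposition 5: a one-point rise in g_E^* passes through in full, while a one-point rise in g_T is scaled by \varphi.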


9. Policy Implications

9.1 AI Infrastructure as Growth Policy

Proposition 1 identifies frontier growth g_T as the primary driver of long-run software productivity growth. Public policy that accelerates the AI inference frontier—through subsidized compute access for research, government procurement of AI services, or direct public investment in AI infrastructure—raises g_T and thus g_s^*, permanently. This contrasts with token price subsidies (Corollary 1), which generate level effects only.

The Token Divide Theorem (Proposition 2) reinforces urgency around adoption timing: delays in AI adoption create permanent productivity disadvantages proportional to \varphi g_T(t_j - t_i). Under our calibration, a three-year delay in adoption creates a permanent gap of approximately 58 percent. This implies that under certainty about the continued productivity of token technology, early adoption dominates waiting strategies under any reasonable social discount rate. Under uncertainty about the path of g_T and \sigma—including the possibility that token productivity stabilizes or alternative paradigms emerge—a precautionary option-value argument may favor delay; analyzing this trade-off in a dynamic stochastic framework is a natural extension.
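Because the Token Divide expression \varphi g_T (t_j - t_i) is a difference in log productivity, it helps to keep log points and level ratios apart when quoting gaps. The sketch below shows the conversion. The function names are hypothetical, and the annual term \varphi g_T is backed out from the three-year figure quoted above rather than taken from the calibration table.

```python
import math


def log_gap(annual_term, delay_years):
    """Permanent gap in log productivity: (phi * g_T) * (t_j - t_i)."""
    return annual_term * delay_years


def level_ratio(gap_in_logs):
    """Early-adopter productivity relative to late-adopter productivity."""
    return math.exp(gap_in_logs)


if __name__ == "__main__":
    annual_term = 0.58 / 3.0             # phi * g_T implied by a 0.58 gap over 3 years
    for delay in (1, 3, 5):
        gap = log_gap(annual_term, delay)
        print(delay, round(gap, 3), round(level_ratio(gap), 3))
```

A gap of 0.58 in logs corresponds to a level ratio of exp(0.58), which is larger than 1.58, so the reported magnitude depends on which convention is used.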

9.2 Managing the Vibe Coding Transition

Proposition 3 identifies \sigma = 1 as the critical juncture for labor market outcomes. Three kinds of policy can shift the distributional consequences of the vibe coding transition toward broad-based wage growth: investments in human capital complementary to AI (creative, managerial, architectural, and interpersonal skills that tokens cannot replicate), retraining programs for developers in highly substitutable roles, and portable social insurance for workers exposed to automation risk. The within-economy inequality result (Section 7.4) suggests that the distributional stakes are highest for routine software development occupations currently near \sigma^* = 1.

9.3 Token Efficiency Investment as R&D

Proposition 5 establishes that efficiency investment is a distinct and high-return growth driver: its contribution to the BGP growth rate is one-for-one (\partial g_s^{**}/\partial g_E^* = 1), compared to the discounted return to raw token quantity (\partial g_s^*/\partial g_T = \varphi < 1). National growth accounting frameworks should track token efficiency investment (prompt engineering, RAG infrastructure, model fine-tuning) as a component of the R&D stock, not merely as an operating cost. Firms and governments that treat AI as a procurement budget item rather than a research-and-learning investment leave the efficiency engine unrealized.


10. Conclusion

This paper develops a macroeconomic growth theory of AI inference tokens as labor-augmenting technology in the software sector. Five formal results characterize the token-augmented economy. The BGP result establishes g_s^* = \varphi(g_T - n), with convergence speed \lambda = (1-\beta)[\delta_K + (1-\varphi)n + \varphi g_T]. Under a plausible calibration, this implies per-developer software productivity growth of approximately 15–20 percent per year, an order of magnitude above the pre-AI trend, rising to about 23 percent once the efficiency engine is included. The Token Divide Theorem shows that a three-year adoption delay creates a permanent 58 percent productivity gap. The Vibe Coding Transition identifies \sigma^* = 1 as the dividing line between AI complementarity and substitutability. The non-monotone labor share result predicts an inverted-U path within the software sector. The endogenous efficiency extension shows that the efficiency engine, with its one-for-one contribution to BGP growth, is more valuable per unit investment than raw token quantity.

Several extensions are natural. A fully dynamic model with optimizing households choosing between consumption and token investment would characterize the intertemporal allocation of growth gains. A multi-sector model would allow token technology to diffuse at heterogeneous rates, generating structural reallocation effects. Incorporating uncertainty about g_T and \sigma trajectories would yield precautionary technology investment results and the option value of early adoption. A two-type heterogeneous-developer model, following the sketch in Section 7.4, would generate quantitative predictions for within-economy wage inequality. Each extension builds directly on the framework developed here.

The broader contribution of this paper is to place AI inference tokens firmly within the Solow–Swan growth tradition. Tokens are not merely a cost item on a developer’s balance sheet: they are a technology that augments labor, drives balanced growth, divides the productivity distribution, and raises the same fundamental questions of growth theory and labor economics that earlier transformative general-purpose technologies raised. The sooner our analytical frameworks reflect this, the better positioned we will be to understand, and to shape, the growth implications of the AI era.


Appendix: Proofs of Propositions

Proof of Proposition 1. Under Assumption A1 (\alpha(T/H)^{\varphi} \gg 1), A(T,H) \approx \alpha(T/H)^{\varphi} and the production function becomes: S = K^{\beta}\bigl[\alpha(T/H)^{\varphi} H\bigr]^{1-\beta} = K^{\beta}\,\alpha^{1-\beta}\,T^{\varphi(1-\beta)}\,H^{(1-\varphi)(1-\beta)}. Taking growth rates and using Assumption A2 (g_T exogenous): g_S = \beta g_K + \varphi(1-\beta) g_T + (1-\varphi)(1-\beta) n. On the BGP, g_K = g_S. Solving: (1-\beta)g_S = \varphi(1-\beta) g_T + (1-\varphi)(1-\beta) n \implies g_S = n + \varphi(g_T - n). Hence g_s^* = g_S - n = \varphi(g_T - n). Existence follows from the explicit formula; uniqueness from the linearity of the BGP condition in g_s^* given exogenous g_T.

For the convergence rate, define effective output \hat{s}(t) = s(t)/e^{g_s^* t} and effective capital per normalized labor \hat{k} = K/[e^{(n+g_s^*)t}]. The dynamics are \dot{\hat{k}} = s_K \hat{s}(\hat{k}) - (\delta_K + n + g_s^*)\hat{k}. With \hat{s}(\hat{k}) \propto \hat{k}^{\beta} (from the production function), linearizing around \hat{k}^*: \frac{d\ln\hat{k}}{dt} \approx -(1-\beta)(\delta_K + n + g_s^*)\ln(\hat{k}/\hat{k}^*) \implies \lambda = (1-\beta)[\delta_K + n + g_s^*]. Substituting g_s^* = \varphi(g_T - n) yields \lambda = (1-\beta)[\delta_K + (1-\varphi)n + \varphi g_T]. \square
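The BGP growth rate and the convergence speed can be illustrated with a small simulation. This is a minimal sketch under the Assumption A1 limit with \alpha = 1: it integrates dK/dt = s_K S - \delta_K K by Euler steps while T and H grow exogenously, then measures the growth rate of s = S/H. The parameter values (\beta = 0.3, \varphi = 0.4, g_T = 0.5, n = 0.03, \delta_K = 0.1, s_K = 0.2) and the function names are hypothetical.

```python
import math


def convergence_speed(beta, delta_K, phi, g_T, n):
    """lambda = (1 - beta) * [delta_K + (1 - phi) * n + phi * g_T]."""
    return (1.0 - beta) * (delta_K + (1.0 - phi) * n + phi * g_T)


def simulate_growth(beta, phi, g_T, n, delta_K, s_K, years=60.0, dt=0.01):
    """Euler-integrate capital and return the final log growth rate of
    per-developer output s = S / H (A1 limit, alpha = 1)."""
    def output(K, t):
        T, H = math.exp(g_T * t), math.exp(n * t)
        return K**beta * ((T / H) ** phi * H) ** (1.0 - beta)

    K, t = 1.0, 0.0
    for _ in range(int(round(years / dt))):
        K += dt * (s_K * output(K, t) - delta_K * K)
        t += dt
    # Take one further step to measure the growth rate of s at the end.
    s_now = output(K, t) / math.exp(n * t)
    K_next = K + dt * (s_K * output(K, t) - delta_K * K)
    s_next = output(K_next, t + dt) / math.exp(n * (t + dt))
    return math.log(s_next / s_now) / dt


if __name__ == "__main__":
    params = dict(beta=0.3, phi=0.4, g_T=0.5, n=0.03, delta_K=0.1)
    g_sim = simulate_growth(s_K=0.2, **params)
    g_formula = params["phi"] * (params["g_T"] - params["n"])
    lam = convergence_speed(**params)
    print("simulated g_s:", g_sim, "formula:", g_formula)
    print("lambda:", lam, "half-life in years:", math.log(2.0) / lam)
```

The simulated rate settles at \varphi(g_T - n) regardless of the saving rate s_K, which affects only the level of the path, and the half-life \ln 2 / \lambda gives a sense of how quickly deviations from the BGP die out.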

Proof of Proposition 2. Let s_i(t) = s_i^*(t) + \varepsilon_i(t), where s_i^*(t) = C_i e^{g_s^* t} is the BGP level path for developer i and \varepsilon_i(t) = \varepsilon_i(0) e^{-\lambda t} \to 0. The BGP level constant C_i is determined by the initial conditions at adoption date t_i. Under Assumption A1, C_i is proportional to the detrended token level raised to the power \varphi: C_i \propto [T_i(t) e^{-g_T t}]^{\varphi}. Under Assumption A2 with both developers otherwise identical, token use starts from a common entry level F_0 at adoption and grows at rate g_T thereafter, T_i(t) = F_0 e^{g_T (t - t_i)} for t \ge t_i, so at any common date t \ge t_j: \ln C_i - \ln C_j = \varphi[\ln T_i(t) - \ln T_j(t)] = \varphi \cdot g_T \cdot (t_j - t_i). The permanent productivity gap is therefore: \lim_{t \to \infty}[\ln s_i^*(t) - \ln s_j^*(t)] = \ln C_i - \ln C_j = \varphi \cdot g_T \cdot (t_j - t_i) > 0. This is strictly positive for t_j > t_i and increases with \varphi, g_T, and the adoption delay. The gap does not shrink over time because both level paths grow at the same rate g_s^*. \square

Proof of Proposition 3. From the CES production function, w = \partial S/\partial H and r_T = \partial S/\partial T are the competitive factor prices. The cross-partial is \partial^2 S/\partial H\,\partial T = (1-\beta)(1/\sigma - 1)(\cdots), where (\cdots) collects strictly positive terms. It is positive when \sigma < 1 (tokens and labor are complements: more tokens raise the marginal product of labor), zero at \sigma = 1, and negative when \sigma > 1 (tokens and labor are substitutes: more tokens reduce the marginal product of labor). Labor demand H^* = \arg\max\{S - wH - r_T T\} follows the same sign pattern with respect to T. The transition date t^* satisfying \sigma(\mathcal{A}(t^*), q(t^*)) = 1 exists and is unique under the continuity and strict monotonicity of \sigma(\cdot). \square

Proof of Proposition 4. From Equation 19: \pi_H = (1-\beta)\gamma(H/X)^{(\sigma-1)/\sigma}. As token intensity T/H rises, H/X falls (more tokens dilute the human fraction of composite input). The sign of d\pi_H / d(T/H): \text{sign}\!\left[\frac{d\pi_H}{d(T/H)}\right] = \text{sign}\!\left[\frac{1-\sigma}{\sigma}\right], which is positive for \sigma < 1 and negative for \sigma > 1. Under the dynamic specification Equation 17, \sigma begins below 1, crosses 1 at t^*, and rises above 1 thereafter. The labor share \pi_H therefore rises for t < t^*, reaches a maximum near t^*, and falls for t > t^*: the inverted-U path. \square

Proof of Proposition 5. With A(T,H,E) = 1 + \alpha E(T/H)^{\varphi}, under Assumption A1, A \approx \alpha E(T/H)^{\varphi}. Taking logs and growth rates: g_A = g_E + \varphi(g_T - n). From the BGP derivation (the same as in Proposition 1, with \alpha E in place of the constant \alpha), g_S = g_A + n, so: g_s^{**} = g_S - n = g_A = g_E^* + \varphi(g_T - n). The efficiency growth rate on the BGP satisfies g_E^* = \zeta s_E s^*/E^* - \delta_E from Equation 21. The first-order condition for optimal s_E equates the marginal present value of efficiency investment to its user cost: \zeta \cdot s^*/E^* = (\delta_E + r), giving s_E^* = (\delta_E + r) E^* / (\zeta s^*). The marginal return to efficiency growth (\partial g_s^{**}/\partial g_E^* = 1) exceeds the token quantity return (\partial g_s^*/\partial g_T = \varphi < 1), confirming the priority of efficiency investment. \square


References

Acemoglu, D. (2002). Directed technical change. Review of Economic Studies, 69(4), 781–809.
Acemoglu, D. (2024). The simple macroeconomics of AI. Economic Policy, 39(100), 1–29.
Acemoglu, D., & Restrepo, P. (2018). The race between man and machine: Implications of technology for growth, factor shares, and employment. American Economic Review, 108(6), 1488–1542.
Acemoglu, D., & Restrepo, P. (2019). Automation and new tasks: How technology displaces and reinstates labor. Journal of Economic Perspectives, 33(2), 3–30.
Acemoglu, D., & Restrepo, P. (2020). Robots and jobs: Evidence from US labor markets. Journal of Political Economy, 128(6), 2188–2244.
Epoch AI. (2024a). AI model API pricing trends, 2023–2024. https://epochai.org/data/llm-pricing
Epoch AI. (2024b). Trends in AI inference compute usage. https://epochai.org
Brynjolfsson, E., Rock, D., & Syverson, C. (2019). Artificial intelligence and the modern productivity paradox: A clash of expectations and statistics. In A. Agrawal, J. Gans, & A. Goldfarb (Eds.), The economics of artificial intelligence: An agenda. University of Chicago Press.
Brynjolfsson, E., Rock, D., & Syverson, C. (2021). The productivity J-curve: How intangibles complement general purpose technologies. American Economic Journal: Macroeconomics, 13(1), 333–372.
Eloundou, T., Manning, S., Mishkin, P., & Rock, D. (2024). GPTs are GPTs: Labor market impact potential of large language models. Science, 384(6702), 1306–1310.
Gartner. (2024). Hype cycle for artificial intelligence, 2024.
GitHub. (2023). Quantifying GitHub Copilot’s impact in the enterprise.
GitHub. (2024). The state of the octoverse: AI and developer productivity.
Jones, C. I. (1995). R&D-based models of economic growth. Journal of Political Economy, 103(4), 759–784.
Karabarbounis, L., & Neiman, B. (2014). The global decline of the labor share. Quarterly Journal of Economics, 129(1), 61–103.
Karpathy, A. (2025). There’s a new kind of coding I call “vibe coding.” https://x.com/karpathy/status/1886192184808149094
Lewis, P., Perez, E., Piktus, A., et al. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems (NeurIPS 2020), 33, 9459–9474.
Nordhaus, W. D. (2021). Are we approaching an economic singularity? Information technology and the future of economic growth. American Economic Journal: Macroeconomics, 13(1), 299–332.
OECD. (2024). OECD AI policy observatory: Measuring AI adoption and inequality across countries. OECD Publishing.
OpenAI. (2024). OpenAI usage and impact report, 2024.
Peng, S., Kalliamvakou, E., Cihon, P., & Demirer, M. (2023). The impact of AI on developer productivity: Evidence from GitHub Copilot (No. 31019). National Bureau of Economic Research.
Romer, P. M. (1990). Endogenous technological change. Journal of Political Economy, 98(5), S71–S102.
Solow, R. M. (1956). A contribution to the theory of economic growth. Quarterly Journal of Economics, 70(1), 65–94.
Zeira, J. (1998). Workers, machines, and economic growth. Quarterly Journal of Economics, 113(4), 1091–1117.

Footnotes

  1. This parametric specification is deliberately simple and is adopted for analytical tractability. It captures the essential feature of a monotone increasing path from the assistance to the automation regime. Qualitative results (existence of a transition date t^*, the inverted-U labor share path, and the policy implications) are robust to alternative specifications (e.g., logistic \sigma(\mathcal{A}) = \bar{\sigma}/(1 + e^{-\mu(\mathcal{A}-\mathcal{A}^*)})) provided \sigma(\cdot) is continuous and strictly increasing with a unique crossing of \sigma = 1.