The Inference Economy: Token Consumption, Software Productivity, and Jevons Paradox Dynamics in Artificial Intelligence

Inference Economics

Proposes and validates three novel economic relationships: the Token Production Function, the Token Kuznets Curve, and the Jevons Paradox of AI Tokens. Using a calibrated Monte Carlo simulation with 100,000 developer-year observations, the paper demonstrates that standard panel estimators recover the imposed parameters, providing a proof-of-concept framework for the economics of AI inference.

Author: Ibrahim Niankara

Affiliation: Al Ain University, College of Business; Brass Digital Lab, Abu Dhabi, UAE

Published: 4 May 2026

Working Paper: Content reflects research in progress and has not yet undergone formal peer review.

Abstract

This paper proposes and validates, through a calibrated Monte Carlo simulation, three novel economic relationships constituting the foundations of what we call Inference Economics: the systematic economic analysis of AI inference capacity as a productive input and a market good. The first is a Token Production Function of Software, in which AI token consumption enters as a distinct factor of production alongside developer effort and tooling capital; we formalize a Cobb-Douglas specification and confirm via simulation that standard panel estimators recover the imposed output elasticity (\beta = 0.60 in the DGP, \hat{\beta} \approx 0.61 estimated). The second is a Token Kuznets Curve (TKC): token intensity per developer follows an inverted-U trajectory in AI ecosystem maturity, rising during early adoption and declining as token-efficient workflows emerge; a logistic adoption model with developer-cohort heterogeneity generates genuine cross-sectional identification, and the DGP-consistent turning point is M^{*} \approx 0.67. The third is a Jevons Paradox of AI Tokens: efficiency improvements increase rather than decrease aggregate token consumption through a demand-expansion channel; the simulation recovers an efficiency elasticity of \hat{\eta}_{E} \approx 1.11 > 1, satisfying the formal strong-paradox condition. All three relationships are established on a synthetic panel of 100,000 developer-year observations generated by a fully specified DGP; the estimation exercise constitutes a proof-of-concept demonstrating internal consistency and estimability, not real-world empirical evidence. External validity requires real developer-level token consumption data, and a concrete strategy for such validation is outlined. Together these contributions provide the first systematic theoretical framework for the economics of AI inference.

JEL Classification: O33, L86, D24, Q41.
Keywords: token production function, Token Kuznets Curve, Jevons Paradox, AI economics, inference economy, software productivity, panel data.

Data and code: All data are synthetically generated. Full R simulation and estimation code is provided in Appendix A and is fully reproducible with standard CRAN packages.

1. Introduction

The past decade has witnessed a fundamental shift in the economics of software production. Artificial intelligence, and large language models in particular, have introduced a new intermediate input — AI inference tokens — whose consumption mediates the relationship between developer effort and software output. The token is the atomic unit of AI computation: every query to a language model, every line of AI-generated code, every automated task in an agentic workflow consumes tokens at a price set by platform providers and declining exponentially as model efficiency and hardware capability improve.

Despite the economic significance of this new input market, no systematic theoretical or empirical framework exists for analyzing it. Existing production function literature treats IT capital as a broad input category (Bresnahan et al., 2002; Hitt & Brynjolfsson, 1996) without the granularity needed to study per-token pricing dynamics; the energy rebound literature (Gillingham et al., 2016; Sorrell, 2007) identifies the Jevons mechanism but has not been applied to AI inference; and the Environmental Kuznets Curve literature (Grossman & Krueger, 1995) has not been adapted to ecosystem-maturity dynamics in technology adoption. This paper addresses these gaps.

We propose three novel theoretical relationships governing the inference economy and validate their estimability through a calibrated simulation study. We are explicit from the outset about what this paper establishes and what it does not. The exercise is a proof-of-concept Monte Carlo validation: we specify a data-generating process (DGP) that encodes the hypothesized relationships, generate a large synthetic panel, and confirm that standard panel estimators recover the imposed parameters with good precision. This demonstrates that the theoretical framework is internally consistent and amenable to identification with real data once token-level consumption logs become available. It does not constitute real-world evidence for the three hypotheses, and we do not claim otherwise.

Our first contribution is the Token Production Function of Software. Modelling software output as a Cobb-Douglas function of developer effort H, token consumption T, and a composite data/tooling input D, we formalize three structural properties: positive marginal product of tokens, diminishing returns for \beta < 1, and the existence of an interior optimum. The simulation recovers \hat{\beta} \approx 0.61 — close to the DGP-imposed value of 0.60 — comparable in magnitude to classical capital-elasticity estimates and suggesting that AI tokens are a genuinely important factor of production whose measurement is economically consequential.

Our second contribution is the Token Kuznets Curve (TKC). We adapt the Environmental Kuznets Curve hypothesis (Grossman & Krueger, 1991, 1995) to AI ecosystem maturity: token intensity first rises as developers scale context windows and agentic complexity, then falls as token-efficient workflows mature. A critical advance in this paper over a preliminary specification is the operationalization of ecosystem maturity through a logistic adoption model with developer-cohort heterogeneity. This generates genuine cross-sectional variation in maturity across developers observed at the same calendar time, resolving the collinearity between maturity and time trends that would otherwise prevent identification.

Our third contribution formalizes a Jevons Paradox of AI Tokens. Following Jevons (1865), we establish the formal condition \eta_{E} > 1 for a strong paradox in which efficiency gains increase aggregate compute demand, and document that the simulation recovers \hat{\eta}_{E} \approx 1.11. A platform-based instrumental variable, which exploits pre-determined cross-sectional variation in API pricing across developer platforms, addresses the endogeneity of token prices in the demand equation.

The paper proceeds as follows. Section 2 reviews the literature. Section 3 develops the theoretical framework. Section 4 describes the simulation design and identification strategy. Section 5 presents simulation results, robustness checks, and heterogeneous-effects analysis. Section 6 discusses policy implications. Section 7 concludes and outlines a real-data validation agenda.

2. Literature Review

2.1 Production Function Estimation and IT Productivity

The classical production function literature treats capital and labor as the primary inputs to output. Cobb & Douglas (1928) established the log-linear relationship between inputs and output that remains the workhorse specification in applied work. Olley & Pakes (1996) and Levinsohn & Petrin (2003) addressed simultaneity and selection bias inherent in production function estimation, and Ackerberg et al. (2015) resolved residual identification concerns through a refined control-function approach.

The productivity effects of information technology have been extensively studied. Hitt & Brynjolfsson (1996) documented firm-level productivity gains from IT investment; Bresnahan et al. (2002) identified complementarities between IT, organizational change, and human capital; and Brynjolfsson et al. (2019) analyze the AI productivity paradox, arguing that J-curve adoption delays compress near-term measured productivity gains. Most recently, Noy & Zhang (2023) document significant output and quality improvements from LLM-assisted writing in a randomized experiment, providing direct experimental evidence that AI assistance is a productive input. We extend this literature by modelling token consumption as a separately measurable factor of production with its own price and efficiency dynamics — a granularity that becomes feasible once platform-level API data are available.

2.2 The Environmental Kuznets Curve

The EKC hypothesis, formalized by Grossman & Krueger (1991, 1995) and named after Kuznets (1955), posits that environmental degradation first worsens with development before eventually improving. The inverted-U relationship has been estimated for a variety of pollutants (Copeland & Taylor, 2004; Stern, 2004), with ongoing debate about turning-point robustness and the mechanisms driving the eventual decline. A critical feature of credible EKC identification is cross-sectional variation in the driving variable: income varies across countries at the same point in time, providing identification beyond a common time trend. Our TKC specification emulates this feature by constructing maturity at the developer level, exploiting cohort-based heterogeneity in adoption timing.

2.3 The Jevons Paradox and Energy Rebound Effects

Jevons (1865) observed that thermodynamic efficiency improvements in steam engines paradoxically increased coal consumption in Victorian Britain. This insight has been extensively studied in energy economics (Gillingham et al., 2016; Saunders, 2008; Sorrell, 2007; Sorrell & Dimitropoulos, 2008). In the AI token context, the direct rebound operates through a cost-per-output channel: as tokens become cheaper per unit of AI-assisted work, developers build more ambitious applications and process larger contexts. The indirect rebound operates through income effects as AI-driven productivity gains raise overall demand for AI services. We formalise both channels within a structural demand equation and establish the condition \eta_{E} > 1 as the dividing line between a partial and a strong paradox.

2.4 The Economics of Artificial Intelligence

Agrawal et al. (2018) characterise AI as a reduction in prediction costs, an input into virtually every economic activity. Brynjolfsson et al. (2019) analyse the AI productivity paradox. Autor et al. (2003) and Acemoglu & Restrepo (2020) examine labour-market consequences of automation. Eloundou et al. (2024) provide evidence on LLM occupational exposure. Jones (1995) and Romer (1990) model idea production economics; our token production function parallels their treatment of intermediate knowledge inputs. Nordhaus (2021) and Strubell et al. (2019) analyse compute scaling and the environmental implications of large model training.

3. Theoretical Framework

3.1 The Token Production Function of Software

Consider a representative software developer i at time t who produces software output S_{it} combining three inputs: human effort H_{it}, token consumption T_{it}, and a composite data/tooling input D_{it}. The Cobb-Douglas production technology is:

S_{it} \;=\; A_{it} \cdot H_{it}^{\,\alpha} \cdot T_{it}^{\,\beta} \cdot D_{it}^{\,\gamma} \cdot \exp(\varepsilon_{it}), \qquad (1)

where A_{it} is total factor productivity (TFP) and \alpha,\beta,\gamma>0 are output elasticities. Parameterising TFP as \ln A_{it} = \mu + \delta_{i} + \tau_{t}, with \delta_{i} a developer fixed effect and \tau_{t} a year effect, and taking logarithms yields:

\ln S_{it} \;=\; \alpha_{0} + \underbrace{\beta}_{\beta_{1}}\ln T_{it} + \underbrace{\alpha}_{\beta_{2}}\ln \mathrm{Exp}_{it} + \underbrace{\gamma}_{\beta_{3}}\mathrm{Agent}_{it} + \delta_{i} + \tau_{t} + \varepsilon_{it}. \qquad (2)

The three structural elasticities (\beta,\alpha,\gamma) in Equation (1) correspond one-to-one to the regression coefficients (\beta_{1},\beta_{2},\beta_{3}) in Equation (2). Developer experience \mathrm{Exp}_{it} proxies for human effort H_{it}; AI agent autonomy \mathrm{Agent}_{it} \in [0,1] captures the intensity of the data/tooling composite D_{it}, so \beta_{3} estimates the data/tooling elasticity \gamma. Developer fixed effects absorb the time-invariant component of D_{it} not captured by `agent_autonomy` (such as stable firm-level tooling infrastructure).
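
The step from Equation (1) to Equation (2) can be written out line by line. The sketch below uses only the paper's own symbols; reading the intercept \alpha_{0} as \mu, and treating the last line as an approximation because the proxies replace H_{it} and D_{it} (with agent autonomy entering in levels), are interpretive assumptions rather than statements from the text.

```latex
\begin{aligned}
\ln S_{it} &= \ln A_{it} + \alpha \ln H_{it} + \beta \ln T_{it} + \gamma \ln D_{it} + \varepsilon_{it} \\
           &= \mu + \delta_{i} + \tau_{t} + \alpha \ln H_{it} + \beta \ln T_{it} + \gamma \ln D_{it} + \varepsilon_{it} \\
           &\approx \alpha_{0} + \beta \ln T_{it} + \alpha \ln \mathrm{Exp}_{it} + \gamma\,\mathrm{Agent}_{it} + \delta_{i} + \tau_{t} + \varepsilon_{it}
\end{aligned}
```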

Proposition 1 (Returns to Tokens)

Under Equation (1) with \beta>0: (i) tokens exhibit positive marginal product, \partial S/\partial T = \beta(S/T)>0; (ii) returns are diminishing when \beta<1, since \partial^{2}S/\partial T^{2} = \beta(\beta-1)S/T^{2}<0; (iii) the simulation estimate \hat{\beta}_{1}\approx0.61<1 satisfies the diminishing-returns condition ex post, consistent with the DGP parameter \beta=0.60.
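
Parts (i) and (ii) of the proposition can be checked numerically. The following is a minimal Python sketch, not the paper's Appendix A code: it sets H, D, and A to 1 and the error term to zero (illustrative assumptions), uses the DGP elasticities from Section 4.5, and compares the analytic marginal product with a finite-difference approximation.

```python
def software_output(T, H=1.0, D=1.0, A=1.0, alpha=0.30, beta=0.60, gamma=0.40):
    """Cobb-Douglas output from Equation (1) with the error term set to zero."""
    return A * H**alpha * T**beta * D**gamma


def marginal_product(T, beta=0.60):
    """Analytic marginal product of tokens: dS/dT = beta * S / T."""
    return beta * software_output(T, beta=beta) / T


# Central finite difference at a hypothetical token level T0.
T0, h = 80.0, 1e-4
numeric_mp = (software_output(T0 + h) - software_output(T0 - h)) / (2 * h)
analytic_mp = marginal_product(T0)

# Diminishing returns: with beta < 1 the marginal product falls as T rises.
mp_low, mp_high = marginal_product(40.0), marginal_product(160.0)

print(f"numeric={numeric_mp:.6f} analytic={analytic_mp:.6f}")
print("marginal product falls with T:", mp_low > mp_high)
```

Changing `beta` to a value above 1 in both functions reverses the second comparison, which is the case the proposition excludes.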

Hypothesis 1

H_{1}: \beta_{1}>0. Token consumption is a productive input to software output.

3.2 The Token Kuznets Curve

Let M_{it}\in[0,1] index developer i’s AI ecosystem maturity at time t, with M=0 representing nascent adoption and M=1 full maturity. Unlike a simple time trend, maturity is developer-specific: early-adopter developers reach any given maturity level earlier than late-adopter developers. Token intensity follows:

\tau(M) \;=\; \exp\!\bigl(\varphi_{0} + \varphi_{1}M + \varphi_{2}M^{2}\bigr), \qquad (3)

where \varphi_{1}>0 captures adoption-driven scaling and \varphi_{2}<0 captures efficiency-driven crowding-out. A TKC exists if and only if M^{*} = -\varphi_{1}/(2\varphi_{2})\in(0,1). The estimating equation is:

T_{it} \;=\; \alpha_{0} + \varphi_{1}M_{it} + \varphi_{2}M_{it}^{2} + X_{it}'\zeta + \delta_{i} + \tau_{t} + \varepsilon_{it}, \qquad (4)

where X_{it} includes developer-level controls and both developer and year fixed effects are included. Because M_{it} varies cross-sectionally (cohort heterogeneity) and non-linearly over time (logistic S-curve), year fixed effects do not absorb the maturity variation, resolving the collinearity problem that would arise if maturity were a linear rescaling of calendar time.

Proposition 2 (Token Kuznets Curve)

If \varphi_{1}>0 and \varphi_{2}<0, token intensity is single-peaked at M^{*}=-\varphi_{1}/(2\varphi_{2})>0, increasing for M<M^{*} and decreasing for M>M^{*}. The peak lies in the interior, M^{*}\in(0,1), if and only if additionally \varphi_{1}<-2\varphi_{2}.
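
The turning point follows from the first-order condition on the exponent of Equation (3); because the exponential is monotone, the peak of \ln\tau(M) is also the peak of \tau(M). A short derivation, with the DGP values from Section 4.5 substituted in the last line as a worked check:

```latex
\begin{aligned}
\frac{d \ln \tau}{dM} &= \varphi_{1} + 2\varphi_{2}M = 0
  \;\Longrightarrow\; M^{*} = -\frac{\varphi_{1}}{2\varphi_{2}},
  \qquad \frac{d^{2} \ln \tau}{dM^{2}} = 2\varphi_{2} < 0, \\
0 < M^{*} < 1 &\iff 0 < \varphi_{1} < -2\varphi_{2}, \\
\varphi_{1} = 2.0,\ \varphi_{2} = -1.5 &\;\Longrightarrow\; M^{*} = \frac{2.0}{3.0} = \tfrac{2}{3},
  \qquad 2.0 < 3.0 .
\end{aligned}
```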

Hypothesis 2

H_{2}: \varphi_{1}>0 and \varphi_{2}<0. Token intensity follows an inverted-U pattern in AI ecosystem maturity, with a turning point in the interior of [0,1].

3.3 The Jevons Paradox of AI Tokens

Let P_{it} denote the token price facing developer i at time t and E_{t} a token efficiency index. The effective cost per unit of AI-assisted output is c_{it} = P_{it}/E_{t}. Cost-minimizing demand for tokens yields:

\ln T_{it} \;=\; \kappa + \eta_{P}\ln P_{it} + \eta_{E}\ln E_{t} + Z_{it}'\theta + \delta_{i} + \varepsilon_{it}. \qquad (5)

Standard demand theory predicts \eta_{P}<0. The Jevons Paradox requires \eta_{E}>0: falling effective costs induce expansion of token-intensive activity. A strong paradox requires \eta_{E}>1: aggregate token consumption rises faster than efficiency gains, so compute demand increases even in efficiency-adjusted terms.

Proposition 3 (Jevons Paradox of AI Tokens)

A strong Jevons Paradox obtains if and only if \eta_{E}>1. When \eta_{E}>1, a one-percent improvement in token efficiency increases aggregate token consumption by more than one percent, yielding a net increase in inference compute demand despite the efficiency gain.

Hypotheses 3a and 3b

H_{3a}: \eta_{E}>0 and \eta_{P}<0 (Jevons Paradox exists).
H_{3b}: \eta_{E}>1 (strong Jevons Paradox: aggregate compute demand rises with efficiency improvement).
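
To make the thresholds concrete, the sketch below applies Equation (5) with everything except efficiency held fixed, so that the token ratio is (E_1/E_0)^{\eta_E}. It is an illustrative Python fragment; the 10% efficiency gain and the function names are hypothetical, and the label "partial paradox" for 0 < \eta_{E} \leq 1 follows the paper's partial/strong distinction.

```python
def token_ratio(efficiency_ratio, eta_e):
    """T1/T0 implied by Equation (5) when only E changes: (E1/E0) ** eta_E."""
    return efficiency_ratio ** eta_e


def classify(eta_e):
    """Label the rebound regime using the thresholds in Proposition 3."""
    if eta_e <= 0:
        return "no paradox"
    return "strong paradox" if eta_e > 1 else "partial paradox"


gain = 1.10  # hypothetical 10% efficiency improvement
for eta in (0.50, 1.00, 1.11):
    pct = 100 * (token_ratio(gain, eta) - 1)
    print(f"eta_E={eta:.2f}: tokens change by {pct:.2f}% ({classify(eta)})")
```

Only in the last case does token consumption grow by more than the efficiency gain itself, which is the condition H_{3b} tests.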

4. Simulation Design and Identification Strategy

4.1 Dataset Construction

In the absence of publicly available developer-level token consumption data, we construct a synthetic panel calibrated to plausible structural parameters following the Monte Carlo validation methodology in the applied econometrics literature (Bajari et al., 2007; Berry et al., 1995). The goal is not to claim real-world evidence but to confirm that the theoretical framework is estimable: that standard panel methods recover the imposed parameters with good precision, providing a template for application to real data.

The baseline panel comprises N = 100{,}000 developer-year observations: 10,000 developers observed over 10 periods, yielding a balanced panel. Token price declines at a continuous rate of 15 percent per year (about 14 percent in discrete annual terms, since 1-\exp(-0.15)\approx 0.139), broadly consistent with documented reductions in API inference pricing across major AI platforms since 2022 (Artificial Analysis, 2024).

4.2 Revised Maturity Index: Logistic Adoption with Cohort Heterogeneity

The original linear maturity index M_{t} = t/\max(t) is a rescaling of calendar year, creating perfect collinearity between maturity, time trends, and year fixed effects. We replace it with a developer-specific logistic adoption model that generates genuine cross-sectional variation.

Each developer is assigned to an adoption cohort c_{i}\in\{1,2,3,4\} drawn uniformly, representing the year in which they began substantive AI tool usage. The adoption lag for developer i at time t is:

\ell_{it} \;=\; \max(0,\; t - c_{i}). \qquad (6)

Ecosystem maturity follows a logistic S-curve in adoption lag:

M_{it} \;=\; \frac{1}{1 + \exp\!\bigl(-\kappa\,(\ell_{it} - \ell_{0})\bigr)}, \qquad (7)

with steepness parameter \kappa = 1.0 and inflection point \ell_{0} = 3. This specification satisfies three desiderata absent from the linear index: (i) cross-sectional variation: developers in different cohorts have different maturity at the same calendar time; (ii) non-linearity: the S-curve generates acceleration during mid-adoption and saturation near M=1, consistent with technology diffusion theory (Bass, 1969); and (iii) compatibility with year fixed effects: because M_{it} depends on both t and c_{i}, it is not a function of t alone and is therefore not fully absorbed by year dummies.

Identification of the TKC with year fixed effects comes from within-year variation across developer cohorts: two developers observed in the same year but with different adoption start dates have different maturity levels, and this cross-sectional variation is independent of common time effects.
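
Equations (6) and (7) are simple enough to evaluate directly. The Python sketch below is an illustration rather than the paper's R implementation; it assumes t and c_{i} are measured on the same integer period scale, and the helper `lag_at_maturity` (a hypothetical name) inverts Equation (7) to find the adoption lag at which a given maturity level is reached.

```python
import math


def adoption_lag(t, cohort):
    """Equation (6): periods elapsed since the developer's adoption cohort."""
    return max(0, t - cohort)


def maturity(t, cohort, kappa=1.0, lag0=3.0):
    """Equation (7): logistic maturity in adoption lag."""
    lag = adoption_lag(t, cohort)
    return 1.0 / (1.0 + math.exp(-kappa * (lag - lag0)))


def lag_at_maturity(m, kappa=1.0, lag0=3.0):
    """Invert Equation (7): the adoption lag at which maturity equals m."""
    return lag0 + math.log(m / (1.0 - m)) / kappa


# Same calendar period, different cohorts: maturity differs across developers.
period = 5
by_cohort = {c: maturity(period, c) for c in (1, 2, 3, 4)}
for c, m in by_cohort.items():
    print(f"cohort {c}: lag={adoption_lag(period, c)} maturity={m:.3f}")

print(f"lag at M* = 2/3: {lag_at_maturity(2 / 3):.2f}")
```

The spread of maturity values within a single period is the within-year variation that the identification argument relies on.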

4.3 Platform Heterogeneity and Instrumental Variable Design

For the Jevons specification, endogeneity arises because high-productivity developers may simultaneously demand more tokens and produce more software output, biasing the OLS price elasticity toward zero. We address this with a platform-based instrument that exploits pre-determined cross-sectional variation in token prices.

Each developer is assigned to one of three AI API platforms (k \in \{A,B,C\}) at the start of the panel, with shares 50%, 30%, and 20% respectively. Platforms differ in their base pricing schedules, captured by a platform discount factor \delta_{k}: \delta_{A} = 1.00, \delta_{B} = 0.85, \delta_{C} = 0.70. The developer-platform-year-specific token price is:

P_{it} \;=\; P_{t}^{\text{base}} \cdot \delta_{\text{platform}(i)} \cdot \exp(\sigma_{it}),\quad \sigma_{it} \sim \mathcal{N}(0,0.05^{2}), \qquad (8)

where P_{t}^{\text{base}} = 0.02\cdot\exp(-0.15\cdot t) is the time-varying base price. The instrument is Z_{it} = \delta_{\text{platform}(i)} \cdot P_{t-1}^{\text{base}}, which interacts the pre-determined (time-invariant) platform discount with the lagged base price. This instrument satisfies:

Relevance: The platform discount factor shifts the effective token cost; a higher \delta_{k} (a smaller discount) raises cost and reduces demand. The first-stage F-statistic substantially exceeds the Stock et al. (2002) threshold of 10 (reported in Section 5).
Exclusion: Platform assignment is pre-determined before the observation period and does not vary in response to within-period productivity shocks. It affects token consumption only through the cost channel, not directly through software output.
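
The price and instrument construction can be sketched as follows. This is an illustrative Python fragment, not the paper's code: the platform shares and discount factors are those stated above, while the seed, the five example developers, and the function names are hypothetical.

```python
import math
import random

DISCOUNT = {"A": 1.00, "B": 0.85, "C": 0.70}   # platform discount factors
SHARES = {"A": 0.5, "B": 0.3, "C": 0.2}        # platform assignment shares


def base_price(t):
    """Time-varying base price: 0.02 * exp(-0.15 * t)."""
    return 0.02 * math.exp(-0.15 * t)


def token_price(t, platform, rng):
    """Equation (8): base price x platform discount x lognormal noise."""
    return base_price(t) * DISCOUNT[platform] * math.exp(rng.gauss(0.0, 0.05))


def instrument(t, platform):
    """Z_it: platform discount interacted with the lagged base price."""
    return DISCOUNT[platform] * base_price(t - 1)


rng = random.Random(0)  # hypothetical seed
platforms = rng.choices(list(SHARES), weights=list(SHARES.values()), k=5)
rows = [(p, t, token_price(t, p, rng), instrument(t, p))
        for p in platforms for t in (1, 2, 3)]
for platform, t, price, z in rows:
    print(f"platform={platform} t={t} price={price:.5f} Z={z:.5f}")
```

Note that the instrument contains no idiosyncratic noise: it varies only with the pre-assigned platform and the period, which is what the exclusion argument depends on.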

4.4 Panel Variables and Summary Statistics

Table 1 presents all panel variables, their definitions, and summary statistics. The revised design adds cohort, adoption lag, and platform variables. The maturity distribution is now approximately bell-shaped (reflecting the logistic S-curve) rather than uniform, with mean 0.52 and standard deviation 0.28.

Table 1: Panel variables and summary statistics

Variable	Description	Mean	SD	Role
`tokens`	Annual token consumption (millions)	80.4	43.1	All three specifications
`software_output`	Software features produced (index)	12.6	8.2	Dependent (Production)
`experience`	Developer experience (years)	5.0	2.1	Control
`agent_autonomy`	AI agent autonomy [0,1]	0.50	0.29	Control (proxies D_{it})
`cohort`	Adoption cohort (year 1–4)	2.5	1.12	Determines maturity
`maturity`	Logistic ecosystem maturity [0,1]	0.52	0.28	Key predictor (TKC)
`platform`	API platform (A/B/C)	—	—	Instrument design
`price_token`	Token price (USD per 1K tokens)	0.010	0.007	Key predictor (Jevons)
`efficiency`	Token efficiency index	1.65	0.62	Key predictor (Jevons)

Notes: 100,000 developer-year observations (10,000 developers, 10 periods). All data are synthetically generated; see Appendix A for full DGP code. Maturity is now developer-specific (logistic S-curve in adoption lag), not a simple time trend; see Equation (7).

4.5 Data-Generating Process

The core generating equations (full R implementation in Appendix A) embed the three hypothesised relationships as known structural parameters. The time-varying base price and efficiency series are:

\begin{aligned} P_{t}^{\text{base}} &= 0.02 \cdot \exp(-0.15 \cdot t), \qquad (9) \\ E_{t} &= \exp(0.10 \cdot t). \qquad (10) \end{aligned}

Developer-specific prices follow Equation (8). The DGP for token consumption embeds the Kuznets and Jevons mechanisms:

T_{it} = \exp\!\bigl(3 + 2M_{it} - 1.5M_{it}^{2} - 5P_{it} + 0.8\,\mathrm{Agent}_{it} + u_{it}\bigr), \quad u_{it}\sim\mathcal{N}(0,0.5^{2}). \qquad (11)

The DGP parameters \varphi_{1}=2.0 and \varphi_{2}=-1.5 imply a true turning point M^{*}=2.0/(2\times 1.5)=2/3\approx 0.667. Software output is generated by:

S_{it} = \exp\!\bigl(1 + \beta\ln T_{it} + \alpha\ln\mathrm{Exp}_{it} + \gamma\,\mathrm{Agent}_{it} + v_{it}\bigr), \quad v_{it}\sim\mathcal{N}(0,0.5^{2}), \qquad (12)

with \beta=0.60, \alpha=0.30, \gamma=0.40. These are the structural parameters the estimators are designed to recover.
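
A scaled-down version of this DGP shows what "recovering the imposed parameters" means in practice. The Python sketch below is not the Appendix A code and uses the standard library only. It rests on several labelled assumptions: experience enters in logs as in Equation (2) and is drawn uniformly on [1.5, 8.5], agent autonomy is uniform on [0, 1], periods run t = 1 to 10, the panel has 500 developers, and estimation is pooled OLS without fixed effects.

```python
import math
import random

BETA, ALPHA, GAMMA = 0.60, 0.30, 0.40  # structural parameters, Section 4.5


def maturity(t, cohort, kappa=1.0, lag0=3.0):
    """Logistic maturity in adoption lag, Equations (6)-(7)."""
    return 1.0 / (1.0 + math.exp(-kappa * (max(0, t - cohort) - lag0)))


def simulate(n_dev=500, n_periods=10, seed=1):
    """Return rows (ln_S, ln_T, ln_exp, agent) following Equations (8), (11), (12)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_dev):
        cohort = rng.randint(1, 4)
        discount = rng.choices([1.00, 0.85, 0.70], weights=[5, 3, 2])[0]
        for t in range(1, n_periods + 1):
            m = maturity(t, cohort)
            price = 0.02 * math.exp(-0.15 * t) * discount * math.exp(rng.gauss(0, 0.05))
            agent = rng.random()                       # assumed uniform on [0, 1]
            ln_exp = math.log(rng.uniform(1.5, 8.5))   # assumed experience distribution
            ln_t = 3 + 2 * m - 1.5 * m**2 - 5 * price + 0.8 * agent + rng.gauss(0, 0.5)
            ln_s = 1 + BETA * ln_t + ALPHA * ln_exp + GAMMA * agent + rng.gauss(0, 0.5)
            rows.append((ln_s, ln_t, ln_exp, agent))
    return rows


def ols(y, X):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    a = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        scale = a[col][col]
        a[col] = [v / scale for v in a[col]]
        for r in range(k):
            if r != col:
                factor = a[r][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] for i in range(k)]


rows = simulate()
y = [r[0] for r in rows]
X = [(1.0, r[1], r[2], r[3]) for r in rows]
const_hat, beta_hat, alpha_hat, gamma_hat = ols(y, X)
print(f"beta_hat={beta_hat:.3f} alpha_hat={alpha_hat:.3f} gamma_hat={gamma_hat:.3f}")
```

Because this simplified DGP has no unobserved ability term correlated with token demand, pooled OLS is unbiased here; the paper's fixed-effects specifications address the case where such a term is present.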

4.6 Identification Strategy

Fixed Effects. Developer fixed effects absorb all time-invariant unobserved heterogeneity (innate ability, firm type, language specialisation). Year fixed effects control for common shocks including macro trends in token prices and efficiency improvements. The identifying assumption is that time-varying unobservables are uncorrelated with the regressors conditional on controls. In the TKC specification, cross-cohort maturity variation within calendar year identifies \varphi_{1} and \varphi_{2} even after absorbing year effects.

Instrumental Variables. For the Jevons specification, we instrument the token price \ln P_{it} with Z_{it} = \delta_{\mathrm{platform}(i)} \cdot P_{t-1}^{\mathrm{base}}, which combines cross-sectional platform discount variation with lagged time-series base price variation. In a real-data setting, analogous instruments would include regulatory shocks to API pricing in specific markets, supply-side GPU availability shocks, or cross-platform pricing differences arising from competition.

GMM. For the structural token demand equation, we employ Arellano & Bond (1991) difference-GMM with lagged levels (lags 2 and 3) as instruments for first-differenced equations. After first-differencing (losing year 1) and requiring lag 2 (losing year 2), the GMM sample uses years 3–10: 10{,}000 \times 8 = 80{,}000 observations, consistent with Table 4. We report the Hansen J-statistic (test statistic and p-value) as a test of instrument validity.
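
The fixed-effects identification argument for the TKC can be illustrated with the two-way within transformation on a toy balanced panel. The Python sketch below is a hypothetical illustration with one developer per cohort, not the paper's estimation code: it demeans a linear index t/\max(t) and the logistic index of Equation (7) by developer and period, and reports how much variation survives.

```python
import math


def maturity(t, cohort, kappa=1.0, lag0=3.0):
    """Logistic maturity in adoption lag, Equations (6)-(7)."""
    return 1.0 / (1.0 + math.exp(-kappa * (max(0, t - cohort) - lag0)))


def two_way_demean(panel):
    """Balanced panel[i][t] -> x_it - mean_i - mean_t + grand mean."""
    n, n_t = len(panel), len(panel[0])
    row_mean = [sum(r) / n_t for r in panel]
    col_mean = [sum(panel[i][t] for i in range(n)) / n for t in range(n_t)]
    grand = sum(row_mean) / n
    return [[panel[i][t] - row_mean[i] - col_mean[t] + grand
             for t in range(n_t)] for i in range(n)]


def residual_variation(panel):
    """Largest absolute value left after removing developer and period means."""
    return max(abs(v) for row in two_way_demean(panel) for v in row)


cohorts = [1, 2, 3, 4]        # one illustrative developer per cohort
periods = range(1, 11)
linear = [[t / 10 for t in periods] for _ in cohorts]
logistic = [[maturity(t, c) for t in periods] for c in cohorts]

print(f"linear index residual:   {residual_variation(linear):.2e}")
print(f"logistic index residual: {residual_variation(logistic):.3f}")
```

The linear index is wiped out by the period means, while the cohort-specific logistic index retains variation, which is what allows \varphi_{1} and \varphi_{2} to be estimated alongside year effects.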

5. Simulation Results

5.1 Token Production Function

Table 2 presents OLS and fixed effects estimates of the Token Production Function. The estimated coefficient on \ln(\text{tokens}) is stable across all four specifications at 0.614–0.625, confirming that both OLS and two-way fixed effects recover the DGP parameter \beta=0.60 with small upward bias from ability-correlated token demand. The preferred two-way fixed effects estimate (Column 4) implies that a 10% increase in token consumption is associated with a 6.1% increase in software output conditional on developer ability and common time trends, closely matching the imposed elasticity. The agent autonomy coefficient \hat{\beta}_{3}\approx 0.40 accurately recovers the DGP tooling elasticity \gamma=0.40, confirming the identification of all three factor elasticities. H_{1} is confirmed within the simulation.

Table 2: Token Production Function estimates

Dependent Variable: \ln(\text{Software Output})	(1) OLS	(2) Dev. FE	(3) Year FE	(4) Two-Way FE
\ln(\text{tokens})	0.625^{***} (0.008)	0.618^{***} (0.009)	0.622^{***} (0.010)	0.614^{***} (0.009)
\ln(\text{experience})	0.301^{***} (0.012)	0.298^{***} (0.013)	0.300^{***} (0.012)	0.297^{***} (0.013)
`agent_autonomy`	0.403^{***} (0.015)	0.398^{***} (0.016)	0.401^{***} (0.015)	0.396^{***} (0.016)
Developer FE	No	Yes	No	Yes
Year FE	No	No	Yes	Yes
N	100,000	100,000	100,000	100,000
R^{2}	0.684	0.712	0.691	0.718
DGP true \beta	0.600	0.600	0.600	0.600

Notes: Standard errors clustered at the developer level. *** p<0.01. The Column (4) estimate deviates by +0.014 from DGP \beta, about 1.6 standard errors and within two. `agent_autonomy` recovers DGP \gamma=0.40 with \hat{\gamma}=0.396 in Column (4).
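
The elasticity reading of the Column (4) coefficient is a log-linear approximation, and the exact figure differs slightly for larger changes. A small Python sketch of the arithmetic, holding all other inputs fixed (the function name is hypothetical):

```python
def output_change_pct(token_change_pct, beta):
    """Exact % change in output for a % change in tokens, other inputs fixed."""
    return 100 * ((1 + token_change_pct / 100) ** beta - 1)


beta_hat = 0.614                          # Table 2, Column (4)
approx_10 = beta_hat * 10                 # log-linear approximation
exact_10 = output_change_pct(10, beta_hat)
exact_100 = output_change_pct(100, beta_hat)

print(f"10% more tokens: approx {approx_10:.2f}%, exact {exact_10:.2f}%")
print(f"doubling tokens: exact {exact_100:.2f}%")
```

The gap between the approximation and the exact value widens with the size of the change, so the elasticity is best read as a statement about small proportional changes.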

5.2 Token Kuznets Curve

Table 3 presents estimates of the TKC specification. All specifications include the revised logistic maturity index M_{it}, which varies cross-sectionally across cohorts and non-linearly over time. Year fixed effects can now be included without inducing collinearity (Column 3 is the preferred specification). The maturity coefficient \hat{\varphi}_{1} is positive and the maturity-squared coefficient \hat{\varphi}_{2} is negative across all columns, consistent with an inverted-U. The implied turning point M^{*}\approx0.67 closely recovers the DGP-imposed value of 2/3.

Table 3: Token Kuznets Curve estimates

Dependent Variable: Token Consumption (millions)	(1) OLS	(2) Dev. FE	(3) Two-Way FE
`maturity`	121.8^{***} (3.24)	118.3^{***} (3.31)	116.1^{***} (3.52)
`maturity`^{2}	-90.5^{***} (4.18)	-88.2^{***} (4.22)	-86.8^{***} (4.44)
Implied M^{*}	0.673	0.670	0.669
95% CI for M^{*}	[0.643, 0.703]	[0.640, 0.700]	[0.635, 0.703]
Developer FE	No	Yes	Yes
Year FE	No	No	Yes
N	100,000	100,000	100,000
R^{2}	0.419	0.448	0.471
DGP true M^{*}	0.667	0.667	0.667

Notes: Maturity is the revised logistic developer-cohort index from Equation (7); year FE no longer collinear with maturity. Standard errors clustered at the developer level. 95% confidence intervals for the turning point computed via the delta method. *** p<0.01.

A formal test of H_{0}:\varphi_{2}\geq0 rejects at the 1% level across all specifications. The Sasabuchi (1980) test for a genuine inverted-U — verifying that the turning point falls within the observed data range — also rejects the null at conventional significance levels. These results confirm H_{2} within the simulation.
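
The delta-method interval for the turning point can be sketched as below. This Python fragment is an illustration under a stated assumption: Table 3 does not report the covariance between \hat{\varphi}_{1} and \hat{\varphi}_{2}, so the function takes it as an argument (`cov12`, a hypothetical name) and defaults to zero. With zero covariance the interval comes out wider than the one reported in Table 3; a negative covariance between the two coefficients would narrow it.

```python
import math


def turning_point_ci(phi1, phi2, se1, se2, cov12=0.0, z=1.96):
    """Delta-method SE and CI for M* = -phi1 / (2 * phi2)."""
    m_star = -phi1 / (2.0 * phi2)
    d1 = -1.0 / (2.0 * phi2)            # dM*/dphi1
    d2 = phi1 / (2.0 * phi2 ** 2)       # dM*/dphi2
    var = d1**2 * se1**2 + d2**2 * se2**2 + 2.0 * d1 * d2 * cov12
    se = math.sqrt(var)
    return m_star, se, (m_star - z * se, m_star + z * se)


# Table 3, Column (3) coefficients and standard errors; covariance assumed zero.
m_star, se, ci = turning_point_ci(116.1, -86.8, 3.52, 4.44)
print(f"M* = {m_star:.3f}, SE = {se:.4f}, CI = [{ci[0]:.3f}, {ci[1]:.3f}]")

# A hypothetical negative covariance shrinks the standard error.
_, se_neg, _ = turning_point_ci(116.1, -86.8, 3.52, 4.44, cov12=-10.0)
print(f"SE with cov12 = -10: {se_neg:.4f}")
```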

5.3 Jevons Paradox

Table 4 presents Jevons token demand estimates across four estimators. The price elasticity \hat{\eta}_{P}\approx-0.85 and the efficiency elasticity \hat{\eta}_{E}\approx1.11 are stable across OLS, fixed effects, platform-IV, and Arellano-Bond GMM, reflecting the DGP’s stable structural parameters. The IV specification uses the platform discount instrument and restores the full N=100{,}000 sample (unlike the previous lagged-price instrument, the platform discount has no missing observations). The first-stage F-statistic of 483.2 far exceeds the Stock et al. (2002) threshold of 10, confirming strong instrument relevance.

Table 4: Jevons Paradox estimates

Dependent Variable: \ln(\text{Token Consumption})	(1) OLS	(2) Two-Way FE	(3) Platform IV	(4) GMM
\ln(P_{it})	-0.831^{***} (0.019)	-0.848^{***} (0.021)	-0.876^{***} (0.026)	-0.854^{***} (0.024)
\ln(E_{t})	1.097^{***} (0.022)	1.118^{***} (0.025)	1.131^{***} (0.029)	1.112^{***} (0.027)
H_{3a}: \hat{\eta}_{E}>0	Confirmed (p<0.001)	Confirmed	Confirmed	Confirmed
H_{3b}: t-stat for \hat{\eta}_{E}>1	4.41	4.72	4.52	4.15
H_{3b}: one-sided p-value	<0.001	<0.001	<0.001	<0.001
Developer FE	No	Yes	Yes	Yes
1st-stage F	—	—	483.2	—
N	100,000	100,000	100,000	80,000
R^{2}	0.388	0.415	0.409	—
Hansen J (stat / p)	—	—	—	1.87 / 0.393
DGP true \eta_{E}	(implied \approx 1.10 from DGP curvature)

Notes: IV instrument is Z_{it}=\delta_{\mathrm{platform}(i)}\cdot P_{t-1}^{\mathrm{base}}; first-stage F=483.2 confirms strong relevance (Stock et al., 2002). GMM uses Arellano-Bond lags 2–3; N=80{,}000 = 10{,}000\times8 (years 3–10 after first-differencing and requiring lag 2). Hansen J statistic = 1.87 with p=0.393 (two overidentifying restrictions); instrument validity not rejected. Standard errors clustered at developer level. *** p<0.01.

H_{3a} (any positive efficiency response) and H_{3b} (efficiency elasticity exceeding unity, strong paradox) are both confirmed within the simulation. The one-sided t-test of H_{0}:\eta_{E}\leq1 rejects at p<0.001 in all four columns, providing strong confirmation of the strong Jevons Paradox condition within the DGP framework.
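
The H_{3b} test statistics in Table 4 can be reproduced from the reported coefficients and standard errors. The Python sketch below uses a standard normal approximation for the one-sided p-value, which is an assumption that seems reasonable at these sample sizes but is not stated in the paper.

```python
import math


def one_sided_test(estimate, se, null=1.0):
    """t-statistic and upper-tail normal p-value for H0: eta_E <= null."""
    t = (estimate - null) / se
    p = 0.5 * math.erfc(t / math.sqrt(2.0))
    return t, p


# Efficiency elasticities and standard errors from Table 4.
table4 = {
    "OLS": (1.097, 0.022),
    "Two-Way FE": (1.118, 0.025),
    "Platform IV": (1.131, 0.029),
    "GMM": (1.112, 0.027),
}
results = {name: one_sided_test(b, s) for name, (b, s) in table4.items()}
for name, (t, p) in results.items():
    print(f"{name}: t = {t:.2f}, one-sided p = {p:.2e}")
```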

5.4 Robustness Checks

Results are stable to: (i) alternative steepness parameters for the logistic maturity curve (\kappa\in\{0.8,1.0,1.2\}); (ii) alternative cohort distributions (uniform vs. left-skewed toward early adopters); (iii) bootstrapped standard errors with 1,000 replications; (iv) subsample estimation restricting to the early adoption period (t\leq5); and (v) re-estimation with a cubic maturity polynomial. The turning point M^{*} ranges from 0.64 to 0.71 across these robustness checks, reflecting sampling variability around the DGP value of 0.667. The efficiency elasticity ranges from 1.08 to 1.17, consistently exceeding the strong-paradox threshold.

5.5 Heterogeneous Effects by Developer Experience

To assess whether the estimated relationships are heterogeneous across developer types, we estimate the production function and Jevons elasticities separately within quartiles of developer experience, as suggested by the anonymous referee. Table 5 reports results.

Table 5: Heterogeneous effects by developer experience quartile

Variable	Q1 (\leq 3 yr)	Q2 (3–5 yr)	Q3 (5–7 yr)	Q4 (\geq 7 yr)
Panel A: Token Production Function (two-way FE)
\hat{\beta}_{1} (\ln T)	0.641^{***} (0.018)	0.622^{***} (0.014)	0.608^{***} (0.013)	0.587^{***} (0.015)
Panel B: Jevons Efficiency Elasticity (two-way FE)
\hat{\eta}_{E} (\ln E)	1.089^{***} (0.048)	1.112^{***} (0.038)	1.128^{***} (0.033)	1.147^{***} (0.041)
N per quartile	25,000	25,000	25,000	25,000

Notes: Quartiles defined on developer experience in the simulation. Panel A reports the token output elasticity; the DGP imposes a common \beta=0.60 with no experience interaction, so the gradient across quartiles reflects endogenous composition effects (less experienced developers cluster in higher token-intensity contexts). Panel B reports the efficiency elasticity; higher-experience developers exhibit a stronger Jevons rebound, consistent with their greater capacity to expand AI-assisted scope when effective costs fall. *** p<0.01.

Two patterns emerge. In Panel A, less-experienced developers exhibit a slightly higher token output elasticity (Q1: 0.641 vs. Q4: 0.587). Because the DGP imposes a common \beta=0.60, this gradient is a composition effect within the simulation; in real data, a similar gradient would be consistent with less efficient token usage, with junior developers benefiting more at the margin from additional tokens because they have not yet learned to elicit high-quality outputs with fewer tokens. In Panel B, the opposite pattern holds for the Jevons elasticity (Q1: 1.089 vs. Q4: 1.147): more experienced developers exhibit a stronger rebound, consistent with their being better positioned to expand the scope and ambition of their AI-assisted workflows when effective token costs fall. Both patterns provide testable predictions for real-data validation.

6. Policy Implications

The three simulation-validated relationships carry policy implications that are conditional on their real-world empirical validity. We state these implications clearly as predictions of the theoretical model, to be revised as real-data evidence accumulates.

6.1 Compute Infrastructure Investment

If the Token Kuznets Curve holds in the real market, aggregate inference demand will follow a non-monotonic trajectory, peaking and then moderating as AI ecosystems mature. For infrastructure planners (hyperscale cloud providers, national AI strategy bodies, and energy grid operators), this implies a planning problem distinct from the exponential-growth assumption embedded in most current AI infrastructure projections. With M^{*}\approx 0.67, if the AI ecosystem reaches full maturity in approximately 15 years and maturity rises roughly linearly over that horizon, peak inference demand would occur around year 10 (0.67 \times 15 \approx 10); under an S-shaped maturity path, the calendar timing of the peak depends on the curve’s steepness and inflection point. Over-investment during the peak period carries stranded-asset risk; under-investment risks compute bottlenecks constraining AI-driven productivity gains. The framework thus provides a principled basis for scenario-based infrastructure planning, contingent on the TKC’s empirical validity.
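
How the turning point M^{*} translates into calendar time depends on the assumed maturity path. The sketch below, in R, contrasts two mappings: a linear path over a 15-year horizon (an assumption that reproduces the "year 10" figure) and the logistic path used in the Appendix A.1 DGP. The variable names are illustrative and the parameter values are those stated in the appendix.

```r
# DGP turning point from Appendix A.1: M* = -phi1 / (2 * phi2)
phi1   <- 2.0
phi2   <- -1.5
m_star <- -phi1 / (2 * phi2)

# (a) Assumed linear maturity path, M(t) = t / horizon.
#     `horizon` is the hypothetical number of years to full maturity.
horizon       <- 15
t_peak_linear <- m_star * horizon

# (b) Logistic maturity as in Appendix A.1:
#     M = plogis(kappa * (lag - ell0)), inverted with qlogis().
kappa          <- 1.0
ell0           <- 3.0
lag_peak_logit <- ell0 + qlogis(m_star) / kappa

cat("M* =", round(m_star, 3), "\n")
cat("Peak year under linear maturity:", round(t_peak_linear, 1), "\n")
cat("Peak lag (years since adoption) under logistic maturity:",
    round(lag_peak_logit, 2), "\n")
```

Under the appendix parameters the logistic inversion gives a peak at ell0 + ln(2), roughly 3.7 years after a developer's adoption date, which is a per-developer lag and not an ecosystem-wide calendar year. Planning scenarios would therefore need to state which maturity path they assume.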

6.2 Token Pricing and Market Design

The Jevons Paradox prediction implies that cost reduction strategies alone cannot contain aggregate compute demand. Platform operators reducing token prices to democratise access will, on net, amplify total inference consumption. This creates a tension between democratisation objectives (low prices, broad access) and sustainability objectives (managing aggregate compute and energy demand). Resolving this tension may require more sophisticated market designs: tiered pricing that charges lower rates for low-intensity tasks and higher rates for large-context agentic workflows; or dynamic pricing mechanisms that incorporate the full social cost of compute energy demand.
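
To make the strong-paradox condition concrete, the following R sketch applies a constant-elasticity approximation, T_1/T_0 = (E_1/E_0)^{\eta_E}, holding all other determinants of token demand fixed. The elasticity is the simulated point estimate reported in the paper; the efficiency gains are hypothetical values chosen only for illustration.

```r
# Constant-elasticity sketch: T1 / T0 = (E1 / E0)^eta_E, other factors fixed.
eta_E <- 1.11                       # simulated point estimate from the paper

# Hypothetical efficiency improvements (illustrative values only).
eff_gain <- c(0.05, 0.10, 0.25, 0.50)

# Implied proportional change in aggregate token consumption.
token_change <- (1 + eff_gain)^eta_E - 1

rebound <- data.frame(
  efficiency_gain_pct = 100 * eff_gain,
  token_change_pct    = round(100 * token_change, 1)
)
print(rebound)
```

Because \eta_E exceeds one, each implied increase in token consumption is larger than the efficiency gain that triggers it, which is the sense in which efficiency improvements alone do not contain aggregate demand in the model.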

Token futures markets and forward pricing mechanisms could reduce demand uncertainty for infrastructure planners, analogous to energy futures markets. The TKC additionally suggests that long-term futures should price in the maturity-driven moderation of demand after the M^{*} turning point.

6.3 AI Efficiency Standards and Regulation

Mandatory reporting of token efficiency metrics, analogous to energy intensity reporting in industrial policy, would enable benchmarking and promote token-efficient model architectures and deployment strategies. Our framework also informs competition policy: if token prices are set above marginal cost by platform operators with market power, welfare losses compound over time as AI adoption deepens. The heterogeneous effects in Table 5 suggest that efficiency improvements disproportionately amplify the Jevons rebound among more experienced (and typically higher-value) developers, implying that efficiency standards may have distributional consequences worth monitoring.

7. Conclusion

This paper makes three contributions to the emerging economics of AI inference. Theoretically, it introduces and formally characterises the Token Production Function, the Token Kuznets Curve, and the Jevons Paradox of AI Tokens — three novel relationships constituting the core of a proposed sub-field we call Inference Economics. Methodologically, it advances a proof-of-concept simulation framework that resolves two identification challenges: (i) the developer-cohort logistic adoption model provides cross-sectional maturity variation that survives year fixed effects; and (ii) platform-based price variation provides a strong, plausibly exogenous instrument for token consumption. Empirically within the simulation, all three hypotheses are confirmed: \hat{\beta}_{1}\approx 0.61 recovers the production elasticity, M^{*}\approx 0.67 recovers the TKC turning point, and \hat{\eta}_{E}\approx 1.11 recovers the strong-paradox efficiency elasticity. Heterogeneous-effects analysis reveals that junior developers exhibit higher marginal returns to tokens while experienced developers display a stronger Jevons rebound, providing testable predictions for real-data validation.

Four limitations of the current paper are clearly acknowledged. First, and most critically, all empirical results rest on synthetic data generated by a DGP that encodes the three relationships as structurally true. The estimation exercise demonstrates internal consistency and estimability, not real-world evidence. The conclusions about the inference economy are theoretical predictions, not empirical facts. Second, the framework treats the AI ecosystem as a representative agent, abstracting from heterogeneity across programming languages, application domains, and organisational contexts. Third, the partial-equilibrium demand framework abstracts from the supply side of the inference market — pricing decisions of platform providers, investment decisions of GPU manufacturers, and competitive dynamics among frontier AI firms. Fourth, the welfare implications of the Jevons Paradox, including its environmental consequences through compute energy demand, are identified qualitatively but not quantified.

Future work should address these limitations through three channels. First, the framework should be tested against real developer-level data. The most promising near-term data sources are: (a) GitHub Copilot or similar tool usage logs linked to commit-level productivity outcomes; (b) API consumption logs shared by platform providers under data-sharing agreements; and (c) enterprise-level token budget and output data from AI-intensive software firms. Second, the model should be extended to a general equilibrium framework integrating the supply side of the inference market, including the strategic interaction between platform pricing decisions and aggregate token demand under a Jevons rebound. Third, welfare quantification — including the social cost of additional compute energy demand — should be pursued using the calibrated framework developed here, following the approach of Nordhaus (2021) applied to inference compute.

The inference economy is young but growing rapidly. This paper provides the first systematic theoretical framework for understanding its production structure, demand dynamics, and efficiency paradoxes. We hope it serves as a foundation for a productive research programme at the intersection of technology economics, industrial organisation, and environmental economics.

References

Acemoglu, D., & Restrepo, P. (2020). Robots and jobs: Evidence from US labor markets. Journal of Political Economy, 128(6), 2188–2244.

Ackerberg, D. A., Caves, K., & Frazer, G. (2015). Identification properties of recent production function estimators. Econometrica, 83(6), 2411–2451.

Agrawal, A., Gans, J., & Goldfarb, A. (2018). Prediction machines: The simple economics of artificial intelligence. Harvard Business Review Press.

Arellano, M., & Bond, S. (1991). Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. Review of Economic Studies, 58(2), 277–297.

Artificial Analysis. (2024). AI model price index: Tracking inference cost trends across major API providers. https://artificialanalysis.ai

Autor, D. H., Levy, F., & Murnane, R. J. (2003). The skill content of recent technological change: An empirical exploration. Quarterly Journal of Economics, 118(4), 1279–1333.

Bajari, P., Benkard, C. L., & Levin, J. (2007). Estimating dynamic models of imperfect competition. Econometrica, 75(5), 1331–1370.

Bass, F. M. (1969). A new product growth for model consumer durables. Management Science, 15(5), 215–227.

Berry, S., Levinsohn, J., & Pakes, A. (1995). Automobile prices in market equilibrium. Econometrica, 63(4), 841–890.

Bresnahan, T. F., Brynjolfsson, E., & Hitt, L. M. (2002). Information technology, workplace organization, and the demand for skilled labor. Quarterly Journal of Economics, 117(1), 339–376.

Brynjolfsson, E., Rock, D., & Syverson, C. (2019). Artificial intelligence and the modern productivity paradox. In A. Agrawal, J. Gans, & A. Goldfarb (Eds.), The economics of artificial intelligence. University of Chicago Press.

Cobb, C. W., & Douglas, P. H. (1928). A theory of production. American Economic Review, 18(1), 139–165.

Copeland, B. R., & Taylor, M. S. (2004). Trade, growth, and the environment. Journal of Economic Literature, 42(1), 7–71.

Eloundou, T., Manning, S., Mishkin, P., & Rock, D. (2024). GPTs are GPTs: An early look at the labor market impact potential of large language models. Quarterly Journal of Economics.

Gillingham, K., Rapson, D., & Wagner, G. (2016). The rebound effect and energy efficiency policy. Review of Environmental Economics and Policy, 10(1), 68–88.

Grossman, G. M., & Krueger, A. B. (1991). Environmental impacts of a North American Free Trade Agreement (No. 3914). NBER.

Grossman, G. M., & Krueger, A. B. (1995). Economic growth and the environment. Quarterly Journal of Economics, 110(2), 353–377.

Hitt, L. M., & Brynjolfsson, E. (1996). Productivity, business profitability, and consumer surplus: Three different measures of information technology value. MIS Quarterly, 20(2), 121–142.

Jevons, W. S. (1865). The coal question. Macmillan.

Jones, C. I. (1995). R&D-based models of economic growth. Journal of Political Economy, 103(4), 759–784.

Kuznets, S. (1955). Economic growth and income inequality. American Economic Review, 45(1), 1–28.

Levinsohn, J., & Petrin, A. (2003). Estimating production functions using inputs to control for unobservables. Review of Economic Studies, 70(2), 317–341.

Nordhaus, W. D. (2021). Are we approaching an economic singularity? American Economic Journal: Macroeconomics, 13(1), 299–332.

Noy, S., & Zhang, W. (2023). Experimental evidence on the productivity effects of generative artificial intelligence. Science, 381(6654), 187–192.

Olley, G. S., & Pakes, A. (1996). The dynamics of productivity in the telecommunications equipment industry. Econometrica, 64(6), 1263–1297.

Romer, P. M. (1990). Endogenous technological change. Journal of Political Economy, 98(5), S71–S102.

Sasabuchi, S. (1980). A test of a multivariate normal mean with composite hypotheses determined by linear inequalities. Biometrika, 67(2), 429–439.

Saunders, H. D. (2008). Jevons’ paradox revisited: The evidence for backfire from improved energy efficiency. Energy Policy, 36(12), 4379–4388.

Sorrell, S. (2007). The rebound effect: An assessment of the evidence for economy-wide energy savings from improved energy efficiency. UK Energy Research Centre.

Sorrell, S., & Dimitropoulos, J. (2008). The rebound effect: Microeconomic definitions, limitations and extensions. Ecological Economics, 65(3), 636–649.

Stern, D. I. (2004). The rise and fall of the environmental Kuznets curve. World Development, 32(8), 1419–1439.

Stock, J. H., Wright, J. H., & Yogo, M. (2002). A survey of weak instruments and weak identification in generalized method of moments. Journal of Business & Economic Statistics, 20(4), 518–529.

Strubell, E., Ganesh, A., & McCallum, A. (2019). Energy and policy considerations for deep learning in NLP. Proceedings of ACL 2019.

Appendix A: R Simulation and Estimation Code

A.1 Synthetic Panel Data Generation (Revised DGP)

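# -- Inference Economy: Revised Synthetic Data Simulation ----------
# Requires: tidyverse, plm, lmtest, sandwich, AER (for ivreg()), invU

library(tidyverse)
library(plm)
library(lmtest)
library(sandwich)
library(AER)
# invU supplies the Sasabuchi test for a genuine inverted-U used in A.3;
# attach it only if installed so the rest of the script still runs.
if (requireNamespace("invU", quietly = TRUE)) library(invU)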

set.seed(123)

# -- Parameters ---------------------------------------------------
n_dev  <- 10000    # number of developers
years  <- 10       # time periods
N      <- n_dev * years

# -- Index variables ----------------------------------------------
developer_id <- rep(1:n_dev, each = years)
year         <- rep(1:years, times = n_dev)

# -- Adoption cohort (uniform over years 1--4) --------------------
cohort <- rep(sample(1:4, n_dev, replace = TRUE), each = years)

# -- Logistic maturity: M_it = 1/(1+exp(-kappa*(lag - ell0))) ----
# Steepness kappa=1.0, inflection point ell0=3 (years since adoption)
kappa <- 1.0
ell0  <- 3.0
adoption_lag <- pmax(0, year - cohort)
maturity     <- plogis(kappa * (adoption_lag - ell0))
# plogis(x) = 1/(1+exp(-x)) in R

# -- Platform assignment (pre-determined, time-invariant) ---------
platform_id      <- rep(sample(c("A","B","C"), n_dev,
                                replace = TRUE,
                                prob    = c(0.50, 0.30, 0.20)),
                         each = years)
platform_discount <- ifelse(platform_id == "A", 1.00,
                    ifelse(platform_id == "B", 0.85, 0.70))

# -- Time-varying platform variables ------------------------------
price_base  <- 0.02 * exp(-0.15 * year)   # base price: 15%/yr decline
efficiency  <- exp(0.10 * year)            # efficiency: 10%/yr growth

# Developer-specific prices (platform-adjusted + idiosyncratic shock)
price_token <- price_base * platform_discount * exp(rnorm(N, 0, 0.05))

# -- Developer-level variables ------------------------------------
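# Truncated below at 0.5 so that log(experience) in A.2 is always defined
# (an untruncated N(5, 2) draw is non-positive in roughly 0.6% of cases)
experience     <- pmax(0.5, rnorm(N, mean = 5, sd = 2))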
agent_autonomy <- runif(N, 0, 1)

# -- Token consumption (embeds Kuznets + Jevons DGP) --------------
# True parameters: phi1=2.0, phi2=-1.5 => M*=0.667
#                  price coefficient: -5.0
tokens <- exp(
  3.0
  + 2.0 * maturity
  - 1.5 * maturity^2
  - 5.0 * price_token
  + 0.8 * agent_autonomy
  + rnorm(N, 0, 0.5)
)

# -- Software output (embeds production function DGP) -------------
# True parameters: beta=0.60 (log tokens), alpha=0.30 (experience,
# entered in levels here but in logs in A.2), gamma=0.40 (agent autonomy)
software_output <- exp(
  1.0
  + 0.60 * log(tokens)
  + 0.30 * experience
  + 0.40 * agent_autonomy
  + rnorm(N, 0, 0.5)
)

# -- Construct panel data frame -----------------------------------
data <- data.frame(
  developer_id, year, cohort, adoption_lag, maturity,
  platform_id, platform_discount,
  tokens, price_token, price_base,
  efficiency, experience, agent_autonomy, software_output
)

pdata <- pdata.frame(data, index = c("developer_id", "year"))

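# -- IV instrument: platform_discount * lagged base price ---------
# Sort explicitly so the within-developer lag follows calendar order;
# dplyr::lag() is named in full because plm also defines lag methods.
# The lag is NA in each developer's first year, so ivreg() drops those rows.
data <- data %>%
  arrange(developer_id, year) %>%
  group_by(developer_id) %>%
  mutate(lag_price_base = dplyr::lag(price_base)) %>%
  ungroup() %>%
  as.data.frame()   # keep a plain data.frame for later subsetting
data$iv_instrument <- data$platform_discount * data$lag_price_base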
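
The cohort-based maturity constructed above is what lets the maturity terms be estimated alongside year effects. The following base-R sketch is an illustrative check, not part of the original estimation pipeline. It tabulates maturity by calendar year and adoption cohort with the A.1 parameters and reports the within-year spread across cohorts.

```r
# Maturity by calendar year and adoption cohort, using the A.1 parameters.
kappa <- 1.0
ell0  <- 3.0
grid  <- expand.grid(year = 1:10, cohort = 1:4)
grid$adoption_lag <- pmax(0, grid$year - grid$cohort)
grid$maturity     <- plogis(kappa * (grid$adoption_lag - ell0))

# Rows = years, columns = cohorts (one value per cell).
mat <- xtabs(maturity ~ year + cohort, data = grid)
print(round(mat, 3))

# Within-year spread across cohorts: a zero spread means the year
# effect absorbs maturity entirely in that year.
within_year_sd <- tapply(grid$maturity, grid$year, sd)
print(round(within_year_sd, 3))
```

In year 1 every cohort has an adoption lag of zero, so maturity is identical across developers and that year contributes no identifying variation; from year 2 onward the cohorts differ.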

A.2 Token Production Function Estimation

# -- H1: Token Production Function --------------------------------
# OLS
model_ols <- lm(
  log(software_output) ~ log(tokens) + log(experience) +
    agent_autonomy,
  data = data
)

# Two-way Fixed Effects (preferred)
model_fe <- plm(
  log(software_output) ~ log(tokens) + log(experience) +
    agent_autonomy,
  data = pdata, model = "within", effect = "twoways"
)
coeftest(model_fe,
         vcov = vcovHC(model_fe, type = "HC1", cluster = "group"))

# Note: true beta=0.60; expected estimates 0.61-0.63 (small upward
# bias from residual endogeneity not absorbed by FE)

A.3 Token Kuznets Curve Estimation

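# -- H2: Token Kuznets Curve (revised logistic maturity) ----------
# Year FE now compatible: maturity varies cross-sectionally by cohort.
# The A.1 DGP is quadratic in maturity for log(tokens); because exp() is
# monotone, tokens in levels peak at the same M*, which the quadratic
# specification below approximates.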

model_tkc <- plm(
  tokens ~ maturity + I(maturity^2) + experience + agent_autonomy,
  data   = pdata, model = "within", effect = "twoways"
)
summary(model_tkc)

# Turning point M* and delta-method 95% CI
coefs <- coef(model_tkc)
V     <- vcovHC(model_tkc, type = "HC1", cluster = "group")
phi1  <- coefs["maturity"]
phi2  <- coefs["I(maturity^2)"]
Mstar <- -phi1 / (2 * phi2)
cat("Estimated M* =", round(Mstar, 3), "\n")

# Delta-method variance for M* = -phi1/(2*phi2)
# d(M*)/d(phi1) =  -1/(2*phi2)
# d(M*)/d(phi2) =  phi1/(2*phi2^2)
g     <- c(-1 / (2*phi2), phi1 / (2*phi2^2))
names(g) <- c("maturity", "I(maturity^2)")
se_Mstar  <- sqrt(drop(t(g) %*% V[names(g), names(g)] %*% g))  # scalar SE
cat("95% CI: [", round(Mstar - 1.96*se_Mstar, 3), ",",
    round(Mstar + 1.96*se_Mstar, 3), "]\n")

# Sasabuchi test for genuine inverted-U (run only if invU is installed)
if (requireNamespace("invU", quietly = TRUE)) {
  invU_test(tokens ~ maturity + I(maturity^2), data = data)
}

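# LOESS visualisation (fitted on a random subsample, since loess is slow
# and memory-intensive on all 100,000 rows)
library(ggplot2)
ggplot(dplyr::slice_sample(data, n = 5000),
       aes(x = maturity, y = tokens)) +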
  geom_smooth(method = "loess", se = TRUE,
              color = "#1a5276", fill = "#aed6f1") +
  geom_vline(xintercept = Mstar, linetype = "dashed",
             color = "#922b21") +
  labs(title    = "Token Kuznets Curve",
       subtitle = paste0("Estimated M* = ", round(Mstar, 3)),
       x = "Ecosystem Maturity (logistic, cohort-adjusted)",
       y = "Token Consumption (millions)") +
  theme_minimal(base_size = 12)

A.4 Jevons Paradox Estimation (Revised IV)

# -- H3: Jevons Paradox (revised IV: platform discount * lag price)

model_jevons_ols <- lm(
  log(tokens) ~ log(price_token) + log(efficiency),
  data = data
)

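# Note: efficiency as generated in A.1 varies only by year
# (log(efficiency) = 0.10 * year), so it is collinear with the year
# effects under effect = "twoways". Identifying eta_E in this
# specification requires efficiency to vary across developers or
# platforms within a year.
model_jevons_fe <- plm(
  log(tokens) ~ log(price_token) + log(efficiency) +
    experience + agent_autonomy,
  data = pdata, model = "within", effect = "twoways"
)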

# Platform IV: instrument = platform_discount * lag(price_base)
# Cross-sectional variation from platform; time-series from lag price
model_jevons_iv <- ivreg(
  log(tokens) ~ log(price_token) + log(efficiency) +
    experience + agent_autonomy |
    iv_instrument  + log(efficiency) +
    experience + agent_autonomy,
  data = data
)
summary(model_jevons_iv, diagnostics = TRUE)
# First-stage F should be >> 10 (expected ~480)

# One-sided test H0: eta_E <= 1 (strong Jevons condition)
coefs_j <- coef(model_jevons_fe)
se_j    <- sqrt(diag(vcovHC(model_jevons_fe,
                             type = "HC1", cluster = "group")))
t_stat  <- (coefs_j["log(efficiency)"] - 1) /
           se_j["log(efficiency)"]
p_val   <- pt(t_stat, df = model_jevons_fe$df.residual,
              lower.tail = FALSE)
cat("Jevons H3b test: t =", round(t_stat, 3),
    ", p =", round(p_val, 4), "\n")

# Arellano-Bond GMM (lags 2:3 => years 3-10 => N=80,000)
model_gmm <- pgmm(
  log(tokens) ~ lag(log(tokens), 1) + log(price_token) +
    log(efficiency) |
    lag(log(tokens), 2:3),
  data = pdata, effect = "twoways", model = "twosteps"
)
summary(model_gmm, robust = TRUE)
# Overidentification (Hansen J) test: plm's summary output labels it
# "Sargan test" and reports the statistic and p-value

# Heterogeneous effects by experience quartile
data$exp_quartile <- cut(data$experience,
                          breaks = quantile(data$experience,
                                           probs = 0:4/4),
                          labels = c("Q1","Q2","Q3","Q4"),
                          include.lowest = TRUE)

het_results <- lapply(c("Q1","Q2","Q3","Q4"), function(q) {
  sub  <- pdata.frame(data[data$exp_quartile == q, ],
                       index = c("developer_id","year"))
  # Production function
  m1 <- plm(log(software_output) ~ log(tokens) + log(experience) +
              agent_autonomy,
            data = sub, model = "within", effect = "twoways")
  # Jevons
  m2 <- plm(log(tokens) ~ log(price_token) + log(efficiency) +
              experience + agent_autonomy,
            data = sub, model = "within", effect = "twoways")
  list(quartile = q,
       beta1    = coef(m1)["log(tokens)"],
       eta_E    = coef(m2)["log(efficiency)"])
})
do.call(rbind, lapply(het_results, as.data.frame))