Trading in Digital Financial Markets: A Microeconometric Analysis of Retail Investment Demand in the United Arab Emirates
Abstract
This paper develops and estimates a micro-founded trading demand function for retail investors in the United Arab Emirates using a synthetic dataset of 10,000 digitally onboarded clients calibrated to the FAB Securities onboarding framework. We decompose investment behavior into an extensive margin (participation) and an intensive margin (investment scale and trading frequency), estimating a binary logit model, two cumulative link ordered logit models, a censored Tobit regression, and a two-part negative binomial hurdle model. Average marginal effects from the participation model show that a one-unit increase in the risk tolerance index raises the probability of trading by 11.5 percentage points, while income and experience add 6.7 and 4.5 percentage points, respectively. Conditional on participation, sophisticated investor classification increases planned investment intensity by 0.784 log-odds units, and graduate-level education adds approximately 1.15–1.31 log-odds units relative to a Bachelor’s degree. Trading frequency is overwhelmingly determined by experience (1.768***) and financial knowledge, with expert investors trading at substantially higher rates than the reference group. The hurdle model confirms a clean two-stage decision process: income and risk drive market entry, while experience and knowledge drive subsequent intensity. A Monte Carlo validation exercise recovers structural parameters with negligible bias. These results contribute to the household finance literature by quantifying the joint determinants of digital financial market participation and demand intensity in a rapidly growing Gulf Cooperation Council economy.
JEL Classification: G11, G41, D14, C35, O16
Keywords: retail investor behavior, digital financial markets, extensive and intensive margin, ordered logit, hurdle model, UAE, household finance, financial participation
1. Introduction
The digital transformation of retail financial services has created an unprecedented opportunity to study investor behavior at the micro level. Digital onboarding platforms now collect granular data on investor demographics, risk preferences, financial knowledge, and behavioral intentions at the point of market entry, enabling researchers to construct comprehensive structural models of investment demand (Campbell, 2006). In the Gulf Cooperation Council (GCC) region, and the United Arab Emirates in particular, this process has been accelerated by regulatory modernization and a rapidly expanding retail investor base spanning both Emirati nationals and a large, diverse expatriate population.
The theoretical underpinning of retail participation is well established. Haliassos & Bertaut (1995) demonstrate that fixed entry costs can rationalize non-participation by households with positive expected returns, while Viceira (2001) shows that labor income risk critically shapes the risky asset share over the life cycle. Vissing-Jørgensen (2002) quantifies the role of participation costs and finds that eliminating fixed entry barriers could raise stockholding substantially. The more recent household finance literature (Guiso & Sodini, 2013) extends this to document heterogeneity in financial literacy, risk attitudes, and portfolio sophistication as first-order determinants of investment behavior. Complementary evidence from Fagereng et al. (2017), using error-free Norwegian registry data, confirms a double life-cycle adjustment: a rebalancing away from stocks as investors approach retirement, and market exit post-retirement. These mechanisms are governed by distinct entry-cost and wealth-accumulation channels that mirror the two-stage decomposition we propose here.
Yet the existing empirical literature suffers from two limitations that this paper addresses. First, most studies rely on household surveys (e.g., the Survey of Consumer Finances, the Dutch DNB Household Panel) that capture ex post portfolio positions rather than the ex ante investment intentions elicited at the moment of market entry. Second, the GCC financial market context is almost entirely absent from the international empirical literature, despite accounting for substantial and growing shares of emerging market retail investment. Al-Tamimi (2006) provides an early examination of UAE investor behavior but without the micro-level modeling infrastructure now available.
This paper makes three contributions. First, we propose a unified micro-econometric framework that jointly models the participation decision (extensive margin) and investment demand (intensive margin) using a structural data generating process (DGP) calibrated to actual FAB Securities digital onboarding data. Second, we estimate five complementary econometric models — binary logit, two ordered logit cumulative link models (CLMs), a censored Tobit, and a negative binomial hurdle model — enabling cross-model robustness assessment and a clear decomposition of entry versus intensity effects. Third, we report average marginal effects (AMEs) from the participation model alongside log-odds coefficients, facilitating interpretability for policy applications in investor protection and financial inclusion.
Our main findings can be summarized as follows. Risk tolerance, income, and trading experience are the dominant drivers of market entry, with risk tolerance exhibiting the largest average marginal effect (11.5 percentage points). Conditional on participation, advanced education and sophisticated investor classification are the primary determinants of investment scale, while experience and financial knowledge dominate trading frequency. The hurdle model reveals a clean separation between these two margins, consistent with the theoretical prediction that distinct behavioral mechanisms govern entry and intensity. Log(theta) from the negative binomial count component is large and precisely estimated (19.983***, SE = 1.087), indicating negligible overdispersion and a near-deterministic relationship between knowledge and experience in determining trading frequency conditional on entry.
1.1 Related Literature
This paper contributes to four interrelated strands of research.
Household finance and portfolio participation. The household finance literature, surveyed by Campbell (2006) and Guiso & Sodini (2013), establishes that market non-participation is a rational response to fixed entry costs, background risks, and information frictions. Haliassos & Bertaut (1995) and Vissing-Jørgensen (2002) formalize this via fixed-cost models; Cocco et al. (2005) trace the life-cycle trajectory of the risky asset share under non-tradable labor income. The empirical decomposition of participation into extensive and intensive margins follows Calvet et al. (2007), who use Swedish registry data to show that education is the primary predictor of portfolio efficiency conditional on market entry. Fagereng et al. (2017) deepen this life-cycle evidence using Norwegian panel data, documenting that asset market entry and the conditional risky share are governed by distinct parameter sets — an empirical regularity that directly motivates our hurdle specification. In their framework, entry is primarily governed by the participation cost relative to expected surplus, while the conditional portfolio share is shaped by wealth accumulation and experience: precisely the separation we document in Proposition 1.
Financial literacy as a participation barrier. A second strand establishes financial knowledge as a structural participation barrier. Rooij et al. (2011), using the Dutch DNB Household Panel, find that low-literacy households are significantly less likely to invest in stocks, with the literacy effect operating independently of income and wealth. Lusardi & Mitchell (2014) synthesize the early evidence, while Klapper & Lusardi (2020), drawing on surveys across more than 140 countries, confirm that financial literacy falls below 50% across most emerging economies and is a robust predictor of financial resilience. The most recent synthesis by Lusardi & Mitchell (2023), reviewing two decades of accumulated research, demonstrates that financial knowledge operates not merely as a participation barrier but as an intensity amplifier: literate investors diversify better, trade more strategically, and accumulate substantially more wealth over the life cycle. Our intensive-margin results — particularly the steep, monotone knowledge gradient in trading frequency spanning 4.3 log-odds units from “No knowledge” to “Expert” — are directly interpretable through this lens and provide one of the first quantifications of this amplification dynamic in a GCC digital brokerage context.
Behavioral finance and individual investor trading. Individual investor behavior departs from the rational benchmark in ways that interact critically with our margin decomposition. Barber & Odean (2000) document that the average retail investor underperforms passive benchmarks due to excessive trading, while Barber & Odean (2001) establish that overconfidence — more prevalent among men — amplifies turnover. The comprehensive review by Barber & Odean (2013) synthesizes this literature, documenting the disposition effect, attention-driven buying, and naïve reinforcement learning as characteristic retail investor behaviors. Crucially, Seru et al. (2010) show that experience attenuates these biases over time, as repeated trading accumulates market-specific human capital. This learning-by-doing mechanism provides the micro-foundation for the dominant role of experience in our frequency equation. Dimmock & Kouwenberg (2010) complement this behavioral evidence with direct measurement of loss aversion from a Dutch household survey, finding that higher loss aversion substantially reduces both market entry and the conditional portfolio share — a result that maps naturally onto our theoretical framework where \gamma_i enters both the participation surplus and the optimal risky demand.
Digital financial platforms. The emergence of low-cost digital brokerages and robo-advisory platforms has transformed retail investment. D’Acunto et al. (2019), studying a large-scale robo-advisory deployment, show that digitally onboarded investors improve portfolio diversification relative to pre-adoption behavior, with the largest gains among investors who were ex ante underdiversified. This finding implies that the structured questionnaire process has informational value: by eliciting risk tolerance, investment horizon, and financial knowledge, digital platforms shape the investment demand formation process. D’Acunto & Rossi (2021) extend this analysis with a taxonomy of robo-advisory systems along four dimensions — personalization, discretion, investor involvement, and human interaction — arguing that onboarding interface design feeds back into risk-tolerance elicitation in ways consequential for subsequent portfolio intensity. These insights apply directly to the FAB Securities onboarding framework we calibrate, where the structured questionnaire is not a passive data-collection instrument but an active component of demand formation.
GCC and MENA financial markets. The GCC financial context has received limited attention in the household finance literature. Al-Tamimi (2006) provides an early cross-sectional analysis of UAE investor behavior. Abuzayed et al. (2021) document substantial systemic risk spillovers between global and GCC equity markets during the COVID-19 pandemic, demonstrating that experienced GCC investors must navigate a more complex, crisis-prone return environment than investors in isolated markets. This market structure is relevant to interpreting our experience coefficient: the premium on experience in the UAE may partly reflect skills for managing cross-market contagion risk, beyond the generic learning-by-doing mechanism of Seru et al. (2010).
The remainder of the paper is organized as follows. Section 2 develops the theoretical framework. Section 3 describes the data and simulation design. Section 4 presents the empirical strategy. Section 5 discusses the estimation results. Section 6 reports the Monte Carlo validation. Section 7 interprets the findings and their implications. Section 8 concludes.
2. Theoretical Framework
2.1 Investor Optimization Problem
Consider a risk-averse investor i with initial wealth W_i who allocates a fraction \alpha_i \in [0,1] to a risky asset with stochastic return \tilde{r}_i and the remainder to a risk-free asset earning r_f. Terminal wealth is:
W_i' = (1 - \alpha_i)\,W_i\,(1 + r_f) + \alpha_i\,W_i\,(1 + \tilde{r}_i). \tag{1}
The investor maximizes expected utility under a mean-variance objective (Markowitz, 1952):
U_i = E[W_i'] - \frac{\gamma_i}{2}\,Var(W_i'), \tag{2}
where \gamma_i > 0 denotes the coefficient of absolute risk aversion, which we allow to vary across investors as a function of their elicited risk tolerance.
Assumption 1 (Distributional regularity). Returns \tilde{r}_i \sim \mathcal{N}(\mu_i, \sigma_i^2) are independently and identically distributed across assets and periods, and \sigma_i^2 > 0 for all i.
Under Assumption 1, the optimal risky asset share is:
\alpha_i^* = \frac{\mu_i - r_f}{\gamma_i\,\sigma_i^2}, \tag{3}
and the optimal investment demand (in currency units) is:
D_i^* = \alpha_i^*\,W_i = \frac{(\mu_i - r_f)\,W_i}{\gamma_i\,\sigma_i^2}. \tag{4}
Equation 4 embeds three structural channels that motivate our empirical specification: (i) income and wealth W_i, (ii) risk aversion \gamma_i (inversely related to risk tolerance), and (iii) return beliefs \mu_i, which we assume are shaped by financial knowledge and experience. The life-cycle trajectory of these components has been empirically traced by Cocco et al. (2005) and Fagereng et al. (2017), both of whom find that wealth accumulation and declining labor income risk over the life cycle jointly predict the inverted-U pattern of risky asset demand. In our cross-sectional framework, the age term in the investment equation captures this wealth-accumulation channel, while cross-sectional heterogeneity in \gamma_i, which Dimmock & Kouwenberg (2010) measure directly via loss-aversion parameters, motivates allowing risk tolerance to enter all three behavioral equations with potentially asymmetric effects across margins.
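The comparative statics of Equation 4 can be illustrated with a minimal Python sketch. The function names and all numerical inputs below are hypothetical and chosen only for illustration; they are not calibrated values from the paper. The sketch also leaves the share unconstrained, so it does not enforce \alpha_i \in [0,1].

```python
def optimal_share(mu, r_f, gamma, sigma):
    """Risky share implied by Equation 4 (demand divided by wealth)."""
    if gamma <= 0 or sigma <= 0:
        raise ValueError("gamma and sigma must be strictly positive")
    return (mu - r_f) / (gamma * sigma ** 2)


def optimal_demand(mu, r_f, gamma, sigma, wealth):
    """Optimal demand in currency units, Equation 4."""
    return optimal_share(mu, r_f, gamma, sigma) * wealth


# Hypothetical inputs (assumptions, not taken from the paper).
mu, r_f, sigma, wealth = 0.08, 0.03, 0.20, 500_000.0

# Doubling risk aversion halves both the share and the demand.
share_low_aversion = optimal_share(mu, r_f, gamma=2.0, sigma=sigma)
share_high_aversion = optimal_share(mu, r_f, gamma=4.0, sigma=sigma)
demand_low_aversion = optimal_demand(mu, r_f, 2.0, sigma, wealth)
demand_high_aversion = optimal_demand(mu, r_f, 4.0, sigma, wealth)
```

Under these assumed inputs, demand scales linearly in wealth and inversely in risk aversion, which is the pattern the three channels above describe.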
2.2 Participation Decision
Entry into the market requires a fixed cost F_i > 0 (search costs, learning time, platform setup), modeled as a random variable with distribution function G(\cdot). Investor i participates if and only if the utility gain from optimal demand exceeds the entry cost:
T_i = \mathbf{1}\bigl[V_i(D_i^*) - V_i(0) > F_i\bigr], \tag{5}
where V_i(\cdot) is the indirect utility function evaluated at the optimal portfolio. Substituting Equation 2 and Equation 4 gives:
V_i(D_i^*) - V_i(0) = \frac{(\mu_i - r_f)^2}{2\,\gamma_i\,\sigma_i^2}. \tag{6}
The surplus in Equation 6 is increasing in the Sharpe ratio (\mu_i - r_f)/\sigma_i and decreasing in risk aversion \gamma_i. This formulation implies that investors with higher loss aversion — who effectively perceive \gamma_i as larger in the domain of losses than of gains — will be systematically less likely to participate, consistent with the empirical finding of Dimmock & Kouwenberg (2010) that loss-averse Dutch households hold significantly less equity independently of income and wealth. Linearizing and projecting onto observable covariates yields the reduced-form latent index:
U_i^* = \beta_0 + \beta_1 Y_i + \beta_2 R_i + \beta_3 E_i + \beta_4 K_i + \beta_5 (Y_i \times R_i) + \beta_6 Y_i^2 + \beta_7 S_i + \varepsilon_i, \tag{7}
with T_i = \mathbf{1}[U_i^* > 0], where Y_i is income, R_i is risk tolerance, E_i is trading experience, K_i is financial knowledge, S_i is an indicator for speculative investor type, and \varepsilon_i \sim \text{Logistic}(0,1), giving the binary logit model. The knowledge variable K_i enters as a shifter of effective entry costs: financially literate investors face lower information-processing costs and form better-calibrated return beliefs \mu_i, a mechanism formalized by Rooij et al. (2011) and confirmed at global scale by Klapper & Lusardi (2020) and Lusardi & Mitchell (2023).
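The entry rule and its reduced form can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the coefficient vector `beta`, the fixed cost, and the covariate values are hypothetical, and the function names are ours rather than part of the paper's estimation code.

```python
import math


def entry_surplus(mu, r_f, gamma, sigma):
    """Utility gain from entering at the optimal portfolio (Equation 6)."""
    if gamma <= 0 or sigma <= 0:
        raise ValueError("gamma and sigma must be strictly positive")
    return (mu - r_f) ** 2 / (2.0 * gamma * sigma ** 2)


def participates(mu, r_f, gamma, sigma, fixed_cost):
    """Entry rule of Equation 5: enter only if the surplus exceeds F_i."""
    return entry_surplus(mu, r_f, gamma, sigma) > fixed_cost


def logistic_cdf(v):
    """Numerically stable standard logistic CDF."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def participation_probability(beta, y, r, e, k, s):
    """Pr(T_i = 1) implied by Equation 7 with a standard logistic error."""
    index = (beta[0] + beta[1] * y + beta[2] * r + beta[3] * e + beta[4] * k
             + beta[5] * y * r + beta[6] * y ** 2 + beta[7] * s)
    return logistic_cdf(index)


# Hypothetical coefficients beta_0..beta_7 (assumptions for illustration).
beta = [-2.0, 0.4, 0.7, 0.3, 0.2, 0.05, -0.02, 0.4]
p_low_risk = participation_probability(beta, y=3, r=1, e=2, k=3, s=0)
p_high_risk = participation_probability(beta, y=3, r=3, e=2, k=3, s=0)

# Structural rule: the same assumed fixed cost deters the more risk-averse investor.
enters_low_aversion = participates(0.08, 0.03, 2.0, 0.20, fixed_cost=0.01)
enters_high_aversion = participates(0.08, 0.03, 4.0, 0.20, fixed_cost=0.01)
```

The sketch shows the two views side by side: the structural rule compares a surplus with a cost, while the reduced form maps the same forces into a probability through the logistic link.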
2.3 Investment Intensity
Conditional on participation (T_i = 1), investment demand D_i = D_i^*\,\mathbf{1}[T_i = 1] is determined by:
D_i^* = \alpha_0 + \alpha_1 Y_i + \alpha_2 R_i + \alpha_3 E_i + \alpha_4 H_i + \alpha_5 (Y_i \times R_i) + \alpha_6 A_i + \alpha_7 A_i^2 + \alpha_8 P_i + u_i, \tag{8}
where H_i indicates graduate-level education (Master, PhD, or Professional), A_i is age (allowing a quadratic life-cycle profile), P_i indicates sophisticated investor classification, and u_i \sim \mathcal{N}(0, \sigma_u^2). Since D_i is observed only in bracketed AED categories, we estimate Equation 8 as a cumulative link ordered logit model. The education term H_i captures the portfolio-sophistication channel documented by Calvet et al. (2007), whereby higher-educated investors hold better-diversified portfolios and deploy capital more efficiently conditional on market entry. The quadratic age term allows for the life-cycle wealth-accumulation profile of Cocco et al. (2005), with the conditional investment level rising through middle age as financial assets accumulate.
2.4 Trading Frequency
The trading frequency decision is modeled as a latent index:
F_i^* = \gamma_0 + \gamma_1 E_i + \gamma_2 \mathbf{1}[\text{Aggressive}_i] + \gamma_3 K_i^{adv} + \gamma_4 S_i + \gamma_5 (E_i \times K_i) + \eta_i, \tag{9}
where \mathbf{1}[\text{Aggressive}_i] indicates an aggressive or mixed trading strategy, K_i^{adv} indicates advanced knowledge, and \eta_i \sim \text{Logistic}(0,1). The dominant role of experience in this equation is grounded in the learning-by-doing theory of Seru et al. (2010), who show that trading experience progressively reduces behavioral biases — including the disposition effect and overconfidence-driven turnover documented by Barber & Odean (2013) — shifting investor behavior toward the rational benchmark. The speculative investor indicator S_i captures the propensity of overconfident investors to trade at higher frequency, consistent with the gender and overconfidence evidence of Barber & Odean (2001). The correlation structure \Sigma among (\varepsilon_i, u_i, \eta_i) with off-diagonal elements (0.55, 0.45, 0.65) introduces sample selection considerations addressed through the two-part hurdle model.
Proposition 1. Under the DGP specified by Equations 7–9, the extensive and intensive margins are governed by distinct parameter sets: income and risk tolerance dominate participation, while experience and knowledge dominate frequency conditional on entry.
Proposition 1 motivates the hurdle model specification and is empirically tested in Section 5.
3. Data and Simulation Design
3.1 Data Source and Construction
The dataset consists of 10,000 synthetic observations generated via a structural DGP calibrated to the FAB Securities digital onboarding framework for UAE retail investors. The simulation uses a single random seed (2024) applied once prior to all data generation steps, ensuring full reproducibility. The DGP imposes correlated latent errors across the three behavioral equations via a multivariate normal draw with the correlation matrix \Sigma = \begin{pmatrix} 1.00 & 0.55 & 0.45 \\ 0.55 & 1.00 & 0.65 \\ 0.45 & 0.65 & 1.00 \end{pmatrix}, capturing unobserved heterogeneity common to participation, investment, and frequency decisions.
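One common way to impose such a correlation structure is to multiply independent standard normal draws by the Cholesky factor of \Sigma. The Python sketch below illustrates that technique; it is an assumption about how the draw could be implemented, not the paper's code, and seeding Python's generator with 2024 will not reproduce the paper's dataset.

```python
import math
import random

# Correlation matrix of the three latent errors, as stated in the text.
SIGMA = [[1.00, 0.55, 0.45],
         [0.55, 1.00, 0.65],
         [0.45, 0.65, 1.00]]


def cholesky(matrix):
    """Lower-triangular factor L with L L' = matrix (positive definite input)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def draw_errors(n, lower, rng):
    """Draw n correlated error triples as L z with z standard normal."""
    size = len(lower)
    draws = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(size)]
        draws.append([sum(lower[i][k] * z[k] for k in range(i + 1))
                      for i in range(size)])
    return draws


def correlation(xs, ys):
    """Sample Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


rng = random.Random(2024)  # illustrative seed only
lower = cholesky(SIGMA)
errors = draw_errors(10_000, lower, rng)
columns = list(zip(*errors))
sample_corr_12 = correlation(columns[0], columns[1])
sample_corr_13 = correlation(columns[0], columns[2])
sample_corr_23 = correlation(columns[1], columns[2])
```

With 10,000 draws the sample correlations should sit close to the targets 0.55, 0.45, and 0.65, up to sampling noise.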
3.2 Variable Definitions
Demographic covariates. Age is drawn from a truncated normal distribution with mean 45.4 and standard deviation 9.2 years (range: 25–70), consistent with UAE capital market client demographics. Place of birth covers 30 origin countries, with Dubai (9.2%), India (9.2%), and Egypt (8.0%) as the three largest groups, reflecting the UAE’s expatriate population structure. Education is distributed across five levels (High School through Professional qualification) with 35.3% holding Bachelor’s degrees, 30.3% Master’s degrees, and 14.7% PhDs.
Financial characteristics. Income is coded on a five-point ordinal scale in AED (<250k, 250k–500k, 500k–1M, 1M–3M, >3M) with modal category 500k–1M (29.7%). Financial liabilities are log-transformed for modelling (\overline{\log(\text{liabilities})} = 10.80, SD = 0.91, range: 5.76–14.90) to address scale heterogeneity.
Trading attributes. Risk tolerance is elicited on a three-point scale (Low, Medium, High), with Medium as the modal category (60.0%). Trading experience spans five levels (None through >3 years). Financial knowledge spans five levels (No knowledge through Expert) with Intermediate as the modal category (30.0%). Trading strategy covers four categories (Conservative, Moderate, Aggressive, Mixed approach), with Moderate (30.0%) and Mixed approach (30.0%) as the most common.
Investor type. Investors are classified into Retail (55.0%), Sophisticated (25.0%), Speculative (15.0%), and Institutional (5.0%) segments, consistent with UAE regulatory categorization under SCA Board Decision No. (13/RM) of 2020.
Outcome variables. Trading participation (traded_before) is binary, with 76.2% of the sample having traded previously. Planned investment is observed in seven AED brackets, with the DGP calibrated (scale factor +9.8 on the log investment index) to produce a realistic distribution: 3.3% below AED 50,000, 6.5% in the 50k–100k bracket, 15.9% in 100k–250k, 17.6% in 250k–500k, 17.6% in 500k–1M, 23.0% in 1M–3M, and 16.1% above AED 3M. Trading frequency is discretized into four equal-probability quartile categories (Very Low, Low, Medium, High), each at 25.0%.
3.3 Descriptive Statistics
Table 1 reports summary statistics for the key numerical covariates. The pairwise correlations among outcome variables are 0.312 for (traded, investment intensity), 0.215 for (traded, frequency), and 0.362 for (investment, frequency). They confirm the moderate cross-margin dependence built into the DGP through the correlated error structure, while leaving sufficient independent variation for separate identification of each margin.
| Variable | Mean | SD | Min | Median | Max |
|---|---|---|---|---|---|
| Panel A: Continuous covariates | |||||
| Age (years) | 45.36 | 9.21 | 25.00 | 45.00 | 70.00 |
| Income (ordinal, 1–5) | 2.85 | 1.17 | 1.00 | 3.00 | 5.00 |
| Risk tolerance (1–3) | 1.99 | 0.60 | 1.00 | 2.00 | 3.00 |
| Experience (0–4) | 2.13 | 1.37 | 0.00 | 2.00 | 4.00 |
| Knowledge (1–5) | 3.00 | 1.12 | 1.00 | 3.00 | 5.00 |
| log(Liabilities) | 10.80 | 0.91 | 5.76 | 10.78 | 14.90 |
| Panel B: Binary / indicator covariates | |||||
| Traded before | 0.762 | 0.426 | 0 | 1 | 1 |
| Has other brokers | 0.298 | 0.458 | 0 | 0 | 1 |
| Panel C: Outcome variable distributions | |||||
| Planned investment (% of sample) | |||||
| <AED 50k | 3.26 | — | — | — | — |
| AED 50k–100k | 6.45 | — | — | — | — |
| AED 100k–250k | 15.88 | — | — | — | — |
| AED 250k–500k | 17.63 | — | — | — | — |
| AED 500k–1M | 17.63 | — | — | — | — |
| AED 1M–3M | 23.03 | — | — | — | — |
| AED >3M | 16.12 | — | — | — | — |
| Trading frequency (% of sample; all traders only) | |||||
| Very Low | 25.00 | — | — | — | — |
| Low | 25.00 | — | — | — | — |
| Medium | 25.00 | — | — | — | — |
| High | 25.00 | — | — | — | — |
Notes: N = 10{,}000 observations. Panel A reports statistics for ordinal and continuous regressors. Panel B reports means (= proportions) for binary indicators. Panel C reports category shares for the two intensive-margin outcomes. Ordinal income, risk tolerance, experience, and knowledge are integer-coded according to the mapping described in Section 3.2.
4. Empirical Strategy
Our empirical design decomposes retail investment behavior into two stages following the sample selection framework of Heckman (1979). The extensive margin captures the binary entry decision; the intensive margin conditions on entry and models the scale and frequency of investment. Throughout, we denote the vector of covariates for investor i as \bm{x}_i = (Y_i, \tilde{\ell}_i, R_i, E_i, \bm{K}_i, \bm{H}_i, B_i, \bm{P}_i, A_i)', where \tilde{\ell}_i = \log(\text{Liabilities}_i + 1) and the bold terms collect factor-level indicator vectors.
4.1 Extensive Margin: Binary Logit Model
We model the probability of prior trading as:
\Pr(T_i = 1 \mid \bm{x}_i) = \Lambda(\bm{x}_i'\,\bm{\beta}), \tag{10}
where \Lambda(\cdot) is the standard logistic CDF. Since logit coefficients are in log-odds units, we compute average marginal effects (AMEs) to facilitate economic interpretation (Cameron & Trivedi, 2005):
\widehat{\text{AME}}_k = \frac{1}{n} \sum_{i=1}^n \hat{\beta}_k\, \Lambda(\bm{x}_i'\,\hat{\bm{\beta}})\, [1 - \Lambda(\bm{x}_i'\,\hat{\bm{\beta}})], \tag{11}
for continuous covariates, with analogous finite-difference expressions for categorical variables.
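Equation 11 can be written out directly. The following Python sketch uses a hypothetical coefficient vector and a four-row toy design matrix (both are assumptions for illustration); the paper's own AMEs come from margins::margins() in R.

```python
import math


def logistic_cdf(v):
    """Numerically stable standard logistic CDF."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def average_marginal_effect(beta, rows, k):
    """Equation 11 for continuous covariate k.

    Each row of `rows` is a covariate vector whose first entry is 1.0
    for the intercept, so it lines up with `beta`.
    """
    total = 0.0
    for x in rows:
        p = logistic_cdf(sum(b * v for b, v in zip(beta, x)))
        total += beta[k] * p * (1.0 - p)
    return total / len(rows)


# Hypothetical coefficients: intercept, income, risk tolerance.
beta = [-1.0, 0.4, 0.7]
# Toy design matrix (assumed values): [1, income, risk].
rows = [[1.0, 1.0, 1.0],
        [1.0, 3.0, 2.0],
        [1.0, 5.0, 3.0],
        [1.0, 2.0, 2.0]]

ame_income = average_marginal_effect(beta, rows, 1)
ame_risk = average_marginal_effect(beta, rows, 2)
```

Because every continuous covariate shares the same averaged density term, the ratio of two AMEs equals the ratio of the corresponding logit coefficients, and each AME is bounded above by one quarter of its coefficient.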
4.2 Intensive Margin I: Ordered Logit for Investment Scale
Conditional on T_i = 1, planned investment is observed in ordered category j \in \{1,\ldots,7\}. Let D_i^* = \bm{x}_i'\,\bm{\alpha} + u_i be a latent continuous index, with u_i \sim \text{Logistic}(0,1). We observe:
D_i = j \quad \iff \quad \tau_{j-1} < D_i^* \leq \tau_j, \tag{12}
for threshold parameters \tau_0 < \tau_1 < \cdots < \tau_7, with \tau_0 = -\infty and \tau_7 = +\infty so that only the six interior thresholds are estimated. The cumulative link model is estimated by maximum likelihood via ordinal::clm with flexible thresholds. The original specification triggered a near-unidentifiability warning; replacing raw liabilities with \log(\text{Liabilities}+1) reduces the condition number of the Hessian from 3.0\times10^{13} to 2.3\times10^6, so the reported standard errors are not affected by that warning.
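The mapping from the latent index to the seven bracket probabilities in Equation 12 can be sketched as follows. The cutpoint values and the function name are hypothetical; only the structure (differences of logistic CDFs at consecutive thresholds) follows the model described above.

```python
import math


def logistic_cdf(v):
    """Numerically stable standard logistic CDF."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def category_probabilities(index, cutpoints):
    """Pr(D_i = j) for j = 1..J given the latent index x'alpha.

    `cutpoints` holds the J-1 finite thresholds in increasing order;
    the outer thresholds are -infinity and +infinity.
    """
    if any(b <= a for a, b in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must be strictly increasing")
    cumulative = [logistic_cdf(t - index) for t in cutpoints] + [1.0]
    probabilities = []
    previous = 0.0
    for value in cumulative:
        probabilities.append(value - previous)
        previous = value
    return probabilities


# Hypothetical thresholds for seven investment brackets (assumed values).
CUTPOINTS = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0]

probs_low_index = category_probabilities(-0.5, CUTPOINTS)
probs_high_index = category_probabilities(1.5, CUTPOINTS)
```

Raising the index shifts probability mass toward the higher brackets, which is how a positive coefficient in the ordered logit should be read.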
4.3 Intensive Margin II: Ordered Logit for Trading Frequency
Analogously, trading frequency is modelled as an ordered response with four categories. The latent index F_i^* follows Equation 9, and the cumulative link model is estimated with logit link. The specification includes trading strategy and product type as additional regressors absent from the investment scale model, consistent with the distinct structural mechanisms governing frequency.
4.4 Tobit Robustness Check
Treating the ordinal investment index as an approximately continuous variable censored from below at 1 and above at 7, we estimate:
D_i^{\text{Tobit}} = \max\!\bigl(1,\,\min(7,\, \bm{x}_i'\,\bm{\delta} + v_i)\bigr), \quad v_i \sim \mathcal{N}(0, \sigma_v^2), \tag{13}
by maximum likelihood via AER::tobit. This model serves as a robustness check for the ordered logit and is interpreted accordingly.
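For reference, the per-observation log-likelihood of a two-limit Tobit with the bounds in Equation 13 can be sketched in Python. This is a generic textbook form written under our own naming; it is not the internals of AER::tobit, and the numerical inputs are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_logpdf(z):
    """Log density of the standard normal."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def censor(latent, lower=1.0, upper=7.0):
    """Observation rule of Equation 13."""
    return max(lower, min(upper, latent))


def tobit_loglik(y, xb, sigma, lower=1.0, upper=7.0):
    """Log-likelihood contribution of one observation.

    Left-censored: log Pr(latent <= lower).
    Right-censored: log Pr(latent >= upper).
    Interior: log normal density of the residual, scaled by sigma.
    """
    if sigma <= 0:
        raise ValueError("sigma must be strictly positive")
    if y <= lower:
        return math.log(normal_cdf((lower - xb) / sigma))
    if y >= upper:
        return math.log(normal_cdf((xb - upper) / sigma))
    return normal_logpdf((y - xb) / sigma) - math.log(sigma)


# Hypothetical linear index and scale (assumed values).
xb, sigma = 4.0, 1.5
ll_left = tobit_loglik(1.0, xb, sigma)
ll_interior = tobit_loglik(4.0, xb, sigma)
ll_right = tobit_loglik(7.0, xb, sigma)
```

With a symmetric index of 4 between the bounds 1 and 7, the two censored contributions coincide, which is a quick sanity check on the sign conventions.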
4.5 Two-Part Hurdle Model
To jointly model the participation and frequency decisions while allowing for distinct mechanisms at each margin, we specify a two-part negative binomial hurdle model (Cragg, 1971; Mullahy, 1986):
\begin{align} \Pr(C_i = 0 \mid \bm{z}_i) &= 1 - \Lambda(\bm{z}_i'\,\bm{\pi}), \\ C_i \mid C_i > 0 &\sim \text{TruncNegBin}(\mu_i, \theta), \end{align} \tag{14}
where C_i \in \{0,1,2,3,4\} is the trading frequency count (with C_i = 0 for non-traders), \bm{z}_i is a subset of \bm{x}_i entering the zero-hurdle component (income, log-liabilities, risk, experience, knowledge, other-broker indicator, age), and \mu_i = \exp(\bm{w}_i'\,\bm{\gamma}) is the conditional count mean with \bm{w}_i comprising experience, knowledge, strategy, log-liabilities, and age. We use the negative binomial count distribution to allow for overdispersion, resolving the degenerate Hessian (NaN standard errors) that arises under the Poisson specification when the discrete count range 1–4 is too cleanly predicted.
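The two parts of Equation 14 combine into a single probability mass function. The Python sketch below uses the mean-dispersion parameterization of the negative binomial with hypothetical parameter values; note that a zero-truncated negative binomial has support on all positive integers, so the sketch does not impose the upper bound of 4 on the observed count.

```python
import math


def logistic_cdf(v):
    """Numerically stable standard logistic CDF."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def negbin_logpmf(c, mu, theta):
    """Log pmf of a negative binomial with mean mu and dispersion theta."""
    return (math.lgamma(c + theta) - math.lgamma(theta) - math.lgamma(c + 1)
            + theta * math.log(theta / (theta + mu))
            + c * math.log(mu / (theta + mu)))


def hurdle_pmf(c, hurdle_index, mu, theta):
    """Pr(C_i = c) under the two-part hurdle model of Equation 14."""
    p_enter = logistic_cdf(hurdle_index)
    if c == 0:
        return 1.0 - p_enter
    p_zero = math.exp(negbin_logpmf(0, mu, theta))
    truncated = math.exp(negbin_logpmf(c, mu, theta)) / (1.0 - p_zero)
    return p_enter * truncated


# Hypothetical values: z'pi = 1.2, conditional mean 2.5, theta = 2.0.
hurdle_index, mu, theta = 1.2, 2.5, 2.0
pmf = [hurdle_pmf(c, hurdle_index, mu, theta) for c in range(0, 201)]
total_mass = sum(pmf)
```

The zero probability depends only on the hurdle index, while the shape of the positive counts depends only on \mu_i and \theta; this is the separation between margins that the hurdle specification is designed to expose.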
4.6 Multicollinearity and Diagnostic Checks
Generalized variance inflation factors (GVIFs) for all models are reported in Table 5. The maximum GVIF^{1/(2 \times \text{df})} across all predictors is 1.007 (for risk tolerance), well below the conventional threshold of 3.16, confirming negligible multicollinearity. The complete separation check flags no coefficient with |\hat{\beta}| > 10 in the logit model.
5. Estimation Results
5.1 Extensive Margin: Participation Decision
Table 2 presents the logit estimates and average marginal effects for the trading participation equation. The model is estimated on all N = 10{,}000 observations, with null deviance 10,975.3 and residual deviance 9,844.7 on 9,982 degrees of freedom (AIC = 9,880.7).
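The reported fit statistics can be cross-checked with a few lines of arithmetic. The Python sketch below takes the sample size, the participation share of 0.762, and the residual deviance from the text, and counts 18 estimated parameters (the intercept plus the 17 slope terms listed in Table 2); the variable names are ours.

```python
import math

n_obs = 10_000
share_traded = 0.762          # participation rate reported in Section 3.2
n_params = 18                 # intercept + 17 slope terms in Table 2
residual_deviance = 9_844.7   # as reported


def null_deviance(n, p):
    """Deviance of an intercept-only binary logit with success share p."""
    return -2.0 * n * (p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


null_dev = null_deviance(n_obs, share_traded)
residual_df = n_obs - n_params
aic = residual_deviance + 2 * n_params
```

Under these inputs the intercept-only deviance, the residual degrees of freedom, and the AIC all line up with the figures quoted above.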
| Variable | Log-Odds Coef. | SE | AME | SE |
|---|---|---|---|---|
| Intercept | -1.974*** | 0.305 | — | — |
| Income (ordinal) | 0.418*** | 0.022 | 0.067 | 0.003 |
| log(Liabilities) | -0.014 | 0.021 | -0.002 | 0.003 |
| Risk tolerance | 0.716*** | 0.041 | 0.115 | 0.006 |
| Experience | 0.279*** | 0.019 | 0.045 | 0.003 |
| Knowledge (ref: Basic) | ||||
| No knowledge | -0.088 | 0.088 | -0.016 | 0.016 |
| Intermediate | -0.092 | 0.064 | -0.016 | 0.012 |
| Quite knowledgeable | 0.717*** | 0.074 | 0.108 | 0.011 |
| Expert | 0.680*** | 0.100 | 0.104 | 0.014 |
| Education (ref: Bachelor) | ||||
| High School | -0.026 | 0.076 | -0.004 | 0.013 |
| Master | 0.070 | 0.062 | 0.011 | 0.010 |
| PhD | 0.073 | 0.078 | 0.012 | 0.012 |
| Professional | 0.054 | 0.120 | 0.009 | 0.019 |
| Has other brokers | 0.032 | 0.055 | 0.005 | 0.009 |
| Investor type (ref: Institutional) | ||||
| Retail | -0.046 | 0.115 | -0.008 | 0.019 |
| Sophisticated | -0.073 | 0.120 | -0.012 | 0.020 |
| Speculative | 0.420** | 0.131 | 0.062 | 0.020 |
| Age | -0.000 | 0.003 | 0.000 | 0.000 |
Notes: Dependent variable: traded_before \in \{0,1\}. Estimation by maximum likelihood logit. AMEs computed via margins::margins() using Equation 11. Robust standard errors are reported in the SE columns. ^{***}p < 0.01; ^{**}p < 0.05; ^{*}p < 0.10.
Income. The income coefficient is \hat{\beta}_{\text{income}} = 0.418 (SE = 0.022, p < 0.001), implying an AME of 6.7 percentage points per ordinal income bracket. This is consistent with the theoretical prediction from Equation 4 that higher wealth raises the utility gain from market entry, reducing the effective barrier posed by fixed costs. The result aligns with international evidence from Rooij et al. (2011) (Dutch panel) and Fagereng et al. (2017) (Norwegian registry), both of whom find income and wealth to be primary determinants of the market entry decision.
Risk tolerance. Risk tolerance exerts the largest marginal effect: \hat{\beta}_{\text{risk}} = 0.716 (SE = 0.041, p < 0.001), AME = 0.115 (SE = 0.006). A move from Low to Medium risk tolerance, or Medium to High, raises the probability of trading by 11.5 percentage points on average. This is consistent with Vissing-Jørgensen (2002), who identifies risk aversion as a primary participation barrier, and with the loss-aversion evidence of Dimmock & Kouwenberg (2010), who document that Dutch households with higher measured loss aversion are substantially less likely to hold equity independently of income and wealth. The UAE magnitude (11.5 pp) exceeds the Dutch equity participation effects reported by Dimmock & Kouwenberg (2010) (approximately 6–8 pp), suggesting that the broader dispersion in risk attitudes across the UAE’s multicultural expatriate investor base generates a correspondingly wider participation gap between low- and high-risk-tolerance investors.
Trading experience. Experience is positive and highly significant: \hat{\beta}_{\text{exp}} = 0.279 (SE = 0.019, p < 0.001), AME = 0.045 (SE = 0.003). This captures the learning-by-doing mechanism documented by Seru et al. (2010), whereby experienced investors face lower effective trading costs through accumulated market knowledge. The magnitude is consistent with the behavioral evidence of Barber & Odean (2013), who show that more experienced individual investors exhibit less pronounced disposition effects and attention-driven purchase biases, suggesting a selection mechanism whereby experienced investors are better informed about their own decision processes.
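A simple internal-consistency check on the three continuous covariates follows from Equation 11: each AME equals its coefficient times the same sample average of \Lambda(1-\Lambda), so the ratio AME/coefficient should be roughly constant. The Python sketch below applies this check to the rounded values in Table 2; the dictionary layout is ours.

```python
# (log-odds coefficient, AME) pairs as reported in Table 2.
estimates = {
    "income": (0.418, 0.067),
    "risk_tolerance": (0.716, 0.115),
    "experience": (0.279, 0.045),
}

# Implied average of p * (1 - p) for each covariate.
ratios = {name: ame / coef for name, (coef, ame) in estimates.items()}
spread = max(ratios.values()) - min(ratios.values())

# Upper bound: p * (1 - p) is concave, so its sample mean cannot exceed
# the value at the sample participation rate of 0.762.
upper_bound = 0.762 * (1.0 - 0.762)
```

The three implied scale factors are all close to 0.16 and lie below the bound of about 0.18, so the reported AMEs are mutually consistent up to rounding.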
Financial knowledge. Expert-level knowledge raises participation odds by \exp(0.680) - 1 = 97.4\% relative to the Basic reference category (AME = 0.104, SE = 0.014, p < 0.001). The “Quite knowledgeable” category produces a nearly identical AME of 0.108 (SE = 0.011, p < 0.001). The threshold pattern, in which the No knowledge and Intermediate categories are statistically indistinguishable from the Basic reference while the two highest categories carry large and significant effects, aligns with Lusardi & Mitchell (2023)’s argument that basic numeracy is insufficient to overcome participation costs; it is compound-interest reasoning and diversification knowledge that matter. Klapper & Lusardi (2020) document a similar non-linearity in their global cross-country analysis, where financially literate adults are substantially more likely to hold formal investments but the effect is concentrated in the upper literacy range.
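The odds interpretation of the Expert coefficient can be reproduced directly. The snippet below is a minimal illustration (the helper name `pct_change_in_odds` is ours, not part of the paper's code). Note that it returns a change in odds, which is not the same quantity as the 10.4 percentage point change in probability given by the AME.

```python
import math

def pct_change_in_odds(coef: float) -> float:
    """Percentage change in the odds implied by a logit coefficient."""
    return (math.exp(coef) - 1.0) * 100.0

expert_coef = 0.680  # Expert vs. Basic, participation logit
print(f"{pct_change_in_odds(expert_coef):.1f}% higher odds")
```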
Investor type. Speculative investors are significantly more likely to trade: \hat{\beta}_{\text{speculative}} = 0.420 (SE = 0.131, p = 0.001), AME = 0.062 (SE = 0.020). This pattern is consistent with Barber & Odean (2001)’s finding that overconfident investors enter markets more readily and trade more frequently, with overconfidence being higher among investors who self-attribute investment success to skill. In contrast, retail and sophisticated investor classifications are statistically indistinguishable from the institutional reference category at conventional significance levels for the participation decision, consistent with the DGP where speculative type (but not sophisticated type) enters the participation latent index.
Non-significant variables. Log-liabilities (\hat{\beta} = -0.014, SE = 0.021, p = 0.488), education level (all categories, p > 0.25), broker multiplicity (\hat{\beta} = 0.032, SE = 0.055, p = 0.563), and age (\hat{\beta} = -0.000, SE = 0.003, p = 0.970) do not significantly predict participation. The near-zero age coefficient confirms successful resolution of the date-of-birth collinearity. The absence of an independent education effect on participation — once knowledge is controlled — suggests that formal schooling influences market entry primarily through its effect on financial literacy accumulation rather than via a direct credential effect, consistent with the mediation framework of Rooij et al. (2011).
5.2 Intensive Margin I: Planned Investment Scale
Table 3 reports cumulative link ordered logit estimates for planned investment, estimated on the traders subsample (N = 7{,}620). Log-likelihood is -10{,}880.5 and AIC is 21{,}813.1. The near-unidentifiability warning from the original specification (condition number 3.0 \times 10^{13}) is resolved: the condition number falls to 2.3 \times 10^6 following substitution of \log(\text{Liabilities}) for the raw variable.
Income and risk tolerance. Both income (\hat{\alpha}_{\text{income}} = 1.190, SE = 0.022, p < 0.001) and risk tolerance (\hat{\alpha}_{\text{risk}} = 1.107, SE = 0.036, p < 0.001) are large, positive, and highly significant. These coefficients are approximately 2.8 and 1.5 times the corresponding participation coefficients (0.418 and 0.716), respectively, indicating that conditional on entry, these covariates continue to powerfully stratify investors by investment scale. The amplification pattern is consistent with the theoretical prediction from Equation 4: at the participation margin, a marginal increase in income or risk tolerance tips the cost-benefit calculation in favor of entry; conditional on entry, the same variables determine the level of optimal demand D_i^*, which grows linearly with wealth and inversely with risk aversion.
Education. Advanced education produces substantively large effects on investment intensity. Relative to Bachelor’s-degree holders, Master’s graduates invest at 1.152 log-odds units higher (SE = 0.054, p < 0.001), PhDs at 1.223 log-odds units higher (SE = 0.067, p < 0.001), and Professional degree holders at 1.308 log-odds units higher (SE = 0.104, p < 0.001). High School completion does not significantly differ from Bachelor’s level (p = 0.775). These results are consistent with the human capital channel in Calvet et al. (2007), where education proxies for financial sophistication that reduces effective portfolio constraints, and with the evidence in D’Acunto et al. (2019) that highly educated investors achieve larger diversification gains from structured onboarding. The contrast with the non-significant education effects on participation is theoretically informative: education does not lower the threshold for market entry, but substantially amplifies investment scale once the threshold is crossed.
Sophisticated investor classification. Sophisticated investors plan to invest at significantly higher scale: \hat{\alpha}_{\text{sophisticated}} = 0.784 (SE = 0.104, p < 0.001). This is the largest investor-type effect on the intensive margin and aligns with the DGP, where sophisticated type enters the investment latent index with \alpha_8 = 0.45. In contrast, speculative investor classification — while positive for participation — is insignificant on the investment scale margin (\hat{\alpha}_{\text{speculative}} = 0.033, p = 0.756), corroborating Proposition (1). This divergence mirrors the findings of Barber & Odean (2013): high-turnover retail investors (analogous to our Speculative category) do not hold larger portfolios; they simply trade more from a given asset base.
Knowledge and experience. Trading experience remains significant: \hat{\alpha}_{\text{exp}} = 0.427 (SE = 0.017, p < 0.001). Knowledge levels, by contrast, are statistically insignificant at the 5% level in the investment CLM (only “Quite knowledgeable” is marginally significant, at -0.098 with p < 0.10), consistent with the DGP structure where knowledge does not directly enter the investment latent index and knowledge effects are mediated through the participation decision. Age exerts a small but statistically significant positive effect (\hat{\alpha}_{\text{age}} = 0.005, SE = 0.002, p = 0.022), consistent with life-cycle wealth accumulation documented by Cocco et al. (2005) and Fagereng et al. (2017).
5.3 Intensive Margin II: Trading Frequency
Table 3 presents ordered logit estimates for trading frequency, also on the traders subsample. Log-likelihood is -6{,}659.3 and AIC is 13{,}356.6.
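The reported AIC values can be cross-checked against the log-likelihoods using AIC = 2k - 2 log L. In the sketch below, the parameter counts k are our own tally of the slope coefficients and thresholds listed in Table 3 (an assumption, since the paper does not state k explicitly).

```python
def aic(log_lik: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2*logL."""
    return 2 * n_params - 2 * log_lik

# Parameter counts inferred from Table 3 (assumption, not stated in the text).
k_investment = 20 + 6   # 20 slope coefficients + 6 thresholds
k_frequency = 16 + 3    # 16 slope coefficients + 3 thresholds

print(round(aic(-10880.5, k_investment), 1))  # investment scale CLM
print(round(aic(-6659.3, k_frequency), 1))    # trading frequency CLM
```

Under these counts the recomputed values agree with the reported AICs up to rounding of the log-likelihoods.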
Experience. Trading experience is the dominant predictor of frequency: \hat{\gamma}_{\text{exp}} = 1.768 (SE = 0.028, p < 0.001). This coefficient is 6.3 times the corresponding participation coefficient (0.279) and 4.1 times the investment coefficient (0.427), suggesting that experience is increasingly determinative as investors move from the entry margin toward active trading behavior. The escalating importance of experience across the two-stage sequence directly confirms the learning-by-doing model of Seru et al. (2010), and is consistent with the behavioral trajectory documented by Barber & Odean (2013): as investors accumulate experience, disposition effects and attention-driven biases attenuate, shifting behavior toward more deliberate, higher-frequency trading strategies.
Financial knowledge. The knowledge gradient is steep and monotone. Expert investors trade at \hat{\gamma}_{\text{Expert}} = 3.225 (SE = 0.098, p < 0.001) log-odds units above the Basic reference, “Quite knowledgeable” investors at 2.429 (SE = 0.072, p < 0.001), Intermediate at 1.073 (SE = 0.066, p < 0.001), and “No knowledge” investors at -1.092 (SE = 0.096, p < 0.001). This monotone gradient — spanning 4.317 log-odds units — is consistent with Lusardi & Mitchell (2023)’s synthesis that knowledge functions as an intensity amplifier: once the participation barrier is cleared, literate investors actively leverage their knowledge advantage in trading decisions. The gradient is substantially larger than the knowledge effects at the participation margin (spanning approximately 0.77 log-odds units), confirming that knowledge heterogeneity is more consequential for trading intensity than for market entry per se.
Trading strategy. Conservative strategy reduces frequency relative to Aggressive: \hat{\gamma}_{\text{Conservative}} = -0.842 (SE = 0.077, p < 0.001). Moderate strategy similarly reduces frequency: \hat{\gamma}_{\text{Moderate}} = -0.812 (SE = 0.070, p < 0.001). Mixed approach is not significantly different from Aggressive (\hat{\gamma}_{\text{Mixed}} = 0.112, SE = 0.070, p = 0.110).
Non-significant predictors. Income (p = 0.890), log-liabilities (p = 0.205), risk tolerance (p = 0.173), age (p = 0.141), and most product types relative to the Bonds reference (Shares, Derivatives, Commodities; all p > 0.32) do not significantly predict frequency once knowledge and experience are controlled. Forex trading approaches significance (\hat{\gamma}_{\text{Forex}} = 0.164, SE = 0.094, p = 0.081), consistent with higher turnover in currency markets and the greater volatility of GCC-linked currency pairs documented by Abuzayed et al. (2021).
| Variable | Investment Scale Coef. | SE | Trading Frequency Coef. | SE |
|---|---|---|---|---|
| Income | 1.190*** | 0.022 | -0.003 | 0.020 |
| log(Liabilities) | -0.009 | 0.018 | -0.025 | 0.020 |
| Risk tolerance | 1.107*** | 0.036 | -0.053 | 0.039 |
| Experience | 0.427*** | 0.017 | 1.768*** | 0.028 |
| Knowledge (ref: Basic) | ||||
| No knowledge | -0.061 | 0.083 | -1.092*** | 0.096 |
| Intermediate | 0.057 | 0.059 | 1.073*** | 0.066 |
| Quite knowledgeable | -0.098* | 0.059 | 2.429*** | 0.072 |
| Expert | -0.006 | 0.076 | 3.225*** | 0.098 |
| Education (ref: Bachelor) | ||||
| High School | 0.019 | 0.065 | — | — |
| Master | 1.152*** | 0.054 | — | — |
| PhD | 1.223*** | 0.067 | — | — |
| Professional | 1.308*** | 0.104 | — | — |
| Strategy (ref: Aggressive) | ||||
| Conservative | 0.098 | 0.067 | -0.842*** | 0.077 |
| Mixed approach | 0.050 | 0.061 | 0.112 | 0.070 |
| Moderate | 0.155** | 0.061 | -0.812*** | 0.070 |
| Investor type (ref: Institutional) | ||||
| Retail | 0.051 | 0.098 | — | — |
| Sophisticated | 0.784*** | 0.104 | — | — |
| Speculative | 0.033 | 0.108 | — | — |
| Has other brokers | -0.003 | 0.046 | — | — |
| Products (ref: Bonds) | ||||
| Forex | — | — | 0.164* | 0.094 |
| Derivatives | — | — | 0.093 | 0.094 |
| Shares | — | — | -0.008 | 0.063 |
| Commodities | — | — | 0.084 | 0.122 |
| Age | 0.005** | 0.002 | 0.004 | 0.003 |
| Threshold parameters | ||||
| \tau_1 | 2.432*** | 0.278 | 1.829*** | 0.284 |
| \tau_2 | 3.952*** | 0.270 | 4.136*** | 0.287 |
| \tau_3 | 5.650*** | 0.271 | 6.524*** | 0.294 |
| \tau_4 | 6.887*** | 0.274 | — | — |
| \tau_5 | 8.042*** | 0.278 | — | — |
| \tau_6 | 9.922*** | 0.285 | — | — |
Notes: Left column: Cumulative link model (CLM) for planned investment bracket (7 ordered categories, AED denominated). Right column: CLM for trading frequency (4 ordered categories). Both estimated on the traders subsample only (T_i = 1). “—” indicates variable excluded from that model’s specification. ^{***}p < 0.01; ^{**}p < 0.05; ^{*}p < 0.10.
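To give the log-odds coefficients in Table 3 a probability interpretation, the sketch below converts the frequency thresholds into category probabilities for two investors who differ only in knowledge level. It assumes the parameterization logit P(Y <= j) = \tau_j - \eta, which is the convention of common cumulative link implementations but is not stated in the text. The baseline linear predictor `ETA_BASIC` is a hypothetical value chosen for illustration.

```python
import math

def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def clm_probs(eta: float, thresholds: list) -> list:
    """Category probabilities under logit P(Y <= j) = tau_j - eta."""
    cum = [logistic(t - eta) for t in thresholds] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

TAU_FREQ = [1.829, 4.136, 6.524]   # frequency thresholds, Table 3
ETA_BASIC = 3.0                    # hypothetical baseline linear predictor
ETA_EXPERT = ETA_BASIC + 3.225     # add the Expert coefficient

for label, eta in [("Basic", ETA_BASIC), ("Expert", ETA_EXPERT)]:
    probs = clm_probs(eta, TAU_FREQ)
    print(label, [round(p, 3) for p in probs])
```

At this hypothetical baseline, the probability of the highest frequency category rises from roughly 3% for a Basic-knowledge investor to roughly 43% for an Expert; other baselines would give different levels but the same direction of shift.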
5.4 Robustness I: Tobit Censored Regression
Table 4 (Panel A) reports Tobit estimates treating the ordinal investment index as a censored continuous variable with left-censoring at 1 and right-censoring at 7. Of the 10,000 observations, 326 are left-censored (<AED 50k) and 1,612 are right-censored (>AED 3M), with 8,062 uncensored. The Wald statistic is 7,526.8 (p < 2.22 \times 10^{-16}) on 8 degrees of freedom. All main findings from the ordered logit are confirmed: income (\hat{\delta}_{\text{income}} = 0.936, SE = 0.013, p < 0.001), risk tolerance (\hat{\delta}_{\text{risk}} = 0.896, SE = 0.024, p < 0.001), experience (\hat{\delta}_{\text{exp}} = 0.340, SE = 0.011, p < 0.001), and sophisticated investor classification (\hat{\delta}_{\text{soph}} = 0.592, SE = 0.072, p < 0.001) are large, positive, and significant. Log-liabilities is again statistically insignificant (\hat{\delta} = -0.011, SE = 0.012, p = 0.353), consistent across both intensive margin specifications.
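The two-limit Tobit likelihood behind these estimates has three kinds of contribution: a normal CDF term for left-censored observations, a survival term for right-censored observations, and a normal density for uncensored ones. The following sketch writes that per-observation log-likelihood in plain Python. The fitted means in `sample` are hypothetical values for illustration, and the scale uses the \log\sigma estimate of 0.361 reported in Table 4; this is not the AER::tobit implementation itself.

```python
import math

def norm_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_logpdf(z: float) -> float:
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def tobit_loglik_obs(y, mu, sigma, lower=1.0, upper=7.0):
    """Log-likelihood contribution of one observation in a two-limit Tobit."""
    if y <= lower:   # left-censored: P(y* <= lower)
        return math.log(norm_cdf((lower - mu) / sigma))
    if y >= upper:   # right-censored: P(y* >= upper), via symmetry
        return math.log(norm_cdf((mu - upper) / sigma))
    # uncensored: normal density of the latent outcome
    return norm_logpdf((y - mu) / sigma) - math.log(sigma)

SIGMA = math.exp(0.361)  # from log(sigma) in Table 4, Panel A
# Hypothetical (observed index, fitted mean) pairs, not taken from the paper.
sample = [(1.0, 1.8), (4.0, 4.2), (7.0, 6.5)]
total = sum(tobit_loglik_obs(y, mu, SIGMA) for y, mu in sample)
print(round(SIGMA, 3), round(total, 3))
```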
5.5 Robustness II: Negative Binomial Hurdle Model
Table 4 (Panel B) reports the two-part hurdle model. The zero hurdle (binomial logit) component recovers the participation determinants: income (0.417, SE = 0.022), risk tolerance (0.719, SE = 0.041), and experience (0.276, SE = 0.019) are all significant at the 1% level, with coefficients nearly identical to the standalone logit model. This confirms that the logit estimates are not materially distorted by the truncation of the frequency outcome.
The count component (truncated negative binomial) reveals the determinants of trading intensity conditional on entry. Experience (\hat{\gamma}_{\text{exp}} = 0.326, SE = 0.007, p < 0.001) and financial knowledge are the dominant predictors: Expert (0.466, SE = 0.028), Quite knowledgeable (0.386, SE = 0.023), Intermediate (0.198, SE = 0.024), No knowledge (−0.294, SE = 0.041). Conservative (−0.134, SE = 0.026) and Moderate (−0.127, SE = 0.023) strategies reduce trading intensity, consistent with the frequency CLM.
The estimate \log\theta = 19.983 (SE = 1.087, p < 0.001) implies that the negative binomial’s dispersion parameter \theta = e^{19.983} \approx 4.77 \times 10^8 is effectively infinite, confirming near-Poisson behavior with essentially no overdispersion once knowledge and experience are controlled. This validates the underlying DGP structure and supports the clean two-stage separation described in Proposition (1).
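The practical meaning of the dispersion estimate can be seen from the standard NB2 variance function, Var(Y) = \mu + \mu^2/\theta. The sketch below (function name ours, mean values hypothetical) evaluates the variance-to-mean ratio at the estimated \theta; a ratio of one corresponds to the Poisson case.

```python
import math

log_theta = 19.983
theta = math.exp(log_theta)

def nb_variance(mu: float, theta: float) -> float:
    """NB2 variance function: Var(Y) = mu + mu^2 / theta."""
    return mu + mu * mu / theta

for mu in (1.0, 2.0, 4.0):  # hypothetical conditional means
    ratio = nb_variance(mu, theta) / mu   # equals 1 under Poisson
    print(f"mu={mu:.0f}  theta={theta:.3e}  variance/mean={ratio:.10f}")
```

Because the ratio is indistinguishable from one at any plausible mean, a Poisson hurdle would likely fit equally well; dispersion estimates this large often indicate that the optimizer has drifted toward the Poisson boundary, so the reported standard error on \log\theta should be read with caution.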
| Variable | Panel A: Tobit Coef. (SE) | Panel B: Hurdle (Zero) Coef. (SE) | Panel B: Hurdle (Count) Coef. (SE) |
|---|---|---|---|
| Intercept | -0.550*** (0.178) | -1.946*** (0.282) | -0.132 (0.089) |
| Income | 0.936*** (0.013) | 0.417*** (0.022) | — |
| log(Liabilities) | -0.011 (0.012) | -0.013 (0.021) | -0.001 (0.007) |
| Risk tolerance | 0.896*** (0.024) | 0.719*** (0.041) | — |
| Experience | 0.340*** (0.011) | 0.276*** (0.019) | 0.326*** (0.007) |
| Knowledge (ref: Basic) | |||
| No knowledge | — | -0.078 (0.088) | -0.294*** (0.041) |
| Intermediate | — | -0.089 (0.064) | 0.198*** (0.024) |
| Quite knowledgeable | — | 0.711*** (0.074) | 0.386*** (0.023) |
| Expert | — | 0.687*** (0.100) | 0.466*** (0.028) |
| Has other brokers | — | 0.028 (0.054) | — |
| Investor type (ref: Institutional) | |||
| Retail | -0.000 (0.068) | — | — |
| Sophisticated | 0.592*** (0.072) | — | — |
| Speculative | 0.039 (0.075) | — | — |
| Strategy (ref: Aggressive) | |||
| Conservative | — | — | -0.134*** (0.026) |
| Mixed approach | — | — | 0.020 (0.023) |
| Moderate | — | — | -0.127*** (0.023) |
| Age | 0.005*** (0.002) | 0.000 (0.003) | 0.000 (0.001) |
| \log\theta | — | — | 19.983*** (1.087) |
| \log\sigma | 0.361*** (0.008) | — | — |
Notes: Panel A: AER::tobit with left-censoring at 1 and right-censoring at 7. Panel B (Zero): binomial logit hurdle component for trading participation. Panel B (Count): truncated negative binomial count component for trading frequency conditional on participation. “—” indicates variable excluded from that component’s specification. ^{***}p < 0.01; ^{**}p < 0.05; ^{*}p < 0.10.
6. Monte Carlo Validation
To validate the estimation strategy and confirm consistency of the chosen estimators, we conduct a Monte Carlo simulation exercise. We simulate M = 500 datasets of n = 10{,}000 observations each under the DGP specified in Section 3, with structural parameters (\beta_0, \beta_1, \beta_2, \beta_3, \beta_4, \beta_7) = (-2.2, 0.45, 0.55, 0.35, 0.85, 0.55) for the participation equation and (\alpha_1, \alpha_2, \alpha_3) = (0.55, 0.45, 0.25) for the investment equation. In each replication we estimate the binary logit and both CLMs and record the point estimates \hat{\bm{\beta}}^{(m)} and \hat{\bm{\alpha}}^{(m)} for m = 1,\ldots,M.
6.1 Monte Carlo Design
For each replication m: (i) Draw a new random seed s_m and generate all covariates according to the distributions specified in Section 4. (ii) Draw correlated latent errors from \text{MVN}(\bm{0}, \Sigma) with \Sigma as specified in Section 3. (iii) Generate the three outcome variables T_i^{(m)}, D_i^{(m)}, F_i^{(m)} under the structural DGP. (iv) Estimate the binary logit and both CLMs. (v) Record \hat{\bm{\beta}}^{(m)}, \hat{\bm{\alpha}}^{(m)}, and AIC.
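The replication loop can be illustrated with a heavily simplified stand-in. The sketch below is not the paper's design: it uses a single standard-normal covariate, logistic errors, hypothetical true parameters, and far fewer and smaller replications, with a hand-written Newton-Raphson logit fit so that it runs on the Python standard library alone. It shows only the mechanics of steps (i) through (v): simulate, estimate, store, and summarize bias and RMSE.

```python
import math
import random

def simulate(n, b0, b1, rng):
    """Draw one covariate and a binary outcome from a logit DGP."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [1 if rng.random() < 1 / (1 + math.exp(-(b0 + b1 * x))) else 0
          for x in xs]
    return xs, ys

def fit_logit(xs, ys, iters=10):
    """Newton-Raphson MLE for a logit with intercept and one slope."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1 / (1 + math.exp(-(b0 + b1 * x)))
            w = p * (1 - p)
            g0 += y - p            # score, intercept
            g1 += (y - p) * x      # score, slope
            h00 += w               # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve (information matrix) * step = score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

TRUE_B0, TRUE_B1 = -0.5, 0.45   # hypothetical values for this sketch
M, N = 50, 2000                 # far smaller than the paper's M=500, n=10,000
rng = random.Random(2024)
draws = [fit_logit(*simulate(N, TRUE_B0, TRUE_B1, rng))[1] for _ in range(M)]
bias = sum(draws) / M - TRUE_B1
rmse = math.sqrt(sum((d - TRUE_B1) ** 2 for d in draws) / M)
print(f"bias={bias:.4f}  rmse={rmse:.4f}")
```

Extending the sketch to the full design would require the multivariate covariate distributions of Section 4, the correlated latent errors, and the two ordered outcomes, none of which are reproduced here.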
6.2 Identification
The structural parameters are identified under the following conditions. For the extensive margin, the logit model is identified by functional form (the logistic link) and by the exclusion of trading strategy and product type from the participation equation. For the intensive margin CLMs, identification of the threshold parameters requires the standard assumptions of monotone latent utility and proportional odds. The strictly increasing, well-separated threshold estimates in Table 3 indicate that the ordered categories are empirically distinguishable; the proportional odds assumption itself is maintained rather than formally tested (see Section 7.7). For the hurdle model, the count component is identified on the truncated positive support (counts 1–4), with the zero component drawing on full-sample variation.
6.3 Results
The Monte Carlo exercise confirms that: (i) the logit estimator for the participation equation recovers structural parameters with bias below 2% for all coefficients; (ii) the CLM estimator for the investment equation recovers the income, risk, and experience coefficients with RMSE below 0.05; and (iii) the AIC-based model comparison consistently selects the correct specification across replications. The near-zero age coefficient in the participation equation (structural value 0) is recovered with |\text{bias}| < 0.003 and standard deviation 0.003 across replications, confirming the absence of spurious collinearity with date of birth.
7. Discussion
7.1 Two-Stage Decision Architecture
Our results strongly confirm the two-stage decision structure hypothesized in Proposition (1). The extensive margin is governed by income, risk tolerance, and high-level financial knowledge — variables that determine whether the entry cost barrier is worth crossing. The intensive margin is governed by education, sophisticated investor status, experience, and knowledge gradients — variables that determine how aggressively an investor deploys capital once entry has occurred. This separation is evident in the hurdle model, where the zero and count components load on genuinely distinct predictor sets.
This two-stage architecture resonates with and extends the international evidence. Fagereng et al. (2017), using Norwegian registry data, find that the determinants of stock market entry differ from the determinants of the conditional risky asset share: entry is governed by the participation cost relative to expected returns — captured in our model by risk tolerance and income — while the conditional share is shaped by wealth accumulation and experience, our intensive-margin covariates. The clean separation we recover in a cross-sectional design, absent the longitudinal variation available to Fagereng et al. (2017), testifies to the power of the structural DGP calibration and the richness of the digital onboarding data. D’Acunto et al. (2019) further show that the structured digital questionnaire that generates these data is not a passive survey instrument but itself shapes investor behavior by compelling explicit articulation of investment intentions — a mechanism with no direct analogue in retrospective household survey data.
7.2 Knowledge Heterogeneity and Financial Inclusion
The steep knowledge gradient in trading frequency — spanning from -1.092 log-odds for “No knowledge” investors to +3.225 for Experts — has direct implications for financial inclusion policy. The UAE has pursued an active agenda of financial literacy improvement through the Central Bank of the UAE’s Consumer Protection Framework and the SCA’s investor education programs. Our results suggest that interventions targeting knowledge acquisition at the intermediate-to-advanced levels (moving investors from “Basic” to “Quite knowledgeable”) would yield the largest marginal gains in active trading participation, with AMEs of approximately 10.8 percentage points.
This finding is quantitatively consistent with international benchmarks. Rooij et al. (2011) estimate that Dutch households in the top financial literacy quartile are 9–11 percentage points more likely to hold stocks than bottom-quartile counterparts, conditional on income and wealth. Klapper & Lusardi (2020), across 140 countries, find that financially literate households are substantially more likely to have a formal financial account and to save for retirement, with effects largest in emerging economies where financial products are newest. The synthesis by Lusardi & Mitchell (2023) identifies the intermediate-to-advanced knowledge transition as the point of maximum marginal return: basic numeracy is widespread, but the compound-interest reasoning and diversification concepts needed for active portfolio management are scarce. Our UAE estimates align precisely with this: the participation AME jumps discontinuously at the “Quite knowledgeable” threshold, suggesting a knowledge threshold effect that policymakers could target through structured financial education curricula.
7.3 Investor Type Segmentation and Behavioral Patterns
The divergence between speculative and sophisticated investor effects across margins has regulatory implications. Speculative investors are more likely to enter the market (AME = 0.062) but do not invest at larger scale (CLM coefficient = 0.033, p = 0.756). Sophisticated investors, by contrast, are no more likely than institutional investors to participate at the margin (AME = −0.012, p = 0.538) but invest at substantially larger scale (\hat{\alpha}_{\text{soph}} = 0.784, p < 0.001). This suggests that the SCA’s investor classification framework captures investment capacity rather than behavioral propensity to enter.
The behavioral finance literature provides a useful interpretive lens. Barber & Odean (2013) document that high-turnover retail investors, whose profile matches our Speculative category, do not hold systematically larger portfolios; rather, they trade more frequently from a given asset base, generating higher transaction costs and lower net returns. Barber & Odean (2000) quantify this: the households that trade most earn an annual net return approximately 6.5 percentage points below the market return. This behavioral pattern of frequent trading by entry-prone but small-scale investors appears structurally reproduced in our UAE sample, suggesting that the behavioral regularities documented in US and European markets are present in GCC retail finance as well. Barber & Odean (2001) further document that overconfidence, the primary driver of excess trading, is concentrated among investors with strong self-attribution of past investment success, a profile closely matching the Speculative classification under SCA criteria.
7.4 Life-Cycle Effects
Age is statistically insignificant for participation (AME ≈ 0, p = 0.970) but positive and significant for investment scale, both in the CLM (\hat{\alpha}_{\text{age}} = 0.005, p = 0.022) and in the Tobit specification (\hat{\delta}_{\text{age}} = 0.005, p = 0.002). The positive age-investment relationship is consistent with life-cycle wealth accumulation: older investors have had more time to accumulate investable assets, increasing planned investment even after controlling for income and risk tolerance. Cocco et al. (2005) predict precisely this pattern, as the risky asset level can rise with age even as the risky share declines, provided financial wealth grows faster than the optimal rebalancing toward safer assets implies.
The absence of an age effect on participation is consistent with models where entry costs are paid once (Vissing-Jørgensen, 2002) and the marginal probability of re-entry is negligible for an already-active investor population. Fagereng et al. (2017) document market exit only in the post-retirement phase (beyond age 65); our sample, with mean age 45 and an upper bound of 70, pre-dates the bulk of this exit margin.
7.5 Digital Onboarding as a Research Infrastructure
A distinctive feature of this paper is its use of digital onboarding data — structured questionnaire responses collected at the point of market entry — as the empirical substrate. This data architecture, now standard across GCC digital brokerages following the SCA eKYC mandates, provides a window on ex ante investment intentions that retrospective household surveys cannot replicate.
D’Acunto et al. (2019), in the first large-scale study of robo-advisory adoption, show that digitally onboarded investors substantially improve their portfolio diversification relative to pre-adoption behavior, with the largest gains concentrated among investors who were underdiversified prior to adoption. This implies that the structured questionnaire process has informational value beyond passive data collection: by compelling explicit articulation of risk tolerance, investment horizon, and financial knowledge, digital platforms actively shape the demand formation process. D’Acunto & Rossi (2021) extend this insight with a taxonomy of robo-advisory systems, arguing that the degree of personalization and investor discretion built into the interface determines the quality of risk-tolerance elicitation. Platforms with higher personalization and lower discretion produce stronger diversification gains but may compress the investor-type heterogeneity that our model seeks to recover. The FAB Securities framework, which elicits but does not prescribe choices, is therefore a particularly appropriate substrate for demand estimation of the kind we propose.
7.6 GCC Market Risk Structure and Return Beliefs
Our theoretical framework assumes that investor return beliefs \mu_i are shaped by financial knowledge and experience (see Equation 3 and Equation 9). This assumption is particularly pertinent in the GCC context. Abuzayed et al. (2021) document substantial systemic risk spillovers between global equity markets and GCC bourses during the COVID-19 pandemic, demonstrating that GCC-listed assets are more exposed to international contagion than their low correlations during tranquil periods suggest. This elevated systemic exposure implies that return beliefs formed during market stability may be systematically miscalibrated when volatility regimes shift — a miscalibration that experienced investors are better equipped to anticipate and correct.
The large experience premium in our trading frequency equation (\hat{\gamma}_{\text{exp}} = 1.768, 6.3 times its participation counterpart of 0.279) may therefore reflect not merely the generic learning-by-doing mechanism of Seru et al. (2010), but specifically the acquisition of skills for navigating cross-market contagion in the GCC environment. Less experienced investors who enter the market during stable periods, reflected in the high participation rate of 76.2%, may be inadequately prepared for the volatility regime shifts documented by Abuzayed et al. (2021), reducing their trading frequency when conditions deteriorate. The near-significance of Forex (p = 0.081) further supports this interpretation, as GCC-linked currency markets are particularly sensitive to oil price and geopolitical shocks that propagate through regional asset prices. Future research could exploit time-series variation in GCC market volatility regimes to test whether the experience premium is concentrated in crisis periods.
7.7 Limitations
Several limitations should be noted. First, the dataset is synthetic, derived from a simulation calibrated to the FAB onboarding framework rather than actual observed trading behavior. While the structural DGP is grounded in economic theory and industry data, the estimates should be interpreted as recovering parameters from that DGP rather than from UAE capital markets directly. Future work should extend this framework to actual transaction-level panel data, enabling the dynamic panel models of Seru et al. (2010) and the life-cycle decompositions of Fagereng et al. (2017).
Second, the ordered logit proportional odds assumption has not been formally tested here; a Brant test or nominal link robustness check would strengthen the CLM results. Third, the high proportion of traders (76.2%) in the simulation reflects the selection inherent in a digital onboarding sample: investors who complete the onboarding process are by construction intending to trade, which likely overstates participation relative to the general UAE population.
Fourth, while we use the SCA investor classification (Retail, Sophisticated, Speculative, Institutional) as a key explanatory variable, the criteria have not been externally validated against revealed trading behavior in the UAE. Barber & Odean (2013) note that self-attributed sophistication and revealed trading sophistication often diverge; future work linking classification data to actual trade-level outcomes would strengthen the external validity of our classification effects. Finally, the GCC-specific risk structure identified by Abuzayed et al. (2021) suggests that our cross-sectional estimates — which do not condition on market volatility regimes — may conflate heterogeneity in investor skill with heterogeneity in market timing across sub-periods.
8. Conclusion
This paper estimates a micro-founded trading demand function for digitally onboarded retail investors in the UAE, decomposing investment behavior into extensive and intensive margins using five complementary econometric models. We derive a structural utility framework that generates the binary logit, ordered logit, Tobit, and hurdle models as reduced-form implications of expected utility maximization with fixed entry costs.
Our main empirical findings are as follows. Risk tolerance (AME = 11.5 pp), income (6.7 pp), and advanced financial knowledge (10.4–10.8 pp) are the primary drivers of market participation. Conditional on participation, sophisticated investor classification and advanced education (Master: +1.152; PhD: +1.223; Professional: +1.308) determine investment scale. Trading frequency is overwhelmingly driven by experience (coefficient = 1.768) and financial knowledge (gradient spanning −1.092 to +3.225 log-odds). The hurdle model confirms a clean two-stage decision process, with the zero and count components loading on distinct predictor sets.
Positioned against the international evidence, these findings admit several points of contrast and synthesis. The risk tolerance participation effect (11.5 pp AME) exceeds the loss-aversion effects documented by Dimmock & Kouwenberg (2010) in the Dutch context (~6–8 pp), possibly reflecting the greater dispersion in risk attitudes within the UAE’s multicultural expatriate investor base. The knowledge gradient at the intensive margin, spanning over 4 log-odds units, substantially exceeds typical literacy effects in survey-based studies (Lusardi & Mitchell, 2014; Rooij et al., 2011), consistent with Lusardi & Mitchell (2023)’s finding that the effective knowledge range in onboarding data is broader than that captured by the standardized “Big Three” financial literacy questions. The life-cycle escalation of experience effects, which increase from the extensive margin to trading frequency by a factor of 6.3 (from 0.279 to 1.768), mirrors Fagereng et al. (2017)’s Norwegian evidence that market-specific human capital is the primary driver of conditional portfolio activity. The behavioral segmentation of investor types, with speculative investors driving entry while sophisticated investors drive scale, replicates the patterns documented by Barber & Odean (2013) in US retail markets, suggesting that behavioral heterogeneity in UAE retail finance is structurally similar to that observed in mature Western markets despite the distinct institutional and demographic context.
These findings have implications for financial market regulators, digital brokerage designers, and financial inclusion policymakers in the GCC region. Policies targeting financial literacy improvement at the intermediate-to-advanced knowledge transition would yield the largest participation gains, consistent with global evidence assembled by Klapper & Lusardi (2020) and Lusardi & Mitchell (2023). Investor classification frameworks that distinguish sophisticated from speculative behavior are consistent with our margin decomposition, although validation against observed trading data remains outstanding (Section 7.7). The digital onboarding infrastructure, as documented by D’Acunto et al. (2019) and D’Acunto & Rossi (2021), represents both a powerful research substrate and a policy lever: the design of onboarding questionnaires directly shapes the demand formation process and could be optimized to promote investor education alongside data collection.
Future research should extend this framework to: (i) actual transaction-level panel data from UAE brokerages, enabling dynamic models that track how the experience gradient evolves over trading careers (Seru et al., 2010); (ii) the life-cycle decompositions of Fagereng et al. (2017) applied to GCC registry data; (iii) causal identification of the financial literacy effect using natural experiments from SCA or Central Bank educational interventions; and (iv) analysis of cross-market contagion exposure (Abuzayed et al., 2021) as a moderator of the experience premium across market volatility regimes.
Appendices
Appendix A: VIF Diagnostics
| Variable | GVIF | Df | $\text{GVIF}^{1/(2\,\text{Df})}$ |
|---|---|---|---|
| Income (ordinal) | 1.0125 | 1 | 1.0062 |
| log(Liabilities) | 1.0018 | 1 | 1.0009 |
| Risk tolerance | 1.0148 | 1 | 1.0074 |
| Experience | 1.0099 | 1 | 1.0049 |
| Trading knowledge | 1.0110 | 4 | 1.0014 |
| Education level | 1.0047 | 4 | 1.0006 |
| Has other brokers | 1.0012 | 1 | 1.0006 |
| Investor type | 1.0054 | 3 | 1.0009 |
| Age | 1.0011 | 1 | 1.0006 |
Notes: Computed via car::vif() on the extensive-margin logit. All $\text{GVIF}^{1/(2\,\text{Df})}$ values are below 1.01, far under the conventional threshold of $\sqrt{10} \approx 3.16$ (equivalent to VIF < 10 for single-df predictors).
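The adjusted statistic in the last column follows directly from the GVIF and Df columns. The short Python sketch below recomputes it for a few rows of the table and compares each value with the $\sqrt{10}$ threshold. It is not the R code behind the table (which uses car::vif()); the function name `adjusted_gvif` and the dictionary layout are illustrative assumptions.

```python
import math

# (GVIF, Df) pairs copied from the table above.
GVIF_TABLE = {
    "Income (ordinal)": (1.0125, 1),
    "Risk tolerance": (1.0148, 1),
    "Trading knowledge": (1.0110, 4),
    "Education level": (1.0047, 4),
    "Investor type": (1.0054, 3),
}

# For a single-df predictor GVIF equals VIF, so VIF < 10 maps to sqrt(10).
THRESHOLD = math.sqrt(10)


def adjusted_gvif(gvif: float, df: int) -> float:
    """Return GVIF^(1/(2*Df)), which is comparable across terms with different Df."""
    return gvif ** (1.0 / (2 * df))


adjusted = {name: adjusted_gvif(g, df) for name, (g, df) in GVIF_TABLE.items()}

for name, value in adjusted.items():
    flag = "ok" if value < THRESHOLD else "check"
    print(f"{name:<20} {value:.4f}  {flag}")
```

Taking the $1/(2\,\text{Df})$ power puts multi-category terms such as trading knowledge (Df = 4) on the same scale as single-coefficient terms, which is why one threshold can be applied to every row.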
Appendix B: Investment Distribution: Before and After DGP Calibration
| AED Bracket | Original, % (scale +11.5) | Calibrated, % (scale +9.8) |
|---|---|---|
| <50k | 0.12 | 3.26 |
| 50k–100k | 0.33 | 6.45 |
| 100k–250k | 2.29 | 15.88 |
| 250k–500k | 5.86 | 17.63 |
| 500k–1M | 10.73 | 17.63 |
| 1M–3M | 26.17 | 23.03 |
| >3M | 54.50 | 16.12 |
| % Right-censored (Tobit) | 54.50 | 16.12 |
Notes: Shares reported as percentages (%). The original DGP scale factor +11.5 produced 54.5% right-censoring, severely limiting identification of the Tobit and CLM intensive-margin models. The calibrated scale factor +9.8 distributes mass across middle AED brackets while maintaining a realistic right tail, reducing right-censoring to 16.1%.
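As a quick arithmetic check on the table, the Python sketch below confirms that each column sums to 100% and computes the drop in right-censoring between the two calibrations. It assumes, as the last table row suggests, that the right-censored mass is exactly the open-ended >3M AED bracket; the variable and function names are illustrative.

```python
# Bracket shares (%) copied from the table above, ordered from <50k to >3M AED.
ORIGINAL = [0.12, 0.33, 2.29, 5.86, 10.73, 26.17, 54.50]
CALIBRATED = [3.26, 6.45, 15.88, 17.63, 17.63, 23.03, 16.12]


def right_censored_share(shares):
    """Share in the open-ended top bracket, assumed here to be the censored mass."""
    return shares[-1]


total_original = sum(ORIGINAL)
total_calibrated = sum(CALIBRATED)

# Percentage-point reduction in right-censoring after recalibration.
reduction_pp = right_censored_share(ORIGINAL) - right_censored_share(CALIBRATED)

print(f"column totals: {total_original:.2f}, {total_calibrated:.2f}")
print(f"right-censoring falls by {reduction_pp:.2f} percentage points")
```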
Appendix C: Structural DGP Parameters
| Equation | Parameter | True Value |
|---|---|---|
| Participation (logit) | $\beta_0$ (Intercept) | -2.20 |
| | $\beta_1$ (Income) | 0.45 |
| | $\beta_2$ (Risk) | 0.55 |
| | $\beta_3$ (Experience) | 0.35 |
| | $\beta_4$ (High knowledge) | 0.85 |
| | $\beta_5$ (Income × Risk) | 0.12 |
| | $\beta_6$ (Income$^2$) | -0.04 |
| | $\beta_7$ (Speculative) | 0.55 |
| Investment (ordered logit) | $\alpha_1$ (Income) | 0.55 |
| | $\alpha_2$ (Risk) | 0.45 |
| | $\alpha_3$ (Experience) | 0.25 |
| | $\alpha_4$ (Graduate education) | 0.65 |
| | $\alpha_5$ (Income × Risk) | 0.08 |
| | $\alpha_6$ (Age) | 0.04 |
| | $\alpha_7$ (Age$^2$) | -0.0004 |
| | $\alpha_8$ (Sophisticated) | 0.45 |
| Frequency (ordered logit) | $\gamma_1$ (Experience) | 0.35 |
| | $\gamma_2$ (Aggressive/Mixed) | 0.55 |
| | $\gamma_3$ (Advanced knowledge) | 0.45 |
| | $\gamma_4$ (Speculative) | 0.65 |
| | $\gamma_5$ (Experience × Knowledge) | 0.25 |
| Error correlation matrix $\Sigma$ | $\rho(\varepsilon_1, \varepsilon_2)$ | 0.55 |
| | $\rho(\varepsilon_1, \varepsilon_3)$ | 0.45 |
| | $\rho(\varepsilon_2, \varepsilon_3)$ | 0.65 |
Notes: True parameters used in the structural DGP, as specified in Section 3 and implemented in the R simulation script.
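To illustrate how these parameters enter the DGP, the Python sketch below evaluates the participation logit at the true coefficients and factorizes the error correlation matrix. It is a minimal sketch, not the R simulation script. The covariate coding (raw versus centered or standardized) is an assumption, since it is not restated in this appendix. A Cholesky factor is one common way to generate correlated errors, but how the script actually imposes $\Sigma$ is not shown here.

```python
import math

# True participation parameters from the table above.
BETA = {
    "intercept": -2.20,
    "income": 0.45,
    "risk": 0.55,
    "experience": 0.35,
    "high_knowledge": 0.85,
    "income_x_risk": 0.12,
    "income_sq": -0.04,
    "speculative": 0.55,
}

# Error correlations: rho(e1,e2)=0.55, rho(e1,e3)=0.45, rho(e2,e3)=0.65.
SIGMA = [
    [1.00, 0.55, 0.45],
    [0.55, 1.00, 0.65],
    [0.45, 0.65, 1.00],
]


def participation_probability(income, risk, experience, high_knowledge, speculative):
    """Logit probability of trading implied by the participation parameters."""
    index = (
        BETA["intercept"]
        + BETA["income"] * income
        + BETA["risk"] * risk
        + BETA["experience"] * experience
        + BETA["high_knowledge"] * high_knowledge
        + BETA["income_x_risk"] * income * risk
        + BETA["income_sq"] * income ** 2
        + BETA["speculative"] * speculative
    )
    return 1.0 / (1.0 + math.exp(-index))


def cholesky(matrix):
    """Lower-triangular factor; raises ValueError if not positive definite."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= 0.0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


# Succeeds only if SIGMA is a valid (positive definite) correlation matrix.
chol = cholesky(SIGMA)

# Hypothetical client with every covariate at zero: the index is the intercept.
baseline = participation_probability(0, 0, 0, 0, 0)
print(f"baseline participation probability: {baseline:.3f}")
```

Multiplying `chol` by a vector of independent standard normal draws would give errors with correlation matrix $\Sigma$; the factorization succeeding also confirms that the three stated correlations are mutually consistent.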
References
Footnotes
The UAE’s retail capital market expanded significantly following the implementation of the Securities and Commodities Authority’s digital brokerage framework, which mandated electronic know-your-customer (eKYC) checks and structured onboarding questionnaires for all new market participants.