← Back to Dashboard

Methodology & Definitions

Data sources, calculations, dashboards, and limitations across the REPI analytics suite. This document is the reference for any figure or indicator rendered on a REPI dashboard.

REPI v2.20 Last updated Apr 12 2026
01

Data Sources

eFront Holdings Extract

The primary portfolio registry. A CSV snapshot (currently REPI eFront 04_03_26.csv) containing ~550 properties with classifications, ownership, advisors, JV partners, acquisition prices, regions, metros, and fiscal year ends.

Field CategoryExamples
Identity & Classification Property, Short Name, Custom Classification 1/2 (asset & sub-type)
Geography NCREIF Region, NCREIF Sub Region, Metro Area, State, City, ZIP
Ownership & Partnership SBAF Legal Ownership %, JV Partner, Title Holding Entity, Company (advisor)
Dates Acquisition Date, Disposal Date, JV Buyout Date, Investment Year End
Valuation Acquisition Price, Disposal Price (when realized)
Physical Total GLA, OFC/RTL/APT/LAB GLA, # Units, # Buildings, Year Built, Year Last Renovation

SitusAMC Quarterly Appraisal Database

Source for forward-looking valuation mechanics. Used by the Monte Carlo volatility calibration and DCF assumptions. Provides property-level appraisals (value, rent, occupancy, cap rates, terminal & discount rates) at quarterly cadence.

Data cleaning: Dollar values stripped of formatting. Percentages converted to decimals. Rows filtered to live assets. Invalid sentinels (-, #N/A, #DIV/0!) treated as missing.

Loan Maturities Extract

A per-property loan file (REPI_Loan_Maturities/eFront_Loan_Maturities.csv) powering debt analytics. Columns: Current Leverage Balance (SBA level), Outstanding Loan Balance (total), Maturity Date, Lender Type, Loan Type, Life Cycle, Risk Profile, SBA RE Action.

MTF Excel (Most-Recent Valuation)

Property-level NAV and Life Cycle data. Loaded dynamically by pattern match in the REPI_Dashboard directory. Feeds into PRISM scenario analysis at the SBA Investment Level.

Weekly Operating Workbooks

Per-property Excel workbooks (approximately 32 files) containing weekly-cadence operating metrics: occupancy, availability, leased %, market and effective rent, rent-per-SF, vacant units. Drives the Operations dashboard's weekly view.

Income Statements

Quarterly income-statement Excel files, organized by period folder (1Q25, 4Q25, etc.). Actual vs. budget for 17+ properties; powers the Financial Performance section of the property report and the Variance Report.

Bitcoin Monthly Prices

btc_monthly.csv — monthly close prices from September 2014 onwards. Used by the Monte Carlo engine when the BTC Sleeve toggle is enabled. Yahoo refresh runs best-effort on startup; the committed CSV is the fallback.

Geographic Mapping

NCREIF four-region taxonomy (East, West, South, Midwest). A geocache.csv maps property names to pre-computed lat/lon; fallback logic uses metro center then state center then US center when a property is absent from the cache.

02

Portfolio Operations

What It Shows

Property-level KPIs, variance reporting, Google Reviews, Monte Carlo projections, investment-story narratives, and the property report across the multifamily portfolio. Weekly workbooks + quarterly income statements feed every view.

Property Report Header

The institutional-style blue header on each property report carries: property name, location, address, year built, and the asset's Fiscal Year End — pulled from the eFront Investment Year End column for 540+ properties. Fuzzy name matching reconciles Weekly ↔ eFront title differences.

Key Metrics

MetricCalculation
Total Portfolio Value Sum of all live-asset valuations at latest quarter
YoY Portfolio Appreciation (Total Value Qt / Total Value Qt−4) − 1
By-Type Aggregates Group by property type → count, total value, avg implied cap rate, avg terminal rate, avg discount rate, avg YoY value change
By-Region Aggregates Group by NCREIF region → count, total value, avg implied cap rate
Estimated NOI Value × Implied Cap Rate
NOI Margin Actual NOI / Actual Revenue (from income statements)
Budget Variance (Actual − Budget) / Budget × 100
03

Historical Performance

What It Shows

Trailing 1Y / 3Y / 5Y / 10Y composite returns, excess returns measured in basis points, and strategy attribution versus institutional benchmarks. Data source: TTG_Data_Historical.xlsx.

How to Interpret

Composite returns reflect time-weighted performance over each trailing window. Excess returns (bps) show out/underperformance vs. the selected benchmark. Strategy attribution decomposes returns into allocation and selection effects.

Note: Returns are calculated from appraisal-based valuations, which inherently smooth short-term volatility compared to transaction-based indices. See §08 Volatility Calibration for how this is addressed in forward-looking models.

04

Portfolio Holdings & PRISM

Holdings Registry

A 550-property registry from the eFront extract: geographic concentration mapping, advisor and JV partner breakdowns, loan maturity wall, and benchmark positioning against NCREIF NPI / ODCE.

Benchmarks

BenchmarkDescription
NCREIF NPI National Council of Real Estate Investment Fiduciaries Property Index. Quarterly, unleveraged total-return index of institutional-grade commercial real estate.
NCREIF ODCE Open-End Diversified Core Equity. Fund-level benchmark that includes leverage and fees; a closer proxy to realized investor experience.

PRISM — Portfolio Risk & Investment Scenario Model

An interactive transaction-approval tool. The user enters the attributes of a proposed acquisition — property, advisor, JV partner, ownership %, acquisition price, equity, leverage, NAV, risk profile, life cycle, investment structure — and PRISM shows the before / after impact on every Investment Portfolio Guideline.

Data Unification

PRISM merges four sources at the property level:

SourceProvides
eFront CSV Property attributes, classifications, acquisition prices, ownership
Loan Maturities CSV SBA leverage balance, total loan balance, life cycle, risk profile
MTF Excel (NAV) Current property-level Net Asset Value (SBA investment level)
PRISM Adjustments Excel Pipeline adjustments: Closed Pipeline NAV, Pending Pipeline NAV, Unfunded Commitment, Unfunded Construction Loan Draws, Unallocated Commitment, GAV

Property name matching across sources uses a Jaccard token-similarity fuzzy merge so "Carlyle Crossing - Dylan" in the weekly workbook reconciles to the same entity in eFront.

IPG Compliance Matrix (11 checks)

Every proposed transaction is scored against all 11 Investment Portfolio Guidelines. Each check returns a pass / fail badge on the PRISM report.

#GuidelineRule
1Development & Lease-Up Exposure(Dev + Initial Leasing + Redevelopment) Adj NAV / Total ≤ 15%
2Investment Advisor ConcentrationMax advisor Adj NAV / Total RE NAV ≤ 35%
3JV CapitalizationAny JV partner total capitalization / Gross NAV ≤ 10%
4JV LeverageIndividual JV LTV ≤ 70% (case-by-case)
5Metro AreaMax metro Adj NAV / Total ≤ 15%
6Portfolio-Level LeverageTotal LEV / Total GAV ≤ 35%
7Property TypeEach type within ±15pp of ODCE weight
8RegionEach region within ±15pp of ODCE weight
9Risk ProfileCore 85–100%, Non-Core 0–15%
10Single Investment ConcentrationMax single investment NAV / Total ≤ 7%
11Wholly Owned Property LeverageIndividual wholly-owned LTV ≤ 50%

PRISM Dimension Tables

Before / after tables are computed for nine dimensions:

Property Type · Region · Sub-Type · Risk Profile · Life Cycle · Investment Structure · Investment Advisor · JV Partner · Metro Area

Each row carries the full PRISM column set:

Adjusted NAV Before NAV + Closed Pipeline + Pending Pipeline + Unfunded + Construction Loans + Unallocated
Adjusted NAV After Adjusted NAV Before + Proposed Transaction NAV
ODCE Comparison % Adj NAV After − ODCE Weight (shown for Property Type & Region)
05

Data Validation

What It Does

A standalone dashboard giving the team three tools for building confidence in the eFront holdings extract: a random-question quiz, a rule violation scanner, and an outlier surface. Results persist in SQLite so accuracy trends can be reviewed over time.

Quiz Engine

Eleven templates draw random questions from the 550-property data. Each session is attributed to a display name; answers are graded in real time and persisted with the correct answer for audit.

TemplateExample
Count by advisorHow many properties does Heitman manage?
Count by asset × regionHow many Industrial properties are in the West?
Sum acquisition by assetTotal acquisition price of all Retail properties?
Sum acquisition by stateTotal acquisition price in Texas?
Lookup metro / advisorWhich metro is Astoria in? Which advisor runs it?
Lookup ownership %What is SBAF's ownership in Carlyle Crossing - Dylan?
Lookup JV partnerWho is the JV partner on Bozzuto I - Enders Place?
Lookup year acquired / builtWhat year was South Beach Marina acquired?
Boolean: wholly owned?Is Dimension wholly owned?

Grading tolerance: exact match for strings (case-insensitive, trimmed); ±1 percentage point for ownership; ±1% or ±$100K (whichever is larger) for sums; exact for counts and years.

Rule Violations (10 rules)

IDRule
R1SBAF Legal Ownership % between 0 and 1.0
R2Disposal Date ≥ Acquisition Date
R3Status consistent with presence/absence of Disposal Date
R4Year Built within [1800, current year]
R5Year Last Renovation ≥ Year Built and ≤ current year
R6Total GLA ≥ max component GLA (within 5% tolerance)
R7Required fields populated (Property, Asset Type, Region, Metro)
R8JV Partner ↔ Ownership % consistency
R9Disposal Date and Disposal Price present together
R10Completeness warnings — missing optional data (Acq Date, Year Built, Metro)

Outlier Detection (8 heuristics)

IDCheck
O1Acquisition price > 2× P95 or < P5/2 within its asset type
O2SBAF ownership < 50% (portfolio is mostly 95–100%)
O3Pre-1950 construction (vintage-risk flag)
O4Zero Total GLA on a non-Agriculture property
O5Acquisition Date in the future
O6Active holding period > 15 years
O7Disposed without a Disposal Price
O8JV Partner present but ownership = 100%
07

Snowflake Migration Proposal

Scope

A 9-month implementation proposal for lifting the REPI-ONE stack onto Snowflake, unifying advisor chart-of-accounts, and standing up a sandboxed development environment for continued feature work. This is a planning document — no live Snowflake connection exists yet.

Phased Timeline

PhaseMonthsDeliverables
01 · Foundation1–3 Mac Studio + DGX Spark fleet; Snowflake tenant (Dev+Prod); VLAN design; VPN access; schema drafts
02 · Migration4–6 Port eFront, MTF, Income Statements, SitusAMC into Snowflake; chart-of-accounts crosswalk; shadow reconciliation
03 · Cutover & Governance7–9 Production cutover; RBAC; audit logging; runbook; decommission legacy pipelines

Infrastructure Requirements

  • 3–4 Mac Studio M3 Pros with max RAM — development cluster.
  • 1 NVIDIA DGX Spark — ML/AI coordinator for document summarization, anomaly detection, forecasting.
  • Snowflake tenant with separate Dev and Prod accounts and promotion gates.
  • VLAN isolation — dev fleet sandboxed; egress to Snowflake only.
  • VPN-gated access — role-scoped, easily revocable per team member.

Supporting Documents

The proposal page hosts seven document categories with the same drag-and-drop + SQLite indexing pattern as the Legal Library: Migration Plan · Chart of Accounts · Hardware Specifications · Network & Security · Accounting DB Schema · Cost Estimate · Risk Register.

08

Volatility Calibration

The Problem: Appraisal Smoothing

Real estate appraisals lag actual market movements. Observed quarter-to-quarter value changes understate true economic volatility because appraisers anchor to prior valuations. Raw portfolio statistics will therefore look less volatile than they truly are.

Geltner (1993) Unsmoothing

The Geltner unsmoothing correction — an institutional standard — backs out the appraisal lag to estimate true transaction-based volatility.

Geltner Inflation Factor factor = 1 / (1 − α) = 1 / (1 − 0.40) = 1.667
Calibrated NOI Volatility σNOI = min(raw_vol × 1.667, 0.40)
Calibrated Cap Rate Volatility σcap = raw_volcap × 1.667
ParameterValueSource
Smoothing parameter (α)0.40NCREIF / NPI core real estate standard
Raw NOI volatilityIQR-winsorized std of YoY value changesSitus quarterly
Raw cap rate volatilityStd of implied cap rate changesSitus quarterly
Small-sample fallback8% NOI vol, 70 bps cap volApplied when calibration sample < 3 assets

Outlier handling: Before computing raw volatilities, values are IQR-winsorized at ±3×IQR and capped at 25% to prevent single extreme appraisals from dominating the calibration.

09

Monte Carlo — NAV Projection

What It Projects

Per-property forward NAV projection at monthly cadence when the BTC Sleeve is on, quarterly otherwise. The engine anchors at the most recent non-null SBA NAV and emits a full distribution of forward paths — fan chart percentiles, a terminal histogram, and ten sample paths for visual interpretation.

Four Simulation Methods

MethodModelBest For
Historical Bootstrap Resamples observed quarterly returns with replacement Preserving empirical tails; no distributional assumption
Log-Normal GBM Fits log-normal diffusion (µ, σ) to historical log-returns Smooth parametric fit; easy to explain; thin-tailed
Regime-Switching 3-state Markov chain (Growth / Stable / Decline) with tercile-based labels and Laplace-smoothed transitions Modeling sticky regimes; macro-cycle aware
Bitcoin Power Law BTC follows Santostasi's log-log power law with mean-reverting shocks; property returns bootstrapped BTC Sleeve mode; long-horizon trend-aware BTC modeling

Controls

InputRangeNotes
PropertyAny apartment in the portfolioMust have ≥ 7 quarterly NAV observations
Simulation MethodBootstrap / GBM / Regime / Power LawPower Law requires BTC Sleeve on
Horizon4 – 40 quarters1 to 10 years forward
Paths1,000 – 50,000Default 10,000 — stable percentile estimates
Fit WindowLast N quarters or "All"Downweights older regimes if shorter than max
BTC SleeveOff / On · 2.5 / 5 / 10%Joint property + BTC simulation with three allocation scenarios

BTC Sleeve — Joint Simulation

When the BTC Sleeve toggle is on, the engine runs a joint property + Bitcoin simulation at monthly cadence. For Bootstrap and Regime-Switching methods, correlation is preserved automatically because every drawn quarter index carries both its property return and its three paired BTC monthly returns from the same historical period.

Each run produces three portfolio scenarios side-by-side: 2.5%, 5%, and 10% BTC allocation at t₀ (no rebalance). Metrics displayed per scenario: Median Terminal, VaR-95, Expected Shortfall, P(Loss), Best/Worst Case.

Data alignment: BTC monthly data starts Sep 2014. Property quarters ending before that date are dropped. Below-6-quarter alignment returns a "no data" state; the dashboard shows an empty placeholder.

Bitcoin Power Law (Santostasi)

Giovanni Santostasi's thesis: long-run BTC price follows a power function of time since the Genesis block (Jan 3, 2009). The engine fits this model by log-log OLS on the committed BTC monthly history at service startup.

Power Law ln(price) = ln(A) + B · ln(days_since_genesis)
Stochastic Path Generation Pt = mean_price(datet) · exp(deviationt)
deviationt = φ · deviationt−1 + εt, εt ~ N(0, σresid)

The deviation AR(1) reverts toward the trend with a 24-month half-life (φ = 0.5^(1/24)). Deviations are clipped at ±3× the support/resistance offsets to prevent explosive excursions.

BandLog OffsetInterpretation
Fair-value (mean)0Long-run power-law trend line
Cycle-low support−0.6~45% below trend; catches most bear-market lows since 2013
Fair-value / cycle-top+0.7~100% above trend; catches cycle tops

Typical fit: exponent B ≈ 5.84 on the current btc_monthly.csv, consistent with Santostasi's published B ≈ 5.82. The UI shows the fitted exponent next to a log-price chart with all three bands overlaid.

Outputs

OutputDescription
Percentile BandsP10 / P25 / P50 / P75 / P90 at every time step
Fan ChartMonthly (BTC on) or quarterly (BTC off) projection envelope
Terminal HistogramDistribution at the final time step with median & VaR markers
Sample PathsTen randomly-drawn paths for visual texture
KPI StripMedian terminal, VaR-95, ES-95, P(Loss), annualized rate
Power Law Band Chart(Power Law method only) BTC price log chart with support / mean / resistance
10

Definitions

A working glossary of the metrics, methods, and benchmarks used across the REPI analytics suite — grouped by the question each one answers.

i.

Valuation & Yield

How income, leverage, and equity are translated into value.

Implied Cap Rate
NOI / Property Value

The market's implied yield based on current appraised value. Moves inversely with price.

Stabilized Cap Rate
Yield at steady-state occupancy

Strips out lease-up, renovation, and transition effects to reveal the asset's underlying yield.

Terminal (Exit) Rate
Sale Price = NOI10 / Exit Rate

Cap rate applied at hold-period end to estimate sale value. Higher exit rate = lower exit value.

Discount Rate
Hurdle rate · Cost of capital

Required return used to discount future cash flows. Reflects the investor's cost of capital.

Net Operating Income (NOI)
Revenue − OpEx

Gross rental income less operating expenses, before debt service and capex.

Loan-to-Value (LTV)
Total Leverage / Gross Asset Value

Standard leverage ratio. Calculated on an incremental basis in PRISM — i.e., after pipeline adjustments and the proposed transaction.

NAV (SBA Investment Level)
Current market value of SBA's equity

The headline value the LP realizes. Used as the anchor for Monte Carlo NAV projection and for PRISM compliance math.

Basis Points (bps)
1 bp = 0.01%

Standard unit for small yield or rate differences. 100 bps = one percentage point.

ii.

Statistical & Modeling Methods

How uncertainty is measured and simulated.

IQR Winsorization
Clip at ±3 × IQR

Limits the pull of a few extreme appraisals on a volatility estimate by capping values at three times the interquartile range.

Geltner Alpha (α)
α = 0.40 (NCREIF standard)

The share of each appraisal anchored to the prior period. Unsmoothed via 1 / (1 − α) to estimate transaction-based volatility.

Internal Rate of Return (IRR)
r where NPV(r) = 0

Discount rate that makes the net present value of all cash flows — including acquisition and terminal sale — equal to zero.

Monte Carlo Simulation
10,000 stochastic paths (default)

Rather than one forecast, thousands of randomized scenarios are run so outcomes can be expressed as a distribution and percentiles.

Historical Bootstrap
Resample with replacement

Non-parametric method: future returns drawn at random from the observed historical sample. Preserves empirical tails exactly.

Log-Normal GBM
dlog(P) ~ N(µ, σ) per step

Geometric Brownian Motion — a smooth parametric model. Fits historical µ and σ of log-returns. Thin-tailed relative to real CRE.

Regime-Switching (3-state Markov)
Growth / Stable / Decline

Labels quarters by tercile of return, estimates a Laplace-smoothed transition matrix, and bootstraps within the active regime.

Percentile Bands
P10 / P25 / P50 / P75 / P90

Per-time-step envelopes across the simulated path ensemble. P50 is the median outcome; P10–P90 brackets 80% of outcomes.

iii.

Benchmarks

The yardsticks the portfolio is measured against.

NCREIF NPI
Unleveraged · Property-level

National Council of Real Estate Investment Fiduciaries Property Index. Quarterly, unleveraged total-return index of institutional-grade U.S. commercial real estate.

NCREIF ODCE
Levered · Fund-level · Net of fees

Open-End Diversified Core Equity. Fund-level benchmark that includes leverage and management fees — a closer proxy for realized investor experience.

iv.

Governance & Compliance

The policy rails that every transaction must respect.

IPG — Investment Portfolio Guidelines
11 quantitative rules

The rulebook checked by PRISM on every proposed transaction. Covers concentration, leverage, risk profile, and benchmark-deviation limits.

±15pp ODCE Tolerance
|Allocation − ODCE Weight| ≤ 15pp

Each asset type and each region must sit within 15 percentage points of its NCREIF ODCE reference weight.

Risk Profile Mix
Core 85–100% · Non-Core 0–15%

Portfolio-level allocation limit between Core and Opportunistic / Non-Core strategies.

Development & Lease-Up Cap
(Dev + Initial Leasing + Redev) / Total ≤ 15%

Caps aggregate exposure to pre-stabilized assets. Stabilized Operating exposure is the residual.

Single Investment Concentration
Max single property NAV / Total ≤ 7%

No one asset can dominate the portfolio. Evaluated after pipeline adjustments.

JV Capitalization Cap
Any JV total cap / Gross NAV ≤ 10%

Limits total capitalization (equity + CFP) to any single JV partnership.

Advisor Concentration
Max advisor NAV / Total RE NAV ≤ 35%

Real Estate Allocation Policy 15-701. Caps exposure to any one investment advisor.

Florida Statutes Ch. 215
Financial Matters — General Provisions

The governing statute for SBA investment authority. Real estate authority lives at §215.47; IPS required by §215.475.

§215.4401 — SBA Public Records
Records retention & disclosure

Governs what SBA records may be withheld from public disclosure. Drives the Legal Library's audit-log posture.

v.

Bitcoin Power Law Model

Santostasi-style BTC trend modeling used in the Monte Carlo engine.

Santostasi Exponent (B)
B ≈ 5.82 (canonical)

The log-log slope of BTC price vs. days since Genesis. REPI fits B locally at startup; current fit ≈ 5.84.

Days Since Genesis
Genesis = 2009-01-03

The independent variable. The Bitcoin genesis block was mined January 3, 2009; day 1 anchors the power law.

Log-Log Fit
OLS on (ln days, ln price)

Fitted with numpy.polyfit (degree 1) on all valid observations. Residuals feed the log-normal shock standard deviation.

Residual Sigma (σresid)
std of ln(price) − ln(fit)

Empirical dispersion around the trend. Sets the scale of log-normal shocks applied in forward simulation; floor 0.15, cap 1.20.

Cycle-Low Support Band
Mean × exp(−0.6) ≈ −45%

Roughly 45% below the fair-value line in price space. Catches most bear-market lows since 2013.

Fair-Value / Cycle-Top Band
Mean × exp(+0.7) ≈ +100%

Roughly 2× the fair-value line in price space. Catches historical cycle tops.

Mean-Reversion Half-Life
24 months · φ = 0.51/24

AR(1) coefficient on log-deviation from trend. Every 24 months, a shock loses half its pull.

BTC Sleeve Allocations
2.5% · 5% · 10%

Three side-by-side scenarios rendered in BTC Sleeve mode. No rebalance — allocation applies at t₀.

11

Known Limitations

Data

  • Snapshot calibration. Volatility is typically calibrated from one quarterly cycle. This may under- or over-estimate long-run volatility.
  • Appraisal lag. Situs valuations lag actual transaction prices by 0–6 months. The Geltner correction mitigates but does not eliminate this.
  • Cross-sectional proxy for NOI volatility. Year-over-year value changes are used as a proxy for NOI volatility; raw NOI data is not in the Situs feed.
  • Small samples. When fewer than 3 assets match a property type or state filter, hardcoded fallback volatilities are used (8% NOI, 70 bps cap rate).

PRISM

  • Pipeline adjustments are manually maintained. The PRISM Adjustments Excel is the single source of truth; refresh cadence is quarterly.
  • ODCE weights are coded. Hardcoded at PDF values; refreshed manually when NCREIF publishes new quarter weights.
  • No debt waterfall modeling. Leverage is treated as a single aggregate per property; senior/mezz/preferred equity distinctions are not modeled.

Monte Carlo

  • Equity-only. No debt modeling, no waterfall allocations, no LP/GP splits.
  • Fixed discount rate. Constant across paths; real-world interest rates are themselves stochastic.
  • No CapEx. Simplified NOI stream; major renovation or capex events are not modeled.
  • Cap rate bounds. Exit cap rates hard-clamped to 3%–12%; extreme distress or bubble scenarios outside this range are not captured.
  • Power Law breaks BTC ↔ RE correlation. Under Power Law method, BTC paths follow the trend model while property returns are bootstrapped independently. The historical monthly linkage is not preserved.

Data Validation

  • Single data source. Rules and outliers run against the eFront extract only. Cross-source reconciliation (MTF vs. eFront vs. Situs) is deferred.
  • Static quiz templates. Eleven hardcoded question patterns. No temporal or change-detection questions yet.

Legal Library

  • No full-text search. Search operates on filename, description, property, and advisor — PDF content is not OCR-indexed.
  • Admin-gated delete via HTTP basic auth. Audit log captures the username, but basic auth is transport-layer; HTTPS is recommended before exposing beyond the internal team.

Snowflake Proposal

  • No live Snowflake connection. The tile hosts a planning document and document library. Code does not currently query Snowflake.
  • Hardware and cost figures are estimates. Subject to vendor quotes and AWS marketplace pricing at procurement.

Platform

  • Surface-level health check. Dashboard status is determined by HTTP HEAD probes with a 1.5-second timeout. "Live" confirms the server is responding, not that underlying data is correct or current.
  • Polling lag. Status polls every 8 seconds. Brief delay between an actual outage and the status indicator changing.
12

Code & Security

Source-code architecture and an internal review of code-level vulnerabilities. This section covers the Python application source only — network, transport-layer, and authentication posture are tracked separately.

Platform Codebase — At a Glance

The REPI platform is composed of seven Python / Flask applications plus a shared data directory, spread across roughly 70 Python source files and ~24,500 lines of application code. Each dashboard is a standalone service with its own Gunicorn process; Nginx reverse proxies route traffic by URL prefix.

Project Role Py Files LOC
REPI_ONE Hub · landing page · DCF API 6 1,034
REPI_Dashboard Operations · Monte Carlo · property reports 20 15,215
REPI_eFront_Dashboard Holdings registry · PRISM scenario analyzer 14 4,479
REPI_Data_Validation Quiz · rules · outliers 10 1,539
REPI_Legal 6-bucket document library · Ch. 215 reference 8 929
REPI_Snowflake Migration proposal · supporting docs 7 678
REPI_Historical_ReturnLong-horizon returns · composite attribution 5 646
REPI_Loan_Maturities CSV-only source (no Python code) 0

Stack & Conventions

LayerTechnology / Convention
LanguagePython 3.9+ (type hints on public functions; PEP 8)
Web frameworkFlask 3.0+ (thin routes → logic lives in services/)
DataPandas 2.0+, NumPy 1.24+ (in-memory; no ORM)
VisualizationPlotly 5.18+ (server-rendered JSON → client hydration)
PersistenceSQLite (per dashboard; parameterized queries throughout)
Docs parsingopenpyxl 3.1+ for Excel; csv stdlib
Production serverGunicorn 21.2+ (sync workers)
Proxy fixwerkzeug.middleware.proxy_fix.ProxyFix(x_prefix=1) on every app
Config.env via python-dotenv; secrets never in source
StylingBespoke CSS (no Bootstrap); DM Sans + DM Mono; shared main.css
ChartsPlotly JSON emitted from services/charts.py; hydrated client-side

Shared Patterns Across Dashboards

  • services/data_loader.py — loads source data once at startup into an AppData container that route handlers treat as immutable.
  • services/storage.py — on dashboards with persistence (Legal, Snowflake, Data Validation): SQLite schema + versioned file storage + audit log.
  • services/charts.py — Plotly figure builders returning JSON strings ready for the template.
  • utils/formatters.pyfmt_money, fmt_pct, fmt_bps, fmt_date; kept in sync across all apps.
  • templates/_jump_nav.html — floating dashboard-switcher widget; identical copy in each project.

Code-Level Security Review

Internal audit of the Python source across all seven projects, scanning for SQL injection, path traversal, unsafe deserialization, command injection, secret hardcoding, file-upload validation gaps, CSRF exposure, and template injection.

Summary: The codebase is well-architected for security. All SQL uses bound parameters. All file operations use sanitized filenames and send_from_directory with database-verified IDs. No eval, exec, pickle.load, or yaml.load on user input. No hardcoded production secrets. No unsafe subprocess.shell=True with user input.

Five open items were identified; one is Medium severity (CSRF token absence on state-mutating routes), the rest are Low or Informational.

Positive Findings (Mitigations Already in Place)

  • SQL parameterization throughout. Every sqlite3.execute() call uses ? placeholders with bound parameters. No f-string or concatenation-based SQL anywhere in the codebase.
  • Safe filename handling. Uploads pass through safe_filename() (regex-stripped to A-Za-z0-9._ -()), forced .pdf extension, and version_filename() timestamp suffix to prevent overwrites.
  • PDF-only upload whitelist. MIME check on application/pdf + extension match at the server; non-PDF uploads rejected with a clear error. 50 MB size cap enforced pre-save.
  • Downloads via send_from_directory. File downloads use a database-looked-up filename — the client only supplies an integer doc_id, never a path.
  • Path traversal protection in legacy endpoints. REPI_Snowflake’s /docs/<path> endpoint uses os.path.normpath() + .startswith(DOCS_DIR) check to block .. traversal.
  • No unsafe deserialization. Zero instances of pickle.load, yaml.load (without safe_load), eval, exec, or compile on user-controlled data.
  • No command injection. Application code never calls os.system or subprocess with shell=True on user input.
  • No template injection. Jinja2 autoescape is on by default. |safe filter appears only on server-generated chart JSON, never on user data.
  • No secrets in source. API keys, passwords, and tokens are not hardcoded. Production secret keys are pulled from environment variables.
  • Audit logging. Legal library logs every upload and delete with timestamp + actor (Nginx basic-auth username).

Open Findings (5)

ID Category Severity File Finding
V-01 CSRF Medium REPI_Data_Validation/flask_app.py POST routes lack CSRF tokens. /quiz, /quiz/new, /quiz/end mutate session state without token validation. Similar pattern in upload endpoints across dashboards. Same-origin policy + Nginx basic auth limit practical risk on the current private deployment, but an embedded malicious image on any reachable page could trigger unwanted actions from an authenticated browser.
V-02 Secret Defaults Low REPI_Legal/flask_app.py · REPI_Data_Validation/flask_app.py Dev-only Flask secret-key defaults. Both apps fall back to string literals ("dev-legal-secret-change-in-prod" etc.) if the env var is unset. Safe when production is configured correctly, but a missed deployment step would leave sessions forgeable. Production must set REPI_LEGAL_SECRET and REPI_VALIDATION_SECRET to 32+ random bytes.
V-03 Authorization Low REPI_Snowflake/flask_app.py No admin gate on /delete/<doc_id>. Any authenticated user can delete any document. Legal library correctly gates delete to repi_admin; Snowflake should follow the same pattern. Low severity because Snowflake is an internal proposal tool, but creates an inconsistent authorization model across the platform.
V-04 Debug Mode Informational REPI_Dashboard/flask_app.py Hardcoded debug=True in app.run(). Development-only path — production uses Gunicorn and bypasses app.run() entirely. Still recommend gating on FLASK_ENV or FLASK_DEBUG env var for clarity.
V-05 Audit Completeness Informational REPI_Snowflake/services/storage.py No audit log on Snowflake document deletes. Legal library logs every mutation with actor + timestamp; the Snowflake library does not. Deletes are silent. Aligning the two storage modules would improve traceability.

Remediation Priority

  1. V-01 CSRF (Medium). Integrate flask-wtf’s CSRFProtect. Add hidden tokens to state-mutating forms. ~1–2 days across all apps.
  2. V-02 Secret keys (Low). Document required env vars in deployment runbook; generate via python3 -c "import secrets; print(secrets.token_hex(32))". ~30 minutes.
  3. V-03 Snowflake admin gate (Low). Port Legal’s _is_admin() pattern to Snowflake. ~1 hour.
  4. V-05 Snowflake audit log (Informational). Add an audit_log table + _audit() helper matching Legal. ~1 hour.
  5. V-04 Debug mode guard (Informational). One-line conditional. ~5 minutes.

Dependency Risk

Every project pins major versions in requirements.txt: flask>=3.0.0, pandas>=2.0.0, numpy>=1.24, plotly>=5.18, openpyxl>=3.1, gunicorn>=21.2. As of the last audit cycle, no critical CVEs against the pinned minimum versions. Dependency hashes are not enforced in CI — recommend adding pip-compile --generate-hashes before next major deployment.

Review Cadence

This review is a point-in-time snapshot. Recommended cadence:

  • Quarterly — re-run the grep-based security scan for new instances of the 13 vulnerability classes above (SQL injection, path traversal, unsafe deserialization, command injection, CSRF, open redirect, template injection, secret hardcoding, file-upload gaps, input validation gaps, logging of sensitive data, race conditions, dependency freshness).
  • At every major release — run pip-audit against all seven requirements.txt files; file any CVE over Low severity.
  • On significant new code — new upload, new auth boundary, or new external integration triggers an ad-hoc review.