Machine Learning Engine

KasinathCA edited this page Feb 21, 2026 · 1 revision

The Behavioral Analysis Layer is the secondary enforcement mechanism in the Intent-Aware Security architecture. It uses unsupervised machine learning, specifically the Isolation Forest algorithm, to detect deviations from established normal operational behavior in real time.

The Theory of Intent Analysis

Standard security paradigms evaluate authentication cryptographically (e.g., by verifying a hash or signature) and presume that a valid credential implies a legitimate user. Intent-Aware security instead works on the premise that an attacker operating with compromised credentials will display anomalous temporal, spatial, or volumetric behavior compared to genuine users.

The task of the Machine Learning Engine is to separate legitimate requests from mimicking bots, scrapers, and automated credential-stuffing utilities purely by analyzing operational characteristics, irrespective of the credential payload.

Isolation Forest Algorithm

The model relies on scikit-learn's IsolationForest implementation. It uses random feature partitioning to isolate observations: outlier observations require fewer partitions to isolate than points inside normal clusters. The model outputs a continuous anomaly score: lower scores (typically negative) indicate high-risk anomalies, while higher positive values correspond to conventional behavior.

Hyperparameters:

  • Tuned to balance detection stringency against the false positive rate (FPR).
  • Configured specifically to detect low-frequency, perfectly spoofed attacks (mimicry attacks).
  • Contamination ratio set to the expected proportion of malicious traffic in the training data (0.15 in the prototype).
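The scoring behavior described above can be sketched as follows. This is a minimal illustration, not the project's training script: the toy data, n_estimators, and random_state are assumptions, and only contamination=0.15 comes from the prototype.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Toy data (assumed): columns are [hour, request_rate, payload_size_kb].
normal = np.column_stack([
    rng.integers(9, 18, size=850),      # daytime hours
    rng.uniform(2, 5, size=850),        # 2-5 requests per minute
    rng.uniform(1, 10, size=850),       # small payloads
])
attack = np.column_stack([
    rng.integers(0, 5, size=150),       # early-morning hours
    rng.uniform(100, 300, size=150),    # very high request rates
    rng.uniform(500, 2000, size=150),   # very large payloads
])
X = np.vstack([normal, attack])

# contamination=0.15 matches the prototype; the other settings are assumptions.
model = IsolationForest(n_estimators=100, contamination=0.15, random_state=42)
model.fit(X)

scores = model.decision_function(X)  # negative -> anomalous, positive -> normal
labels = model.predict(X)            # -1 for anomalies, +1 for inliers
```

With contamination set, scikit-learn shifts the score so that roughly that fraction of the training data falls below zero, which is why 0.0 works as the decision boundary.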

Feature Engineering and Encoding

The Isolation Forest requires numeric input features derived from network telemetry.

Extracted Parameters:

  • hour: The hour of the day (0-23) the request is initiated.
  • request_rate: The frequency of API requests generated by the origin IP address per minute.
  • payload_size_kb: The size of the transmitted request body, in kilobytes.
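As a rough sketch of how these three values might be derived for one request: the function name and its arguments are hypothetical, the per-IP request count is assumed to be tracked elsewhere, and 1 KB is assumed to mean 1024 bytes.

```python
from datetime import datetime

def extract_numeric_features(timestamp, requests_last_minute, body):
    """Build the numeric features for one request (hypothetical helper)."""
    return {
        "hour": timestamp.hour,                        # 0-23
        "request_rate": float(requests_last_minute),   # requests per minute from this IP
        "payload_size_kb": len(body) / 1024,           # assumes 1 KB = 1024 bytes
    }

example = extract_numeric_features(datetime(2026, 2, 21, 14, 30), 4, b"x" * 2048)
```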

Categorical Transformation (Label Encoding):

Categorical telemetry features such as geographic locale and destination endpoint must be converted to integer codes with LabelEncoder() before analysis:

  1. geo_location: Maps discrete string values ("India", "Russia", "USA", "Unknown") to integer codes (0, 1, 2, 3).
  2. endpoint: Assigns a distinct integer code to requests targeting /verify_license, /user_profile, /admin_login, or /bulk_export.
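The encoding step can be illustrated with scikit-learn's LabelEncoder, which assigns codes in sorted order of the class labels. The variable names here are illustrative; only the category values come from the lists above.

```python
from sklearn.preprocessing import LabelEncoder

# One encoder per categorical column, fitted on the known categories.
geo_encoder = LabelEncoder()
geo_encoder.fit(["India", "Russia", "USA", "Unknown"])

endpoint_encoder = LabelEncoder()
endpoint_encoder.fit(["/verify_license", "/user_profile", "/admin_login", "/bulk_export"])

# Codes follow the sorted order of the fitted classes.
geo_codes = geo_encoder.transform(["USA", "India", "Unknown"])                  # [2, 0, 3]
endpoint_codes = endpoint_encoder.transform(["/admin_login", "/user_profile"])  # [0, 2]
```

The encoders fitted at training time need to be reused at inference so the codes stay consistent. LabelEncoder raises a ValueError for a label it has not seen, so unexpected values would have to be mapped to a known category such as "Unknown" first.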

Training Data Distribution

The model is trained on synthetic behavior telemetry produced by generate_data.py. Each generation run produces 10,000 records by default.

Normal Distributions

Establishes a baseline of typical user workflows:

  • High activity during daylight hours.
  • Predictably low payload sizes.
  • Uniform transaction rates (e.g., 2-5 requests per minute).
  • Target endpoints focused exclusively on standard portals (e.g., /user_profile).

Attack Distributions

Defines known malicious anomalies acting within the network:

  • Brute Force Nodes: Intense query rates over a confined period targeting /admin_login endpoints.
  • Data Scrapers: Abnormally massive payload sizes simulating bulk data extraction sequences.
  • Geographic Oddities: Spikes in requests originating unpredictably from disparate international IP blocks at unusual hours.
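The normal and attack profiles above could be generated along the following lines. This is a sketch, not the contents of generate_data.py: the function name, the numeric ranges, the 15% attack share, and the is_attack column are all assumptions chosen for illustration.

```python
import random

GEOS = ["India", "Russia", "USA", "Unknown"]

def generate_records(n=10_000, attack_fraction=0.15, seed=0):
    """Return n synthetic telemetry records as dicts (illustrative values only)."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        is_attack = rng.random() < attack_fraction
        # Baseline: daytime activity, steady rate, small payloads, standard endpoint.
        record = {
            "hour": rng.randint(8, 19),
            "request_rate": rng.uniform(2, 5),
            "payload_size_kb": rng.uniform(1, 20),
            "geo_location": rng.choice(GEOS[:3]),
            "endpoint": "/user_profile",
            "is_attack": int(is_attack),  # kept for evaluation only, not for training
        }
        if is_attack:
            kind = rng.choice(["brute_force", "scraper", "geo_oddity"])
            if kind == "brute_force":
                record["request_rate"] = rng.uniform(100, 500)
                record["endpoint"] = "/admin_login"
            elif kind == "scraper":
                record["payload_size_kb"] = rng.uniform(1_000, 5_000)
                record["endpoint"] = "/bulk_export"
            else:
                record["geo_location"] = "Unknown"
                record["hour"] = rng.randint(0, 4)
        records.append(record)
    return records
```

Because the Isolation Forest is unsupervised, the is_attack label would be dropped before fitting and used only to measure how many injected attacks the model flags.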

Real-Time Evaluation Protocol

In a production workflow:

  1. The incoming JSON request is mapped to the feature inputs defined above.
  2. The deployed application scores these features with the model loaded from model.pkl.
  3. The Isolation Forest outputs a continuous floating-point score.
  4. Any score below 0.0 is treated as an anomaly: standard request handling stops and the application returns HTTP 403 Forbidden with the classification flag "Blocked By ML".
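The decision step can be sketched as a small framework-independent function. The function name, the feature order, and the "Allowed" response are assumptions. The categorical fields are assumed to be label-encoded already, and the model argument stands for whatever estimator was deserialized from model.pkl.

```python
# Assumed column order; it must match the order used when the model was trained.
FEATURE_ORDER = ["hour", "request_rate", "payload_size_kb", "geo_location", "endpoint"]

def evaluate_request(features, model, threshold=0.0):
    """Score one request and return (http_status, response_body)."""
    row = [[features[name] for name in FEATURE_ORDER]]  # 2D input: one sample
    score = float(model.decision_function(row)[0])
    if score < threshold:
        # Anomalous: stop normal handling and report the block.
        return 403, {"status": "Blocked By ML", "score": score}
    return 200, {"status": "Allowed", "score": score}
```

Keeping the threshold as a parameter makes it possible to trade false positives against missed attacks without retraining, although the prototype described here uses 0.0.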