Tiered Filtering (TVP Core: Clean & Verify)
The Tiered Filtering layer takes the raw corpus from Ingestion Front and applies progressively stricter passes that remove junk while preserving valuable crypto signals. It handles billions of data points per cycle and flags likely shills and bots early, raising purity from about 40% at raw intake to over 85% after filtering. As the core TVP step, it prepares the stack for accurate validation, prioritizing thoroughness over raw speed even in volatile markets.
Filtering proceeds through four tiers, each reducing or refining the volume passed to the next:
Filtering Tiers
Bulk Deduplication
Scans the full intake for exact and near-duplicate items, using content hashes for exact matches and similarity scores for near matches. Redundant posts, such as repeated hype threads, collapse into a single representative, cutting volume by 50-70% without losing context.
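The two-stage matching described above can be sketched as follows. This is a minimal illustration, not the layer's actual implementation: SHA-256 over normalized text stands in for the content hash, Jaccard similarity over 3-word shingles stands in for the similarity score, and the function names and the 0.6 threshold are assumptions chosen for the example.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants hash identically."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles used for the similarity score."""
    words = text.split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def deduplicate(posts, threshold=0.6):
    """Keep the first post of each duplicate group (threshold is an assumed value)."""
    seen_hashes = set()
    representatives = []  # (original text, shingle set) for each kept post
    for post in posts:
        norm = normalize(post)
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue  # exact duplicate after normalization
        seen_hashes.add(digest)
        sh = shingles(norm)
        if any(jaccard(sh, rep) >= threshold for _, rep in representatives):
            continue  # near duplicate of an earlier representative
        representatives.append((post, sh))
    return [text for text, _ in representatives]
```

The pairwise comparison here is quadratic in the number of kept posts, so it only illustrates the idea. At billions of items per cycle, an approximate scheme such as MinHash with locality-sensitive hashing would be a more plausible choice.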
Anomaly Detection
Identifies outliers such as bot patterns and spam bursts using basic heuristics, including repetition rates and engagement anomalies. Low-quality items are removed first, while human-like signals, such as dev discussions or genuine rumors, are elevated.
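One way to picture these heuristics is the sketch below. It assumes each item is a dict with hypothetical `author`, `text`, and `likes` fields, measures repetition as the share of an author's posts that repeat an earlier one, and flags engagement outliers with a median-based modified z-score. The thresholds (0.5 and 3.5) are illustrative assumptions, not values taken from the system.

```python
from collections import defaultdict
from statistics import median


def repetition_rate(texts):
    """Fraction of an author's posts that repeat an earlier post of theirs."""
    if not texts:
        return 0.0
    return 1.0 - len(set(texts)) / len(texts)


def engagement_outliers(values, cutoff=3.5):
    """Indices whose modified z-score (median and MAD based) exceeds the cutoff."""
    if not values:
        return set()
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return set()  # no spread, so nothing can be called an outlier
    return {i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > cutoff}


def flag_anomalies(items, max_repetition=0.5, cutoff=3.5):
    """Return (index, reasons) pairs for items that trip either heuristic."""
    by_author = defaultdict(list)
    for item in items:
        by_author[item["author"]].append(item["text"])
    spammy = {a for a, texts in by_author.items()
              if repetition_rate(texts) > max_repetition}
    outliers = engagement_outliers([item["likes"] for item in items], cutoff)

    flagged = []
    for i, item in enumerate(items):
        reasons = []
        if item["author"] in spammy:
            reasons.append("repetition")
        if i in outliers:
            reasons.append("engagement")
        if reasons:
            flagged.append((i, reasons))
    return flagged
```

The sketch only flags items and records why. Whether a flagged item is removed or merely deprioritized is left to the caller, since an engagement spike can come from a bot farm or from a genuinely viral post.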
Source Weighting
Assigns each item a weight tier based on the credibility of its origin: verified social handles rank above anonymous blogs, and coordinated clusters are downranked. This balances the dataset and limits echo-chamber bias.
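A simple weighting scheme along these lines might look like the sketch below. The source categories, the numeric weights, the `cluster_id` field, and the penalty rule are all hypothetical placeholders. The text above only says that verified handles outrank anonymous blogs and that coordinated clusters are downranked.

```python
from collections import Counter

# Hypothetical base weights per origin type; higher means more credible.
BASE_WEIGHTS = {
    "verified_handle": 1.0,
    "unverified_handle": 0.6,
    "anonymous_blog": 0.3,
}
DEFAULT_WEIGHT = 0.3  # assumed fallback for unknown origin types


def weight_items(items, cluster_threshold=3, cluster_penalty=0.5):
    """Attach a 'weight' to each item, penalizing large coordinated clusters."""
    cluster_sizes = Counter(
        item["cluster_id"] for item in items
        if item.get("cluster_id") is not None
    )
    weighted = []
    for item in items:
        weight = BASE_WEIGHTS.get(item["source_type"], DEFAULT_WEIGHT)
        cid = item.get("cluster_id")
        # Items in a cluster at or above the threshold size are downranked.
        if cid is not None and cluster_sizes[cid] >= cluster_threshold:
            weight *= cluster_penalty
        weighted.append({**item, "weight": weight})
    return weighted
```

Applying the penalty as a multiplier on the base weight means a verified handle inside a coordinated cluster still outranks an anonymous blog in the same cluster, which keeps the origin ordering intact while reducing the cluster's overall influence.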
Gap Refinement
Checks the cleaned data for coverage gaps, such as missing timeframes in trend data, and fills them lightly from buffered reserves to maintain completeness before the final polish.
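The gap check and fill can be illustrated with a small sketch. It assumes trend data is keyed by fixed-interval timestamps and that the buffered reserve is a second mapping with the same keys. Both structures and the function names are assumptions made for the example.

```python
from datetime import datetime, timedelta


def find_gaps(series, start, end, step):
    """List expected bucket timestamps in [start, end] missing from the series."""
    gaps = []
    t = start
    while t <= end:
        if t not in series:
            gaps.append(t)
        t += step
    return gaps


def fill_gaps(series, buffer, start, end, step):
    """Fill missing buckets from the buffer; report those that stay empty."""
    filled = dict(series)  # copy so the cleaned input is left untouched
    unfilled = []
    for t in find_gaps(series, start, end, step):
        if t in buffer:
            filled[t] = buffer[t]
        else:
            unfilled.append(t)
    return filled, unfilled


if __name__ == "__main__":
    hour = timedelta(hours=1)
    start = datetime(2024, 1, 1, 0)
    series = {start: 10, start + hour: 12, start + 4 * hour: 9}
    buffer = {start + 2 * hour: 11}
    filled, unfilled = fill_gaps(series, buffer, start, start + 4 * hour, hour)
    print(sorted(filled), unfilled)
```

Gaps that the buffer cannot cover are returned rather than interpolated, so later stages can see where coverage is still incomplete instead of treating synthetic values as real observations.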
