Indoor Localization Using WiFi Signal Fingerprinting
GPS for Indoors: Tracking Location with WiFi Signals
Abstract
This project develops an indoor positioning system that uses WiFi signal fingerprinting and machine learning to provide room-level localization without additional hardware infrastructure. The system operates in two phases: a training phase, in which WiFi signal strength measurements (RSSI) are collected at known locations to build a reference database, and a localization phase, in which real-time measurements are compared against this database to estimate position. Several machine learning algorithms, including K-Nearest Neighbors, Random Forest, Support Vector Machines, and Neural Networks, were evaluated for classification performance. Because the implementation relies on existing WiFi infrastructure, it is a cost-effective way to bring GPS-like positioning to indoor environments. The resulting system delivers reliable room-level accuracy and remains robust to signal variability and environmental changes.
The Problem
GPS works well outdoors but fails indoors, where walls and roofs block or severely weaken satellite signals. This creates challenges for indoor navigation in shopping malls, airports, hospitals, and large office buildings. Traditional indoor positioning systems require expensive hardware installations such as Bluetooth beacons or specialized sensors, making them impractical for many applications.
The Solution
This project creates a "GPS for indoors" by recording how WiFi signal strength varies across different locations. By building a fingerprint database of WiFi signals at known positions, the system can determine a user's location by comparing their current WiFi readings against this database. The approach leverages existing WiFi infrastructure, requiring no additional hardware.
How It Works
Signal Fingerprinting
The core concept is that the WiFi signal strength (RSSI, the Received Signal Strength Indicator) measured from multiple access points forms a distinctive "fingerprint" for each location. The system works in two phases:
- Training Phase: Walk through the building recording WiFi signals at known locations to build a reference database
- Localization Phase: Compare real-time WiFi readings against the database to estimate current position
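The two phases can be sketched in a few lines of Python. This is a minimal illustration rather than the project's actual code: the `FingerprintDB` class, the access point names, the sample readings, and the -100 dBm fill value for access points that were not seen are all assumptions made for the example.

```python
import math
from collections import defaultdict

MISSING_DBM = -100.0  # assumed fill value for access points that were not seen


class FingerprintDB:
    """Toy fingerprint database: one averaged RSSI vector per location."""

    def __init__(self):
        self._samples = defaultdict(list)  # location -> list of {ap: rssi} scans

    def add_sample(self, location, scan):
        """Training phase: store one scan taken at a known location."""
        self._samples[location].append(dict(scan))

    def _mean_fingerprint(self, location):
        """Average every access point's RSSI over the scans for one location."""
        scans = self._samples[location]
        aps = {ap for scan in scans for ap in scan}
        return {
            ap: sum(scan.get(ap, MISSING_DBM) for scan in scans) / len(scans)
            for ap in aps
        }

    def locate(self, scan):
        """Localization phase: return the location with the closest mean fingerprint."""
        best_location, best_distance = None, math.inf
        for location in self._samples:
            reference = self._mean_fingerprint(location)
            aps = set(reference) | set(scan)
            distance = math.sqrt(sum(
                (reference.get(ap, MISSING_DBM) - scan.get(ap, MISSING_DBM)) ** 2
                for ap in aps
            ))
            if distance < best_distance:
                best_location, best_distance = location, distance
        return best_location


# Training phase with made-up readings (dBm).
db = FingerprintDB()
db.add_sample("kitchen", {"ap1": -40, "ap2": -75})
db.add_sample("kitchen", {"ap1": -44, "ap2": -71})
db.add_sample("office", {"ap1": -80, "ap2": -45})
db.add_sample("office", {"ap1": -78, "ap3": -60})

# Localization phase: compare a fresh scan against the database.
estimate = db.locate({"ap1": -42, "ap2": -70})
print(estimate)
```

Averaging the scans per location keeps the lookup cheap, at the cost of discarding the spread of the readings; the classifiers described next work on the individual samples instead.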
Machine Learning Classification
The system uses a Python script to process WiFi signal data and predict locations. Multiple machine learning algorithms were evaluated to find the optimal approach:
- K-Nearest Neighbors (KNN) for pattern matching based on signal similarity
- Random Forest for handling noisy signal data and feature importance
- Support Vector Machines (SVM) for classification in high-dimensional signal space
- Neural Networks for learning complex signal patterns
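Of the four approaches, KNN is the simplest to show from scratch. The sketch below is an illustrative, standard-library version, not the project's script: the `knn_predict` function, the room names, and the RSSI values are hypothetical, and it assumes every sample lists the same access points in the same order.

```python
import math
from collections import Counter


def knn_predict(training, query, k=3):
    """Classify `query` by majority vote among its k nearest training vectors.

    `training` is a list of (rssi_vector, room_label) pairs; every vector
    holds RSSI values in dBm for the same access points in the same order.
    """
    by_distance = sorted(training, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data: RSSI from three access points per sample.
training = [
    ([-40, -70, -85], "lab"),
    ([-43, -68, -88], "lab"),
    ([-41, -72, -83], "lab"),
    ([-75, -45, -60], "lounge"),
    ([-78, -42, -63], "lounge"),
    ([-73, -47, -58], "lounge"),
]

predicted = knn_predict(training, [-44, -69, -86], k=3)
print(predicted)
```

Euclidean distance in signal space is one common similarity measure; other metrics or distance-weighted votes are equally plausible choices.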
Technical Implementation
Data Collection
WiFi signal data was collected across multiple rooms and locations within a building. For each position, the system recorded:
- RSSI values from all visible WiFi access points
- MAC addresses of access points for unique identification
- Timestamp and ground truth location labels
- Multiple samples per location to account for signal variability
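One way to hold these records and turn them into a table for training is sketched below. The `ScanRecord` fields, the `to_feature_rows` helper, the MAC addresses, and the -100 dBm placeholder for unseen access points are illustrative assumptions, not the project's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ScanRecord:
    """One labelled WiFi scan (field names are illustrative)."""
    location: str        # ground-truth label, e.g. a room name
    timestamp: datetime  # when the scan was taken
    rssi_by_mac: dict = field(default_factory=dict)  # AP MAC address -> RSSI in dBm


def to_feature_rows(records, missing_dbm=-100.0):
    """Turn scans into fixed-length rows with one column per access point."""
    macs = sorted({mac for record in records for mac in record.rssi_by_mac})
    rows = [
        [record.rssi_by_mac.get(mac, missing_dbm) for mac in macs]
        for record in records
    ]
    labels = [record.location for record in records]
    return macs, rows, labels


now = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)  # placeholder timestamp
records = [
    ScanRecord("room_101", now, {"aa:bb:cc:00:00:01": -48, "aa:bb:cc:00:00:02": -71}),
    ScanRecord("room_101", now, {"aa:bb:cc:00:00:01": -51}),
    ScanRecord("room_102", now, {"aa:bb:cc:00:00:02": -55, "aa:bb:cc:00:00:03": -62}),
]

macs, rows, labels = to_feature_rows(records)
for label, row in zip(labels, rows):
    print(label, row)
```

Keying the columns by MAC address rather than network name keeps access points that share an SSID apart.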
Feature Engineering
Raw WiFi signals are noisy and variable. Several preprocessing techniques improved accuracy:
- Signal smoothing using moving averages to reduce noise
- Normalization to handle different access point power levels
- Feature selection to identify the most informative access points
- Handling missing values when access points aren't visible from certain locations
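Three of these steps are easy to illustrate with plain Python. The helper names, the window size, the -100 dBm fill value, and the [-100, -30] dBm scaling range below are assumptions chosen for the example; the project may have used different settings or a different normalization.

```python
def moving_average(values, window=3):
    """Smooth a sequence of RSSI readings with a trailing moving average."""
    smoothed = []
    for i in range(len(values)):
        recent = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(recent) / len(recent))
    return smoothed


def impute_missing(row, fill_dbm=-100.0):
    """Replace missing readings (None) with an assumed very weak signal."""
    return [fill_dbm if value is None else value for value in row]


def min_max_scale(row, low=-100.0, high=-30.0):
    """Map dBm values into [0, 1], clipping readings outside the assumed range."""
    return [min(1.0, max(0.0, (value - low) / (high - low))) for value in row]


readings = [-60, -62, -90, -61, -59]  # one access point over time, with an outlier
smoothed = moving_average(readings, window=3)

scan = [-45, None, -72.5]  # None marks an access point that was not visible
features = min_max_scale(impute_missing(scan))

print(smoothed)
print(features)
```

A trailing window only uses past readings, so the same smoothing can run on live data during localization.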
Model Training and Evaluation
The machine learning pipeline was built using Python with scikit-learn:
- Train-test split to evaluate generalization performance
- Cross-validation to ensure robust accuracy estimates
- Hyperparameter tuning to optimize model performance
- Confusion matrix analysis to identify problematic locations
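The four steps above might look like this with scikit-learn. The data here is synthetic (three made-up rooms with Gaussian noise around assumed mean RSSI values), and the choice of Random Forest and the small parameter grid are illustrative, so treat this as a sketch of the workflow rather than the project's actual pipeline or results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

# Synthetic stand-in for the fingerprint table: 3 rooms, 4 access points.
rng = np.random.default_rng(0)
room_means = np.array([
    [-40.0, -70.0, -85.0, -60.0],
    [-75.0, -45.0, -65.0, -80.0],
    [-85.0, -65.0, -45.0, -70.0],
])
X = np.vstack([mean + rng.normal(0, 4, size=(60, 4)) for mean in room_means])
y = np.repeat(["room_a", "room_b", "room_c"], 60)

# Hold out a test set, stratified so every room appears in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Tune a small hyperparameter grid with 5-fold cross-validation.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [25, 50], "max_depth": [None, 5]},
    cv=5,
)
search.fit(X_train, y_train)

# Cross-validated accuracy of the best configuration on the training split.
cv_scores = cross_val_score(search.best_estimator_, X_train, y_train, cv=5)

# Confusion matrix on the held-out split: rows are true rooms, columns predicted.
labels = ["room_a", "room_b", "room_c"]
matrix = confusion_matrix(y_test, search.predict(X_test), labels=labels)

print(search.best_params_, cv_scores.mean())
print(matrix)
```

Off-diagonal entries in the matrix show which rooms get mistaken for each other, which is typically where extra training samples help most.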
Key Achievements
Room-Level Accuracy
The system reliably identified which room a user was in. This level of precision is sufficient for many practical applications, such as indoor navigation, asset tracking, and location-based services.
No Additional Hardware Required
By leveraging existing WiFi infrastructure, the solution requires no new hardware installations. This makes it cost-effective and easy to deploy in any building with WiFi coverage. Users only need a smartphone or laptop with WiFi capabilities.
Robust to Signal Variability
WiFi signals naturally fluctuate due to people moving, doors opening/closing, and other environmental factors. The machine learning approach proved robust to these variations by learning patterns from multiple training samples and using ensemble methods.
Technical Challenges
Signal Noise and Variability
WiFi signals are inherently noisy and can vary significantly even at the same location. Solutions included:
- Collecting multiple samples per location during training
- Using ensemble methods like Random Forest to average out noise
- Implementing signal smoothing and outlier detection
- Temporal averaging of predictions for more stable results
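Temporal averaging of predictions can be as simple as a sliding-window majority vote over the classifier's recent outputs. The `PredictionSmoother` class, the window size, and the room names below are hypothetical and only illustrate the idea.

```python
from collections import Counter, deque


class PredictionSmoother:
    """Report the most common room among the last `window` raw predictions."""

    def __init__(self, window=5):
        self._recent = deque(maxlen=window)  # oldest prediction drops out first

    def update(self, predicted_room):
        self._recent.append(predicted_room)
        # On a tie, most_common returns the label seen earlier in the window.
        return Counter(self._recent).most_common(1)[0][0]


smoother = PredictionSmoother(window=5)
raw = ["hall", "hall", "lab", "hall", "hall", "lab", "lab", "lab"]
stable = [smoother.update(room) for room in raw]
print(stable)
```

The isolated "lab" reading in the third position is suppressed, while the sustained move to the lab is reported after a short delay; a larger window trades responsiveness for stability.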
Access Point Visibility
Not all access points are visible from all locations, creating sparse feature vectors. This was addressed by:
- Imputing missing values with appropriate defaults (e.g., very weak signal)
- Feature selection to focus on consistently visible access points
- Using algorithms that remain reliable on sparse, imputed feature vectors (Random Forest, KNN)
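The first two measures can be combined in one small step: keep only access points that appear in enough scans, then fill the remaining gaps with a very weak signal. The function names, the 50% visibility threshold, and the -100 dBm default are assumptions made for this sketch.

```python
def visibility_ratio(scans):
    """Fraction of scans in which each access point appears."""
    counts = {}
    for scan in scans:
        for ap in scan:
            counts[ap] = counts.get(ap, 0) + 1
    return {ap: count / len(scans) for ap, count in counts.items()}


def select_and_impute(scans, min_ratio=0.5, fill_dbm=-100.0):
    """Keep APs seen in at least `min_ratio` of scans; fill gaps with a weak signal."""
    ratios = visibility_ratio(scans)
    kept = sorted(ap for ap, ratio in ratios.items() if ratio >= min_ratio)
    rows = [[scan.get(ap, fill_dbm) for ap in kept] for scan in scans]
    return kept, rows


scans = [
    {"ap1": -50, "ap2": -70},
    {"ap1": -52, "ap3": -88},
    {"ap1": -49, "ap2": -68},
    {"ap2": -71},
]
kept, rows = select_and_impute(scans, min_ratio=0.5)
print(kept)
print(rows)
```

Here `ap3` is dropped because it appears in only one of four scans. A global threshold like this can discard an access point that is visible in just one room yet highly informative there, so a per-location visibility check may be worth considering.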
Scalability to Large Buildings
As the number of locations increases, the classification problem becomes more complex. Strategies included:
- Hierarchical classification (building → floor → room)
- Clustering similar locations to reduce the search space
- Efficient data structures for fast nearest-neighbor search
- Incremental learning to add new locations without full retraining
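Hierarchical classification can be sketched as a two-stage lookup that first picks the floor and then searches only that floor's rooms. For brevity the example skips the building level and uses a nearest-reference match at each stage; the reference fingerprints, names, and the `locate` helper are hypothetical.

```python
import math


def nearest_label(references, query):
    """Return the label of the reference vector closest to `query`."""
    return min(references, key=lambda label: math.dist(references[label], query))


# Hypothetical reference fingerprints (mean RSSI per access point, in dBm).
floor_refs = {"floor_1": [-45, -80], "floor_2": [-80, -45]}
room_refs = {
    "floor_1": {"room_101": [-40, -82], "room_102": [-52, -78]},
    "floor_2": {"room_201": [-83, -40], "room_202": [-76, -50]},
}


def locate(query):
    """Two-stage lookup: pick the floor first, then search only that floor's rooms."""
    floor = nearest_label(floor_refs, query)
    room = nearest_label(room_refs[floor], query)
    return floor, room


result = locate([-50, -79])
print(result)
```

Each stage compares the query against far fewer candidates than a flat search over every room, but a wrong floor decision cannot be corrected at the room stage.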
Lessons Learned
Building this indoor localization system taught me valuable lessons about real-world machine learning applications:
- Data quality matters more than algorithms: Careful data collection and preprocessing had a bigger impact than model selection
- Environmental factors are significant: Time of day, number of people, and even weather affected signal patterns
- Ensemble methods are powerful: Random Forest consistently outperformed single models by averaging out noise
- Real-world deployment requires robustness: The system needed to handle edge cases like new access points appearing or existing ones disappearing
Future Enhancements
Potential improvements to the system include:
- Combining WiFi with other sensors (accelerometer, compass) for sensor fusion
- Implementing particle filters for smoother trajectory tracking
- Using deep learning to automatically learn optimal features from raw signals
- Building a mobile app for real-time indoor navigation
- Exploring transfer learning to adapt models across different buildings
Applications
This technology enables various practical applications:
- Indoor Navigation: Guiding users through complex buildings like hospitals or airports
- Asset Tracking: Locating equipment or inventory in warehouses
- Emergency Response: Helping first responders locate people in buildings
- Retail Analytics: Understanding customer movement patterns in stores
- Smart Buildings: Automating lighting and climate control based on occupancy