Indoor Localization Using WiFi Signal Fingerprinting

GPS for Indoors: Tracking Location with WiFi Signals

Abstract

This project develops an indoor positioning system that uses WiFi signal fingerprinting and machine learning to provide room-level localization without additional hardware infrastructure. The system operates in two phases: a training phase, in which WiFi received signal strength (RSSI) measurements are collected at known locations to build a reference database, and a localization phase, in which real-time measurements are compared against this database to estimate position. Several machine learning algorithms, including K-Nearest Neighbors, Random Forest, Support Vector Machines, and Neural Networks, were evaluated for classification performance. Because the implementation relies on existing WiFi infrastructure, it is a cost-effective option for indoor environments where GPS is unavailable. The results show reliable room-level accuracy, and the system remained robust to signal variability and environmental changes.

The Problem

GPS works well outdoors but becomes unreliable indoors, where walls and roofs attenuate the satellite signals. This creates challenges for indoor navigation in shopping malls, airports, hospitals, and large office buildings. Traditional indoor positioning systems require expensive hardware installations like Bluetooth beacons or specialized sensors, making them impractical for many applications.

The Solution

This project creates a "GPS for indoors" by recording how WiFi signal strength varies across different locations. By building a fingerprint database of WiFi signals at known positions, the system can determine a user's location by comparing their current WiFi readings against this database. The approach leverages existing WiFi infrastructure, requiring no additional hardware.

How It Works

Signal Fingerprinting

The core concept is that WiFi signal strength (RSSI, Received Signal Strength Indicator) measured from multiple access points forms a characteristic "fingerprint" for each location. The system works in two phases:

Training Phase: Walk through the building recording WiFi signals at known locations to build a reference database
Localization Phase: Compare real-time WiFi readings against the database to estimate current position
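
As a rough illustration of the two phases, the sketch below stores a few labelled scans and then returns the label of the stored scan closest to a new one. It is not the project's actual code: the access point names, the RSSI values, and the -100 dBm placeholder for access points that were not heard are all assumptions made for the example.

```python
import math

# Placeholder RSSI for an access point that is absent from a scan
# (an assumed value, not taken from the project).
MISSING_RSSI = -100.0


def distance(scan_a, scan_b):
    """Euclidean distance between two scans in signal space.

    Each scan maps an access point identifier to an RSSI value in dBm.
    """
    access_points = set(scan_a) | set(scan_b)
    return math.sqrt(sum(
        (scan_a.get(ap, MISSING_RSSI) - scan_b.get(ap, MISSING_RSSI)) ** 2
        for ap in access_points
    ))


# Training phase: labelled scans recorded at known locations (made-up values).
fingerprint_db = [
    ("kitchen", {"ap1": -42, "ap2": -71, "ap3": -88}),
    ("office", {"ap1": -67, "ap2": -45, "ap3": -80}),
    ("hallway", {"ap1": -58, "ap2": -60}),
]


def locate(scan):
    """Localization phase: return the label of the closest stored fingerprint."""
    label, _ = min(fingerprint_db, key=lambda entry: distance(scan, entry[1]))
    return label


print(locate({"ap1": -44, "ap2": -73, "ap3": -85}))  # prints "kitchen"
```

A single nearest match like this is the simplest form of the idea; the classifiers described next generalize it by learning from many scans per location.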

Machine Learning Classification

The system uses a Python script to process WiFi signal data and predict locations. Multiple machine learning algorithms were evaluated to find the optimal approach:

K-Nearest Neighbors (KNN) for pattern matching based on signal similarity
Random Forest for handling noisy signal data and feature importance
Support Vector Machines (SVM) for classification in high-dimensional signal space
Neural Networks for learning complex signal patterns
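
One way such a comparison could be set up with scikit-learn is sketched below. The data is synthetic (three hypothetical rooms, four access points, Gaussian noise around assumed mean RSSI values), and the hyperparameters are arbitrary starting points rather than the settings used in the project.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for real scans: 3 rooms, 4 access points, 60 scans per room.
room_means = np.array([
    [-45.0, -70.0, -85.0, -60.0],
    [-70.0, -48.0, -75.0, -80.0],
    [-85.0, -65.0, -50.0, -72.0],
])
X = np.vstack([mean + rng.normal(0.0, 4.0, size=(60, 4)) for mean in room_means])
y = np.repeat(["room_a", "room_b", "room_c"], 60)

# Distance- and gradient-based models are scaled first; the forest does not need it.
candidates = {
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "Neural Network": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
    ),
}

# Mean 5-fold cross-validated accuracy for each candidate model.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}
for name, score in scores.items():
    print(f"{name}: mean accuracy {score:.3f}")
```

On data this cleanly separated all four models score similarly; differences between them only become meaningful on real, noisier scans.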

Technical Implementation

Data Collection

WiFi signal data was collected across multiple rooms and locations within a building. For each position, the system recorded:

RSSI values from all visible WiFi access points
MAC addresses of access points for unique identification
Timestamp and ground truth location labels
Multiple samples per location to account for signal variability
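
The recorded fields could be represented roughly as follows. The Reading class, its field names, and the sample values are hypothetical; the sketch only shows how per-access-point readings might be grouped into one row per scan, with None marking access points that were not heard.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """One access point observation within a scan (field names are illustrative)."""
    timestamp: float  # seconds since the epoch
    location: str     # ground truth label recorded during the survey
    mac: str          # MAC address of the access point
    rssi: int         # received signal strength in dBm


readings = [
    Reading(1000.0, "kitchen", "aa:aa:aa:aa:aa:01", -48),
    Reading(1000.0, "kitchen", "aa:aa:aa:aa:aa:02", -71),
    Reading(1002.0, "kitchen", "aa:aa:aa:aa:aa:01", -51),
    Reading(1050.0, "office", "aa:aa:aa:aa:aa:02", -44),
    Reading(1050.0, "office", "aa:aa:aa:aa:aa:03", -79),
]


def build_table(readings):
    """Group readings into scans and lay them out as rows with one column per MAC."""
    scans = defaultdict(dict)
    for r in readings:
        scans[(r.timestamp, r.location)][r.mac] = r.rssi

    access_points = sorted({r.mac for r in readings})
    rows, labels = [], []
    for key in sorted(scans):
        seen = scans[key]
        rows.append([seen.get(mac) for mac in access_points])
        labels.append(key[1])
    return access_points, rows, labels


access_points, rows, labels = build_table(readings)
for label, row in zip(labels, rows):
    print(label, row)
```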

Feature Engineering

Raw WiFi signals are noisy and variable. Several preprocessing techniques improved accuracy:

Signal smoothing using moving averages to reduce noise
Normalization to handle different access point power levels
Feature selection to identify the most informative access points
Handling missing values when access points aren't visible from certain locations
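
Two of these steps, smoothing and normalization, can be sketched in a few lines. The window size and the -100 to -30 dBm range are assumptions for illustration; per-access-point standardization would be another reasonable way to normalize.

```python
def moving_average(values, window=3):
    """Trailing moving average; early samples average over what is available."""
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def normalize(rssi, floor=-100.0, ceiling=-30.0):
    """Map RSSI in dBm onto [0, 1], clipping values outside the assumed range."""
    clipped = min(max(rssi, floor), ceiling)
    return (clipped - floor) / (ceiling - floor)


# Consecutive readings from one access point at a fixed spot (made-up values);
# the -75 reading is a momentary dip.
raw = [-60, -62, -75, -61, -59]
smoothed = moving_average(raw, window=3)
print([round(v, 1) for v in smoothed])
print([round(normalize(v), 2) for v in smoothed])
```

Smoothing spreads the momentary dip across neighboring samples instead of letting a single reading dominate the feature vector.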

Model Training and Evaluation

The machine learning pipeline was built using Python with scikit-learn:

Train-test split to evaluate generalization performance
Cross-validation to ensure robust accuracy estimates
Hyperparameter tuning to optimize model performance
Confusion matrix analysis to identify problematic locations
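
A minimal version of such a pipeline might look like the sketch below. It uses synthetic data in which two hypothetical rooms ("lobby" and "office") have deliberately similar signal profiles, so the confusion matrix has something to show. The parameter grid and split sizes are illustrative choices, not the project's settings.

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
rooms = ["lab", "lobby", "office"]

# Assumed mean RSSI per room for 3 access points; lobby and office are close.
room_means = np.array([
    [-50.0, -75.0, -65.0],
    [-72.0, -52.0, -80.0],
    [-68.0, -58.0, -76.0],
])
X = np.vstack([mean + rng.normal(0.0, 5.0, size=(80, 3)) for mean in room_means])
y = np.repeat(rooms, 80)

# Hold out a stratified test set that the tuning step never sees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1
)

# Hyperparameter tuning with 5-fold cross-validation on the training set only.
search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7, 9], "weights": ["uniform", "distance"]},
    cv=5,
)
search.fit(X_train, y_train)

# Rows are true rooms, columns are predicted rooms, in the order of `rooms`.
cm = confusion_matrix(y_test, search.predict(X_test), labels=rooms)
print("best parameters:", search.best_params_)
print("test accuracy:", round(search.score(X_test, y_test), 3))
print(cm)
```

Off-diagonal counts in the matrix point to pairs of rooms that are hard to tell apart and may need more samples or more informative access points.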

Key Achievements

Room-Level Accuracy

The system achieved reliable room-level localization, correctly identifying which room a user is in with high accuracy. This level of precision is sufficient for many practical applications like indoor navigation, asset tracking, and location-based services.

No Additional Hardware Required

By leveraging existing WiFi infrastructure, the solution requires no new hardware installations. This makes it cost-effective and easy to deploy in any building with WiFi coverage, once fingerprints have been collected there. Users only need a smartphone or laptop with WiFi capabilities.

Robust to Signal Variability

WiFi signals naturally fluctuate due to people moving, doors opening/closing, and other environmental factors. The machine learning approach proved robust to these variations by learning patterns from multiple training samples and using ensemble methods.

Technical Challenges

Signal Noise and Variability

WiFi signals are inherently noisy and can vary significantly even at the same location. Solutions included:

Collecting multiple samples per location during training
Using ensemble methods like Random Forest to average out noise
Implementing signal smoothing and outlier detection
Temporal averaging of predictions for more stable results
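
For room labels, one simple way to average predictions over time is a majority vote over the last few outputs. The sketch below assumes that interpretation; the PredictionSmoother class and the window size are hypothetical, not taken from the project.

```python
from collections import Counter, deque


class PredictionSmoother:
    """Majority vote over the most recent predictions."""

    def __init__(self, window=5):
        # deque with maxlen drops the oldest prediction automatically.
        self.recent = deque(maxlen=window)

    def update(self, label):
        """Record a new raw prediction and return the smoothed one."""
        self.recent.append(label)
        # On a tie, most_common keeps the label that appears first in the window.
        return Counter(self.recent).most_common(1)[0][0]


raw_predictions = [
    "office", "office", "hallway", "office",
    "office", "kitchen", "kitchen", "kitchen",
]
smoother = PredictionSmoother(window=3)
smoothed_predictions = [smoother.update(p) for p in raw_predictions]
print(smoothed_predictions)
```

The isolated "hallway" prediction is suppressed, at the cost of reacting one step later when the user really does move to the kitchen.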

Access Point Visibility

Not all access points are visible from all locations, creating sparse feature vectors. This was addressed by:

Imputing missing values with appropriate defaults (e.g., very weak signal)
Feature selection to focus on consistently visible access points
Using algorithms that cope well with the imputed, sparse feature vectors (Random Forest, KNN)
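
The first two points can be combined as sketched below: keep only access points that appear in a minimum fraction of scans, then fill in a weak default for any that are missing from a given scan. The -100 dBm default and the 50% visibility threshold are assumed values for illustration.

```python
# Assumed placeholder for "not heard"; weaker than any real reading in the example.
MISSING_RSSI = -100.0

# Each scan maps access point name to RSSI in dBm (made-up values).
scans = [
    {"ap1": -50, "ap2": -70},
    {"ap1": -52, "ap3": -88},
    {"ap1": -49, "ap2": -72},
    {"ap1": -55, "ap2": -69, "ap4": -90},
]


def visibility(scans):
    """Fraction of scans in which each access point was heard."""
    counts = {}
    for scan in scans:
        for ap in scan:
            counts[ap] = counts.get(ap, 0) + 1
    return {ap: n / len(scans) for ap, n in counts.items()}


def select_access_points(scans, min_visibility=0.5):
    """Keep access points heard in at least `min_visibility` of the scans."""
    return sorted(
        ap for ap, fraction in visibility(scans).items()
        if fraction >= min_visibility
    )


def to_vector(scan, access_points):
    """Fixed-length feature vector; unheard access points get the default."""
    return [float(scan.get(ap, MISSING_RSSI)) for ap in access_points]


selected = select_access_points(scans)
print(selected)
print([to_vector(scan, selected) for scan in scans])
```

Because the column order is fixed by the selected list, a scan that contains an unknown access point or lacks a known one still produces a vector of the expected length.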

Scalability to Large Buildings

As the number of locations increases, the classification problem becomes more complex. Strategies included:

Hierarchical classification (building → floor → room)
Clustering similar locations to reduce the search space
Efficient data structures for fast nearest-neighbor search
Incremental learning to add new locations without full retraining
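
The hierarchical strategy can be illustrated with a two-level example that first picks a floor and then picks a room within that floor only. Nearest-centroid matching is used here purely to keep the sketch short; the floor and room names and the RSSI vectors are made up, and any classifier could take the place of the centroid lookup at each level.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(column) for column in zip(*vectors)]


def nearest(vector, centroids):
    """Label whose centroid is closest to the vector (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(vector, centroids[label]))


# Training scans keyed by (floor, room); three access points per vector.
training = {
    ("floor1", "lobby"): [[-45, -80, -90], [-47, -82, -88]],
    ("floor1", "cafe"): [[-55, -70, -92], [-57, -72, -90]],
    ("floor2", "lab"): [[-85, -60, -50], [-83, -62, -52]],
    ("floor2", "office"): [[-90, -50, -60], [-88, -52, -62]],
}

# Build one centroid per floor and, within each floor, one per room.
floor_vectors = {}
room_centroids = {}
for (floor, room), vectors in training.items():
    floor_vectors.setdefault(floor, []).extend(vectors)
    room_centroids.setdefault(floor, {})[room] = centroid(vectors)
floor_centroids = {floor: centroid(v) for floor, v in floor_vectors.items()}


def locate_hierarchical(vector):
    """Classify the floor first, then only compare against rooms on that floor."""
    floor = nearest(vector, floor_centroids)
    room = nearest(vector, room_centroids[floor])
    return floor, room


print(locate_hierarchical([-84, -61, -51]))  # prints ('floor2', 'lab')
```

Each query is compared against the floors and then the rooms on one floor, rather than against every room in the building, which is where the reduction in search space comes from.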

Lessons Learned

Building this indoor localization system taught me valuable lessons about real-world machine learning applications:

Data quality matters more than algorithms: Careful data collection and preprocessing had a bigger impact than model selection
Environmental factors are significant: Time of day, number of people, and even weather affected signal patterns
Ensemble methods are powerful: Random Forest consistently outperformed single models by averaging out noise
Real-world deployment requires robustness: The system needed to handle edge cases like new access points appearing or existing ones disappearing

Future Enhancements

Potential improvements to the system include:

Combining WiFi with other sensors (accelerometer, compass) for sensor fusion
Implementing particle filters for smoother trajectory tracking
Using deep learning to automatically learn optimal features from raw signals
Building a mobile app for real-time indoor navigation
Exploring transfer learning to adapt models across different buildings

Applications

This technology enables various practical applications:

Indoor Navigation: Guiding users through complex buildings like hospitals or airports
Asset Tracking: Locating equipment or inventory in warehouses
Emergency Response: Helping first responders locate people in buildings
Retail Analytics: Understanding customer movement patterns in stores
Smart Buildings: Automating lighting and climate control based on occupancy
