Introduction
Feast is an open-source feature store that enables efficient management and serving of machine learning (ML) features for real-time applications. It provides a unified interface for storing, discovering, and accessing features, which are the individual measurable properties or characteristics of the data used for ML modeling. Feast follows a distributed architecture that consists of several components working together. These include the Feast Registry, Stream Processor, Batch Materialization Engine, and Stores.
Feast supports offline and online stores. While an offline store works with historical time-series feature values that are stored in data sources, Feast uses online stores to serve features at low latency. Feature values are loaded from data sources into the online store through materialization, which can be triggered through the materialize command.
One of the supported online stores in Feast is Redis, which is an open-source, in-memory data structure store. This article explains how to use an RCS Managed Database for Redis as an online feature store for Feast.
Advantages of Redis as an online feature store
High latency can harm model performance and the overall user experience, so a feature store must be able to serve features at low latency. Using Redis as an online feature store offers several advantages:
Elimination of the need for disk I/O operations that can introduce delays.
Features can be retrieved and served quickly, resulting in faster response times.
Machine learning models can offer efficient and timely predictions.
Data is stored in memory rather than on disk, which saves server resources and improves overall processing times.
Prerequisites
To follow the instructions in this article, make sure you:
Deploy an RCS Managed Database for Redis.
When deployed, copy your RCS Managed Database for Redis instance connection information, and take note of the host, password, and port to establish a connection to the database.
Deploy an Ubuntu 22.04 management server on RCS.
Use SSH to access the server as a non-root sudo user.
Update the server packages.
Using RCS Managed Database for Redis as an online feature store for Feast
Install Dependencies
To connect to an RCS Managed Database for Redis and use Feast, set up Python, the Redis CLI, and the Feast SDK as described in this section.
Install Python 3.10 on the server.
$ sudo apt-get install python3.10
Install the Pip3 Python package manager.
$ sudo apt-get -y install python3-pip
Install the Redis CLI tool.
$ sudo apt-get install redis
Install the Feast SDK and CLI.
$ pip install feast
To use Redis as the online store, install the redis dependency.
$ pip install 'feast[redis]'
Create a feature repository
Using Feast, bootstrap a new feature repository.
$ feast init feast_RCS_redis
Output:
Creating a new Feast repository in <full path to your directory>
Switch to the newly added directory.
$ cd feast_RCS_redis/feature_repo
Using a text editor such as Nano, edit the feature_store.yaml file in the current directory (feast_RCS_redis/feature_repo).
$ nano feature_store.yaml
Add the following contents to the file. Replace RCS_REDIS_HOST, RCS_REDIS_PORT, and RCS_REDIS_PASSWORD with your actual database details.
project: feast_RCS_redis
registry: data/registry.db
provider: local
online_store:
  type: redis
  connection_string: "RCS_REDIS_HOST:RCS_REDIS_PORT,ssl=true,password=RCS_REDIS_PASSWORD"
Save and close the file.
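The connection string in feature_store.yaml follows a host:port,option=value layout. As a minimal sketch, the helper below assembles that string from the host, port, and password noted in the prerequisites, which may help avoid typos when filling in the file by hand. The function name build_connection_string and the example values are hypothetical and not part of Feast. The sketch also assumes the password contains no commas, because commas appear to separate the options.

```python
def build_connection_string(host: str, port: int, password: str, ssl: bool = True) -> str:
    """Assemble a Redis connection string in the layout used in feature_store.yaml."""
    # Assumption: commas delimit the options, so a comma inside the password
    # would be ambiguous. Reject it instead of producing a broken string.
    if "," in password:
        raise ValueError("password must not contain a comma")
    ssl_flag = "true" if ssl else "false"
    return f"{host}:{port},ssl={ssl_flag},password={password}"


# Example values only; replace them with your own database details.
print(build_connection_string("redis.example.com", 6379, "example-password"))
```

Paste the printed value between the quotes of the connection_string entry.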
Register feature definitions and deploy a feature store
To register feature definitions, run the following command.
$ feast apply
The apply command scans Python files in the current directory (example_repo.py in this case) for feature view/entity definitions, registers the objects, and deploys infrastructure.
When successful, your output should look like the one below.
....
Created entity driver
Created feature view driver_hourly_stats_fresh
Created feature view driver_hourly_stats
Created on demand feature view transformed_conv_rate
Created on demand feature view transformed_conv_rate_fresh
Created feature service driver_activity_v1
Created feature service driver_activity_v3
Created feature service driver_activity_v2
Generate training data
Create a new file generate_training_data.py.
$ nano generate_training_data.py
Add the following code to the file.
from datetime import datetime

import pandas as pd
from feast import FeatureStore

entity_df = pd.DataFrame.from_dict(
    {
        # entity's join key -> entity values
        "driver_id": [1001, 1002, 1003],
        # "event_timestamp" (reserved key) -> timestamps
        "event_timestamp": [
            datetime(2021, 4, 12, 10, 59, 42),
            datetime(2021, 4, 12, 8, 12, 10),
            datetime(2021, 4, 12, 16, 40, 26),
        ],
        # (optional) label name -> label values. Feast does not process these
        "label_driver_reported_satisfaction": [1, 5, 3],
        # values we're using for an on-demand transformation
        "val_to_add": [1, 2, 3],
        "val_to_add_2": [10, 20, 30],
    }
)

store = FeatureStore(repo_path=".")

training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
        "transformed_conv_rate:conv_rate_plus_val1",
        "transformed_conv_rate:conv_rate_plus_val2",
    ],
).to_df()

print("----- Feature schema -----\n")
print(training_df.info())

print()
print("----- Example features -----\n")
print(training_df.head())
Save and close the file.
Generate training data.
$ python3 generate_training_data.py
Load batch features to your online store
Serialize the latest values of features to prepare for serving:
$ CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
$ feast materialize-incremental $CURRENT_TIME
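The same step can be sketched from Python. The helper utc_timestamp reproduces the timestamp format that the date command above generates, and materialize_latest calls the Feast SDK's materialize_incremental method on a FeatureStore. Both function names are hypothetical, and the sketch assumes it runs from the feature repository directory with Feast installed.

```python
from datetime import datetime, timezone
from typing import Optional


def utc_timestamp(now: Optional[datetime] = None) -> str:
    """Format a time like `date -u +"%Y-%m-%dT%H:%M:%S"`, defaulting to the current UTC time."""
    now = now or datetime.now(timezone.utc)
    return now.strftime("%Y-%m-%dT%H:%M:%S")


def materialize_latest(repo_path: str = ".") -> None:
    """Load the latest feature values into the online store, up to the current time."""
    # Imported lazily so the timestamp helper also works without Feast installed.
    from feast import FeatureStore

    store = FeatureStore(repo_path=repo_path)
    store.materialize_incremental(end_date=datetime.now(timezone.utc))


print(utc_timestamp(datetime(2021, 4, 12, 10, 59, 42)))  # 2021-04-12T10:59:42
```

Calling materialize_latest() is intended as an alternative to the CLI command, not an extra step to run after it.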
When feature data is stored using Redis as the online store, Feast uses it as a two-level map with the help of Redis Hashes. The first level of the map contains the Feast project name and entity key. The entity key is composed of entity names and values. The second level key (in Redis terminology, this is the "field" in a Redis Hash) contains the feature table name and the feature name, and the Redis Hash value contains the feature value.
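The two-level map can be pictured with plain Python dictionaries. This is only a conceptual sketch: the names online_store, write_feature, and read_features are hypothetical, and the real store keeps serialized bytes in Redis rather than readable strings and Python values.

```python
# First level: (project, entity key) -> second level: "feature_view:feature" -> value.
online_store = {}


def write_feature(project, entity, feature_view, feature, value):
    """Store one feature value under its two-level key."""
    # The entity key combines entity names and values, for example driver_id=1005.
    redis_key = (project, tuple(sorted(entity.items())))
    field = f"{feature_view}:{feature}"
    online_store.setdefault(redis_key, {})[field] = value


def read_features(project, entity):
    """Return every field stored for one entity, similar in spirit to HGETALL."""
    redis_key = (project, tuple(sorted(entity.items())))
    return dict(online_store.get(redis_key, {}))


write_feature("feast_RCS_redis", {"driver_id": 1005}, "driver_hourly_stats", "conv_rate", 0.48)
write_feature("feast_RCS_redis", {"driver_id": 1005}, "driver_hourly_stats", "acc_rate", 0.77)
print(read_features("feast_RCS_redis", {"driver_id": 1005}))
```

Because all features of one entity live in a single hash, one lookup by entity key is enough to fetch every feature value for that entity.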
In a new terminal window, paste your RCS Managed Database for Redis connection string to establish a connection to the database.
$ redis-cli -u rediss://default:[DATABASE_PASSWORD]@[DATABASE_HOST]:[DATABASE_PORT]
Replace DATABASE_PASSWORD, DATABASE_HOST, and DATABASE_PORT with your actual RCS Managed Database values.
When connected, your shell prompt changes to >. Run the following command to view all stored keys.
> keys "*"
Your output should look like the one below:
1) "\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x04\x00\x00\x00\xed\x03\x00\x00feast_RCS_redis"
2) "\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x04\x00\x00\x00\xec\x03\x00\x00feast_RCS_redis"
3) "\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x04\x00\x00\x00\xeb\x03\x00\x00feast_RCS_redis"
4) "\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x04\x00\x00\x00\xe9\x03\x00\x00feast_RCS_redis"
5) "\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x04\x00\x00\x00\xea\x03\x00\x00feast_RCS_redis"
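The keys are binary, but the sample output suggests a readable structure: a 4-byte tag, the join key name, a 4-byte tag and a 4-byte length, a 4-byte little-endian integer, and finally the project name. The sketch below decodes the driver ID from a key under that assumed layout. The layout is inferred from the sample keys in this article rather than taken from a documented format, and the function name decode_entity_key is hypothetical.

```python
import struct


def decode_entity_key(raw: bytes, join_key: str = "driver_id",
                      project: str = "feast_RCS_redis") -> int:
    """Extract the integer entity value from a key that follows the assumed layout."""
    name = join_key.encode("utf-8")
    suffix = project.encode("utf-8")
    start = 4 + len(name)  # skip the 4-byte tag in front of the join key name
    if raw[4:start] != name or not raw.endswith(suffix):
        raise ValueError("key does not match the assumed layout")
    body = raw[start:len(raw) - len(suffix)]
    # Assumed: value type tag, value length, then the value, all little-endian.
    _value_tag, value_length, value = struct.unpack("<IIi", body)
    if value_length != 4:
        raise ValueError("unexpected value length")
    return value


sample = (b"\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x04\x00\x00\x00"
          b"\xed\x03\x00\x00feast_RCS_redis")
print(decode_entity_key(sample))  # 1005
```

Under this reading, the five sample keys correspond to driver IDs 1001 through 1005.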
Check the Redis data type:
> type "\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x04\x00\x00\x00\xed\x03\x00\x00feast_RCS_redis"
Output:
hash
Verify the contents of the hash.
> hgetall "\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x04\x00\x00\x00\xed\x03\x00\x00feast_RCS_redis"
Your output should look like the one below.
1) "_ts:driver_hourly_stats"
2) "\b\xd0\xa4\xb5\xa5\x06"
3) "a`\xe3\xda"
4) "5\xf20Q?"
5) "\xfa^X\xad"
6) "5\x83\x7f\xcb>"
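The hash values are binary too. Judging only by the sample bytes, each value starts with a one-byte tag, followed either by a variable-length integer (the _ts timestamp entry) or by a 4-byte little-endian float. This resembles Protocol Buffers wire encoding, but the layout is an assumption drawn from the output above, not a documented guarantee. The sketch below decodes the sample values under that assumption, and the helper names are hypothetical.

```python
import struct
from datetime import datetime, timezone


def decode_varint(data: bytes) -> int:
    """Decode a little-endian base-128 integer (7 payload bits per byte)."""
    result = 0
    for position, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * position)
        if not byte & 0x80:  # a clear high bit marks the last byte
            break
    return result


def decode_float_value(raw: bytes) -> float:
    """Skip the assumed one-byte tag and read a 4-byte little-endian float."""
    return struct.unpack("<f", raw[1:5])[0]


timestamp_raw = b"\x08\xd0\xa4\xb5\xa5\x06"  # value stored under "_ts:driver_hourly_stats"
seconds = decode_varint(timestamp_raw[1:])   # skip the one-byte tag
print(datetime.fromtimestamp(seconds, tz=timezone.utc))

print(decode_float_value(b"5\xf20Q?"))
print(decode_float_value(b"5\x83\x7f\xcb>"))
```

The hash fields in front of the float values are short binary strings rather than readable feature names, so this sketch does not say which feature each decoded value belongs to.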
Fetch feature vectors for inference
At inference time, you can read the latest feature values for different drivers from the online feature store using get_online_features(). In this section, fetch feature vectors for inference as described below.
Create a new fetch_feature_vectors.py file.
$ nano fetch_feature_vectors.py
Add the following code to the file.
from pprint import pprint

from feast import FeatureStore

store = FeatureStore(repo_path=".")

feature_vector = store.get_online_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
    entity_rows=[
        # {join_key: entity_value}
        {"driver_id": 1004},
        {"driver_id": 1005},
    ],
).to_dict()

pprint(feature_vector)
Save and close the file.
Fetch the feature vectors.
$ python3 fetch_feature_vectors.py
Your output should look like the one below.
{
 'acc_rate': [0.1056235060095787, 0.7656288146972656],
 'avg_daily_trips': [521, 45],
 'conv_rate': [0.24400927126407623, 0.48361605405807495],
 'driver_id': [1004, 1005]
}
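The returned dictionary is column-oriented: each key maps to a list with one entry per requested entity. A model usually consumes one feature vector per entity, so a small helper can pivot the result into rows. The sketch below uses the sample output above, and the function name to_rows is hypothetical.

```python
def to_rows(feature_vector: dict) -> list:
    """Turn {feature: [v1, v2, ...]} into one dictionary per entity."""
    names = list(feature_vector)
    return [dict(zip(names, values)) for values in zip(*feature_vector.values())]


# Sample output from this article.
feature_vector = {
    "acc_rate": [0.1056235060095787, 0.7656288146972656],
    "avg_daily_trips": [521, 45],
    "conv_rate": [0.24400927126407623, 0.48361605405807495],
    "driver_id": [1004, 1005],
}

for row in to_rows(feature_vector):
    print(row)
```

Each printed row holds the feature values for a single driver.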
Conclusion
In this article, you used Feast for feature retrieval with an RCS Managed Database for Redis as the online store, and discovered why Redis is a good fit for serving features at low latency. For more information about Feast, visit the official documentation.