Unexpected NaN values in TensorFlow model training with LSTM on time series data
I'm converting an old project and have been struggling with this for a few days now, so I could really use some help. This might be a silly question, but I'm currently working on a time series forecasting problem using an LSTM with TensorFlow 2.8.0. During training, I noticed that some of the loss values become NaN, which is really perplexing. I've tried normalizing my data and adjusting the learning rate, but the problem persists.

Here's the relevant part of my code:

```python
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Prepare dataset
data = pd.read_csv('time_series_data.csv')  # Make sure this CSV contains no NaN values
values = data['value'].values
normalized_data = (values - np.mean(values)) / np.std(values)

# Create sequences
def create_dataset(data, time_step=1):
    X, y = [], []
    for i in range(len(data) - time_step - 1):
        X.append(data[i:(i + time_step)])
        y.append(data[i + time_step])
    return np.array(X), np.array(y)

X, y = create_dataset(normalized_data, time_step=10)
X = X.reshape(X.shape[0], X.shape[1], 1)

# Building the LSTM model
model = Sequential()
model.add(LSTM(50, return_sequences=True, input_shape=(X.shape[1], 1)))
model.add(LSTM(50, return_sequences=False))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mean_squared_error')

# Training the model
model.fit(X, y, epochs=100, batch_size=32)
```

Despite the above configuration, I frequently see this message in my logs:

```
RuntimeError: NaN values found in loss tensor.
```

I tried reducing the batch size and using a lower learning rate (1e-4 instead of the default 1e-3), but neither resolved the issue. My dataset is relatively clean, and I'm not sure whether there are hidden anomalies in the data or issues with the LSTM architecture that I might be overlooking.
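To rule out hidden anomalies in the data, a check along these lines could be run on the raw `value` column before the z-score normalization. This is only a minimal sketch: `summarize_series` is a hypothetical helper name, not something from my project, and it assumes the column can be converted to floats (non-numeric entries would raise during conversion).

```python
import numpy as np

def summarize_series(values):
    """Report problems that can turn z-score normalization into NaN/Inf."""
    arr = np.asarray(values, dtype=np.float64)
    finite = arr[np.isfinite(arr)]
    return {
        "n": int(arr.size),
        "nan_count": int(np.isnan(arr).sum()),
        "inf_count": int(np.isinf(arr).sum()),
        # A zero (or near-zero) std makes (x - mean) / std blow up.
        "std": float(finite.std()) if finite.size else float("nan"),
    }

# Toy input standing in for data['value'].values
report = summarize_series([1.0, 2.0, float("nan"), 4.0])
print(report)
```

A nonzero `nan_count` or `inf_count`, or a `std` of zero, would each be enough on their own to produce NaN inputs after normalization, and therefore a NaN loss.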
Has anyone else experienced similar issues, or does anyone have suggestions on how to debug this effectively? Is there a better approach? For context: I'm using Python on macOS. I'm working on a REST API that needs to handle this.
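For reference, this is roughly how the lower learning rate (1e-4) can be set explicitly, combined with two additions that are assumptions on my part and not confirmed to fix the problem: `clipnorm=1.0` on the Adam optimizer to cap gradient norms, and the built-in `tf.keras.callbacks.TerminateOnNaN` callback to stop training as soon as the loss goes NaN. The function name `fit_with_nan_guard` and the random demo data are hypothetical.

```python
import numpy as np
import tensorflow as tf

def fit_with_nan_guard(X, y, epochs=1):
    """Train a small LSTM with a reduced learning rate and stop on NaN loss."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(X.shape[1], 1)),
        tf.keras.layers.LSTM(50, return_sequences=True),
        tf.keras.layers.LSTM(50),
        tf.keras.layers.Dense(1),
    ])
    # clipnorm is an assumption: it caps each gradient's norm to limit blow-ups.
    optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4, clipnorm=1.0)
    model.compile(optimizer=optimizer, loss="mean_squared_error")
    # TerminateOnNaN stops training at the first batch with a NaN/Inf loss.
    callbacks = [tf.keras.callbacks.TerminateOnNaN()]
    return model.fit(X, y, epochs=epochs, batch_size=32,
                     callbacks=callbacks, verbose=0)

# Random stand-in data with the same layout as X and y in the question
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(64, 10, 1)).astype("float32")
y_demo = rng.normal(size=(64,)).astype("float32")
history = fit_with_nan_guard(X_demo, y_demo, epochs=1)
print(history.history["loss"])
```

Stopping at the first NaN at least narrows down which epoch the loss diverges in, which seems easier to investigate than a full 100-epoch run.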