CodexBloom - Programming Q&A Platform

Difficulty merging time series data with different time zones in R using lubridate

👀 Views: 47 💬 Answers: 1 📅 Created: 2025-06-12
r dplyr lubridate

I'm relatively new to this and have been stuck on it for hours, so bear with me. I'm trying to combine two time series data frames whose timestamps are in different time zones, but I get unexpected results from `dplyr::full_join`. The timestamps in `df1` are in UTC, while the timestamps in `df2` are in America/New_York. Here's a simplified version of my data frames:

```r
library(dplyr)
library(lubridate)

# df1: UTC timestamps
df1 <- data.frame(
  timestamp = as.POSIXct(c('2023-10-01 12:00:00', '2023-10-01 13:00:00'), tz = 'UTC'),
  value = c(10, 20)
)

# df2: New York timestamps
df2 <- data.frame(
  timestamp = as.POSIXct(c('2023-10-01 08:00:00', '2023-10-01 09:00:00'), tz = 'America/New_York'),
  value = c(5, 15)
)

# Attempting to merge
merged_data <- full_join(df1, df2, by = 'timestamp')
```

After the `full_join`, the timestamps from `df2` are shown in UTC, which is expected. However, the merged data frame has more rows than I expected, and the `value` column from `df2` is `NA` for every row that came from `df1`. The output looks like this:

```r
# Example output
#             timestamp value.x value.y
# 1 2023-10-01 12:00:00      10      NA
# 2 2023-10-01 13:00:00      20      NA
# 3 2023-10-01 12:00:00      NA       5
# 4 2023-10-01 12:00:00      NA      15
```

I expected rows that refer to the same instant to be matched, with both values side by side. I've tried using `with_tz` from `lubridate` to convert time zones before merging, but I'm still seeing the problem. Here's what I attempted:

```r
# Attempting timezone conversion before merge
df2$timestamp <- with_tz(df2$timestamp, 'UTC')
merged_data <- full_join(df1, df2, by = 'timestamp')
```

This still leads to `NA` values and unexpected duplicates in the merged data.
Does anyone have insights on how to properly merge these data frames while accounting for the differing time zones? My development environment is macOS, and this is part of a larger service I'm building with R and several other technologies. Thanks for taking the time to read this!
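
One way to narrow this down is a minimal diagnostic sketch, assuming both `timestamp` columns really are `POSIXct` as in the simplified example. A `POSIXct` value stores seconds since the epoch, and the time zone is only an attribute that controls printing, so comparing the underlying numbers shows whether the two frames hold the same instants. The names `same_instant`, `df2_utc`, `merged`, and the suffixes `_utc` and `_ny` are illustrative choices, not part of the original code.

```r
library(dplyr)
library(lubridate)

# Same toy data as in the question
df1 <- data.frame(
  timestamp = as.POSIXct(c("2023-10-01 12:00:00", "2023-10-01 13:00:00"), tz = "UTC"),
  value = c(10, 20)
)
df2 <- data.frame(
  timestamp = as.POSIXct(c("2023-10-01 08:00:00", "2023-10-01 09:00:00"),
                         tz = "America/New_York"),
  value = c(5, 15)
)

# Compare the stored instants (seconds since epoch), ignoring the tz attribute
same_instant <- as.numeric(df1$timestamp) == as.numeric(df2$timestamp)
print(same_instant)

# with_tz() keeps the instant and only changes the display zone;
# force_tz() would keep the clock time and change the instant instead.
df2_utc <- df2 %>% mutate(timestamp = with_tz(timestamp, "UTC"))

# Explicit suffixes (hypothetical names) make it clear which frame each value came from
merged <- full_join(df1, df2_utc, by = "timestamp", suffix = c("_utc", "_ny"))
print(merged)
```

If the instants compare equal, the join should pair the rows instead of producing `NA`s. If they do not compare equal in the real data, the timestamps may have been parsed in the wrong zone (or stored as character), in which case `force_tz()` or re-parsing with an explicit `tz` may be closer to what is needed than `with_tz()`.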