The original WISE mission, launched in 2009, performed an all sky survey for infrared bands of 3.4, 4.6, 12, 22 microns. The second phase of the mission is called NEOWISE because its primary goal was to document Near Earth Objects (NEOs). According to the NEOWISE (Near-Earth Object Wide-Field Infrared Survey Explorer) the telescope has supplemented other data bases with full sky surveys that observe each point multiple times in roughly six month intervals in the 3.4 and 4.6 micron wavelengths. The data from the mission are found on an accessible NEOWISE database. This database was then matched with a list of candidate Young Stellar Objects, or YSOs, located in the Large Magellanic Cloud (LMC), which is a galaxy around 160 million light years from our own. The table of sources was downloaded from a VizieR database. From there, an additional catalog, the SAGE and SAGE-Spec Spitzer and Ancillary Data (SAGE standing for Surveying the Agents of Galaxy Evolution), was used to supplement the previously found data. This survey sought to identify and study variable stars. Some variable stars are unstable, and have more dramatic fluctuations in brightness than others, which makes them interesting to study.
The data processing up until this point was courtesy to my Dad, who is an astronomer at the Center for Astrophysics | Harvard & Smithsonian. Most of his research focuses on studying star formation and Near-Earth Objects in the infrared wavelengths. Both my data science projects and this course’s project have been inspired by his office- studying traffic patterns that make the commute demanding, and now, learning more about stars. There is a discussion about how young stars develop and accrete mass. Studying the variability of the YSOs can give information about the rate of accretion and the growth of these young stars. I am certainly no expert in astronomy, but I have attended several talks and have always been impressed by the level of detail that graphs can include. I thought this would be a good opportunity to see if I can somehow try to create graphs with the skills that I have learned so far. This project will be challenging in a conceptual way because I am not familiar with astronomical units and creating multiple different plots with stylistic changes will build on what we learn in class.
library(tidyverse)
library(patchwork)
library(janitor)
library(lomb)
library(gganimate)
library(corrplot)
library(ggcorrplot)
I used the standard tidyverse
to tidy and then manipulate my data, also with the clean_names()
function in the janitor
package. I used patchwork
to place one plot of the observation distribution each for w1mpro
and w2mpro
side by side. I used the gganimate
package to create the animated plots for certain objects. Finally, loading corrplot
and ggcorrplot
allowed me to make a correlation table of several variables.
# Reading in Data
lmc_neowise_var <- read_csv("data/LMC_NEOWISE_VAR.csv")
# Minor Data Wrangling
lmc_neowise_var_class <- lmc_neowise_var %>%
mutate(ar_class = as.factor(trunc((mjd - 56700) / 180.125))) %>%
clean_names()
# Filtering Data
lmc_neowise_var_class_corr <- lmc_neowise_var_class %>%
select(c(w1mpro, w2mpro, ar_class, matches("e\\d_"))) %>%
mutate(ar_class = as.double(ar_class),
across(where(is.character), ~na_if(., "null")),
across(where(is.character), ~as.numeric(.))
) %>%
select(-c(contains("u"), ar_class))
lmc_neowise_var_class_corr <- na.omit(lmc_neowise_var_class_corr)
# Creating corrplot
ggcorrplot(lmc_neowise_var_class_corr %>% cor(),
hc.order = TRUE,
lab = TRUE,
lab_size = 2,
type = "lower",
method = "circle",
title = "Correlogram of Objects",
ggtheme = theme_bw,
colors = c("skyblue", "white", "#C20B0B")
) +
theme(
plot.title = element_text(hjust = 0.5)
)
The first step following reading in the data involved creating an arbitrary class variable (class as a variable already existed in the data, so my arbitrary class became ar_class
) that grouped the observations together based on observations taken about six months apart rather than classify the objects in a scientific sense. I wanted to turn the existing mjd
as a double into a factor because I thought that the coloring scheme of labeling points using a gradient was confusing, especially with 12 different time points. For me, very light blue versus slightly light blue are too close to be able to clearly understand a graphic. I also wanted to get a general sense of how some of the variables were related. w1mpro
and w2mpro
are in units of magnitude and are both very correlated with each other (it is reasonable for an object to get brighter or dimmer in more than one band), but negatively correlated with the other values like e1_36
, which are in units of flux. The flux is a direct measurement of how bight something is- the greater the flux, the brighter the object. Since magnitude is inversely proportional to flux, it makes sense that w1mpro
and w2mpro
are negatively correlated with measurements in flux. This initial step helps reassure that there is nothing immediately wrong with the data. There is nothing inherently wrong with doing the analysis in terms of flux, so if I wanted to recreate my work using flux measurements, that would have been fine.
w1mpro_hist <- ggplot(lmc_neowise_var_class, aes(x = w1mpro)) +
geom_histogram(fill = "blue", color = "black", alpha = 0.5) +
facet_wrap(~ar_class) +
theme_minimal() +
labs(
y = "Count",
title = "w1mpro Distribution"
) +
theme(
plot.title = element_text(family = "serif", size = 16, vjust = 0.5),
strip.background = element_blank(),
panel.grid.minor = element_blank(),
strip.text = element_text(face = "bold", size = 14)
)
w2mpro_hist <- ggplot(lmc_neowise_var_class, aes(x = w2mpro)) +
geom_histogram(fill = "red", color = "black", alpha = 0.5) +
facet_wrap(~ar_class) +
theme_minimal() +
labs(
y = "Count",
title = "w2mpro Distribution"
) +
theme(
plot.title = element_text(family = "serif", size = 16, vjust = 0.5),
strip.background = element_blank(),
panel.grid.minor = element_blank(),
strip.text = element_text(face = "bold", size = 14)
)
w1mpro_hist + w2mpro_hist
The point of including this was to show that the data for magnitude are fairly normal. Although these are variable stars, the distribution of measurements does not suggest anything inherent about the data.
for (i in lmc_neowise_var_class %>% distinct(seq)){
new_seq <- as.double(i)
}
output <- vector("double", length(new_seq))
for (i in seq_along(new_seq)){
long <- i
access <- new_seq[i]
output[[i]] <- new_seq[[i]]
}
for (i in seq_along(output)){
lmc_neowise_facet_plot_background <- dplyr::select(lmc_neowise_var_class%>%
filter(seq == output[i]), -ar_class)
lmc_neowise_var_class %>%
filter(seq == output[i]) %>%
ggplot(aes(x = w1mpro, y = w2mpro, color = ar_class)) +
geom_point(data = lmc_neowise_facet_plot_background, color = "grey80", size = 1) +
geom_point() +
facet_wrap(~ar_class)
ggsave(filename = paste(output[i], "lmc.png", sep = ""))
}
I first made a facet_wrap()
plot for all objects to identify which ones to construct animations for. I needed to use photoshop to create the animations, since the gganimate
output does not knit very well, and the less objects, the better. I first extracted all of the object names (belonging to the seq
column) and looped through all of the objects. Then I used that information to create basic plots and include the object name in the title. I selected one non-variable object (315) and two variable objects (226 and 735).
animation_plot_base <- function(seq_interest){
return(lmc_neowise_var_class %>%
filter(seq == seq_interest) %>%
ggplot() +
(aes(x = w1mpro, y = w2mpro, color = ar_class)) +
geom_point(data = dplyr::select(lmc_neowise_var_class, -ar_class) %>%
filter(seq == seq_interest),
color = "grey80",
size = 1
) +
geom_point(alpha = 0.5) +
coord_cartesian(
xlim = c(range(lmc_neowise_var_stats_plain %>%
filter(seq == seq_interest) %>%
pull(median_w1mpro) - 1,
lmc_neowise_var_stats_plain %>%
filter(seq == seq_interest) %>%
pull(median_w1mpro) + 1)),
ylim = c(range(lmc_neowise_var_stats_plain %>%
filter(seq == seq_interest) %>%
pull(median_w2mpro) - 1,
lmc_neowise_var_stats_plain %>%
filter(seq == seq_interest) %>%
pull(median_w2mpro) + 1))
) +
theme_minimal() +
theme(
legend.position = c(0.95, 0.05),
legend.justification = c(1, 0),
legend.direction = "horizontal"
) +
guides(
fill = guide_legend(title.position = "bottom",
title.hjust = 0.5
)
) +
transition_states(ar_class,
transition_length = 5,
state_length = 5) +
labs(
color = "Class Code",
title = paste("Comparing Magnitudes of Object", seq_interest)
)
)
}
list_objects <- list(226, 263, 270, 290, 301, 314, 315, 327, 330,
353, 393, 407, 408, 439, 461, 481, 492, 500,
535, 541, 559, 611, 656, 735, 741, 753, 799,
874, 1108)
# It takes about 50 seconds to run each animation plot
for (i in list_objects){
animation_plot_base(i) %>%
animate(renderer = file_renderer(dir = paste("gganimate/lmc_",
i,
sep = ""
),
prefix = "gganim_frame")
)
}
lmc_neowise_facet_plot_background <- dplyr::select(lmc_neowise_var_class%>%
filter(seq == output[i]), -ar_class)
lmc_neowise_var_class %>%
filter(seq == output[i]) %>%
ggplot(aes(x = w1mpro, y = w2mpro, color = ar_class)) +
geom_point(data = lmc_neowise_facet_plot_background, color = "grey80", size = 1) +
geom_point() +
facet_wrap(~ar_class)
ggsave(filename = paste(output[i], "lmc.png", sep = ""))
I wanted to create animations for objects 315, 226, and 735 as well as any other objects I deemed as interesting. I created an animation_plot_base()
function that would plot any sequence I told it to. Part of the plot is dedicated to selecting the range for the axes to either zoom in or zoom out depending on the spread of the data for an object. I used the range()
function to identify the median values and subtract one to find the lower bound and add one to find the upper bound of the axes for w1mpro
and w2mpro
. I also made sure to include the background points because it helps illustrate where each group is in relation to the rest of the observations. Finally, I defined a list of objects I wanted to make animations for and used ggsave()
to save the images to be used in a photoshop animation. Each plot took about 50 seconds to render, with the photoshop process also taking several minutes per object.
This first object is to show what a non-variable object looks like. The points are clustered in a nearly uniform globular shape with observations that seem to be evenly scattered. The size of the scatter (less than half a magnitude) in each band is consistent with the measurement errors in the observations.
The first object of the variable sources has two clusters of points where the groups of observations per time point bounce around. The observations seem to have a more linear shape and vary by a scale of about one magnitude in each band. The observations done within each six month period are very tightly grouped, indicating that the measurement error is small. Object 226 is much brighter than object 315, and so the measurement error in magnitude is smaller. There seems to be some sort of periodicity, but without the Lomb-Scargle plots, I cannot be sure.
Similar to the first object, 226, object 735 has two clusters of points where the groups of observations per time point bounce around. There seems to be some sort of periodicity, but without the Lomb-Scargle plots, I cannot be sure.
Many of the examples for this online appear complicated to me because I am not familiar with the method or its applications. After talking to my Dad about how he uses Lomb-Scargle diagrams in his research and how it can be used to make sense of messy data, I sketched out what I thought he was describing on paper. Making the diagram helped me learn more about the process, so I thought I would digitize my sketch so others could have an easier time understanding the concept in a generalized sense. The input data that a Lomb-Scargle diagram uses is assumed to be periodic in nature, but taken at infrequent times. Instead of sitting on one section of the sky for years at a time, the survey images multiple regions of the sky as the telescope orbits around the Sun. The observations are also taken about every six months, and not six months exactly. Therefore, the data are taken infrequently. Using the value for At Period
after using the lomb
package, this gives the program’s estimate of the the periodicity of the observations. A plot is produced showing where the peak power is in the spectrum. A dotted line indicates the approximate level above which the peaks are statistically significant. Then, using a formula involving the trunc()
function, I can shift each point to construct one period of observational data.
Using the At Period
values, I could shift the data to attempt to visualize the periodicity of observations for each object.
# Establishing Data
lmc_neowise_var_class_filter <- lmc_neowise_var_class %>%
filter(seq == 315) %>%
select(c(mjd, w1mpro, seq, ar_class))
# Getting Lomb-Scargle plot
lmc.spec <- lsp(lmc_neowise_var_class_filter$w1mpro, times = lmc_neowise_var_class_filter$mjd, from = 5, ofac = 5, type = "period")
# Obtaining the Peak At value
lmc.spec %>%
summary()
## Value
## Time lmc_neowise_var_class_filter$mjd
## Data lmc_neowise_var_class_filter$w1mpro
## n 784
## Type period
## Oversampling 5
## From 5.0021
## To 2000.8
## # frequencies 1996
## PNmax 18.416
## At period 172.49
## At frequency 0.0057975
## P-value (PNmax) 8.0193e-06
# Folding the Light Curve
lmc_neowise_var_class_adj <- lmc_neowise_var_class_filter %>%
mutate(adj_mjd = mjd - min(lmc_neowise_var_class_filter %>%
select(mjd)) - trunc((mjd - min(lmc_neowise_var_class_filter %>%
select(mjd)))/173)*173
)
# Defining My Own Color Palette
selected_colors <- list("#3D0202", "#D91212", "#E33D1F", "#5F4706", "#F2980E",
"#E8C20A", "#0AE8E0", "#0AA7CF", "#0A54CF", "#BB40DC",
"#6440DC", "#500863")
# Creating Folded Light Curve
lmc_neowise_var_class_adj %>%
ggplot(aes(x = adj_mjd, y = w1mpro, fill = ar_class)) +
geom_point(shape = 21, color = "black", size = 3, alpha = 0.5) +
scale_fill_manual(
values = selected_colors
) +
theme_minimal() +
labs(
fill = "Class Code",
title = "Folded Light Curve for Object 315",
x = "Adjusted mjd"
) +
theme(
plot.title = element_text(size = 14, family = "serif", face = "bold"),
axis.title = element_text(family = "serif")
)
I chose not to include this plot in the Final Project section to avoid confusion. This folded light curve does not look like a curve because the object is not a variable source. A non-periodic object plotted in a periodic manner does not look right. The Lomb-Scargle plot shows many peaks that fall below the level of significance, indicating that none of the periods accurately represent the data. It is likely that the source of the variation in the observations are due to noise in the measurements.
lmc_neowise_var_class_filter <- lmc_neowise_var_class %>%
filter(seq == 226) %>%
select(c(mjd, w1mpro, seq, ar_class))
lmc.spec <- lsp(lmc_neowise_var_class_filter$w1mpro, times = lmc_neowise_var_class_filter$mjd, from = 5, ofac = 5, type = "period")
lmc.spec %>%
summary()
## Value
## Time lmc_neowise_var_class_filter$mjd
## Data lmc_neowise_var_class_filter$w1mpro
## n 691
## Type period
## Oversampling 5
## From 5.0017
## To 2002.7
## # frequencies 1998
## PNmax 314.29
## At period 476.83
## At frequency 0.0020972
## P-value (PNmax) 2.5573e-134
lmc_neowise_var_class_adj <- lmc_neowise_var_class_filter %>%
mutate(adj_mjd = mjd - min(lmc_neowise_var_class_filter %>%
select(mjd)) - trunc((mjd - min(lmc_neowise_var_class_filter %>%
select(mjd)))/477)*477
)
lmc_neowise_var_class_adj %>%
ggplot(aes(x = adj_mjd, y = w1mpro, fill = ar_class)) +
geom_point(shape = 21, color = "black", size = 3, alpha = 0.5) +
scale_fill_manual(
values = selected_colors
) +
theme_minimal() +
labs(
fill = "Class Code",
title = "Folded Light Curve for Object 226",
x = "Adjusted mjd"
) +
theme(
plot.title = element_text(size = 14, family = "serif", face = "bold"),
axis.title = element_text(family = "serif")
)
The Lomb-Scargle plot looks much clearer than the previous plot of object 315. The major peaks are well above the significance level with the peak having the greatest peak at around 477 days. The At Peak
value was used to fold the light curve and there seems to be a clear periodic trend in the data because the data look like a sine curve. There are some interesting points for Class Code 0 observations, but overall, the bulk of the data are fairly concentrated and indicate this object is a variable source.
# Making an Adjusted Light Curve for Object 735 ----
lmc_neowise_var_class_filter <- lmc_neowise_var_class %>%
filter(seq == 735) %>%
select(c(mjd, w1mpro, seq, ar_class))
lmc.spec <- lsp(lmc_neowise_var_class_filter$w1mpro, times = lmc_neowise_var_class_filter$mjd, from = 5, ofac = 5, type = "period")
lmc.spec %>%
summary()
## Value
## Time lmc_neowise_var_class_filter$mjd
## Data lmc_neowise_var_class_filter$w1mpro
## n 907
## Type period
## Oversampling 5
## From 5.0023
## To 2001.9
## # frequencies 1997
## PNmax 427.15
## At period 435.2
## At frequency 0.0022978
## P-value (PNmax) 2.4724e-183
lmc_neowise_var_class_adj <- lmc_neowise_var_class_filter %>%
mutate(adj_mjd = mjd - min(lmc_neowise_var_class_filter %>%
select(mjd)) - trunc((mjd - min(lmc_neowise_var_class_filter %>%
select(mjd)))/435)*435
)
lmc_neowise_var_class_adj %>%
ggplot(aes(x = adj_mjd, y = w1mpro, fill = ar_class)) +
geom_point(shape = 21, color = "black", size = 3, alpha = 0.5) +
scale_fill_manual(
values = selected_colors
) +
theme_minimal() +
labs(
fill = "Class Code",
title = "Folded Light Curve for Object 735",
x = "Adjusted mjd"
) +
theme(
plot.title = element_text(size = 14, family = "serif", face = "bold"),
axis.title = element_text(family = "serif")
)
The Lomb-Scargle plot looks similar to the plot for object 226. The major peaks are well above the significance level with the peak having the greatest peak at around 435 days. The At Peak
period value of 435 days was used to fold the light curve. There seems to be a clear periodic trend in the data because the data look like a sine curve shifted by pi/2 radians. There are some interesting points for Class Code 3, 10, and 11 observations, but overall, the bulk of the data are fairly concentrated. However, unlike object 226, this does not appear to be a consistent variation because the right half of the data seem to align well, but the left half of the data have a slight shift between points. Perhaps this object is a variable source without a strictly constant periodicity.
In addition to learning a lot about making graphs from STAT 302, I also learned a lot about astronomy and other data analysis techniques in the process. I used a lot of specifications within the labs()
function to clean up labels, used the theme()
function to specify fonts and text sizes for the titles and labels, and learned how to plot background points behind a particular colorized subset of data to show where subsets were in relation to the rest of the data to help classify objects as constant or variable. Although the data was in a tidy format and the creation of the ar_class
and the adj_mjd
did not require many lines of code, it was a conceptually difficult calculation because I had never interacted with Lomb-Scargle plots or Folded Light Curves before. I was not anticipating working with a Lomb-Scargle plot, but talking it over with my Dad and learning more about how he and other researchers in his field use the method helped me understand its utility. I was glad that I was able to automate most of the repetitive tasks like making faceted plots for every object to identify variable sources and constructing animations for the objects I deemed interesting. If I were to take this one step further, I would also find a way to automate the process of creating a Lomb-Scargle plot and determine if the object was variable or constant. Then I would create a Folded Light Curve for the objects deemed to be variable and see what kinds of results that I get.