Cornellius Yudha Wijaya
2024-08-05 10:00:54
www.kdnuggets.com
Time series data is unique because its observations depend on each other sequentially. The data is collected over time at consistent intervals, for example, yearly, daily, or even hourly.
Time series data is important in many analyses because it can reveal patterns that answer business questions through forecasting, anomaly detection, trend analysis, and more.
In Python, you can analyze a time series dataset with NumPy. NumPy is a powerful package for numerical and statistical calculation, and it can also be applied to time series data.
How can we do that? Let’s try it out.
Time Series data with NumPy
First, we need to install NumPy in our Python environment. If you haven’t done so already, you can install it with pip:
pip install numpy
Next, let’s create time series data with NumPy. As mentioned, time series data has sequential and temporal characteristics, so we will build an array that carries them.
import numpy as np
dates = np.array(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05'], dtype="datetime64")
dates
Output>>
array(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04',
'2023-01-05'], dtype="datetime64[D]")
As you can see in the code above, we turn the values into time series data in NumPy with the dtype parameter. Without it, the values would be treated as strings; with it, they are stored as datetime64 values with day resolution.
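To make the difference concrete, here is a small sketch comparing the same values with and without the dtype parameter. The variable names are only illustrative, and the date arithmetic at the end is one example of what the datetime64 type makes possible.

```python
import numpy as np

# Without dtype, NumPy stores the values as plain Unicode strings.
as_strings = np.array(['2023-01-01', '2023-01-02', '2023-01-03'])

# With dtype="datetime64", the same values become dates with day resolution.
as_dates = np.array(['2023-01-01', '2023-01-02', '2023-01-03'], dtype="datetime64")

print(as_strings.dtype)  # a Unicode string dtype
print(as_dates.dtype)    # datetime64[D]

# Date arithmetic only makes sense on the datetime64 array.
gaps = np.diff(as_dates)                     # timedelta64 gaps between consecutive dates
shifted = as_dates + np.timedelta64(7, 'D')  # every date moved one week forward
print(gaps)
print(shifted)
```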
We can also create NumPy time series data without writing each value individually. We can do that with np.arange and a datetime64 dtype.
date_range = np.arange('2023-01-01', '2025-01-01', dtype="datetime64[M]")
date_range
Output>>
array(['2023-01', '2023-02', '2023-03', '2023-04', '2023-05', '2023-06',
'2023-07', '2023-08', '2023-09', '2023-10', '2023-11', '2023-12',
'2024-01', '2024-02', '2024-03', '2024-04', '2024-05', '2024-06',
'2024-07', '2024-08', '2024-09', '2024-10', '2024-11', '2024-12'],
dtype="datetime64[M]")
This creates monthly data covering 2023 and 2024, with one value per month. The stop date, 2025-01-01, is not included.
After that, we can analyze data indexed by this NumPy date range. For example, we can create one random value for each date in the range.
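As a rough sketch of how the range behaves, the example below checks the excluded stop value and derives two other spacings. The quarterly and daily arrays are illustrative additions, not part of the original walkthrough.

```python
import numpy as np

monthly = np.arange('2023-01', '2025-01', dtype="datetime64[M]")

# The stop value is exclusive, so the last month is December 2024.
print(len(monthly), monthly[0], monthly[-1])

# Taking every third element gives one value per quarter.
quarterly = monthly[::3]

# Changing the unit to days produces a daily range for January 2023.
daily = np.arange('2023-01-01', '2023-02-01', dtype="datetime64[D]")
print(len(quarterly), len(daily))
```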
data = np.random.randn(len(date_range)) * 10 + 100
data
Output>>
array([128.85379394, 92.17272879, 81.73341807, 97.68879621,
116.26500413, 89.83992529, 93.74247891, 115.50965063,
88.05478692, 106.24013365, 92.84193254, 96.70640287,
93.67819695, 106.1624716 , 97.64298602, 115.69882628,
110.88460629, 97.10538592, 98.57359395, 122.08098289,
104.55571757, 100.74572336, 98.02508889, 106.47247489])
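Using NumPy’s random module, we can generate random values that stand in for a real series, here centered around 100 with a standard deviation of about 10. Because the values are random, your output will differ from the one shown.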
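If you want the same values on every run, one option is a seeded generator. This is a minimal sketch using np.random.default_rng; the seed value 42 and the name n_months are arbitrary choices for illustration, and the numbers it produces will not match the output shown above.

```python
import numpy as np

n_months = 24  # assumed series length, matching the two-year monthly range

# A seeded generator returns the same sequence every time it is recreated.
rng = np.random.default_rng(seed=42)
data = rng.normal(loc=100, scale=10, size=n_months)

# Recreating the generator with the same seed reproduces the series exactly.
data_again = np.random.default_rng(seed=42).normal(loc=100, scale=10, size=n_months)
print(np.array_equal(data, data_again))
```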
For example, we can try to perform a moving average analysis with NumPy using the following code.
def moving_average(data, window):
    return np.convolve(data, np.ones(window), 'valid') / window
ma_12 = moving_average(data, 12)
ma_12
Output>>
array([ 99.97075433, 97.03945458, 98.20526648, 99.53106381,
101.03189965, 100.58353316, 101.18898821, 101.59158114,
102.13919216, 103.51426971, 103.05640219, 103.48833188,
104.30217122])
A moving average is a simple time series analysis in which we calculate the mean of successive subsets of the series. In the example above, we use a window of 12. This means we take the first 12 values of the series and compute their mean. Then the window moves forward by one value, and we compute the mean of the next subset.
So, the first subset whose mean we take is:
[128.85379394, 92.17272879, 81.73341807, 97.68879621,
116.26500413, 89.83992529, 93.74247891, 115.50965063,
88.05478692, 106.24013365, 92.84193254, 96.70640287]
The next subset is where we slide the window by one:
[92.17272879, 81.73341807, 97.68879621,
116.26500413, 89.83992529, 93.74247891, 115.50965063,
88.05478692, 106.24013365, 92.84193254, 96.70640287,
93.67819695]
That’s what np.convolve does here: it slides the np.ones array, whose length equals the window, along the series and sums the values under it at each position. Dividing by the window size turns each sum into a mean. We use the 'valid' mode so that only positions where the window fully overlaps the series are returned, without any padding. With 24 values and a window of 12, that gives 13 averages.
Moving averages are often used to identify the underlying pattern in time series data and as signals, such as buy/sell signals in finance.
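To see that the convolution really computes window means, the sketch below compares it with an explicit loop on a tiny series. The helper moving_average_loop and the sample values are illustrative only.

```python
import numpy as np

def moving_average(data, window):
    # Sum each window via convolution with ones, then divide by the window size.
    return np.convolve(data, np.ones(window), 'valid') / window

def moving_average_loop(data, window):
    # Same calculation written out: the mean of each consecutive slice.
    return np.array([np.mean(data[i:i + window])
                     for i in range(len(data) - window + 1)])

series = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(moving_average(series, 3))  # [2. 3. 4.]
print(np.allclose(moving_average(series, 3), moving_average_loop(series, 3)))
```

The result has len(series) - window + 1 values, which is why the 24-month series with a window of 12 yields 13 averages.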
Speaking of patterns, we can estimate the trend in time series data with NumPy. The trend is a long-term, persistent directional movement in the data. Basically, it is the general direction in which the time series is heading.
trend = np.polyfit(np.arange(len(data)), data, 1)
trend
Output>>
array([ 0.20421765, 99.78795983])
Here we fit a straight line to the data. The result contains the slope of the line (first number) and the intercept (second number). The slope represents how much the data changes per time step on average, and its sign gives the direction of the trend (positive is upward and negative is downward). The intercept is the fitted value at the first time step, where the index is 0.
We can also compute detrended data, which is what remains after we remove the trend from the time series. Detrended data is often used to detect fluctuation patterns around the trend and to spot anomalies.
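One way to read the two coefficients is to fit a series whose answer is known in advance. The sketch below uses made-up data on the line y = 2x + 1 and evaluates the fitted line with np.polyval; the variable names are illustrative.

```python
import numpy as np

# Synthetic series that lies exactly on the line y = 2x + 1.
x = np.arange(6)
y = 2.0 * x + 1.0

# Degree-1 fit returns the slope first, then the intercept.
slope, intercept = np.polyfit(x, y, 1)
print(slope, intercept)

# np.polyval evaluates the fitted line at each time step.
fitted = np.polyval([slope, intercept], x)
print(np.allclose(fitted, y))
```

A positive slope means the series rises over time, and the intercept is the fitted value at x = 0.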
detrended = data - (trend[0] * np.arange(len(data)) + trend[1])
detrended
Output>>
array([ 29.06583411, -7.81944869, -18.46297706, -2.71181657,
15.66017371, -10.96912278, -7.2707868 , 14.29216727,
-13.36691409, 4.61421499, -8.98820376, -5.32795108,
-8.56037465, 3.71968235, -5.00402087, 12.84760174,
7.8291641 , -6.15427392, -4.89028352, 18.41288776,
0.6834048 , -3.33080706, -6.25565918, 1.98750918])
The data without their trend are shown in the output above. In a real-world application, we would analyze them to see which one deviates too much from the common pattern.
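As a rough illustration of that last step, the sketch below flags detrended values that sit far from the rest. The sample values and the cutoff of 2 standard deviations are assumptions made for the example, not a rule from the article.

```python
import numpy as np

# Hypothetical detrended values; index 3 is deliberately far from the others.
detrended = np.array([0.5, -0.3, 0.2, 8.0, -0.4, 0.1, -0.2, 0.3])

threshold = 2.0  # assumed cutoff, in standard deviations

# Standardize the values, then keep the positions beyond the cutoff.
z_scores = (detrended - detrended.mean()) / detrended.std()
anomaly_idx = np.flatnonzero(np.abs(z_scores) > threshold)
print(anomaly_idx)  # [3]
```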
We can also try to analyze seasonality from the time series data we have. Seasonality is the regular and predictable patterns that occur at specific temporal intervals, such as every 3 months, every 6 months, and others. Seasonality is usually affected by external factors such as holidays, weather, events, and many others.
seasonality = np.mean(data.reshape(-1, 12), axis=0)
seasonal_component = np.tile(seasonality, len(data)//12 + 1)[:len(data)]
seasonal_component
Output>>
array([111.26599544, 99.16760019, 89.68820205, 106.69381124,
113.57480521, 93.4726556 , 96.15803643, 118.79531676,
96.30525224, 103.4929285 , 95.43351072, 101.58943888,
111.26599544, 99.16760019, 89.68820205, 106.69381124,
113.57480521, 93.4726556 , 96.15803643, 118.79531676,
96.30525224, 103.4929285 , 95.43351072, 101.58943888])
In the code above, we calculate the average for each calendar month across the years and then repeat those 12 averages so the result matches the length of the data. Note that reshape(-1, 12) only works when the series length is a multiple of 12. In the end, we get the average for each month over the two-year interval, and we can examine it to see whether there is any seasonality worth mentioning.
Those are the basic methods we can use with NumPy for time series data and analysis. There are many more advanced methods, but the ones above cover the basics.
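To show the monthly averaging picking up a real pattern, here is a sketch on synthetic data with a known yearly cycle. Removing the linear trend before averaging is an extra step assumed here, and the series itself (36 months, a sine wave on top of a rising line) is made up for the example.

```python
import numpy as np

# Synthetic 3-year monthly series: rising line plus a 12-month sine cycle.
t = np.arange(36)
data = 100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12)

# Remove the fitted linear trend first so it does not leak into the monthly means.
slope, intercept = np.polyfit(t, data, 1)
trend_line = slope * t + intercept
detrended = data - trend_line

# Average each calendar month across years, then repeat to the series length.
seasonality = detrended.reshape(-1, 12).mean(axis=0)
seasonal_component = np.tile(seasonality, len(data) // 12 + 1)[:len(data)]

# Whatever is left after removing trend and seasonality is the residual.
residual = detrended - seasonal_component
print(seasonality.round(2))
print(np.argmax(seasonality))  # position of the strongest month in the cycle
```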
Conclusion
Time series data is unique because it is sequential and has temporal properties. Using NumPy, we can create time series data and perform basic time series analysis such as moving averages, trend analysis, and seasonality analysis.
Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python and data tips via social media and writing media. Cornellius writes on a variety of AI and machine learning topics.