How To Calculate Technical Indicators For Trading Using Pandas
Yaser Rahmati | یاسر رحمتی
Last updated
No need to be an expert
Some knowledge of classic technical indicators (RSI, moving averages, ...)
Some experience with basic pandas functionality helps
A basic idea of machine-learning predictions in the chaotic Forex and stock markets
Download price data from an online source
Load the data into our framework
Check and clean the data
Run a quick statistical analysis to reveal hidden patterns
Is it possible to make predictions based on technical indicators? The statistics will answer
The US30 Index, often referred to as the Dow Jones Industrial Average (DJIA), is one of the most well-known stock market indices in the United States. Here’s a brief overview:
Composition: The US30 Index tracks 30 of the largest and most influential publicly traded companies in the U.S. It includes major corporations across various industries, such as technology, finance, healthcare, and consumer goods.
Historical Significance: Established in 1896 by Charles Dow, it was one of the first indices to represent the overall performance of the U.S. stock market. The DJIA has a long history and is often used as a barometer for the health of the U.S. economy.
Price-Weighted: Unlike market-capitalization-weighted indices, the DJIA is price-weighted. This means that stocks with higher prices have more influence on the index's movement than those with lower prices.
Purpose: Investors and analysts use the DJIA to gauge the overall performance of the stock market and the U.S. economy. It provides a snapshot of how the largest companies in the country are performing.
Updates: The components of the DJIA are periodically reviewed and adjusted. Changes are made to reflect shifts in the economy and industry dynamics.
The dataset is a CSV file containing 4-hour candlestick data for the USA30 index (the Dow Jones Industrial Average, DJIA), covering the period from August 24, 2023, to August 24, 2024. Here's a summary of the dataset:
Local time: The timestamp of each data point, in the format DD.MM.YYYY HH:MM:SS.SSS GMT+0330.
Open: The opening price of the index for the 4-hour interval.
High: The highest price of the index during the 4-hour interval.
Low: The lowest price of the index during the 4-hour interval.
Close: The closing price of the index for the 4-hour interval.
Volume: The trading volume during the 4-hour interval.
Number of Entries: 2,197 rows
Data Types:
The Local time column is of type object (string).
The Open, High, Low, Close, and Volume columns are of type float64.
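Because Local time is read as plain text, it is often converted to real timestamps before any time-based analysis. A minimal sketch, assuming the exact layout described above (a literal "GMT" followed by a +HHMM offset), uses pd.to_datetime with an explicit format; the two sample strings are made up for illustration.

```python
import pandas as pd

# Made-up timestamps in the layout described above
raw = pd.Series([
    "24.08.2023 03:30:00.000 GMT+0330",
    "24.08.2023 07:30:00.000 GMT+0330",
])

# "GMT" is matched as literal text; %z consumes the "+0330" offset,
# so the result is timezone-aware (fixed offset UTC+03:30).
local_time = pd.to_datetime(raw, format="%d.%m.%Y %H:%M:%S.%f GMT%z")

# Converting to UTC gives an unambiguous time axis
utc_time = local_time.dt.tz_convert("UTC")
print(utc_time)
```

With the real file, the same call can be applied to df["Local time"] after loading.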
These commands are used to install Python packages via pip, the package installer for Python. Here’s what each one does:
python -m pip install pandas: This installs the pandas library, a powerful data analysis and manipulation library for Python. It provides data structures like DataFrames and Series, and functions to manipulate and analyze data.
python -m pip install numpy: This installs the numpy library, which is fundamental for numerical computing in Python. It provides support for arrays, matrices, and many mathematical functions that operate on these data structures.
python -m pip install pandas_ta: This installs the pandas_ta library, a technical analysis library for pandas. It adds a wide range of technical indicators and trading strategy tools to pandas, making it useful for financial and trading applications.
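These packages are often used together for data analysis and financial modeling. The later steps also use scipy (for scipy.stats.linregress) and a plotting library such as matplotlib (for the histograms); if they are missing, python -m pip install scipy matplotlib installs them.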
The code reads a CSV file containing candlestick data (price data for a financial instrument over 4-hour intervals) into a DataFrame and then displays the last 5 rows. This is a quick way to check that the file loaded correctly and to inspect the most recent data.
The code uses the pandas library, which is commonly used for data manipulation and analysis. Here's a breakdown of what each line does:
import pandas as pd: This line imports the pandas library and gives it the alias pd. This is a standard convention that makes the code more concise when calling pandas functions.
df = pd.read_csv("USA30.IDXUSD_Candlestick_4_Hour_BID_24.08.2023-24.08.2024.csv"): This line reads the CSV (Comma-Separated Values) file "USA30.IDXUSD_Candlestick_4_Hour_BID_24.08.2023-24.08.2024.csv" into a DataFrame (df). A DataFrame is a two-dimensional, size-mutable, and heterogeneous tabular data structure in pandas, similar to a table in a database or an Excel spreadsheet. The pd.read_csv() function loads the data from the CSV file into this DataFrame.
df.tail(): This line returns the last 5 rows of the DataFrame df by default. The tail() function is useful for quickly inspecting the end of a dataset, which here holds the most recent data points.
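The three steps can be run end to end as sketched below. Since the downloaded CSV file is not bundled here, the sketch assumes an io.StringIO buffer holding a few made-up rows in the same column layout; with the real data, pass the file name to pd.read_csv instead.

```python
import io
import pandas as pd

# Stand-in for the downloaded file: made-up rows in the same column layout.
csv_text = """Local time,Open,High,Low,Close,Volume
23.08.2024 15:30:00.000 GMT+0330,100.0,101.0,99.5,100.5,1200.0
23.08.2024 19:30:00.000 GMT+0330,100.5,102.0,100.0,101.5,900.0
23.08.2024 23:30:00.000 GMT+0330,101.5,101.8,100.9,101.2,0.0
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.tail())   # last 5 rows by default (this sample only has 3)
print(df.dtypes)   # Local time is read as text, the numeric columns as float64
```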
The code filters out and removes all rows where Volume is 0 from the DataFrame df.
The last line checks for any remaining rows where Volume is 0 (there should be none after the drop operation).
indexZeros = df[df['Volume'] == 0].index:
This line identifies all rows in the DataFrame df where the value in the Volume column is 0.
df['Volume'] == 0 creates a boolean mask where each entry is True if the corresponding row's Volume is 0, and False otherwise.
df[df['Volume'] == 0] filters the DataFrame to include only rows where Volume is 0.
The .index attribute retrieves the indices (row labels) of these filtered rows.
The resulting indices (where Volume is 0) are stored in the variable indexZeros.
df.drop(indexZeros, inplace=True):
This line removes all rows from the DataFrame df whose indices are listed in indexZeros.
indexZeros contains the indices of all rows where Volume is 0.
The inplace=True argument means that df itself is updated, instead of drop returning a new DataFrame that would have to be assigned back to df.
As a result, all rows with Volume equal to 0 are permanently removed from df.
df.loc[(df['Volume'] == 0)]:
This line selects and displays any rows in the DataFrame df where the Volume column is still 0.
df['Volume'] == 0 is a boolean condition that checks which rows have a Volume of 0.
df.loc[(df['Volume'] == 0)] uses .loc[] to filter the DataFrame based on this condition.
After the drop operation, there should be no rows where Volume is 0, so this line confirms that all such rows were removed.
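The three lines can be tried on a small made-up frame. The sketch runs them as described and, for comparison, shows a common equivalent idiom that keeps the non-zero rows with a boolean mask instead of dropping rows by label.

```python
import pandas as pd

# Made-up candles; the rows labelled 1 and 3 have zero volume.
data = {
    "Close": [100.0, 100.5, 101.0, 101.5],
    "Volume": [10.0, 0.0, 12.0, 0.0],
}
df = pd.DataFrame(data)

# Steps described above: collect the labels of zero-volume rows, then drop them.
indexZeros = df[df["Volume"] == 0].index
df.drop(indexZeros, inplace=True)

# Verification: this selection should now be empty.
leftover = df.loc[(df["Volume"] == 0)]
print(leftover.empty)

# Equivalent idiom without inplace: keep only rows with non-zero volume.
df_alt = pd.DataFrame(data)
df_alt = df_alt[df_alt["Volume"] != 0]
print(df_alt)
```

Both approaches keep the original row labels (0 and 2 here); calling reset_index(drop=True) afterwards renumbers them if a continuous index is needed.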
df.isna().sum() returns a Series whose index holds the column names of df and whose values are the count of missing values in each column.
Here's a step-by-step explanation of what each part does:
df.isna():
The isna() method (also available as isnull()) is a pandas method used on a DataFrame (df) to identify missing values.
It returns a DataFrame of the same shape as df, where each element is a boolean (True or False):
True indicates that the value is missing (NaN).
False indicates that the value is not missing.
.sum():
The sum() method, when applied to a DataFrame of boolean values (True counts as 1 and False as 0), adds up the True values along the specified axis.
By default, sum() operates column-wise (i.e., along axis=0).
Therefore, df.isna().sum() calculates the total number of missing values (NaNs) in each column of the DataFrame df.
numpy: Used for numerical operations, particularly for handling arrays.
pandas_ta: A library that provides technical analysis indicators for pandas DataFrames.
scipy.stats.linregress: A function that performs a linear regression; it is used here to compute the slope.
ATR (Average True Range): Measures market volatility over the past 20 periods.
RSI (Relative Strength Index): Measures the speed and change of price movements; the default length is 14 periods if none is specified.
Average: Calculates the midpoint price of each period. With a length of 1, each value is simply the average of that bar's high and low.
MA40, MA80, MA160: Simple Moving Averages over 40, 80, and 160 periods, respectively.
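pandas_ta computes these indicators directly. To make the definitions concrete, the sketch below approximates the same columns with plain pandas. The helper name add_indicators is hypothetical, and RSI and ATR here use Wilder's smoothing (an exponential average with alpha = 1/length). Treat that as an assumption: values may differ slightly from pandas_ta's output, especially in the first rows. The candles are a made-up random walk.

```python
import numpy as np
import pandas as pd

def add_indicators(df):
    """Hypothetical helper: plain-pandas versions of the indicators above."""
    out = df.copy()
    high, low, close = out["High"], out["Low"], out["Close"]

    # Average: midpoint of each bar's range (midprice over a 1-bar window)
    out["Average"] = (high + low) / 2

    # Simple moving averages of the close
    for n in (40, 80, 160):
        out[f"MA{n}"] = close.rolling(window=n).mean()

    # RSI(14): Wilder-smoothed average gain vs. average loss
    delta = close.diff()
    gain = delta.clip(lower=0)
    loss = -delta.clip(upper=0)
    avg_gain = gain.ewm(alpha=1 / 14, adjust=False, min_periods=14).mean()
    avg_loss = loss.ewm(alpha=1 / 14, adjust=False, min_periods=14).mean()
    out["RSI"] = 100 - 100 / (1 + avg_gain / avg_loss)

    # ATR(20): Wilder-smoothed true range
    prev_close = close.shift(1)
    true_range = pd.concat(
        [high - low, (high - prev_close).abs(), (low - prev_close).abs()], axis=1
    ).max(axis=1)
    out["ATR"] = true_range.ewm(alpha=1 / 20, adjust=False, min_periods=20).mean()
    return out

# Made-up random-walk candles, for illustration only
rng = np.random.default_rng(0)
close = 100 + rng.normal(0, 1, 300).cumsum()
open_ = np.r_[close[0], close[:-1]]
spread = rng.uniform(0.1, 1.0, 300)
candles = pd.DataFrame({
    "Open": open_,
    "High": np.maximum(open_, close) + spread,
    "Low": np.minimum(open_, close) - spread,
    "Close": close,
})

result = add_indicators(candles)
print(result[["Close", "Average", "MA40", "RSI", "ATR"]].tail())
```

The first 39, 79, and 159 rows of MA40, MA80, and MA160 are NaN because a full window is not yet available; the same warm-up applies to the columns derived from them later.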
get_slope function: This function calculates the slope of a line fitted to the given array using linear regression. However, there's a logical error here: both x and y are set to the same array, so the regression always returns a slope of 1 (as long as the window's values are not all identical). Instead, x should be an index range, i.e. a sequence of numbers representing the positions of the elements in the array.
backrollingN: Defines the size of the rolling window used when applying the get_slope function.
For each of the columns (MA40, MA80, MA160, Average, RSI), a rolling window of size backrollingN is applied.
The get_slope function computes the slope of the moving average, average price, or RSI over each window.
raw=True makes the rolling window pass plain NumPy arrays (rather than pandas Series) to the get_slope function.
Displays the last few rows of the DataFrame to inspect the results.
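A corrected get_slope, with x set to the positions 0..n-1 as suggested above, can be sketched as follows. The window size and the column names MA40 and slopeMA40 are hypothetical values for this sketch, and the input column is made up so that the true slope (2 per bar) is known.

```python
import numpy as np
import pandas as pd
from scipy.stats import linregress

def get_slope(array):
    """Least-squares slope of the values in one rolling window."""
    y = np.asarray(array, dtype=float)
    x = np.arange(len(y))            # bar positions 0..n-1 as the x-axis
    return linregress(x, y).slope

# The bug described above: regressing an array on itself always gives slope 1.
window = np.array([3.0, 5.0, 4.0, 8.0])
buggy_slope = linregress(window, window).slope

backrollingN = 6  # hypothetical window size

# Made-up moving-average column rising by 2 per bar, with leading NaNs
# like a real moving average has before enough bars are available.
df = pd.DataFrame({"MA40": [np.nan] * 3 + [2.0 * i for i in range(12)]})
df["slopeMA40"] = df["MA40"].rolling(window=backrollingN).apply(get_slope, raw=True)
print(df.tail())
```

Windows that still contain NaN values produce NaN instead of calling get_slope, because rolling requires backrollingN valid observations by default.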
Let's break down the Python code used to assign trend categories based on historical price data.
pipdiff = 500 * 1e-5: This sets the price-difference threshold to 0.005. A move of at least this size is needed for a trend to count as significant. Because the threshold is an absolute price difference, its value must match the price scale of the instrument being analysed.
SLTPRatio = 2: This ratio sets the Stop Loss (SL) distance relative to pipdiff: the adverse move may not exceed pipdiff / SLTPRatio, i.e. half the target move.
mytarget(barsupfront, df1)
Input Parameters:
barsupfront: This parameter defines how many bars (data points) ahead to consider for trend analysis.
df1: A DataFrame containing historical price data with the columns 'High', 'Low', 'Close', and 'Open'.
Initial Setup:
length = len(df1): The number of rows in the DataFrame.
high, low, close, open: Lists containing the high, low, close, and open prices from the DataFrame.
trendcat: A list initialized with None values to store the trend category for each row.
Main Loop:
The loop iterates from 0 to length - barsupfront - 2. This ensures that enough future bars are available for the analysis to stay within the bounds of the DataFrame.
valueOpenLow and valueOpenHigh are initialized to track, respectively, the maximum and the minimum of the differences between the current open price and the future low and high prices.
Inner Loop:
Iterates from 1 to barsupfront + 1. For each iteration:
value1 is the difference between the current open price and the future low price.
value2 is the difference between the current open price and the future high price.
valueOpenLow tracks the maximum of value1 across all iterations.
valueOpenHigh tracks the minimum of value2 across all iterations.
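Trend Detection:
If valueOpenLow >= pipdiff and -valueOpenHigh <= (pipdiff / SLTPRatio), price fell by at least pipdiff below the open while never rising more than pipdiff / SLTPRatio above it. This suggests a downtrend, and trendcat[line] is set to 1.
If valueOpenLow <= (pipdiff / SLTPRatio) and -valueOpenHigh >= pipdiff, price rose by at least pipdiff above the open while never falling more than pipdiff / SLTPRatio below it. This suggests an uptrend, and trendcat[line] is set to 2.
If neither condition is met, there is no clear trend, and trendcat[line] is set to 0.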
Return Value:
The function returns the trendcat list, which contains the trend category for each row in the DataFrame. The last rows, which the main loop never reaches, keep the value None.
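The description above is detailed enough to sketch the labelling logic. This is a reconstruction under assumptions, not the author's exact code: the two trackers start at 0, the open of the current bar is used as described, open is renamed open_ to avoid shadowing Python's built-in, and the close list is omitted because the labels do not use it.

```python
import pandas as pd

pipdiff = 500 * 1e-5   # target distance, about 0.005 in price units
SLTPRatio = 2          # stop-loss distance = pipdiff / SLTPRatio

def mytarget(barsupfront, df1):
    """Label each bar 1 (downtrend), 2 (uptrend) or 0 (no clear trend)."""
    length = len(df1)
    high = list(df1["High"])
    low = list(df1["Low"])
    open_ = list(df1["Open"])
    trendcat = [None] * length

    for line in range(0, length - barsupfront - 2):
        valueOpenLow = 0.0    # largest drop below the open (assumed start: 0)
        valueOpenHigh = 0.0   # most negative open - high, i.e. largest rise
        for i in range(1, barsupfront + 2):      # i = 1 .. barsupfront + 1
            value1 = open_[line] - low[line + i]
            value2 = open_[line] - high[line + i]
            valueOpenLow = max(value1, valueOpenLow)
            valueOpenHigh = min(value2, valueOpenHigh)

        if valueOpenLow >= pipdiff and -valueOpenHigh <= pipdiff / SLTPRatio:
            trendcat[line] = 1    # downtrend
        elif valueOpenLow <= pipdiff / SLTPRatio and -valueOpenHigh >= pipdiff:
            trendcat[line] = 2    # uptrend
        else:
            trendcat[line] = 0    # no clear trend
    return trendcat

# Made-up candles: every future low is 0.010 below the open, highs barely move.
down = pd.DataFrame({"Open": [1.000] * 6, "High": [1.001] * 6,
                     "Low": [0.990] * 6, "Close": [1.000] * 6})
print(mytarget(2, down))
```

The result can be stored as a new column, for example df["mytarget"] = mytarget(barsupfront, df) for a chosen barsupfront.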
This code snippet creates a figure and plots histograms for multiple columns from the DataFrame df. Each histogram shows the distribution of values in the specified columns, helping visualize the spread and frequency of the data for each feature.
This code generates histograms to visualize the distribution of RSI values based on the trend categories in the DataFrame. By plotting histograms of the RSI values that correspond to uptrends, downtrends, and unclear trends, you can compare the distributions and observe how RSI values vary across trend categories.
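The numbers behind such histograms can also be compared directly. The sketch below counts RSI values in shared bins for each trend category; the column names RSI and mytarget and the helper rsi_histograms are assumptions for illustration, and the sample rows are made up.

```python
import numpy as np
import pandas as pd

def rsi_histograms(df, bins=None):
    """Hypothetical helper: RSI histogram counts, one column per trend category."""
    if bins is None:
        bins = np.arange(0, 101, 10)     # bin edges 0, 10, ..., 100
    names = {2: "uptrend", 1: "downtrend", 0: "unclear"}
    counts = {}
    for category, name in names.items():
        values = df.loc[df["mytarget"] == category, "RSI"].dropna()
        counts[name], _ = np.histogram(values, bins=bins)
    labels = [f"{lo}-{hi}" for lo, hi in zip(bins[:-1], bins[1:])]
    return pd.DataFrame(counts, index=labels)

# Made-up example rows
sample = pd.DataFrame({
    "RSI": [25.0, 35.0, 75.0, 65.0, 50.0, np.nan],
    "mytarget": [1, 1, 2, 2, 0, 0],
})
table = rsi_histograms(sample)

# Divide by column totals to compare shapes when the groups differ in size.
shares = table / table.sum()
print(table)
```

Using the same bins for every category keeps the histograms comparable; with matplotlib, the same bins can be passed to plt.hist for each category to draw the plots.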