arbitragelab.util.data_importer

This module is a user data helper that wraps several Yahoo Finance libraries.

Module Contents

Classes

DataImporter

Wrapper class that imports data from yfinance and yahoo_fin.

class DataImporter

Wrapper class that imports data from yfinance and yahoo_fin.

This class allows fast pulling and wrangling of the information needed for the research process. This includes retrieving the ticker groups of various indexes, pulling the relevant pricing data, and processing that data.

static get_sp500_tickers() → list

Gets all S&P 500 stock tickers.

Returns:

(list) List of tickers.

static get_dow_tickers() → list

Gets all DOW stock tickers.

Returns:

(list) List of tickers.

static remove_nuns(dataframe: pandas.DataFrame, threshold: int = 100) → pandas.DataFrame

Removes tickers whose number of null values exceeds a threshold.

Parameters:
  • dataframe – (pd.DataFrame) Asset price data.

  • threshold – (int) The number of null values allowed.

Returns:

(pd.DataFrame) Price data without any null values.
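The filtering described above can be illustrated with a minimal pandas sketch. The function name `drop_sparse_tickers` and the sample data are hypothetical, and the forward-fill and back-fill step is an assumption made so that the result contains no nulls; this is not the library's actual implementation.

```python
import numpy as np
import pandas as pd


def drop_sparse_tickers(prices: pd.DataFrame, threshold: int = 100) -> pd.DataFrame:
    """Hypothetical stand-in for remove_nuns, for illustration only."""
    # Count missing values per column (one column per ticker).
    null_counts = prices.isnull().sum()
    # Keep only the tickers whose null count does not exceed the threshold.
    kept = null_counts[null_counts <= threshold].index
    # Assumption: remaining gaps are forward-filled, then back-filled.
    return prices[kept].ffill().bfill()


# Made-up prices: BBB has three nulls, AAA has one, CCC has none.
prices = pd.DataFrame({
    "AAA": [1.0, np.nan, 3.0, 4.0],
    "BBB": [np.nan, np.nan, np.nan, 8.0],
    "CCC": [5.0, 6.0, 7.0, 8.0],
})
cleaned = drop_sparse_tickers(prices, threshold=1)
```

With `threshold=1`, the `BBB` column is dropped and the single gap in `AAA` is filled from the previous row.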

static get_price_data(tickers: list, start_date: str, end_date: str, interval: str = '5m') → pandas.DataFrame

Gets the price data for a custom start date, end date, and interval. For daily prices, only the closing price is kept.

Parameters:
  • tickers – (list) List of tickers to download.

  • start_date – (str) Download start date string (YYYY-MM-DD).

  • end_date – (str) Download end date string (YYYY-MM-DD).

  • interval – (str) Valid intervals: [1m,2m,5m,15m,30m,60m,90m,1h,1d,5d,1wk,1mo,3mo].

Returns:

(pd.DataFrame) The requested price_data.

static get_returns_data(price_data: pandas.DataFrame) → pandas.DataFrame

Calculates returns from the supplied price data.

Parameters:

price_data – (pd.DataFrame) Asset price data.

Returns:

(pd.DataFrame) Price Data converted to returns.
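As a rough illustration of converting prices to returns, the sketch below computes simple period-over-period returns with pandas. The helper name `to_returns` and the sample prices are hypothetical, and whether the library uses simple or log returns is an assumption here.

```python
import pandas as pd


def to_returns(price_data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: simple returns, p_t / p_(t-1) - 1."""
    returns = price_data.pct_change()
    # The first row has no previous price, so its return is undefined; drop it.
    return returns.iloc[1:]


# Made-up prices for a single asset.
prices = pd.DataFrame({"AAA": [100.0, 110.0, 99.0]})
returns = to_returns(prices)
```

Here the price moves from 100 to 110 and then to 99, giving returns of roughly 0.10 and -0.10.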

get_ticker_sector_info(tickers: list, yf_call_chunk: int = 20) → pandas.DataFrame

This method loops through all the tickers, uses the yfinance library to request each ticker's info, and retrieves its ‘sector’ and ‘industry’ information.

This method uses the yfinance ‘Tickers’ object, which limits the number of tickers that can be supplied as a string argument. To work around this, the method uses a chunking approach: the supplied ticker list is broken down into small chunks, which are passed sequentially to the helper function.

Parameters:
  • tickers – (list) List of asset symbols.

  • yf_call_chunk – (int) Ticker values allowed per ‘Tickers’ object. This should always be less than 200.

Returns:

(pd.DataFrame) DataFrame with input asset tickers and their respective sector and industry information.
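The chunking approach described above can be sketched in plain Python. The helper name `chunk_tickers` is hypothetical, and joining each chunk with spaces is an assumption about how the ticker string is built; no yfinance call is made here.

```python
from typing import List


def chunk_tickers(tickers: List[str], chunk_size: int = 20) -> List[List[str]]:
    """Hypothetical helper: split a ticker list into consecutive chunks."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    # Step through the list in strides of chunk_size; the last chunk may be shorter.
    return [tickers[i:i + chunk_size] for i in range(0, len(tickers), chunk_size)]


tickers = ["AAPL", "MSFT", "GOOG", "AMZN", "META"]
chunks = chunk_tickers(tickers, chunk_size=2)

# Assumption: each chunk is joined into one space-separated string
# before being handed to a 'Tickers'-style object.
ticker_strings = [" ".join(chunk) for chunk in chunks]
```

Each chunk would then be processed in turn and the per-chunk results combined into a single DataFrame.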