News API provides historical news data dating back to 2015. You can expect to receive an average of 130+ news articles per day. All news data is currently provided directly by Benzinga.
I see a big chunk of data is missing between 2015-Feb and 2016-Oct.
@Plamen There is news for JPM between 2015-Feb and 2016-Oct. Nothing seems to be missing.
Here is the code I used to fetch all news for JPM from 2015-01-18 to the current date.
!pip install -q alpaca-py
from alpaca.common.rest import RESTClient
import pandas as pd
ALPACA_API_KEY_ID = 'xxxxx'
ALPACA_API_SECRET_KEY = 'xxxxx'
# instantiate a basic rest client
news_client = RESTClient(base_url='https://data.alpaca.markets',
                         api_version='v1beta1',
                         api_key=ALPACA_API_KEY_ID,
                         secret_key=ALPACA_API_SECRET_KEY)
# create a dataframe to store the news
news_df = pd.DataFrame()
page_token = 'default'
while page_token is not None:
    parameters = {'start': pd.to_datetime('2015-01-18T00:00:00Z').isoformat(),
                  'end': pd.to_datetime('today', utc=True).isoformat(),
                  'page_token': page_token if page_token != 'default' else None,
                  'symbols': 'JPM',
                  'limit': 50
                  }
    response = news_client.get('/news', parameters)
    news_df = pd.concat([news_df, pd.DataFrame(response.get('news'))])
    page_token = response.get('next_page_token')
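For anyone who wants to reuse the paging loop above, here is a minimal sketch of the same pattern pulled into a generator. The names `iter_news_pages`, `fetch_page`, and `fake_fetch` are made up for illustration and are not part of alpaca-py; the fake pages are invented so the sketch runs without API keys. It assumes, as in the loop above, that the response is a dict with a `news` list and a `next_page_token` that is `None` on the last page.

```python
from typing import Callable, Iterator, Optional


def iter_news_pages(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every article from a token-paginated endpoint (hypothetical helper)."""
    page_token = None  # the first request carries no token
    while True:
        response = fetch_page(page_token)
        # tolerate a missing or empty 'news' key
        yield from response.get('news') or []
        page_token = response.get('next_page_token')
        if page_token is None:
            break  # no further pages


# Invented stand-in for the real request, so the sketch runs offline.
_FAKE_PAGES = {
    None: {'news': [{'id': 1}, {'id': 2}], 'next_page_token': 'p2'},
    'p2': {'news': [{'id': 3}], 'next_page_token': None},
}


def fake_fetch(page_token: Optional[str]) -> dict:
    return _FAKE_PAGES[page_token]


articles = list(iter_news_pages(fake_fetch))
```

With the real client, `fetch_page` would be a small function that calls `news_client.get('/news', ...)` with the same parameters as above and passes the token through as `page_token`.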
news_df is a dataframe of all news up to the current day. One can then plot the number of articles per month from 2015-Feb through 2016-Oct (I actually plotted a slightly larger window). Here’s the code:
# set the index to the created_at column, converted to a datetime
news_df.set_index(pd.to_datetime(news_df.created_at, utc=True), inplace=True)
# narrow the window we want to plot
check_start = pd.to_datetime('2015-02-01', utc=True)
check_end = pd.to_datetime('2017-01-01', utc=True)
# plot the qty of articles per month between our check dates
news_df.query('@check_start < index < @check_end').resample('1M').content.count().plot.bar()
Here is the chart showing generally over 20 articles per month throughout 2015-2016.
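If you would rather check for gaps numerically than read them off a chart, something like the sketch below should work. The timestamps are made up purely for illustration (they are not News API data), and `sample_df`, `monthly_counts`, and `empty_months` are just names I picked. It uses the `'MS'` (month start) alias rather than `'1M'`, since I believe newer pandas versions warn about `'M'`; the bucketing by calendar month is the same.

```python
import pandas as pd

# Made-up timestamps for illustration only; not real News API data.
created_at = pd.to_datetime(
    ['2015-02-03', '2015-02-20', '2015-04-11', '2015-05-02'], utc=True)
sample_df = pd.DataFrame({'headline': ['a', 'b', 'c', 'd']}, index=created_at)

# count articles per calendar month; months with no articles show up as 0
monthly_counts = sample_df.resample('MS').size()

# keep only the months with no articles at all
empty_months = monthly_counts[monthly_counts == 0]
print(empty_months)
```

Running the same two lines on `news_df` (after setting the datetime index as above) would list any month with zero articles, which is a quick way to confirm whether a range is really missing.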