
Getting Covid data into Jupyter from Fidap

Ashish Singal
May 5, 2022

Hi, my name is Ash and I’m the founder of Fidap. We provide clean data for data scientists. Today, we’re going to get Covid data into a Jupyter Notebook. There is also a YouTube video that walks through this post.

We’ll answer the following question:

Which states have the most total infections?

To do this, we’ll use a combination of tools on the data side: Fidap’s data platform, BigQuery, BigQuery Public Datasets, and the New York Times’ Covid dataset. We’ll also use Jupyter, Python, pandas, and SQL.

Using Fidap’s data catalog

First, let’s navigate over to Fidap’s data catalog and search for Covid.

Search results for "covid"

After we select the Covid NYT dataset, we can see descriptive stats:

Descriptive info about this dataset

Let’s navigate to the Tables tab to see the tables in this dataset:

Tables in Covid dataset

Once we select us_states, we can see details about this table:

Details on the us_states table

We recently added the Explore tab, which gives us detailed exploratory stats on every column (thanks to pandas profiling). For example, we see the distribution of the confirmed_cases column. This can give us a great sense of our data and may alert us to any data quality issues as well.

Querying the data in Fidap

Let’s move on to actually querying this table. We can do that using Fidap’s query tool.

We’ve saved the query here. The query generates the following results:

Getting this in a Jupyter / Colab Notebook

That’s great, but most data scientists would rather do this in a Jupyter Notebook than in a web interface. For this reason, Fidap has built a Python package.

Open up a Jupyter Notebook or, alternatively, use this Google Colab Notebook. First, we install Fidap via pip:

pip install fidap

Next, let’s instantiate the Fidap client with your API key (you can get it from the Account section in Fidap).

import fidap
fc = fidap.fidap_client(api_key='xxx')
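
To avoid pasting the key directly into a shared notebook, one option is to read it from an environment variable. The sketch below is a minimal illustration and not part of Fidap’s package: the helper name load_api_key and the variable name FIDAP_API_KEY are assumptions made for this example.

```python
import os


def load_api_key(env_var="FIDAP_API_KEY"):
    """Return the API key stored in an environment variable.

    Both the function and the default variable name are hypothetical;
    use whatever name fits your own setup.
    """
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Fidap API key"
        )
    return key


# Usage (needs the fidap package and a real key, so it is left commented out):
# import fidap
# fc = fidap.fidap_client(api_key=load_api_key())
```

The returned key can then be passed to fidap.fidap_client in place of the literal 'xxx'.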

Finally, let’s run the query:

fc.sql("""select * from bigquery-public-data.covid19_nyt.us_states where date = CAST('2021-06-28' AS DATE) order by confirmed_cases desc limit 10""")
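
If we want to ask the same question for a different day or a different number of states, the query string can be built by a small helper. This is a sketch under assumptions: the function name top_states_query is hypothetical, and wrapping the table name in backticks is standard BigQuery quoting rather than something the original query used.

```python
from datetime import date

# Table name taken from the query above.
TABLE = "bigquery-public-data.covid19_nyt.us_states"


def top_states_query(as_of, limit=10):
    """Build the 'top states by confirmed cases' query for one day.

    as_of must be an ISO date string such as '2021-06-28'.
    """
    day = date.fromisoformat(as_of)  # raises ValueError on a malformed date
    n = int(limit)
    if n <= 0:
        raise ValueError("limit must be a positive integer")
    return (
        "select *\n"
        f"from `{TABLE}`\n"
        f"where date = CAST('{day.isoformat()}' AS DATE)\n"
        "order by confirmed_cases desc\n"
        f"limit {n}"
    )


query = top_states_query("2021-06-28", limit=10)
print(query)
```

The resulting string can be passed to fc.sql(query) in place of the literal query.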

We get back a pandas DataFrame with our result.
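
Since the result is an ordinary DataFrame, the same filter, sort, and limit steps can also be expressed in pandas. The sketch below uses placeholder rows rather than real NYT counts, and the state_name column and the top_states helper are assumptions for illustration; only date and confirmed_cases appear in the query itself.

```python
import pandas as pd

# Placeholder rows for illustration only; these are NOT real case counts.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2021-06-28"] * 4 + ["2021-06-27"]),
        "state_name": ["A", "B", "C", "D", "A"],  # assumed column name
        "confirmed_cases": [300, 100, 400, 200, 290],
    }
)


def top_states(frame, as_of, n=10):
    """Return the rows for one date, ranked by confirmed_cases, highest first."""
    on_date = frame[frame["date"] == pd.Timestamp(as_of)]
    return on_date.sort_values("confirmed_cases", ascending=False).head(n)


top = top_states(df, "2021-06-28", n=3)
print(top[["state_name", "confirmed_cases"]])
```

This mirrors the where, order by, and limit clauses of the SQL, which can be handy for slicing a larger result locally without sending another query.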

As expected, the total number of Covid cases tracks state population: the states with the most cases are simply the most populous ones, like California and Texas. This isn’t tremendously interesting.

Next time, we’ll do some more complex stuff.

Ashish Singal
Ash is the founder / CEO of Fidap. Previously, he was at Google and Bloomberg. He loves chocolate, puppies, and clean data.
