1. Deploy the server

Hydra is available as open source and a fully managed cloud database. To spin up a cloud database, navigate to https://www.hydra.so/ to join our cloud waitlist. For a detailed walkthrough, navigate to the Setting up Hydra Cloud guide.

2. Create analytics schema and table

We will connect to Hydra using psql, the terminal-based front-end to Postgres, but you can use any Postgres client you’d prefer and achieve the same result. We will start by creating our analytics schema with a table “analytics.rides” which we will populate with sample data from the New York Taxi data in step 3.

CREATE EXTENSION pg_duckdb;
CREATE SCHEMA analytics;
CREATE TABLE analytics.rides
(
	vendor_id text,
	pickup_datetime timestamp,
	dropoff_datetime timestamp,
	passenger_count int,
	trip_distance float,
	pickup_longitude float,
	pickup_latitude float,
	rate_code text,
	store_and_fwd_flag text,
	dropoff_longitude float,
	dropoff_latitude float,
	fare_amount float,
	surcharge float,
	mta_tax float,
	tip_amount float,
	tolls_amount float,
	total_amount float
)
USING duckdb;

3. Insert sample data

Let’s insert the New York Taxi data into our rides table in the analytics schema. Below we’ve written example queries you can copy / paste.

INSERT INTO analytics.rides
SELECT *
FROM
read_parquet('<https://idl.uw.edu/mosaic-datasets/data/nyc-rides-2010.parquet>')
AS
(
	vendor_id text,
	pickup_datetime timestamp,
	dropoff_datetime timestamp,
	passenger_count int,
	trip_distance float,
	pickup_longitude float,
	pickup_latitude float,
	rate_code text,
	store_and_fwd_flag text,
	dropoff_longitude float,
	dropoff_latitude float,
	fare_amount float,
	surcharge float,
	mta_tax float,
	tip_amount float,
	tolls_amount float,
	total_amount float
);

4. Run example queries

Run each statement by clicking the green triangle next to each query.

SELECT count(*) FROM analytics.rides;
SELECT * FROM analytics.rides LIMIT 10;
SELECT
   vendor_id,
   count(*) AS count,
   sum(fare_amount) AS revenue
FROM analytics.rides
GROUP BY vendor_id
ORDER BY count DESC
LIMIT 10;

Query 0: SELECT COUNT(*) … counts the number of rows in the analytics.rides table. Query 1: SELECT * FROM … retrieves up to 10 rows of data from the analytics.rides table. Query 2: SELECT vendor_id … is list of the top 10 vendors, displaying the number of rides and total revenue for each, ordered by the greatest number of rides first.

5. Insert existing data

Now let’s test Hydra with your own data. If you have existing data in PostgreSQL, S3, Google Cloud Storage, or local files it’s a good idea to drop the “analytics.rides” sample data that was inserted earlier before loading your own files.

To delete the sample data, run the following:

DROP TABLE analytics.rides;

Note: Browse the codebox tabs to steps for S3, Postgres, and/or local data.

What’s Next?

We recommend