1. Install locally or deploy the server
2. Create analytics schema and table
3. Insert sample data
4. Run example queries
5. Insert existing data

2. Create analytics schema and table

We will connect to Hydra using psql, the terminal-based front end to Postgres, but any Postgres client will achieve the same result. We will start by creating an analytics schema with a table, analytics.rides, which we will populate with sample data from the New York Taxi dataset in step 3.

CREATE SCHEMA analytics;
CREATE TABLE analytics.rides
(
	vendor_id text,
	pickup_datetime timestamp,
	dropoff_datetime timestamp,
	passenger_count int,
	trip_distance float,
	pickup_longitude float,
	pickup_latitude float,
	rate_code text,
	store_and_fwd_flag text,
	dropoff_longitude float,
	dropoff_latitude float,
	fare_amount float,
	surcharge float,
	mta_tax float,
	tip_amount float,
	tolls_amount float,
	total_amount float
)
USING duckdb;

3. Insert sample data

Let’s insert the New York Taxi data into the rides table in the analytics schema. Below is an example query you can copy and paste.

INSERT INTO analytics.rides
SELECT 
    r['vendor_id']::text,
    r['pickup_datetime']::timestamp,
    r['dropoff_datetime']::timestamp,
    r['passenger_count']::int,
    r['trip_distance']::float,
    r['pickup_longitude']::float,
    r['pickup_latitude']::float,
    r['rate_code']::text,
    r['store_and_fwd_flag']::text,
    r['dropoff_longitude']::float,
    r['dropoff_latitude']::float,
    r['fare_amount']::float,
    r['surcharge']::float,
    r['mta_tax']::float,
    r['tip_amount']::float,
    r['tolls_amount']::float,
    r['total_amount']::float
FROM read_parquet('https://idl.uw.edu/mosaic-datasets/data/nyc-rides-2010.parquet') AS r;
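Once the insert completes, a quick row count confirms the data loaded (the exact count depends on the dataset snapshot):

SELECT count(*) AS total_rides
FROM analytics.rides;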

4. Run example queries

When using cloud-hosted Hydra, run each statement by clicking the green triangle next to it in the SQL Editor.
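For example, the following aggregate queries (written against the columns created in step 2) summarize the sample data:

-- Average fare and tip by passenger count
SELECT passenger_count,
       round(avg(fare_amount)::numeric, 2) AS avg_fare,
       round(avg(tip_amount)::numeric, 2)  AS avg_tip
FROM analytics.rides
GROUP BY passenger_count
ORDER BY passenger_count;

-- Busiest pickup hours
SELECT extract(hour FROM pickup_datetime) AS pickup_hour,
       count(*) AS rides
FROM analytics.rides
GROUP BY pickup_hour
ORDER BY rides DESC
LIMIT 5;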

5. Insert existing data

Now let’s test Hydra with your own data.

Before loading your own files from S3, PostgreSQL, Google Cloud Storage, or local files, it’s a good idea to drop the analytics.rides sample table created earlier.

To delete the sample data:

DROP TABLE analytics.rides;

Note: Switch the codebox tabs for steps on S3, Postgres, and/or local data.

-- Connect your S3 bucket; the session token is optional
INSERT INTO duckdb.secrets
 (type, key_id, secret, session_token, region)
 VALUES ('S3', 'access_key_id', 'secret_access_key', 'session_token', 'us-east-1');

-- Example: create a table in the analytics schema from a file in S3.
CREATE TABLE analytics.your_table_name
USING duckdb
AS
SELECT
    -- specify the columns and types to keep, as in step 3, e.g.:
    -- r['id']::int AS id,
    -- r['name']::text AS name
FROM read_parquet('s3://your_bucket_name/your_file_name.parquet') AS r;
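As a sketch of the local-data tab, assuming your Parquet file is readable from the server’s filesystem (the path and column names below are placeholders to replace with your own):

-- Example: create a table from a local Parquet file.
CREATE TABLE analytics.your_table_name
USING duckdb
AS
SELECT
    r['id']::int AS id,       -- hypothetical columns; replace with yours
    r['name']::text AS name
FROM read_parquet('/path/to/your_file_name.parquet') AS r;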

What’s Next?

We recommend