We will connect to Hydra using psql, the terminal-based front-end to Postgres, but you can use any Postgres client you prefer and achieve the same result. We will start by creating an analytics schema with a table analytics.rides, which we will populate with sample data from the New York Taxi dataset in step 3.
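A psql connection looks like the following; the user, password, host, and database name are placeholders, so substitute the credentials from your own Hydra instance:

```shell
psql "postgres://your_user:your_password@your_hydra_host:5432/your_database"
```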
SELECT COUNT(*) … counts the number of rows in the analytics.rides table.
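The counting query itself is not reproduced above; assuming the analytics.rides table created in step 3, it would look like:

```sql
SELECT COUNT(*) FROM analytics.rides;
```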
Q1: SELECT * FROM
SELECT * FROM analytics.rides LIMIT 10;
SELECT * FROM … retrieves up to 10 rows of data from the analytics.rides table.
Q2: SELECT vendor_id
SELECT vendor_id, count(*) AS count, sum(fare_amount) AS revenue
FROM analytics.rides
GROUP BY vendor_id
ORDER BY count DESC
LIMIT 10;
SELECT vendor_id … lists the top 10 vendors, showing the number of rides and total revenue for each, ordered by the greatest number of rides first.
When using cloud-hosted Hydra, run each statement by clicking the green triangle next to it in the SQL Editor.
It’s a good idea to drop the analytics.rides sample data inserted earlier before loading your own files from S3, PostgreSQL, Google Cloud Storage, or local files.
To delete the sample data:
DROP TABLE analytics.rides;
Note: Switch the codebox tabs for steps on S3, Postgres, and/or local data.
-- Connect bucket; the session token is optional
INSERT INTO duckdb.secrets (type, key_id, secret, session_token, region)
VALUES ('S3', 'access_key_id', 'secret_access_key', 'session_token', 'us-east-1');

-- Example: create a table in the analytics schema from a file in S3.
CREATE TABLE analytics.your_table_name USING duckdb
AS
SELECT *
FROM read_parquet('s3://your_bucket_name/your_file_name.parquet') AS
(
  -- specify columns and types here, e.g.:
  -- id int,
  -- name text,
  ...
);
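The Postgres and local-data codebox tabs are not reproduced here. As a rough sketch of the local-file case, the same pattern applies, since DuckDB's read_parquet also accepts local file paths; the table and file names below are placeholders:

```sql
-- Hypothetical example: create a table from a local Parquet file.
CREATE TABLE analytics.your_table_name USING duckdb
AS
SELECT * FROM read_parquet('/path/to/your_file_name.parquet');
```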