You can download the notebook from my Github repository: London Bikes

The London bicycle hires dataset is a collection of information about the usage of the Santander Cycle Hire Scheme in London, which is a public bicycle sharing system that allows people to rent bikes from different locations across the city.

Here the data includes the number of hires, the start and end dates and stations, the bike IDs, and the trip duration in minutes. The data can be used for various purposes, such as analyzing the patterns and trends of bike sharing, identifying the most popular routes and stations, and evaluating the impact of bike sharing on transport and environment.


Data Preparation

Create new project in Google Cloud Platform > Set up Billing IAM > Find account associated with your project > Add the "BigQuery Data Viewer" permissions with "Owner"


Search for "bigquery-public-data" datasets and click the starred icon


Expand the view and select london_bikes > (cycle_hire, cycle_stations) > Copy to a dataset in your project



Data Wrangling

Go to DataPrep > Design a workflow ingesting the 'cycle_hire' dataset > Edit Recipe > Data Transformation



Remove missing values: end_station_id
Sort Order: start_station_name
Replace 'Clapham Comm', Clapham Common North Side' to 'Clapham Common Northside: start_station_name
Set column to Integer: start_station_id, end_station_id

Select the original dataset as the destination (Note: Ensure the region in dataset is same as original dataset) > Run the flow > Completed



Perform the following SQL query to join the tables > Save as 'query.sql'
with h as (SELECT * FROM `bigquery-ids.london_bikes.cycle_hire`),
ss as (SELECT * FROM `bigquery-ids.london_bikes.cycle_stations`),
es as (SELECT * FROM `bigquery-ids.london_bikes.cycle_stations`)
select rental_id, duration, bike_id, bike_model,
start_station_id, start_station_name, start_date, end_station_id, end_station_name, end_date,
ss.installed start_installed, es.installed end_installed,
ss.latitude start_latitude, ss.longitude start_longitude,
es.latitude end_latitude, es.longitude end_longitude
from h
left join ss on start_station_id = ss.Id
left join es on end_station_id = es.Id limit 10000;


Results

On average,