AI Self-Driving Cars (using CARLA)
Overview: This 180-day curriculum uses the CARLA simulator as the primary platform for autonomous-driving RL. Trainees start with imitation learning (behavioral cloning) and progress through deep RL algorithms (DDPG, SAC, PPO) to master lane following, obstacle avoidance, and intersection handling. Emphasis is on vision-based navigation, planning, and control integration, with engineering best practices (logging, CI/CD, reproducibility). By the end, trainees deliver a real-time agent that drives safely in a virtual city under varied weather and traffic conditions.
- Tech Stack: CARLA simulator (Python API); alternative sims (TORCS, Gym-Duckietown, LGSVL); PyTorch/TensorFlow for neural nets; OpenAI Gym or Stable-Baselines for RL wrappers; ROS (optional, for control); Docker for environment reproducibility.
- Key Algorithms & Papers: Behavioral Cloning (Bojarski et al. 2016 for end-to-end CNN driving), DDPG (Lillicrap et al. 2015 for continuous control), SAC (Haarnoja et al. 2018), PPO (Schulman et al. 2017). CARLA original paper (Dosovitskiy et al. 2017) for simulator design.
- Project Repos & Tools: CARLA GitHub, Duckietown Gym (open-source lane-following sim), TORCS RL examples, Weights & Biases or TensorBoard for logging, pyGA or OpenCV for any vision pre-processing.
Block 1 (Days 1–15): Simulator Setup & Foundations
- Day 1-5: Install CARLA (and dependencies) and verify it runs. Set up version control (git repo) and CI testing (e.g., a CARLA smoke test). Familiarize yourself with CARLA’s Python API: spawning a car, reading sensors (camera, LiDAR). Deliverable: A simple script that drives a car manually or in autopilot mode around a track, logging sensor data (a minimal sketch follows this block).
- Day 6-10: Compare the modular autonomous-driving pipeline with end-to-end RL. Study behavioral cloning (an end-to-end CNN mapping images to steering), as used by NVIDIA. Collect a small driving dataset in CARLA (e.g., use the CARLA autopilot or drive with the keyboard to record images and steering). Set up an experiment tracker (TensorBoard/W&B) for logging training metrics (loss, reward).
- Day 11-15: Train a behavioral cloning model (supervised learning) on the collected data for lane following. Evaluate it in CARLA on the same route. Discuss covariate shift and introduce DAgger (Dataset Aggregation) as a remedy. Deliverable: A CNN policy that can drive along a simple road by mimicking the dataset (demonstrating end-to-end imitation learning).
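A minimal sketch of the Day 1-5 deliverable, assuming a CARLA server is already running on the default port (2000) and the carla Python package is installed; the blueprint name, camera placement, and log format are illustrative choices, not requirements:

```python
# Smoke test: spawn a vehicle on autopilot, attach an RGB camera, and log control/speed.
import csv
import time
import carla

client = carla.Client("localhost", 2000)      # assumes a running CARLA server
client.set_timeout(10.0)
world = client.get_world()

bp_lib = world.get_blueprint_library()
vehicle_bp = bp_lib.filter("vehicle.*model3*")[0]
vehicle = world.spawn_actor(vehicle_bp, world.get_map().get_spawn_points()[0])
vehicle.set_autopilot(True)                    # let the traffic manager drive

camera_bp = bp_lib.find("sensor.camera.rgb")
camera_bp.set_attribute("image_size_x", "320")
camera_bp.set_attribute("image_size_y", "240")
camera = world.spawn_actor(camera_bp,
                           carla.Transform(carla.Location(x=1.5, z=2.4)),
                           attach_to=vehicle)
camera.listen(lambda image: image.save_to_disk(f"out/{image.frame:06d}.png"))

with open("drive_log.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["step", "steer", "throttle", "speed_kmh"])
    for step in range(600):                    # roughly one minute at 10 Hz
        control = vehicle.get_control()
        v = vehicle.get_velocity()
        speed_kmh = 3.6 * (v.x ** 2 + v.y ** 2 + v.z ** 2) ** 0.5
        writer.writerow([step, control.steer, control.throttle, speed_kmh])
        time.sleep(0.1)

camera.destroy()
vehicle.destroy()
```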
Block 2 (Days 16–30): Lane Following via Imitation & Basic RL
- Day 16-20: Expand the imitation training data: drive through curves, light traffic, and different lighting. Implement data augmentation (random brightness, slight camera-angle shifts) to improve robustness. Retrain the behavioral cloning model. Deliverable: An improved cloned driver that stays in lane on varied road sections (straight, curved) under nominal conditions.
- Day 21-25: Introduce reinforcement learning basics in simulation. Define a reward for lane keeping (e.g., +1 for staying in lane, -100 for collision/off-road). Wrap CARLA in a Gym environment for a simple road scenario. Use a low-dimensional state initially (e.g., lateral offset and heading) to validate the RL loop. Train a DQN or DDPG agent on this simplified state to center the car in the lane (a minimal environment sketch follows this block).
- Day 26-30: Transition to image-based RL. Use the front-camera image as the state (possibly downsampled/grayscale to speed up training). Apply a deep RL algorithm (start with DDPG or PPO via Stable-Baselines). Train the agent to drive between lane markings on an empty road. Monitor training with episode-reward plots. Deliverable: An RL policy that achieves basic lane following from camera input (comparable to or better than the imitation policy).
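A minimal sketch of the Day 21-25 Gym wrapper, assuming you have written a thin CARLA interface of your own; the helpers `respawn_vehicle`, `get_lane_state`, `apply_control`, and `collision_detected` are hypothetical names, and the reward weights are the ones proposed above:

```python
import gym
import numpy as np
from gym import spaces

class LaneKeepEnv(gym.Env):
    """CARLA lane keeping as a Gym env with a 2-D state:
    [lateral_offset_m, heading_error_rad]. The action is continuous steering."""

    def __init__(self, sim):
        super().__init__()
        self.sim = sim  # your thin wrapper around the CARLA client (assumed)
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(2,), dtype=np.float32)
        self.action_space = spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float32)

    def reset(self):
        self.sim.respawn_vehicle()
        return self.sim.get_lane_state()

    def step(self, action):
        self.sim.apply_control(steer=float(action[0]), throttle=0.4)  # fixed cruise throttle
        obs = self.sim.get_lane_state()
        failed = abs(obs[0]) > 2.0 or self.sim.collision_detected()   # off-road or crash
        reward = -100.0 if failed else 1.0
        return obs, reward, failed, {}

# Training with Stable-Baselines (DDPG suits the continuous steering action):
# from stable_baselines3 import DDPG
# model = DDPG("MlpPolicy", LaneKeepEnv(sim), verbose=1)
# model.learn(total_timesteps=50_000)
```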
Block 3 (Days 31–45): Deep RL for Obstacle Avoidance
- Day 31-35: Introduce dynamic obstacles. Configure CARLA to spawn other vehicles or pedestrians on the road. Design a reward, e.g., +0.1 per timestep alive, -1,000 for a collision, +50 for completing the episode without a crash (see the reward sketch after this block). Implement a safety-bubble sensor (front distance) as an additional input to help the agent sense obstacles.
- Day 36-40: Train using PPO (robust on-policy) for obstacle avoidance and lane-keeping combined. Tweak hyperparameters (learning rate, reward scaling) to stabilize training. Use Curriculum Learning: start with few obstacles at low speed, then gradually increase traffic density and speed limit as the agent improves.
- Day 41-45: Evaluate the RL agent’s emergent behavior: it should slow down or change lanes (if possible) to avoid collisions. Compare the performance of DDPG vs. PPO vs. SAC on this task (e.g., SAC’s stability advantages in continuous control). Deliverable: An agent that navigates around randomly moving obstacles without accidents in a simple scenario (straight road with traffic), demonstrating learned collision avoidance.
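The Day 31-35 reward can live in one small, testable function; a sketch using the weights proposed above (the proximity-shaping term tied to the safety-bubble distance is an optional addition, not part of the spec):

```python
def obstacle_avoidance_reward(collided: bool,
                              reached_goal: bool,
                              front_distance_m: float,
                              safe_distance_m: float = 8.0) -> float:
    """Per-step reward: small living bonus, large collision penalty, completion bonus,
    plus optional shaping that discourages tailgating the lead vehicle."""
    if collided:
        return -1000.0
    reward = 0.1                                   # alive bonus each timestep
    if reached_goal:
        reward += 50.0                             # episode finished without a crash
    if front_distance_m < safe_distance_m:         # optional shaping term (assumption)
        reward -= 0.5 * (safe_distance_m - front_distance_m) / safe_distance_m
    return reward
```

Keeping the reward in its own function also makes the Block 10 unit tests straightforward.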
Block 4 (Days 46–60): Intersection Handling and Decision-Making
- Day 46-50: Set up an urban grid map in CARLA with intersections (4-way stops or traffic lights). Introduce high-level traffic rules: stopping at red lights, yielding to cross traffic. Expand the state input with traffic-light state info (or use image recognition of the light color); a query-based sketch follows this block. Reward: +100 for completing a route correctly, -200 for running a red light or for a collision.
- Day 51-55: Implement a hierarchical RL approach: a high-level policy decides at intersections (stop, go, turn), and a low-level policy controls steering/throttle within lanes. Alternatively, use a single policy with additional inputs (traffic-light status, proximity of crossing vehicles) so it can learn when to brake. Use reward shaping, e.g., penalize high speed when approaching an intersection.
- Day 56-60: Train the agent in a multi-intersection environment. Possibly pre-train on a rule-based policy for stopping (to speed up convergence). Evaluate if the agent learns to obey signals and handle cross-traffic. Deliverable: Agent successfully driving through several intersections, stopping appropriately and proceeding safely (passing a “traffic law compliance” test in simulation).
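One way to expose the traffic-light state from Days 46-50 without a vision model is to query it directly from the CARLA API; a minimal sketch:

```python
import numpy as np
import carla

def traffic_light_features(vehicle: carla.Vehicle) -> np.ndarray:
    """One-hot [red, yellow, green, no-light] vector for the light currently
    affecting the ego vehicle, appended to the policy's state."""
    one_hot = np.zeros(4, dtype=np.float32)
    if vehicle.is_at_traffic_light():
        state = vehicle.get_traffic_light().get_state()
        if state == carla.TrafficLightState.Red:
            one_hot[0] = 1.0
        elif state == carla.TrafficLightState.Yellow:
            one_hot[1] = 1.0
        elif state == carla.TrafficLightState.Green:
            one_hot[2] = 1.0
    else:
        one_hot[3] = 1.0
    return one_hot
```

If the curriculum later switches to a camera-only setup, the same feature can be produced by a light-classification network instead, leaving the policy interface unchanged.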
Block 5 (Days 61–75): Advanced Algorithms & Tuning
- Day 61-65: Deep dive into advanced RL algorithms: try Twin-Delayed DDPG (TD3) or Soft Actor-Critic (SAC) for better stability; SAC’s entropy maximization can help exploration. Implement SAC using Stable-Baselines or RLlib on the lane-keeping task and compare learning curves to PPO/DDPG.
- Day 66-70: Apply Generative Adversarial Imitation Learning (GAIL) or DAgger on driving data to combine the strengths of imitation and RL. For example, use the earlier imitation model as a teacher policy and refine it with RL (this can speed up training in complex scenarios).
- Day 71-75: Hyperparameter optimization: systematically vary parameters (learning rate, batch size, reward weights) using Optuna or a sweep tool to maximize driving performance (see the sketch below). Establish a reproducible training pipeline: fixed random seeds, a script that can recreate the model from scratch, and CI tests that validate training doesn’t diverge (e.g., ensure average reward after 10k steps > X). Deliverable: A report on the comparative performance of RL algorithms on the driving tasks, with an optimized configuration that yields the best safety and comfort (smooth control) metrics.
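A hedged sketch of the Day 71-75 sweep with Optuna and Stable-Baselines3 SAC; `make_driving_env` stands in for your own environment factory, and the search space and budgets are illustrative:

```python
import optuna
from stable_baselines3 import SAC
from stable_baselines3.common.evaluation import evaluate_policy

def objective(trial: optuna.Trial) -> float:
    # Search space: learning rate, batch size, and one reward weight.
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)
    batch_size = trial.suggest_categorical("batch_size", [64, 128, 256])
    collision_penalty = trial.suggest_float("collision_penalty", 100.0, 2000.0, log=True)

    env = make_driving_env(collision_penalty=collision_penalty)   # your env factory (assumed)
    model = SAC("CnnPolicy", env, learning_rate=lr, batch_size=batch_size, seed=0, verbose=0)
    model.learn(total_timesteps=50_000)

    mean_reward, _ = evaluate_policy(model, env, n_eval_episodes=5)
    return mean_reward

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print("Best config:", study.best_params)
```

Fixing `seed=0` in the model constructor (and seeding the environment) is what keeps the sweep results reproducible from run to run.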
Block 6 (Days 76–90): Integrating Perception and Sensor Fusion
- Day 76-80: Introduce a modular perception component. Instead of raw pixels only, incorporate an object detection model (e.g., YOLOv5) to identify vehicles or pedestrians ahead. The agent’s state can include positions of nearby objects (relative distance/angle) in addition to raw image or instead of it. Alternatively, use CARLA’s depth or semantic segmentation camera to feed a processed representation to the RL policy.
- Day 81-85: Implement sensor fusion: combine the camera with a simulated LiDAR or radar. Use a PyTorch model that processes camera images and the LiDAR point cloud (or an occupancy grid) to form a richer state (a late-fusion sketch follows this block). This block teaches how to handle multiple sensor modalities and the concept of late fusion vs. early fusion in the network.
- Day 86-90: Retrain the driving agent with the new sensor inputs. The expectation is improved robustness (e.g., detecting a pedestrian in rain might fail on camera alone, but LiDAR helps). Evaluate under scenarios like nighttime or glare where one sensor might be less effective. Deliverable: A modified driving policy that uses fused sensor data and shows fewer failure cases in edge conditions (e.g., it can avoid a suddenly appearing obstacle more reliably than the vision-only agent), plus documentation of how the perception module improves safety.
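A compact late-fusion backbone for Days 81-85; the input sizes (an 84x84 grayscale camera frame and a 64x64 bird's-eye LiDAR occupancy grid) and layer widths are illustrative assumptions:

```python
import torch
import torch.nn as nn

class LateFusionBackbone(nn.Module):
    """Two independent encoders (camera, LiDAR occupancy grid) whose features are
    concatenated late, before the actor/critic heads of the RL policy."""

    def __init__(self, feature_dim: int = 256):
        super().__init__()
        self.camera_enc = nn.Sequential(            # 1x84x84 camera frame
            nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.Flatten())
        self.lidar_enc = nn.Sequential(             # 1x64x64 occupancy grid
            nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.Flatten())
        cam_dim = self._flat_dim(self.camera_enc, (1, 84, 84))
        lidar_dim = self._flat_dim(self.lidar_enc, (1, 64, 64))
        self.fuse = nn.Sequential(nn.Linear(cam_dim + lidar_dim, feature_dim), nn.ReLU())

    @staticmethod
    def _flat_dim(encoder: nn.Module, shape) -> int:
        with torch.no_grad():                        # infer flattened size with a dummy pass
            return encoder(torch.zeros(1, *shape)).shape[1]

    def forward(self, camera: torch.Tensor, lidar_grid: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.camera_enc(camera), self.lidar_enc(lidar_grid)], dim=1)
        return self.fuse(fused)                      # shared feature for actor/critic heads
```

An early-fusion variant would instead stack the two modalities as channels of a single input tensor and use one encoder; comparing the two is part of the exercise.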
Block 7 (Days 91–105): Robustness in Diverse Conditions
- Day 91-95: Configure CARLA to randomize environment conditions each episode: weather (rain, fog, bright sun) and time of day (a weather-randomization sketch follows this block). Also randomize traffic density and behaviors. This domain-randomization training forces the agent to generalize. Log performance across scenarios (maintain a test set of fixed scenarios to measure generalization).
- Day 96-100: Incorporate failure case analysis. For instance, if the agent struggles in heavy rain at night, collect those episodes and analyze sensor data. Potentially augment training with those specific scenarios (targeted domain randomization). Introduce noise in sensors (blurred images, dropped LiDAR points) to simulate sensor failures and make the policy tolerant.
- Day 101-105: Safety constraints: ensure the policy never accelerates aggressively in poor visibility. Implement a rule-based safety layer that caps speed when needed (this can run in parallel as an override, an example of safe RL or rule-constrained RL). By the end, the agent should handle a wide range of conditions. Deliverable: Robustness test-suite results, e.g., 100 simulation runs across varied weather/traffic with zero crashes, with the agent demonstrating consistent lane keeping and compliant stops despite environment variations.
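Per-episode weather randomization (Days 91-95) can be driven through CARLA's weather API; a minimal sketch, assuming a CARLA build recent enough to expose the fog parameters:

```python
import random
import carla

def randomize_weather(world: carla.World) -> carla.WeatherParameters:
    """Sample a new weather / time-of-day configuration at the start of each episode."""
    weather = carla.WeatherParameters(
        cloudiness=random.uniform(0, 100),
        precipitation=random.uniform(0, 100),
        precipitation_deposits=random.uniform(0, 100),   # puddles on the road
        fog_density=random.uniform(0, 60),
        sun_altitude_angle=random.uniform(-10, 90))      # below 0 approximates night
    world.set_weather(weather)
    return weather
```

Calling this from the environment's `reset()` (and logging the sampled parameters) makes it easy to correlate failures with specific conditions during the Day 96-100 failure analysis.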
Block 8 (Days 106–120): Planning and Map Integration
- Day 106-110: Shift focus to route planning. Introduce global planning using CARLA’s map/waypoint API or a simple A* search over the road network. The agent will receive a sequence of waypoints or a high-level instruction (e.g., “turn left at the next intersection”) in addition to sensory input. This mimics a GPS navigation input.
- Day 111-115: Combine classical planning with RL: use the waypoints as sub-goals for the RL policy. For example, the planner might give the next turn or lane change needed, and the RL agent’s reward includes reaching that waypoint (a sub-goal reward sketch follows this block). This teaches how to integrate a deterministic planner with a learned controller (a hybrid approach).
- Day 116-120: Scenario: end-to-end city navigation. Task the agent to drive from point A to B across town, following traffic rules and dynamic obstacles. Evaluate success rate and path efficiency. If the agent deviates, implement a correction mechanism (like re-planning a path if off-route). Deliverable: Agent completes point-to-point drives in the simulator, correctly executing a given route. Demonstration could show the car starting in one neighborhood and autonomously navigating to another via planned waypoints, with the RL policy handling local decisions (a fusion of planning and learned control).
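One way to turn planner waypoints into sub-goals for the Day 111-115 reward; a minimal sketch, assuming the route is a list of carla.Waypoint objects produced by your planner and that the radius and bonus values are illustrative:

```python
import carla

def waypoint_progress_reward(vehicle: carla.Vehicle, route: list, next_idx: int,
                             reach_radius_m: float = 3.0, bonus: float = 10.0):
    """Returns (reward, updated sub-goal index): a bonus is paid whenever the ego
    vehicle comes within reach_radius_m of the next waypoint on the planned route."""
    target = route[next_idx].transform.location
    if vehicle.get_location().distance(target) < reach_radius_m:
        return bonus, min(next_idx + 1, len(route) - 1)   # advance to the next sub-goal
    return 0.0, next_idx

# Building a coarse route with the map API (roughly one waypoint every 2 m):
# wp = world.get_map().get_waypoint(vehicle.get_location())
# route = [wp]
# for _ in range(200):
#     route.append(route[-1].next(2.0)[0])
```

If the vehicle strays too far from every remaining waypoint, that is the trigger for the re-planning mechanism described in Days 116-120.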
Block 9 (Days 121–135): Testing, Evaluation & Safety Metrics
- Day 121-125: Establish rigorous evaluation metrics: average distance traveled between interventions (ADTI), number of infractions (collisions, red-light violations) per hour, route completion rate, and comfort metrics (steering jitter, acceleration spikes). Write scripts to automatically run the agent through a battery of scenarios and log these metrics (a summary sketch follows this block).
- Day 126-130: Compare the RL agent to baseline controllers: CARLA’s built-in autopilot or a simple PID controller for lane keeping. If available, also compare to an end-to-end supervised model’s performance. Analyze where RL excels (e.g., handling surprise obstacles) and where it might still fail.
- Day 131-135: Conduct a safety audit. Ensure the agent follows the core rules of safe driving (maintaining a safe following distance, speed-limit adherence, etc.). If any specific unsafe behavior is observed (e.g., rolling stops), adjust the reward function or add a penalty and retrain briefly to correct it. Incorporate an external “safety checker” that monitors for rule violations in real time and logs them (it could use CARLA’s event callbacks). Deliverable: An evaluation report with quantitative metrics and qualitative observations, demonstrating that the agent meets predefined safety performance thresholds (e.g., >98% of simulated miles without collision, 100% stop compliance at red lights).
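The Day 121-125 metrics can be computed from per-episode logs with one small helper; a sketch in which the field names reflect an assumed logging format of your own:

```python
import numpy as np

def evaluation_summary(episodes: list) -> dict:
    """episodes: dicts with 'distance_km', 'duration_h', 'interventions',
    'infractions', 'completed' (bool), and a per-step 'steering' array."""
    total_km = sum(e["distance_km"] for e in episodes)
    total_h = sum(e["duration_h"] for e in episodes)
    interventions = sum(e["interventions"] for e in episodes)
    infractions = sum(e["infractions"] for e in episodes)
    jitter = float(np.mean([np.std(np.diff(e["steering"])) for e in episodes]))
    return {
        "avg_km_between_interventions": total_km / max(interventions, 1),   # ADTI
        "infractions_per_hour": infractions / max(total_h, 1e-6),
        "route_completion_rate": float(np.mean([e["completed"] for e in episodes])),
        "steering_jitter": jitter,   # comfort proxy: std of steering deltas
    }
```

Running this over a fixed battery of scenarios each night gives the regression signal used in Block 10’s CI/CD setup.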
Block 10 (Days 136–150): Deployment Preparation (Code & Model)
- Day 136-140: Refactor code for clarity and reliability. Separate the training code, policy-inference code, and evaluation scripts. Add documentation/comments throughout. Build a Docker image containing CARLA and the trained model so that others can easily run the agent. Use this image to test-run the simulation on a different machine and confirm portability.
- Day 141-145: Implement continuous integration (CI) tests: for example, a test that spins up CARLA in headless mode, runs the agent for 100 steps, and verifies there are no crashes (software or vehicle). Also include unit tests for utility functions (e.g., reward computation, sensor-data normalization); a pytest sketch follows this block. This ensures future changes don’t break core functionality.
- Day 146-150: Set up model versioning and CI/CD for model updates. For instance, use DVC (Data Version Control) or similar to handle large model files, and a simple script to reload the latest model into the simulator. Optionally, connect this to a cloud simulation service to run nightly regression tests (simulate a few episodes each night and track if performance deviates). Deliverable: A production-ready repository: dockerized, with CI passing, and clear instructions to deploy the trained self-driving agent on any machine with CARLA.
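A pytest sketch for Days 141-145, reusing the reward helper sketched in Block 3; the module paths, env factory, and policy loader are assumptions about your own repository layout, and the policy is assumed to expose a Stable-Baselines-style `predict`:

```python
# test_agent.py -- run with `pytest`; the smoke test skips itself when no CARLA server is reachable.
import pytest

from driving.rewards import obstacle_avoidance_reward   # assumed module layout
from driving.env import make_driving_env                # assumed env factory
from driving.inference import load_latest_policy        # assumed model loader

def test_collision_penalty_dominates():
    assert obstacle_avoidance_reward(collided=True, reached_goal=False, front_distance_m=50.0) < -900

def test_alive_bonus_is_small_and_positive():
    r = obstacle_avoidance_reward(collided=False, reached_goal=False, front_distance_m=50.0)
    assert 0.0 < r < 1.0

@pytest.mark.slow
def test_agent_survives_100_steps():
    carla = pytest.importorskip("carla")
    client = carla.Client("localhost", 2000)
    client.set_timeout(5.0)
    try:
        client.get_world()                    # fails fast if no headless server is up
    except RuntimeError:
        pytest.skip("no CARLA server reachable from CI")

    env, policy = make_driving_env(), load_latest_policy()
    obs, info, done, steps = env.reset(), {}, False, 0
    while not done and steps < 100:
        action, _ = policy.predict(obs, deterministic=True)
        obs, _, done, info = env.step(action)
        steps += 1
    assert not info.get("collision", False)   # no vehicle crash reported by the env
```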
Block 11 (Days 151–165): User Interface & Demo
- Day 151-155: Develop a minimal GUI to visualize the agent’s driving in real time. This could be as simple as overlaying the agent’s chosen path on CARLA’s camera feed or a dashboard showing speed, throttle, and traffic-light detections (a pygame sketch follows this block). Leverage CARLA’s spectator camera to present a cinematic view in a demo video.
- Day 156-160: Incorporate a manual override / teleoperation mode in the GUI (for example, the user can take control via keyboard and give control back to the agent). This mimics real-world testing where a safety driver can intervene. It also helps in demonstrations to show how the agent recovers when control is returned.
- Day 161-165: Prepare a polished demonstration scenario: a 10-minute drive through the virtual city with diverse elements (rain in one section, construction detour in another, busy traffic in another). Record this run. Develop presentation slides summarizing the project, including key graphs (learning curve, safety metrics) and screenshots. Deliverable: A final demo video of the agent driving autonomously in the city, accompanied by a dashboard of real-time stats (speed, detected objects, etc.), ready to show to stakeholders.
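A bare-bones version of the Days 151-160 dashboard with manual override, using pygame; the `get_camera_frame`, `get_stats`, and control callbacks are placeholders for your own CARLA/agent hooks:

```python
import numpy as np
import pygame

def run_dashboard(get_camera_frame, get_stats, apply_agent_control, apply_manual_control):
    """Shows the camera feed with a speed/throttle overlay.
    SPACE toggles between agent control and keyboard teleoperation."""
    pygame.init()
    screen = pygame.display.set_mode((640, 480))
    font = pygame.font.SysFont(None, 28)
    clock = pygame.time.Clock()
    manual, running = False, True

    while running:
        for event in pygame.event.get():
            if event.type == pygame.QUIT:
                running = False
            elif event.type == pygame.KEYDOWN and event.key == pygame.K_SPACE:
                manual = not manual                      # safety-driver style takeover/handback

        frame = get_camera_frame()                       # HxWx3 uint8 array from the CARLA camera
        surface = pygame.surfarray.make_surface(np.swapaxes(frame, 0, 1))
        screen.blit(pygame.transform.scale(surface, (640, 480)), (0, 0))

        if manual:
            keys = pygame.key.get_pressed()
            apply_manual_control(steer=(keys[pygame.K_d] - keys[pygame.K_a]) * 0.5,
                                 throttle=0.5 if keys[pygame.K_w] else 0.0)
        else:
            apply_agent_control()                        # let the trained policy drive

        stats = get_stats()                              # e.g. {"speed_kmh": ..., "throttle": ...}
        mode = "MANUAL" if manual else "AGENT"
        label = f"{mode}  speed {stats['speed_kmh']:.0f} km/h  throttle {stats['throttle']:.2f}"
        screen.blit(font.render(label, True, (255, 255, 0)), (10, 10))
        pygame.display.flip()
        clock.tick(20)                                   # ~20 FPS UI loop

    pygame.quit()
```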
Block 12 (Days 166–180): Final Review & Project Hand-off
- Day 166-170: Conduct a full review of the project against initial goals. Verify that the agent can: (1) follow lanes consistently, (2) avoid obstacles and collisions, (3) handle intersections and traffic lights, and (4) drive in various conditions. Any remaining shortcomings are noted for future work.
- Day 171-175: Write the final report and documentation. This should include the training approach, challenges encountered (and how they were solved), hyperparameters used, and instructions on how to retrain or adapt the agent to new maps. Also include references to academic work that inspired design choices (e.g., how reward shaping was guided by prior surveys). Ensure all citation links (papers, GitHub) are compiled for readers.
- Day 176-180: Handoff and next steps. Package the trained model and code for the “customer” (this could be an internal team or open-source release). Conduct a knowledge transfer session explaining the system architecture (perception module, policy network, planning integration). Suggest future enhancements, e.g. transferring the policy to a real RC car (sim-to-real using Duckietown) or participating in the CARLA Autonomous Driving leaderboard. Deliverable: Final project package – including a comprehensive manual and the safe-driving agent that meets the success criteria in the CARLA virtual city, completing the 6-month training program.