We’re going to build an RSS feed aggregator in Go! It’s a web server that allows clients to:
- Add RSS feeds to be collected
- Follow and unfollow RSS feeds that other users have added
- Fetch all of the latest posts from the RSS feeds they follow
RSS feeds are a way for websites to publish updates to their content. You can use this project to keep up with your favorite blogs, news sites, podcasts, and more!
Caveat: There is a more updated version of this course that’s more interactive over on Boot.dev here. If you’re looking for a more interactive experience, I recommend checking it out!
Prerequisites:
This project assumes that you’ve already taken our “Learn Web Servers” course. If you haven’t, go take it! It will give you a solid foundation for this project.
Learning goals:
- Learn how to integrate a Go server with PostgreSQL
- Learn about the basics of database migrations
- Learn about long-running service workers
Setup:
Before we dive into the project, let’s make sure you have everything you’ll need on your machine.
- An editor. I use VS Code, but you can use whatever you like.
- A command line. I work on Mac OS/Linux, so instructions will be in Bash. I recommend WSL 2 if you’re on Windows so you can still use Linux commands.
- The latest Go toolchain.
- If you’re in VS Code, I recommend the official Go extension.
- An HTTP client. I use Thunder Client, but you can use whatever you like.
If you’re ready, move on to the next step!
Optional video walkthrough:
Try to build this project on your own! Use this video if you get stuck, or to compare your architecture and coding patterns to mine.
Subscribe to my YouTube channel if this video was helpful!
Boilerplate 🔗
Before we get to the app-specific stuff, let’s scaffold a simple CRUD server. Hopefully you’re already familiar with how to do this from the “Learn Web Servers” course! That said, I’ll provide a quick refresher.
It might be a good idea to use your “Learn Web Servers” code as a reference while building this project!
Assignment 🔗
- Create a new project. You should know how to do this by now! My process is:
  - Create a repo on GitHub (initialized with a README).
  - Clone it onto your machine.
  - Create a new Go module with `go mod init`.
  - Create a `main.go` file in the root of your project, and add a `func main()` to it.
- Install the godotenv package using `go get github.com/joho/godotenv`.
- Create a gitignore’d `.env` file in the root of your project and add the following:
PORT="8080"
The `.env` file is a convenient way to store environment (configuration) variables.
- Use `godotenv.Load()` to load the variables from the file into your environment at the top of `main()`.
- Use `os.Getenv()` to get the value of `PORT`.
- Create a router and server
  - Create a ServeMux using `http.NewServeMux`
  - Create a new `http.Server` and add the port and your multiplexer to it.
  - Start the server
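Here’s a minimal sketch of what `main()` might look like at this point. Treat it as a reference, not the required structure; the handler registrations come in the next steps.

```go
package main

import (
	"log"
	"net/http"
	"os"

	"github.com/joho/godotenv"
)

func main() {
	// Load variables from .env into the process environment.
	if err := godotenv.Load(); err != nil {
		log.Fatal("error loading .env file: ", err)
	}
	port := os.Getenv("PORT")
	if port == "" {
		log.Fatal("PORT is not set in the environment")
	}

	// Create the router; handlers get registered on it in later steps.
	mux := http.NewServeMux()

	srv := &http.Server{
		Addr:    ":" + port,
		Handler: mux,
	}
	log.Printf("listening on port %s", port)
	log.Fatal(srv.ListenAndServe())
}
```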
- Create some JSON helper functions:
  - `respondWithJSON(w http.ResponseWriter, code int, payload interface{})`
  - `respondWithError(w http.ResponseWriter, code int, msg string)` (which calls `respondWithJSON` with error-specific values)
You used these in the “Learn Web Servers” course, so you should be able to figure out how to implement them again. They’re simply helper functions that write an HTTP response with:
  - A status code
  - An `application/json` content type
  - A JSON body
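If you need a refresher, here’s one reasonable way to implement the helpers (a sketch, not the only way to write them):

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

func respondWithJSON(w http.ResponseWriter, code int, payload interface{}) {
	data, err := json.Marshal(payload)
	if err != nil {
		log.Printf("error marshaling JSON: %v", err)
		w.WriteHeader(http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(code)
	w.Write(data)
}

func respondWithError(w http.ResponseWriter, code int, msg string) {
	// Wrap the message in a consistent {"error": "..."} shape.
	respondWithJSON(w, code, map[string]string{"error": msg})
}
```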
- Add a readiness handler. It should handle `GET /v1/healthz` requests. It should return a 200 status code and a JSON body:
{
"status": "ok"
}
The purpose of this endpoint is for you to test your `respondWithJSON` function.
- Add an error handler. Add a handler for `GET /v1/err` requests. It should return a 500 status code and a JSON body:
{
"error": "Internal Server Error"
}
The purpose of this endpoint is for you to test your `respondWithError` function.
- Run and test your server.
go build -o out && ./out
Once it’s running, use an HTTP client to test your endpoints.
PostgreSQL 🔗
PostgreSQL is a production-ready, open-source database. It’s a great choice for many web applications, and as a back-end engineer, it might be the single most important database to be familiar with.
How does PostgreSQL work? 🔗
Postgres, like most other database technologies, is itself a server. It listens for requests on a port (Postgres’ default is `:5432`), and responds to those requests. To interact with Postgres, first you will install the server and start it. Then, you can connect to it using a client like psql or PGAdmin.
- Install Postgres.
Mac OS with brew
brew install postgresql@15
Linux / WSL (Debian). Here are the docs from Microsoft, but simply:
sudo apt update
sudo apt install postgresql postgresql-contrib
- Ensure the installation worked. The `psql` command-line utility is the default client for Postgres. Use it to make sure you’re on version 14+ of Postgres:
psql --version
- (Linux only) Update postgres password:
sudo passwd postgres
Enter a password, and be sure you won’t forget it. You can just use something easy like `postgres`.
- Start the Postgres server in the background
- Mac:
brew services start postgresql
- Linux:
sudo service postgresql start
- Connect to the server. I recommend simply using the `psql` client. It’s the “default” client for Postgres, and it’s a great way to interact with the database. While it’s not as user-friendly as a GUI like PGAdmin, it’s a great tool to be able to do at least basic operations with.
Enter the `psql` shell:
- Mac:
psql postgres
- Linux:
sudo -u postgres psql
You should see a new prompt that looks like this:
postgres=#
- Create a new database. I called mine `blogator`:
CREATE DATABASE blogator;
- Connect to the new database:
\c blogator
You should see a new prompt that looks like this:
blogator=#
- Set the user password (Linux only)
ALTER USER postgres PASSWORD 'postgres';
For simplicity, I used `postgres` as the password. Before, we altered the system user’s password; now we’re altering the database user’s password.
- Query the database
From here you can run SQL queries against the `blogator` database. For example, to see the version of Postgres you’re running, you can run:
SELECT version();
If everything is working, you can move on. You can type `exit` to leave the `psql` shell.
Create Users 🔗
In this step, we’ll be adding an endpoint to create new users on the server. We’ll be using a couple of tools to help us out:
- database/sql: This is part of Go’s standard library. It provides a way to connect to a SQL database, execute queries, and scan the results into Go types.
- sqlc: SQLC is an amazing Go program that generates Go code from SQL queries. It’s not exactly an ORM, but rather a tool that makes working with raw SQL almost as easy as using an ORM.
- Goose: Goose is a database migration tool written in Go. It runs migrations from the same SQL files that SQLC uses, making the pair of tools a perfect fit.
- Install SQLC
SQLC is just a command line tool; it’s not a package that we need to import. I recommend installing it using `go install`. Installing Go CLI tools with `go install` is easy and ensures compatibility with your Go environment.
go install github.com/sqlc-dev/sqlc/cmd/sqlc@latest
Then run `sqlc version` to make sure it’s installed correctly.
- Install Goose
Like SQLC, Goose is just a command line tool. I also recommend installing it using `go install`:
go install github.com/pressly/goose/v3/cmd/goose@latest
Run `goose -version` to make sure it’s installed correctly.
- Create the `users` migration
I recommend creating an `sql` directory in the root of your project, and in there creating a `schema` directory.
A “migration” is a SQL file that describes a change to your database schema. For now, we need our first migration to create a `users` table. The simplest format for these files is:
number_name.sql
For example, I created a file in `sql/schema` called `001_users.sql` with the following contents:
-- +goose Up
CREATE TABLE ...
-- +goose Down
DROP TABLE users;
Write out the `CREATE TABLE` statement in full; I left it blank for you to fill in. A user should have 4 fields:
- id: a `UUID` that will serve as the primary key
- created_at: a `TIMESTAMP` that can not be null
- updated_at: a `TIMESTAMP` that can not be null
- name: a string that can not be null
The `-- +goose Up` and `-- +goose Down` comments are required. They tell Goose how to run the migration. An “up” migration moves your database from its old state to a new state. A “down” migration moves your database from its new state back to its old state.
By running all of the “up” migrations on a blank database, you should end up with a database in a ready-to-use state. “Down” migrations are only used when you need to roll back a migration, or if you need to reset a local testing database to a known state.
- Get your connection string
A connection string is just a URL with all of the information needed to connect to a database. The format is:
protocol://username:password@host:port/database
Here are examples:
- Mac OS (no password, your username):
postgres://wagslane:@localhost:5432/blogator
- Linux (password from last lesson, postgres user):
postgres://postgres:postgres@localhost:5432/blogator
Test your connection string by running `psql`, for example:
psql "postgres://wagslane:@localhost:5432/blogator"
It should connect you to the `blogator` database directly. If it’s working, great. `exit` out of `psql` and save the connection string.
- Run the migration.
`cd` into the `sql/schema` directory and run:
goose postgres CONN up
Where `CONN` is the connection string for your database.
Run your migration! Make sure it works by using `psql` to find your newly created `users` table:
psql blogator
\dt
- Add your connection string to your `.env` file. When using it with `goose`, you’ll use it in the format we just used. However, here in the `.env` file it needs an additional `sslmode=disable` query string:
protocol://username:password@host:port/database?sslmode=disable
Your application code needs to know to not try to use SSL locally.
- Configure SQLC. You’ll always run the `sqlc` command from the root of your project. Create a file called `sqlc.yaml` in the root of your project. Here is mine:
version: "2"
sql:
  - schema: "sql/schema"
    queries: "sql/queries"
    engine: "postgresql"
    gen:
      go:
        out: "internal/database"
We’re telling SQLC to look in the `sql/schema` directory for our schema structure (which is the same set of files that Goose uses, but `sqlc` automatically ignores “down” migrations), and in the `sql/queries` directory for queries. We’re also telling it to generate Go code in the `internal/database` directory.
- Write a query to create a user. Inside the `sql/queries` directory, create a file called `users.sql`. Here is mine:
-- name: CreateUser :one
INSERT INTO users (id, created_at, updated_at, name)
VALUES ($1, $2, $3, $4)
RETURNING *;
`$1`, `$2`, `$3`, and `$4` are parameters that we’ll be able to pass into the query in our Go code. The `:one` at the end of the query name tells SQLC that we expect to get back a single row (the created user).
Keep the SQLC docs handy; you’ll probably need to refer to them again later.
- Generate the Go code. Run `sqlc generate` from the root of your project. It should create a new package of Go code in `internal/database`.
- Import a PostgreSQL driver
We need to add and import a Postgres driver to use it in our code. Install it in your module:
go get github.com/lib/pq
Add this import to the top of your `main.go` file:
import _ "github.com/lib/pq"
This is one of my least favorite things about working with SQL in Go currently. You have to import the driver, but you don’t use it directly anywhere in your code. The underscore tells Go that you’re importing it for its side effects, not because you need to use it.
- Open a connection to the database, and store it in a config struct
If you recall from the web servers project, it’s common to use a “config” struct to store shared data that HTTP handlers need access to. We’ll do the same thing here. Mine looks like this:
type apiConfig struct {
	DB *database.Queries
}
At the top of `main()`, load in your database URL from your `.env` file, and then use `sql.Open()` to open a connection to your database:
db, err := sql.Open("postgres", dbURL)
Use your generated `database` package to create a new `*database.Queries`, and store it in your config struct:
dbQueries := database.New(db)
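Putting those pieces together, the wiring might look something like this (a sketch: `DATABASE_URL` is whatever variable name you chose in your `.env`, and the import path depends on your module name):

```go
// Inside main(), after godotenv.Load(). Assumes these imports:
//   "database/sql"
//   "your-module-path/internal/database" (match the module name in go.mod)
//   _ "github.com/lib/pq"
dbURL := os.Getenv("DATABASE_URL")
if dbURL == "" {
	log.Fatal("DATABASE_URL is not set in the environment")
}

db, err := sql.Open("postgres", dbURL)
if err != nil {
	log.Fatal("error opening database connection: ", err)
}

apiCfg := apiConfig{
	DB: database.New(db),
}
```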
- Create an HTTP handler to create a user
Endpoint: POST /v1/users
Example body:
{
"name": "Lane"
}
Example response:
{
"id": "3f8805e3-634c-49dd-a347-ab36479f3f83",
"created_at": "2021-09-01T00:00:00Z",
"updated_at": "2021-09-01T00:00:00Z",
"name": "Lane"
}
Use Google’s UUID package to generate a new UUID for the user’s ID. Both `created_at` and `updated_at` should be set to the current time. If we ever need to update a user, we’ll update the `updated_at` field.
I’m a fan of a convention where every table in my database has:
- An `id` field that is a UUID (if you’re curious why, read this)
- A `created_at` field that indicates when the row was created
- An `updated_at` field that indicates when the row was last updated
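Here’s a hedged sketch of what the handler could look like. The handler name is my own, and the `database.CreateUserParams` fields depend on your schema and sqlc’s generated code, so adjust to match yours. You may also want to map the generated `database.User` to your own response struct so the JSON keys match the example above.

```go
func (cfg *apiConfig) handlerUsersCreate(w http.ResponseWriter, r *http.Request) {
	type parameters struct {
		Name string `json:"name"`
	}
	params := parameters{}
	if err := json.NewDecoder(r.Body).Decode(&params); err != nil {
		respondWithError(w, http.StatusBadRequest, "couldn't decode request body")
		return
	}

	// The ID, created_at, and updated_at values are set by the server, not the client.
	user, err := cfg.DB.CreateUser(r.Context(), database.CreateUserParams{
		ID:        uuid.New(),
		CreatedAt: time.Now().UTC(),
		UpdatedAt: time.Now().UTC(),
		Name:      params.Name,
	})
	if err != nil {
		respondWithError(w, http.StatusInternalServerError, "couldn't create user")
		return
	}

	respondWithJSON(w, http.StatusCreated, user)
}
```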
- Test your handler with an HTTP client!
C’mon, you know what to do.
API Key 🔗
- Add an “api key” column to the users table
Use a new migration file in the `sql/schema` directory to add a new column to the `users` table. I named my file `002_users_apikey.sql`.
The “up” migration adds the column, and the “down” migration removes it.
Use a `VARCHAR(64)` that must be unique and not null. Using a string of a specific length does two things:
- It ensures we don’t accidentally store a key that’s too long (type safety)
- It’s more performant than using a variable length `TEXT` column
Because we’re enforcing the NOT NULL
constraint, and we already have some users in the database, we need to set a default value for the column. A blank default would be a bit silly: that’s no better than null! Instead, we’ll generate valid API keys (256-bit hex values) using SQL. Here’s the function I used:
encode(sha256(random()::text::bytea), 'hex')
When you’re done, use `goose postgres CONN up` to perform the migration.
- Create an API key for new users. Update your “create user” SQL query to use the same SQL function to generate API keys for new users.
- Add a new SQL query to get a user by their API key. This query can live in the same file as the “create user” query, or you can make a new one - it’s up to you.
- Generate new Go code. Run `sqlc generate` to generate new Go code for your queries.
- New endpoint: Add a new endpoint that allows users to get their own user information. You’ll need to parse the header and use your new query to get the user data.
Endpoint: GET /v1/users
Request headers: Authorization: ApiKey <key>
Example response body:
{
"id": "3f8805e3-634c-49dd-a347-ab36479f3f83",
"created_at": "2021-09-01T00:00:00Z",
"updated_at": "2021-09-01T00:00:00Z",
"name": "Lane",
"api_key": "cca9688383ceaa25bd605575ac9700da94422aa397ef87e765c8df4438bc9942"
}
Test your endpoints with an HTTP client before moving on!
Don’t forget that each time you update your queries or schema you’ll need to regenerate your Go code with `sqlc generate`. If you update the schema you’ll also need to migrate your database up (and maybe down).
Create a Feed 🔗
An RSS feed is just a URL that points to some XML. Users will be able to add feeds to our database so that our server (in a future step) can go download all of the posts in the feed (like blog posts or podcast episodes).
- Create a feeds table
Like any table in our DB, we’ll need the standard `id`, `created_at`, and `updated_at` fields. We’ll also need a few more:
- `name`: The name of the feed (like “The Changelog”, or “The Boot.dev Blog”)
- `url`: The URL of the feed
- `user_id`: The ID of the user who added this feed
I’d recommend making the `url` field unique so that in the future we aren’t downloading duplicate posts. I’d also recommend using ON DELETE CASCADE on the `user_id` foreign key so that if a user is deleted, all of their feeds are automatically deleted as well.
Write the appropriate migrations and run them.
- Add a new query to create a feed, then use `sqlc generate` to generate the Go code.
- Create some authentication middleware. Most of the endpoints going forward will require a user to be logged in. Let’s DRY up our code by creating some middleware that will check for a valid API key.
Now, I’m not a fan of how some frameworks handle stateful middleware using context (middleware that passes data down to the next handler). I prefer to create custom handlers that accept extra values. You can add middleware however you like, but here are some examples from my code.
A custom type for handlers that require authentication 🔗
type authedHandler func(http.ResponseWriter, *http.Request, database.User)
Middleware that authenticates a request, gets the user and calls the next authed handler 🔗
func (cfg *apiConfig) middlewareAuth(handler authedHandler) http.HandlerFunc {
	// ...
}
Using the middleware 🔗
v1Router.Get("/users", apiCfg.middlewareAuth(apiCfg.handlerUsersGet))
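If you want a reference, here’s one way the body of `middlewareAuth` could be filled in. It assumes the “get user by API key” query is named `GetUserByAPIKey` (name yours however you like), that the `Authorization` header looks like `ApiKey <key>`, and that the `strings` package is imported.

```go
func (cfg *apiConfig) middlewareAuth(handler authedHandler) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		// Expect a header of the form: Authorization: ApiKey <key>
		authHeader := r.Header.Get("Authorization")
		parts := strings.Split(authHeader, " ")
		if len(parts) != 2 || parts[0] != "ApiKey" {
			respondWithError(w, http.StatusUnauthorized, "malformed or missing authorization header")
			return
		}

		user, err := cfg.DB.GetUserByAPIKey(r.Context(), parts[1])
		if err != nil {
			respondWithError(w, http.StatusUnauthorized, "couldn't find a user for that API key")
			return
		}

		// Hand the authenticated user off to the wrapped handler.
		handler(w, r, user)
	}
}
```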
- Create a handler to create a feed
Create a handler that creates a feed. This handler and the “get user” handler should use the authentication middleware.
Endpoint: POST /v1/feeds
Example request body:
{
"name": "The Boot.dev Blog",
"url": "https://blog.boot.dev/index.xml"
}
Example response body:
{
"id": "4a82b372-b0e2-45e3-956a-b9b83358f86b",
"created_at": "2021-05-01T00:00:00Z",
"updated_at": "2021-05-01T00:00:00Z",
"name": "The Boot.dev Blog",
"url": "https://blog.boot.dev/index.xml",
"user_id": "d6962597-f316-4306-a929-fe8c8651671e"
}
- Test your handler using an HTTP client, then use your database client to make sure the data was saved correctly.
Get all feeds 🔗
Create a new endpoint to retrieve all of the feeds in the database. This endpoint should not require authentication.
You should be familiar with all of the steps to make this happen by now, use your other endpoints as a reference.
Feed Follows 🔗
Aside from just adding new feeds to the database, users can specify which feeds they want to follow. This will be important later when we want to show users a list of posts from the feeds they follow.
Add support for the following endpoints, and update the “create feed” endpoint as specified below.
What is a “feed follow”? 🔗
A feed follow is just a link between a user and a feed. It’s a many-to-many relationship, so a user can follow many feeds, and a feed can be followed by many users.
Creating a feed follow indicates that a user is now following a feed. Deleting it is the same as “unfollowing” a feed.
It’s important to understand that the ID of a feed follow is not the same as the ID of the feed itself. Each user/feed pair will have a unique feed follow ID.
- Create a feed follow
Endpoint: POST /v1/feed_follows
Requires authentication
Example request body:
{
"feed_id": "4a82b372-b0e2-45e3-956a-b9b83358f86b"
}
Example response body:
{
"id": "c834c69e-ee26-4c63-a677-a977432f9cfa",
"feed_id": "4a82b372-b0e2-45e3-956a-b9b83358f86b",
"user_id": "0e4fecc6-1354-47b8-8336-2077b307b20e",
"created_at": "2017-01-01T00:00:00Z",
"updated_at": "2017-01-01T00:00:00Z"
}
- Delete a feed follow
Endpoint: DELETE /v1/feed_follows/{feedFollowID}
- Get all feed follows for a user
Endpoint: GET /v1/feed_follows
Requires authentication
Example response:
[
{
"id": "c834c69e-ee26-4c63-a677-a977432f9cfa",
"feed_id": "4a82b372-b0e2-45e3-956a-b9b83358f86b",
"user_id": "0e4fecc6-1354-47b8-8336-2077b307b20e",
"created_at": "2017-01-01T00:00:00Z",
"updated_at": "2017-01-01T00:00:00Z"
},
{
"id": "ad752167-f509-4ff3-8425-7781090b5c8f",
"feed_id": "f71b842d-9fd1-4bc0-9913-dd96ba33bb15",
"user_id": "0e4fecc6-1354-47b8-8336-2077b307b20e",
"created_at": "2017-01-01T00:00:00Z",
"updated_at": "2017-01-01T00:00:00Z"
}
]
- Automatically create a feed follow when creating a feed
When a user creates a new feed, they should automatically be following that feed. They can of course choose to unfollow it later, but it should be there by default.
The response of this endpoint should now contain both entities:
{
"feed": { the feed object },
"feed_follow": { the feed follow object }
}
- Test. As always, test all of your endpoints and make sure they work. Additionally, make sure that they return the proper error codes when they receive invalid inputs.
Scraper 🔗
This is going to be a fairly large step. I recommend breaking it down into smaller pieces and functions, and testing each piece as you go.
Here are some different strategies I use depending on the situation:
- Write a unit test for a function that has simple inputs and outputs
- Edit `main.go` to call a function so I can quickly test it by running the whole program. Remove the call after testing and plug it into its proper place
- Put the code in a package, then write a separate `main` package (just a little `main()` script) that I can use to independently test the code in the package
Commit your code each time you get a new piece working.
- Add a `last_fetched_at` column to the `feeds` table.
We need to keep track of when we last fetched the posts from a feed. This should be a nullable timestamp.
The `sql.NullTime` type is useful for nullable timestamps on the database side, but it’s not great for marshaling into JSON. It results in a weird nested object. I’d recommend converting it to a `*time.Time` before returning it across the HTTP response.
I map all of my database structs to a different struct that has the intended JSON structure. This is a good way to keep your database and HTTP APIs separate.
For example: `func databaseFeedToFeed(feed database.Feed) Feed`
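A sketch of what that mapping might look like; the field names depend on your schema and sqlc’s generated code, and it assumes the `uuid` and `time` packages are imported:

```go
type Feed struct {
	ID            uuid.UUID  `json:"id"`
	CreatedAt     time.Time  `json:"created_at"`
	UpdatedAt     time.Time  `json:"updated_at"`
	Name          string     `json:"name"`
	URL           string     `json:"url"`
	UserID        uuid.UUID  `json:"user_id"`
	LastFetchedAt *time.Time `json:"last_fetched_at"`
}

func databaseFeedToFeed(feed database.Feed) Feed {
	// Convert sql.NullTime to *time.Time so a missing value serializes
	// as JSON null instead of a nested {"Time": ..., "Valid": ...} object.
	var lastFetchedAt *time.Time
	if feed.LastFetchedAt.Valid {
		lastFetchedAt = &feed.LastFetchedAt.Time
	}
	return Feed{
		ID:            feed.ID,
		CreatedAt:     feed.CreatedAt,
		UpdatedAt:     feed.UpdatedAt,
		Name:          feed.Name,
		URL:           feed.URL,
		UserID:        feed.UserID,
		LastFetchedAt: lastFetchedAt,
	}
}
```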
- Add a `GetNextFeedsToFetch()` query to the database.
It should return the next `n` feeds that need to be fetched, ordered by `last_fetched_at`, but with `NULL` values first. We obviously want to fetch the feeds that have never been fetched before, or the ones that were fetched the longest time ago.
- Add a `MarkFeedFetched()` query to the database.
It should update a feed and set its `last_fetched_at` to the current time. Don’t forget to also update the `updated_at` field because we’ve updated the record.
- Write a function that can fetch data from a feed.
This function should accept the URL of a live RSS feed, and return the parsed data in a Go struct.
You can test with these ones:
https://blog.boot.dev/index.xml
https://wagslane.dev/index.xml
And any other blogs you enjoy that have RSS feeds.
Please be careful not to DDoS any of the sites you’re fetching from. Don’t send too many requests!
You can parse the returned XML with the encoding/xml package; it works very similarly to `encoding/json`. Define the structure of an RSS feed as a Go struct, then unmarshal the XML into that struct.
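Here’s a rough sketch of one way to do it. The struct only captures the fields this project needs, and the function assumes the `context`, `encoding/xml`, `io`, and `net/http` imports:

```go
type RSSFeed struct {
	Channel struct {
		Title string `xml:"title"`
		Items []struct {
			Title       string `xml:"title"`
			Link        string `xml:"link"`
			Description string `xml:"description"`
			PubDate     string `xml:"pubDate"`
		} `xml:"item"`
	} `xml:"channel"`
}

func fetchFeed(ctx context.Context, feedURL string) (*RSSFeed, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, feedURL, nil)
	if err != nil {
		return nil, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}

	// Unmarshal the XML into the struct defined above.
	feed := RSSFeed{}
	if err := xml.Unmarshal(data, &feed); err != nil {
		return nil, err
	}
	return &feed, nil
}
```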
- Write a worker that fetches feeds continuously.
This function should, on an interval (say every 60 seconds or so):
- Get the next `n` feeds to fetch from the database (you can configure `n`, I used `10`)
- Fetch and process all the feeds at the same time (you can use sync.WaitGroup for this)
For now, “process” the feed by simply printing out the titles of each post.
I recommend adding a lot of logging messages to this worker so that as it runs you can see what it’s doing!
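Here’s a sketch of the worker, assuming the `fetchFeed` function and the queries described above. The generated signatures depend on how you wrote your SQL, so treat the names and types as placeholders:

```go
func startScraping(db *database.Queries, concurrency int, timeBetweenRequests time.Duration) {
	log.Printf("scraping on %v goroutines every %s", concurrency, timeBetweenRequests)
	ticker := time.NewTicker(timeBetweenRequests)
	// This loop body runs immediately, then again on every tick.
	for ; ; <-ticker.C {
		feeds, err := db.GetNextFeedsToFetch(context.Background(), int32(concurrency))
		if err != nil {
			log.Println("error fetching feeds to scrape:", err)
			continue
		}

		wg := &sync.WaitGroup{}
		for _, feed := range feeds {
			wg.Add(1)
			go func(feed database.Feed) {
				defer wg.Done()
				scrapeFeed(db, feed)
			}(feed)
		}
		wg.Wait()
	}
}

func scrapeFeed(db *database.Queries, feed database.Feed) {
	// Assumes MarkFeedFetched was written as a :one query; use `err :=` if yours is :exec.
	_, err := db.MarkFeedFetched(context.Background(), feed.ID)
	if err != nil {
		log.Println("error marking feed as fetched:", err)
		return
	}

	rssFeed, err := fetchFeed(context.Background(), feed.URL)
	if err != nil {
		log.Println("error fetching feed:", err)
		return
	}
	for _, item := range rssFeed.Channel.Items {
		// For now, just print each post's title.
		log.Println("found post:", item.Title)
	}
	log.Printf("feed %s collected, %v posts found", feed.Name, len(rssFeed.Channel.Items))
}
```

In `main()`, you’d kick it off in the background with something like `go startScraping(dbQueries, 10, time.Minute)` before starting the HTTP server.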
- Call your worker from `main.go`. Be sure to start the worker in its own goroutine, so that it runs in the background and processes feeds even while the server handles new HTTP requests.
Posts 🔗
- Add a `posts` table to the database.
A post is a single entry from a feed. It should have:
- `id` - a unique identifier for the post
- `created_at` - the time the record was created
- `updated_at` - the time the record was last updated
- `title` - the title of the post
- `url` - the URL of the post (this should be unique)
- `description` - the description of the post
- `published_at` - the time the post was published
- `feed_id` - the ID of the feed that the post came from
Some of these fields can probably be null, others you might want to be more strict about - it’s up to you.
- Add a “create post” SQL query to the database. This should insert a new post into the database.
- Add a “get posts by user” SQL query to the database. Order the results so that the most recent posts are first. Make the number of posts returned configurable.
- Update your scraper to save posts. Instead of just printing out the titles of the posts, save them to the database! If you encounter an error where the post with that URL already exists, just ignore it. That will happen a lot. If it’s a different error, you should probably log it. Make sure that you’re parsing the “published at” time properly from the feeds. Sometimes they might be in a different format than you expect, so you might need to handle that.
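For the published-at parsing, something like this can help; the list of layouts is an assumption, so add whichever formats you actually encounter:

```go
func parsePublishedAt(raw string) (time.Time, error) {
	// Try a few common RSS date layouts in order.
	layouts := []string{
		time.RFC1123Z, // "Mon, 02 Jan 2006 15:04:05 -0700" (very common in RSS)
		time.RFC1123,
		time.RFC822Z,
		time.RFC3339,
	}
	for _, layout := range layouts {
		if t, err := time.Parse(layout, raw); err == nil {
			return t, nil
		}
	}
	return time.Time{}, fmt.Errorf("could not parse time %q", raw)
}
```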
- Add a “get posts by user” HTTP endpoint.
Endpoint: GET /v1/posts
This is an authenticated endpoint
This endpoint should return a list of posts for the authenticated user. It should accept a `limit` query parameter that limits the number of posts returned. The default if the parameter is not provided can be whatever you think is reasonable.
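Inside the handler, parsing the optional `limit` parameter might look like this. It’s a fragment, assuming the authenticated `user` from the middleware, the `strconv` package, and a generated `GetPostsByUser` query whose parameter names depend on your SQL:

```go
// Default to 10 posts if no (valid) limit is provided.
limit := 10
if limitStr := r.URL.Query().Get("limit"); limitStr != "" {
	if parsed, err := strconv.Atoi(limitStr); err == nil {
		limit = parsed
	}
}

posts, err := cfg.DB.GetPostsByUser(r.Context(), database.GetPostsByUserParams{
	UserID: user.ID,
	Limit:  int32(limit),
})
if err != nil {
	respondWithError(w, http.StatusInternalServerError, "couldn't get posts")
	return
}
respondWithJSON(w, http.StatusOK, posts)
```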
- Start scraping some feeds! Test your scraper to make sure it’s working! Go find some of your favorite websites and add their RSS feeds to your database. Then start your scraper and watch it go to work.
Submit your Git repo 🔗
Your link should look something like https://github.com/github-username/repo-name.
Ideas for extending the project 🔗
You don’t have to extend this project, but here are just a few ideas if you’re interested:
- Support pagination of the endpoints that can return many items
- Support different options for sorting and filtering posts using query parameters
- Classify different types of feeds and posts (e.g. blog, podcast, video, etc.)
- Add a CLI client that uses the API to fetch and display posts, maybe it even allows you to read them in your terminal
- Scrape lists of feeds themselves from a third-party site that aggregates feed URLs
- Add support for other types of feeds (e.g. Atom, JSON, etc.)
- Add integration tests that use the API to create, read, update, and delete feeds and posts
- Add bookmarking or “liking” to posts
- Create a simple web UI that uses your backend API
Solution 🔗
If you get lost at any point, I’ve uploaded my solution repo to GitHub here.