Web Scraping With Golang



Compared with Python, Go usually needs more code to perform the same number of actions, while Python is more focused on writing web applications. The two projects are listed with 28.5K and 67.5K GitHub stars. Reported pay for a Go developer starts at approximately $64,089 per year, while the average salary for a Python developer is $120,359 per year in the United States.

Today, we’re looking at how you can build your first web application in Go, so open up your IDEs and let’s get started.

GoLang Web App Basic Setup

We’ll have to import net/http and set up our main function with placeholders.

http.HandleFunc registers a function to handle a path in a URL, for example the /2020/08/23 part of http://www.golangdocs.com/2020/08/23.

  • Here, the index page is linked to the homepage of our site, the “/” path.
  • ListenAndServe listens on the port given in the quotes, which is 8000. Once we run our web app, you can find it at localhost:8000.

Next, we need to write the handler for the index page.

If you have ever worked with Django, this will feel familiar: our function for the index page takes a request to a URL as input and responds with something. Fill in the body of the indexPage function with anything of your choice, written to w (w is the response writer, so using it means we want to write something back).
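Put together, the setup and the handler described above might look like the following. This is a minimal sketch: the greeting text is a placeholder, and bodyOf is a hypothetical helper, not part of the walkthrough, added only so the handler can be tried without a browser.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
)

// indexPage answers requests for the index page.
// w is where the response is written; r describes the incoming request.
func indexPage(w http.ResponseWriter, r *http.Request) {
	fmt.Fprint(w, "Hello, Gophers!") // placeholder text, replace freely
}

// bodyOf is a hypothetical helper: it sends a fake GET request for path
// to the handler and returns whatever the handler wrote.
func bodyOf(h http.HandlerFunc, path string) string {
	rec := httptest.NewRecorder()
	h(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Body.String()
}

func main() {
	// Link the homepage ("/") to indexPage, then serve on port 8000.
	http.HandleFunc("/", indexPage)
	log.Fatal(http.ListenAndServe(":8000", nil))
}
```

Calling bodyOf(indexPage, "/") returns the greeting without starting the server.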

Save this file as “webApp.go” and run it from the terminal with go run webApp.go.

The page then comes up at localhost:8000.

ResponseWriter Output HTML too

With Go’s http.ResponseWriter, it is possible to format the text directly using HTML tags. The browser renders the tags as markup, which gives us the desired output.

This is still just one page, but say you wanted your site to respond sensibly, instead of returning an error, when you type localhost:8000/something_else.

Let’s code for that!
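One way to do that with plain net/http is sketched below, assuming a single handler registered on "/" that decides what to show from the request path. The pageFor function and its wording are made up for illustration.

```go
package main

import (
	"fmt"
	"html"
	"log"
	"net/http"
)

// pageFor returns the HTML to show for a request path.
func pageFor(path string) string {
	if path == "/" {
		return "<h1>Welcome</h1><p>This is the index page.</p>"
	}
	// Escape the path before echoing it back so it cannot inject markup.
	return "<h1>" + html.EscapeString(path) + "</h1><p>Nothing special here yet.</p>"
}

// pageHandler is registered on "/", a pattern that net/http also uses for
// every path that has no more specific handler.
func pageHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/html; charset=utf-8")
	fmt.Fprint(w, pageFor(r.URL.Path))
}

func main() {
	http.HandleFunc("/", pageHandler)
	log.Fatal(http.ListenAndServe(":8000", nil))
}
```

Keeping the path-to-page decision in its own function makes it easy to check without running the server.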

Voila!

Gorilla Mux for ease of web app development

Let me introduce you to a package named Gorilla Mux, which will make your web development much easier. First, let’s install it by running go get github.com/gorilla/mux in the terminal.

We’ll make a few changes to the code above and route requests through a gorilla mux router instance instead of registering our index handler directly.

GoLang web application HTML templates

The hard-coded design is quite plain and unimpressive. There is a way to rectify that: HTML templates. So let’s create another folder called “templates”, which will contain all the page designs.

We’ll also add a new file there called “index.html” and put some simple markup in it.

Let’s switch back to our main .go file and import the “html/template” package. Since our templates must be accessible from all handlers, let’s keep them in a global variable.

Now we need to tell Go to parse index.html for the template design and store the result in our templates variable.

Then modify the indexPage handler to execute the index.html template and write the result to w.
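Taken together, the template steps could look roughly like this. To keep the sketch runnable without extra files or packages, it uses plain net/http routing and inlines the template text as indexHTML, a placeholder for templates/index.html. With the folder layout described above, the templates variable would more likely be filled with template.ParseGlob("templates/*.html"). renderIndex is a hypothetical helper.

```go
package main

import (
	"bytes"
	"html/template"
	"log"
	"net/http"
)

// indexHTML is a placeholder for the contents of templates/index.html.
const indexHTML = `<h1>Hello from a template</h1>`

// templates is global so that every handler can reach it. With the files on
// disk, template.ParseGlob("templates/*.html") would be used here instead.
var templates = template.Must(template.New("index.html").Parse(indexHTML))

// renderIndex executes the index.html template into a buffer, so an error
// never leaves a half-written page behind.
func renderIndex() (string, error) {
	var buf bytes.Buffer
	err := templates.ExecuteTemplate(&buf, "index.html", nil)
	return buf.String(), err
}

func indexPage(w http.ResponseWriter, r *http.Request) {
	page, err := renderIndex()
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	w.Write([]byte(page))
}

func main() {
	http.HandleFunc("/", indexPage)
	log.Fatal(http.ListenAndServe(":8000", nil))
}
```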

And now if we run it, we’ll have exactly what we wanted.

Using Redis with Go web app

As a brief introduction to Redis, which we’ll be using as our database, they describe themselves best:

Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache and message broker. It supports data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs, geospatial indexes with radius queries and streams.

https://redis.io/

So first download and install Redis: https://redis.io/download

Import the go-redis package and declare a global client object.

Instantiate the Redis client in the main function.

Next, in the index handler, we grab some data from the Redis server, here the list of comments.

Then we pass that data to the template so that it is rendered into the index.html file.

That completes the HTML side: the template takes the elements of the comments list held in Redis and places them in our web app.

So now we can open our command line and type in redis-cli to enter the Redis shell, where we can push comments into the empty list, for example with LPUSH comments followed by the comment text in quotes.

Then if we run our app, you can see that it is now fetching the comments from the server. It would be able to do the same against a remote Redis server, say one hosted on AWS.

Ending Notes

Making a web application can take anywhere from a few days to a few months depending on the complexity of the application. For every button or functionality, there is help in the official Golang documentation, so definitely check that out.

In this article we’re going to have a look at how to mock HTTP connections while running the tests of your Go application.

We do not want to call the real API or crawl an actual site just to make sure that our application works correctly; we want to verify that it works within parameters we define ourselves.

There's a great module called httpmock that can help us with the task of mocking HTTP responses for tests.

HTTP mocks for web scraping

Let's say we have a component in our application that will do some web scraping, so we might use something like goquery.

In the example below we'll use a simple function that visits a website and extracts the content of the <title> tag.

filename: scrape.go

Now if we are to write a unit test for that, we can do that as follows:

filename: scrape_test.go

In the test we run the function and compare the title we expect with the title that was scraped by the function.

Now the problem with this test is that whenever we run go test, it will actually go to my website and read the title. This means three things:

  1. Our tests will be slower and more error prone than they could be
  2. I can never change my website title without changing the tests for this project
  3. Most important: We introduced a dependency outside our control for our program that doesn't have any relation to it

To fix this we commonly use mocks: a way of faking HTTP responses so that the exchange of information happens entirely on the computer where the tests are run, without having to rely on an external web server or API backend being available.
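The idea can be sketched with nothing but the standard library before bringing in a dedicated mocking module: net/http/httptest can start a throwaway server on the local machine, so the code under test makes a real HTTP request that never leaves the computer. This illustrates the concept rather than the approach used in the rest of the article, and fetch and fetchFromFake are hypothetical names.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetch returns the body found at url; it stands in for any code that
// talks HTTP and that we would like to test.
func fetch(url string) (string, error) {
	res, err := http.Get(url)
	if err != nil {
		return "", err
	}
	defer res.Body.Close()
	body, err := io.ReadAll(res.Body)
	return string(body), err
}

// fetchFromFake starts a local server that always answers with page,
// points fetch at it, and returns what fetch received.
func fetchFromFake(page string) (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) {
			fmt.Fprint(w, page)
		}))
	defer srv.Close()
	return fetch(srv.URL)
}

func main() {
	body, err := fetchFromFake("<title>Fake page</title>")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(body)
}
```

The drawback of this approach is that the code under test must accept the fake server's URL, whereas an interception-based mock can answer for the real address.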

HTTP mocks for API requests

In Go we can use httpmock to intercept any HTTP requests made and pin the responses in our tests. This way we can verify that our program works correctly without actually sending requests over the network.

To install httpmock, we can add a go.mod file that lists github.com/jarcoal/httpmock as a requirement and then run go mod download.

Rewriting our scrape_test.go would look like this:

After that we can run go test, and the test should pass without any request leaving the machine.

Let's go over the most important changes to the file:

  • myMockPage :=... sets up our example response, a piece of plain text that our function will parse as HTML and search for the title
  • httpmock.Activate() activates the mocking; before this call, no requests can be intercepted
  • httpmock.RegisterResponder() defines the METHOD and the URL, so GET or POST and an address at which we fake an http response
  • httpmock.NewStringResponder will need a status code and a string to respond with instead of what actually lives at that URL
  • httpmock.DeactivateAndReset() stops mocking responses for the rest of the test

If you instead want to mock an API response you can use something like this:

That's it! Our client consuming the string should take care of the JSON parsing.

If you're familiar with mocking HTTP connections in Node.js, you may have heard of the nock library, which is pretty popular when building JavaScript projects.

Hope you enjoyed this little post about mocking in Go, let me know what you're building in the comments!
