This tutorial is meant for someone who’s finished a basic Hello World project with Node and Express and is ready to take another step. In short, I’ll show you how to search/scrape/crawl Craigslist using AJAX along with Node and Express. I assume you’re working from a Unix-based environment, already have Node and Express installed, and understand how to navigate the terminal.
You can find the finished code on this repo if you wish to bypass the tutorial altogether.
Let’s get right to it.
Here is an index of all the articles in the series that have been published to date:
- Part 1: Scraping Craigslist << CURRENT
- Part 2: Adding Handlebars
- Part 3: User Authentication with Passport and MongoDB
- Part 4: Refactoring, Adding styles
- Part 5: Saving Jobs
Navigate to where you’d like your project to reside, then run the following command:
CD into the newly created directory and install the node modules:
Aside from the installed node modules/dependencies, your project structure should look like this:
Next, install Supervisor if you don’t already have it:
Finally, run a sanity check by testing out your app:
Point your browser to http://localhost:3000/ and you should see the simple “Welcome to Express” message:
Still with me? Good. Let’s set up our first route.
In app.js, comment out the following two dependencies, since we’ll be defining all our views directly in that file:
Comment out the following routes as well:
You probably noticed the 'express' dependency. Remember that we also used Express to generate our project structure. To clarify, Express is both a framework and a command line tool. Just be aware that you can use the Express framework without generating the project structure, although you would probably want to follow another boilerplate structure to speed up the development process.
Let’s set up our routes. Routes bind a URL to a specific function. In other words, when the end user sends a request to a particular URL, the function bound to that URL handles it. Different requests go to different URLs within our application, hence the need for different routes.
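To make the idea concrete, here is a framework-free sketch of a route table. It is not how Express is implemented; the `routes` object and the `dispatch` helper are hypothetical names used only for illustration, and the handler bodies are placeholders.

```javascript
// A route table: each "METHOD path" key is bound to one handler function.
// The two paths mirror the routes used in this tutorial.
var routes = {
  'GET /': function () { return 'render the search page'; },
  'GET /searching': function () { return 'process the search'; }
};

// Look up the handler bound to a method and path; fall back to a 404 message.
function dispatch(method, path) {
  var key = method + ' ' + path;
  var handler = Object.prototype.hasOwnProperty.call(routes, key) ? routes[key] : null;
  return handler ? handler() : '404 Not Found';
}

console.log(dispatch('GET', '/'));          // render the search page
console.log(dispatch('GET', '/searching')); // process the search
console.log(dispatch('GET', '/missing'));   // 404 Not Found
```

In Express, a call such as `app.get(path, handler)` plays the role of adding an entry to this table.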
Our app will have two routes. (1) The first renders the search page where the end user directly searches Craigslist. (2) The search term is then passed to another route via AJAX, which processes the request on the server side.
Let’s look at the former.
Index Route (client-side)
Add the following code to ‘app.js’:
Essentially, when a user sends an HTTP GET request to the main route (/), the index.jade view is rendered. Let’s test this out. Update the code in the view:
Double check in the terminal that your server is still running (and that there are no errors), and then return to your browser and refresh the page. You should see the updated H1 tag:
Next, we’ll update the view.
Index View (client-side)
Then add the following styles to “style.css”:
Refresh your browser. See the updates?
This should be straightforward.
While we’re still on the client side, let’s go ahead and add the event handler in main.js:
Here we’re capturing the search term after the end user presses (and releases) the ENTER key (keycode 13), and then storing it in the variable val. This data is then sent to the server, where the request is handled. The response is passed back to the client and finally added to the document via another event handler.
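A minimal sketch of that client-side flow might look like the following. The helper name `paramsForKeyup`, the `#search` input id, and the `search` parameter name are assumptions made for illustration; the `/searching` URL and the `results` element come from the tutorial. The jQuery wiring only runs in a browser where jQuery is loaded.

```javascript
// Returns the parameters to send to the server, or null when the
// released key is not ENTER (keycode 13).
function paramsForKeyup(event, inputValue) {
  if (event.which !== 13) {
    return null;
  }
  return { search: inputValue };
}

// Browser-only wiring; skipped when jQuery is not available (for example under Node).
if (typeof window !== 'undefined' && window.jQuery) {
  window.jQuery(function ($) {
    $('#search').on('keyup', function (e) {
      var parameters = paramsForKeyup(e, $(this).val());
      if (parameters) {
        // Send the search term to the server, then add the response to the page.
        $.get('/searching', parameters, function (data) {
          $('#results').html(data);
        });
      }
    });
  });
}
```

Keeping the key check in its own function makes it easy to test without a browser.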
That’s a lot to take in. Let’s stop for a minute and look at the workflow from the user perspective.
1. User navigates to the main page. The page loads.
2. On the page load, the user sees the search box.
3. User can then enter some text to search for.
4. After the user presses ENTER, results are displayed.
Thus far, we’ve finished numbers 1, 2, and 3. Most of the action is handled in step 4, though. We’ve already passed the search term to the server, to the /searching/ route. We then need to scrape Craigslist, send the results back, and display them, of course.
Now might be a good time to take out a piece of paper and a pen and draw a diagram of what has happened thus far.
Ready? Let’s move back to the server-side.
app.js (server-side) redux
Back on the server side, we need a route to handle the /searching request:
The res.send() call is just a placeholder. I only want to test that the route works. Point your browser to http://localhost:3000/searching. If all is well, you should see “WHEEE” in the top-left of the screen. We now know the route is working.
With the route working, we need to accomplish a number of things:
- Set the returned value from the search box to a variable;
- Pass the variable into the YQL search URL (YQL handles the actual scraping);
- Use the request module to process the YQL URL and return the results; and,
- Pass the results back to the client side.
We’ll handle all this in increments, testing as we go.
Set the returned value from the search box to a variable:
So after you add this code, return to your main route in the browser - http://localhost:3000/ - and place your terminal next to the browser. Now enter a search term and press ENTER. You should see the console.log() output in the terminal:
In the above example, I first searched for “Ruby” and then “hello, wonderful”.
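For reference, Express parses the query string for you and exposes the values on `req.query`. The sketch below mimics that step with Node’s built-in `URL` class so it runs without Express; the `search` parameter name and the `getSearchTerm` helper are assumptions for illustration.

```javascript
// Extract the search term from a request URL such as "/searching?search=Ruby".
// The base URL is only needed because the request path is relative.
function getSearchTerm(requestUrl) {
  var parsed = new URL(requestUrl, 'http://localhost:3000');
  return parsed.searchParams.get('search');
}

console.log(getSearchTerm('/searching?search=Ruby'));               // Ruby
console.log(getSearchTerm('/searching?search=hello%2C+wonderful')); // hello, wonderful
```

Inside an Express route, the equivalent would be reading `req.query.search` and logging it.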
Moving on, let’s add another step.
Pass the variable into the YQL search URL.
Again, open your browser along with your terminal and search for something relevant. (This specific YQL URL searches for all jobs in the San Francisco Bay Area.)
I searched for Ruby then Python in the example above. If you want to see the actual JSON results, grab the URL from the terminal and paste it into your browser’s address bar.
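As a rough sketch of this step, the search term can be dropped into a YQL statement and URL-encoded. Treat the endpoint, the `craigslist.search` table name, and the `location` and `type` values below as assumptions rather than the exact URL used in the finished code; `buildYqlUrl` is a hypothetical helper name.

```javascript
// Assumed public YQL endpoint; illustrative only.
var YQL_ENDPOINT = 'http://query.yahooapis.com/v1/public/yql';

// Build the request URL for a given search term.
function buildYqlUrl(val) {
  // Drop double quotes so the term cannot break out of the YQL string literal.
  var term = String(val).replace(/"/g, '');
  var yql = 'select * from craigslist.search where location="sfbay" ' +
            'and type="jjj" and query="' + term + '"';
  // encodeURIComponent escapes spaces, quotes, and "=" inside the statement.
  return YQL_ENDPOINT + '?q=' + encodeURIComponent(yql) + '&format=json';
}

console.log(buildYqlUrl('Ruby'));
```

Encoding the whole statement, rather than concatenating it raw, keeps search terms with spaces or punctuation from producing a malformed URL.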
Use the request module to process the YQL URL and return the results. To do this we need to first install this module by running this command from your terminal:
Now add the following code to the route function:
app.js now looks like this:
In this code, we pass the YQL URL to the request module (request(url…)), then we grab the body argument of the callback, which is a string, and parse it as JSON (body = JSON.parse(body);). Next, we test to see if any results are returned. If so, we assign the result to craig, and if not, we assign the value "No results found. Try again." to craig.
If you run into problems here, stop and test by logging body. You can also add a console.log inside the conditional statement to see if the logic is even working. For example, if you know that results are being returned, yet you add console.log("testing") inside the first conditional and don’t see it in your console when you run the app, you know that your logic is incorrect.
When in doubt always, always, ALWAYS test with console.log().
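The parse-and-check step described above can be sketched as a small function that is easy to test in isolation. This assumes that `item` is an array of postings and that a query with no matches comes back with `results` set to null; `pickResult` is a hypothetical name.

```javascript
// Turn the raw response body (a string) into the value assigned to craig.
function pickResult(body) {
  var parsed = JSON.parse(body);
  var results = parsed.query && parsed.query.results;
  if (results && results.RDF && results.RDF.item) {
    // Results were returned: take the URL of the first item.
    return results.RDF.item[0]['about'];
  }
  // Nothing usable came back.
  return 'No results found. Try again.';
}

// Sample bodies with made-up data, shaped as assumed above.
var withResults = JSON.stringify({
  query: { results: { RDF: { item: [{ about: 'http://example.org/jobs/1.html' }] } } }
});
var withoutResults = JSON.stringify({ query: { results: null } });

console.log(pickResult(withResults));
console.log(pickResult(withoutResults));
```

Checking each level before reading the next avoids a TypeError when `results` is missing.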
Finally, let’s pass the results back to the client side:
So, we’re simply taking craig, which could contain results or a string stating that no results could be found, and sending it back to the client using res.send().
And with that, we’re finished with the server side.
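To see the whole server-side flow in one place, here is a sketch of the route handler written so it can run without Express or the request module installed. `makeSearchingHandler` is a hypothetical factory: its `fetch` argument is any function with the `(url, callback(error, response, body))` shape that request uses, and the URL inside is a placeholder rather than the real YQL URL.

```javascript
// Build a handler for the /searching route around an injectable fetch function.
function makeSearchingHandler(fetch) {
  return function (req, res) {
    var val = req.query.search;
    // Placeholder URL: substitute the YQL URL built from val.
    var url = 'http://example.com/search?q=' + encodeURIComponent(val);

    fetch(url, function (error, response, body) {
      var craig = 'No results found. Try again.';
      try {
        var parsed = error ? {} : JSON.parse(body);
        var results = parsed.query && parsed.query.results;
        if (results && results.RDF && results.RDF.item) {
          craig = results.RDF.item[0]['about'];
        }
      } catch (parseError) {
        // Body was not valid JSON; keep the fallback message.
      }
      // Pass the result (or the fallback message) back to the client.
      res.send(craig);
    });
  };
}

// Exercise the handler with stand-ins for fetch, req, and res.
var sent = [];
var fakeRes = { send: function (value) { sent.push(value); } };
var fakeFetch = function (url, callback) {
  var found = url.indexOf('Ruby') !== -1;
  var body = found
    ? { query: { results: { RDF: { item: [{ about: 'http://example.org/jobs/1.html' }] } } } }
    : { query: { results: null } };
  callback(null, { statusCode: 200 }, JSON.stringify(body));
};

var handler = makeSearchingHandler(fakeFetch);
handler({ query: { search: 'Ruby' } }, fakeRes);
handler({ query: { search: 'zzz' } }, fakeRes);
console.log(sent);
```

With Express and request installed, the wiring would be `app.get('/searching', makeSearchingHandler(require('request')));`.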
Main.js (client-side) redux
Open main.js and look at the following code:
This is the actual AJAX request. As I stated before, we pass the parameters (which is an object) to the server side. We also have a handler set up to process the response from the server, which takes the returned results and adds the data to the <h2> tag with the id results in the Jade template.
Now the end user should see the results.
Test it out using a number of inputs. Remember: this searches jobs in the SF Bay Area, so use appropriate keywords.
Wait. Why did this return just the URL? Well, go back to app.js and check the conditional logic:
If the query returns results, then we assign a value to craig. What is that value? Well, we take the JSON file, body, and traverse through it. Let’s look at the JSON file real quick. You can grab this from the repo.
Now just look at the value, body.query.results.RDF.item['about'], and compare it to the JSON file. We move from query to results to RDF to the first item, item. Finally, when we call the about key, the URL (the value) is returned. Make sense? See if you can return the description from the second item.
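Here is the same traversal taken one step at a time against a small sample object. The data is made up, and the shape (an `item` array whose entries carry `about` and `description` keys) is an assumption based on the description above, not a copy of the real response.

```javascript
// Made-up sample standing in for the parsed JSON body.
var body = {
  query: {
    results: {
      RDF: {
        item: [
          { about: 'http://example.org/jobs/1.html', description: 'First sample posting' },
          { about: 'http://example.org/jobs/2.html', description: 'Second sample posting' }
        ]
      }
    }
  }
};

var results = body.query.results; // everything the query returned
var rdf = results.RDF;            // the RDF wrapper around the postings
var firstItem = rdf.item[0];      // the first posting
var url = firstItem['about'];     // the value stored under the about key

console.log(url); // http://example.org/jobs/1.html
```

Any other key or posting can be reached the same way by changing the index and the key name.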
Now let’s turn that text URL into an actual clickable URL. See if you can do it yourself before you look at my answer.
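One possible approach, which is not necessarily the one used in the repo, is to wrap the value in an anchor tag before it reaches the page. `toLink` and `escapeHtml` are hypothetical helper names; the function leaves non-URL values, such as the “No results found” message, as plain text.

```javascript
// Escape characters that have special meaning in HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Wrap http(s) URLs in an <a> tag; return anything else as escaped text.
function toLink(craig) {
  var safe = escapeHtml(craig);
  if (/^https?:\/\//.test(craig)) {
    return '<a href="' + safe + '">' + safe + '</a>';
  }
  return safe;
}

console.log(toLink('http://example.org/jobs/1.html'));
console.log(toLink('No results found. Try again.'));
```

The helper could run on the server before the response is sent, or on the client before the value is added to the document.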
That’s all for now. Next time we’ll look at how to return multiple results, loop through them with Handlebars, and separate concerns. Comment if you have questions. Check out the repo. Cheers!