I'm attempting to use scrape-it as a dependency that I downloaded from NPM earlier today. I'm able to get the content back that I want, but need to store the results in a variable rather than handle them via the callback.

Using the 'sample' from scrape-it docs, when I try the following:

var myVar = scrapeIt("http://ionicabizau.net", {
    title: ".header h1"
  , desc: ".header h2"
  , avatar: {
        selector: ".header img"
      , attr: "src"
    }
}).then(page => {
    return page;
});

console.log(myVar);

I get the result: Promise { <pending> }

I've also tried using 'await' before the call to scrapeIt(), but when I do that locally I get an 'Unexpected identifier' syntax error.

Mind you, when I've tried this on the RunKit + npm site, it does work there, but it doesn't work for me locally. I've uninstalled and reinstalled the package, and it seems like I have all the necessary dependencies, so not sure what I'm doing wrong. :-\

While this obviously does touch on the asynchronous nature of JavaScript, the focal point of the question is to better understand interaction with promises.

Reuben V
  • By "I've tried _this_" you mean the attempt using `await`, because otherwise `myVar` would be a promise as you said. – Patrick Roberts Jul 05 '17 at 18:51
  • Possible duplicate of [How do I return the response from an asynchronous call?](https://stackoverflow.com/questions/14220321/how-do-i-return-the-response-from-an-asynchronous-call) – Patrick Roberts Jul 05 '17 at 18:52
  • You're trying to evaluate something waaaaay before you have got it. `myVar` is a promise, so you should be doing your work with the results of the call in the `then` func. You may think you should block the current function execution until you have the result, but you don't. Simply split off the remaining logic and call it from within the `then` func. –  Jul 05 '17 at 18:54
  • Yeah, I'm starting to see that. I had hoped I could store the results in a variable some way, but no success. :-\ Will try to work within the then function and see if I can pull off what I'm looking for. – Reuben V Jul 05 '17 at 19:09

1 Answer

The scrapeIt method is asynchronous due to the async nature of the request module.

scrapeIt.scrapeHTML is sync, but it expects the HTML as a string.

So, as long as you can get the HTML from somewhere, you can do something very similar to what you did:

var myVar = scrapeIt.scrapeHTML("<h1>Hello</h1>", {
    title: "h1"
});
console.log(myVar);
// { title: "Hello" }

You can think of something async as something that you have to wait for (e.g. downloading the HTML of the page—that takes time). That's why there are callbacks and promises and so on.

scrapeIt("http://ionicabizau.net", {
    title: ".header h1"
  , desc: ".header h2"
  , avatar: {
        selector: ".header img"
      , attr: "src"
    }
}).then(myVar => {
   // Use myVar, only once it's ready
   // Once the page is downloaded and parsed
   // this function is triggered
   console.log(myVar)

   // Here you can do something with myVar
});

// At this point the page is not downloaded yet.
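
If you would rather write this with await than with then, here is a minimal sketch. A likely cause of the 'Unexpected identifier' error is await being used outside an async function, where it is parsed as an ordinary identifier, so the sketch wraps the call in one. Note that fakeScrape is a hypothetical stand-in I made up so the snippet runs without network access (it is not part of scrape-it), and I am assuming a Node version that supports async functions (7.6 or later).

```javascript
// Hypothetical stand-in for scrapeIt(url, opts): NOT part of scrape-it.
// It only mimics "returns a promise that resolves to the scraped data".
function fakeScrape(url) {
  return new Promise(resolve => {
    setTimeout(() => resolve({ title: "Example title", url: url }), 10);
  });
}

// In a regular script, await is only legal inside an async function.
async function main() {
  // Execution of main() pauses here until the promise resolves.
  const page = await fakeScrape("http://ionicabizau.net");
  console.log(page.title); // runs only once the data is ready
  return page;
}

// main() itself returns a promise, so the caller still uses then/catch.
const result = main();
result.catch(err => console.error(err));
```

Even with await, the value only exists inside the async function (or inside a then callback). Code outside main() still sees a promise, exactly like myVar in the question.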
Ionică Bizău
  • So other than perhaps pulling the body content from a page on my own and then using scrape-it to pull the info I'm looking for, is it possible to use the standard scrapeIt method and somehow store the results in a variable? – Reuben V Jul 05 '17 at 19:11
  • @ReubenV You *do* store the data in a variable using the `then` method. But you have to understand the async nature of the whole thing. Why do you want to get it immediately? – Ionică Bizău Jul 05 '17 at 19:13
  • Ooooh... I see how it works now within the .then method. My apologies for being so dense on that. I should be able to work with it fine within .then now that I understand the flow better. My apologies, as I'm still relatively new to Promises. Your edited explanation cleared it up for me! – Reuben V Jul 05 '17 at 19:18