I made an application that scrapes a page. The page contains a script like this:

var myData = { Time: '10:46:29 am', car1: 'Volvo', car2: 'Ferarri', car3: 'VW' };

Using the cheerio and request Node modules I can get the script, but I need the values of car1, car2 and car3.

var request = require('request');
var cheerio = require('cheerio');

request('http://my-url.com', function(error, response, body) {
    var $ = cheerio.load(body);
    var htmlData = $('body script').last().prev().html();
});

I’ve tried JSON.parse(htmlData), but I get the following error: SyntaxError: Unexpected token T.

Is there any way to parse the JavaScript from the script, or can someone explain to me how to get the values of car1 and car2 via regex?


I would recommend doing a series of string replacements and then calling JSON.parse to get the JavaScript object, like this:

var data = "{ Time: '10:46:29 am', car1: 'Volvo', car2: 'Ferarri', car3: 'VW' };";
var obj = JSON.parse(data
  .replace(/((?:[A-Za-z_][\w\d])+):/g, '"$1":')
  .replace(/'/g, '"')
  .replace(/;\s*$/, ''));
console.log(obj.car1, obj.car2, obj.car3);
// Volvo Ferarri VW


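.replace(/((?:[A-Za-z_][\w\d])+):/g, '"$1":')

will replace every match of (?:[A-Za-z_][\w\d])+ followed by : with the same matched string wrapped in double quotes and followed by :, that is, with "$1":. Note that the pattern consumes characters two at a time (a letter or underscore, then a word character), so it quotes a whole key only when the key splits into such pairs, as Time and car1 do. A key with an odd number of characters, such as car, would be quoted only in part.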
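To see the key-quoting step on its own, here is a small sketch. It assumes the character class is meant to be [\w\d], and it compares the answer's pattern with a hypothetical alternative, identPattern, which is not part of the original answer and is only one possible way to handle keys of any length.

```javascript
// The answer's pattern: one or more two-character pairs, then a colon.
var pairPattern = /((?:[A-Za-z_][\w\d])+):/g;
// Hypothetical alternative (an assumption, not from the answer):
// a letter or underscore, then any number of word characters, then a colon.
var identPattern = /([A-Za-z_]\w*):/g;

var evenKeys = "{ Time: '10:46:29 am', car1: 'Volvo' }";
var oddKey = "{ car: 'Volvo' }";

// Keys that split into pairs are quoted whole; the colons inside the
// time value are left alone because they follow digits, not letters.
var evenQuoted = evenKeys.replace(pairPattern, '"$1":');
// A three-letter key is only partly matched by the pair pattern.
var oddWithPairs = oddKey.replace(pairPattern, '"$1":');
var oddWithIdent = oddKey.replace(identPattern, '"$1":');

console.log(evenQuoted);   // { "Time": '10:46:29 am', "car1": 'Volvo' }
console.log(oddWithPairs); // { c"ar": 'Volvo' }
console.log(oddWithIdent); // { "car": 'Volvo' }
```

Both patterns would also rewrite a letter-then-colon sequence inside a value, so neither is safe for arbitrary data.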

And then

.replace(/'/g, '"')

will replace every ' with " (assuming the values themselves do not contain ').
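The assumption about ' matters in practice. The sketch below uses a hypothetical helper, quotesToDouble, and a made-up owner value to show one input where the replacement works and one where an apostrophe inside a value breaks the result.

```javascript
// Hypothetical helper wrapping the quote replacement from the answer.
function quotesToDouble(text) {
  return text.replace(/'/g, '"');
}

// Single-quoted value with no apostrophe inside: parses fine.
var ok = quotesToDouble("{ \"car1\": 'Volvo' }");
console.log(JSON.parse(ok).car1); // Volvo

// A value containing an apostrophe becomes "O"Neil", which is not valid JSON.
var broken = quotesToDouble("{ \"owner\": \"O'Neil\" }");
var failed = false;
try {
  JSON.parse(broken);
} catch (e) {
  failed = e instanceof SyntaxError;
}
console.log(failed); // true
```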

And then

.replace(/;\s*$/, '')

will replace the ; and any whitespace after it at the end of the string with an empty string (in other words, it removes them).
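As a quick check of this step, assuming the pattern is /;\s*$/, the sketch below shows that only a semicolon at the very end of the string is removed, together with any whitespace after it. The sample strings are made up for illustration.

```javascript
var trailing = /;\s*$/;

// Semicolon followed by spaces and a newline at the end: all removed.
var stripped = "{ a: 1 };  \n".replace(trailing, '');
console.log(JSON.stringify(stripped)); // "{ a: 1 }"

// A semicolon inside the text is kept; only the final one is removed.
var inner = "{ a: ';' };".replace(trailing, '');
console.log(inner); // { a: ';' }
```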

At this point, the string will look like this:

{ "Time": "10:46:29 am", "car1": "Volvo", "car2": "Ferarri", "car3": "VW" }

and now we simply parse it as a JSON string with JSON.parse to get the JavaScript object.
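If the scraped script text still contains the var myData = part, the literal has to be cut out before these replacements apply. Below is a minimal sketch of one way to do that. The helper name parseObjectLiteral is hypothetical, the character class is assumed to be [\w\d], and the sketch assumes the script holds a single flat object literal. Slicing up to the last } drops the trailing semicolon, so the third replacement is not needed here.

```javascript
// Hypothetical helper: extract the text between the first "{" and the
// last "}" and convert it to JSON with the replacements from the answer.
function parseObjectLiteral(scriptText) {
  var start = scriptText.indexOf('{');
  var end = scriptText.lastIndexOf('}');
  if (start === -1 || end < start) {
    return null; // no object literal found
  }
  var literal = scriptText.slice(start, end + 1);
  var json = literal
    .replace(/((?:[A-Za-z_][\w\d])+):/g, '"$1":') // quote the keys
    .replace(/'/g, '"');                          // single to double quotes
  return JSON.parse(json); // throws SyntaxError if the result is not valid JSON
}

// Stand-in for the text returned by .html() in the question.
var htmlData = "var myData = { Time: '10:46:29 am', car1: 'Volvo', car2: 'Ferarri', car3: 'VW' };";
var myData = parseObjectLiteral(htmlData);
console.log(myData.car1, myData.car2, myData.car3);
// Volvo Ferarri VW
```

This is still string surgery rather than a real JavaScript parser, so it will only hold up for simple literals like the one in the question.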
