I made an application that scrapes a page. On that page there is a script like this:

<script>
var myData = { Time: '10:46:29 am', car1: 'Volvo', car2: 'Ferarri', car3: 'VW' };
</script>

With the cheerio and request Node modules I can get the script, but I need to get the values of car1, car2 and car3.

request('http://my-url.com', function(error, response, body) {
    
    var $ = cheerio.load(body);
   
    var htmlData = $('body script').last().prev().html();
    console.log(htmlData);
        
});

I’ve tried to use JSON.parse(htmlData), but I get the following error: SyntaxError: Unexpected token T.

Is there any way to parse the JavaScript from the script, or can someone explain how to get the values of car1 and car2 via regex?

Answer

I would recommend doing a series of string replacements and then calling JSON.parse to get the JavaScript object, like this:

var data = "{ Time: '10:46:29 am', car1: 'Volvo', car2: 'Ferarri', car3: 'VW' };";
var obj = JSON.parse(data
  .replace(/((?:[A-Za-z_][\w\d])+):/g, '"$1":')
  .replace(/'/g, '"')
  .replace(/;\s*$/, ''));
console.log(obj.car1, obj.car2, obj.car3);
// Volvo Ferarri VW

Here,

.replace(/((?:[A-Za-z_][\w\d])+):/g, '"$1":')

will replace every key of the form (?:[A-Za-z_][\w\d])+ that is followed by : with the same key surrounded by " and still followed by :, which is what "$1": expresses.
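
One caveat worth noting: the pattern above consumes the key two characters at a time, so a key with an odd number of characters is not quoted correctly (for example, car: would come out as c"ar":). A minimal sketch of a variant that accepts identifier keys of any length follows; the helper name quoteKeys and the sample input are made up for illustration, and it still assumes that no value contains an identifier followed by a colon.

```javascript
// Hypothetical helper: quote bare identifier keys of any length.
function quoteKeys(text) {
  // [A-Za-z_] starts the key and \w* takes the rest (\w already covers digits).
  return text.replace(/([A-Za-z_]\w*):/g, '"$1":');
}

console.log(quoteKeys("{ car: 'Volvo', car10: 'VW' }"));
// { "car": 'Volvo', "car10": 'VW' }
```

The values are left untouched here; the quote replacement described next still has to run before JSON.parse.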

And then

.replace(/'/g, '"')

will replace every ' with " (assuming the values in your data do not contain quote characters themselves).
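
To illustrate why that assumption matters, here is a small sketch of what happens when a value does contain a quote character. The input string tricky is invented for this example and is not from the scraped page.

```javascript
// Hypothetical input: the value contains double quotes inside a single-quoted string.
var tricky = "{ note: 'say \"hi\"' }";

var converted = tricky
  .replace(/((?:[A-Za-z_][\w\d])+):/g, '"$1":') // quote the key
  .replace(/'/g, '"');                          // single quotes -> double quotes
// converted is now: { "note": "say "hi"" }

var failed = false;
try {
  JSON.parse(converted);
} catch (e) {
  // The inner quotes end the JSON string early, so parsing fails.
  failed = e instanceof SyntaxError;
}
console.log(failed); // true
```

If your data can contain such values, plain string replacement is probably not enough and the quotes inside values would need to be escaped first.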

And then

.replace(/;\s*$/, '')

will replace the ; and any whitespace after it at the end of the string with an empty string (basically, we remove them).

At this point, the string will look like this

{ "Time": "10:46:29 am", "car1": "Volvo", "car2": "Ferarri", "car3": "VW" }

and now we simply parse it as a JSON string with JSON.parse to get the JavaScript object.
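
Note that the text returned by .html() in the question is the whole statement, including var myData =, so the object literal has to be cut out before the replacements are applied. A rough sketch of how the pieces could fit together is shown below. The helper name extractObject and the scriptText stand-in are assumptions for illustration, and the sketch assumes a flat object (no nested braces) and a plain identifier as the variable name.

```javascript
// Stand-in for the text that $('body script')...html() returns in the question.
var scriptText = "\nvar myData = { Time: '10:46:29 am', car1: 'Volvo', car2: 'Ferarri', car3: 'VW' };\n";

// Hypothetical helper: find the object literal assigned to varName,
// convert it to JSON with the replacements described above, and parse it.
function extractObject(source, varName) {
  var pattern = new RegExp('var\\s+' + varName + '\\s*=\\s*(\\{[\\s\\S]*?\\})\\s*;');
  var match = source.match(pattern);
  if (!match) {
    return null; // the variable was not found in this script
  }
  var json = match[1]
    .replace(/((?:[A-Za-z_][\w\d])+):/g, '"$1":') // quote the keys
    .replace(/'/g, '"');                          // single quotes -> double quotes
  return JSON.parse(json);
}

var myData = extractObject(scriptText, 'myData');
console.log(myData.car1, myData.car2, myData.car3);
// Volvo Ferarri VW
```

Because the capture group stops at the closing brace, the trailing ; is already excluded and the third replacement is not needed in this variant.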