Headless Chrome: a server-side rendering solution for JS sites (Part 1) [Translation]: pre-rendering pages with Headless Chrome
Link to original article: https://developers.google.com/web/tools/puppeteer/articles/ssr
Note: Because my English is limited, this is not a word-for-word translation. You may prefer to read the original article directly.
Tip: A headless browser can serve as a server-side rendering solution, turning JS sites into static HTML pages on the server. Running a headless browser on a web server lets you pre-render modern JS applications, which makes them respond faster and makes them more SEO-friendly.
This article shows how to use Puppeteer, Google's headless Chrome library, to add server-side rendering to an Express web server. The nice part is that the application itself needs almost no code changes. Puppeteer does most of the work, and with a few lines of code you can render almost any page on the server.
Here is a preview of the core code:
import puppeteer from 'puppeteer';

async function ssr(url) {
  const browser = await puppeteer.launch({headless: true});
  const page = await browser.newPage();
  await page.goto(url, {waitUntil: 'networkidle0'});
  const html = await page.content(); // serialized HTML of the rendered page
  await browser.close();
  return html;
}
Note: The code in this article uses ES modules and requires Node 8.5+ run with the --experimental-modules flag.
If you care about SEO, you are probably reading this for one of two reasons. First, you have built a web application (perhaps a SPA, a PWA, or something built with another modern stack) and it is not being indexed by search engines. The stack you used doesn't really matter; what matters is that you spent a lot of time building a great app and users can't find it. Second, you may have heard from other sites that server-side rendering can improve performance. Here you'll learn how to reduce JavaScript start-up cost and speed up first paint.
Tip: Some frameworks, such as Preact, already support server-side rendering. If the framework you use has its own server-side rendering solution, stick with it; there's no need to introduce another tool.
Search engines work mainly by crawling static HTML, but modern web applications have become much more complex. In JavaScript-based applications, most content is rendered on the client by JS, so it is invisible to crawlers that only read the initial HTML. Some crawlers are getting smarter: Google's crawler uses Chrome 41 to execute JavaScript and render the final page, but this approach is still far from mature. For example, some newer ES6 features still cause JS errors in that older browser. As for other search engines, who knows how they handle it? O(∩_∩)O
Every crawler understands HTML, so the problem to solve is how to execute the JS and get HTML out of it. What if I told you there was a tool that does exactly that?
Sounds good, right? That tool is the browser!
"Headless Chrome doesn't care what libraries, frameworks, or toolchains you use. It eats JavaScript for breakfast and spits out static HTML before lunch. Hopefully a lot faster than that, of course." - Eric
If you use Node, Puppeteer is a simple way to drive headless Chrome. Its API makes it easy to add server-side rendering to a client-side application. Here is a simple example.
We'll start with a dynamic page whose HTML is generated by JS:
public/index.html
<html>
<body>
<div id="container">
  <!-- Populated by the JS below. -->
</div>
</body>
<script>
function renderPosts(posts, container) {
  const html = posts.reduce((html, post) => {
    return `${html}
      <li class="post">
        <h2>${post.title}</h2>
        <div class="summary">${post.summary}</div>
        <p>${post.content}</p>
      </li>`;
  }, '');

  // CAREFUL: assumes html is sanitized.
  container.innerHTML = `<ul id="posts">${html}</ul>`;
}

(async() => {
  const container = document.querySelector('#container');
  const posts = await fetch('/posts').then(resp => resp.json());
  renderPosts(posts, container);
})();
</script>
</html>
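The "CAREFUL: assumes html is sanitized" comment matters here: renderPosts interpolates post fields straight into innerHTML, so any markup in the data is rendered as markup, and once the page is prerendered it ends up baked into the HTML the server sends. Below is a minimal sketch, assuming the post fields are meant to be plain text; escapeHtml and renderPostsHtml are hypothetical names, not part of the original page.

```javascript
// Hypothetical helper: escape the five HTML-significant characters so
// untrusted post fields can be interpolated into markup as plain text.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;') // must run first so later entities aren't double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Same shape as renderPosts() above, but it returns the markup string
// instead of writing to the DOM, and escapes every field first.
function renderPostsHtml(posts) {
  const items = posts.reduce((html, post) => {
    return `${html}
      <li class="post">
        <h2>${escapeHtml(post.title)}</h2>
        <div class="summary">${escapeHtml(post.summary)}</div>
        <p>${escapeHtml(post.content)}</p>
      </li>`;
  }, '');
  return `<ul id="posts">${items}</ul>`;
}

console.log(escapeHtml('<b>"Tom & Jerry"</b>'));
// prints: &lt;b&gt;&quot;Tom &amp; Jerry&quot;&lt;/b&gt;
```

In the page itself, renderPosts would keep assigning to container.innerHTML, just with escaped values. If posts legitimately contain HTML, a real sanitizer is the safer choice than escaping.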
Next, a simple implementation of the ssr function:
ssr.mjs
import puppeteer from 'puppeteer';

// In-memory cache. Key: URL, value: rendered HTML.
const RENDER_CACHE = new Map();

async function ssr(url) {
  if (RENDER_CACHE.has(url)) {
    return {html: RENDER_CACHE.get(url), ttRenderMs: 0};
  }

  const start = Date.now();

  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  try {
    // networkidle0 waits until there have been no network requests for 500ms.
    // The page's JS has likely produced markup by this point, but wait longer
    // if your site lazy loads, etc.
    await page.goto(url, {waitUntil: 'networkidle0'});
    // Make sure #posts exists in the DOM; resolves immediately if it already does.
    await page.waitForSelector('#posts');
  } catch (err) {
    console.error(err);
    await browser.close(); // don't leave a Chrome process running on failure
    throw new Error('page.goto/waitForSelector timed out.');
  }

  const html = await page.content(); // serialized HTML of the page DOM
  await browser.close();

  const ttRenderMs = Date.now() - start;
  console.info(`Headless rendered page in: ${ttRenderMs}ms`);

  RENDER_CACHE.set(url, html); // cache rendered page.

  return {html, ttRenderMs};
}

export {ssr as default};
That is the core rendering logic.
Finally, an Express server ties everything together. The code is commented, so let's go straight to it.
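RENDER_CACHE above is a plain Map: entries never expire, it grows without bound, and two simultaneous requests for an uncached URL each launch their own Chrome. One possible refinement, sketched here under the assumption that a time-based expiry is acceptable for your pages, is to cache the in-flight Promise instead of the finished HTML. createRenderCache, its options, and the injectable now() clock are hypothetical names for illustration, not part of Puppeteer or the original code.

```javascript
// Hypothetical replacement for the plain RENDER_CACHE Map in ssr.mjs.
// - Caches the in-flight Promise, so concurrent requests for the same URL
//   share one render instead of each launching Chrome.
// - Entries expire ttlMs after the render started; once there are more than
//   maxEntries, the oldest insertions are evicted.
// - Failed renders are removed so the next request retries.
function createRenderCache({ttlMs = 60000, maxEntries = 100, now = Date.now} = {}) {
  const entries = new Map(); // url -> {promise, expires}; Map keeps insertion order

  return function get(url, render) {
    const hit = entries.get(url);
    if (hit && hit.expires > now()) {
      return hit.promise;
    }
    entries.delete(url); // drop any stale entry before re-rendering

    const promise = Promise.resolve().then(() => render(url));
    entries.set(url, {promise, expires: now() + ttlMs});

    // Don't cache failures; only delete if this promise is still the cached one.
    promise.catch(() => {
      const current = entries.get(url);
      if (current && current.promise === promise) entries.delete(url);
    });

    // Evict the oldest insertions once over capacity.
    while (entries.size > maxEntries) {
      entries.delete(entries.keys().next().value);
    }
    return promise;
  };
}
```

To use it in ssr.mjs, the Chrome-specific part of ssr() (launch, goto, content, close) would move into its own function, hypothetically called renderWithChrome, which is then called as cachedRender(url, renderWithChrome) after const cachedRender = createRenderCache({ttlMs: 5 * 60 * 1000}). Caching the Promise rather than the HTML string is the choice that removes the duplicate renders.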
server.mjs
import express from 'express';
import ssr from './ssr.mjs';

const app = express();

app.get('/', async (req, res, next) => {
  try {
    // Call ssr() with the URL of the raw page; headless Chrome renders it
    // and returns the resulting HTML plus how long rendering took.
    const {html, ttRenderMs} = await ssr(`${req.protocol}://${req.get('host')}/index.html`);
    // Add Server-Timing! See https://w3c.github.io/server-timing/.
    res.set('Server-Timing', `Prerender;dur=${ttRenderMs};desc="Headless render time (ms)"`);
    return res.status(200).send(html); // Serve prerendered page as response.
  } catch (err) {
    return next(err); // Express 4 does not catch rejected promises from async handlers
  }
});

// Assumption: public/ is served statically so headless Chrome can load
// /index.html; the /posts JSON endpoint that page fetches must also exist.
app.use(express.static('public'));

app.listen(8080, () => console.log('Server started. Press Ctrl+C to quit'));
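The Server-Timing header set above uses the W3C format name;dur=milliseconds;desc="text", and a single header can carry several metrics separated by commas. A small sketch of a builder for that format follows; serverTimingHeader is a hypothetical helper, not part of Express.

```javascript
// Hypothetical helper: build a Server-Timing header value from a list of
// metrics. Each metric needs a name; dur (ms) and desc are optional.
function serverTimingHeader(metrics) {
  return metrics
    .map(({name, dur, desc}) => {
      let entry = name;
      if (dur !== undefined) entry += `;dur=${dur}`;
      if (desc !== undefined) {
        // desc is sent as a quoted-string, so escape backslashes and quotes.
        entry += `;desc="${String(desc).replace(/[\\"]/g, '\\$&')}"`;
      }
      return entry;
    })
    .join(', ');
}

console.log(serverTimingHeader([
  {name: 'Prerender', dur: 120, desc: 'Headless render time (ms)'},
  {name: 'cache', desc: 'hit'},
]));
// prints: Prerender;dur=120;desc="Headless render time (ms)", cache;desc="hit"
```

In server.mjs this could replace the template string passed to res.set('Server-Timing', ...). Because ssr() returns ttRenderMs: 0 on a cache hit, an extra metric such as {name: 'cache', desc: 'hit'} is one way to make cache hits distinguishable from very fast renders.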
The HTML in the response should then look like this:
<html>
<body>
<div id="container">
  <ul id="posts">
    <li class="post">
      <h2>Title 1</h2>
      <div class="summary">Summary 1</div>
      <p>post content 1</p>
    </li>
    <li class="post">
      <h2>Title 2</h2>
      <div class="summary">Summary 2</div>
      <p>post content 2</p>
    </li>
    ...
  </ul>
</div>
</body>
<script>
  ...
</script>
</html>
That's the end of Part 1. Stay tuned for the middle and final parts.