Welcome to the first in a series of posts about the technical challenges we ran into building Yext Pages. Yext Pages is a full-service web publishing product that enables businesses to convert the facts and content that they manage in the Yext Knowledge Graph into a web presence for efficiently answering their customers’ questions. We launched Pages in 2014 and have been continuously evolving it ever since. In terms of organization, a couple of teams of software engineers develop the underlying systems that generate and serve pages, and our consulting team designs & implements sites for clients using those systems.
As a concrete example, Taco Bell (customer story) uses two sets of Yext Pages to answer their consumers’ questions:
- Taco Bell publishes their business locations at locations.tacobell.com. When a store manager updates their hours, Yext publishes that update on their pages and listings.
- Taco Bell publishes a recruiting site at hiringparties.tacobell.com to highlight upcoming “hiring parties” at their business locations. These events are stored in their Knowledge Graph and incorporate information from the linked business locations.
These sites are built using three types of pages:
- a locator that searches the entities being published, using a geographic location or other criteria, including custom attributes.
- a directory, allowing users to browse the entity pages by geographic location.
- a unique page for each entity that provides information like store hours, a phone number, directions, and some transactional support – in this case, starting an order online.
Yext Pages can be used to answer whatever questions a consumer is asking, whether you want to publish a location, recruiting, menu, catering or service page.
Like most things, from the outside it seems simple: produce HTML by applying each data record to a template. What’s the big deal?
Full-service means that Yext provides everything from the design and maintenance of the page templates to global serving and operations for the backing applications. Here’s an overview of the different areas that we will drill in on in the coming weeks.
- Design, Accessibility, & SEO
- Technical Architecture
- Global distribution
- High availability & On-call support
These posts will be written for a technical audience. I hope that anyone building something similar can benefit from this experience report.
Let’s dive in.
Design, Accessibility, & SEO
Designing a web page that looks good, is easy to navigate, answers the user’s question, meets accessibility standards, and ranks highly in search is hard! Over the years we’ve designed, implemented, and proactively maintained hundreds of sites for global brands, encompassing millions of daily page views, so we’ve really been able to go deep, hone our craft, and see what works in practice.
Looks good, easy to navigate, answers the user’s question
As a software engineer, this topic has the least overlap with my Venn diagram of competence, but I’ll do my best.
The two key things to figure out upfront are (a) what information to display and (b) the call to action. As an example, Taco Bell might say that Hours and Location are the most salient details, with a call to action of “Place an order online”. Others in Food may want to highlight their healthy menu items, in Retail they may want to drive users to e-Commerce sites, or in Healthcare they may want to highlight medical specialties or languages spoken.
Web sites include one page per entity (often per business location). Some content is shared, but most content varies between pages, so it must be collected and stored in a Content Management System (CMS). The design must respond to variable data, adjusting the layout when the data profile swells to the maximum or collapsing when the data profile is sparse. Content that does not vary can be managed as one record in the CMS or coded directly in the page’s template. Customers manage their information in the Yext Knowledge Graph, and we iterate on matching the entity schema to what the page needs. In other words, the customer’s Knowledge Graph serves as the primary data source, and Pages can be published for any set of entities they manage in it.
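To make the "design must respond to variable data" point concrete, here is a minimal sketch of a template that emits a section only when the entity actually has that data. It is an illustration, not the Yext templating system: the function name `render_entity_page` and the field names (`hours`, `phone`, `order_url`) are hypothetical.

```python
from html import escape


def render_entity_page(entity: dict) -> str:
    """Render one entity page, emitting a section only when its data exists."""
    parts = [f"<h1>{escape(entity['name'])}</h1>"]

    hours = entity.get("hours")
    if hours:  # sparse profile: the whole hours section collapses
        rows = "".join(
            f"<tr><th>{escape(day)}</th><td>{escape(span)}</td></tr>"
            for day, span in hours.items()
        )
        parts.append(f'<table class="hours">{rows}</table>')

    phone = entity.get("phone")
    if phone:
        parts.append(f'<a class="phone" href="tel:{escape(phone)}">{escape(phone)}</a>')

    order_url = entity.get("order_url")
    if order_url:  # call to action only when the entity supports it
        parts.append(f'<a class="cta" href="{escape(order_url)}">Place an order online</a>')

    return "\n".join(parts)
```

A full profile produces every section, while a profile with only a name renders just the heading, so the layout never shows empty boxes.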
Overall, it’s important to think about where content for each page comes from and how it will be kept up to date. To keep data on each entity up to date and allow customizations, the Yext Knowledge Graph can be configured to enable workflows like a local manager controlling specific fields on their store’s profile. This allows centralized control with local customizations.
In terms of “looking good”, we’ve found that having a cross-functional team (designer, developer, and project manager) that can iterate quickly with the business stakeholder is the silver bullet here. We start by designing & delivering a prototype that’s on-brand and follows best practices, and we iterate from there.
It goes without saying, but mobile and tablet experiences are of utmost importance to consider from the beginning – we see most customers receiving more than half of their traffic from mobile devices. Designing pages that work well on both desktop and mobile browsers is a huge topic. To learn more, start by reading about Responsive Design Basics.
Here are a few more examples of our work:
|Bayada Home Health Care||Kiehl's||Tranquilidade|
|Massage Envy||Pep Boys||SFR Boutique|
Accessibility is one of those words that means different things to different people. I am using it to mean that the website “complies with WCAG 2.1”. This is important, not only for the moral imperative of enabling everyone’s access to information, but also for some impactful business reasons:
- A large number of people without a disability have difficulty reading content presented in small fonts or with poor color contrast. This is especially pernicious because these techniques are often beloved for making pages visually attractive!
- It’s the law (in the United States), and companies get sued over missing alt text on image tags. According to Seyfarth, 2018 saw ~2250 accessibility lawsuits filed against web sites, largely in NY and FL, with the Domino’s case being the most recent high-profile one.
There’s no trick here; these are just things you have to account for when implementing the site. Here are some resources or ideas that can help:
- The W3C provides an accessibility evaluation section to help you evaluate your site. It’s a good place to start.
- The relevant accessibility standard for HTML is ARIA, which provides a way to guide assistive devices when the HTML structure is not sufficient.
- Use automated checks or a third-party service like tenon.io to avoid regressions. The ideal is that an image tag without alt text results in an item that someone has to resolve.
- Go through your site with your keyboard and make sure you can still use everything and navigate around. Not everyone can use a mouse, stylus, or finger to access your site.
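As an example of the kind of automated check mentioned above, the sketch below scans HTML for image tags that have no alt attribute, using only Python's standard library. The names `MissingAltChecker` and `find_images_missing_alt` are made up for this illustration; a real pipeline would likely run a fuller accessibility audit.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class MissingAltChecker(HTMLParser):
    """Collects <img> tags that have no alt attribute at all."""

    def __init__(self) -> None:
        super().__init__()
        self.violations: List[Tuple[int, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        # alt="" is valid for decorative images, so only a missing attribute fails.
        if "alt" not in attr_map:
            line, _column = self.getpos()
            self.violations.append((line, attr_map.get("src") or "<no src>"))


def find_images_missing_alt(html: str) -> List[Tuple[int, str]]:
    """Return (line number, src) for every <img> that lacks an alt attribute."""
    checker = MissingAltChecker()
    checker.feed(html)
    checker.close()
    return checker.violations
```

Failing the build (or filing a ticket) whenever this list is non-empty turns a missing alt attribute into an item that someone has to resolve.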
Here are a few specific recommendations from our internal style guide:
- Link styling should use more than color to distinguish it from text, for visually impaired users. Additional decoration can include underline, capitalization, letter spacing, italics, or an icon placed next to the link.
- Overlay text on top of an image should have a 60%-opacity black background to ensure the text has enough contrast with the image. Drop shadows can also add contrast, but they are out of fashion in contemporary designs.
- Focused elements that are interactive (buttons, links, tabs) should have distinct styling. Browsers do provide a default, but it’s worthwhile to override it to match the brand’s palette.
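The overlay recommendation can be sanity-checked with the WCAG contrast-ratio formula. The sketch below assumes white overlay text (the style guide above does not state the text color) and simple alpha blending of a black overlay over an image pixel; the helper names are hypothetical.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light, per the WCAG definition."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: RGB) -> float:
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(a: RGB, b: RGB) -> float:
    """WCAG contrast ratio, from 1.0 (identical) to 21.0 (black on white)."""
    la, lb = relative_luminance(a), relative_luminance(b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)


def apply_black_overlay(rgb: RGB, opacity: float = 0.6) -> RGB:
    """Blend a black layer of the given opacity over an image pixel."""
    return tuple(round(v * (1 - opacity)) for v in rgb)


WHITE: RGB = (255, 255, 255)
worst_case_pixel = WHITE  # the brightest possible image background
without_overlay = contrast_ratio(WHITE, worst_case_pixel)
with_overlay = contrast_ratio(WHITE, apply_black_overlay(worst_case_pixel))
```

Under these assumptions, white text over a pure white image has a ratio of 1.0 with no overlay, while the 60% overlay lifts it above the 4.5:1 that WCAG AA requires for normal-size text, even in that worst case.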
A related topic to accessibility is localization, or providing alternate versions of your web site in different languages. Two common methods exist for providing access to localized sites:
- Use the Accept-Language header to select the right version of a page to show, among the localizations that are available.
- Serve localized pages at different URLs, e.g. “domain.com/es/page.html” for the Spanish version of a page and “domain.com/en/page.html” for the English. If you do this, you should follow Google’s guidelines on linking localized pages. Namely, include <link rel="alternate" hreflang> links to alternate-language URLs, either in the <head> or in your sitemap.
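Both methods above are easy to sketch. The first two functions below pick a locale from an Accept-Language header, and the third emits hreflang links for pages served at per-language URLs. This is a simplified illustration rather than a complete implementation of HTTP language negotiation; the function names and the fallback rule (exact tag first, then the primary language subtag) are assumptions.

```python
from html import escape
from typing import Dict, List, Tuple


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Parse 'es-MX,es;q=0.8,en;q=0.5' into (tag, q) pairs, best first."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((tag.strip().lower(), q))
    # sorted() is stable, so entries with equal q keep their header order.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def choose_locale(header: str, available: List[str], default: str) -> str:
    """Pick the best available localization for an Accept-Language header."""
    lookup = {locale.lower(): locale for locale in available}
    for tag, q in parse_accept_language(header):
        if q <= 0 or tag == "*":
            continue
        # Try the exact tag first, then fall back to the primary language.
        for candidate in (tag, tag.split("-")[0]):
            if candidate in lookup:
                return lookup[candidate]
    return default


def hreflang_links(urls_by_locale: Dict[str, str]) -> str:
    """Emit one <link rel="alternate"> per localized URL, for the <head>."""
    return "\n".join(
        f'<link rel="alternate" hreflang="{escape(locale)}" href="{escape(url)}">'
        for locale, url in urls_by_locale.items()
    )
```

For example, `choose_locale("es-MX,es;q=0.8,en;q=0.5", ["en", "es"], "en")` selects the Spanish page, because "es-MX" falls back to "es".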
There is still the question of generating the localized pages. Although the promise was great, we found that automated translation was not nearly good enough to solve the problem for us. Our solution was to integrate with Smartling. Our system extracts text from source files, sends it to Smartling where it’s translated by humans, and downloads the resulting translated data files to incorporate them into alternate page versions. For content that varies by entity, the Yext Knowledge Graph supports localized entity profiles natively.
SEO once earned a reputation for being a dark art, rife with all sorts of unsavory practices like stuffing hidden text containing tons of keyword variations onto the page. My perception is that, over the past 5-10 years, search engines have won the battle and are now able to identify and penalize these behaviors. Today pages rank more highly when they are fast and clearly communicate to search engines what they are about.
Our pages have the advantage that they are built by the business to service questions about their entity, often a business location. They are the legitimate heir to the top of the search results, so we focus on speed and accurately and comprehensively marking up pages for the various consumers, notably search engines and social networks.
Here are some examples:
- Mark pages up with HTML meta tags, including name, description, and geographic location. This informs search engine results, which can improve your snippets or add your location to a map (for local searches).
- Mark pages up with OpenGraph tags so that shares on social networks include rich content.
- Mark pages up with schema.org structured data tags to get your facts into search engine knowledge graphs. Customers can use our Knowledge Tags product to dynamically add JSON-LD structured data to an existing site. Read the Intro to Structured Data for more information.
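Here is a rough sketch of what the three kinds of markup above could look like when generated from a single entity record. The entity field names and the `head_markup` function are hypothetical, `geo.position` is one common convention for a geographic meta tag (the post does not say which one is used), and `LocalBusiness` is just one of several schema.org types that might apply.

```python
import json
from html import escape


def head_markup(entity: dict, page_url: str) -> str:
    """Build meta, OpenGraph, and JSON-LD tags for one entity page."""
    lat, lng = entity["lat"], entity["lng"]
    structured = {
        "@context": "https://schema.org",
        "@type": "LocalBusiness",
        "name": entity["name"],
        "telephone": entity["phone"],
        "url": page_url,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": entity["street"],
            "addressLocality": entity["city"],
            "addressRegion": entity["region"],
            "postalCode": entity["postal_code"],
        },
        "geo": {"@type": "GeoCoordinates", "latitude": lat, "longitude": lng},
    }
    # Escape "</" so the JSON cannot close the surrounding <script> element.
    json_ld = json.dumps(structured).replace("</", "<\\/")
    title = escape(entity["name"])
    description = escape(entity["description"])
    url = escape(page_url)
    return "\n".join([
        f"<title>{title}</title>",
        f'<meta name="description" content="{description}">',
        f'<meta name="geo.position" content="{lat};{lng}">',
        f'<meta property="og:title" content="{title}">',
        f'<meta property="og:description" content="{description}">',
        f'<meta property="og:url" content="{url}">',
        f'<script type="application/ld+json">{json_ld}</script>',
    ])
```

Generating all three from the same record keeps the page, the social share card, and the structured data from drifting apart.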
Google’s SEO Guide is also a good resource that covers more general advice for ensuring that you are getting as much value as possible out of your pages.
Using markup best practices can only get you so far, and nothing will substitute for highly relevant content. Knowledge Graph allows our clients to create and manage highly relevant content at a massive scale.
Page Speed & TTFB
Consumers are notoriously impatient creatures. Studies have shown that even small increases in latency result in a measurable decline in conversions or traffic.
A recent Akamai study found that:
- A 100-millisecond delay in website load time can hurt conversion rates by 7 percent
- A two-second delay in web page load time increases bounce rates by 103 percent
- 53 percent of mobile site visitors will leave a page that takes longer than three seconds to load
The challenge of making a low-latency user experience usually divides into two parts around the “time to first byte” (TTFB), which is how long it takes the user’s browser to begin receiving the web page.
Prior to the First Byte, the client connects to your server, performs the TLS handshake to establish a secure connection, requests a page, and waits for your server to generate it and send back the first byte. Low latency here requires you to be able to generate the HTML very quickly, from a point geographically near to your users.
After the First Byte, the browser processes the HTML and displays the page as quickly as possible. The largest factor here is usually the external assets (JS, CSS) that it has to download and any content that has to be loaded asynchronously from APIs. For example, single-page apps often have a quick TTFB because they return static files, but they may have a worse perceived latency because the user doesn’t get their content until the JS runs and subsequently loads it. In contrast, WordPress may take longer to return the HTML, but if that HTML has all of the content that the user cares about, the perceived latency might be lower.
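The pre-first-byte phase described above can be timed from a client with nothing but the standard library. This is a rough sketch, not a monitoring tool: `measure_ttfb` is a hypothetical helper, it approximates the "first byte" as the moment the response status line and headers have been read, and a single measurement from one location says little about what users around the world experience.

```python
import http.client
import time


def measure_ttfb(host: str, path: str = "/", port: int = 443,
                 use_tls: bool = True, timeout: float = 10.0) -> dict:
    """Time connection setup, time to first byte, and body download."""
    conn_cls = http.client.HTTPSConnection if use_tls else http.client.HTTPConnection
    conn = conn_cls(host, port, timeout=timeout)
    try:
        start = time.perf_counter()
        conn.connect()  # DNS lookup and TCP connect, plus the TLS handshake for HTTPS
        connected = time.perf_counter()
        conn.request("GET", path)
        response = conn.getresponse()  # returns once the status line and headers arrive
        first_byte = time.perf_counter()
        response.read()  # everything after this point is post-TTFB download time
        done = time.perf_counter()
        status = response.status
    finally:
        conn.close()
    return {
        "status": status,
        "connect_s": connected - start,
        "ttfb_s": first_byte - start,
        "download_s": done - first_byte,
    }
```

Comparing `connect_s` with `ttfb_s` gives a first hint about whether latency is dominated by distance to the server or by the time the server takes to generate the page.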
One complicating organizational factor is that optimizing for low latency usually requires efforts and coordination across multiple teams. Often the infrastructure, the application, and the HTML / CSS / JS are all developed by different groups and at different times. Having one stakeholder that cares about performance and coordinates the effort can be instrumental.
Is your web site slow? Getting customer-perceived latency metrics can be harder than it sounds.
Server-side solutions typically present the time it took to return the HTML page as the latency; often that makes it look much faster than what the customer perceives. But it’s an easy number to get, it’s usually steady, without a ton of variance, and optimizations show clear effects.
Client-side solutions like Google Analytics give more accurate information, but those metrics are often very noisy because of extremely varied customer setups: they run different browsers from different locations on different devices with different capabilities. One story along these lines is that YouTube launched a big optimization but subsequently saw latency rise. Upon further study, the cause was much more usage from places where their service had previously been too slow to use. Another confounding factor is XHR requests or images/videos: sometimes they supply critical content for the page, in which case you’d want to count them in the perceived latency, and other times they are peripheral, in which case their latency is not so important to the user experience.
Server-side metrics are reliable but only paint part of the picture; in our system, they are rarely interesting. Client-side metrics are more accurate but very noisy, and they are hard to use for directing work or seeing results. Looking at them filtered to a single browser and device type can allow you to glean more insights if you also discard outliers, but it’s not easy to use and could still be misleading.
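As an illustration of filtering client-side metrics to one browser and device type and discarding outliers, here is a small sketch. The sample record shape (`browser`, `device`, `load_ms`), the function name, and the choice to trim 5% from each end are all assumptions made for the example.

```python
import math
from statistics import median
from typing import Dict, List, Optional


def segment_latency(samples: List[dict], browser: str, device: str,
                    trim_fraction: float = 0.05) -> Optional[Dict[str, float]]:
    """Summarize load times for one browser/device segment, minus outliers."""
    values = sorted(
        s["load_ms"] for s in samples
        if s["browser"] == browser and s["device"] == device
    )
    # Drop the fastest and slowest trim_fraction of the segment.
    k = int(len(values) * trim_fraction)
    trimmed = values[k:len(values) - k]
    if not trimmed:
        return None
    p95_index = math.ceil(0.95 * len(trimmed)) - 1  # nearest-rank percentile
    return {
        "count": len(trimmed),
        "median_ms": median(trimmed),
        "p95_ms": trimmed[p95_index],
    }
```

Even with this filtering, a shift in who is visiting (as in the YouTube story) can move the numbers without any change to the site, so the result still needs careful interpretation.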
Primarily we use PageSpeed’s API for routine monitoring and evaluation of sites. It provides a pretty good list of criteria for what constitutes an optimized web page, although not all of the items are applicable. We surface its data in an operational Grafana dashboard for each domain, and our consulting team gets an alert if a template change causes it to regress.
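A regression check along these lines could be sketched as follows. It assumes version 5 of the PageSpeed Insights API, whose response carries a Lighthouse performance score between 0 and 1; the function names, the baseline comparison, and the 5-point tolerance are hypothetical and are not a description of our dashboard or alerting setup.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"


def psi_request_url(page_url: str, strategy: str = "mobile",
                    api_key: Optional[str] = None) -> str:
    """Build the PageSpeed Insights request URL for one page."""
    params = {"url": page_url, "strategy": strategy}
    if api_key:
        params["key"] = api_key
    return f"{PSI_ENDPOINT}?{urlencode(params)}"


def performance_score(psi_response: dict) -> float:
    """Extract the Lighthouse performance score, scaled to 0-100."""
    score = psi_response["lighthouseResult"]["categories"]["performance"]["score"]
    return score * 100


def fetch_score(page_url: str, **kwargs) -> float:
    """Call the API over the network and return the performance score."""
    with urlopen(psi_request_url(page_url, **kwargs), timeout=60) as resp:
        return performance_score(json.load(resp))


def has_regressed(baseline: float, current: float, tolerance: float = 5.0) -> bool:
    """Flag a drop larger than the tolerance, to absorb run-to-run noise."""
    return current < baseline - tolerance
```

Running `fetch_score` before and after a template change and passing both values to `has_regressed` gives a simple signal that could feed an alert.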
We will talk more about how we optimize the TTFB in the next post on technical architecture, since reliability and low latency were the primary design goals.
Post-TTFB is covered pretty comprehensively by PageSpeed’s guidelines, so I will try to avoid repeating them here. But here are a few examples of our practices:
- Photos: We resize uploaded photos, and our system automatically picks the smallest one that fills its slot on a page. We automatically transcode photos to WebP and serve that format to Chrome user agents. We fingerprint the image content and set long-lived cache headers so users never have to download an image more than once.
- Inline CSS: CSS usually blocks the page from being displayed to the user. We avoid a separate fetch by injecting the minified stylesheet into the HTML. We also engineer our stylesheets to be as minimal as possible in the first place, to make inlining effective.
- Async JS: We include scripts using the async and defer attributes, so that the browser can begin loading them immediately without blocking the page render. If a script is smaller than a threshold, it is automatically inlined.
- DNS Prefetch: We add markup that tells the browser to resolve DNS names before it needs them, via dns-prefetch.
- CDNs: This is a topic for a later post, but we make heavy use of CDNs to get data as close to users as possible.
- Mobile-specific pages: Our system distinguishes Desktop from Mobile browsers and has the ability to serve different designs to each one. This can be valuable for making mobile pages as light as possible, and in some cases it can be easier to maintain separate designs compared to a single responsive design.
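Two of the practices above, photo handling and script inlining, are simple enough to sketch. Everything here is illustrative: the function names, the variant tuple shape, the 12-character fingerprint, the one-year cache lifetime, and the 2 KB inlining threshold are assumptions, not values taken from our system.

```python
import hashlib
from typing import List, Tuple

Variant = Tuple[int, int, str]  # (width, height, url)

# Safe only because a fingerprinted name changes whenever the content changes.
IMMUTABLE_CACHE_HEADERS = {"Cache-Control": "public, max-age=31536000, immutable"}


def pick_variant(variants: List[Variant], slot_width: int, slot_height: int) -> Variant:
    """Return the smallest resized photo that still fills the slot."""
    covering = [v for v in variants if v[0] >= slot_width and v[1] >= slot_height]
    if covering:
        return min(covering, key=lambda v: v[0] * v[1])
    # Nothing is big enough, so fall back to the largest available photo.
    return max(variants, key=lambda v: v[0] * v[1])


def fingerprinted_name(name: str, content: bytes) -> str:
    """Embed a content hash in the file name, e.g. hero.<hash>.jpg."""
    digest = hashlib.sha256(content).hexdigest()[:12]
    stem, dot, ext = name.rpartition(".")
    return f"{stem}.{digest}.{ext}" if dot else f"{name}.{digest}"


def script_tag(src: str, source_code: str, inline_threshold: int = 2048) -> str:
    """Inline small scripts; load larger ones without blocking rendering."""
    if len(source_code.encode("utf-8")) < inline_threshold:
        # Keep the script body from closing its own element early.
        safe = source_code.replace("</script", "<\\/script")
        return f"<script>{safe}</script>"
    return f'<script src="{src}" defer></script>'
```

Because the fingerprint is derived from the bytes of the image, a changed photo gets a new URL, which is what makes the long-lived cache header safe to send.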
If you’ve done everything in this post, you will have a fast, accessible, localized, responsive, attractive page that ranks highly and answers your customers’ questions. The remaining posts discuss how we generate, serve, and monitor these pages.