PDF accessibility: beyond tagging and screen reader accessibility

The following is a summary of the talk given by Dig Inclusion director, Ted Page, at the recent PDF Accessibility Days conference in Copenhagen, Denmark. A video of the talk and full transcript are included below.

There were three main strands to the talk:

  • The need to ensure that content is designed in a way that doesn’t compromise accessibility: it’s important to get the content right before you even start to think about the technical aspects of accessibility, such as tagging (5 examples shown)
  • The need to go beyond tagging, beyond PDF/UA compliance and beyond screen reader accessibility to ensure that PDFs are also accessible to the many users of non-screen reader assistive technologies out there (3 examples shown)
  • Whilst no document reading systems are yet fully up to speed accessibility-wise, some areas that PDF needs to address, if it is to fend off competition from other formats, include:
    • access to typographic style: the ability to customise font face, font size, colour, line height and line length
    • the ability to include accessible MathML
    • accessibility on mobiles and tablets

Despite its many advantages, PDF currently lags behind other formats (such as EPUB) in all three of the above areas.


So, in conclusion, in order to make PDFs as accessible as they can be:

  • PDF accessibility professionals need to look back to content authors to eliminate accessibility problems with the content itself
  • They need to look forward, beyond tagging, beyond screen reader accessibility and beyond standards compliance to address the needs of all AT users
  • PDF itself needs to address its current weaknesses, including access to typographic style, accessibility on mobiles/tablets/Macs etc, and incorporating external technologies such as MathML

Video of the talk

Video transcript

English school exams made accessible

Ted Page, Dig Inclusion

Thank you for the introduction, and thank you all for showing up after lunch.

Today I am going to talk about accessible exams, but don’t worry, the stuff I’m going to talk about should be relevant for all PDFs, particularly PDF forms, so it’s not just exam-specific. A lot of the issues that have come up over 3 years or so of making accessible exam papers in the UK may well be of interest to you all.

Exam papers are essentially forms, whether you are taking an exam with paper and pen or online in PDF or any other format, so I am going to be talking a lot about forms. What I’m not going to be talking about are the basics of how to tag or build a form, but I do want to talk about a couple of issues that don’t get the coverage they should.

One of those issues is the problem of inaccessible content.

Another issue I want to talk about is the need, certainly in the context of exams, for going beyond standard tagging techniques and beyond screen reader accessibility in order to make PDFs accessible. The reason I say that is that, certainly in the context of exams, the vast majority of children taking exam papers and using assistive technologies are not screen reader users. In fact fewer than 3% are actually screen reader users.

So I’m going to talk a little about the issues that come up for users of other assistive technologies and things you may need to be aware of in that context.

Lastly I’m going to talk a little about how I think things might go in the future.

All 3 of these topics are inter-related in ways you may not imagine, but I will aim to string these together at the end.

My key takeaway number one is that PDF accessibility professionals should not be passive recipients of whatever content that content authors deem fit to throw at them.

I think we have a really important role to play in educating content authors in how to avoid the sorts of traps that they so often fall into.

I am going to give you some examples. I have actually picked out 5, from exam papers, but as I say they do generalise to other types of PDF as well. And I could have picked 100 if we had all day to talk about it, but I hope these will give you a good indication of how things can go wrong for screen readers, magnifier users and for dyslexic people who may use what are collectively known as literacy software solutions. I’ll show you some of the potential problems as we go.

My second point is that it’s vital to go beyond screen reader accessibility. Dyslexic people outnumber screen reader users by about 10 to 1. Among those taking exam papers, the ratio is considerably higher than that.

I agree completely with the points several speakers have made, yesterday and today, that it is really important that assistive technology manufacturers come on board with standards like PDF/UA, and that they build their products so they read tags, because their products would be a lot more usable if they did.

However, I will cite a couple of examples at the end where even that won’t solve all problems out there, and other solutions will need to be found to accommodate those non-screen reader users.

And I’ll just talk briefly about the challenges that PDF might face from other technologies in the future and what it might need to do to meet those challenges.

Let’s get straight onto the topic of content. In my view of the world you don’t make websites accessible, you make the content accessible. The ‘C’ in WCAG stands for content, of course, and in my view exactly the same thing is true in PDF.

In respect to exam papers, I would say that, well … I have an estimate here (on my second bullet point) that around 25% of all questions that turn up in exam papers have problems with the content itself. On reflection, I think this is quite a serious underestimate and that considerably more than 25% of the content is inaccessible to one extent or another, and I’m going to give you some examples in a moment.

The good news is that most of them are relatively easy to fix, if you know what to do and what they are, and if you can have that conversation with examiners.

The first example is a fairly obvious one but I am going to include it anyway because there are a couple of issues around it. This is a typical front page of an exam paper. There are a couple of form fields at the top for candidates to put their name and candidate number and so forth in, then a title, some general information, and a set of instructions.

The first instruction that a candidate gets, in a digital paper, is “use black ink or ball point pen”. It’s obvious when you point it out, but if you give an instruction to a child that they straight away can’t meet, you have a problem. Now, if the person taking the exam is using a screen reader, you can to some extent get around the problem by replacing this instruction with actual text, or alt text if you want, of something like “enter your answers into the form fields provided”, and that’s what a screen reader user would hear.

However, you need to be aware that, if you do that, other assistive technology users will be disadvantaged. For example, Zoom Text Reader will just skip over that text and ignore it altogether. Magic, likewise (the other main screen magnifier that works with JAWS).

And the two main literacy software solutions used in the UK … I am sure there will be different types of literacy software solutions in different language areas, but in the UK the two main literacy software solutions are Read&Write Gold and Claro Read. One of them will skip over [such content] altogether and not read the alt text, whilst the other will read it out, but won’t put in the normal highlighting that you would expect to see.

So, most of the assistive technologies used by children in the exam room will not read it correctly if you replace text with actual text or alt text, and you need to be aware of that.

Everything I have said is based on the assumption you cannot change the content on the page. But you have to drop that assumption. You cannot accept that content, it has to go. That conversation needs to be had with the people who create the content, so they need to understand why they can’t do that.

That is my first example. The next one is maybe not quite so obvious, and my apologies if you have seen it before; I did actually mention this example in an interview prior to this conference, so apologies if you are seeing it for a second time, but this is an example of quite a common form of wording for a question in an exam paper.

“A cuboid has …….. faces”

OK, now I am sure the people who put these exam papers together have a huge amount of experience and knowledge in their academic areas, and I wouldn’t denigrate that in any way. However, I am not sure there is much awareness that, when you are creating an exam paper, you are creating a form. In the last 20 years an awful lot of research has gone into understanding what does and does not constitute a good form, and that example doesn’t do it. That shows me straight away that the people that put this together have not investigated the dos and don’ts of form building.

Forms were never designed to do that kind of a job. I liken it to trying to eat your spaghetti with a spanner, the tool is just not designed for the job and the outcome is probably lots of sauce down your shirt front. And that is what you get when you listen to something like that in a screen reader. JAWS, for example, would read that question as something like:

“a cuboid has edit a number of edit type in text faces”

Which is just a mess. It takes a really long time to work out what it means, what to do with it. And when you are in a time-limited environment, especially when life chances might depend on your understanding it quickly, that is a major, major problem.

However, just change the wording to: “How many faces does a cuboid have”, and put the form field after it, and it just works, out of the box. A screen reader would just read out: “How many faces does a cuboid have, edit type in text”. In other words, question, [followed by] form field and a prompt. It’s that easy, and you haven’t changed the meaning or intent of the question at all.
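The difference between the two wordings can be sketched in a few lines of Python. This is only a toy model, assuming that a screen reader announces text runs and form fields strictly in document order; the prompt string, the `announce` and `field_is_last` helpers and the data layout are all hypothetical, and real screen readers differ in the exact words they speak.

```python
# Toy model: a question is a list of (kind, value) parts read in document order.
# FIELD_PROMPT is an assumed announcement for a text field, not JAWS's exact wording.
FIELD_PROMPT = "edit, type in text"


def announce(parts):
    """Return what would be spoken if every part is read in order."""
    spoken = []
    for kind, value in parts:
        spoken.append(value if kind == "text" else FIELD_PROMPT)
    return " ".join(spoken)


def field_is_last(parts):
    """True when the only form field comes after all of the question text."""
    kinds = [kind for kind, _ in parts]
    return kinds.count("field") == 1 and kinds[-1] == "field"


# Field embedded in the middle of the sentence (the problematic wording).
inline_field = [("text", "A cuboid has"), ("field", ""), ("text", "faces")]
# Full question first, field afterwards (the reworded version).
field_last = [("text", "How many faces does a cuboid have?"), ("field", "")]

print(announce(inline_field))  # A cuboid has edit, type in text faces
print(announce(field_last))    # How many faces does a cuboid have? edit, type in text
```

A check like `field_is_last` could be used as a quick editorial rule of thumb when reviewing question wording, before any tagging is done.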

I can give you any number of similar examples of badly worded questions, but I won’t, as we are limited on time.

That example would be a problem particularly for screen reader users. I will give you an example now of one that will be particularly problematic for a screen magnifier user.

When I train people in how to make accessible forms, one of the first things we talk about is page layout and how to and how not to lay out form fields on a page. One of the first diagrams I show them is that [example of screen] which is the least efficient way of laying out a form of all.

Its characteristics of course are that all the questions/labels are left-aligned (down the left margin) and all the form fields are right-aligned. This [layout] is a problem for everybody. It is by far the slowest design for everyone to use including sighted people. Eye tracking tests have shown over and over that this kind of thing slows people down, in trying to understand which label relates to which field, so it’s a problem for everyone.

But it’s particularly a problem for a screen magnifier user because of all the space in between them, and I’ll show you in a moment exactly how much of a problem it is.

Now some of you may be sitting in the audience wondering who would design something as ugly and clumsy as that. Well you’d be surprised because that is the standard exam layout with the label on the left and the form field on the right, except that is actually worse than the one I showed you because, as well as the horizontal gap, you’ve also got a vertical gap between the question and the answer. And I am not showing an extreme case here, I’m showing quite a mild case. I don’t usually show extreme examples to make a point, although I will make an exception to that rule in just a moment, but this is not an extreme case. Sometimes you get the question at the top of the page and the form field at the bottom, so the gap could be 4 or 5 times that size, and I will show you in just a moment how much of a problem it is.

[To do so] I could fire up Zoom Text or Magic which are the 2 main screen magnifiers. I won’t now as they are very slow to load and we are on limited time, but you can just take my word for it, they both allow users to magnify a page by up to times 36. I’m not going to show you times 36, I’m going to show you times 6, which is a relatively modest screen magnification.

It looks like this [example shown on screen]. Here you can just see the last character of the question and you just see in the bottom right corner the beginning of the answer field. Again, this kind of thing will slow users down tremendously; the impact is enormous, trying to find where to go to type in your answer.

Again, the solution is simple and so easy to do. The optimal way to lay out your form is to left align the question, left align the label, left align the field directly underneath and the problem goes away for everyone. There is no issue.

The next example I’m going to show you I will be fairly brief about because I am going to try not to duplicate what other speakers have said before, but this morning Ferass made a good point, that writing alt text is the job of the content author, it is not the job of the PDF accessibility person at the end. I think it’s fine that the accessibility person can give editorial input and guidance, but (1) context is often very important, and (2) sometimes you simply don’t have the expertise to write appropriate content.
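The magnification arithmetic behind this can be sketched as follows. It is a rough illustration only, assuming an A4 page (210 mm wide) whose full width fits the screen at 1x, so that at magnification m only 1/m of that width is visible at once. The 120 mm gap is a made-up figure for a right-aligned field, and `visible_width_mm` and `fits_in_view` are hypothetical helper names.

```python
A4_WIDTH_MM = 210.0  # width of an A4 page


def visible_width_mm(magnification, page_width_mm=A4_WIDTH_MM):
    """Page width visible at once, assuming the whole width fits the screen at 1x."""
    return page_width_mm / magnification


def fits_in_view(gap_mm, magnification):
    """True if the end of a label and the start of its field can be on screen together."""
    return gap_mm < visible_width_mm(magnification)


print(visible_width_mm(6))   # 35.0 (mm of page width visible at times 6)
print(fits_in_view(120, 6))  # False: assumed 120 mm gap, label left and field right
print(fits_in_view(0, 6))    # True: field left-aligned directly under its label
```

Under these assumptions, any horizontal gap of 35 mm or more forces a times 6 user to scroll between question and answer field, and the threshold shrinks further at higher magnifications.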

I said [earlier] I don’t usually use extreme examples but I will in this case to really hammer home the point. This is from a chemistry paper, the question here is:

“The monomer of the addition polymer poly(propenol) may be represented as CH subscript 3 baseline minus C H equals C H 0 H. The repeat unit of the addition polymer is option A, option B, option C, option D.”

What someone has to do here is write alt text that describes each of those images, but to do it in a way that doesn’t give a screen reader user an advantage or disadvantage over a non-screen reader user.

Now I don’t know if anyone in this audience fancies having a go at this. Perhaps we could have a competition if we had enough time? … But I suspect not, unless there’s any chemistry professors here…

This certainly isn’t anything I would want to have a go at, and I consider myself an English editor before anything else, before even any accessibility work. I am quite happy to draft text and edit it, but with this I draw a line.

In this case I simply rang up the chemistry professor and I asked: if you had to ask this of a class without the use of any visual aids at all, how would you do it? He was really up for it, and keen to play the game, and just for fun I’ll tell you what he came up with:

“Option A. Propan 1 ol with the continuation bonds on the first and second carbon atoms”

“Option B: Propan 2 ol with the continuation bonds on the second and third carbon atoms”

and so on.

I know it’s an extreme case, but I think we need to hammer home this idea that it is not the job of the accessibility professional to write alt text: it really should be supplied at source.

OK, my last content example is this one, and I have included it because it is a good example of content authors not being aware or perhaps not being clear about what they are actually testing for.

This question requires the user to draw a graph on a grid, and the only thing provided is the grid itself. And the question is

“Draw a graph of y= 2x+ 3 for values of x from minus 2 to 2”

So, what exactly is being tested for here? Well several things. Mostly, you are being asked to solve an equation for various values of x, and come up with a series of co-ordinates. That’s the bulk of the problem.

That being the case, all you need to do to make it accessible is provide a text field next to the graph in which to do the calculations and to come up with those coordinates. If that is the aptitude that is being tested for, the examiner needs to tell the PDF accessibility person that a text field needs to be provided here to solve this problem.
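The calculation a candidate would type into such a text field can be sketched in Python. This is a minimal illustration that assumes integer steps for x, which the question does not state explicitly; `line_points` is a hypothetical helper name.

```python
def line_points(slope, intercept, x_values):
    """Return (x, y) coordinates on the line y = slope * x + intercept."""
    return [(x, slope * x + intercept) for x in x_values]


# y = 2x + 3 for x from -2 to 2, in whole-number steps (an assumption).
points = line_points(2, 3, range(-2, 3))
print(points)  # [(-2, -1), (-1, 1), (0, 3), (1, 5), (2, 7)]
```

A list of coordinates like this demonstrates the same ability to solve the equation, without requiring sight or the motor skills needed to draw on a grid.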

However, [if the requirement is actually to draw a graph (with pen and paper being the only realistic option here for doing so) then candidates are also being tested for sight as well as for the motor ability to use a pen in this way]. I am sure if examiners were made aware of that fact, you wouldn’t get questions in this format.

So, just to reiterate one more time, the conversation needs to be had between PDF accessibility professionals (or accessibility professionals in general) and content providers.

I hope that gives you some idea of the sorts of issues with content and where I’m coming from.

I want to move onto my second point now, which concerns using other assistive technologies and going beyond standard tagging techniques and beyond being PDF/UA compliant.

I fully agree with all of the speakers on PDF/UA compliance, except to say I think it’s a necessary condition, I don’t think it’s a sufficient condition for making a document accessible, especially, if I may say so, in a mission-critical environment like exams where someone’s life chances can depend on the outcome.

This is a paper which has been run through the axesPDF QuickFix checker, which is a wonderful tool, it really is, [plug plug, plug!] I have no commercial interest in it, by the way, it’s just a great tool.

But axesPDF QuickFix has run a check on this, and the automated check shows no accessibility issues. If I were to check this document with a screen reader such as JAWS or NVDA, there would also be no problems with it at all. It just works, no issues.

However, if you bear with me a moment, I will bring up the actual PDF and show you how you can go wrong.

In a moment I am going to run a programme called Read&Write Gold, which is used by about 44% of all children [who use assistive technologies] in exams, in the UK. So this is a significant piece of equipment.

What we will find is that it ignores the sentence “use evidence from the extract to support your answer” which will be completely skipped. Also, this entire content block [a block of seven lines of text on the same page] will be ignored.

Let me run this for you and you will see what I mean.

So, this line here, and this whole block after “From the extract what do you learn about the character of Mercutio”….

[Read&Write Gold starts to read the page out loud, highlighting each line as it goes, but the above mentioned lines are completely ignored.]

This is not a reading order problem, it’s not hiding somewhere. The actual problem is that when the PDF was generated, in the Content panel you have some spans – we have a Container<Span>, and inside that you have a nested Container<P> (paragraph). And that’s what causes the problem. And those are the sorts of issues that I have to check each paper for to make them work.
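The kind of check described here can be sketched as a tree walk. The sketch does not read a real PDF: it assumes the content tree has already been exported into nested dictionaries with hypothetical "type" and "children" keys, and `find_nested_containers` is an illustrative name rather than a function of any real tool.

```python
def find_nested_containers(node, inside_span=False, path=()):
    """Yield the path to every Container<P> nested anywhere inside a Container<Span>."""
    here = path + (node["type"],)
    if inside_span and node["type"] == "Container<P>":
        yield here
    now_inside = inside_span or node["type"] == "Container<Span>"
    for child in node.get("children", []):
        yield from find_nested_containers(child, now_inside, here)


# Hypothetical content tree for one page: the first paragraph container is
# nested inside a span container, the second one is not.
page = {"type": "Page", "children": [
    {"type": "Container<Span>", "children": [
        {"type": "Container<P>", "children": [{"type": "Text"}]},
    ]},
    {"type": "Container<P>", "children": [{"type": "Text"}]},
]}

hits = list(find_nested_containers(page))
print(hits)  # [('Page', 'Container<Span>', 'Container<P>')]
```

Each reported path points at a block that, on the behaviour described above, Read&Write Gold may skip, so it would need checking by hand.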

I can see a couple of technical heads [in the audience] thinking about that one, [inaudible question from the audience concerning the cause of this problem]

I am a content person, I have no idea [as to the cause] I’m afraid. But those sorts of issues turn up all of the time, and they are not the only problems you get with Read&Write Gold. You will get reading order problems as well.

Those reading order problems would be solved if those assistive technologies read tags. However, not all [Read&Write Gold] problems would be solved by this. I will give you an example in a moment.

Let me show you something else that will go wrong with Read&Write Gold. This is a surprising one, and it’s not one that you can get rid of by artifacting.

When this flips from one page to the next, you will get the PDF equivalent of the Northern Lights. You will only get it for a couple of seconds so you will have to not blink for a moment… [example on screen of lines and blocks of colour all over the page].

You get the idea. No amount of artifacting will stop that kind of thing happening. Again, if the assistive technology manufacturers followed the tags that would probably disappear, but the fact is, right now they don’t. And until they can be persuaded (or harangued), it’s as well to be aware these problems can turn up.

I just want to show you one other example of something for which there is no obvious fix, even if the assistive technology manufacturers did play ball. Normally you would just handle a bar chart by tagging it as a Figure and adding alt text to it, probably a description of the layout, a summary of the trends, patterns, relationships, proportions and anomalies, as well as the data values, and this would all go into the alt text field of the image.

However, that really is no use to a dyslexic person who can see the page and who doesn’t need all that information. It’s actually a disadvantage to make them have to listen to it all. What a dyslexic user needs is the labels to be read out and highlighted, I’ll show you how that works as well:

[Read&Write Gold reads out the questions followed by the bar chart’s axis labelling as follows: “Laura draws a bar chart to show the different blood groups of 50 people. 25, 20, 15, 10, 5, 0, A, B, AB, O”]

That is what a dyslexic person needs. Things like this might actually be one of the reasons why people who make these software packages might be reluctant to make the switch [to reading a PDF’s tags]. Some kind of solution to a problem like that would need to be found. And there may be others but this is one of the most obvious ones.

Just going to wrap up with a few words of where things might go in the future.

Certainly MathML would be fantastic, and I know Olaf is talking about that later so I won’t go in any detail, other than to say it can’t come soon enough.

What I think is probably important to note though, is that one of the shortcomings of PDF, as things currently stand, is that for people with low vision the most important issue of all is access to typographic style.

And this is the kind of thing that other formats like EPUB do out of the box. You can choose the font, line height, line length, the font size and the colour.

PDF can do some of that, but some of it, it can’t, and I think that’s one of the biggest challenges. One of the reasons it can’t is the problem of reflowing a form, which causes some of those issues. PDF forms just don’t reflow at the moment, so you don’t have that kind of control.

Colour-wise, things are getting better. Adobe recently fixed the background colour bug that’s been in there for years. I blogged about it back in July and it was fixed about 2 or 3 weeks later. Not sure if the two are related or not! It was fixed though, and now you can customize font and background colours, most of the time. It works, except when you have transparent images on the page where you will still have problems.

The other thing that would be great in terms of typographic style would be the ability to choose your own font and also line height, which is possibly where the biggest challenge for PDF from other technologies lies.

I think there might be some subject areas that might be better in other formats: chemistry comes most readily to mind. Making chemistry accessible in PDF is a challenge to say the least, but the good news is most of it can be done easily if you get the content right and the tagging right as other people have spoken about in the last day and a half. All these things can be done to an extremely high standard, but you do need to go beyond the standard screen reader and tagging to get there.

I guess that is me wrapped up and I am happy to take any questions if you have them.

Question: What do you think Ted, are PDFs really the best choice for exam papers?

Answer: I think it’s too early to say. [PDFs] have a lot going for them, a lot of advantages. The fact PDFs are used for the printed copy means they have a big head start on the field.

Also, it’s easy to take, say, an English or History paper that has large answer areas, and make that accessible to most people, caveat the access to typographic style which could be better. But it [PDF] is a quick easy way to work. I am not sure any other format at the moment would be quicker, they might be as quick.

In an ideal world I see an XML database with all the content in it and just sending it out for different formats as required. I think we might be a few years away from that, but I think that’s also a question for the candidate as there may be one candidate who prefers to have it as a PDF but someone who would get on better if it was in a different file format. So to have that flexibility would be fantastic, but how we get to that point I’m not sure. There are possibly people in this room with a much better idea on how we achieve that than I do.

Question: How often would you encounter this nested Containers issue?

Answer: One in 10 papers, but I do get the papers from different sources, and some people reuse old templates, and have done for years. I don’t know what their history is. I think people build all kinds of problems on top of problems, and when you use axesPDF QuickFix to empty all containers you end up with a grid of pink squares on top of everything, and you get all kinds of nice artistic effects like that out of these fixes.

Response comment: If you like, I can explain to you sometime why this happens and help address it.

Ted: I’d love to know what the fix is, if it isn’t me having to manually take them out. As I say, there are people with much more depth in technical knowledge than I have. I am much more a content and end-user person.

Question/comment: [inaudible]

Thank you.

