The point of this patch is to fix a couple of inconsistencies in `viewer.html`, compared to the locale files, such that the viewer would work correctly even without the `l10n/` files present.
*Note:* Since this isn't changing any of the locale files, we should *not* need to update any of the string names.
Looking through the history of the findbar code, it seems that the `findPrevious`/`findNext` buttons have never had a `title` set (note PR 2168, which was the initial findbar implementation).
Furthermore, the `placeholder` of the `findInput` didn't agree 100% with locale file either, so this is also adjusted.
I noticed that we have a `.toolbarButton.group` CSS class, which is currently applied to some buttons in the viewer. Since it attempts to adjust the `margin-right` property, I was initially a bit puzzled as to why there wasn't different rules for LTR/RTL locales.
However, checking the viewer with the DevTools inspector, in both LTR and RTL locales, I quickly found that the rule in question is *always* being overridden by other CSS rules.
It thus seem to me that while this rule was probably useful at some point, it has been dead for years and could now be removed.
The find functionality is currently not available for small devices
because the find dialog is not responsive. This patch fixes that.
To achieve this goal, the HTML is changed to always show the find
button. To prevent issues because of the addition of an extra button for
small views, the previous/next page buttons are hidden if the view
becomes small. These buttons are not useful anyway because on small
devices navigation is usually done via scrolling. The find functionality
is much more useful to have in this case. Moreover, we wrap the existing
elements into separate `div`s so that the browser can position the
elements itself when the view becomes smaller and logically connected
elements stay together when this happens.
In the CSS, extra rules for the find bar have been added to ensure that
the dialog's doorhanger is always below the find button. All findbar
`div`s are forced to be 32 pixels high to prevent the find message text
being aligned under the checkboxes. Finally, the find message is only
visible when there is actually text to display. This prevents wrapping
issues because, by default, the label has padding and margin even if
there is no text.
Instead of just upstreaming the changes from [bug 1345253](https://bugzilla.mozilla.org/show_bug.cgi?id=1345253) as-is, it seemed better to simply get rid of the loops altogether and use the same approach as in `PDFViewer`/`PDFThumbnailViewer`.
First of all, `compatibility.js` is already available in `lib/shared/compatibility.js`. Second of all, as can be seen in 9142301f35 (diff-9432ebaa58e10ab02874fcb86f689caa), the `lib/web/compatibility.js` file cannot work since the `require` statement isn't compatible with the output of `gulp lib`.
*This is something that I noticed while working on PR 8126, which is (more) fallout from PR 6065.*
In general, it's actually *not* correct to return `Dict.empty` as the default value for non-existent properties. Please note that a prior PR, see https://github.com/mozilla/pdf.js/pull/5957#issuecomment-103112698, asked for that behaviour but I don't think that's right.
Obviously for properties that are (or should) be `Dict`s it makes sense, however certain properties can be e.g. Strings or Arrays instead. In the latter case, returning `Dict.empty` is just plain wrong, and it's quite fascinating that this hasn't caused any errors in practice. (The existing validation in the various getters has actually saved us here.)
Also, when looking at this code again, it seemed unnecessary to duplicate the `MAX_LOOP_COUNT` check since we could just return immediately instead.
When using content security headers to restrict connections to the same origin,
you may not make connections to `example.com`. This feature detection also
works with a request to the current location.
It appears that I accidentally broke this in PR 6065, sorry about that!
The issue in this particular PDF file is that there's `/Rotate` entries on different levels of the `/Pages` tree. We're supposed to use the `/Rotate` entry in the `/Page` dict (which is `0`), but because of an incorrect condition we instead ended up with the one from the `/Pages` dict (which is `180`).
Fixes 8125.
Given that the mailing list has now been closed, see [bug 1340296](https://bugzilla.mozilla.org/show_bug.cgi?id=1340296), and that there's no weekly meetings any more, we probably shouldn't mention either of those in CONTRIBUTING.md.
Instead, let's just suggest the IRC channel as a means of communication here.
Ideally, the `Annotation` class should not have anything to do with the
page's operator list. How annotations are added to the page's operator
list is logic that belongs in `src/core/document.js` instead where the
operator list is constructed.
Moreover, some comments have been added to clarify the intent of the
code.
Even though the PDF specification does not state that `Opt` fields are
inheritable, in practice there are PDF generators that let annotations
inherit the options from a parent.
This fixes something that I noticed while working with the code in `Catalog.getPageDict` when debugging issue 8088.
Note that while I don't have an example where this patch really matters, given that e.g. `PartialEvaluator.hasBlendModes` depends on the `objId` to avoid cyclic references this patch could potentially help for some PDF files.
As discussed on IRC, we need to check all nodes at the *bottom* of the tree to ensure that we find the correct `Page` dict.
Furthermore, this patch also gets rid of the caching present in a previous version, since it's not clear if that really helps.
Note that this patch purposely adds an `eq` test, using a reduced test-case, so that we can be sure that the algorithm actually finds the correct `Page` dict for each `pageIndex`.
Fixes 8088.
The download button in pdf.js doesn't work in iOS Chrome.
- It appears to be an issue with URLs from URL.createObjectURL.
The URL is correct, but iOS Chrome won't even load the URL
when `a.click()` is called in `download_manager.js`. Even
if you manually visit the URL, you get a blank page.
- Fix this by detecting iOS Chrome and disabling createObjectURL.
The `download_manager.js` `download` method wasn't checking
`PDFJS.disableCreateObjectURL`, so check it there, too.
- Move the navigator.msSaveBlob check earlier, so that
this doesn't change IE10 / IE11 behavior.
- Remove the !URL check since pdf.js has a URL polyfill
now.
[api-minor] Refactor fetching of built-in CMaps to utilize a factory on the `display` side instead, to allow users of the API to provide a custom CMap loading factory (e.g. for use with Node.js)
Currently the built-in CMap files are loaded in `src/core/cmap.js` using `XMLHttpRequest` directly. For some environments that might be a problem, hence this patch refactors that to instead use a factory to load built-in CMaps on the main thread and message the data to the worker thread.
This is inspired by other recent work, e.g. the addition of the `CanvasFactory`, and to a large extent on the IRC discussion starting at http://logs.glob.uno/?c=mozilla%23pdfjs&s=12+Oct+2016&e=12+Oct+2016#c53010.
Previously, we had a function called `getDefaultAppearance`. This name,
however, is misleading as the method gets the normal appearance (in the
`N` entry) and not the default appearance (in the `DA` entry). Moreover,
it was not entirely clear how it works just from reading the code. It
primarily lacks comments and explicit error case handling.
This patch improves the situation by fixing the issues mentioned above
and making this function a proper method of the `Annotation` class, just
like e.g., `setColor` and `setBorderStyle`.
This patch basically reverts one aspect of TrueType (3, 1) cmap parsing to the state prior to PR 4259. After that PR, a number of regressions occurred in this particular code-path, which necessitated a number of follow-ups such as PRs 5703, 5743, and 6425.
The empirical data suggests, at least to me, that we should always prefer a (3, 1) cmap for TrueType fonts when they have an encoding, regardless of the Symbolic font flag.
Obviously this patch passes all unit/font/reference tests locally, and I made sure that all the PRs mentioned above landed with test-cases included.
However, in my opinion, there's still a very real possibility that this patch could potentially cause new regressions.
Given that the PDF file in bug 1337429 has been broken for almost *three* years before anyone noticed, and considering that the code-path in question has been the source of numerous regressions, I do *not* intend to request uplift of this patch to previous Firefox versions (assuming that it's even accepted).
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1337429.
*Yet another thing that I unfortunately missed during review of PR 8023.*
Note that previously, in `make.js` this file was being preprocessed, however as far as I can tell that wasn't actually necessary. Hence this patch just copies the file to the proper output directory.
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1338395#c8.
Consume the current character when encountering illegal characters in `Lexer.getObject`, in order to prevent infinite loops during reading of streams (issue 8061)
*Please note:* The rendering of the PDF file in issue 8061 first regressed in PR 7039, and then PR 7493 exacerbated the problem even further by causing an infinite loop.
In this particular case, when errors were encountered inside of the `Lexer.getObject` method *itself*, we didn't advance the stream position. This thus caused an inifinite loop in `parseCMap`, since the exact same character was then parsed over and over again.
Fixes 8061.
[Chromium addon] Prevent errors that break the addon, caused by the `DEFAULT_URL` constant being replaced by a `defaultUrl` viewer configuration parameter (PR 8046 follow-up)
If users want to download, they can quickly click on the Download button
in the newly opened viewer.
The blobUrl logic for Firefox relies on `disableCreateObjectURL` is
never false in Firefox. If the assumption is invalid, then PDF
attachments at the attachment view will not correctly be displayed,
because a data-URL will be generated and `?<filename>` is treated as
part of the data:-URL.
Determine the page rotation at the same place as where the page size is
determined. This allows us to implement custom print page rotation logic
in one place, in the future.
Note that I initially tried to add this as a parameter to the `PDFPageProxy.render` method, such that it could be passed to `PartialEvaluator.getOperatorList`.
However, given all the different code-paths that call `getOperatorList` (there's a bunch only in `annotation.js`), this seemed to very quickly become unwieldy and thus difficult to maintain compared to simply using the existing `evaluatorOptions`.
The download method (and the PDF document properties) detect the
file name using `getPDFFileNameFromURL`. The title ought to also
display the PDF filename if available.
This patch lets the AcroForms example make use of the built-in interactive
forms functionality in PDF.js. This makes the example:
- much easier to understand;
- more feature-complete;
- in sync with the core when new functionality is added;
- similar to the other examples in terms of structure.
Fixes extra canvas create calls.
Fixes unnecessary call of `new DOMCanvasFactory`.
Fixes undefined error of DOMCanvasFactory.
Fixes failures in some of the tests.
Fixes expected behaviour.
Remove unused vars.
The `Driver._cleanup` method is removing all stylesheets between test runs, which causes "TypeError: styleElement.parentNode is null" console errors in `FontLoader.clear`.
As can also be seen during various tests, some of the changes I made in PR 7972 unfortunately causes console errors.
It seems that I didn't test this properly, since it *should* have been obvious to me that while tests are triggered using Node.js, the files in question are run within the *browser*.
My apologies for not testing this thoroughly, and for causing unnecessary churn in the code!
The last (and only) usage of `MOZ_CENTRAL` was removed in PR 3036, so it's been unused for almost four years now.
If we need to have different code-paths for `FIREFOX`/`MOZCENTRAL` builds, the preprocessor should (and has) been used instead.
See http://eslint.org/docs/rules/brace-style.
Having the opening/closing braces on the same line can often make the code slightly more difficult to read, in particular for `if`/`else if` statements, compared to using new lines.
This patch also, for consistency with `mozilla-central`, enables the [`no-iterator`](http://eslint.org/docs/rules/no-iterator) rule. Note that this rule didn't require a single code change.
When a blob or data-URL is opened with the extension, viewer.html
rewrites the URL. But when the viewer is refreshed (e.g. F5), Chrome
would fail to display the viewer because the extension router was not
set up to recognize such URLs.
Now it is.
The regular expression incorrectly marked a group as capturing.
For `http://example.com/#file.pdf`, the expected result is "file.pdf",
but instead "document.pdf" was returned.
This property was added all the way back in PR 542, but hasn't actually been relied upon ever since PR 692.
Note that there's a `isStream()` utility function which replaced the property years ago, hence the `isStream` property is now dead code.
Other PDF viewers, e.g. Adobe Reader, seem to append `FileAttachment`s to their attachments views.
One obvious difference in PDF.js is that we cannot append all the annotations on document load, since that would require parsing *every* page. Despite that, it still seems like a good idea to add `FileAttachment`s, since it's thus possible to access all the various types of attachments from a single place.
*Note:* With the previous patch we display a notification when a `FileAttachment` is added to the sidebar, which thus makes appending the contents of these annotations to the sidebar slightly more visible/useful.
A longstanding issue with the viewer is that you cannot tell if a PDF document includes an outline and/or attachments without actually opening the sidebar.
This patch contains a suggested solution for that, by displaying an hide-on-interaction notification on the `sidebarToggle` button (and the relevant sidebar view buttons). Note that this was inspired by e.g. the update notification that is displayed on the menu button in Firefox.
For an initial implementation, I've tried to do this in such a way that the notification isn't too distracting. Without being an UX expert, I don't think that we'd want something too in-your-face, in order to keep the viewer toolbars reasonable clean. (We probably do *not* want e.g. an entire notification bar in these situations, since that would take up unnecessary screen space and require actions from the user to close.)
However it's certainly possible that the current notification might simply be *too* inconspicuous to be truly helpful to users, but we could probably iterate on that if the feature itself is deemed useful.
Please see http://eslint.org/docs/rules/no-unused-vars; note that this patch purposely uses the same rule options as in `mozilla-central`, such that it fixes part of issue 7957.
It wasn't, in my opinion, entirely straightforward to enable this rule compared to the already existing rules. In many cases a `var descriptiveName = ...` format was used (more or less) to document the code, and I choose to place the old variable name in a trailing comment to not lose that information.
I welcome feedback on these changes, since it wasn't always entirely easy to know what changes made the most sense in every situation.
Given the nature of `EOF` and `isEOF`, it seems to me that they really ought to be placed in `core/primitives.js` instead.
In general, it doesn't seem great to have to depend on the entire `core/parser.js` file for such simple primitives/helper functions.
In particular, while `core/ps_parser.js` is completely separate from `core/parser.js` with regards to its function, it still depends on the latter for just *one* primitive.
Note that compared to e.g. PR 7389, this will not reduce the number of dependencies for `core/ps_parser`, however the new dependency IMHO makes more sense.
Given that this patch causes a lot of churn in the addon code, I wouldn't really mind if we ultimately decide against doing this and just add a rule exception in mozilla-central instead.[1]
---
[1] Note that I used the ESLint `--fix` option, hence writing this commit message actually took longer time than the creation of the patch :-)
PR 7322 added the `PdfJsNetwork.jsm` file, instead of the general `src/core/network.js` file for the Firefox addon. However, `make.js` wasn't updated to actually stop including the now obsolete network file.
*This fixes a regression from commit c9a0955c9c, i.e. PR 7738.*
Currently if you quickly rotate a document at least *twice*,[1] such that rendering of a page hasn't finished for the first rotation before the last rotation is triggered, the `cssTransform` method can fail to update the page correctly leading to it looking temporarily distorted.
The reason why things break is that previously we stored the `viewport` on the canvas DOM element, meaning that when it was accessed in `cssTransform` is was guaranteed to point to the `viewport` of the `zoomLayer` canvas.
Generally you want to avoid storing data on DOM elements this way, and during the `PDFPageView` refactoring needed to support SVG rendering, the previous `viewport` was instead stored directly on `PDFPageView`.
However, the problem is first of all that the `paintedViewport` only stores the *last* `viewport` computed, and second of all that there're no guarantees that it actually applies to the current `zoomLayer` canvas.
If a document is rotated slowly enough that rendering finishes *before* the next rotation then this problem doesn't exist, but for sufficiently quick rotations rendering will be cancelled at least once and the `paintedViewport` could thus be bogus.
The solution for the above problems is to ensure that we track the correct `viewport` for each DOM element (canvas or svg),[2] which seemed easist to do with a `WeakMap`.[3]
---
[1] I'm able to reproduce this using the `tracemonkey` file, but please note that for pages with few operations, i.e. that render very quickly, the effect may be hard to spot.
[2] One other possible solution that I briefly considered, was to wait until rendering finished before storing the current `viewport`. However, that would have caused issues with rotating a page before the *first* rendering operation had finished.
[3] This regression took me way longer to both figure out, and fix, than I'd like to admit :-)
*Please note: ignoring whitespace changes is most likely necessary for the diff to be readable.*
This patch addresses all the current, in `mozilla-central`, linting failures in the addon. It should thus be possible to change the `.eslintignore` entry for PDF.js in `mozilla-central` from `browser/extensions/pdfjs/**` to `browser/extensions/pdfjs/build/**` and `browser/extensions/pdfjs/web/**` instead.
Note that we cannot, for backwards compatibility reason of the general PDF.js library, at this time make similar changes for files residing in the `build` and `web` directories in `mozilla-central`.
The main changes in this patch are that we now use [classes](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Classes) instead of our previous "class-like" functions, and also use the more compact [object shorthand notation](https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Operators/Object_initializer#New_notations_in_ECMAScript_2015).
A couple of functions were also converted to [arrow functions](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Functions/Arrow_functions), to reduced usages of `bind(this)` and `var self = this`.
One caveat with ES6 classes is that it's not (yet) possible to define private constants/helper functions within them, which is why the `NetworkManagerClosure` was kept to not change the visibility of those constant/functions.
Besides testing in Firefox Nightly 53, this patch has also been tested in Firefox ESR 45 and SeaMonkey 2.46.
However, I'd gladly welcome help with testing the patch more, to ensure that nothing has gone wrong during the refactoring.
Fixes the first bullet point of issue 7957.
http://eslint.org/docs/rules/no-lone-blocks
Note that we currently have no code that violates this rule in the source files, but it seems that the built files are possibly affected (see issue 7957).
Please see http://eslint.org/docs/rules/spaced-comment.
Note that the exceptions added for `line` comments are intended to still allow use of the old preprocessor without linting errors.
Also, I took the opportunity to improve the grammar slightly (w.r.t. capitalization and punctuation) for comments touched in the patch.
Further adjust the heuristics used to detect OpenType font files with CFF data, to ensure that all Type0 fonts are handled the same way regardless of font Subtype (issue 7901)
Rename `annotation_layer_spec.js` to `annotation_spec.js` to better describe what is actually tested, and simplify the `FileAttachmentAnnotation` unit-test to avoid having to use the entire API in the test
Every other unit-test in `annotation_spec.js` is already only testing the annotation code. Hence it seems unnecessarily convoluted to make use of the API here, when we can (fairly) simply provide the necessary data explicitly as in all the other annotation unit-test.
Currently you have to open the files to be certain which tests each one will run, which we can avoid by appending the directory names to the file names of the tests themselves.
We're currently making use of `uniquePrefix`/`idCounters` in multiple files, to create unique object id's, and adding a new occurrence of them requires some care to ensure that an object id isn't accidentally reused.
Furthermore, having to pass around multiple parameters as we currently do seem like something you want to avoid.
Instead, this patch adds a factory which means that there's only *one* thing that needs to be passed around. And since it's now only necessary to call a method in order to obtain a unique object id, the details are thus abstracted away at the call-sites which avoids accidental reuse of object id's.
To test that this works as expected a very simple `Page` unit-test is added, and the existing `Annotation layer` tests are also adjusted slightly.
Using `else` after `return` is not necessary, and can often lead to unnecessarily cluttered code. By using the `no-else-return` rule in ESLint we can avoid this pattern, see http://eslint.org/docs/rules/no-else-return.
According to https://wiki.mozilla.org/RapidRelease/Calendar#Past_branch_dates: The *last* ESR version of Firefox 38 was released in April this year, and since June the only available ESR version has been based on Firefox 45.
Now that Seamonkey has *finally* released a new version, i.e. 2.46 which should correspond to Firefox 49, there doesn't seem to be any reason to keep the fallback code around in the addon anymore.
[api-minor] Ensure that the `getDocument` Promise is rejected if the `loadingTask` is destroyed, or an `Error` is thrown, inside of the `onPassword` callback (issue 7806)
We're already passing in a, currently unused, `PdfManager` instance when initializing the `XRef`. To avoid having to pass a single `password` parameter around, we could thus simply get the `password` through the `PdfManager` instance instead.
This patch also removes the `UpdatePassword` message, in favour of using the `sendWithPromise` method of `MessageHandler`.
Furthermore, the patch also refactors the `BasePdfManager_updatePassword`/`BasePdfManager_passwordChanged` methods (in pdf_manager.js), and the `pdfManagerReady` function (in worker.js).
I recently happened to look at the code I wrote for PR 5964, which fixed [bug 1157493](https://bugzilla.mozilla.org/show_bug.cgi?id=1157493), and I quickly realized that the solution is way too simplistic.
The fact that only using the `length` of a `Differences` array worked seems more like a happy accident for a particular set of font data, but could just as easily be incorrect for other PDF files.
Note that in practice, the case where the `Encoding` entry is a regular `Dict` (and not a `Ref` or `Name`) is very rare, hence I don't think that we really need to worry about having to reparse this data.
Also, the performance of this code-block is quite a bit better by updating the `hash` with the data from the *entire* `Differences` array, instead of at every loop iteration.
Let `finishPaintTask` in pdf_page_view.js return a promise instead, to avoid having to throw in the `paintTask.promise` rejection handler, and don't reject the `PDFPageView_draw` promise when rendering is `cancelled`
Changing this particular code makes me somewhat nervous about regressions, since PR 5770 necessitated the follow-up PR 6270.
However, the patch passes all tests added in those PRs (and obviously all other tests). Furthermore, I've manually checked all the issues/bugs referenced in PRs 5770 and 6270 without finding any issues.
**Please note:** This patch fixes *only* the font bug, not the SVG conversion, present on pages two and three of the PDF file in issue 7901.
As mentioned on IRC yesterday, we currently throw even when rendering is `cancelled`, which is annoying when the devtools are active. Furthermore, since `cancelled` isn't really an error, rejecting the `PDFPageView_draw` promise seems somewhat strange in that case.
Modern browsers support styling radio buttons and checkboxes with CSS.
This makes the implementation much easier, and the fallback for older
browsers is still decent.
I haven't got an example where the current code breaks, but given all the previous cases we've seen where PDF generators use indirect objects in Arrays it makes sense to fix this pro-actively.
I've modified the relevant unit-tests slightly, and they would *not* pass without the code changes in this patch.
*Note:* `Dict_getArray` only dereferences Array elements on the "top-level", to avoid recursion issues. Furthermore if you have to loop through the Array at the call-site anyway, then using `Dict_get` in combination with `XRef_fetchIfRef` is a tiny bit more efficient.
*Please note that most of the necessary code adjustments were made in PR 7890.*
ESLint has a number of advantageous properties, compared to JSHint. Among those are:
- The ability to find subtle bugs, thanks to more rules (e.g. PR 7881).
- Much more customizable in general, and many rules allow fine-tuned behaviour rather than the just the on/off rules in JSHint.
- Many more rules that can help developers avoid bugs, and a lot of rules that can be used to enforce a consistent coding style. The latter should be particularily useful for new contributors (and reduce the amount of stylistic review comments necessary).
- The ability to easily specify exactly what rules to use/not to use, as opposed to JSHint which has a default set. *Note:* in future JSHint version some of the rules we depend on will be removed, according to warnings in http://jshint.com/docs/options/, so we wouldn't be able to update without losing lint coverage.
- More easily disable one, or more, rules temporarily. In JSHint this requires using a numeric code, which isn't very user friendly, whereas in ESLint the rule name is simply used instead.
By default there's no rules enabled in ESLint, but there are some default rule sets available. However, to prevent linting failures if we update ESLint in the future, it seemed easier to just explicitly specify what rules we want.
Obviously this makes the ESLint config file somewhat bigger than the old JSHint config file, but given how rarely that one has been updated over the years I don't think that matters too much.
I've tried, to the best of my ability, to ensure that we enable the same rules for ESLint that we had for JSHint. Furthermore, I've also enabled a number of rules that seemed to make sense, both to catch possible errors *and* various style guide violations.
Despite the ESLint README claiming that it's slower that JSHint, https://github.com/eslint/eslint#how-does-eslint-performance-compare-to-jshint, locally this patch actually reduces the runtime for `gulp` lint (by approximately 20-25%).
A couple of stylistic rules that would have been nice to enable, but where our code currently differs to much to make it feasible:
- `comma-dangle`, controls trailing commas in Objects and Arrays (among others).
- `object-curly-spacing`, controls spacing inside of Objects.
- `spaced-comment`, used to enforce spaces after `//` and `/*. (This is made difficult by the fact that there's still some usage of the old preprocessor left.)
Rules that I indend to look into possibly enabling in follow-ups, if it seems to make sense: `no-else-return`, `no-lonely-if`, `brace-style` with the `allowSingleLine` parameter removed.
Useful links:
- http://eslint.org/docs/user-guide/configuring
- http://eslint.org/docs/rules/
Move the `Preferences` initialization/fetching code to the top of `PDFViewerApplication.initialize`, to enable using them when initializing e.g. the viewer components
Ideally we'd remove the 'localized' event from the `eventBus`, but for backwards compatibility we keep it in `GENERIC` builds.
Note that while we want to ensure that the direction attribute of the HTML is updated as soon as the `localized` Promise is resolved, we purposely wait until the viewer has been initialized to ensure that the 'localized' event will always be dispatched.
With `bindEvents()` now being called after the viewer has been initialized, we no longer need to have `PDFViewerApplication.initialized` checks in the event handler functions.
Furthermore by moving the `window.addEventListener`s to a helper method, `PDFViewerApplication.initialized` checks are no longer necessary in the event handlers, hence we thus address part of issue 7797 here as well.
Note that in quick testing using `console.time/timeEnd`, both locally and with the Firefox addon, the total run time of the *entire* `PDFViewerApplication.initialize` function does not seem to change with this patch.
This patch refactors the `CropBox` code to combine fetching and
validation code in a getter, like we already did for the `MediaBox`
property. Combined with variable name changes, this improves readability
of the code and makes the `view` getter simpler as well.
I've not actually, thus far, come across a PDF file that this patch fixes. However, given the string of recent patches that has fixed issues with indirect objects in arrays, I think that it makes sense to proactively avoid any issues in this code.
I just realized that none of our current unit-tests cover this particular part of the Page Label parsing code, hence this patch adjusts an existing test PDF to include a "St" entry in the Page Label dictionary.
This patch avoids the creation of extra arrays when initializing an
array with default (zero) values. Doing this additionally makes the code
more readable by allocating enough space for the number of color
components.
It seems that for normal web pages, at least in Firefox, the keyboard shortcuts <kbd>Ctrl</kbd> + <kbd>Up</kbd>/<kbd>Down</kbd> are functionally equivalent to <kbd>Home</kbd>/<kbd>End</kbd>. This is obviously an edge-case, but can be easily implemented by using the same logic as we do for <kbd>Home</kbd>/<kbd>End</kbd>.
Fixes 7852.
*Please note:* I'm finding it slightly difficult to interpret issue 7852, and bug 1285719, since among other things: the title includes the word "reverse" with no other mention of it, and the STR makes reference to print preview which doesn't seem applicable to the PDF viewer.
However, compared to regular web pages in Firefox, I think the behavior of this patch makes sense here.
For Arabic characters, the Unicode character codes are mapped to Unicode
character types using the character codes for indexing. However, the
character code 0x061D is undefined (and therefore invalid) in the
Unicode standard. The imported list does not contain this entry, but not
having it in the list breaks indexing for items after it. Therefore, put
an empty string on its position to make indexing work properly and issue
a warning in the unlikely event that we encounter this character.
This patch moves the user agent checks to the top of the file to reduce
duplication and to provide a clear overview of which user agent we are
detecting.
Moreover, we extract inline user agent checks as well and use existing
checks in more places. Finally, we fix the indenting in one place for
consistency.
*This patch fixes something that I noticed while debugging https://bugzilla.mozilla.org/show_bug.cgi?id=1308536.*
The PDF file contains a font called "NuptialScript", which unfortunately is not embedded. Since that is a non-standard font we will not be able to render it entirely correct. However, by adding "NuptialScript" to the `getNonStdFontMap`, we can at least improve the rendering slightly by using an italic (serif) fallback font.
This patch adds support for non-embedded Arial Black fonts, that use a `Arial-Black...` format for the font names.
Also, this patch changes `canvas.js` such that we always render Arial Black fonts with the maximum weight, which actually improves a number of existing test-cases. This should thus explain the test "failures", which are clear improvements compared with e.g. Adobe Reader.
Fixes 7835.
For consistency, I also renamed the `FIREFOX/MOZCENTRAL` sessionStorage key, but given that sessionStorage is a lot less permanent than localStorage it didn't seem necessary to migrate any existing values.
Fixes 7760.
Currently if you try to enable SVG rendering in the addons, a `TypeError` is thrown by the browser since we have code that depends on what `paintOnCanvas`/`paintOnSvg` (should) return.
For commands with with too few arguments, clear out `args` if it's an Array instead of replacing it with `null` in `EvaluatorPreprocessor_read` (issue 7804)
For `PartialEvaluator_getTextContent`, the same `args` Array should be re-used for every `EvaluatorPreprocessor_read` call. Hence we want to ensure that it's not accidentally replaced with `null` in `EvaluatorPreprocessor_read`, since otherwise corrupt PDF files (with too few arguments for certain commands) will cause errors in `PartialEvaluator_getTextContent`.
Perhaps a micro-optimization, but this patch also changes two `!args` comparisons to `args === null`, since that should be a tiny bit more efficient.
According to e.g. issue 7754, it appears that the current `isSafari` check is failing in newer version of the browser. Despite the fact that checking the userAgent is an anti-pattern, which should be avoided, it's currently the simplest solution.
By only allowing very specific type of `JavaScript` actions, and also utilizing the existing `URL` validation, this patch shouldn't pose too much risk.
Fixes one of the points in issue 3897 (with the PDF file taken from issue 3438).
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=843699 (probably, since that bug doesn't contain a test-case).
Given that the `customScaleOption` should already be hidden, provided that the browser supports that, this patch also prevents it from being accessible via the keyboard.
As far as my testing goes in various browsers, this doesn't seem to have any ill effects, and note that we're already explicitly ignoring the `custom` value in the `select` event handler.
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1315608.
This patch resolves the responsiveness issues for the toolbar in the
viewer. Depending on the language (for example the Dutch language),
elements could overlap when the viewport size is reduced.
The main issue here is that the CSS rules are unnecessarily complex and
handle lots of different cases (LTR/RTL, displacements for specific
viewport widths, et cetera). By removing this complexity and letting the
browser handle the responsiveness, we not only get simpler CSS rules and
HTML mark-up, but the responsiveness issues are mostly fixed at the same
time. We no longer have to position the elements manually (by setting
their `left` attribute value) anymore.
It seems that certain bad PDF generators can create badly encoded "Prefix" entries for Page Labels, one example being http://ukjewishfilm.org/wp-content/uploads/2015/09/Jewish-Film-Festival-Programme-ONLINE.pdf.
Unfortunately I didn't come across such a PDF file while adding the API support for Page Labels, but with them now being used in the viewer I just found this issue. With this patch, we now display the Page Labels in the same way as Adobe Reader.
The original code is difficult to read and, more importantly, performs
actions that are not described in the specification. It replaces empty
names with a backtick and an index, but this behavior is not described
in the specification. While the specification is not entirely clear
about what should happen in this case, it does specify that the `T`
field is optional and that multiple field dictionaries may have the same
fully qualified name, so to achieve this it makes the most sense to
ignore missing `T` fields during construction of the field name. This is
the most specification-compliant solution and, judging by opened issue #6623, also the required and expected behavior.
I got tired of staring at a bunch of localization warnings every time that I open the console, hence this patch adds the missing translations to the Swedish locale.
- Renamed startPrint to performPrint to emphasize that the method
does not start the print process (preparing pages for the printer),
but that it does the actual printing (sending pages off to the
printer).
- Put performPrint in the PDFPrintService, so that it can be
overridden if needed.
- Move the global scratchCanvas to PDFPrintService. This is mainly to
make it easier to reason about the state of scratchCanvas. In practice
there is no difference because only one PDFPrintService instance can
be instantiated at any given time.
- Move all logic of using the rendered page to one location.
This makes it easier to replace the printing logic later, when I add
special handling to out-of-process frames in the Chrome extension.
Make sure that the print service is stopped as soon as possible when
aborted, and that it is not possible for a (slow) promise to
accidentally wipe the state of a print job that was started later.
Remove/deprecate specifying a pageNumber directly after the hash symbol (#), to improve compatibility since other PDF viewers don't support this form (issue 7746)
There's no mention of our `#{pagenum}` form in http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/pdf_open_parameters.pdf, and Adobe Reader doesn't seem to support it either.
Hence this patch removes support for it in the extensions, but keeps it in the `GENERIC` build with a deprecation warning and a fallback to handle it as a destination.
Fixes 7746.
This patch implements the page label functionality in a similar way as Adobe Reader.
For documents with page labels, if a non-existent page label is entered we'll try to fallback to the page number instead.
The patch also includes a preference (`disablePageLabels`), to make it easy to opt-out of using page labels if the user/implementor so wishes.
The way that `get/set currentPageLabel` is implemented in `PDFViewer`, is as wrappers for the corresponding `get/set currentPageNumber` functions, since that seemed like the cleanest solution.
The page labels are purposely *only* added to the page controls in the viewer UI, and not stored in e.g. the `ViewHistory`. Since doing so would mean adding unnecessary code complexity, without any real added value, and would also mean delaying the inital loading of PDF documents.
Note that this patch will ignore page labels if they are identical to standard page numbering, since in this case displaying the page labels adds no value (but only UI noise). The reason for handling this case specially, is that in practice a surprising number of PDF files include "pointless" page labels.
The following reasoning was used for deciding to remove the "Page: " label, and replace it with a tooltip, from the main toolbar:
- We have no other visible labels in the *main* toolbar (e.g. the Zoom dropdown doesn't have a label, but only a tooltip).
- We already hide the "Page: " label when the viewer is narrow.
- The varying width of the "Page: " label in different locales is already causing issues for many languages, with overlap in the main toolbar as a result.
Trying to create responsive CSS styles that works well in all locales is already difficult, and if we add support for page labels that will only further compound the issues.
- Some PDF viewers (e.g. Adobe Reader, pdfium in Chrome) doesn't show labels in the UI by default.
In some PDF files, the first element (i.e. the one containing either a `Ref` or a `Number` pointing to a page) of the explicit destination Array may be bogus.
One such example is actually the file `pdf.pdf` in the test-suite, where some destinations are incompletely specified. One such example being the `G1.998360` destination whose explicit destination Array contains `[null, /XYZ, 54, 488, null]`, i.e. the destination page is `null`.
Hence this patch adds a bit more validation for that case. It also adds an additional check to ensure that the resulting `pageNumber` is non-negative, and finally a couple more error messages for existing cases of malformed data.
In general we neither want, nor can, support arbitrary `Launch` actions. But in practice, all the cases we've seen so far just contains relative URLs to other PDF files. Building on PR 7689, we can thus at least support basic `Launch` actions.
Note that in `FIREFOX/MOZCENTRAL/CHROME` builds of the standard viewer the `docBaseUrl` parameter will be set by default, since in that case it makes sense to use the current URL as a base.
For the `GENERIC` viewer, or the API itself, it doesn't make sense to try and set the `docBaseUrl` by default. However, custom deployments/implementations may still find the parameter useful.
Note that this will automatically reject any relative URL.
To make the API more useful to consumers, URLs that are rejected will be available via the `unsafeUrl` property in the data object returned by `PDFPageProxy_getAnnotations`.
The patch also adds a bit more validation of the data for `Named` actions.
This not only reduces code duplication, but it also allow us to easily support the same kind of URLs we currently do for Link annotations in the Outline as well.
This requires the `run-sequence` dependency because Gulp does not have a
way to run sequences of tasks inside a Gulp task. Gulp 4.0 will have
support for this, but until that is released this is the recommended way
to implement it.
The outline toggle button has a feature where it can be double-clicked
to expand/collapse all items shown therein. Although this is described
in the FAQ, can go potentially unnoticed. This, however, being a useful
feature, advertise on the tool tip itself.
l10n translation for en-US and IDs updated.
Signed-off-by: Jeenu Viswambharan <jeenuv@gmail.com>
Each well-formed SVG image has the following structure:
SVG element
- Definitions element
- Root group
- Other group 1
- ...
- Other group n
This patch factors out initialization code into a private method in such
as way that the creation of this structure is clear from the code. The
root group is the replacement for the parent group from before. We need
this group as we cannot apply the viewport transform on the SVG element
itself (this caused issues in Chrome). If other code appends groups to
the SVG image, in reality it is appending those groups to the root
group, but this detail is abstracted away by this patch.
This patch ensures that we only create transformation groups when it is
actually required and that we re-use transform groups as much as possible.
It reduces the number of transform groups for the Tracemonkey paper from
2790 to 1271, thereby making the DOM much lighter and rendering/scrolling
smoother. Moreover, it simplifies the code and prevents duplication.
Finally, we issue a warning when an unimplemented graphic state is
encountered. Before, this was ignored silently, making debugging harder.
This patch:
- resolves a warning in the console about missing character encoding;
- makes the viewer use the same background color and PDF file as the
regular viewer;
- simplifies the example to bring it more in line with the other
examples.
Let `Parser_makeFilter` pass in the `DecodeParms` data to various image `Stream`s, instead of re-fetching it in various `[...]Stream.prototype.ensureBuffer` methods
In `Parser_filter` the `DecodeParms` data is fetched and passed to `Parser_makeFilter`, where we also make sure that a `Ref` is resolved to a direct object.
We can thus pass this along to the various image `Stream` constructors, to avoid the current situation where we lookup/resolve data that is already available.
Note also that we currently do *not* handle the case where `DecodeParms` is an Array entirely correct in the various image `Stream`s, and this patch fixes that for free.
While the array argument to TJ should only contain strings and numbers, other
unfortunate items are found in PDFs in the wild, e.g.:
[(Grandes) 0.0 Tc
-250.0 (Client\350les,) 0.0 Tc
-250.0 (Financements) 0.0 Tc
-250.0 (et) 0.0 Tc
-250.0 (March\351s) ] TJ
getOperatorList already properly ignores any non-string, non-numeric values in
TJ arrays; without this patch to getTextContent, returned text items can have
NaN widths due to calculations being applied to those non-numeric values.
For PDF files with multiple `/Filter`s, where the `/Length` entry is zero, we fail to render the file correctly. The reason is that `maybeLength` is `null` for the every filter except the first, and `!maybeLength` is thus truthy.
Hence it seems that we should completely ignore the `/Length` entry and also explicitly check `maybeLength === 0`.
Note that I've not (yet) come across a PDF file with this issue in the wild, but given all the stupid things PDF generators do I wouldn't be surprised if such a file actually exists. In order to prevent a possible future bug, I'm submitting this patch which includes a hand-edited PDF file that we currently cannot render correctly (but e.g. Adobe Reader can).
Directly use the hexadecimal representation, just like the
`AnnotationFlags`, to avoid calculations and to improve readability.
This allows us to simplify the unit tests for text widget annotations as
well.
When opening a PDF file that triggers the browser fallback bar in the Firefox addon/built-in version, e.g. http://web.archive.org/web/20110918100215/http://www.irs.gov/pub/irs-pdf/f1040.pdf (with forms *disabled*), and then reloading the document an error can be thrown.
The reason is that displaying the fallback bar triggers 'resize' events, and they can arrive *before* the viewer has had a chance to run all the necessary initialization code.
Unfortunately PR 7633 missed, and I didn't catch it during review, to update `test/driver.js` such that the `forms` tests takes the correct code-path.
This resultet in the `forms` reference test images looking better than they should, and more problematicly differing from the rendering in the viewer.
With this patch, the tests now correctly skip over any `Appearance` streams.
The `forms` tests now highlights quite clearly (e.g. look at `annotation-tx2.pdf`/`annotation-tx3.pdf`) that we cannot just skip the `Appearance` streams when rendering forms. Hence we're going to have to find a way to fix that *before* enabling forms by default, since both display *and* print would look completely wrong otherwise.
Finally, this patch also uncovers one more existing bug that still needs to be fixed, since the current `rasterizeAnnotationLayer` in `test/driver.js` isn't able to handle the contents of e.g. `<input>` and `<textarea>`.
When debugging issue 7643, I noticed that the `forms` tests currently doesn't look like the rendering in the viewer (with `renderInteractiveForms = true` set).
After scratching my head for a little while, I realized that PR 7633 make the implicit assumption that `Page_getOperatorList` (in `core/document.js`) is called *before* fetching the annotation with `PDFPageProxy_getAnnotations` (in `display/api.js`).
Hence this patch, that changes it so that we instead pass in the `renderInteractiveForms` parameter to `Annotation_appendToOperatorList` to ensure that it's always correctly set.
A user encountered a response that looks like:
URL: some gibberish
Headers:
Content-Type: application/octet-stream
Content-Disposition: attachment; filename="something.pdf"
In the Chrome extension, the "attachment" content disposition is almost
always ignored (i.e. the PDF Viewer will try to view it anyway). So we
need to fall back to the Content-Disposition header if the URL check is
inconclusive.
If interactive forms are enabled, then the display layer takes care of
rendering the form elements. There is no need to draw them on the canvas
as well. This also leads to issues when values are prefilled, because
the text fields are transparent, so the contents that have been rendered
onto the canvas will be visible too.
We address this issue by passing the `renderInteractiveForms` parameter
to the render task and handling it when the page is rendered (i.e., when
the canvas is rendered).
This patch improves the unit tests by testing the support for read-only
and multiline fields. Moreover, we add a reference test to ensure that
the text widgets are not only rendered, but also that their contents are
styled properly.
Finally, we perform minor improvements in `src/core/annotation.js`, for
example adding missing comments.
For embedded Type1 fonts without included `ToUnicode`/`Encoding` data, attempt to improve text selection by using the `builtInEncoding` to amend the `toUnicode` map (issue 6901, issue 7182, issue 7217, bug 917796, bug 1242142)
I intended to provide proper benchmarking results here, as outlined in https://github.com/mozilla/pdf.js/wiki/Benchmarking-your-changes, but after wasting a couple of hours over the weekend getting weird results I gave up.
It appears that there's a lot of, i.e. way too much, variance between subsequent runs of `text` tests for the results to be meaningful.
(Previously I've only benchmarked `eq` tests, so I don't know if the `text` tests has never worked well or if it's a newer problem. For reference, please see the results of back-to-back benchmark runs on the current `master` with a *very* simple manifest file: [link here].)
Instead I used `console.time/timeEnd` in `appendText` and `expandTextDivs` to be able to compare the performance with/without the patch. The entire viewer was (skip-cache) reloaded between measurements, and the result are available here: [link here].
Given the troubles I've had with benchmarking, I've not yet computed any statistics on the results (e.g. mean, variance, confidence intervals, and so on).
However, just by looking at the data I think it's safe to say that this patch first of all doesn't seem to regress the current performance. Secondly it certainly looks *very* likely that this patch actually improves the performance, especially for the one-glyph-per-text-div case (cf. issue 7224).
Re: issue 7584.
Even though this patch passes all tests (unit/font/reference) locally, including the new ones that I added in PR 7621, I'm still a bit nervous about modifying the code that choose the fallback encoding for fonts without an `/Encoding` entry.
Note that over the years this code has been changed on a number of occasions, see a possibly incomplete [list here], to deal with various cases of incorrect font data.
According to the PDF specification, see http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G8.1904184, it seems that we should fallback to the `StandardEncoding` for Nonsymbolic fonts.
There's obviously a risk that fixing this particular issue *could* break other PDF files for which we don't have tests. However I've tried to change the logic as little as possible in this patch, to hopefully reduce possible breakage.
Based on debugging numerous font issue, it seems that a lot of fonts actually set the Symbolic flag, even when they are in fact *not* Symbolic. Fonts actually marked as Nonsymbolic seem to be somewhat less common, which I hope should reduce the risk of the patch somewhat.
Fixes 7580.
Note that in `parseCodestream` I purposly left the `throw new Error` instances inside of the `try` block, since we don't want to throw any `Errors` while in recovery mode.
Finally somewhat unrelated to the rest of the patch, but I moved the `doNotRecover` variable declaration outside of the `try` block to avoid variable hoisting given that it's accessed inside the `catch` block.
Note that in order to prevent any possible issues, this patch does *not* try to amend the `toUnicode` data for Type1 fonts that contain either `ToUnicode` or `Encoding` entries in the font dictionary.
Fixes, or at least improves, issues/bugs such as e.g. 6658, 6901, 7182, 7217, bug 917796, bug 1242142.
Moreover, we refactor the code a bit to extract code that is shared
between the two branches and we only apply text alignment (and create
the array) when it is actually defined, since it's optional and left is
already the default.
I've started to look into how we can fix issue 7580, but quickly became worried that fixing it could easily mean that we'd trade one fixed PDF file for a multitude of broken ones.
Hence I started going through the history of the code that choose the fallback encoding, and noticed that it has been changed a number of times over the years to deal with various cases of weirdness/errors in non-embedded fonts.
To my relief it turned out that almost all the PRs, please see a possibly incomplete [list here], that changed this code actually included `eq` test-cases.
However, in one case it appears that a PR missed to add a test-case. Furthermore since the fallback encoding may also be the only source for creating a `toUnicode` map, changing the encoding could possibly regress only the text-selection despite a PDF file still rendering correctly.
Therefore, this PR adds one new `eq` test, and also a number of additional `text` tests for PDF files already present in the test-suite.
Note that it's obviously possible that there's a certain overlap between the added tests, but I'd be *a whole lot* more concerned with causing regressions.
Don't duplicate the first entry in the `charCodeToGlyphId` map for CIDFontType2 fonts with a `CIDToGIDMap` that already mapped the first entry to a non-zero `glyphId` (issue 7544)
There's really no good reason to attempt to adjust the `max-height` of the `SecondaryToolbar` when it's not visible, so let's not do that anymore.
Also, we can listen for the `resize` event in `SecondaryToolbar`, to avoid having to manually call the `max-height` adjust function from various event handlers in `app.js`. Please note that by always adjusting the `max-height` when the toolbar is opened we no longer need the `localized` event, since it was mainly used to set a correct inital `max-height` value.
Please note that this is a hack, but I think that it should be OK for now to atleast get the preference landed. Refer to the code comment for further information.
Re: issue 7584 and PR 7586.
This patch is the first step towards implementing support for
interactive forms (AcroForms). It makes it possible to render text
widget annotations exactly like Adobe Reader/Acrobat.
Everything we implement for AcroForms is disabled by default using a
preference, mainly because it is not ready to use yet, but has to
implemented in many steps to avoid complexity. The preference allows us
to work with the code while not exposing the behavior by default. Mainly
storing entered values and printing them is still absent, which would be
minimal requirements for enabling this by default.
This patch is yet another instalment in the (never ending) series of patches for PDF files that specify completely incorrect Type/Subtype for its fonts. In this case Type1/Type1C, when in fact OpenType would have been correct.
Fixes 7598.
Currently, we only support text widget annotations (field type 'Tx')
partially. However, the current code does not make this entirely clear
and does not provide a warning when an unsupported field type is
encountered, making it harder to determine why rendering fails.
Moreover, in the display layer we make no distinction between the
various types of widget annotations, causing the code for text widget
annotations to also be executed for other types of widget annotations in
a fallback situation.
This patch improves the structure of the widget annotation code. In the
core layer, we use the same structure we use for non-widget annotations
in the factory and provide a clear warning when an unsupported type is
encountered. In the display layer, we do the same and split the
`WidgetAnnotationElement` class into two classes, namely
`TextWidgetAnnotationElement` for text widget annotations and
`WidgetAnnotationElement` for other unsupported annotations as a
fallback. From this it clear that we only support text widget
annotations and nothing else.
In the case where the document was destroyed, we were rejecting the `Promise` in `JpegDecode` with a string instead of an `Error`. The patch also brings the wording more inline with other such rejections.
Use the `isInt` utility function when validating the `pageNumber` parameter in `WorkerTransport_getPage`, to make it more obvious what's actually happening. There's also a couple more unit-tests added, to ensure that we always fail in the expected way.
Finally, we can simplify the rejection handling in `WorkerTransport_getPageIndexByRef` somewhat. (Note that the only reason for using `catch` here is that since the promise is rejected on the worker side, the `reason` becomes a string instead of an `Error` which is why we "re-reject" on the display side.)
This patch avoids having to calculate the scale twice by saving it in
the properties object.
Moreover, we remove a temporary variable and place parentheses around a
calculation inside a string concatenation.
This allows us to remove the `try/catch` statements used in `src/core/stream.js` when parsing JPEG images.
As far as I can tell, the only reason for the current usage of plain `throw` is that `jpg.js` originally was external code. Given that this code now lives in our repo, this patch brings the JPEG code more in line with e.g. `src/core/jpx.js` and `src/core/jbig2.js`.
Assign the `quantizationTables` after parsing the entire JPEG image, to prevent issues when the DQT (Define Quantization Tables) marker is encountered after SOF{n} (Start of Frame) markers (issue 7406)
This patch improves performance by avoiding unnecessary type
conversions, which also help the JIT for optimizations.
Moreover, this patch fixes issues with the div expansion code where
`textScale` would be undefined in a division. Because of the `dataset`
usage, other comparisons evaluated to `true` while `false` would have
been correct. This makes the expansion mode now work correctly for cases
with, for example, each glyph in one div.
The polyfill for `WeakMap` has been provided by @yurydelendik.
Move the `Preferences` initialization/fetching code to the top of `PDFViewerApplication.initialize`, and add a `enhanceTextSelection` preference to the viewer
We pass many parameters to `appendText` while we might as well pass the
`task` object that contains them. This saves a few lines of code and
makes the signature of `appendText` more clear. We do the same for
`expand`, which is useful for the next commit in which we replace
`div.dataset` with a `WeakMap`.
Furthermore, this patch adds a missing parameter to a comment block to
make it clear which parameters remain.
When clicking on the `pageNumber` input, or when using the keyboard shortcut <kbd>Ctrl</kbd>+<kbd>Alt</kbd>+<kbd>G</kbd>, the element isn't just focused but its contents is actually *selected* so that the user doesn't need to clear the input before entering a new `pageNumber`.
However, the `GoToPage` named action is only using `focus`, so let's change this to be consistent.
The following document can be used for testing: http://www2.informatik.uni-freiburg.de/~frank/ENG/beamer/example/Beamer-class-example1.pdf#zoom=page-fit
[screeenshot]
1. Expanding divs to improve text selection. (Yury)
2. Adding enhanceTextSelection as an option.
3. Moving feature functionality from text_layer_builder.js to text_layer.js.
4. Added expandTextDivs method to only load expanded divs on first click, and only show on subsequent clicks
The PDF file contains an image that we're allowed to use, since it's just the PDF.js logo.
The logo image was simply inverted (so that it requires a /Decode entry in the image dictionary that triggers the use of `jpg.js` instead of the browser), converted to JPEG, and finally edited by hand to change the order of the DQT/SOF{n} markers.
Ensure that the zoom buttons are disabled correctly if the `scale` is smaller/larger than `MIN_SCALE/MAX_SCALE` in `PDFViewerApplication._updateUIToolbar`
In the `zoom{In, Out}` functions in `PDFViewerApplication`, we prevent the zoom value from becoming smaller/larger than `MIN_SCALE/MAX_SCALE`.
However, if the user sets the zoom level through the hash parameter the zoom buttons might not be correctly disabled; try http://mozilla.github.io/pdf.js/web/viewer.html#zoom=10.
Note that this issue has been present since "forever", but given that the solution is so simple I think that we should just fix this. (I'm also fixing a stupid typo I previously made in the JSDoc comment.)
Since the `mobile-viewer` example is based on the old FirefoxOS/B2G PDF viewer, it didn't need to have the same kind of `open/close` methods as the default viewer.
However, now that it has been re-purposed as a simple `mobile-viewer` example, it seems like a good idea to ensure that it has proper asynchronous `open/close` methods.
Fixes 7571.
When adding new entries to `ProblematicCharRanges`, you have to be careful to not make any mistakes since that could cause glyph mapping issues.
Currently the existing reference tests should probably help catch any errors, but based on experience I think that having a unit-test which specifically checks `ProblematicCharRanges` would be both helpful and timesaving when modifying/reviewing changes to this code.
Hence this patch which adds a function (and unit-test) that is used to validate the entries in `ProblematicCharRanges`, and also checks that we don't accidentally add more character ranges than the Private Use Area can actually contain.
The way that the validation code, and thus the unit-test, is implemented also means that we have an easy way to tell how much of the Private Use Area is potentially utilized by re-mapped characters.
Instead of having `Parser_getObj` fail unconditionally for the referenced PDF file, this patch attempts to let searching for the main trailer continue even if there are errors.
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1250079.
This is similar to the existing `isCmd` and `isDict` functions, which already support similar kind of checks.
With the updated `isName` function, we'll be able to simplify many callsites from: `isName(someVariable) && someVariable.name === 'someName'` to: `isName(someVariable, 'someName')`.
With the viewer code now being split into various components/files, having an obsolete comment in `PDFViewer` that references thumbnails despite there being no other mentions of them in the entire file seems strange.
*Note:* This comment is simply a left-over from older versions of PDF.js, where the *entire* default viewer code was placed in just one file (and where we unconditionally created thumbnails, regardless whether they were visible or not).
There are PDF generators which create destinations with e.g. too large top values, which cause the wrong page to be scrolled into view because the offset becomes negative.
By ignoring negative offsets, we can prevent this issue, and get a similar behaviour as in Adobe Reader.
However, since we're also using `PDFViewer_scrollPageIntoView` in more cases than just when links (in the document/outline) are clicked, the patch adds a way to allow the caller to opt-out of this behaviour.
In e.g. the following situations, I think that we still want to be able to allow negative offsets: when restoring a position from the `ViewHistory`, when the `viewBookmark` button is used to obtain a link to the current position, or when maintaining the current position on zooming.
Rather than adding another parameter to `PDFViewer_scrollPageIntoView`, I've changed the signature to take an parameter object instead. To maintain backwards compatibility, I've added fallback code enclosed in a `GENERIC` preprocessor tag.
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=874482.
With PR 7502 we no longer dispatch an event when the `val` is out of bounds, so to better communicate why nothing happens this patch logs an error in that case (similar to the logging of errors when trying to set an invalid scale).
The way that the default viewer is currently implemented, means that e.g. keyboard short-cuts could trigger the new error. Hence this patch also adds the necessary validation code, both to `app.js` and `pdf_link_service.js` to prevent unnecessary errors.
The AMO extension is no longer supported as PDF.js is integrated in
Firefox. It was already removed from the bot's preview messages around a
year ago and has not been used or updated since.
This patch improves the performance of issue 5808, but I'm not sure if it's enough to call it fixed. On average, this patch reduces the number of textLayer div's by a factor of 3, and it also reduces the time spend in `getTextContent` by a factor of ~2.
The PDF file is generated by `Scribus PDF`, which for reasons I cannot understand is placing redundant `Tf` commands before *every* showText command.
Note how the PDF file also contains lots of (basically) identical fonts, but with slightly different names, which causes unnecessary font-switching. This causes some unnecessary breaking of textLayer div's, but this issue cannot be easily worked around.
[api-minor] Add a parameter to `PDFPageProxy_getTextContent` that controls whether `PartialEvaluator_getTextContent` will attempt to combine same line text items
Note that I used a separate warning message for this case, instead of utilizing the same one as in the unsupported subtype case, to more clearly indicate that the PDF file itself is to blame rather than PDF.js.
Fixes 7446.
This patch attempts to cleanup a couple of things:
- Remove the `previousPageNumber` paramater. Prior to PR 7289, when the events were dispatched even when the active page didn't change, it made sense to be able to detect that in an event listener. However, now that's no longer the case, and furthermore other similar events (e.g. `scalechanging`/`scalechange`) don't include information about the previous state.
- Don't dispatch the events when the value passed to `set currentPageNumber` is out of bounds. Given that the active page doesn't change in this case, again similar to PR 7289, I don't think that the events should actually be dispatched in this case.
- Ensure that the value passed to `set currentPageNumber` is actually an integer, to avoid any issues (note how e.g. `set currentScale` has similar validation code).
Given that these changes could possibly affect the PDF.js `mochitest` integration tests in mozilla-central, in particular https://dxr.mozilla.org/mozilla-central/source/browser/extensions/pdfjs/test/browser_pdfjs_navigation.js, I ran the tests locally with this patch applied to ensure that they still pass.
Fonts that are not referenced by `Ref`s are very uncommon in practice, but it can unfortunately happen. In this case, we're currently not caching them in the usual way, i.e. by `Ref`, which leads to failures when a page is rendered after `cleanup` has run.
The simplest solution would have been to remove the `font.translated` workaround, but since this would have meant loading these kind of fonts over and over, the patch attempts to be a bit clever about this situation.
Note that if we instead loaded fonts per *page*, instead of per document, this issue wouldn't have existed.
Originally, I was just going to change this code to use `Ref_toString` in a couple more places. When I started reading the code, I figured that it wouldn't hurt to clean up a couple of comments. While doing this, I noticed that the logic for the (rare) `isDict(fontRef)` case could do with a few improvements.
There should be no functional changes with this patch, but given the added reference checks, we will now avoid bogus `Ref`s when resolving font aliases. In practice, as issue 7403 shows, the current code can break certain PDF files even if it's very rare.
Note that the only thing that this patch will change, is the `font.loadedName` in the case where a `fontRef` is a reference *and* the font doesn't have a descriptor. Previously for `fontRef = Ref(4, 0)` we'd get `font.loadedName = 'g_d0_f4_0'`, and with this patch `font.loadedName = g_d0_f4R`, which is actually one character shorted in most cases. (Given that `Ref_toString` contains an optimization for the `gen === 0` case, which is by far the most common `gen` value.)
In the already existing fallback case, where the `fontName` is used to when creating the `font.loadedName`, we allow any alphanumeric character. Hence I don't see how (as mentioned above) e.g. `font.loadedName = g_d0_f4R` would be an issue here.
From the discussion in issue 7445, it seems that there may be cases where an API consumer would want to get the text content as is, without combined text items.
After PR 7039, the PDF file in issue 7492 no longer renders at all, but note that text selection wasn't working correctly previously.
The problem with the PDF file in issue 7492 is that the `cMap`, in the `toUnicode` entry in the font, contains an invalid name:
```
/CMapName /-usr-share-fonts-truetype-Panton-Panton Family-Fontfabric - Panton.otf,000-UTF16 def
```
When we parse that line, things obviously break because there are spaces present in the wrong places.
To avoid that issue, the patch simply lets `parseCMap` continue when errors are encountered, to try and recover usable data. Note that by not aborting immediatly when an error is encountered, we are also able to fix the text selection.
Obviously, it could be argued that we should just immediatly reject a corrupt `cMap`. But given that they usually are correct, it seems that trying to recover as much data as possible from corrupt one can only be a good thing for both glyph mapping and text selection.
Fixes 7492.
In the PDF file in the issue, some of the glyphs end up being mapped to the Lepcha Unicode block; see https://en.wikipedia.org/wiki/Lepcha_(Unicode_block).
This didn't use to matter, but after HarfBuzz updates that improved support for Lepcha fonts, in particular https://bugzilla.mozilla.org/show_bug.cgi?id=1249861, some glyphs are now moved horizontally.
To avoid that, this patch adds the Lepcha block to the list of Unicode ranges that we skip when building the glyph mapping.
Fixes 7426.
After PR 7289, we'll now reset the current page view in cases where I don't think we should. To avoid this, this patch ensures that we'll not modify the position when the page number is out-of-bounds.
**STR:**
1. Open http://mozilla.github.io/pdf.js/web/viewer.html#page=1&zoom=auto,-98,696
2. Enter an invalid number, e.g. `1000`, in the `pageNumber` input.
**ER:**
The current position in the document shouldn't change, since the page number wasn't valid.
**AR:**
The document resets to the top of the page `1`.
After PR 7441, where `recoverGlyphName` is used a lot more than before, many PDF files will generate a lot of warnings the console. For normal usage, compared to debugging/development, this is probably more annoying than helpful.
Fallback to attempt to recover standard glyph names when amending the `charCodeToGlyphId` with entries from the `differences` array in `type1FontGlyphMapping` (issue 7439)
The method signature was improved in PR 6546, which was included in the `1.2.109` release from last November.
Hence I think that we should now be able to remove the fallback code for the old method signature. Note that this patch now throws an `Error` in `GENERIC` builds, to clearly indicate that the `open` callsite must be modified.
In PR 7273, we simplified the `MOZCENTRAL` specific check for fullscreen support. Unfortunately I've just, by coincidence, found out about https://bugzilla.mozilla.org/show_bug.cgi?id=1268749, which disabled the unprefixed fullscreen support in release versions of Firefox. If we do nothing, this will lead to Presentation Mode being unavailable in future Firefox releases.
This patch should fix things for now, and I'm afraid that we'll need to uplift it to Firefox 49 as well.
With the changes in PR 7289, we no longer dispatch a 'pagechanging' event on load. Since most PDF documents open on the first page, this means that the `previous` and `firstPage` buttons are no longer correctly disabled.
To avoid this, this patch moves the code that updates various UI toolbar state into one method, which is then called on document initialization and from the various existing event handling functions.
To reduced the amount of compatibility code that we need to maintain in the addon, I think that we should be more explicit about which Firefox versions we intend to support. In the future, this should allow us to more quickly remove old compatibility code.
Furthermore, this patch also tries to be more explicit about the addon support for Seamonkey, by defining it in terms of the underlying Firefox version.
Move the Chromium-specific URL rewriting code to the top of viewer.js
(before the PDF.js library) to make sure that the URL is as expected.
This ensures that variables that synchronously depends on the URL
(e.g. PDFViewerApplication.initialBookmark) are properly set.
Currently `importl10n` fails, since mxr.mozilla.org has been down for a while. Based on discussions in #developers, it isn't clear to me if/when it'll be available again.
Furthermore, I'm not sure why we're even getting the l10n files through a code search tool, rather than just using the official Mozilla repository directly.
From the discussion in issue 7386, it wasn't really clear if we can restrict addon support to Firefox `45` (i.e. the version that corresponds to the *current* ESR version).
However, we have a bunch of code for *very* old Firefox versions. Hence this patch changes the minimum supported version to Firefox `38` (which was released on `2015-05-12`, and correspond to the *previous* ESR version), and removes code that only applies to old Firefox versions.
Regardless what we end up deciding regarding addon support for previous Firefox versions, given the amount of code that even the Firefox `>= 38` condition lets us remove, I certainly think that there is value in doing this.
Test:
1. Build the Chrome extension and load it.
2. Visit https://robwu.nl/pdfjs/object-embed.html
3. Verify that all displayed blocks have the same width and
height as the reference ("Expected dimension").
In fonts with only upper-case glyphs, that are also missing a space glyph, `get spaceWidth` won't be able to return anything useful.
By adding upper-case `I` as a fallback, we can thus improve text-selection in some PDF files.
Note that locally, the patch causes slight movement in a few existing `text` tests, but in my opinion this actually looks like slight improvements.
Fixes 7180.
Currently the `isSpace` utility function is a member of `Lexer`, which seems suboptimal, given that it's placed in `core/parser.js`. In practice, this means that in a number of `core/*.js` files we thus have an *otherwise* completely unnecessary dependency on `core/parser.js` for a one-line function.
Instead, this patch moves `isSpace` into `shared/util.js` which seems more appropriate for this kind of utility function. Not to mention that since all the affected `core/*.js` files already depends on `shared/util.js`, this doesn't incur any more file dependencies.
Privacy policy: https://github.com/Rob--W/pdfjs-telemetry#privacy-policy
Unit tests (offline):
```
node test/chromium/test-telemetry.js
```
Server tests (requires that Nginx is installed):
```
git clone https://github.com/Rob--W/pdfjs-telemetry.git
cd pdfjs-telemetry/
python testserver.py TestHttp TestHttps
```
Integration test (extension + server):
- Build the extension
- Edit build/chromium/telemetry.js and remove the check for
chrome.runtime.id.
- Start Chrome (preferably a new profile):
chromium --user-data-dir=/tmp/pdftest --no-first-run
- Open chrome://net-internals#events
- Visit chrome://extensions and enable Developer mode.
- Load unpacked extension, select build/chromium.
- Go to the chrome://net-internals tab and filter on pdfjs.robwu.nl.
- Click on URL_REQUEST and verify that the server replied with 204.
- Reload the extension.
- Verify that chrome://net-internals did not contain a new log request.
Currently `setGState` is completely broken, and looking through the history of that code, it seems to me that this may never have worked correctly.
This patch fixes the text-selection in `extgstate.pdf` in the test-suite, which is also added as a `text` test.
Fixes http://www.pdf-archive.com/2013/09/30/file2/file2.pdf.
Note how it's not possible to show the various Popup Annotations in the above document.
To fix that, this patch lets the Popup inherit the flags of the parent, in the special case where the parent is `viewable` *and* the Popup is not.
In general, I don't think that a Popup must have the same flags set as the parent. However, it seems very strange to have a `viewable` parent annotation, and then not being able to view the Popup.
Annoyingly the PDF specification doesn't, as far as I can find, mention anything about how this case should be handled, but this patch seem consistent with the actual behaviour in Adobe Reader.
Use chrome.storage.sync to store preferences instead of
chrome.storage.local, to allow settings to be synchronized if the user
chooses to sign in in Chrome and enables synchronization of extension
preferences.
Commit df10513e10 unfortunately broke the options dialog of the Chromium extension because the logic required to work with the preference was not added. This patch adds the required logic to show the preference in the options dialog and to persist it to the preferences storage.
Verified using Chromium 50 on Arch Linux.
Currently for explicit destinations, compared to named destinations, we manually try to build a hash that often times is a quite poor representation of the *actual* destination. (Currently this only, kind of, works for `\XYZ` destinations.)
For PDF files using explicit destinations, this can make it difficult/impossible to obtain a link to a specific section of the document through the URL.
Note that in practice most PDF files, especially newer ones, use named destinations and these are thus unnaffected by this patch.
This patch also fixes an existing issue in `PDFLinkService_getDestinationHash`, where a named destination consisting of only a number would not be handled correctly.
With the added, and already existing, type checks in place for destinations, I really don't think that this patch exposes any "sensitive" internal destination code not already accessible through normal hash parameters.
*Please note:* Just trying to improve the algorithm that generates the hash is unfortunately not possible in general, since there are a number of cases where it will simply never work well.
- First of all, note that `getDestinationHash` currently relies on the `_pagesRefCache`, hence it's possible that the hash returned is empty during e.g. ranged/streamed loading of a PDF file.
- Second of all, the currently computed hash is actually dependent on the document rotation. With named destinations, the fetched internal destination array is rotational invariant (as it should be), but this will not hold in general for the hash. We can easily avoid this issue by using a stringified destination array.
- Third of all, note that according to the PDF specification[1], `GoToR` destinations may actually contain explicit destination arrays. Since we cannot really construct a hash in `annotation.js`, we currently have no good way to support those. Even though this case seems *very* rare in practice (I've not actually seen such a PDF file), it's in the specification, and this patch allows us to support that for "free".
---
[1] http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G11.1951685
Currently the `getPageIndex` method will happily return `0`, even if the `Ref` parameter doesn't actually point to a proper /Page dictionary.
Having the API trust that the consumer is doing the right thing seems error-prone, hence this patch which adds a check for this case.
Given that the `Catalog_getPageIndex` method isn't used in any hot part of the codebase, this extra check shouldn't be a problem.
(Note: in the standard viewer, it is only ever used from `PDFLinkService_navigateTo` if a destination needs to be resolved during document loading, which isn't common enough to be an issue IMHO.)
These have been found using `gulp lint` in combination with the `unused:
true` parameter for JSHint. Unfortunately there are too many false
positives to enable this feature, but now that most globals have been
removed because of the conversion to UMD the results are much more
useful than before.
This was only ever useful for the Opera extension because the API
requires a whitelisted extension ID. Opera ditched PDF.js from their
extension gallery, so we don't need to keep this in the tree.
I've seen the above error occasionally when the scale is updated many times in quick succession, but I've not been able to pinpoint exactly why it happens.
Since the error isn't caught, this means that the `pageViewDrawCallback` function doesn't run to completion.
Unfortunately, given the very intermittent nature of the issue, I haven't got any good STR for reliably reproducing this issue. However, I hope that this patch can be accepted anyway, since it's simple and should help prevent unnecessary errors.
In the font in question, there are a couple of `topDict` entries that have invalid values (`0xF 0xF`, i.e. just eof markers without any actual numbers).
This causes the `parseFloatOperand` function, inside `CFFParser_parseDict`, to return `NaN`. Currently we pass this broken font onto the browser, which OTS unsurprisingly rejects.
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1068432.
As evident from e.g. PRs 6485 and 7118, some bad PDF generators unfortunately create Arrays where *some* elements are indirect objects (i.e. `Ref`s). This seems to mostly affect Arrays that contain numbers, such as e.g. `Matrix/FontMatrix/BBox/FontBBox/Rect/Color/...`, and has manifested itself in PDF files that fail to render correctly (some elements are missing).
The problem in both the cases above, besides broken rendering, was that there were *no* errors/warnings that indicated what the problem was, making it difficult to pinpoint the issue.
Hence this patch, where I've audited all usages of `Dict_get` in `src/core/` files, and replaced it with `Dict_getArray` where appropriate to try and prevent unnecessary future bugs.
Re: issue 7261.
Given the we have `gulp fonttest`, which tests the `fonts.js` functionality at a higher level, and that we have *a lot* of font specific reference tests, I'm not convinced that we *also* need unit-tests for it.
We're already, since quite some time, using the standard Fullscreen API provided that it's available in the browser. The warning is only caused by the code that checks if the Fullscreen API is supported.
This patch uses a simple preprocessor tag to avoid the warning, since I'm assuming that in general, we want to try and remain backwards compatible with the prefixed versions of the Fullscreen API.
Fixes 7270.
Multiple shadow roots are not supported any more in Chrome 51+
(https://crbug.com/603448#c6), so this patch changes the way that PDFs
are rendered in `<embed>` / `<object>` tags.
I used shadow roots because their content is not visible from the web
page, so the odds of conflicts were minimal. Now I have to render the
PDF frame directly in the page, which can be observed from the page
(unfortunately).
Now the following happens when an embedded PDF tag is detected:
- `<embed>` tags: The type and src attributes are updated.
- `<object>` tags: The type attribute is changed and the fallback
content is set and displayed.
When Firefox is run in e10s mode, which will soon be the default, the PDF.js zoom dropdown menu doesn't look right. This is apparently because the `<select>` DOM element is rendered in the parent, and that all the necessary style information isn't sent up from the child. See the discussion in https://bugzilla.mozilla.org/show_bug.cgi?id=910022.
Besides this causing the PDF.js UI to *look* worse in e10s, notably it also means that the `customScaleOption` isn't hidden like it ought to be.
To work-around that, this patch utilizes the `hidden` attribute, since https://bugzilla.mozilla.org/show_bug.cgi?id=1242450 at least made that work in e10s.
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1194700.
This naming issue has been present since PR 3529, but at least I cannot find any issues/bugs that seem to have been caused by it, which is good.
The patch also removes an unnecessary `else` branch, since an already existing `break` means that it's redundant.
Fixes 7232.
Furthermore we introduce two new methods named `setCallback` and
`setReason` so external code does not change the properties of the class
directly. Finally we update various names of properties and methods to
be more self-explanatory.
With the recent PR 7172, which made the viewer modular, there's now a couple of modules that are no longer easily accessible (e.g. through the console).
This can make testing/debugging more difficult, and means that e.g. https://github.com/mozilla/pdf.js/wiki/Debugging-PDF.js#enabling no longer works in the generic viewer.
For now, as a simple solution, this patch just exposes those non-classes on `PDFViewerApplication` to ensure that they are available, and to avoid polluting the `window` scope.
Persist the state of content sidebar while browsing away from viewer and
initializing the same on returning back to the viewer. The state is saved
in persistent store preferences and used upon viewer initialization.
Fixes#6935
We cannot piggy-back on the `updateviewarea` event in order to update the stored sidebar state, since there're a number of cases where opening/switching the sidebar view won't fire a `updateviewarea` event.
Note that `updateviewarea` only fires when the position changes in the *viewer*, which means that it won't fire if e.g. the viewer is narrow, such that the sidebar overlays the document transparently; or when switching views, without the document position also changing.
This patch also moves the handling of `forceOpen` parameter in `PDFSidebar_switchView`, to prevent triggering back-to-back rendering and dispatching of events.
We currently don't have *any* unit-tests for `LinkAnnotation`s, so it seemed a good idea to add a few. These tests are taken from various actual PDF files.
- Add support for the 'NewWindow' property.
- Ensure that destinations are applied to the *remote* document, instead of the current one.
- Handle the `F` entry being a standard string, instead of a dictionary.
This patch also adds/improves utility functions for checking if the passwords are correct/incorrect, and replaces `string2binary` with `stringToBytes`. Finally the patch does away with the `DictMock`, in favour of using actual `Dict`s.
Re: issue 6905.
Using `new {Name,Cmd}` should be avoided, since it creates a new object on *every* call, whereas `{Name,Cmd}.get` uses caches to only create *one* object regardless of how many times they are called.
Most of these are found in the unit-tests, where increased memory usage probably doesn't matter very much. But it still seems good to get rid of those cases, since no part of the codebase ought to advertise that usage.
Given the small size of the patch, I'm also tweaking a few comments and class names.
This is a regression from PR 7097.
(Also, out of scope for this PR, but I think that a `setTimeout` value of `1000 ms` is too large. Switching from scrolling to zooming can fell sluggish, and give the impression that nothing happens.)
Previously the commands were not properly parsed as such by GitHub
because they need to be indented with four spaces.
Furthermore, address some minor textual nits.
In the API unit-tests, we're currently loading the `basicapi.pdf` before every sub-test in `PDFDocument` and `Page`, which slows down the unit-tests quite a bit.
Locally this patch reduces the run time for `gulp unittest` by at least 40% for me.
Some browsers render certain special characters with width 0, others with strictly positive width. (For example, the Greek Delta, Δ, has width 0 in Google Chrome, and a positive width in Firefox.) The `if` clause in operation so far results in different text layer DOM trees for different browsers.
This commit fixes that by adding the elements independently of their width.
Note that in the PDF files provided by the reporter, this issue was limited to `Rect` arrays in AcroForm entries (which we currently don't support).
However, since a bad PDF generator could create this problem in *any* kind of annotation, the reduced test-case included here uses a simple LinkAnnotation instead.
Fixes 7115.
According to "The table directory" under https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6.html#Directory, TrueType font tables should have `uint32` checksums.
This is something that I noticed, and was initially confused about, while debugging a TrueType issue.
As far as I can tell, the current (`int32`) checksums we use doesn't cause any issues in practice. However, I do think that this should be addressed to agree with the specification, and to reduce possible confusion when reading the font code.
This is regression from PR 7063, causing `gulp importl10n` to fail:
```
$ gulp importl10n
[19:45:02] Using gulpfile c:\Users\Jonas\Git\pdfjs\gulpfile.js
[19:45:02] Starting 'importl10n'...
Downloading ach...
[19:45:02] 'importl10n' errored after 4.42 ms
[19:45:02] Error: EEXIST, file already exists 'c:\Users\Jonas\Git\pdfjs\l10n\ach
'
at Error (native)
at Object.fs.mkdirSync (fs.js:747:18)
at downloadLanguageFiles (c:\Users\Jonas\Git\pdfjs\external\importL10n\local
es.js:59:8)
at next (c:\Users\Jonas\Git\pdfjs\external\importL10n\locales.js:90:5)
at Object.downloadL10n (c:\Users\Jonas\Git\pdfjs\external\importL10n\locales
.js:91:5)
at Gulp.<anonymous> (c:\Users\Jonas\Git\pdfjs\gulpfile.js:92:11)
at module.exports (c:\Users\Jonas\Git\pdfjs\node_modules\gulp\node_modules\o
rchestrator\lib\runTask.js:34:7)
at Gulp.Orchestrator._runTask (c:\Users\Jonas\Git\pdfjs\node_modules\gulp\no
de_modules\orchestrator\index.js:273:3)
at Gulp.Orchestrator._runStep (c:\Users\Jonas\Git\pdfjs\node_modules\gulp\no
de_modules\orchestrator\index.js:214:10)
at Gulp.Orchestrator.start (c:\Users\Jonas\Git\pdfjs\node_modules\gulp\node_
modules\orchestrator\index.js:134:8)
```
Pass the `PDFJS.postMessageTransfer` parameter to the worker, so that the `MessageHandler` can be setup correctly in `createDocumentHandler` (issue 6957)
Currently there's a lot of duplicate code for non-embedded `toFontChar`, which this patch simplifies by extracting the code into a helper function instead.
This patch adds a `getUnicodeForGlyph` helper function, which is used to recover Unicode values for non-standard glyph names.
Some PDF generators, e.g. Scribus PDF, use improper `uniXXXX` glyph names which breaks the glyph mapping. We can avoid this by converting them to "standard" glyph names instead.
Fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1132849.
Fixes 6893.
Fixes 6894.
[PDFThumbnailView] Re-factor the `canvas` to `image` conversion such that we always render to a `canvas`, and then replace it with an `image` once rendering is done
In the PDF file in question, some of the 'name' table entries have `record.length === 0`. This becomes problematic in the non-unicode case, since `font.getBytes(0)` will fetch the *entire* stream.
Given that OTS rejects 'name' entries larger than `2^16`, this thus explain the sanitizer errors.
Fixes 7020.
Functionality wise, `ensureThumbnailVisible` is essentially just a shorthand for `scrollThumbnailIntoView`. (And note that `PDFViewer` doesn't implement a `ensurePageVisible` method.)
The only remaining usage of `PDFThumbnailViewer_ensureThumbnailVisible` is inside `PDFPresentationMode`, which introduces an otherwise unnecessary `PDFThumbnailViewer` dependency there.
We're already relying on the `presentationmodechanged` event, in various files, to track the state of Presentation Mode. Thus we can simply listen for that event in `PDFSidebar` too, and update the thumbnails if necessary.
*This is a follow-up to PRs 6299 and 6441.*
The patch also adds an option to `PDFThumbnailView`, that disables the canvas-to-image conversion entirely, which might be useful in the context of issue 7026.
The sidebar code has, except for minor fixes/additions (such as attachments), been largely untouch for years.
To avoid having a bunch of sidebar code sprinkled throughout viewer.js, this patch moves the sidebar code into a separate file (pdf_sidebar.js), similar to how most other functionality has been moved in the last few years.
Besides simply moving code around, this patch also has the added benefit that we now keep track of the sidebar state (not just opened/closed).
This now makes it possible to handle both `Preferences` *and* `ViewHistory` settings for the sidebar state in a cleaner way, preventing strange and confusing interactions between the two.
Changes `PDFOutlineView`/`PDFAttachmentView` to be initialized once, since we're always creating them, and refactor their `render` methods to instead pass in the `outline`/`attachments`.
For consistency with other "classes", the `PDFOutlineView`/`PDFAttachmentView` are renamed to `PDFOutlineViewer`/`PDFAttachmentViewer`.
Also, make sure that the outline/attachments are reset when the document is closed. Currently we keep the old ones around until the `getOutline`/`getAttachments` API calls are resolved for a new document.
Re: issue 7017.
This should prevent the error, but does not fix the broken rendering.
The rendering issues are caused by `svg.js` not supporting the various text rendering modes, in this case specifically "invisible" (as indicated in the console `Warning: Unimplemented method setTextRenderingMode`).
Compared to the canvas case, where we just ignore invisible text, a smarter solution is probably required for the SVG case. Since just ignoring invisible text in `svg.js` would mean that text-selection isn't possible.
To reduce code duplication, the initialization code now uses the `reset` method.
Also, this patch moves `charactersToNormalize` out of `PDFFindController`, since it seemed better suited to be a "constant".
Open http://www.puolustusvoimat.fi/wcm/61ba4180411e702ea19ee9e364705c96/luonnonmuonaohjelmalumo1985.pdf?MOD=AJPERES#pagemode=bookmarks.
Note how the outline looks entirely empty, but hovering over it you'll see that there are entries present. There's two separate issues here, first of all the fact that you cannot visually make out the outline items, and secondly that the lack of text means that the clickable area becomes too small.
In Adobe Reader this issue is somewhat mitigated, since they use an icon for every item. For PDF.js, the easist way to address this seems to be making use of a default title.
This issue should be *very* rare in practice, but given the simplicity of the solution I think that we should fix this.
The `hasImage` property is a left-over from older thumbnail code, and has been made obsolete by `renderingState`.
Having two different properties tracking (basically) the same state is asking for trouble, since it's very easy to forget to update one of them, with annoying bugs as the result.
This is required to be able to use it in the annotation display code,
where we now apply it to sanitize the filename of the FileAttachment
annotation. The PDF file from https://bugzilla.mozilla.org/show_bug.cgi?id=1230933 has shown that some PDF generators include the path of the file rather than the filename, which causes filenames with weird initial characters. PDF viewers handle this differently (for example Foxit Reader just replaces forward slashes with spaces), but we think it's better to only show the filename as intended.
Additionally we add unit tests for the `getFilenameFromUrl` helper
function.
*A more robust solution for issue 6066.*
As a temporary work-around for (the upstream) [bug 1164199](https://bugzilla.mozilla.org/show_bug.cgi?id=1164199), we parsed *all* images in the Firefox addon during a short time.
Doing so uncovered an issue with our image handling (see 6066), for JPEG images with a `DeviceGray` ColorSpace *and* `bpc !== 1` (bits per component).
As long as we let the browser handle image decoding in this case, this isn't going to be an issue, but I do think that we should proactively fix this to avoid future issues if we change where the images are decoded (in `jpg.js` vs in browser).
Also, we currently don't seem to have a test-case for that kind of image data.
Currently the `C` entry in an outline item is returned as is, which is neither particularly useful nor what the API documentation claims.
This patch also adds unit-tests for both the color handling, and the `F` entry (bold/italic flags).
`Dict_getAll` is problematic for a number of reasons. First of all, as issue 6961 shows, it can be really bad for performance, since it dereferences all indirect objects.
Second of all, all the derefencing can lead to data being unncessarily requested when ranged/chunked loading is used, thus unnecessarily delaying rendering.
Note: For cases where `Dict_getAll` was previously used, `Dict_getKeys` in combination with `Dict_get` can be used instead. This has the advantage that data isn't requested until it's actually needed.
For the operators that we currently support, the arguments are not `Dict`s, which means that it's not really necessary to use `Dict_getAll` in `EvaluatorPreprocessor_read`.
Also, I do think that if/when we support operators that use `Dict`s as arguments, that should be dealt with in the corresponding `case` in `PartialEvaluator_getOperatorList` which handles the operator.
The only reason that I can find for using `Dict_getAll` like that, is that prior to PR 6550 we would just append certain (currently unsupported) operators without doing any further processing/checking. But as issue 6549 showed, that can lead to issues in practice, which is why it was changed.
In an effort to prevent possible issue with unsupported operators, this patch simply ignores operators with `Dict` arguments in `PartialEvaluator_getOperatorList`.
For the `CalGray`/`CalRGB`/`Lab` colour spaces, we're currently using `getAll` to retrieve the parameters. However that's not really necessary, since we may just as well explicitly `get` the needed parameters instead.
Some bad PDF generators, in particular "Scribus PDF", duplicates resources *a lot* at various levels of the PDF files. This can lead to `PartialEvaluator_hasBlendModes` taking an unreasonable amount of time to complete.
The reason is that the current code is using `Dict_getAll`, which recursively dereferences *all* indirect objects, which can be really slow. This patch instead uses `Dict_getKeys`, and then manually looks up only the necessary indirect objects.
I've added the PDF file as a `load` test. The most important thing here is probably to ensure that the file remains available in the repo, and the comment should help reduced the chance of regressions. (Note that locally, the `load` test times out without this patch, but we cannot really assume that that always happens.)
Fixes 6961.
The Chrome extension enforces that local files cannot be embedded in
non-local web pages. The previous check was too strict (because the
origin of a file:-URL is "null"), and prevented local PDF from being
viewed in local files).
This patch fixes that problem, by querying the actual tab URL via the
background page.
Steps to verify:
1. Create a HTML file: `<iframe src=test.pdf width=100% height=100%>`
2. Build and load the extension.
3. Allow file access to the extension at `chrome://extensions`
4. Open the HTML file from a file:// URL.
5. VERIFY: The extension should attempt to load the PDF file.
6. Now open the following (replace ID with the extension ID, which you
can find at `chrome://extensions`):
`data:text/html,<iframe src="chrome-extension://ID/file:///test.pdf">`
7. VERIFY: The next error should be displayed:
"Refused to load a local file in a non-local page for security reasons."
Reverts "Hack to avoid intermidiate Chrome failures during tests."
(2b2c521213).
require.js uses importScript asynchronously, which activates the worker
GC bug in WebKit. This patch works around a bug in a way that is similar
in the upcoming (but not yet released) require.js 2.1.23
The advantage of the new work-around is that it allows the runtime to
garbage-collect idle Workers.
References:
- https://crbug.com/572225
- https://webkit.org/b/153317
*This patch is based on something I noticed while debugging some of the PDF files in issue 6931.*
In a number of the cases in `setGState`, we're implicitly assuming that we're not dealing with indirect objects (i.e. `Ref`s). See e.g. the 'Font' case, or the various cases where we simply do `gStateObj.push([key, value]);` (since the code in `canvas.js` won't be able to deal with a `Ref` for those cases).
The reason that I didn't use `Dict_forEach` instead, is that it would re-introduce the unncessary closures that PR 5205 removed.
The intention of PR 5192 was to avoid adding empty `setGState` ops to the operatorList. But the patch accidentally used `>=`, which means that it's not actually working as intended, since empty arrays always have `length === 0`.
Even though the currently known test-cases render correctly without this patch, that seems more like a lucky coincidence, given that there's no guarantee that `transferMap[255] === 0` for every possible transfer function.
This patch fixes an issue that I inadvertently introduced in PR 5815, where we accidentally modify the `Differences` array in the encoding dictionary for indirect objects.
Instead of this change, we could also have used the now existing `Dict_getArray`. However in this case I don't think that would have been a good idea, since it would mean iterating through the array *twice*.
Currently the `deprecated` message is using `warn`, meaning that it's possible to disable warnings about deprecated API usage through the `PDFJS.verbosity` setting.
I don't think that it should be possible to opt out of deprecation messages,[1] since it might mean that in a custom deployment of PDF.js these messages could be overlooked, leading to PDF.js being broken (seemingly without any warning) when updating to a future version.
Obviously this could be considered the responsibility of the people doing custom PDF.js implementations, but in order to reduce the support burden later on, it seems better to "annoy" people upfront.
Compared to various `info`/`warn`/`error` messages, `deprecated` messages should be very simple to get rid of -- just update the API usage and the message goes away!
---
[1] In e.g. Firefox it doesn't seem possible to prevent deprecation warnings from being displayed (in the Browser Console).
@ -4,7 +4,7 @@ The issues are used to track both bugs filed by users and specific work items fo
If the issue is related to errors produced by a specific PDF, please always include the PDF by providing a URL where contributors can download it. Without a PDF for reproduction, such issues will be closed. We understand that many PDFs contain sensitive information, however having a PDF is essential to resolving the issue and building our regression testing suite. If possible, try creating a reduced example exhibiting the problem but not containing sensitive data. Also small PDFs are best suited for our regression testing. If an important issue only shows on sensitive PDFs, contributors might be willing to accept these PDFs via a secure exchange.
If the issue is related to errors produced by a specific PDF, please always include the PDF by providing a URL where contributors can download it. Without a PDF for reproduction, such issues will be closed. We understand that many PDFs contain sensitive information, however having a PDF is essential to resolving the issue and building our regression testing suite. If possible, try creating a reduced example exhibiting the problem but not containing sensitive data. Also small PDFs are best suited for our regression testing. If an important issue only shows on sensitive PDFs, contributors might be willing to accept these PDFs via a secure exchange.
The issue tracking system is designed to record a single technical problem. A bug report is something where a developer/contributor can work on. The GitHub issue tracker is not a good place for general, not well thought out or unworkable ideas. Most likely a discussion-type issue will not be addressed for a long time or closed as invalid. The best place is our dev-pdf-js@lists.mozilla.org mailing list. You can subscribe to it using http://lists.mozilla.org or Google Groups. This way you will reach not only developers. As an alternative, you can join our weekly engineering meeting to discuss new ideas for the project.
The issue tracking system is designed to record a single technical problem. A bug report is something where a developer/contributor can work on. The GitHub issue tracker is not a good place for general, not well thought out or unworkable ideas. Most likely a discussion-type issue will not be addressed for a long time or closed as invalid. The best place for general discussions is our #pdfjs IRC channel on irc.mozilla.org.
If you are developing a custom solution, first check the examples at https://github.com/mozilla/pdf.js#learning and search existing issues. If this does not help, please prepare a short well-documented example that demonstrates the problem and make it accessible online on your website, JS Bin, GitHub, etc. before opening a new issue or contacting us on the IRC channel -- keep in mind that just code snippets won't help us troubleshoot the problem.
If you are developing a custom solution, first check the examples at https://github.com/mozilla/pdf.js#learning and search existing issues. If this does not help, please prepare a short well-documented example that demonstrates the problem and make it accessible online on your website, JS Bin, GitHub, etc. before opening a new issue or contacting us on the IRC channel -- keep in mind that just code snippets won't help us troubleshoot the problem.
PDF.js is built into version 19+ of Firefox, however the extension is still available:
PDF.js is built into version 19+ of Firefox, however one extension is still available:
+ [Development Version](http://mozilla.github.io/pdf.js/extensions/firefox/pdf.js.xpi) - This version is updated every time new code is merged into the PDF.js codebase. This should be quite stable but still might break from time to time. This version is also reported to work when installed as extension in Seamonkey 2.39.
+ [Development Version](http://mozilla.github.io/pdf.js/extensions/firefox/pdf.js.xpi) - This extension is mainly intended for developers/testers, and it is updated every time new code is merged into the PDF.js codebase. It should be quite stable, but might break from time to time.
#### Chrome and Opera
+ Please note that the extension is *not* guaranteed to be compatible with Firefox versions that are *older* than the current ESR version, see the [Release Calendar](https://wiki.mozilla.org/RapidRelease/Calendar#Past_branch_dates).
+ The extension should also work in Seamonkey, provided that it is based on a Firefox version as above (see [Which version of Firefox does SeaMonkey 2.x correspond with?](https://wiki.mozilla.org/SeaMonkey/FAQ#General)), but we do *not* guarantee compatibility.
#### Chrome
+ The official extension for Chrome can be installed from the [Chrome Web Store](https://chrome.google.com/webstore/detail/pdf-viewer/oemmndcbldboiebfnladdacbdfmadadm).
+ The official extension for Chrome can be installed from the [Chrome Web Store](https://chrome.google.com/webstore/detail/pdf-viewer/oemmndcbldboiebfnladdacbdfmadadm).
*This extension is maintained by [@Rob--W](https://github.com/Rob--W).*
*This extension is maintained by [@Rob--W](https://github.com/Rob--W).*
+ Opera has also published an extension for their browser at the [Opera add-ons catalog](https://addons.opera.com/en/extensions/details/pdf-viewer/).
+ Build Your Own - Get the code as explained below and issue `gulp chromium`. Then open
+ Build Your Own - Get the code as explained below and issue `node make chromium`. Then open
Chrome, go to `Tools > Extension` and load the (unpackaged) extension from the
Chrome, go to `Tools > Extension` and load the (unpackaged) extension from the
directory `build/chromium`.
directory `build/chromium`.
@ -52,16 +53,19 @@ To get a local copy of the current code, clone it using git:
$ cd pdf.js
$ cd pdf.js
Next, install Node.js via the [official package](http://nodejs.org) or via
Next, install Node.js via the [official package](http://nodejs.org) or via
[nvm](https://github.com/creationix/nvm). If everything worked out, run
[nvm](https://github.com/creationix/nvm). You need to install the gulp package
globally (see also [gulp's getting started](https://github.com/gulpjs/gulp/blob/master/docs/getting-started.md#getting-started)):
$ npm install -g gulp-cli
If everything worked out, install all dependencies for PDF.js:
$ npm install
$ npm install
to install all dependencies for PDF.js.
Finally you need to start a local web server as some browsers do not allow opening
Finally you need to start a local web server as some browsers do not allow opening
PDF files using a file:// URL. Run
PDF files using a file:// URL. Run
$ node make server
$ gulp server
and then you can open
and then you can open
@ -73,10 +77,10 @@ It is also possible to view all test PDF files on the right side by opening
## Building PDF.js
## Building PDF.js
In order to bundle all `src/` files into two productions scripts and build the generic
In order to bundle all `src/` files into two production scripts and build the generic
viewer, issue:
viewer, run:
$ node make generic
$ gulp generic
This will generate `pdf.js` and `pdf.worker.js` in the `build/generic/build/` directory.
This will generate `pdf.js` and `pdf.worker.js` in the `build/generic/build/` directory.
Both scripts are needed but only `pdf.js` needs to be included since `pdf.worker.js` will
Both scripts are needed but only `pdf.js` needs to be included since `pdf.worker.js` will
@ -84,15 +88,21 @@ be loaded by `pdf.js`. If you want to support more browsers than Firefox you'll
to include `compatibility.js` from `build/generic/web/`. The PDF.js files are large and
to include `compatibility.js` from `build/generic/web/`. The PDF.js files are large and
should be minified for production.
should be minified for production.
## Using PDF.js in a web application
To use PDF.js in a web application you can choose to use a pre-built version of the library
or to build it from source. We supply pre-built versions for usage with NPM and Bower under
the `pdfjs-dist` name. For more information and examples please refer to the
[wiki page](https://github.com/mozilla/pdf.js/wiki/Setup-pdf.js-in-a-website) on this subject.
## Learning
## Learning
You can play with the PDF.js API directly from your browser through the live
You can play with the PDF.js API directly from your browser using the live
With the prebuilt or source version open `web/viewer.html` in a browser and the test pdf should load. Note: the worker is not enabled for file:// urls, so use a server. If you're using the source build and have node, you can run `node make server`.
With the prebuilt or source version open `web/viewer.html` in a browser and the test pdf should load. Note: the worker is not enabled for file:// urls, so use a server. If you're using the source build and have node, you can run `gulp server`.
"description":"Controls how external links will be opened.\n 0 = default.\n 1 = replaces current window.\n 2 = new window/tab.\n 3 = parent.\n 4 = in top window.",
"type":"integer",
"enum":[
0,
1,
2,
3,
4
],
"default":0
},
"disablePageLabels":{
"type":"boolean",
"default":false
},
"disableTelemetry":{
"title":"Disable telemetry",
"type":"boolean",
"description":"Whether to prevent the extension from reporting the extension and browser version to the extension developers.",
"default":false
},
"enhanceTextSelection":{
"type":"boolean",
"default":false
},
"renderer":{
"type":"string",
"enum":[
"canvas",
"svg"
],
"default":"canvas"
},
"renderInteractiveForms":{
"type":"boolean",
"default":false
},
"enablePrintAutoRotate":{
"title":"Automatically rotate printed pages",
"description":"When enabled, pages whose orientation differ from the first page are rotated when printed.",