DjVuDocument provides a convenient interface for opening, decoding and saving back DjVu documents in single-page and multipage formats.
Input formats. It can read multipage DjVu documents in any of four formats: two obsolete (old bundled and old indexed) and two new (new bundled and new indirect).
Output formats. To encourage users to switch to the new formats, DjVuDocument can save documents only in the new formats: bundled and indirect.
Conversion. Since DjVuDocument can open DjVu documents in an obsolete format and save them in either of the two new formats (new bundled and new indirect), this class can be used for conversion from the obsolete formats to the new ones. Although it can also convert between the two new formats, it is not the best way to do so. Please refer to DjVmDoc for details.
Decoding. DjVuDocument provides a convenient interface for obtaining the DjVuImage corresponding to any page of the document. It uses DjVuFileCache for caching, thus avoiding unnecessary repeated decoding of the same page. The actual decoding, though, is done by DjVuFile.
Messaging. Being derived from DjVuPort, DjVuDocument takes an active part in exchanging messages (requests and notifications) between the different parties involved in decoding. It reports (relays) errors and progress information, and even handles some requests for data (when these requests deal with local files).
Typical usage of DjVuDocument class in a threadless command line program would be the following:
    GString file_name="/tmp/document.djvu";
    GP<DjVuDocument> doc=new DjVuDocument;
    doc->init(GOS::filename_to_url(file_name));
    int pages=doc->get_pages_num();
    for(int page=0;page<pages;page++)
    {
       GP<DjVuImage> dimg=doc->get_page(page);
       // Do something
    }
Comments for the code above:
- Since the document is assumed to be stored on the hard drive, we don't have to deal with DjVuPorts and can pass a ZERO port pointer to the init() function. DjVuDocument can access local data itself. In the case of a plugin, though, one would have to implement a custom DjVuPort to handle the requests for data that arise while the document is being decoded.
- In a threaded program, instead of calling the init() function one can call start_init() and stop_init() to initiate and interrupt initialization carried out in another thread. This possibility of initializing the document in another thread was added specifically for the plugin, because the initialization itself requires data that is not immediately available in the plugin. Thus, to prevent the main thread from blocking, we perform initialization in a separate thread. To check whether the class is completely and successfully initialized, use is_init_ok(). To see whether there was an error, use is_init_failed(). To know when initialization is over (whether successfully or not), use is_init_complete(). To wait for this to happen, use wait_for_complete_init(). Once again, none of this is required in a single-threaded program.
Another difference between single-threaded and multi-threaded environments is that in a single-threaded program the image is fully decoded before it is returned. In a multi-threaded application decoding starts in a separate thread, and the pointer to the DjVuImage being decoded is returned immediately. This has been done to enable progressive redisplay in the DjVu plugin. Use the communication mechanism provided by DjVuPort and DjVuPortcaster to learn about the progress of decoding, or call dimg->wait_for_complete_decode() to wait until the decoding ends.
- See Also: DjVuFile, DjVuImage, GOS.
Initialization. As mentioned above, DjVuDocument goes through several stages of initialization. Functionality is added gradually as it passes from one stage to the next:
- First of all, immediately after the object is created, the init() or start_init() function must be called. Nothing will work until this is done. The init() function will not return until the initialization is complete, so you need to make sure that enough data is available. Do not call init() in the plugin. start_init() will start initialization in another thread. Use stop_init() to interrupt it. Use is_init_complete() to check the initialization progress. Use wait_for_complete_init() to wait for initialization to finish.
- The first thing the initializing code learns about the document is its type (BUNDLED, INDIRECT, OLD_BUNDLED or OLD_INDEXED). As soon as this happens, the document flags are changed and a notify_doc_flags_changed() notification is sent through the communication mechanism provided by DjVuPortcaster.
- After the document type becomes known, the initializing code proceeds with learning the document structure. The flags are gradually updated with the following values:
  - DOC_DIR_KNOWN: Contents of the document became known. This is meaningful for BUNDLED, OLD_BUNDLED and INDIRECT documents only.
  - DOC_NDIR_KNOWN: Contents of the document navigation directory became known. This is meaningful for old-style documents (OLD_BUNDLED and OLD_INDEXED) only.
  - DOC_INIT_OK or DOC_INIT_FAILED: The initializing code finished.
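As a rough illustration of how such flags might be tested, the sketch below models them as bit masks in plain C++. The enum name DocFlags, the numeric bit values, the DOC_TYPE_KNOWN name and the helper functions are assumptions made for this example and are not taken from the DjVuDocument header; in real code the flag word would come from get_doc_flags().

```cpp
#include <string>

// Stand-in flag set. The bit values are ASSUMED for illustration only.
enum DocFlags : long {
    DOC_TYPE_KNOWN  = 1,   // assumed name for "document type is known"
    DOC_DIR_KNOWN   = 2,
    DOC_NDIR_KNOWN  = 4,
    DOC_INIT_OK     = 8,
    DOC_INIT_FAILED = 16
};

// Initialization is over once either terminal flag is set.
bool init_complete(long flags) {
    return (flags & (DOC_INIT_OK | DOC_INIT_FAILED)) != 0;
}

bool init_ok(long flags)     { return (flags & DOC_INIT_OK) != 0; }
bool init_failed(long flags) { return (flags & DOC_INIT_FAILED) != 0; }

// Hypothetical helper: reports the furthest stage reached.
std::string describe_stage(long flags) {
    if (init_failed(flags)) return "initialization failed";
    if (init_ok(flags))     return "initialization complete";
    if (flags & (DOC_DIR_KNOWN | DOC_NDIR_KNOWN)) return "document structure known";
    if (flags & DOC_TYPE_KNOWN) return "document type known";
    return "nothing known yet";
}
```

A caller could pass the value returned by get_doc_flags() to helpers of this kind each time a notify_doc_flags_changed() notification arrives.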
Initializing thread
In a single-threaded application, the start_init() function performs
the complete initialization of the DjVuDocument before it returns.
In a multi-threaded application, though, it initializes some internal
variables, requests data for the document and starts a new
initializing thread, which is responsible for determining the
document type and structure and completing the initialization
process. This additional complication is justified in the case of
the DjVu plugin because performing initialization requires data and
in the plugin the data can be supplied by the main thread only.
Thus, if the initialization was completed by the main thread, the
plugin would run out of data and block.
Stages of initialization
Immediately after the start_init() function returns, the
DjVuDocument object is ready for use. Its functionality will
not be complete (until the initializing thread finishes), but
the object is still very useful. Such functions as get_page(),
get_djvu_file() or id_to_url() may be called
before the initializing thread completes. This allows the DjVu
plugin to start decoding as soon as possible without waiting for
all data to arrive.
To query the current stage of initialization you can use the
get_doc_flags() function or listen to the
notify_doc_flags_changed() notifications distributed with the help
of DjVuPortcaster. To wait for the initialization to
complete use wait_for_complete_init(). To stop initialization
call stop_init().
Querying data
The query for data is done using the communication mechanism
provided by DjVuPort and DjVuPortcaster. If port
is not ZERO, then the request for data will be forwarded to it.
If it is ZERO, then DjVuDocument will create an internal
instance of DjVuSimplePort and will use it to access local
files and report errors to stderr. In short, if the document
file is stored on the local hard disk and you are content with
errors being reported to stderr, you may pass a ZERO pointer to
DjVuPort, as DjVuDocument can take care of this situation by itself.
The URL
Depending on the document type the URL should point to:
Contrary to start_init(), which just starts the initialization
thread in a multi-threaded environment, this function does not
return until the initialization completes (either successfully or
not). Basically, it calls start_init() and then
wait_for_complete_init().
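The relationship between the three calls can be modelled with standard C++ threading primitives. The class InitModel below is a hypothetical stand-in written only to show the "start, then wait" pattern; it is not how DjVuDocument is implemented, it performs no real document parsing, and it always reports success.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical model of the pattern; NOT the DjVuDocument implementation.
class InitModel {
public:
    ~InitModel() { if (worker_.joinable()) worker_.join(); }

    // Returns immediately; the work continues on a second thread.
    // This sketch assumes it is called at most once per object.
    void start_init() {
        worker_ = std::thread([this] {
            // A real initializer would read and parse document data here.
            std::lock_guard<std::mutex> lock(mutex_);
            ok_ = true;
            complete_ = true;
            done_.notify_all();
        });
    }

    // Blocks until the worker reports completion; true means success.
    bool wait_for_complete_init() {
        std::unique_lock<std::mutex> lock(mutex_);
        done_.wait(lock, [this] { return complete_; });
        return ok_;
    }

    bool is_init_complete() {
        std::lock_guard<std::mutex> lock(mutex_);
        return complete_;
    }

    // Mirrors the description above: start, then wait.
    void init() {
        start_init();
        wait_for_complete_init();
    }

private:
    std::mutex mutex_;
    std::condition_variable done_;
    std::thread worker_;
    bool complete_ = false;
    bool ok_ = false;
};
```

In this model a caller that must not block would use start_init() and poll is_init_complete(), while a command line tool would simply call init().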
To wait for the initialization to complete use the
wait_for_complete_init() function. To query the initialization stage
use the get_doc_flags() function. To learn whether initialization was
successful or not, use is_init_ok() and is_init_failed().
Note: In a single-threaded application the initialization
completes before the init() function returns.
See is_init_complete() and wait_for_complete_init()
for more details.
Note: To check the stage of the document initialization
use the get_doc_flags() or is_init_complete() functions. To
wait for the initialization to complete use wait_for_complete_init().
For single-threaded applications the initialization completes
before the init() function returns.
Note: The pointer returned is guaranteed to be non-ZERO
only after the DjVuDocument learns its type (passes through
the first stage of initialization process). Please refer to
init() for details.
The function tries its best to map the page number to the URL.
However, if the document structure has not been fully discovered
yet, an empty URL will be returned. Use wait_for_complete_init()
to wait until the document initialization completes. Refer to
init() for details.
Depending on the document format, the function assumes that there
is enough information to complete the request when:
Depending on the document format, the function starts working
properly as soon as:
Depending on the document format the translation is done in the
following way:
For INDIRECT documents the URL is obtained by
appending the name of the found file to the URL of
the directory containing the document.
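For the INDIRECT case, the rule amounts to simple string manipulation. The sketch below uses std::string in place of GURL, and the function names directory_of and indirect_file_url, as well as the example URL, are hypothetical; it only illustrates "directory of the document plus file name" and ignores query strings and escaping.

```cpp
#include <string>

// Hypothetical helper: keeps everything up to and including the last '/'.
std::string directory_of(const std::string& doc_url) {
    const std::string::size_type pos = doc_url.rfind('/');
    return pos == std::string::npos ? std::string() : doc_url.substr(0, pos + 1);
}

// Hypothetical helper: URL of a component file of an INDIRECT document,
// built from the URL of the top-level document and the file's name.
std::string indirect_file_url(const std::string& doc_url,
                              const std::string& file_name) {
    return directory_of(doc_url) + file_name;
}
```

With these assumptions, a document at http://example.org/docs/index.djvu and a component named p0001.djvu would resolve to http://example.org/docs/p0001.djvu.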
If information obtained by the initialization thread is not
sufficient yet, the
Depending on the document type, the information is sufficient when
Negative page_num has a special meaning for the old indexed
multipage documents: the DjVuDocument will start decoding of the
URL with which it has been initialized. For other formats page
-1 is the same as page 0.
DjVuDocument can also connect the created page to the specified
port before starting decoding. This option will allow
the future owner of DjVuImage to receive all messages and
requests generated during its decoding.
If this function is called before the document's structure becomes
known (the initialization process completes), the DjVuFile,
which the returned image will be attached to, will be assigned a
temporary artificial URL, which will be corrected as soon as enough
information becomes available. The trick prevents the main thread
from blocking and in some cases helps to start decoding earlier.
The URL is corrected and decoding will start as soon as
DjVuDocument passes some given stages of initialization and
page_to_url(), id_to_url() functions start working
properly. Please look through their description for details.
Note: To wait for the initialization to complete use
wait_for_complete_init(). For single threaded applications
the initialization completes before the init() function
returns.
First of all the function checks whether the ID contains a number.
If so, it just calls the get_page() function above. If the ID is
ZERO or just empty, page number -1 is assumed. Otherwise
the ID is translated to the URL using id_to_url().
The behavior becomes different, though, when the
document structure is unknown at the moment this function is called.
In this situation it invents a temporary URL, creates a
DjVuFile, initializes it with this URL and returns
immediately. The caller may start decoding the file right away
(if necessary). The decoding will block but will automatically
continue as soon as enough information is collected about the
document. This trick should be quite transparent to the user and
helps to prevent the main thread from blocking. The decoding will
unblock and this function will stop using this "trick" as soon
as DjVuDocument passes some given stages of initialization and
page_to_url(), id_to_url() functions start working
properly.
If dont_create is TRUE the function will return the file
only if it already exists.
Note: To wait for the initialization to complete use
wait_for_complete_init(). For single-threaded applications
the initialization completes before the init() function
returns.
First of all the function checks whether the ID contains a number.
If so, it just calls the get_djvu_file() function above. If the ID is
ZERO or just empty, page number -1 is assumed. Otherwise
the ID is translated to the URL using id_to_url().
If dont_create is TRUE the function will return the file
only if it already exists.
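The ID handling described for the ID-taking variants of get_page() and get_djvu_file() can be sketched as a small classifier. The types IdKind and ResolvedId and the function resolve_id are hypothetical names introduced for this example. The sketch also assumes that "contains a number" means the ID consists only of decimal digits, which may differ from the library's actual test.

```cpp
#include <cctype>
#include <string>

// Hypothetical classification of an ID string.
enum class IdKind { DefaultPage, PageNumber, NamedId };

struct ResolvedId {
    IdKind kind;
    int page_num;  // meaningful for DefaultPage (-1) and PageNumber
};

ResolvedId resolve_id(const char* id) {
    // ZERO or empty ID: page number -1 is assumed.
    if (id == nullptr || *id == '\0') return {IdKind::DefaultPage, -1};

    const std::string s(id);
    bool all_digits = true;
    for (unsigned char c : s) {
        if (!std::isdigit(c)) { all_digits = false; break; }
    }
    // ASSUMPTION: a numeric ID is all decimal digits. The length guard
    // keeps std::stoi within the range of int.
    if (all_digits && s.size() <= 9) return {IdKind::PageNumber, std::stoi(s)};

    // Anything else would be handed to id_to_url() in the real class.
    return {IdKind::NamedId, -1};
}
```

A caller would then dispatch on the kind field: use the page-number overload for DefaultPage and PageNumber, and translate the ID to a URL for NamedId.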
Note: It may happen that the returned DataPool will
not contain all the data you need. In this case you will need
to install a trigger into the DataPool to learn when the
data actually arrives.
As described in start_init(), for multi-threaded applications the
initialization is carried out in parallel with the main thread.
This function blocks the calling thread until the initializing
thread reads enough data, receives information about the document
format and exits. This function returns true if the
initialization is successful. You can use get_doc_flags() or
is_init_complete() to check the degree of initialization more
precisely. Use stop_init() to interrupt initialization.
Plugin Warning. This function will read contents of the whole
document. Thus, if you call it from the main thread (the thread,
which transfers data from Netscape), the plugin will block.
If force_djvm is TRUE then even one-page documents will be
saved in the DJVM BUNDLED format (inside a FORM:DJVM).
Depending on the document's type, the meaning of where is:
ZERO will also be returned if the initializing thread has not
learnt enough information about the document (DOC_DIR_KNOWN has
not been set yet). Check is_init_complete() and init()
for details.
enum DOC_TYPE
DjVuDocument(void)
void start_init(const GURL & url, GP<DjVuPort> port=0, DjVuFileCache * cache=0)
port - If not ZERO, all requests and notifications will
be sent to it. Otherwise DjVuDocument will create an internal
instance of DjVuSimplePort for these purposes.
It's OK to make it ZERO if you're writing a command line
tool, which should work with files on the hard disk only
because DjVuDocument can access such files itself.
cache - It's used to cache decoded DjVuFiles and
is actually useful in the plugin only.
static GP<DjVuDocument> create( const GURL &url, GP<DjVuPort> xport=0, DjVuFileCache * const xcache=0)
static GP<DjVuDocument> create( GP<DataPool> pool, GP<DjVuPort> xport=0, DjVuFileCache * const xcache=0)
static GP<DjVuDocument> create( ByteStream &bs, GP<DjVuPort> xport=0, DjVuFileCache * const xcache=0)
void stop_init(void)
void init(const GURL & url, GP<DjVuPort> port=0, DjVuFileCache * cache=0)
bool is_init_complete(void) const
bool is_init_ok(void) const
void set_needs_compression(void)
bool needs_compression(void) const
bool needs_rename(void) const
bool can_compress(void) const
bool is_init_failed(void) const
int get_doc_type(void) const
long get_doc_flags(void) const
bool is_bundled(void) const
GURL get_init_url(void) const
GP<DataPool> get_init_data_pool(void) const
Accessing pages
int get_pages_num(void) const
GURL page_to_url(int page_num) const
int url_to_page(const GURL & url) const
GURL id_to_url(const char * id) const
GP<DjVuImage> get_page(int page_num, bool sync=true, DjVuPort * port=0)
sync - When set to TRUE the function will not return
until the page is completely decoded. Otherwise,
in a multi-threaded program, this function will
start decoding in a new thread and will return
a partially decoded image. Refer to
wait_for_complete_decode() and
is_decode_ok().
port - A pointer to DjVuPort, that the created image
will be connected to.
GP<DjVuImage> get_page(const char * id, bool sync=true, DjVuPort * port=0)
GP<DjVuFile> get_djvu_file(int page_num, bool dont_create=false)
GP<DjVuFile> get_djvu_file(const char * id, bool dont_create=false)
virtual GP<DataPool> get_thumbnail(int page_num, bool dont_decode)
bool wait_for_complete_init(void)
int wait_get_pages_num(void)
DjVuFileCache* get_cache(void) const
Saving document to disk
GP<DjVmDoc> get_djvm_doc(void)
void write(ByteStream & str, bool force_djvm=false)
void expand(const char * dir_name, const char * idx_name)
idx_name - Name of the top-level file containing the document
directory (basically, list of all files composing the document).
virtual void save_as(const char where[], const bool bundled=0)
GP<DjVmDir> get_djvm_dir(void) const
GP<DjVmDir0> get_djvm_dir0(void) const
GP<DjVuNavDir> get_nav_dir(void) const
virtual bool inherits(const char * class_name) const