Jiafei Wen and Xiaolong Wu*
Department of Computer Engineering and Computer Science, California State University Long Beach, USA
Received Date: July 28, 2014; Accepted Date: August 16, 2014; Published Date: August 18, 2014
Citation: Wen J, Wu X (2014) Implementation of a Collaborative Document Processing in the Cloud. J Comput Sci Syst Biol 7:174-179. doi: 10.4172/jcsb.1000153
Copyright: © 2014 Wen J, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Document processing is one of the most widely used and well-developed applications. With the recent rapid development of high-speed Internet and distributed computing, it has become possible to move document processing to the web, and even to the cloud. For small and medium-sized businesses, the initial benefit of moving office documents into the cloud is the cost saved on buying, maintaining, and upgrading both software and hardware. However, the most significant advantage of doing so is that it enables real-time collaborative editing of a shared cloud-based document. Therefore, moving office applications into the cloud is an inevitable trend in the development of office applications. A novel, efficient document-processing model in the cloud (DPC) is proposed. This paper first briefly describes the model and how it functions. Next, we implement the DPC model in the Google cloud through Google App Engine. Our test cases verify that the proposed DPC model enables users to process their office documents collaboratively with a proper granularity in the cloud.
Office document processing; Cloud; Collaborative editing
Today, office documents can be extremely sophisticated and may contain complex formulas, graphic illustrations, video clips, and control information [1-3]. Correspondingly, office document processing has evolved into large, complex, and powerful applications driven by user requirements [4-6]. Scientific papers are a good example, in which researchers need to jointly develop and refine a document in a collaborative way. To address the requirements of collaborative editing, many software systems have been developed, some commercial and some academic. Since an exhaustive review of such systems is beyond the scope of this paper, we refer readers to [7,8]. The requirements of collaborative teams can be much more demanding than those for standalone software.
As a result, the first problem is the cost incurred in purchasing, maintaining, and upgrading the required software and hardware. The second problem is that system failures and human mistakes become more frequent as the application becomes more complex.
To satisfy rapidly changing user demands and maintain their market position, developers of office document processing software keep upgrading their products by constantly adding new functions and features. These upgrades cost users money not only for software but also for hardware, since upgrades demand more storage space and faster CPUs. Hence, people have to keep investing in both software and hardware to support their routine document production and processing.
In our previous research, we stated that cloud computing technology can be an alternative approach to this problem, since it enables services and storage facilities to be provided and accessed over the Internet. A service provided by the cloud can be a web application, and office document processing is a suitable candidate. With a web application for office document processing, users can create, edit, and share their documents without installing a complex software suite locally. In this case, users can expect to save thousands of dollars on both hardware and software. In addition, users can focus on creative work with the latest document processing application, regardless of the cost of upgrading software and hardware.
To date, two companies, Google and Microsoft, provide office document processing as a service in the cloud [5,7]. Details of these applications are introduced in the background section of this paper. However, as discussed in our previous research, neither of them provides collaborative editing of a shared document with a proper granularity. Both implement collaboration without distinguishing between the different logical objects in a document. As a result, multiple users can edit one sentence, or even one word, of the shared document concurrently, which causes confusion and disorder among users. To address this problem, our previous research proposed a Document Processing Model in the Cloud (DPC model). The DPC model enables users to process their office documents collaboratively with a proper granularity in the cloud; it is introduced briefly later in this paper.
The main purpose of this paper is to describe the implementation of the proposed DPC model. Through this implementation, office document processing becomes a web application available in the cloud, with a proper granularity of collaborative editing. Users can perform their document processing work through a browser without installing an office document processing application on their own computers.
As mentioned before, Google and Microsoft offer office document processing through the cloud. Google Docs is a free, web-based office suite and data storage service offered by Google. It allows users to create and edit documents online while collaborating in real time with other users. Microsoft Office 365 is a commercial software-plus-services offering consisting of a set of products from Microsoft Corporation. Office 365 includes the Microsoft Office suite of desktop applications and hosted versions of Microsoft’s server products, delivered and accessed over the Internet. Both applications claim to emphasize support for collaborative document editing among users. Through our testing and evaluation, we found that the collaborative editing in these two products does not have a proper granularity. For example, Google Docs enables multiple users to edit one shared document online, and it allows different users to edit one sentence, or even one word, concurrently. However, in Google Docs, even though users are notified where a change happened, they are not notified of the content of the change, since changes are not highlighted.
As shown in Figure 1, when one user is typing the word “paragraph” in the second line, another user begins to type characters at the same place. The first user is then confused by these characters, since there is no visual difference between the characters he typed and those typed by others. The first user is only notified that another user is editing this sentence, but does not know what the change is. This confusion and disorder may become even worse when more users are working on a shared document. The cause of this issue is that editing operations are not applied to separate objects; the content of the document is processed as one single object. We found that the same problem exists in Microsoft Office 365 (Figure 1).
To address this problem and enable users to process their office documents collaboratively with a proper granularity in the cloud, our previous research proposed the DPC model, which is introduced briefly in the next section. Similar to Google Docs, the implementation of the DPC model described in this paper is built as a web application and deployed as “software as a service” in the cloud.
The DPC model is the Document Processing in the Cloud model. It is object-oriented and based on the XML logical structure. It treats the editable components of a document as distinct objects and gives users separate access to each object. As a result, multiple users working on a shared document can edit collaboratively, in real time, on distinct objects. This mechanism provides a more logical granularity for collaborative document processing, since the DPC model processes the content of a document as logical objects rather than as a string stream.
In the DPC model, a whole document is first divided into the thirteen objects listed in Table 1. The DPC objects comprise nine composite objects, which contain basic objects, and four basic objects, which are atomic. No DPC object can be edited by more than one user at any time (Table 1).
After being divided, the whole document becomes a collection of DPC objects. Each DPC object is a unit of work distribution that is sent to processors in the cloud. The combination of all units is the entire document. Users are led to the target object in the cloud to complete their editing work. After all editing work is finished, the DPC objects are collected and combined to form the final result document. To make full use of the cloud environment, the DPC model also defines eight formulas, as follows (Table 2):
Detailed information about these formulas is given in our previous research. It is worth noting that in formula six, the DPC model uses ACCESS_PATH, expressed in XPath, to identify the different objects after division and to lead users to the target objects they want to edit. Figure 2 shows the DPC objects after division, with their ACCESS_PATH values in the cloud. Since the XPath of each node is unique within an XML document, it can serve as the identifier of a DPC object in the cloud (Figure 2).
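As an illustration of why a positional XPath can act as a unique identifier, the following minimal sketch walks an XML tree and assigns each node a path of the form `/root/child[n]`. The function name `access_paths` and the sample document are assumptions made for this example; they are not taken from the DPC implementation.

```python
import xml.etree.ElementTree as ET


def access_paths(root):
    """Return (path, element) pairs, one positional path per node."""
    result = []

    def walk(elem, path):
        result.append((path, elem))
        # Count same-tag siblings so every child gets a 1-based index.
        seen = {}
        for child in elem:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            walk(child, f"{path}/{child.tag}[{seen[child.tag]}]")

    walk(root, f"/{root.tag}")
    return result


# Hypothetical sample document with two paragraphs and a table.
SAMPLE = "<document><body><p>First</p><p>Second</p><table/></body></document>"
paths = [path for path, _ in access_paths(ET.fromstring(SAMPLE))]
for path in paths:
    print(path)
```

Because the index counts siblings with the same tag, two paragraphs under the same parent receive different paths, so no two nodes share an identifier.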
As defined by formula one in Chapter three, the first level of the DPC model has three main components: DOC, Middleware, and Processors. Accordingly, the implementation has three corresponding main parts (Figure 3).
As shown in Figure 3, the DOC part includes the source document, which is uploaded to the cloud, and the result document, which is downloaded from the cloud. In the cloud environment, the middleware includes the parser, the manager, and the combiner. The workflows of the parser and the manager are shown in Figures 4 and 8. The combiner combines the results from the manager according to the XPath of each result piece, which is essentially the parser workflow in reverse.
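The combiner step can be sketched as follows, assuming each edited piece arrives as a pair of positional path and new text. The function `combine` and the sample data are hypothetical, and the sketch only replaces element text; it is not the authors' combiner.

```python
import xml.etree.ElementTree as ET


def combine(skeleton_xml, edited):
    """Write edited text back into the document at each object's path.

    `edited` maps a positional path such as '/document/body[1]/p[2]'
    to the new text of that object.
    """
    root = ET.fromstring(skeleton_xml)
    prefix = f"/{root.tag}/"
    for path, text in edited.items():
        if not path.startswith(prefix):
            raise KeyError(f"path outside document: {path}")
        # ElementTree only accepts relative paths, so drop the root step.
        target = root.find("./" + path[len(prefix):])
        if target is None:
            raise KeyError(f"no node at {path}")
        target.text = text
    return ET.tostring(root, encoding="unicode")


SOURCE = "<document><body><p>First</p><p>Second</p></body></document>"
result = combine(SOURCE, {"/document/body[1]/p[2]": "Second, revised"})
print(result)
```

Keying the pieces by path means they can be returned in any order and still land in the right place in the result document.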
As shown in Figure 4, after receiving a source document, the parser records the XPath of each node in the source document. Then, the parser divides the source document into pieces based on the DPC model, after which it packs each piece with its corresponding XPath into a DPC object. Finally, the parser sends the DPC objects to the manager. We use a document in ISO 29500 format as an example to illustrate this process. An ISO 29500 document is described by several XML documents, so it is actually a collection of XML documents (Figure 5). In Figure 5, the docx format is based on ISO 29500; the example contains a “[Content_Types].xml” document and two folders named “_rels” and “word”. The “word” folder contains several XML documents that constitute the main content of the example document.
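Since a docx file is a ZIP package of XML parts, the structure described above can be inspected with a few lines of code. The sketch below builds a tiny in-memory package with the same layout and lists its parts; the part contents are placeholders and the helper names are assumptions for illustration only.

```python
import io
import zipfile


def build_example_package():
    """Build a small in-memory package laid out like the example document.

    The XML bodies are placeholders, not valid WordprocessingML.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("_rels/.rels", "<Relationships/>")
        zf.writestr("word/document.xml",
                    "<document><body><p>Hello</p></body></document>")
        zf.writestr("word/styles.xml", "<styles/>")
    buf.seek(0)
    return buf


def list_xml_parts(package):
    """Return the sorted names of the XML parts stored in the package."""
    with zipfile.ZipFile(package) as zf:
        return sorted(name for name in zf.namelist()
                      if name.endswith((".xml", ".rels")))


parts = list_xml_parts(build_example_package())
for name in parts:
    print(name)
```

The same `list_xml_parts` call should work on a real docx file path, since `zipfile.ZipFile` accepts either a path or a file-like object.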
To make it easy to divide the example document and extract DPC objects, the parser creates a new XML document that contains all of the individual XML documents in the source document. This makes it easy to record the XPath of each DPC object and to preserve the integrity of the DPC objects. In the new, integrated XML document, the parser marks each source XML document with its own title; for example, “workbook.xml” marks the piece that comes from workbook.xml. After this integration step, the source document is as shown in Figure 6.
We use Extensible Stylesheet Language Transformations (XSLT) to integrate these individual documents. XSLT is an XML-based language for transforming XML documents; it generates a new XML document without changing the original XML document on which it is based. For example, Figure 7 shows the main steps of the XSLT stylesheet used for integration, through which all the XML documents in the source document are integrated into one XML document. As mentioned before, after integration the source document is one single XML document, as shown in Figure 6.
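The effect of the integration step can be approximated without XSLT. The sketch below is a substitute technique, not the stylesheet of Figure 7: it merges several XML strings into one tree with Python's `xml.etree.ElementTree`, wrapping each source document in a `<part name="...">` element. The wrapper element, the `integrate` function, and the sample parts are all assumptions for this example; a `name` attribute is used because titles such as “[Content_Types].xml” are not valid XML element names.

```python
import xml.etree.ElementTree as ET


def integrate(parts):
    """Merge several XML documents (name -> XML string) into one tree.

    Each source document is parsed into a fresh element and wrapped in
    <part name="...">, so the input strings are left unchanged.
    """
    root = ET.Element("package")
    for name in sorted(parts):
        wrapper = ET.SubElement(root, "part", {"name": name})
        wrapper.append(ET.fromstring(parts[name]))
    return root


# Hypothetical parts standing in for the XML documents of a docx file.
source_parts = {
    "word/document.xml": "<document><body><p>Hello</p></body></document>",
    "word/styles.xml": "<styles/>",
}
merged = integrate(source_parts)
print(ET.tostring(merged, encoding="unicode"))
```

As with the XSLT approach described above, the originals stay untouched and every node in the merged tree can be traced back to the part it came from.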
Using the integrated result document, the parser records each node’s XPath. For example, the XPath of the <document.xml> node is “xxx.docx/word.xml/document.xml”.
After recording the XPath, the parser divides the result document according to the DPC specification and then extracts the corresponding DPC objects. If a user wants to edit the content of a paragraph, that user must have access to the corresponding object. Some nodes do not belong to any DPC object; these nodes are also sent to the manager as backup.
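A minimal sketch of the division step is shown below. It assumes that paragraphs (`p` elements) stand in for the DPC object types of Table 1, which are not reproduced here; the `divide` function, the `EDITABLE_TAGS` set, and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: paragraphs stand in for the DPC object types of Table 1.
EDITABLE_TAGS = {"p"}


def divide(root):
    """Split a tree into (objects, backup).

    `objects` maps each editable node's positional path to its XML text;
    `backup` lists the paths of nodes that belong to no object.
    """
    objects, backup = {}, []

    def walk(elem, path):
        if elem.tag in EDITABLE_TAGS:
            objects[path] = ET.tostring(elem, encoding="unicode")
            return  # an object is shipped whole, so do not descend into it
        backup.append(path)
        seen = {}
        for child in elem:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            walk(child, f"{path}/{child.tag}[{seen[child.tag]}]")

    walk(root, f"/{root.tag}")
    return objects, backup


doc = ET.fromstring(
    "<document><body><p>First</p><p>Second</p><sectPr/></body></document>"
)
objects, backup = divide(doc)
print(objects)
print(backup)
```

Each extracted piece travels together with its path, which is what later allows the pieces to be reassembled in the right position.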
As shown in Figure 8, the manager processes two kinds of input: user requests and DPC objects. If the input consists of DPC objects, the manager determines whether these DPC objects have already been saved. If they have not been saved, which means the document is being uploaded into the cloud for the first time, the manager saves them in its storage and then sends the DPC objects to the corresponding processors. If they have been saved, which means the DPC objects come from processors rather than from the parser, the manager refreshes the previously saved DPC objects according to the input. The DPC objects in storage are used to display the whole document to users, so they need to be refreshed promptly.
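The branch for DPC-object input can be sketched as a small in-memory store. The class `ObjectStore`, its method names, and the `dispatched` list (a stand-in for sending objects to processors) are assumptions for illustration; the actual implementation runs on Google App Engine and its storage is not described in detail here.

```python
class ObjectStore:
    """Keeps the latest copy of every DPC object, keyed by access path."""

    def __init__(self):
        self._saved = {}      # path -> content, in upload order
        self.dispatched = []  # paths forwarded to processors (stand-in)

    def receive(self, objects):
        """Handle a batch of DPC objects given as {path: content}.

        Returns 'saved' for a first upload and 'refreshed' for updates
        coming back from processors.
        """
        first_upload = not any(path in self._saved for path in objects)
        self._saved.update(objects)
        if first_upload:
            # New document: remember which objects go out to processors.
            self.dispatched.extend(objects)
            return "saved"
        return "refreshed"

    def render(self):
        """Return the stored contents in upload order for display."""
        return list(self._saved.values())


store = ObjectStore()
first = store.receive({"/d/p[1]": "A", "/d/p[2]": "B"})
second = store.receive({"/d/p[2]": "B, edited"})
print(first, second, store.render())
```

Updating an existing key in a Python dict keeps its original position, so the displayed document order is unchanged by refreshes.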
If the input is a user request, the manager checks whether the request comes from the owner of the target document. The owner, who uploaded the document, decides which users can edit it. The owner sends invitations to the users who are allowed to edit collaboratively, and those users are authorized for such editing once they log in. If the request comes from the owner, the manager displays the whole document in the browser. If it does not, the manager first validates the user, and the browser displays the document after validation. If the user wants to edit the uploaded document, the request indicates which part of the document the user wants to edit. Based on this request, the manager checks whether that object is available for editing. If it is available, the manager authorizes the user to edit it. Otherwise, the user needs to wait in line for the target object. Since the document is saved on demand, the manager refreshes the display periodically.
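The request-handling rules above (owner check, invitation, one editor per object, waiting in line) can be sketched as follows. The class `EditManager`, its method names, and the user names are hypothetical; login and browser display are outside the scope of this sketch.

```python
from collections import deque


class EditManager:
    """Grants each DPC object to at most one editor at a time."""

    def __init__(self, owner):
        self.owner = owner
        self.invited = set()
        self.editors = {}  # path -> user currently editing
        self.waiting = {}  # path -> deque of users waiting in line

    def invite(self, by, user):
        if by != self.owner:
            raise PermissionError("only the owner can invite users")
        self.invited.add(user)

    def is_authorized(self, user):
        return user == self.owner or user in self.invited

    def request_edit(self, user, path):
        """Return 'granted' or 'waiting' for an authorized user."""
        if not self.is_authorized(user):
            raise PermissionError(f"{user} has not been invited")
        holder = self.editors.get(path)
        if holder is None or holder == user:
            self.editors[path] = user
            return "granted"
        queue = self.waiting.setdefault(path, deque())
        if user not in queue:
            queue.append(user)
        return "waiting"

    def release(self, user, path):
        """Free the object and hand it to the next waiting user, if any."""
        if self.editors.get(path) != user:
            return None
        queue = self.waiting.get(path)
        next_user = queue.popleft() if queue else None
        if next_user is None:
            del self.editors[path]
        else:
            self.editors[path] = next_user
        return next_user


manager = EditManager(owner="alice")
manager.invite("alice", "bob")
r1 = manager.request_edit("alice", "/d/p[1]")  # first paragraph is free
r2 = manager.request_edit("bob", "/d/p[1]")    # same paragraph, already taken
r3 = manager.request_edit("bob", "/d/p[2]")    # a different paragraph
print(r1, r2, r3)
```

Locking per object rather than per document is what lets two users work on different paragraphs at once while preventing them from typing into the same one.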
The implementation of the DPC model in this paper is deployed in the Google cloud through Google App Engine. Google App Engine is a cloud computing platform that provides platform as a service and hosts web applications in Google-managed data centers. The application that implements the DPC model runs on it as a web application hosted in Google-managed data centers [20-22].
In this section, we design test cases for three functions of the DPC model implementation: uploading, editing by a single user, and collaborative editing by multiple users. Four main test cases and their results are described in the following paragraphs. These test cases were run in the Firefox 7.0 browser as functional black-box tests (Figure 9).
|Test Case I:||Upload the example document.|
|Purpose:||Upload the example document into the cloud through the implementation of the DPC and display its content on the browser.|
|Test data:||Docx document with two paragraphs.|
|Test steps:||Start the DPC implementation application in a browser; select the example document; click the upload button.|
|Screenshot:||Shown in Figure 9|
As shown by Test Case I, the paragraphs in black font are the content of the example document, and they are also editable paragraphs in the browser. After logging in, users can edit them by clicking the content, as described in Test Case II. When a user clicks a paragraph, the paragraph changes to an editable field (Figure 10).
|Test Case II:||Edit the example document on the browser.|
|Purpose:||Edit one paragraph of the example document displayed in the browser.|
|Test data:||Example document uploaded in Test Case I.|
|Test steps:||Click the first paragraph and edit the content.|
|Screenshot:||Shown in Figure 10.|
While a paragraph is being edited, it cannot be edited by other users through other browsers. If a user clicks a paragraph that is being edited, he receives the message “another user is editing this paragraph”, as described in Test Case III (Figures 11 and 12).
|Test Case III:||Edit a paragraph that is being edited by another user.|
|Purpose:||Try to edit a paragraph that is already being edited, using a different user name. The browser informs the user that the edit is not permitted.|
|Test data:||Example document in the cloud.|
|Test steps:||Two users log in with two different user names through two different browsers; one user clicks the first paragraph, and then the other user also clicks the first paragraph.|
|Screenshot:||Shown in Figure 11.|
|Test Case IV:||Two users edit two paragraphs of one shared document through different browsers at the same time.|
|Purpose:||Collaborative editing through different browsers.|
|Test data:||Example document in the cloud.|
|Test steps:||Log in to the application through two browsers with different user names; click the first paragraph in one browser; click the second paragraph in the other browser. The two paragraphs can be edited at the same time through different browsers.|
|Screenshot:||Shown in Figure 12.|
Moving office document processing into the cloud is a trend in the IT industry, since it not only helps users save on the cost of software and hardware upgrades, but also enables them to edit a shared document collaboratively over the Internet. Our previous research proposed the DPC model for efficient office document processing in the cloud. In this paper, we present the implementation of the DPC model, which is deployed in the Google cloud through Google App Engine. The implementation works well, and the test results confirm that the DPC model provides a proper granularity of collaborative editing by eliminating editing confusion and disorder among users.