Tripal Multi-Chado Extension module =================================== [Tripal Multi-Chado](http://www.drupal.org/project/tripal_mc) CONTENTS OF THIS FILE --------------------- * Introduction * Requirements * Installation * Configuration * Database Permissions * Example Use-cases * Use-case 1: Setup a public and a private Chado instance * Use-case 2: Add a private instance to your existing public instance * Use-case 3: Access a same instance with different user credentials * Use-case 4: Use multiple public instances with different data * Features * API for developpers * Companion module * Known Issues * Troubleshooting * Perspectives * Maintainers INTRODUCTION ------------ This module enables the use of several Chado schema inside a same database used by Drupal+Tripal. By default, Drupal is installed in the "public" schema of the website PostgreSQL database and Tripal can install Chado in the "chado" schema. The Multi-Chado extension can be used to install or use more than one Chado instance across different schemata and it also enables the use of different PostgreSQL database credentials allowing the administrator to do fine tuning of database accesses. REQUIREMENTS ------------ * [Entity API](http://www.drupal.org/project/entity) * [Entity Reference](http://www.drupal.org/project/entityreference) * [Tripal >= 7.x-2.1](http://www.drupal.org/project/tripal) or Tripal 7.x-2.x-dev released after November 2016. Earlier versions of Tripal do not integrate required hooks. Tripal Multi-Chado v1.x works with Tripal v2.x while Tripal Multi-Chado 2.x works with Tripal v3.x. * PostgreSQL Drupal user account "search_path" set to "public" without "chado" or with "chado" coming after "public". ex.: ALTER USER drupal_user SET search_path = public; or ALTER USER drupal_user SET search_path = public,chado; or ALTER USER drupal_user SET search_path = "$user",public,chado; ... To figure out your current "search_path", you can issue the PostgreSQL query: SHOW search_path; INSTALLATION ------------ * Install as you would normally install a contributed Drupal module. See [doc](http://drupal.org/documentation/install/modules-themes/modules-7) for further information. * Enable the module "Tripal Multi-Chado" on menu page "Admin menu > Site building > Modules" in section "Tripal Extensions". URL path: /admin/modules CONFIGURATION ------------- Note: See also "Example uses-cases" section for detailed uses. * Add new Chado connections on the settings page and associate them to some of your Drupal users: Menu: "Tripal > Extensions > Tripal Multi-Chado" URL path: admin/tripal/extension/tripal_mc * Current (active) Chado schema name can be displayed as a block: this module adds a block called "Chado schema". You can place it on your site from the Drupal block management interface: Menu: "Structure > Blocks" URL path: admin/structure/block * If needed, configure permissions to allow other users to manage Chado connections. * Chado content of each instance must be managed by a Drupal admin account that is different from your default Drupal admin account. Each of these accounts must be associated to only one Chado connection to the Chado instance to manage. Therefore, you will have to add a new Drupal account for each Chado instance you add, with the appropriate permissions (ie. "Administer Tripal", "Install Chado",...). DATABASE PERMISSIONS -------------------- If you want to use different PostgreSQL users with the Multi-Chado module, you may have to manage some of the database privileges at the PostgreSQL level. There are some basic rules to keep in mind: 1. each PostgreSQL user will need to be allowed to create objects in the database but the database owner can be someone else. 2. each PostgreSQL user will need to be granted full privileges on the 'public', 'genetic_code', 'frange' and 'so' schemata. 3. each object created by any of these users in the 'public', 'genetic_code', 'frange' and 'so' schemata should be accessible by the others. 4. however, any schema that is a Chado instance can be restricted to a limited list of users that can exclude the Drupal default PostgreSQL user account. It means that the Drupal default PosgresSQL user may not be allowed to access to a private Chado schema at the PostgreSQL level and when it tries to access to the private schema, it will get an error message like: "ERROR: permission denied for schema " If you mess up with the PostgreSQL privileges here are some sample (ordered) SQL queries to restore them assuming 'tripal_database' is the Drupal database name, 'drupal' is the PostgreSQL user account used by default by Drupal and 'employee' and 'employee2' are PostgreSQL user accounts for private Chado data stored in the schema 'chado_private'. You will need to connect to your 'tripal_database' database using a PostgreSQL admin account. $ ALTER DATABASE tripal_database OWNER TO employee; $ ALTER SCHEMA chado OWNER TO drupal; $ ALTER SCHEMA chado_private OWNER TO employee; $ GRANT ALL PRIVILEGES ON SCHEMA chado_private TO employee2; $ REVOKE employee,employee2 FROM drupal; $ GRANT drupal TO employee,employee2; $ GRANT CREATE ON DATABASE tripal_database TO drupal,employee2; $ REVOKE ALL PRIVILEGES ON ALL TABLES IN SCHEMA chado_private FROM drupal,public CASCADE; $ REVOKE ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA chado_private FROM drupal,public CASCADE; $ REVOKE ALL PRIVILEGES ON ALL FUNCTIONS IN SCHEMA chado_private FROM drupal,public CASCADE; $ ALTER DEFAULT PRIVILEGES FOR ROLE drupal,employee,employee2 IN SCHEMA public GRANT ALL ON TABLES TO drupal; $ ALTER DEFAULT PRIVILEGES FOR ROLE drupal,employee,employee2 IN SCHEMA public GRANT ALL ON SEQUENCES TO drupal; $ ALTER DEFAULT PRIVILEGES FOR ROLE drupal,employee,employee2 IN SCHEMA public GRANT EXECUTE ON FUNCTIONS TO drupal; $ GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO drupal; $ GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO drupal; $ GRANT ALL PRIVILEGES ON ALL FUNCTIONS IN SCHEMA public TO drupal; If you have an 'employee3' account with an exclusive use on schema 'chado3': $ GRANT drupal TO employee3; $ ALTER SCHEMA chado3 OWNER TO employee3; $ REVOKE ALL PRIVILEGES ON ALL TABLES IN SCHEMA chado3 FROM drupal,employee,employee2,public CASCADE; $ REVOKE ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA chado3 FROM drupal,employee,employee2,public CASCADE; $ REVOKE ALL PRIVILEGES ON ALL FUNCTIONS IN SCHEMA chado3 FROM drupal,employee,employee2,public CASCADE; $ ALTER DEFAULT PRIVILEGES FOR ROLE employee3 IN SCHEMA public GRANT ALL ON TABLES TO drupal; $ ALTER DEFAULT PRIVILEGES FOR ROLE employee3 IN SCHEMA public GRANT ALL ON SEQUENCES TO drupal; $ ALTER DEFAULT PRIVILEGES FOR ROLE employee3 IN SCHEMA public GRANT EXECUTE ON FUNCTIONS TO drupal; EXAMPLE USE-CASES ----------------- ### Use-case 1: Setup a public and a private Chado instance You would like to setup a Tripal site with both a public and a private Chado database so you won't have to configure twice the things (web server, Drupal modules, user accounts,...). But you have a big concern about your data privacy and you want to make sure a hacker could not easyly break into your website and access your private data. 1. Configure PostgreSQL database First of all, you will have to configure your PostgreSQL database in order to have at least to different user accounts. One account will have access to both public and private data while the other will be restricted to only public data. Your public Chado data will be stored in the default 'chado' schema while the private Chado data will be stored in another schema, let's say for instance 'chado_private'. Since both account will be used by Drupal to run the webiste, they will both require access to the 'public' schema. Log on with your favorite PostgreSQL interface as admin: 1. Create a 'public' user: $ CREATE USER drupal WITH PASSWORD ''; 2. Create a 'private' user: $ CREATE USER employee WITH PASSWORD ''; $ GRANT drupal TO employee; -- later allows 'employee' to access public data (genetic_code, frange, so schemata for instance); 3. Create a new database for Drupal/Tripal: we don't want 'drupal' account to do whatever it wants on the database: $ CREATE DATABASE tripal_database OWNER employee; 'drupal' account will need to create chado schema (and genetic_code,etc.): $ GRANT CREATE ON DATABASE tripal_database TO drupal; **IMPORTANT:** Disconnect from PostgreSQL and reconnect to the *new* database 'tripal_database' *or* use the command "\c tripal_database". 4. Make sure privileges are setup correctly: $ ALTER DEFAULT PRIVILEGES FOR ROLE drupal,employee IN SCHEMA public GRANT ALL ON TABLES TO drupal,employee; $ ALTER DEFAULT PRIVILEGES FOR ROLE drupal,employee IN SCHEMA public GRANT ALL ON SEQUENCES TO drupal,employee; $ ALTER DEFAULT PRIVILEGES FOR ROLE drupal,employee IN SCHEMA public GRANT EXECUTE ON FUNCTIONS TO drupal,employee; The following 3 lines are only required if the database is not empty: $ GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO drupal,employee; $ GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO drupal,employee; $ GRANT ALL PRIVILEGES ON ALL FUNCTIONS IN SCHEMA public TO drupal,employee; 2. Setup Drupal system 1. Install and configure your Drupal website using the PostgreSQL account 'drupal'. You must NOT use the 'employee' account yet. 2. Download and expand the Entity API (entity) and entityreference (Entity Reference) modules: [Entity API](http://www.drupal.org/project/entity) [Entity Reference](http://www.drupal.org/project/entityreference) 3. Download and expand Tripal 7.x-2.x-dev archive: go to your Drupal module directory and type: # git clone https://github.com/tripal/tripal.git Note: at the time this document is being written, Tripal 2.1 is not out and only the dev version has the required features. 4. Download and expand Tripal Multi-Chado 7.x-1.0-RC1 archive found at: http://www.drupal.org/sandbox/guignonv/2429515 Note: at the time this document is being written, Tripal Multi-Chado has not been released as an official Drupal extension. 5. Enable the base modules Use Drush or go to Drupal module administration interface Menu: Modules URL path: admin/modules and enable the following modules: (Drush names) [x] Entity API (OTHER section) (entity) [x] Entity Reference (FIELDS section) (entityreference) [x] Views (VIEWS section) (views) [x] Chaos tools (Chaos tool suite section) (ctools) [x] PHP filter (Core section) (php) [x] Tripal Core (TRIPAL section) (tripal_core) Save and then enable: [x] Tripal Multi-Chado (TRIPAL EXTENSIONS section) (tripal_mc) 6. Create Private data admin account Go to Drupal administration interface and add a second administrator user that will be used to administrate Chado private data. Menu: People > +Add user URL path: admin/people/create Give that user the administrator role (or a role with all Tripal and Tripal Multi-Chado permissions). For convenience, we will call that account 'admin_private'. 3. Setup private connection 1. Start to create a new private connection Go to Tripal Multi-Chado administration interface and add a new connection. Menu: Tripal > Extensions > Tripal Multi-Chado > +Add connection URL path: tripal_mc_connection/add 2. Fill connection settings Name of the Chado connection: Private Data User login: employee Password & Password confirmation: Chado schema: chado_private Associated users: admin_private (use auto-complete) Connection description: This connection can access to private data. 3. Save your connection. 4. Initialize your Chado data 1. Install Chado public schema Menu: Tripal > Setup Tripal > Install Chado Schema URL path: admin/tripal/setup/chado_install Select your schema version and click on the 'Install/Upgrade Chado' button. 2. Run Tripal cron # drush --root= --username=admin trp-run-jobs or # drush @your-site-alias --username=admin trp-run-jobs IMPORTANT: where 'admin' stands for your default (public) Drupal admin account. 3. Install Chado private schema IMPORTANT: Logout from Drupal and login as 'admin_private'. Menu: Tripal > Setup Tripal > Install Chado Schema URL path: admin/tripal/setup/chado_install Select your schema version and click on the 'Install/Upgrade Chado' button. 4. Run Tripal cron # drush @your-site-alias --username=admin_private trp-run-jobs IMPORTANT: where 'admin_private' stands for your Drupal admin account for private data! 5. Remember about access rights As stated in PostgreSQL documentation: > "By default, users cannot access any objects in schemas they do not own." source: http://www.postgresql.org/docs/9.4/static/ddl-schemas.html Since the schema "chado_private" has been created using the 'employee' account no other user can access to this schema and its data. Therefore, the default PostgreSQL user account used by Drupal won't have access to the private data stored in chado_private. Even if an SQL injection attack is found in any part of a Drupal extension and used or if the default schema of a query can be altered, data stored in chado_private schema can't be queried that way. 5. Load your data Load your data as usual but with one difference: * always use the default Drupal admin account on both the web interface and drush to load and work with (sync) public data. * always use 'admin_private' account on both the web interface and drush to load and work with (sync) private data. You will have to manage public and private data separately which means for instance that you will need to synchronize your data one time for public data and one time for private data (2 synchronisation porcesses to run separately with different admin accounts). Have a look to 'Database permissions' section above. 6. Manage user access To allow a Drupal user to access to your private data, you just have to edit the Chado private connection created before (step 3) Menu: Tripal > Extensions > Tripal Multi-Chado URL path: admin/tripal/extension/tripal_mc and add the user to the Chado connection "Associated users" list. You can use the button "Add another item" if you need to add more than one user at a time, then just save. The next time your Drupal user will log into Drupal, he/she will see the private data. To revoke access to private data from a user, just edit the private connection again, delete the user name from the list (empty the line) and save. ### Use-case 2: Add a private instance to your existing public instance You have a running Tripal site with a public Chado database but you would like to use the same configuration for some private Chado data that you would like to share with some partners (but not the rest of the world). 1. Configure PostgreSQL database You will need a second PostgreSQL account for your private data if you don't already have one: reproduce step 1.2 of use-case 1 if needed. Then you will need to restrict public access to public data for more security. Therefore you will change the database ownership in order to prevent the public account to access/alter private data. However, we still need to let the public account access/alter public data. $ ALTER DATABASE tripal_database OWNER employee; $ GRANT CREATE ON DATABASE tripal_database TO drupal; Then follow use-case 1 steps 1.4. 2. Setup Drupal system If you don't have already installed the Entity API and/or the Entity Reference modules, reproduce use-case 1 step 2.2. Since the Tripal module is already enabled, you don't need to If you have a too old version of Tripal, you will have to update it. See use-case 1 step 2.3 to get the last Triapl version 2. Then follow use-case 1 step 2.5 and enable modules that have not been enabled yet. Follow use-case 1 step 2.6. 3. Setup private connection Reproduce use-case 1 step 3. 4. Load your private data Two cases: 1. you have a dump, 2. you don't and will load your private data in a blank 'chado' schema. 1. If you have a dump of the 'chado' schema (only) of your private data, you can load it into the database in a different schema name that you can call for instance 'chado_private'. Refer to PostgreSQL documentation on how to dump a single schema and load it into a database. Trick: if you don't know how to change the 'chado' schema name of your dump, you can rename you public 'chado' schema during the private data loading. Then you rename your private 'chado' data schema in something else and put back the public 'chado' schema name: -- before loading, rename public schema: $ ALTER SCHEMA chado RENAME TO chado_public; -- load private data into a 'chado' schema $ \i chado_private_dump.sql -- adjust schema names $ ALTER SCHEMA chado RENAME TO chado_private; $ ALTER SCHEMA chado_public RENAME TO chado; From here, we assume your private data has been loaded into a schema named 'chado_private'. 2. You don't have a dump. Reproduce use-case 1 steps 4.3, 4.4 and 5. 6. Manage user access Same as use-case step 6. ### Use-case 3: Access a same instance with different user credentials You have one Chado instance but you want to restrict access to database objects through PostgreSQL permission system. For instance you want to only allow SELECT on some Chado tables to the public account but allow some other users to alter any Chado data. Then data alteration on Chado data through Tripal and the public account will fail (error message). You can also forbid table access to the public account. This will lead to errors being displayed on the Tripal interface when using the public account (but nothing will go wrong on the Chado side and the errors can be ignored). Refer to PostgreSQL [GRANT](https://www.postgresql.org/docs/9.0/static/sql-grant.html) documentation for detailed use of PostgreSQL permissions. The process will be exactly the same as use-case 1 except that you will have your own step 1 and in step 3.2, the schema name of the new private Chado connection will remain 'chado' instead of 'chado_private'. ### Use-case 4: Use multiple public instances with different data You want to serve several Chado instances that are public but you want to use a same site with the same users. Some of your users will see one instance, some other will see other instances. Maybe you can even implement the hook "hook_tripal_mc_get_user_connection" (see "API FOR DEVELOPPERS" section) to provide a dropdown that allows your users to switch from one instance to another. 1. Configure PostgreSQL database Just keep your Drupal database as is! 2. Setup Drupal system Follow use-case 1 step 2.1 to 2.5 For step 2.6, create as many admin account you need but at least one for each instance. 3. Setup other connections For each instance: 1. Create a new connection Go to Tripal Multi-Chado administration interface and add a new connection. Menu: Tripal > Extensions > Tripal Multi-Chado > +Add connection URL path: tripal_mc_connection/add 2. Fill connection settings Name of the Chado connection: User login: drupal (ie. your Drupal PostgreSQL user account) Password & Password confirmation: Chado schema: Associated users: (use auto-complete) Connection description: 3. Save your connection. 4. Initialize your Chado data, load your data and manage user access It works the same as use-case 1 step 4, 5 and 6 but replace "private" by each of your instance names and repeat the process for each instance separately. FEATURES -------- This modules provides support for token and rules modules. Available rule events: * All events related to entities (creation, view, update,...). * A new Tripal Multi-Chado connection is used: when the current Multi-Chado connection is being changed to another connection. IMPORTANT NOTE: this event can't be triggered before the rules module is initialized! Therefore, you can't use it at page initialization time. Instead, use "Drupal is initializing" event with a condition described below. Available rule conditions: * Compare current connection mccid (entity identifier) with a given one * Compare current connection title with a given one Available rule actions: * Load current (active) Chado connection: loads the Multi-Chado connection object currently used by Drupal on current page rendering. * Load current user available Chado connections: returns an array of Multi-Chado connection object that are available to current user. API FOR DEVELOPPERS ------------------- Available hooks from Tripal Multi-Chado: * `function hook_tripal_mc_sync_tables_alter(&$sync_tables)`: Allows other Tripal extensions to add their custom tables to the tables managed by the Multi-Chado module across schemata. It can also be used to prevent the Multi-Chado module from handling some Chado tables. * `function hook_tripal_mc_get_user_connection($tripal_mc_user)`: Allows other Extension modules to choose which Chado connection to activate. By default, the Multi-Chado module uses the first Chado connection associated to a user or the default one but this hook allows to associate connection a different way. For instance, in a specific context, a specific connection could be used. You can think of an interface letting the user select which Chado connection he/she would like to activate during his/her session (unless he/she activate a different one during the session and so on...). You can also have a look to api/tripal_mc.api.inc documentation for available functions. The Multi-Chado connection entity has the following properties: * mccid: its unique internal identifier in Drupal database; * title: the Chado connection name; * machine_name: the Chado connection machine unique name (lower case & dashes); * host: the host name (eg. chado.server.com); * port: the host port on which Postgre runs (usually 5432); * username: the Postgre user name to use to connect to the specified database; * password: the password assoictaed to the given username; * password_decrypted: a boolean field telling if the password field contains an encrypted password (FALSE) or a decrypted password (TRUE); * database: the name of the Postgre database where Chado is stored; * schemaname: name of the schema in which Chadp resides (eg. chado_priv); * description: administrative description of the connection used by the for administrator for his/her management tasks; * uid: user idenfifier of the user who created (owns) the connection. * field_tripal_mc_au (associated_users): a localized array of array of Drupal user objects associated with the connection or an empty array. COMPANION MODULE ---------------- Tripal Multi-Chado Extension module contains a companion module called Tripal Multi-Chado Selector. The default behavior of Tripal Multi-Chado Extension module is to associate to a given user a given connection. While it is possible to associate several connections to a single user, only the first associated connection will be used and the others will be discarded. This companion module provides a Drupal block that allows users to select which connection they wish to activate inside their set of associated connections. To use this feature, just enable the module and place its block somewhere. KNOWN ISSUES ------------ * Materialized views are not supported yet. * On the content administration page, if no Chado node is found, a fake node will appear instead of a line telling no nodes where found. It's a workarround for a technical issue. * When installing a new Chado instance with a different PostgreSQL account from the Drupal one, the owner of schemata "frange", "genetic_code" and "so" is changed. TROUBLESHOOTING --------------- To see what Chado schema is active, use the dedicated block "Chado schema" (see "Configuration" section). To test Drupal user connection when admin and see which Chado instance is enabled, use the ["Masquerade" module](http://www.drupal.org/project/masquerade) If you see some SQL errors on a white screen when you switch to a new Chado connection, check the following: 1. try to connect to your PostgreSQL database using an other software using the credentials of your new Chado connection (eg. 'psql' command line). 2. make sure you are allowed to connect to your PostgreSQL database from the web server running Drupal with the credentials of your new Chado connection. 3. make sure the PostgreSQL user has access to the 'public' schema. See "Database Permissions" section. 4. make sure the 'search_path' of the PostgreSQL user account used is correct See "Requirements" section. PERSPECTIVES ------------ I plan to add feature module support to this module (Chado connection export). If people request it, I might integrate this module with the [Context module](http://www.drupal.org/project/context) About having Chado stored in different database, that would require to synchronize the 'public' schema across databases and it would be way too complicated. So I don't think I will support that feature while it might be feasible. MAINTAINERS ----------- Current maintainer: * Valentin Guignon [vguignon](http://www.drupal.org/user/423148)