Opasnet Base

From Testiwiki
Revision as of 20:56, 22 February 2012 by Jouni (talk | contribs) (commented)
Jump to: navigation, search



This page is about the Opasnet Base 2 -database.

Question

How to improve the existing Opasnet Base? Following issues of Opasnet Base 1 must be resolved:

  • Opasnet Base 1 structure makes filtering by location very slow on big data
  • Opasnet Base 1 structure is perhaps unnecessarily complex = queries are hard to adapt and read
  • MySQL is not ideal for storing huge amounts of data, or at least this is how we assume?
  • MySQL tables have fixed column types = difficult to store data objects with varying types of indices into one data table
  • Multiple languages (localization) not supported

Answer

MySQL is good on relations but weak on dynamic big data. Let's keep the basic "scaffold" in MySQL and store the big data (locations and results) into noSQL-base. After few days of research the best candidate for noSQL-base seems to be MongoDB. Combining relational MySQL and non-relational MongoDB will be the foundation for the new Opasnet Base 2.

Table structure in the database

All tables

acts
Uploads, updates, and other actions
Field Type Null Extra Key
id int(10) unsigned NO auto_increment PRI
type ENUM('initial','replace','append') NO
who varchar(255) NO
when timestamp NO
comments varchar(255) YES
lang char(3) NO ISO 639-2, default:'eng'

--1 : I guess this should have field obj_id to identify which object we are updating. In the previous version, act - obj was many-to-many relationship, but it can as well be many-to-one which makes life easier; it is not important to know that two variables were uploaded in the same model run. Also series_id and unit could be in this table. --Jouni 22:56, 22 February 2012 (EET)

acts_objs
Acts of an objects
Field Type Null Extra Key
act_id int(10) unsigned NO PRI
obj_id int(10) unsigned NO PRI
series_id int(10) unsigned NO
unit varchar(64) YES

2 : If comment #1 holds, this table should be removed. --Jouni 22:56, 22 February 2012 (EET)

objs
Object information (all objects)
Field Type Null Extra Key
id int(10) unsigned NO auto_increment PRI
ident varchar(64) NO UNI
name varchar(255) NO
type ENUM('variable','study','method','assessment','class','nugget','encyclopedia') NO
page int(10) unsigned NO
wiki_id tinyint(3) unsigned NO
inds
Indices
Field Type Null Extra Key
id int(10) unsigned NO auto_increment PRI
act_id int(10) unsigned NO UNI(with ident)
ident varchar(64) NO UNI(with act_id)
type ENUM('entity','number','time') NO
description varchar(255) NO
page int(10) unsigned NO
wiki_id tinyint(3) unsigned NO
order_index int(10) unsigned NO
hidden boolean NO false
wikis
Wiki information
Field Type Null Extra Key
id tinyint(3) NO PRI
url varchar(255) NO
wname varchar(255) NO

MongoDB

Tables to store all locations and results

db.<obj.ident>.dat

act_id (indexed), <loc.id>, <loc.id>, ..., <loc.id>, res

--# : I guess this is about loc.id's, not about ind.id's. Loc.id's are either identifiers explained below or values of continuous indices. --Jouni 22:56, 22 February 2012 (EET)

Tables to store real location values for entity type indices

db.<obj.ident>.locs

loc_id(indexed), inds.id, key, value

# : If the index is entity type, inds.id is the identifier of the index (whose details are in MySQL) and value is the name of the location. If the index is continuous, it does not need rows in this table, as everything necessary is described in the tables inds and db.<object.ident>.dat. --Jouni 22:56, 22 February 2012 (EET)

--# : What is key? --Jouni 22:56, 22 February 2012 (EET)

Rationale

See also