February 24, 2007

Copyrighting Databases

Here is a brief summary of the discussion about a GNU license for databases/dictionaries, and a preliminary conclusion drawn from it.

“You cannot copyright databases in the US AFAIK. There was a case about a phone dictionary”
“why would we regard some dictionaries' definitions as better than others? There is not a single, correct definition of any English word”.
“A dictionary requires as much work as a phone book and isn't a very creative process”
“The amount of work isn't important, it's about the creativity. Writing all those definitions in the dictionary requires creativity, so you get copyright on the dictionary”.
“you cannot copyright the name + number in that phone book, since that is considered a ‘fact’.”
“a list (database) of genomes for a bunch of species isn't copyrightable either”

Yes, phone numbers, contact info, and genomes are definitely facts. In such cases, I don't claim that the content is subject to copyright, but its design may be.

By contrast, natural languages, so far, don't have a structured architecture. Most people believe that it is impossible, or at least very difficult, to put them into a structured shape.

So when you build a bilingual dictionary (not just a wordlist), you are in fact trying to convert something unstructured into something structured. You are trying to (create) and (innovate) 'structuring standards', and to structure the content according to those standards.

So, the procedures and the activities that make databases subject to copyright are:

  • Normalization: Designing UML/ERD.
  • Structuring something unstructured: Word definitions are not “Facts”.
Posted at 11:11 PM
Categories: Business, Internationalization

September 12, 2006

The Core IT Dictionary

I've just uploaded a simple English-Arabic IT dictionary. Here's a quick overview of it and of the key ideas behind it.

The Theory of Context-Based Dictionary

Primary language: Primary Key

Primary key = English term + Context flags + Grammar flags

So far, we have to treat English as the (primary language). But for a better multilingual dictionary project, we should think of a standard language without exceptions; something like Esperanto.

Arabic term field:

I used to think that there are no (Synonyms), so the relationship between a term and its counterpart in Arabic (or any other language) should be One-to-One.

But over time, I noticed that a standalone term can't give you a single strict meaning without (metadata), which I call (flags) in my dictionary. It's just like the nucleus of an atom: it doesn't represent or embody the nature of the material without its electrons.

Thus, once you identify each term as (term + flags), you can think in terms of One-to-One relationships.
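
To make the (term + flags) idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, the column names, and the sample flag values are my own assumptions for illustration, and the Arabic values are placeholders rather than real dictionary entries. The composite primary key is what guarantees at most one Arabic term per (English term + context flag + grammar flag).

```python
import sqlite3


def build_dictionary(rows):
    """Create an in-memory table keyed on (english, context, grammar)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE entries (
            english TEXT NOT NULL,
            context TEXT NOT NULL,  -- hypothetical context flag, e.g. 'network'
            grammar TEXT NOT NULL,  -- hypothetical grammar flag, e.g. 'noun'
            arabic  TEXT NOT NULL,  -- placeholder values in this sketch
            PRIMARY KEY (english, context, grammar)
        )
        """
    )
    conn.executemany("INSERT INTO entries VALUES (?, ?, ?, ?)", rows)
    return conn


def lookup(conn, english, context, grammar):
    """Return the single Arabic term for a full key, or None if absent."""
    row = conn.execute(
        "SELECT arabic FROM entries"
        " WHERE english = ? AND context = ? AND grammar = ?",
        (english, context, grammar),
    ).fetchone()
    return row[0] if row else None
```

The same English word can appear many times, as long as its flags differ; inserting a second row with an identical key raises sqlite3.IntegrityError.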

(LPI) field

I've added a field called (LPI?), in order to distinguish between four kinds of terms:

  • LPI.org: for core LPI glossary, according to the official LPI website.
  • LPI+: proposed terms.
  • Linux: Linux-related terms that are not part of the LPI terms. In fact, the majority of these are GUI-related terms, which LPI is not interested in, and I hope it will stay like this in the future! :)
  • Blank: out of Linux focus.
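
As a rough illustration of how an application might read this field, here is a small Python sketch. The enum and function names are hypothetical; only the four values come from the list above.

```python
from enum import Enum


class LpiStatus(Enum):
    """The four values the (LPI?) field can take."""
    LPI_ORG = "LPI.org"   # core LPI glossary
    LPI_PLUS = "LPI+"     # proposed terms
    LINUX = "Linux"       # Linux-related, but not an LPI term
    OUT_OF_FOCUS = ""     # blank: out of Linux focus


def parse_lpi_field(raw):
    """Map a raw field value to LpiStatus; unknown values raise ValueError."""
    return LpiStatus(raw.strip())
```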

Why LPI terms?

I believe that LPI subjects focus on the strict core of standard Linux issues.

And once you agree with me that Linux is going to be the standard IT environment in the future, you will realize that these are the core IT terms!

The License

  1. Since I don't think that there's a perfect license for databases/dictionaries so far,
  2. And considering that I've added a field called (Explanation) from LPI glossary page http://www.lpi.org/en/glossary.html (in the old CMS)
  3. aside from the lack of a dynamic-enough solution for collaborating,

I decided to publish it under the terms of a Creative Commons license.

Posted at 3:17 PM
Edited on: April 03, 2007 1:42 PM
Categories: Internationalization

August 17, 2006

GNU Dictionary

Last year, the Arabeyes group received a request from a Wiktionary administrator to add Arabeyes's dictionary to the Wiktionary project, and a very interesting discussion started on the Arabeyes mailing list.

We have a GPL dictionary and we want to include it in an FDL project. Can we do it?

That was the main question at the time, but this simple question leads to several important ones:

  1. Which is more suitable for a dictionary: GPL or FDL?
    And:
  2. What is a dictionary considered to be in the first place: a software component or a text?
    And once you think that it's a database, neither a software component nor a text, you might say:
  3. Which is more suitable for a database: GPL/LGPL or FDL?
    And finally:
  4. What's the basic difference between GPL and FDL?

Simply put, I think the basic difference is that the FDL treats the work as printed material, not as a software library.

The GPL/LGPL, on the other hand, deal with library issues, but not with printed-material issues.

Technically, what is a dictionary?

A simple analysis shows that a dictionary is basically not code; it's something real that you can print and publish for human readers. But at the same time, in practice, it's a (library) that software applications can read and use for their own purposes.

Simply: it's a printable library.

And actually, that is a database!

Now take a look at this simple matrix:

IPM - The Intellectual Products Matrix

                Printable   Library   License    Collaboration system
Software        x           y         GPL/LGPL   CVS
Text            y           x         FDL        HTTP/Wiki
Databases       y           y         ??         ??
Arts (Images)   y           y         CC         x
Arts (Media)    x           y         CC         x

So, we need:

  • A GNU license for databases/dictionaries that addresses both library issues and printing issues.
  • A powerful and agile collaboration system which can really capitalize on the power of databases.

The License:

It should cover all the potential uses of a dictionary:

  • A software dictionary.
  • A printed dictionary.
  • Use of its glossary in commercial/noncommercial media materials.

The collaboration system:

I think that submitting contributions by email, through a POP3/SMTP mail client, is the best approach for a collaborative database project. Here's why:

            Massive appending   No web interface required
HTTP/Wiki   x                   x
SMTP/SQL    y                   y
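
Here is a minimal sketch of what the SMTP/SQL idea could look like, using only Python's standard library. Everything specific in it is an assumption made for illustration: the one-entry-per-line body format (english;context;grammar;arabic), the table layout, and the function names. Fetching the mail itself (for example over POP3) is left out; the sketch starts from a raw message that has already been received.

```python
import sqlite3
from email import message_from_string


def open_db():
    """Create an in-memory table for contributed entries (hypothetical layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE entries (
            english TEXT NOT NULL,
            context TEXT NOT NULL,
            grammar TEXT NOT NULL,
            arabic  TEXT NOT NULL,
            contributor TEXT,
            PRIMARY KEY (english, context, grammar)
        )
        """
    )
    return conn


def append_from_email(conn, raw_message):
    """Insert each well-formed body line of a plain-text mail; return rows added."""
    msg = message_from_string(raw_message)
    body = msg.get_payload()  # a str for a simple, non-multipart message
    rows = []
    for line in body.splitlines():
        fields = [field.strip() for field in line.split(";")]
        if len(fields) == 4 and all(fields):
            rows.append((*fields, msg["From"]))
    before = conn.total_changes
    # OR IGNORE skips entries whose key already exists instead of failing.
    conn.executemany(
        "INSERT OR IGNORE INTO entries VALUES (?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    return conn.total_changes - before
```

One message can carry any number of lines, which is the "massive appending" column in the comparison, and the contributor needs nothing but a mail client.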

So the matrix will be like this:

IPM - The Intellectual Products Matrix

                Printable   Library   License                                 Collaboration system
Software        x           y         GPL/LGPL                                CVS
Text            y           x         FDL                                     HTTP/Wiki
Databases       y           y         GDL (GNU Dictionary/Database License)   SMTP/SQL
Arts (Images)   y           y         CC                                      x
Arts (Media)    x           y         CC                                      x
Posted at 1:20 PM
Categories: Internationalization