February 24, 2007

Copyrighting Databases

Here is a brief summary of the discussion about a GNU license for databases/dictionaries, and a preliminary conclusion drawn from it.

“You cannot copyright databases in the US AFAIK. There was a case about a phone directory.”
“Why would we regard some dictionaries' definitions as better than others? There is not a single, correct definition of any English word.”
“A dictionary requires as much work as a phone book and isn't a very creative process.”
“The amount of work isn't important, it's about the creativity. Writing all those definitions in the dictionary requires creativity, so you get copyright on the dictionary.”
“You cannot copyright the name + number in that phone book, since that is considered a ‘fact’.”
“A list (database) of genomes for a bunch of species isn't copyrightable either.”

Yes, phone numbers, contact info and genomes are definitely facts. In this case, I don’t claim that the content is subject to copyright, but perhaps its design is.

By contrast, natural languages, so far, don't have a structured architecture. Most people believe that it's impossible, or at least very difficult, to put them into a structured shape.

So, in the case of a dictionary, when you build a bilingual dictionary (not just a wordlist), you are in fact trying to convert something unstructured into something structured. You are trying to create and innovate "structuring standards", and to structure the content according to those standards.
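To make the wordlist versus dictionary distinction concrete, here is a minimal Python sketch of one possible "structuring standard": an entry is split into senses, each with its own gloss and its own translations. The flat line format `headword|pos|gloss=tr1,tr2;gloss=tr3` and the names `Entry`, `Sense` and `parse_entry` are hypothetical, invented only for illustration; real dictionary formats (and whatever design the author had in mind) will differ.

```python
from dataclasses import dataclass, field


@dataclass
class Sense:
    gloss: str               # source-language definition of this one meaning
    translations: list[str]  # target-language equivalents for this meaning only


@dataclass
class Entry:
    headword: str
    pos: str                 # part of speech, e.g. "n" or "v"
    senses: list[Sense] = field(default_factory=list)


def parse_entry(line: str) -> Entry:
    """Parse a hypothetical flat line 'headword|pos|gloss=tr1,tr2;gloss=tr3'."""
    headword, pos, senses_part = line.strip().split("|", 2)
    entry = Entry(headword=headword, pos=pos)
    # Each ';'-separated chunk is one sense: "gloss=comma,separated,translations"
    for chunk in senses_part.split(";"):
        gloss, _, targets = chunk.partition("=")
        translations = [t.strip() for t in targets.split(",") if t.strip()]
        entry.senses.append(Sense(gloss=gloss.strip(), translations=translations))
    return entry


if __name__ == "__main__":
    e = parse_entry("bank|n|edge of a river=rive,berge;financial institution=banque")
    for s in e.senses:
        print(e.headword, "|", s.gloss, "->", ", ".join(s.translations))
```

A plain wordlist would record only something like "bank = banque"; the structured form keeps the two meanings of "bank" apart and attaches each translation to the right one. That sense-level organization is the kind of design work the post argues goes beyond merely collecting facts.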

So, the procedures and activities that make a database subject to copyright are:

  • Normalization: designing the UML/ER diagrams.
  • Structuring something unstructured: word definitions are not “facts”.
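The normalization step can be illustrated with a small relational schema. The sketch below uses Python's standard `sqlite3` module and assumes a hypothetical four-table design (`language`, `lemma`, `sense`, `translation`); it is one possible ER design among many, not the one the author had in mind. Each word form is stored once per language, and a translation links a specific sense to a target lemma instead of repeating spellings. The helpers `build_demo` and `translations_of` are likewise hypothetical.

```python
import sqlite3

# Hypothetical normalized schema for a bilingual dictionary.
SCHEMA = """
CREATE TABLE language (
    code TEXT PRIMARY KEY,            -- e.g. 'en', 'fr'
    name TEXT NOT NULL
);
CREATE TABLE lemma (
    id        INTEGER PRIMARY KEY,
    lang_code TEXT NOT NULL REFERENCES language(code),
    form      TEXT NOT NULL,
    pos       TEXT NOT NULL,
    UNIQUE (lang_code, form, pos)     -- each form/POS stored once per language
);
CREATE TABLE sense (
    id       INTEGER PRIMARY KEY,
    lemma_id INTEGER NOT NULL REFERENCES lemma(id),
    gloss    TEXT NOT NULL            -- definition of one particular meaning
);
CREATE TABLE translation (
    sense_id        INTEGER NOT NULL REFERENCES sense(id),
    target_lemma_id INTEGER NOT NULL REFERENCES lemma(id),
    PRIMARY KEY (sense_id, target_lemma_id)
);
"""


def build_demo() -> sqlite3.Connection:
    """Create an in-memory database holding two senses of English 'bank'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the REFERENCES clauses
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO language VALUES (?, ?)",
                     [("en", "English"), ("fr", "French")])
    conn.executemany("INSERT INTO lemma VALUES (?, ?, ?, ?)",
                     [(1, "en", "bank", "n"),
                      (2, "fr", "rive", "n"),
                      (3, "fr", "banque", "n")])
    conn.executemany("INSERT INTO sense VALUES (?, ?, ?)",
                     [(1, 1, "edge of a river"),
                      (2, 1, "financial institution")])
    conn.executemany("INSERT INTO translation VALUES (?, ?)",
                     [(1, 2), (2, 3)])
    conn.commit()
    return conn


def translations_of(conn: sqlite3.Connection, form: str, lang: str):
    """Return (gloss, target form) pairs for a source word, ordered by sense."""
    return conn.execute(
        """
        SELECT s.gloss, t.form
        FROM lemma AS l
        JOIN sense AS s ON s.lemma_id = l.id
        JOIN translation AS tr ON tr.sense_id = s.id
        JOIN lemma AS t ON t.id = tr.target_lemma_id
        WHERE l.form = ? AND l.lang_code = ?
        ORDER BY s.id, t.form
        """,
        (form, lang),
    ).fetchall()
```

For example, `translations_of(build_demo(), "bank", "en")` returns `[('edge of a river', 'rive'), ('financial institution', 'banque')]`. Choosing to hang translations off senses rather than off headwords is exactly the sort of modelling decision an ER diagram records.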
Posted at 11:11 PM
Categories: Business, Internationalization