[[Image:Skdb.png]]

''apt-get for real stuff!''

Much of this discussion has taken place in irc.freenode.net's #hplusroadmap, as well as on the following mailing lists: [http://groups.google.com/group/openmanufacturing-dev openmanufacturing-dev], [http://groups.google.com/group/openmanufacturing open source manufacturing], and [http://groups.google.com/group/openvirgle openvirgle]. Please see also [http://heybryan.org/exp.html another explanation at exp.html], [http://oscomak.net/ oscomak.net], etc. This page (as of 2008-12-09) is terribly outdated and may never recover. Much of the relevant information is strewn across the #hplusroadmap logs and mailing list discussions, such that it may be hopelessly unrecoverable. Take this as a lesson about documentation during the design phases of complex projects ...

Engineering knowledge is learned information about how the world works and ways to make it do what you want or need. Social or societal knowledge is the set of all knowledge in a society. Therefore, societal engineering knowledge is a society's knowledge about how to make the world do what it wants. This is part of the literary construct involved in the negotiation of reality constructs. Specialized social networks seem to be easier to create and gain momentum under the amplifying effect of the internet. We can aggregate the collective effort of hundreds of individuals to build a large knowledge base, but making use of that knowledge is difficult because it has no structure or well-defined relation to other knowledge.

The '''s'''ocietal engineering '''k'''nowledge '''d'''ata'''b'''ase ([[skdb]]) is a group of programs designed to aggregate, store, present, and process clusters of engineering knowledge, described as 'packages' in analogy to linux software management tools like [http://en.wikipedia.org/wiki/Advanced_Packaging_Tool APT], yum, [http://cpan.org/ CPAN], rpm, etc. These project packages might include simulation and visualization tools for tweaking, development tools to rewrite data about the package, explicit well-documented links to other projects, etc. The success of APT and friends comes not from any magical software intelligence, but rather from widespread social diffusion and easy accessibility: the ability to easily get new components, and just as easily throw them away; users can play with them, see how they work, and implement changes on the spot.

Physically building projects will become much easier when we have [[fabbers]] like [http://reprap.org RepRap], [http://fennetic.net/machines/hextatic hextatic], [[molecular nanotechnology]], etc., but in the meantime we have [[hu]]mans who can track down materials and tools (by following detailed instructions) to make the particular [[blackbox]] that they need for a project. The point to remember is that computers cannot pull together apparently unrelated concepts to make a new functioning whole, but there are many humans eager for an opportunity to contribute just this sort of knowledge.

Skdb started as a method for finding a feasible [[self-replication]] process: the idea is to come up with as much social knowledge (facts that can't be derived from first principles) as possible, throw it in a pot, connect all of the possibilities together, and fish out the closed cycles. Users must know how to explain the functionality they want in generally agreed-upon language, in order to search for existing projects and avoid duplicated effort.
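As a minimal sketch of the "fish out the closed cycles" idea: model each process as something that consumes some artifacts and makes others, then look for cycles in the resulting artifact graph. The process and artifact names below are invented for illustration, and this is not skdb code, just a toy.

<python>
# Toy model: each process consumes some artifacts ("needs") and produces
# others ("makes"). A closed cycle is a loop of artifacts that can, in
# principle, regenerate themselves -- loosely, the self-replication condition.
processes = {
    "smelt_iron":    {"needs": ["ore", "furnace"],   "makes": ["iron"]},
    "build_furnace": {"needs": ["iron", "bricks"],   "makes": ["furnace"]},
    "fire_bricks":   {"needs": ["clay", "furnace"],  "makes": ["bricks"]},
}

def find_cycles(processes):
    """Return artifact-to-artifact cycles implied by the process graph."""
    # Edge from artifact A to artifact B if some process consumes A and makes B.
    edges = {}
    for p in processes.values():
        for a in p["needs"]:
            edges.setdefault(a, set()).update(p["makes"])

    cycles, seen = [], set()
    def dfs(node, path):
        for nxt in edges.get(node, ()):
            if nxt in path:
                cyc = path[path.index(nxt):]
                key = frozenset(cyc)
                if key not in seen:        # report each cycle only once
                    seen.add(key)
                    cycles.append(cyc + [nxt])
            else:
                dfs(nxt, path + [nxt])
    for start in edges:
        dfs(start, [start])
    return cycles

for cycle in find_cycles(processes):
    print(" -> ".join(cycle))
</python>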
Imagine the possibilities for project design now: using a database, you can download different components and just add them to the project - if a component doesn't exist yet, leave a few comments and hope that somebody comes along and makes something to meet your desired specifications. Just leave the "project definition files" up on the net somewhere, and somebody is bound to come around and finish the job. As long as we can directly reference other projects, tools, and people that have done the same thing, the barrier to entry will stay low.

Part of the database may contain [[biobricks]]: genes that can be combined to do a particular task. Although it is not computationally tractable to simulate or predict all possible interactions between components in an [[amorphous computing]] system, you can specify well-understood mechanisms that have been shown to work, and use techniques that limit crosstalk to narrow the search space.

= Getting started =

* General file specification standard
* General client (hack Debian APT?)
** fenn's suggested name for the technology distribution: [[autogenix]].
* Server daemon process, git presentation layer
* Need to get a number of servers to start hosting copies of the database
* Seed the database with a number of functional parts, components, tools, etc.
** Basic: [[lemon battery]], [[paper airplane]], ...
** Advanced: [[airplane]], [[automobile]], [[spaceship]], [[computer]] ([[microprocessor]] -- see [http://opencores.org OpenCores]), ...
* Connect the pieces together into functional configurations with "circuit builders" like [[BioBench]], [http://tinkercell.com tinkercell], [http://celldesigner.org celldesigner.org], etc.
* '''Note''' that it would be ''nice'' to start with the [[elements]] and specify from the ground up where to get them (such as the data on [http://mindat.org/ mindat.org]), and then how to process them with the materials around there, but we ''must'' avoid the infinite regress of trying to trace everything back down to the source, because there is a world of resources and materials already available (though poorly mapped) that can be used to bootstrap various projects.

So each project, component, part, or tool is more like a node that aggregates information about that specific object/concept/idea, while also providing an interface for simulation or 'making' a final product, or specifications on how to make it if it's generally not available; i.e., everything is 'blackboxed', but with information on what to do and what approaches to take if something is unavailable, not localized, etc.

= todo? =

* skdb
** ikiwiki
** git
** later: torrent distribution services
*** seed all packages
*** be a tracker to organize all of the seeds and leechers for each of the packages
* metarepo website
** This is simply where we describe the overall system architecture; might be doable in a meta namespace on the ikiwiki installation.
* userspace
** autospec - validates the units for the packages, validates the YAML, and gets additional python plugins if necessary when the YAML specifies some unusual class that is not yet installed.
*** Are the 'python plugins' (that specify additional classes) to be provided in package (.skdb) files? Tentatively: yes. Then why wouldn't this be specified in the metadata of the .skdb file as a 'dependency'?
** agx-get - gets packages from the database, gets all metadata (local user cache in ~/.skdb/mimedb/ or something), allows the user to search for packages, and uses a sources.list-type file to connect to skdb or other metarepo installations. Torrent functionality would be nice. It must be able to process the metadata files and give the user options to download particular parts (or everything) related to a .skdb file -- this will typically involve downloading a new skdb package in the first place in order to understand the .skdb file in its entirety (for example, different types of packages will most likely have different types of metadata specified and different types of resources, with data worth parsing on the user end for options related to seeking out the package contents).
*** disambiguator - lets the user specify what sort of file formats need to be processed and what the inputs/outputs have to be, and finds puzzle pieces that match these specs
** agx-sim - the simulator
*** Packages (.skdb) would describe how they are to be simulated, if at all. I am sure somebody would like to come up with a standardized simulation interface. There are a few problems (computationally) that have to be addressed, like whether or not it should be a binary sort comparison, n!, or something else. Hard to tell, but the important aspect is that the architecture allows for this sort of thing.
** agx-make - considering fablab facilities (/dev stuff), configuration, layout, orientations, inventory, number of humans available, etc., generate an overall recipe for the entire lab to execute and make the product (the 'go command' should be avoidable via passing a parameter)
*** Generating the programming for the machinery will be somewhat high-level: this would not be an immediate compiling down to "turn the servo" sort of code, but rather something that each of the machines, with their pre-installed firmware, can understand.
*** Generating the programming in the first place: run through each of the (.skdb) files in the overall design (itself a .skdb), load up the modules geared towards interpreting these variations on (.skdb) (i.e., some have particular data formats that can only be parsed in certain ways), and assemble a list of all code (within the packages) that is to be called/evaluated. The code within each of the packages should 'call a standard API' to make things happen. This API will abstract the actual, physical hardware away from the (.skdb) python scripts so that a variety of configurations can be used without having to go in by hand and make hundreds of versions of (.skdb) internal python scripts.
**** standard API: all (.skdb) scripts will know how to call specific functions and actions of the machinery, and then the 'print server' will figure out the relationship between those functions and the actual drivers and /dev/, while also providing garbage collection (so that wires aren't tripped over and so on).
**** Alternative to the standard-API idea: what if the python scripts in (.skdb) files printed out code, and this code could then be compiled for the specific fablab config that the user has?
* Note: whether a bug report is from agx-sim, agx-make, or autospec is irrelevant; all of this data should be fed back into skdb. Especially user reports, bug reports, usecases, feedback, etc.

= Unit testing =

All packages are required to have their own method of doing 'unit tests' to test each of the features.
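As a minimal, hypothetical sketch of what such a per-package test hook could look like (the class and method names here are invented, not an agreed-upon skdb interface):

<python>
# Hypothetical per-package unit-test hook: each package ships a test() method
# that exercises its own features, so a project-level runner can just iterate.
class Package:
    def __init__(self, name):
        self.name = name

    def test(self):
        """Return a dict of feature name -> pass/fail for this package."""
        raise NotImplementedError

class LemonBattery(Package):
    def __init__(self):
        super().__init__("lemon battery")

    def test(self):
        # Stand-in check; a real package would exercise its own models/data.
        voltage = 0.9  # volts, nominal copper/zinc electrode pair
        return {"produces_voltage": voltage > 0.5}

def run_all(packages):
    for pkg in packages:
        for feature, ok in pkg.test().items():
            print(f"{pkg.name}: {feature}: {'ok' if ok else 'FAIL'}")

run_all([LemonBattery()])
</python>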
In a project, the programmer should be able to call upon an overall test of all of the units working together (pipelined and whatnot).

= OSCOMAK =

2008-04-06: I see no differences between this project and [http://www.kurtz-fernhout.com/oscomak/index.htm OSCOMAK]. -- [[User:Kanzure|Kanzure]] 18:16, 6 April 2008 (CDT)

= 2008-04-06: File format =

* 2008-04-06: fenn suggested using GNU 'units' (the program) for dealing with unit conversions between different programs.

''caution: massive bryan text block below''

You generally cannot make a single program that can do everything. Same with file formats. You do not want OSCOMAK/skdb to become a giant object-oriented database of bureaucracy. Instead, you need to be able to take creative license at all of the different nodes/packages, and from there take things where you need them to go. Each package will be a wiki page, much like the CPAN pod entries for perl modules. These entries would specify where to get information and what packages of PDFs to read (or links and pages cached from the web, anything really). Then, there will also be a readout of common programs that are used with that package, or relations to other programs. This would be like apt-cache search, except a full ontology can be given, plus guides for understanding and interpreting the ontologies so that people can find the software that they need. It may be possible to one day have automated software interpret and parse the ontologies to get to the programs they need, but in the meantime we can just let people assemble the software packages together. This is much like "coming to terms with the authors [of a paper]" (a quote I found in a paper I was reading a few weeks ago re: how to read papers). Once a programmer has been sufficiently introduced to a topic, he can then proceed to immediately take advantage of the available software that is mentioned in the package file.

Therefore, the file format is more like a way to organize information and knowledge and mention where it was discovered, references, links, and ways to get ahold of the responsible individuals, as well as an easy way to query for related software, and then allowing the person to interpret the (and I emphasize) ''natural language'' descriptions of the software packages in order to find the component he needs. These software packages will do various things, depending on the package. For example, in [[computational fluid dynamics]] there are software packages that can do things like [[finite element analysis]] or [[fluid flows]], etc. Those would be examples of programs to add. Their relation to the subject is that they are given a [[mesh file]], they take this data and apply a set of specifications [there does not seem to be a standardized format for this yet, so this is perhaps a poor example, but whatever], and then they run a simulation, and the simulation can be automatically interpreted if necessary (i.e., does it fall within the mission parameters for the use of the overall system that the programmer is designing?).

I was worrying about dependency specification for a while. You would have to specify the fundamental variables that each package is using. This can still be included as a special type of file format (and then other programs can fill in the blanks or do special things to come up with exact materials, if the package is for some specialized subset of all possible materials). But it's not the central focus.
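To make the dependency/variable idea concrete, here is a rough, hypothetical sketch of a package-file fragment with units and dependencies, loaded with PyYAML and checked in the spirit of the autospec validation described above. All field names, values, and the unit check are invented for illustration; this is not a settled skdb format.

<python>
import yaml  # PyYAML

# Hypothetical package fragment: fundamental variables carry explicit units,
# and dependencies name other packages. Nothing here is a settled format.
package_yaml = """
name: lemon-battery
dependencies: [lemon, copper-electrode, zinc-electrode]
variables:
  open_circuit_voltage: {value: 0.9, units: volt}
  electrode_spacing:    {value: 2.0, units: centimeter}
"""

spec = yaml.safe_load(package_yaml)

# Toy validation: check that every variable has units we recognize
# (a real tool might delegate to GNU 'units' instead of a hardcoded set).
known_units = {"volt", "ampere", "centimeter", "gram", "second"}
for name, var in spec["variables"].items():
    if var.get("units") not in known_units:
        print(f"{spec['name']}: variable '{name}' has unknown units: {var.get('units')}")
    else:
        print(f"{spec['name']}: {name} = {var['value']} {var['units']} (ok)")
</python>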
The central focus is an easy way to access, query, and download (1) societal knowledge and (2) the relevant software. Part of #1 would be information on how the software in #2 is relevant. So, there should be a querying system (apt-get / [[autogenix]]). The system to download specific "knowledge-sets" (kind of like data-sets) should involve a basic hierarchical navigation system for exploring different perspectives on the ontology of the entire database. There should not be only one massive, all-encompassing presentation of all of the software packages -- rather, different ontologies should be tagged and accessible by searching for such tags, thus finding possible entrances into the social knowledge by using the right keywords (it's amazing what I have learned over the years by using Google and reading to find new keywords to use ...). On this note, [http://heybryan.org/projects/autoscholar/ AutoScholar] is an excellent example of an interface system too, except that it's geared towards downloading papers and is not necessarily associated with the relevant software. Actually, what if we developed a website that would serve as a ghost-overlay to the scientific publication journals and databases? This would just mean that we categorize the papers within their respective fields, show the social network as well as the conceptual network and advancements and so on, and then we provide the relevant software. The knowledge is already there. It just needs to be locally cached and aggregated, and digested into new forms (thus the wiki part). Theoretically, we can add specifications to a package format to describe the input files and types of input parameters to a software package, so the 'units' idea should still be applicable. This is a way of interconnecting software packages so that the flow is not interrupted or lost in the process. Just note that it's not the main function of the system. -- [[User:71.41.149.62|71.41.149.62]] 08:51, 7 April 2008 (CDT)

= 2008-04-08: Thoughts on the structure =

There are two main components that I am particularly concerned about: (1) formalized knowledge in the form of data sets, units, specifications on file formats, mathematical models stored in data files, etc., and then (2) the social knowledge that we process and convert into #1. All that we really need to do is figure out how we want #2 and #1 to work, and then create a certain type of (wikiable) file format to represent a project/unit in the overall database. It does not matter where on the internet the files are, as long as the user can automatically fetch them from the shell. It would be ideal if the database can be mirrored at other locations. It would also be ideal if we can do CVS+wiki all at the same time. What would the namespace structure of the wiki look like? There would be a namespace for package files, and then what? A namespace for the different types of files? It is important to clearly separate "natural language" from everything else. -- [[User:Kanzure|Kanzure]] 15:54, 8 April 2008 (CDT)

tags tags tags! -fenn
: Sure. -- [[User:71.41.149.62|71.41.149.62]] 08:43, 9 April 2008 (CDT)
:: See [http://debtags.alioth.debian.org/ debtags] and a [http://wiki.debian.org/DebTags wiki page on debtags]. -- [[User:Kanzure|Kanzure]] 10:35, 13 April 2008 (CDT)

= 2008-04-09: A pretty good bet on what the start can be =

Okay, I think I know what to start with. The trick is the file formats and specifications for each type of file format.
You know the websites where they list file format types and what programs might be used to work with them? Same thing here. Autogenix would be used to download new meta-information on file formats for various programs, with backlinks and references to the programs in the local database.

''i think what we're really interested in here is the semantic content structure of the files, the format is just a.. formality. even if there is a list of programs that work with the files, it won't help an automated system to make automated decisions or take action on those files, if there is no semantic metadata about what the files are good for.'' -fenn

Yesterday I was figuring that an interesting way to implement metadata would be to include a new way to specify file format IO in the parameters for programs. For example, consider the conventional approach to telling people about the parameters to your program. This may be known as '''shell wrapping''' (what I want). What's currently out there are [http://perldoc.perl.org/Pod/Usage.html usage statements (pod2usage here)]. The pod2usage module is not enough: it's not formal enough, it's pretty basic, and it's based on natural language (look at the examples on that page) -- instead, we need to formalize this and have a database for managing file formats and such information. For Ruby, [http://cmdline.rubyforge.org/ the cmdline tool] might be the solution - but it looks like more natural language stuff. Okay -- so I can't find any specific examples of it hardcoded, but it's everywhere, and it's not formally identified as such (I had to go start the Wikipedia article on [http://en.wikipedia.org/wiki/Usage_message usage messages]). A good example might be [http://curl.haxx.se/docs/manpage.html cURL's help page] or [http://en.wikipedia.org/wiki/Manual_page_(Unix) any man page], really.

What would a more formal API for specifying IO file format types look like? You would have to reference a centralized database, maybe a [[DTD]] [http://en.wikipedia.org/wiki/DTD] (what's that about?). So this will end up being a simple library or module to include while programming, and then instead of hardcoding some natural language to specify the input parameters, you reference the database, and you output a formal file when somebody passes the -h or --help parameter to your program. Plus you'd submit an entry to the database so that there's a reference to your program. You would also tag and write a description of the program (which would have its own file format, a simple XML-like method). The calls to the API should be simple. They should not be runtime calls, but rather compile-time calls --- the idea is that you construct the XML output specifying acceptable IO formats that the current program uses, right? So in this sense it may be silly to do it at compile time, since you may as well just write a separate file specifying the data types expected to be passed (this is not the same thing as "error, expected integer and got (noninteger) object instead" - there are certain integer numbers that can be of a certain type, and bash and other shells are not 'typed' systems where you see "error, wget output is not the same as cat output" or whatever). So this would be a secondary tool to run when compiling your projects -- something that could easily be called in a makefile. This does not solve the code-documentation problem ([http://heybryan.org/bookmarking.html]) [not that it was supposed to in the first place].
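As a rough sketch of the "formal usage message" idea: a program could expose a flag that emits a machine-readable declaration of the input/output formats it accepts, referencing identifiers in a shared format database, instead of only a natural-language help string. The program name, the flag, and the "formatdb:" identifiers below are all invented for illustration.

<python>
import argparse
import json
import sys

# Invented identifiers standing in for entries in a shared file-format database.
IO_SPEC = {
    "program": "meshsim",
    "inputs":  [{"format": "formatdb:stl-mesh", "role": "geometry"}],
    "outputs": [{"format": "formatdb:vtk-results", "role": "simulation results"}],
}

parser = argparse.ArgumentParser(
    description="toy solver used to illustrate formal IO declarations")
parser.add_argument("--io-formats", action="store_true",
                    help="print a machine-readable description of accepted IO formats")
parser.add_argument("mesh", nargs="?", help="input mesh file")
args = parser.parse_args()

if args.io_formats:
    # A makefile step could collect these declarations into the central
    # database instead of scraping natural-language usage messages.
    json.dump(IO_SPEC, sys.stdout, indent=2)
    sys.exit(0)

print(f"pretending to simulate {args.mesh} ...")
</python>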
So once we have the file format database, we can then start making up our specifications for each of the file formats, plus let [[agx-get]] download new scripts to interpret new versions of the files whenever necessary. Then, we just throw up the packages on the wiki, with file format specifications at the top (kind of like the XHTML version specification number at the top of w3c-valid HTML docs). The wiki does not need separate namespaces for different aspects of packages, just solid, explicit meta-information on the relationship between two pages; i.e., the page [[file.pack]] might cite some reference information which can be downloaded at [[filepack-information.html]] -- and instead of just interpreting the file extension (the previous method), read the file version/format specs to go look up software that can handle that sort of data. When we want to compress a package and all of its contents, we can just read the [[file.pack]] page and then scan its references and 'links' to other packages and download it all, so it's not like everything will be all over the place, as long as we do routine backups and mirroring, etc. -- [[User:71.41.149.62|71.41.149.62]] 08:43, 9 April 2008 (CDT)

= 2008-04-13: ikiwiki =

For the wiki+versioning system, see [http://ikiwiki.info/ ikiwiki] and [http://ikiwiki.info/rcs/git/ git]. Now on to the package manager, [[agx-pkg]]. The linux package managers are usually broken into two main tools: dpkg for [http://debian.org/ debian] or emerge for [http://gentoo.org/ gentoo], which are the low-level packaging tools for dealing with packages on a client/user machine. Then, there are interfaces like apt, which connect to package repositories (as listed in sources.list) to get all of the metadata on all of the packages at once (it's not much data, so it's not that much of a load on the system - 10 MB compressed?).

= Integrating this with debtorrents =

* [http://advogato.org/article/972.html Notes on debtorrent, revamping the meta-repository infrastructure]

= 2008-05-14: pepakura =

We need to make some example metadata files, and for that we need an example artifact and a process to build it with. We considered origami due to its apparent simplicity, but it turns out that origami is conceptually difficult for humans, and maybe computers too. So, a related process, called papercraft or "pepakura", where paper is allowed to be cut and taped/glued back together, sidesteps the problems of origami at a minimum of added complexity. Also, papercraft can be used to generate a physical artifact from an arbitrary polygon mesh, using only FOSS tools available today (blender's mesh_unfold.py and inkscape).

Current upper limit of the technology: http://www.bertsimons.nl/zenphoto/paperworks/

fenn's initial papercraft attempt:

[[Image:25-hedron.png]]
[[Image:DCP_0804.png]]

In this case, the paper torus is the artifact, papercraft is the process used to make it, and a 2d printer, paper, scissors, and tape are the build-time requirements. Unfortunately a paper torus doesn't have much functionality without further modification, such as filling with foam, applying fiberglass, or embedding other functional components. Papercraft is generalizable and not limited to regular polyhedra, although automated "bud chopping" is not yet implemented.
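To make the "example metadata file" goal concrete, here is a rough, hypothetical sketch of what a metadata record for this paper torus might capture (artifact, process, build-time requirements, and human-executed steps). Every field name below is invented; this is just one way the record could be organized, not a settled format.

<python>
# Hypothetical metadata for the paper torus example; not a settled skdb format.
paper_torus = {
    "artifact": "paper torus",
    "process": "papercraft (pepakura)",
    "source_files": ["torus.blend", "torus_unfolded.svg"],
    "build_time_requirements": ["2d printer", "paper", "scissors", "tape"],
    "tools_used": ["blender mesh_unfold.py", "inkscape"],
    "steps": [
        "print torus_unfolded.svg on paper",
        "cut out the flattened faces along their outlines",
        "fold along the marked edges",
        "tape or glue the matching tabs together",
    ],
}

def print_recipe(meta):
    """Emit human-readable build instructions from the metadata record."""
    print(f"Building: {meta['artifact']} via {meta['process']}")
    print("You will need: " + ", ".join(meta["build_time_requirements"]))
    for i, step in enumerate(meta["steps"], 1):
        print(f"  {i}. {step}")

print_recipe(paper_torus)
</python>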
The current mesh_unfold.py has no problem with "weird" topology, as long as there are no twisted quadrilaterals:

[[Image:triangulated_knot.jpg]]

* 2008-12-08: Please also see the [http://groups.google.com/group/openmanufacturing open source manufacturing] mailing list posts on [http://groups.google.com/group/openmanufacturing/browse_thread/thread/aa37d3d4badd1bb6/0b326bd815e465f1?lnk=gst&q=treemaker#0b326bd815e465f1 origami], and [http://groups.google.com/group/openvirgle/msg/99b9855310a9e00f the old openvirgle-origami post].

= 2008-05-18: Formalizing the results =

Using [http://blender.org/ blender] anywhere in a todo list is generally a bad idea because of its well-known [http://www.google.com/search?client=opera&rls=en&q=blender+steep+learning+curve&sourceid=opera&ie=utf-8&oe=utf-8 ridiculously steep learning curve]. So, ideally, just go find some [[3D models]] that you want to play with, then run the unfold.py script (possibly in blender's [http://geheim.blender.org/forum/viewtopic.php?t=6242&view=next batch mode]). Then dump the 3D models and the 2D unfolded meshes (SVG) into a [http://git.or.cz/ git repository], and throw in some [http://python.org/ python] to specify the class in a metadata file, like so (but modified to fit the needs of pepakura) (it's from [[2008-05-12]]):

<python>
class metadata:
    name = "Default project name"
    pgpKey = "3940914afdafdja0r391"
    hashSum = "90149940141"  # This is of the dot skdb file in general, yes?
    maintainerEmailAddy = "Bryan Bishop <kanzure@gmail.com>"
    reqFiles = []  # Required files for a default installation.
    # To help with over-rides, the config script is still included.
    configScript = "path/to/config/script/within/.skdb/config.py"
</python>

Then we need to work on [[agx-get]], the python script that automates the downloading of these metadata files and the dot-skdb files (the git repositories, basically). And then we need to demonstrate a command that would encapsulate the cupsd server and actually print it out, instructing the user to make the folds and cuts and use the tape, etc.

= 2009-04-03 =

* [[Part mating]]
** part mating: [http://heybryan.org/~bbishop/docs/Scale-Space%20Representations%20and%20their%20Applications%20to%203D%20Matching%20of%20Solid%20Models.ppt.pdf Scale-space representations and their applications to 3D matching of solid models]
* [[PythonOCC]] ( http://pythonocc.org/ ) (OpenCASCADE API bindings for python)
* [http://pyyaml.org/ PyYAML]
* Dependency algorithms (see the sketch below)
** [http://search.cpan.org/~adamk/Algorithm-Dependency-1.108/lib/Algorithm/Dependency.pm Algorithm::Dependency] (perl)
** [http://heybryan.org/~bbishop/docs/scripts/topsort.py topsort.py] (python)
* shape grammars?
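As a rough illustration of the dependency-ordering idea (in the spirit of topsort.py; the package names and dependencies below are invented), a minimal topological sort over a package dependency graph:

<python>
# Minimal topological sort over a package dependency graph (Kahn's algorithm).
# Package names and dependencies are made up for illustration.
deps = {
    "paper-torus":   ["papercraft", "paper", "tape"],
    "papercraft":    ["mesh-unfolder", "2d-printer"],
    "mesh-unfolder": [],
    "2d-printer":    [],
    "paper":         [],
    "tape":          [],
}

def topo_order(deps):
    """Return packages in an order where dependencies come before dependents."""
    remaining = {pkg: set(ds) for pkg, ds in deps.items()}
    order = []
    while remaining:
        ready = [pkg for pkg, ds in remaining.items() if not ds]
        if not ready:
            raise ValueError("dependency cycle among: " + ", ".join(remaining))
        for pkg in sorted(ready):
            order.append(pkg)
            del remaining[pkg]
        for ds in remaining.values():
            ds.difference_update(ready)
    return order

print(topo_order(deps))
</python>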