What is known about compressing graphs? By "compressing" I mean something like "putting a graph into a zip program"; in more technical terms, what is known about the Kolmogorov complexity of a graph? Does it make sense to define something along these lines at all? I guess one needs a binary string first of all. Is there then a way to get a binary string from a graph in some unequivocal way, without having to deal with labeling issues, permutations, etc.?

$\begingroup$ How do we get around graph isomorphism? $\endgroup$ – Suvrit, Nov 6 '11 at 20:29

$\begingroup$ For Eulerian graphs that are part of a reasonably small "ensemble" (and probably more generally for graphs that are part of an ensemble that is sharply peaked in some way) this is very much doable, and it leads to a rather good pseudometric using information distance. I have emailed you some unpublished work from 2003 that details this, and have a bit more on related subjects if you are interested. $\endgroup$ – Steve Huntsman, Nov 7 '11 at 1:07

$\begingroup$ This is an old question but if anyone is still interested, I have found a number of theorems and algorithms derived therefrom that allow me to reversibly encode any simple graph in an integer that is no more than $T_n = (n^2+n)/2$ bits long, where $n$ is the number of vertices. It only works for undirected graphs, however. $\endgroup$ – Stuart LaForge, Feb 18 '19 at 4:31
Li and Vitányi in their standard textbook on Kolmogorov complexity (3rd edition, p.456) observe
Almost all strings have high complexity. Therefore, almost all tournaments and almost all undirected graphs have high complexity.
This is made more precise in Section 6.4. In particular, they show that the number of distinct isomorphism classes of undirected graphs with $n$ vertices asymptotically approaches $2^\binom{n}{2}/n!$, with only a small error.
By Stirling's approximation, the Kolmogorov complexity of undirected graphs with $n$ vertices can then be seen to be close to $n(n-1)/2 + c$ bits. For undirected graphs the adjacency matrix is symmetric and has 0's on the diagonal, so one only needs to store the $n(n-1)/2$ bits above the diagonal.
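As a concrete sketch of that upper-triangle representation (function names are mine, not from the book), the encoding and its inverse fit in a few lines:

```python
from itertools import combinations

def encode_upper_triangle(adj):
    """Encode a symmetric 0/1 adjacency matrix with a zero diagonal
    as the n(n-1)/2 bits strictly above the diagonal, read row by row."""
    n = len(adj)
    return "".join(str(adj[i][j]) for i, j in combinations(range(n), 2))

def decode_upper_triangle(bits, n):
    """Invert the encoding: rebuild the full symmetric matrix
    by mirroring each stored bit across the diagonal."""
    adj = [[0] * n for _ in range(n)]
    it = iter(bits)
    for i, j in combinations(range(n), 2):
        adj[i][j] = adj[j][i] = int(next(it))
    return adj
```

For the triangle graph this stores "111" (3 bits) instead of the full 9-bit adjacency string, matching the $n(n-1)/2$ count.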
This makes more precise the claim made in another answer, that the adjacency matrix representation is optimal. (Note that close in this argument may still be an unbounded function of $n$; see also Lemma 6.4.6 and the comment after it.)
Ming Li and Paul Vitányi, An Introduction to Kolmogorov Complexity and Its Applications, 3rd edition, Springer-Verlag, 2008. doi: 10.1007/978-0-387-49820-1

$\begingroup$ Thank you! Here is some more refined information: math.hmc.edu/seniorthesis/archives/2006/jhearn/… (Applications of Kolmogorov Complexity to Graphs, John Hearn, Harvey Mudd College, 2006) $\endgroup$ – user3409, Nov 7 '11 at 0:19
Each graph can be uniquely represented by its adjacency matrix, an $n \times n$ matrix with entries 0 and 1. Each graph can therefore be expressed uniquely as a string of length $n^2$ by concatenating its rows. Graphs with cycles and directed graphs are also represented uniquely.
Examples:
The complete graph on 3 vertices can be expressed as 011101110.
The graph with nodes {1,2,3} and edges {{1,2},{1,3}} can be expressed as 011100100.
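The row-concatenation encoding described above is a one-liner in code; this sketch (function name mine) reproduces both example strings:

```python
def graph_to_string(adj):
    """Concatenate the rows of an n x n 0/1 adjacency matrix
    into a string of length n^2."""
    n = len(adj)
    return "".join(str(adj[i][j]) for i in range(n) for j in range(n))
```

Applied to the two examples, the complete graph on 3 vertices gives "011101110" and the graph with edges {{1,2},{1,3}} gives "011100100".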
From this representation and from Kolmogorov complexity, certain statements become evident.
(1) For any other representation of graphs, there exists a constant $c$ such that the Kolmogorov complexity conditional on that representation saves at most $c$ bits over the Kolmogorov complexity of the adjacency-matrix representation.
(2) Sparse graphs are highly compressible (the adjacency matrix of a sparse graph contains many zeroes and very few ones).
(3) If all you care about is equivalence classes of graphs:
Program p1(n) := find the nth graph g such that no preceding graph (in this representation) is isomorphic to it.
Program p2(n) := find the smallest graph that is isomorphic to graph n.
Program p3(n) := find the n' such that p1(n') = p2(n).
p3 is the function that compresses a graph; p1 is the decompression function.
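A brute-force sketch of the canonical-representative idea behind p2 (minimizing the adjacency string over all relabelings; $O(n!)$ time, so feasible only for very small $n$ — this is exactly where the graph-isomorphism difficulty raised in the comments hides):

```python
from itertools import permutations

def canonical_string(adj):
    """Lexicographically smallest row-concatenated adjacency string
    over all vertex relabelings (brute force over n! permutations)."""
    n = len(adj)
    best = None
    for perm in permutations(range(n)):
        s = "".join(str(adj[perm[i]][perm[j]])
                    for i in range(n) for j in range(n))
        if best is None or s < best:
            best = s
    return best
```

Two graphs are isomorphic exactly when their canonical strings agree, so this string identifies the equivalence class.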

$\begingroup$ This representation encodes additional information other than the isomorphism type of the underlying graph, namely an ordering on the vertices. It seems to me to be a highly nontrivial question how to store only the isomorphism type of a graph in a minimal way; if one could do this in a completely irredundant way efficiently, this would solve the graph isomorphism problem. $\endgroup$ Nov 6 '11 at 19:59

$\begingroup$ To add to Qiaochu Yuan's observation, there are graphs which have a superpolynomial number of different adjacency matrices which all encode isomorphic graphs. This means that the argument (1), claiming that this representation is within a constant term $c$ from any other, cannot hold. $\endgroup$ Nov 6 '11 at 22:05

$\begingroup$ Andras has a point here. Thank you. It is not sufficient to say "I look at the complexity of the string obtained from the adjacency matrix". There must be something more to be added, because of the isomorphism issue, and even more superficially, just the chosen labeling. I would not even try to look at the min/max over all possible strings obtained in this way. Maybe a graph canonicalization procedure is going to work (the works of László Babai and Eugene Luks from the late '70s). A little postscript: I am OK if the whole thing is going to be as hard as graph isomorphism. $\endgroup$ – user3409, Nov 6 '11 at 22:07

$\begingroup$ Why all this objection about isomorphisms? The problem was to compress graphs, not graphs-up-to-isomorphism. The adjacency matrix represents graphs on a fixed vertex set in a one-to-one way. $\endgroup$ Nov 13 '11 at 12:04

$\begingroup$ For the fully connected 3-node graph, assuming we are interested in unlabelled, undirected graphs that aren't multigraphs, you don't need 011101110; you only need "111". You could also thin out highly connected graphs by swapping which pairs are connected and which aren't; you'll need to store an extra bit recording whether you've performed this adjustment, in this case reducing things to just "1". You may also want to assume that the graph is connected, with no free-floating islands. $\endgroup$ Aug 12 at 14:15
Graph compression algorithms are starting to be used in biological applications (e.g. Itzkovitz et al., Coarse-graining and self-dissimilarity of complex networks; L. Peshkin, Structure induction by lossless graph compression; M. Hayashida and T. Akutsu, Comparing biological networks via graph compression). Unlike the pure mathematical side, the networks involved are not the whole class of (non-isomorphic) (un)labelled graphs.
Motivated by string compression, these algorithms can detect frequently occurring induced subgraphs. Recently, the related topic of network motifs (small connected subgraphs that occur as an induced subgraph significantly more frequently than in randomized networks) has received an explosion of interest. It is hoped that graph compression algorithms can provide insight into network motifs (and other small induced subgraphs, or "graphlets"), and thus provide insight into the (biological) network as a whole.
While searching for a definition of chromatic entropy, I found a link that might prove helpful to you:
Entropy, Orbits, and Spectra of Graphs by A. Mowshowitz and V. Mitsou.
In particular, they summarize results that study complexity of graphs by characterizing it in terms of entropy, information content of graphs, etc.
It depends on whether the graphs you want to compress have some repeated "structure", in the sense of something that can be recognized by an algorithm, or whether they are random (you don't specify; I suggest you edit the question or clarify in comments for better feedback). If the graphs are random, compression is not possible, just as theory predicts that "most strings are random".
"random" graphs here can be roughly defined as a graph such that half of all the possible edges are distributed randomly. sparse graphs can be "compressed" in the sense that if you store only the edges that are present thats an improvement over storing a list of all edges with '1' or '0' for edge present or absent). nobody so far has suggested the somewhat obvious strategy for dense graphs of storing the inverse and then compressing that.
Any existing data compression algorithm can be used on graphs by converting the graph into a string using some conversion algorithm (i.e., a one-to-one mapping between strings and graphs); the choice of conversion algorithm, relative to the graph's structure, affects how much the graph can be compressed.
The question is also very much related to the question "what are different ways/data structures to store graphs, especially those that take into account repeated structures in the graph?"

$\begingroup$ oops, this should read "complement" instead of "inverse" $\endgroup$ – vzn, Sep 21 '12 at 3:47
This question is closely related to a field called graph complexity in computational complexity theory. There are many compression algorithms possible; any standard one from computer science will suffice.
However, one fairly natural graph compression scheme is to represent graphs via circuits (DAGs). In this way circuits can be used to construct more complex graphs out of simpler "basic" graphs. Some foremost problems (e.g. in complexity class separations) are expressible in this framework. For example, NP $\stackrel{?}{\subset}$ P/poly is open and is a stronger conjecture than P $\neq$ NP (because it is the non-uniform version vs. the uniform case, where Kolmogorov complexity is more associated with uniform computation), and it can be expressed as a lower bound associated with building graphs out of stars using monotone circuits (AND/OR operations only on edge sets).
See this nice survey: Computational Complexity of Graphs by Stasys Jukna.
The answer is given both in theoretical terms and numerical methods in these papers:
Correlation of Automorphism Group Size and Topological Properties with Program-size Complexity Evaluations of Graphs and Complex Networks, H. Zenil, F. Soler-Toscano, K. Dingle and A. Louis, Physica A: Statistical Mechanics and its Applications, vol. 404, pp. 341-358, 2014. [online, preprint, video] doi: 10.1016/j.physa.2014.02.060 (Elsevier)
Methods of Information Theory and Algorithmic Complexity for Network Biology, H. Zenil, N.A. Kiani and J. Tegnér, Seminars in Cell and Developmental Biology, vol. 51, pp. 32-43, 2016. doi: 10.1016/j.semcdb.2016.01.011 [online, preprint] (Elsevier)
and
A Decomposition Method for Global Evaluation of Shannon Entropy and Local Estimations of Algorithmic Complexity, H. Zenil, S. Hernández-Orozco, N.A. Kiani, F. Soler-Toscano and A. Rueda-Toicen, Entropy 20(8), 605, 2018. [online] (MDPI)