Kernel Trap

KernelTrap is a web community devoted to sharing the latest in kernel development news.

2.6.27-rc8, "This One Should Be The Last One"

Wed, 01/10/2008 - 2:55am

"So yet another week, another -rc," began Linux creator Linus Torvalds, announcing the 2.6.27-rc8 Linux kernel. He continued, "this one should be the last one: we're certainly not running out of regressions, but at the same time, at some point I just have to pick some point, and on the whole the regressions don't look _too_ scary. And -rc8 obviously does fix more of them." Linus went on to note that most of the changes since -rc7 are small, "and there aren't even a whole lot of them."

Jiri Kosina cautioned that an unresolved bug affecting the e1000e driver remains in the 2.6.27 kernel, "rendering the cards unusable for most of the i-am-not-a-hacker users (and remember, even Dave Airlie bricked his laptop completely to death, when trying to restore eeprom contents)." When asked how to reproduce the bug, Jiri noted that it could not be triggered reliably, which made debugging harder: "apparently it is some kind of race, as it usually takes multiple cycles to trigger".

2.6.27-rc6, "Things Are Calming Down"

Sat, 13/09/2008 - 9:42am

"The patches most people hopefully care about tend to be small details," noted Linus Torvalds, announcing the 2.6.27-rc6 kernel. He continued, "and so more regressions should hopefully be closed now, some by just reverting the commits that caused breakage. I don't think anything special merits explicit comment, but you can get a flavor for things by scanning the appended shortlog." Earlier in the announcement, Linus noted which drivers accounted for the bulk of the patch:

"Same old deal - except it's been almost two weeks since -rc5. That said, the diff is actually about the same size, so I guess that means things are calming down. Most of the diff (bulk-wise) is updates to the new gspca (standard USB webcam) driver, although some of it is also removal of the dead miropcm20* driver."

Tux3 Acting Like A Filesystem

Fri, 05/09/2008 - 2:44am

Daniel Phillips noted that his new Tux3 versioning filesystem is now operating like a filesystem, "the last burst of checkins has brought Tux3 to the point where it undeniably acts like a filesystem: one can write files, go away, come back later and read those files by name. We can see some of the hoped for attractiveness starting to emerge: Tux3 clearly does scale from the very small to the very big at the same time. We have our Exabyte file with 4K blocksize and we can also create 64 Petabyte files using 256 byte blocks." He went on to discuss some of the features yet to be implemented, including atomic commits, versioning, coalesce on delete, an in-kernel implementation, extents, locking, and extended attributes.
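
The two quoted limits are consistent with each other. Taking an Exabyte as 2^60 bytes and a Petabyte as 2^50 bytes (our assumption about the units), both work out to the same number of addressable blocks:

```latex
\frac{2^{60}\ \text{bytes}}{2^{12}\ \text{bytes/block}} = 2^{48}\ \text{blocks},
\qquad
\frac{64 \cdot 2^{50}\ \text{bytes}}{2^{8}\ \text{bytes/block}} = \frac{2^{56}}{2^{8}} = 2^{48}\ \text{blocks}
```

That points to a 48-bit block address, though the quoted post does not state the pointer width, so treat it as an inference.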

Reviewing that list, Daniel decided to work next on coalesce on delete, noting, "without this we can still delete files but we cannot recover file index blocks, only empty them, not so good." He added that for now he would focus only on file truncation, "as soon as file truncation is added to the test mix we will see much more interesting behavior from the bitmap allocator, and we will discover some great ways to generate horrible fragmentation issues. Yummy." Daniel also pointed out that Tux3 is an open source project and is always looking for others to participate, "whoever wants to carve their initials on what is starting to look like a for-real Linux filesystem, now is a great time to take a flyer. The code base is still tiny, builds fast, has lots of interactive feedback and is easy to work on. And you get to put your email address near the beginning of the list, which will naturally write its way into the history of open source. Probably."
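
To make the fragmentation remark concrete, here is a toy first-fit bitmap allocator. It is a hypothetical illustration, not Tux3's allocator, and the class and method names are invented. After every other file is truncated away, half the volume is free, yet no run longer than two blocks can be allocated:

```python
class BitmapAllocator:
    """Toy first-fit block allocator over a bitmap (hypothetical; not Tux3 code)."""

    def __init__(self, nblocks):
        self.used = [False] * nblocks  # one flag per block; True = allocated

    def alloc(self, count):
        """Return the start of the first run of `count` free blocks, or None."""
        run = 0
        for block, in_use in enumerate(self.used):
            run = 0 if in_use else run + 1
            if run == count:
                start = block - count + 1
                for b in range(start, block + 1):
                    self.used[b] = True
                return start
        return None

    def free(self, start, count):
        """Release blocks, e.g. when a file is truncated or deleted."""
        for b in range(start, start + count):
            self.used[b] = False

    def free_stats(self):
        """Return (total free blocks, longest contiguous free run)."""
        total = longest = run = 0
        for in_use in self.used:
            run = 0 if in_use else run + 1
            total += not in_use
            longest = max(longest, run)
        return total, longest


# Fill a 16-block volume with eight 2-block files, then truncate every other one.
alloc = BitmapAllocator(16)
files = [alloc.alloc(2) for _ in range(8)]
for start in files[::2]:
    alloc.free(start, 2)

print(alloc.free_stats())  # (8, 2): half the volume is free...
print(alloc.alloc(4))      # None: ...but there is no 4-block run
```

A real allocator packs one bit per block and uses smarter placement; the list of booleans just keeps the sketch readable.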

2.6.27-rc5, Fixing Regressions

Tue, 02/09/2008 - 1:48am

Linus Torvalds announced the 2.6.27-rc5 Linux kernel, noting that his "weekly releases" tend to happen every eight days, adding, "the bulk of it is all config updates, and with arm and powerpc leading the pack." Linus continued:

"While the config updates amount to about three quarters of the diff, and if you don't use a rename-aware diff the blackfin include file movement pretty much accounts for the rest, hidden behind all those trivial (but bulky) changes are a lot of small changes that hopefully fix a number of regressions.

"The most exciting (well, for me personally - my life is apparently too boring for words) was how we had some stack overflows that totally corrupted some basic thread data structures. That's exciting because we haven't had those in a long time. The cause turned out to be a somewhat overly optimistic increase in the maximum NR_CPUS value, but it also caused some introspection about our stack usage in general. Including things like a patch to gcc to fix insane stack usage for vararg functions on x86-64. But that one would only hit anybody who was a bit too adventurous and selected the big 4096 CPU configuration. The rest of the regressions fixed are a bit more pedestrian."
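
For a sense of scale (our arithmetic, not from the announcement): a `cpumask_t` is a bitmap with one bit per possible CPU, so with `NR_CPUS` = 4096 every copy is 512 bytes. Assuming the 8 KiB x86-64 kernel stacks of that era, eight such masks in one call chain would take half the stack:

```latex
\frac{4096\ \text{bits}}{8\ \text{bits/byte}} = 512\ \text{bytes},
\qquad
8 \times 512\ \text{bytes} = 4096\ \text{bytes} = \tfrac{1}{2} \times 8192\ \text{bytes}
```

Linus does not say which structures overflowed; on-stack CPU masks are simply one well-known way a larger `NR_CPUS` inflates stack frames.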

2.6.27-rc4, "Random Stuff All Over"

Tue, 26/08/2008 - 8:10am

"Another week, another -rc," began Linus Torvalds, announcing the 2.6.27-rc4 Linux kernel, continuing, "this time the diffstat is almost totally dominated by the addition of the musb driver that drives the MUSB and TUSB controllers integrated into omap2430 and davinci. That, together with the removal of the auerswald USB driver (replaced by libusb version) is more than half of the bulk of the patch, and obviously most users won't ever notice." Linus added:

"Apart from those bulky USB updates, there's some arch updates (blackfin and ia64), network and input driver updates, and an XFS and UBIFS update. The rest is mostly random stuff all over, probably best described by the appended shortlog. A number of regressions should be off the table, but more remain..."

AXFS, Advanced Execute In Place Filesystem

Fri, 22/08/2008 - 2:10pm

"I'd like to get a first round of review on my AXFS filesystem," began Jared Hulbert, describing his new Advanced XIP File System for Linux (XIP stands for eXecute-In-Place). The announcement drew quite a bit of positive feedback. Jared offered the following description:

"This is a simple read only compressed filesystem like Squashfs and cramfs. AXFS is special because it also allows for execute-in-place of your applications. It is a major improvement over the cramfs XIP patches that have been floating around for ages. The biggest improvement is in the way AXFS allows for each page to be XIP or not. First, a user collects information about which pages are accessed on a compressed image for each mmap()ed region from /proc/axfs/volume0. That 'profile' is used as an input to the image builder. The resulting image has only the relevant pages uncompressed and XIP. The result is smaller memory sizes and faster launches."
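
The per-page decision can be sketched in a few lines. This is a hypothetical model, not AXFS code: the `profile` set stands in for whatever /proc/axfs/volume0 actually reports (its format isn't described here), and zlib is an arbitrary compressor choice for the sketch:

```python
import zlib

PAGE_SIZE = 4096


def build_image(pages, profile):
    """Split a region's pages into XIP and compressed parts (hypothetical sketch).

    pages   -- list of bytes objects, one per page of an mmap()ed region
    profile -- set of page indices recorded as accessed at run time
    Returns (xip, compressed), each a list of (page_index, data) pairs.
    """
    xip, compressed = [], []
    for index, data in enumerate(pages):
        if index in profile:
            # Hot page: stored uncompressed so it could be mapped and run in place.
            xip.append((index, data))
        else:
            # Cold page: stored compressed and inflated into RAM only when touched.
            compressed.append((index, zlib.compress(data)))
    return xip, compressed


def read_page(image, index):
    """Return a page's contents, whichever part of the image holds it."""
    xip, compressed = image
    for i, data in xip:
        if i == index:
            return data
    for i, blob in compressed:
        if i == index:
            return zlib.decompress(blob)
    raise KeyError(index)


# Four pages, of which the profile says only pages 0 and 2 were used.
pages = [bytes([n]) * PAGE_SIZE for n in range(4)]
image = build_image(pages, profile={0, 2})
print([i for i, _ in image[0]], [i for i, _ in image[1]])  # [0, 2] [1, 3]
```

Only the profiled pages pay the uncompressed-storage cost, which is the trade-off the description credits for smaller memory sizes and faster launches.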

Git 1.6.0 Released

Tue, 19/08/2008 - 10:46pm

"The latest feature release GIT 1.6.0 is available at the usual places," began Git maintainer Junio Hamano, announcing the latest stable release of the distributed version control system originally written by Linus Torvalds. Among the changes, Junio noted, "with the default Makefile settings, most of the programs are now installed outside your $PATH, except for 'git', 'gitk' and some server side programs that need to be accessible for technical reasons." He continued, "by default, packfiles created with this version uses delta-base-offset encoding introduced in v1.4.4. Pack idx files are using version 2 that allows larger packs and added robustness thanks to its CRC checking, introduced in v1.5.2 and v1.4.4.5." Junio highlighted several other changes, including the addition of a '.sample' extension to the default hook scripts so that they don't execute in a default install, and the removal of the 'stupid' merge strategy. Other changes include:

"Git-gui learned to stage changes per-line; Reduced excessive inlining to shrink size of the 'git' binary; When an object is corrupt in a pack, the object became unusable even when the same object is available in a loose form, we now try harder to fall back to these redundant objects when able; performance of 'git-blame -C -C' operation is vastly improved; even more documentation pages are now accessible via 'man' and 'git help'; longstanding latency issue with bash completion script has been addressed; pager. configuration variable can be used to enable/disable the default paging behaviour per command; git-cvsserver learned to respond to 'cvs co -c'; 'git-diff -p' learned to grab a better hunk header lines in BibTex, Pascal/Delphi, and Ruby files and also pays attention to chapter and part boundary in TeX documents; error codes from gitweb are made more descriptive where possible, rather than '403 forbidden' as we used to issue everywhere; git-merge has been reimplemented in C."
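
The delta-base-offset ("OFS_DELTA") format records a delta's base as a backward byte offset within the same pack instead of the base object's 20-byte SHA-1 name. Below is a Python transcription of the variable-length offset encoding as we understand it from Git's pack-format documentation. The function names are ours, and it covers only the offset field, not the rest of the entry header:

```python
def encode_ofs(ofs):
    """Encode a non-negative base offset as an OFS_DELTA offset field."""
    out = [ofs & 0x7F]           # final byte: low 7 bits, continuation bit clear
    ofs >>= 7
    while ofs:
        ofs -= 1                 # the "-1" ensures no value has two encodings
        out.append(0x80 | (ofs & 0x7F))
        ofs >>= 7
    return bytes(reversed(out))  # most significant 7-bit group first


def decode_ofs(data):
    """Decode an OFS_DELTA offset; return (offset, number of bytes consumed)."""
    c = data[0]
    used = 1
    ofs = c & 0x7F
    while c & 0x80:              # high bit set: another byte follows
        c = data[used]
        used += 1
        ofs = ((ofs + 1) << 7) | (c & 0x7F)
    return ofs, used


print(encode_ofs(127).hex(), encode_ofs(128).hex())  # 7f 8000
```

Because of the "+1" on each continuation, every encoded length covers a disjoint range of offsets, so a two-byte field starts at 128 rather than repeating values a one-byte field could already express.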

64-bit Application Thread Creation Performance

Tue, 19/08/2008 - 6:51am

A recent discussion on the Linux Kernel mailing list noted that threaded 64-bit applications suffer a drastic slowdown in pthread_create performance when stack utilization goes above 4GB. Ingo Molnar offered an explanation of the problem, "unfortunately MAP_32BIT use in 64-bit apps for stacks was apparently created without foresight about what would happen in the MM when thread stacks exhaust 4GB. The problem is that MAP_32BIT is used both as a performance hack for 64-bit apps and as an ABI compat mechanism for 32-bit apps. So we cannot just start disregarding MAP_32BIT in the kernel - we'd break 32-bit compat apps and/or compat 32-bit libraries." The original report noted that once the shared stack goes above 4GB in size, thread creation can take as long as 10 milliseconds, a slowdown described as "quite unacceptable".

Ingo created a patch introducing a new MAP_STACK mmap() flag that glibc can use instead of MAP_32BIT when allocating thread stacks, so that threaded 64-bit applications no longer pay for the 32-bit placement restriction. He noted, "glibc can switch to this new flag straight away - it will be ignored by the kernel." The new flag was quickly merged upstream, and changes were planned for glibc.

Tux3 Hierarchical Structure

Fri, 15/08/2008 - 1:04pm

"It is about time to take a step back and describe what I have been implementing," began Daniel Phillips, referring to his new Tux3 filesystem. He provided a simple ASCII diagram of the filesystem's hierarchical structure, describing each of its elements. About one of them, the volume table, he noted, "the volume table is a new addition not central to the goals of Tux3, but a nice feature to have given that it comes nearly for free. One Tux3 volume can have an arbitrary number of separate filesystems tucked inside it, indexed by a simple integer parameter at mount time. People say they like this idea and it imposes no significant complexity, so it goes in." Daniel continued:

"Each volume has a metablock pointing at the forward log chain for the volume, a version table that describes the hierarchical relationship between versions (snapshots), an atime table to take care of that horrid legacy Unix feature, and an inode table containing files and attributes of files. [...] Versioning takes place in three places, versioned pointers in the atime btree, versioned extents in a file data btree and versioned attributes in the inode table. [...] Notice the absence of a journal, the functionality of which is provided by forward log elements that I described in the Hammer thread (and will eventually write a separate post about)."
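
One way to picture a version table plus versioned attributes: store each version's parent, and label each attribute value with the version that wrote it. A lookup then walks from the requested version toward the root and takes the first labeled value it meets. This is a hypothetical in-memory model with invented names, not Tux3's on-disk format:

```python
def lineage(parent, version):
    """Yield `version`, then its parent, grandparent, ... up to the root."""
    while version is not None:
        yield version
        version = parent[version]


def visible_value(parent, entries, version):
    """Pick the entry a version sees: its own, else its nearest ancestor's.

    parent  -- dict: version -> parent version (None for the root)
    entries -- dict: version label -> value written in that version
    """
    for v in lineage(parent, version):
        if v in entries:
            return entries[v]
    return None  # nothing was written anywhere in this version's lineage


# Version 1 is the root; 2 and 3 are snapshots of it; 4 is a snapshot of 2.
parent = {1: None, 2: 1, 3: 1, 4: 2}
mode = {1: 0o644, 2: 0o600}  # attribute changed in version 2
print(oct(visible_value(parent, mode, 4)), oct(visible_value(parent, mode, 3)))  # 0o600 0o644
```

Sharing the unchanged value between versions 1 and 3 is what lets snapshots stay cheap: only versions that actually change an attribute need their own labeled entry.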

Highlighting Interesting Mailing List Discussions

Thu, 14/08/2008 - 9:00am

New functionality has been enabled that allows logged-in users to highlight interesting mailing list discussions. This feature was added out of necessity, as I have lately had too little time to keep up with the many mailing lists I follow for KernelTrap articles. My goal is to inspire you to participate more in the process by occasionally clicking the new up-arrow on mailing list messages that you find interesting and worthy of attention. In the upcoming weeks, improved interfaces will be provided for navigating other people's votes, and for filtering on only the mailing lists you're interested in. Future KernelTrap stories and quotes will be selected from those that are highlighted by this voting process.

2.6.27-rc3, "Things Really _Have_ Calmed Down"

Thu, 14/08/2008 - 5:49am

"Things really _have_ calmed down, and hopefully we've also resolved a lot of the regressions in -rc3," began Linus Torvalds, announcing the 2.6.27-rc3 Linux kernel. He noted that much of the patch's bulk came from the new ath9k wireless driver, with most of the rest due to renamed include files in the ARM, AVR32 and m68knommu architectures. Linus continued:

"All the small changes are where the regression fixes are, and other random improvements. And they're all over. The ShortLog (appended) probably gives a taste of it."

LVM Snapshot Merging

Sun, 10/08/2008 - 5:02am

Mikulas Patocka announced new patches introducing snapshot merging for the Linux kernel's logical volume manager. He explained, "snapshot merging allows you to merge snapshot content back into the original device. The most useful use for this feature is the possibility to rollback [the] state of the whole computer after [a] failed package upgrade, [or an] administrator's error". The patches target the 2.6.26 kernel, device-mapper 1.02.27, and LVM2 2.02.39.

Mikulas noted that three types of merges are supported: --nameorigin, --namesnapshot, and --onactivate. The default, --nameorigin, merges a snapshot into the origin volume, which can be mounted at any time after the merge starts. The --namesnapshot method merges into a snapshot, which can then be mounted. The --onactivate method schedules a merge to happen the next time the volume is activated, such as during a reboot. Mikulas noted, "this implementation of snapshot merging is meant to be stable, report any possible bugs to me."
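
Merging is easiest to see against a model of a copy-on-write snapshot: before an origin chunk is first overwritten, its old contents are saved in the snapshot's exception store, and a merge copies those saved chunks back. The Python below is a hypothetical single-threaded model with invented names. It ignores writable snapshots and the concurrency that lets the origin be mounted while a merge is still running:

```python
class Snapshot:
    """Toy copy-on-write snapshot of an origin volume (hypothetical model)."""

    def __init__(self, origin):
        self.origin = origin    # list of chunks, updated in place
        self.exceptions = {}    # chunk index -> contents when the snapshot was taken

    def write_origin(self, index, data):
        """Write to the origin, saving the old chunk the first time it changes."""
        if index not in self.exceptions:
            self.exceptions[index] = self.origin[index]
        self.origin[index] = data

    def read_snapshot(self, index):
        """The snapshot sees the saved copy if there is one, else the origin."""
        return self.exceptions.get(index, self.origin[index])

    def merge(self):
        """Copy saved chunks back: the origin returns to its snapshot-time state."""
        for index, data in sorted(self.exceptions.items()):
            self.origin[index] = data
        self.exceptions.clear()


disk = ["app-1.0", "libs-1.0", "config-a"]
snap = Snapshot(disk)
snap.write_origin(0, "app-2.0")     # a package upgrade that turns out badly
snap.write_origin(2, "config-b")
snap.merge()
print(disk)  # ['app-1.0', 'libs-1.0', 'config-a']
```

Only chunks that changed since the snapshot are copied back, which is why a rollback by merging is cheaper than restoring the whole volume.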

Btrfs 0.16, Improved Scalability And Performance

Sat, 09/08/2008 - 6:02am

"Btrfs v0.16 is available for download," began Chris Mason, announcing the latest release of his new Btrfs filesystem. He noted, "v0.16 has a shiny new disk format, and is not compatible with filesystems created by older Btrfs releases. But, it should be the fastest Btrfs yet, with a wide variety of scalability fixes and new features." Scalability and performance improvements include fine-grained btree locking, moving CPU-intensive operations such as checksumming into their own background threads, an improved data=ordered mode, and a new cache that reduces IO when cleaning up old transactions. Other new features include support for ACLs, prevention of orphaned inodes so files won't be lost after a crash, and a more robust directory index format. Chris noted:

"There are still more disk format changes planned, but we're making every effort to get them out of the way as quickly as we can. You can see the major features we have planned on the development timeline. [...] the btrfs kernel module now weighs in at 30,000 LOC, which means we're getting very close to the size of ext[34]."
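
Moving checksumming off the main path can be sketched as handing each block to a worker pool. Btrfs checksums with CRC-32C; the Python below pairs a simple bitwise CRC-32C with a `ThreadPoolExecutor`. It only illustrates the structure (the names are ours, and in CPython the GIL keeps pure-Python CRC code from running in parallel), not how Btrfs schedules its threads:

```python
from concurrent.futures import ThreadPoolExecutor

CRC32C_POLY = 0x82F63B78  # Castagnoli polynomial, bit-reflected form


def crc32c(data, crc=0):
    """Bitwise CRC-32C: slow but simple reference version."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ CRC32C_POLY if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def checksum_blocks(blocks, workers=4):
    """Hand checksumming to a pool of background threads; results keep block order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(crc32c, blocks))


blocks = [bytes(4096), b"123456789"]
sums = checksum_blocks(blocks)
print(hex(sums[1]))  # 0xe3069283, the standard CRC-32C check value
```

The caller only waits when it needs the results, which is the point of pushing the CPU-heavy work into separate threads.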

Comparing HAMMER And Tux3

Fri, 08/08/2008 - 2:25am

"The big advantage Hammer has over Tux3 is, it is up and running and released in the Dragonfly distro," began Daniel Phillips, offering a comparison between the two filesystems. He continued, "the biggest disadvantage is, it runs on BSD, not Linux, and it so heavily implements functionality that is provided by the VFS and block layer in Linux that a port would be far from trivial. It will likely happen eventually, but probably in about the same timeframe that we can get Tux3 up and stable." This led into a lengthy and interesting technical discussion between Daniel and HAMMER author Matthew Dillon, comparing the design of the two filesystems.

Matthew reviewed the Tux3 notes and replied, "it sounds like Tux3 is using many similar ideas [as HAMMER]. I think you are on the right track. I will add one big note of caution, drawing from my experience implementing HAMMER, because I think you are going to hit a lot of the same issues. I spent 9 months designing HAMMER and 9 months implementing it. During the course of implementing it I wound up throwing away probably 80% of the original design outright." Daniel noted that he's been working on the Tux3 design for around ten years, "and working seriously on the simplifying elements for the last three years or so, either entirely on paper or in related work like ddsnap and LVM3." Matthew cautioned, "I can tell you've been thinking about Tux for a long time. If I had one worry about your proposed implementation it would be in the area of algorithmic complexity. You have to deal with the in-memory cache, the log, the B-Tree, plus secondary indexing for snapshotted elements and a ton of special cases all over the place. Your general lookup code is going to be very, very complex. My original design for HAMMER was a lot more complex (if you can believe it!) then the end result. A good chunk of what I had to do going from concept to reality was deflate a lot of that complexity." The friendly conversation offers a very detailed look at the design choices made in each of these file systems.

2.6.27-rc2, "A Lot Of Random Changes"

Thu, 07/08/2008 - 5:44pm

"So it's been a week since -rc1, and -rc2 is out there," began Linux creator Linus Torvalds, announcing the 2.6.27-rc2 Linux kernel. He noted, "there's a lot of random changes in there, and I'm hoping we're starting to calm down, but one particular _kind_ of random change is probably worth pointing out explicitly due to the things it can result in: the fact that a number of architectures ended up using the 'lull' after -rc1 (hah!) to do the 'include/asm-xyz' => 'arch/xyz/include/asm' renames." Linus explained that for people actively developing and merging code with git, "be aware that we've recently had more renames than the rename detection limit in git defaults to, and as a result, if you have a rename<->data change conflict, you may want to increase the default limit." He added that developers with enough RAM can set "renamelimit=0" to disable the limit completely, and others can set it to a high value such as 5,000, "the default limit is pretty low just to not cause problems for people who have less memory in their machines than kernel developers tend to have..."

Linus continued, "the dirstat (with rename detection on, so as to not show the movement as huge changes) is fairly usual, with most of the changes in drivers, along with an ext4 and xfs update making 'fs' show up pretty high too". He added:

"The shortlog is still a tad too big to make it on the list (again, as usual - normally I end up posting shortlogs for -rc3 and later when they become more manageable) but let me just say that it isn't really all that interesting. Theres' a lot of small changes here, but nothing that makes you go 'Wow!'. Not that there _should_ be anything like that in -rc2, of course, so I'm not complaining."

Linux Job Board

Thu, 07/08/2008 - 7:19am

We have partnered with HotLinuxJobs to provide KernelTrap readers with a new Linux Job Board dedicated to helping you find employment in free and open source technologies. The list of jobs on our website will be updated automatically as new jobs become available, and can be found by clicking the 'Jobs' link in the menu at the top of every page. When you find a listed job that you're interested in, follow the link from the job board to sign up for the free Jobs Email List. A HotLinuxJobs recruiter will then personally work with you, interviewing you over the phone before sending your resume to the potential employer, and protecting your confidentiality. You may sign up for the Jobs Email List even if you currently find no matching jobs, and a HotLinuxJobs recruiter will contact you when a match is found. Their website explains:

"[HotLinuxJobs is a] search firm specializing in the placement of Linux / Open Source professionals, providing both contract and direct hire services to our clients. Our knowledge of the Linux / Open Source landscape and employment marketplace make us your most efficient recruiting resource. If you are looking for exciting opportunities and want to work for leading companies adopting Open Source technologies, please contact us and send us a copy of your resume and sign up for our jobs email list."

Reiser4 Update

Thu, 07/08/2008 - 4:00am

"I have had to apply the reiser4 patches from -mm kernels to vanilla based patchset for over a year now. Reiser4 works fine, what will it take to get it included in vanilla?" began a brief thread on the Linux Kernel mailing list. Theodore Ts'o offered several links detailing the remaining issues with Reiser4, then suggested, "people who really like reiser4 might want to take a look at btrfs; it has a number of the same design ideas that reiser3/4 had --- except (a) the filesystem format has support for some advanced features that are designed to leapfrog ZFS, (b) the maintainer is not a crazy man and works well with other LKML developers (free hint: if your code needs to be reviewed to get in, and reviewers are scarce; don't insult and abuse the volunteer reviewers as Hans did --- Not a good plan!)."

Edward Shishkin noted that Reiser4 development continues, "I am working on the plugin design document. It will be ready approximately in September. I believe that it'll address all the mentioned complaints." He added, "This document [defines] plugins [and] primitives (like conversion of run-time objects) used in reiser4, and describes all reiser4 interfaces, so that it will be clear that VFS functionality is not duplicated, there are not VFS layers inside reiser4, etc."

Hans Reiser, the original developer of the Reiser4 filesystem, was convicted of first-degree murder on April 28th, 2008. The latest Reiser4 patches currently live on kernel.org, as do the necessary support programs.

Reviewing Linux-next

Wed, 06/08/2008 - 6:33am

"I do think 'next' as it is has a few issues that either need to be fixed (unlikely - it's not the point of next) or just need to be aired as issues and understood," noted Linus Torvalds about the linux-next development tree, originally designed as a way to get subsystem maintainers more involved in managing merge conflicts. Linus continued, "I don't think anybody wants it to go away. The question in my mind is more along the way of how/whether it should be changed. There was some bickering about patches that weren't there, and some about how _partial_ series were there but then the finishing touches broke things."

He listed his two primary concerns as, "I don't think it does 'quality control', and I think that's pretty fundamental," and, "I don't think the 'next' thing works as well for the occasional developer that just has a few patches pending as it works for subsystem maintainers that are used to it." Linus continued, "I don't think either of the above issues is a 'problem' - I just think they should be acknowledged. I think 'next' is a good way for the big subsystem developers to be able to see problems early, but I really hope that nobody will _ever_ see next as a 'that's the way into Linus' tree', because for the above two reasons I do not think it can really work that way." Andrew Morton noted, "a lot of the bugs which hit your tree would have been quickly found in linux-next too," then added, "but it's all shuffling deckchairs, really. Are we actually merging better code as a reasult of all of this? Are we being more careful and reviewing better and testing better? Don't think so."

2.6.27-rc1, "Pretty Dang Busy"

Wed, 30/07/2008 - 10:18am

"It's two weeks (and one day), and the merge window is over," began Linus Torvalds, announcing the 2.6.27-rc1 kernel. He continued, "finally. I don't know why, but this one really did feel pretty dang busy. And the size of the -rc1 patch bears that out - at 12MB, it's about 50% bigger than 26-rc1 (but not that much bigger than 24/25-rc1, so it's not like it's anything unheard of)." He reflected, "the pure size of the -rc's _is_ making me a bit nervous, though. Sure, it means that we are good at merging it all, but I have to say that I sometimes wonder if we don't merge too much in one go, and even our current (fairly short) release cycle is actually too big." As for the actual changes, Linus explained:

"Much of -rc1 was in linux-next, but certainly not everything. We'll see how that whole thing ends up evolving - it certainly didn't solve all problems, and there was some bickering about things that weren't there (and some things that mostly were ;), but maybe it helped. There's a ton of new stuff in there, but at least personally the interesting things are the BKL pushdown and perhaps the introduction of the lockless get_user_pages_fast(). The build system also got updated to allow moving the architecture include files ('include/asm-xyz') into the architecture subdirectories ('arch/xyz/include/asm'), and sparc seems to have taken advantage of that already."

Other changes Linus highlighted included the merge of the UBIFS flash filesystem, as well as "tracing, firmware loading, continued x86 arch merging, and moving more code to generic support (unified generic IPI handling, coherent dma memory allocation, show_mem etc). Bootmem rewrites. [And] some support for further scalability (ie 4k cpu cores)."

Tux3 Versioning Filesystem

Sat, 26/07/2008 - 10:00am

"Since everybody seems to be having fun building new filesystems these days, I thought I should join the party," began Daniel Phillips, announcing the Tux3 versioning filesystem. He continued, "Tux3 is a write anywhere, atomic commit, btree based versioning filesystem. As part of this work, the venerable HTree design used in Ext3 and Lustre is getting a rev to better support NFS and possibly become more efficient." Daniel explained:

"The main purpose of Tux3 is to embody my new ideas on storage data versioning. The secondary goal is to provide a more efficient snapshotting and replication method for the Zumastor NAS project, and a tertiary goal is to be better than ZFS."

In his announcement email, Daniel noted that implementation work is underway, "much of the work consists of cutting and pasting bits of code I have developed over the years, for example, bits of HTree and ddsnap. The immediate goal is to produce a working prototype that cuts a lot of corners, for example block pointers instead of extents, allocation bitmap instead of free extent tree, linear search instead of indexed, and no atomic commit at all. Just enough to prove out the versioning algorithms and develop new user interfaces for version control."
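
To see what "block pointers instead of extents" gives up: the prototype records one pointer per file block, while an extent describes a run of physically contiguous blocks with a single (start, length) pair. This small hypothetical helper shows how a pointer list collapses into extents. It assumes a file with no holes and is not Tux3 code:

```python
def to_extents(block_pointers):
    """Collapse per-block pointers into (first_physical_block, length) extents.

    block_pointers[i] is the physical block that holds logical block i.
    """
    extents = []
    for physical in block_pointers:
        if extents and extents[-1][0] + extents[-1][1] == physical:
            start, length = extents[-1]
            extents[-1] = (start, length + 1)  # contiguous: extend the current extent
        else:
            extents.append((physical, 1))      # discontiguous: start a new extent
    return extents


print(to_extents([100, 101, 102, 200, 201, 57]))  # [(100, 3), (200, 2), (57, 1)]
```

Six pointers shrink to three extents here; for a large, mostly contiguous file the saving is far bigger, which is why extents are on the list for after the prototype.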
