Devfs (Device File System) FAQ


Linux Devfs (Device File System) FAQ
Richard Gooch
20-AUG-2002


-----------------------------------------------------------------------------

NOTE: the master copy of this document is available online at:

http://www.atnf.csiro.au/~rgooch/linux/docs/devfs.html
and looks much better than the text version distributed with the
kernel sources. A mirror site is available at:

http://www.ras.ucalgary.ca/~rgooch/linux/docs/devfs.html

There is also an optional daemon that may be used with devfs. You can
find out more about it at:

http://www.atnf.csiro.au/~rgooch/linux/

A mailing list is available which you may subscribe to. Send email to
majordomo@oss.sgi.com with the following line in the body of the
message:

subscribe devfs

To unsubscribe, send the following message body instead:

unsubscribe devfs

The list is archived at

http://oss.sgi.com/projects/devfs/archive/.

-----------------------------------------------------------------------------

Contents


What is it?

Why do it?

Who else does it?

How it works

Operational issues (essential reading)

Instructions for the impatient
Permissions persistence across reboots
Dealing with drivers without devfs support
All the way with Devfs
Other Issues
Kernel Naming Scheme
Devfsd Naming Scheme
Old Compatibility Names
SCSI Host Probing Issues



Device drivers currently ported

Allocation of Device Numbers

Questions and Answers

Making things work
Alternatives to devfs
What I don't like about devfs
How to report bugs
Strange kernel messages
Compilation problems with devfsd


Other resources

Translations of this document


-----------------------------------------------------------------------------


What is it?

Devfs is an alternative to "real" character and block special devices
on your root filesystem. Kernel device drivers can register devices by
name rather than major and minor numbers. These devices will appear in
devfs automatically, with whatever default ownership and
protection the driver specified. A daemon (devfsd) can be used to
override these defaults. Devfs has been in the kernel since 2.3.46.

NOTE that devfs is entirely optional. If you prefer the old
disc-based device nodes, then simply leave CONFIG_DEVFS_FS=n (the
default). In this case, nothing will change.  ALSO NOTE that if you do
enable devfs, the defaults are such that full compatibility is
maintained with the old device names.

There are two aspects to devfs: one is the underlying device
namespace, which is a namespace just like any mounted filesystem. The
other aspect is the filesystem code which provides a view of the
device namespace. The reason I make a distinction is because devfs
can be mounted many times, with each mount showing the same device
namespace. Changes made are global to all mounted devfs filesystems.
Also, because the devfs namespace exists without any devfs mounts, you
can easily mount the root filesystem by referring to an entry in the
devfs namespace.


The cost of devfs is a small increase in kernel code size and memory
usage: about 7 pages of code (some of that in __init sections) and 72
bytes for each entry in the namespace. A modest system has only a
couple of hundred device entries, so this costs a few more
pages. Compare this with the suggestion to put /dev on a ramdisc.

On a typical machine, the cost is under 0.2 percent. On a modest
system with 64 MBytes of RAM, the cost is under 0.1 percent.  The
accusations of "bloatware" levelled at devfs are not justified.
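
The arithmetic behind these figures can be checked with a short
sketch. The 4 KiB page size and the count of 200 entries are
assumptions (the text above says only "a couple of hundred"); the
other numbers are taken from the paragraphs above.

```python
# Rough check of the devfs memory overhead quoted above.
PAGE_SIZE = 4096               # assumption: 4 KiB pages, as on x86
CODE_PAGES = 7                 # from the text: about 7 pages of code
BYTES_PER_ENTRY = 72           # from the text
ENTRIES = 200                  # assumption: "a couple of hundred" entries
RAM_BYTES = 64 * 1024 * 1024   # the 64 MByte system from the text

code_bytes = CODE_PAGES * PAGE_SIZE
entry_bytes = ENTRIES * BYTES_PER_ENTRY
total_bytes = code_bytes + entry_bytes
percent_of_ram = 100.0 * total_bytes / RAM_BYTES

print(f"code: {code_bytes} bytes, entries: {entry_bytes} bytes")
print(f"total: {total_bytes} bytes = {percent_of_ram:.3f}% of RAM")
```

Under these assumptions the total stays below the 0.1 percent quoted
for a 64 MByte system.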

-----------------------------------------------------------------------------


Why do it?

There are several problems that devfs addresses. Some of these
problems are more serious than others (depending on your point of
view), and some can be solved without devfs. However, the totality of
these problems really calls out for devfs.

The choice is between a patchwork of inefficient user space solutions,
which are complex and likely to be fragile, and a simple and efficient
devfs which is robust.

There have been many counter-proposals to devfs, all seeking to
provide some of the benefits without actually implementing devfs. So
far there has been an absence of code and no proposed alternative has
been able to provide all the features that devfs does. Further,
alternative proposals require far more complexity in user-space (and
still deliver less functionality than devfs). Some people have the
mantra of reducing "kernel bloat", but don't consider the effects on
user-space.

A good solution limits the total complexity of kernel-space and
user-space.


Major&minor allocation

The existing scheme requires the allocation of major and minor device
numbers for each and every device. This means that a central
co-ordinating authority is required to issue these device numbers
(unless you're developing a "private" device driver), in order to
preserve uniqueness. Devfs shifts the burden to a namespace. This may
not seem like a huge benefit, but actually it is. Since driver authors
will naturally choose a device name which reflects the functionality
of the device, there is far less potential for namespace conflict.
Solving this requires a kernel change.

/dev management

Because you currently access devices through device nodes, these must
be created by the system administrator. For standard devices you can
usually find a MAKEDEV programme which creates all (hundreds!) of
these nodes. This means that changes in the kernel must be reflected by
changes in the MAKEDEV programme, or else the system administrator
creates device nodes by hand.

The basic problem is that there are two separate databases of
major and minor numbers. One is in the kernel and one is in /dev (or
in a MAKEDEV programme, if you want to look at it that way). This is
duplication of information, which is not good practice.
Solving this requires a kernel change.

/dev growth

A typical /dev has over 1200 nodes! Most of these devices simply don't
exist because the hardware is not available. A huge /dev increases the
time to access devices (I'm just referring to the dentry lookup times
and the time taken to read inodes off disc: the next subsection shows
some more horrors).

An example of how big /dev can grow is if we consider SCSI devices:

host           6  bits  (say up to 64 hosts on a really big machine)
channel        4  bits  (say up to 16 SCSI buses per host)
id             4  bits
lun            3  bits
partition      6  bits
TOTAL          23 bits


This requires 8 Mega (8*1024*1024) inodes if we want to store all
possible device nodes. Even if we scrap everything but id,partition
and assume a single host adapter with a single SCSI bus and only one
logical unit per SCSI target (id), that's still 10 bits or 1024
inodes. Each VFS inode takes around 256 bytes (kernel 2.1.78), so
that's 256 kBytes of inode storage on disc (assuming real inodes take
a similar amount of space as VFS inodes). This is actually not so bad,
because disc is cheap these days. Embedded systems would care about
256 kBytes of /dev inodes, but you could argue that embedded systems
would have hand-tuned /dev directories. I've had to do just that on my
embedded systems, but I would rather just leave it to devfs.
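
The numbers in this example can be reproduced with a few lines. This
is only a sketch of the arithmetic; the field widths and the 256 byte
inode size come from the text above.

```python
# Field widths from the SCSI example above.
FIELD_BITS = {"host": 6, "channel": 4, "id": 4, "lun": 3, "partition": 6}
INODE_BYTES = 256  # from the text: approximate size of a VFS inode

total_bits = sum(FIELD_BITS.values())
all_nodes = 2 ** total_bits

# Keep only id and partition: one host, one bus, one LUN per target.
reduced_bits = FIELD_BITS["id"] + FIELD_BITS["partition"]
reduced_nodes = 2 ** reduced_bits
reduced_bytes = reduced_nodes * INODE_BYTES

print(f"{total_bits} bits -> {all_nodes} possible nodes")
print(f"{reduced_bits} bits -> {reduced_nodes} nodes, {reduced_bytes // 1024} kBytes")
```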

Another issue is the time taken to look up an inode when first
referenced. This costs not only the time to scan through a list in
memory, but also the seek times to read the inodes off disc.
This could be solved in user-space using a clever programme which
scanned the kernel logs and deleted /dev entries which are not
available and created them when they were available. This programme
would need to be run every time a new module was loaded, which would
slow things down a lot.

There is an existing programme called scsidev which will automatically
create device nodes for SCSI devices. It can do this by scanning files
in /proc/scsi. Unfortunately, to extend this idea to other device
nodes would require significant modifications to existing drivers (so
they too would provide information in /proc). This is a non-trivial
change (I should know: devfs has had to do something similar). Once
you go to this much effort, you may as well use devfs itself (which
also provides this information).  Furthermore, such a system would
likely be implemented in an ad-hoc fashion, as different drivers will
provide their information in different ways.

Devfs is much cleaner, because it (naturally) has a uniform mechanism
to provide this information: the device nodes themselves!


Node to driver file_operations translation

There is an important difference between the way disc-based character
and block nodes and devfs entries make the connection between an entry
in /dev and the actual device driver.

With the current 8 bit major and minor numbers the connection between
disc-based c&b nodes and per-major drivers is done through a
fixed-length table of 128 entries. The various filesystem types set
the inode operations for c&b nodes to {chr,blk}dev_inode_operations,
so when a device is opened a few quick levels of indirection bring us
to the driver file_operations.

For miscellaneous character devices a second step is required: there
is a scan for the driver entry with the same minor number as the file
that was opened, and the appropriate minor open method is called. This
scanning is done *every time* you open a device node. Potentially, you
may be searching through dozens of misc. entries before you find your
open method. While not an enormous performance overhead, this does
seem pointless.

Linux *must* move beyond the 8 bit major and minor barrier,
somehow. If we simply increase each to 16 bits, then the indexing
scheme used for major driver lookup becomes untenable, because the
major tables (one each for character and block devices) would need to
be 64 k entries long (512 kBytes on x86, 1 MByte for 64 bit
systems). So we would have to use a scheme like that used for
miscellaneous character devices, which means the search time goes up
linearly with the average number of major device drivers on your
system. Not all "devices" are hardware: some are higher-level drivers
like KGI, so you can get more "devices" without adding hardware.
You can improve this by creating an ordered (balanced:-)
binary tree, in which case your search time becomes proportional to
log(N).
Alternatively, you can use hashing to speed up the search.
But why do that search at all if you don't have to? Once again, it
seems pointless.
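
The table sizes quoted above can be reproduced as follows. This is a
sketch only: the layout of two pointers per table entry is an
assumption, chosen because it reproduces the quoted figures, and the
function name is made up for illustration.

```python
def major_table_bytes(major_bits, pointer_bytes, pointers_per_entry=2):
    """Size in bytes of one directly indexed major table.

    pointers_per_entry=2 is an assumption about the entry layout.
    """
    return (2 ** major_bits) * pointers_per_entry * pointer_bytes


for label, pointer_bytes in (("32 bit", 4), ("64 bit", 8)):
    size = major_table_bytes(16, pointer_bytes)
    print(f"16 bit majors, {label} pointers: {size // 1024} kBytes")
```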

Note that devfs doesn't use the major&minor system. For devfs
entries, the connection is done when you look up the /dev entry. When
devfs_register() is called, the entry name and the file_operations are
appended to an internal table. If the dentry cache doesn't
have the /dev entry already, this internal table is scanned to get the
file_operations, and an inode is created. If the dentry cache already
has the entry, there is *no lookup time* (other than the dentry scan
itself, but we can't avoid that anyway, and besides Linux dentries
cream other OS's which don't have them:-). Furthermore, the number of
node entries in a devfs is only the number of available device
entries, not the number of *conceivable* entries. Even if you remove
unnecessary entries in a disc-based /dev, the number of conceivable
entries remains the same: you just limit yourself in order to save
space.

Devfs provides a fast connection between a VFS node and the device
driver, in a scalable way.
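
The lookup path described above can be modelled in user space. This is
a toy sketch, not kernel code: the class and attribute names are made
up for illustration, and a dictionary stands in for the dentry cache.

```python
class DevfsModel:
    """Toy user-space model of devfs name-to-driver lookup."""

    def __init__(self):
        self.table = []       # internal table of (name, file_operations)
        self.dcache = {}      # stands in for the dentry cache
        self.table_scans = 0  # how often the internal table was scanned

    def register(self, name, fops):
        # Models devfs_register(): append the name and file_operations.
        self.table.append((name, fops))

    def lookup(self, name):
        if name in self.dcache:
            return self.dcache[name]      # cached: no table scan at all
        self.table_scans += 1
        for entry_name, fops in self.table:
            if entry_name == name:
                self.dcache[name] = fops  # models creating the inode
                return fops
        return None


model = DevfsModel()
model.register("dsp", "dsp_fops")
first = model.lookup("dsp")
second = model.lookup("dsp")
print(first, second, model.table_scans)
```

Only the first lookup scans the table; repeated lookups are served
from the cache, and the table holds only registered entries.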

/dev as a system administration tool

Right now /dev contains a list of conceivable devices, most of which I
don't have. Devfs only shows those devices available on my
system. This means that listing /dev is a handy way of checking what
devices are available.

Major&minor size

Existing major and minor numbers are limited to 8 bits each. This is
now a limiting factor for some drivers, particularly the SCSI disc
driver, which consumes a single major number. Only 16 discs are
supported, and each disc may have only 15 partitions. Maybe this isn't
a problem for you, but some of us are building huge Linux systems with
disc arrays. With devfs an arbitrary pointer can be associated with
each device entry, which can be used to give an effective 32 bit
device identifier (i.e. that's like having a 32 bit minor
number). Since this is private to the kernel, there are no C library
compatibility issues which you would have with increasing major and
minor number sizes. See the section on "Allocation of Device Numbers"
for details on maintaining compatibility with userspace.

Solving this requires a kernel change.

Since writing this, the kernel has been modified so that the SCSI disc
driver has more major numbers allocated to it and now supports up to
128 discs. Since these major numbers are non-contiguous (a result of
unplanned expansion), the implementation is a little more cumbersome
than originally.
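
The limits mentioned above follow from the 8 bit minor number. A
sketch of the arithmetic, assuming 16 minors per disc with the first
one naming the whole disc:

```python
MINOR_BITS = 8        # from the text: minors are limited to 8 bits
MINORS_PER_DISC = 16  # assumption: whole disc plus 15 partitions

minors_per_major = 2 ** MINOR_BITS
discs_per_major = minors_per_major // MINORS_PER_DISC
partitions_per_disc = MINORS_PER_DISC - 1

# Majors needed for the 128 discs supported after the later change.
majors_for_128_discs = 128 // discs_per_major

print(discs_per_major, partitions_per_disc, majors_for_128_discs)
```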

Just like the changes to IPv4 to fix impending limitations in the
address space, people find ways around the limitations. In the long
run, however, solutions like IPv6 or devfs can't be put off forever.

Read-only root filesystem

Having your device nodes on the root filesystem means that you can't
operate properly with a read-only root filesystem. This is because you
want to change ownerships and protections of tty devices. Existing
practice prevents you using a CD-ROM as your root filesystem for a
*real* system. Sure, you can boot off a CD-ROM, but you can't change
tty ownerships, so it's only good for installing.

Also, you can't use a shared NFS root filesystem for a cluster of
discless Linux machines (having tty ownerships changed on a common
/dev is not good). Nor can you embed your root filesystem in a
ROM-FS.

You can get around this by creating a RAMDISC at boot time, making
an ext2 filesystem in it, mounting it somewhere and copying the
contents of /dev into it, then unmounting it and mounting it over
/dev.

A devfs is a cleaner way of solving this.

Non-Unix root filesystem

Non-Unix filesystems (such as NTFS) can't be used for a root
filesystem because they variously don't support character and block
special files or symbolic links. You can't have a separate disc-based
or RAMDISC-based filesystem mounted on /dev because you need device
nodes before you can mount these. Devfs can be mounted without any
device nodes. Devlinks won't work because symlinks aren't supported.
An alternative solution is to use initrd to mount a RAMDISC initial
root filesystem (which is populated with a minimal set of device
nodes), and then construct a new /dev in another RAMDISC, and finally
switch to your non-Unix root filesystem. This requires clever boot
scripts and a fragile and conceptually complex boot procedure.

Devfs solves this in a robust and conceptually simple way.

PTY security

Current pseudo-tty (pty) devices are owned by root and read-writable
by everyone. The user of a pty-pair cannot change
ownership/protections without being suid-root.

This could be solved with a secure user-space daemon which runs as
root and does the actual creation of pty-pairs. Such a daemon would
require modification to *every* programme that wants to use this new
mechanism. It also slows down creation of pty-pairs.

An alternative is to create a new open_pty() syscall which does much
the same thing as the user-space daemon. Once again, this requires
modifications to pty-handling programmes.

The devfs solution allows a device driver to "tag" certain device
files so that when an unopened device is opened, the ownerships are
changed to the current euid and egid of the opening process, and the
protections are changed to the default registered by the driver. When
the device is closed ownership is set back to root and protections are
set back to read-write for everybody. No programme need be changed.
The devpts filesystem provides this auto-ownership feature for Unix98
ptys. It doesn't support old-style pty devices, nor does it have all
the other features of devfs.
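
The auto-ownership behaviour can be sketched as a small state
machine. This is a user-space model only: the class name is made up,
and resetting ownership on the last close (rather than on every
close) is an assumption.

```python
ROOT_UID = 0
ROOT_GID = 0
WORLD_RW = 0o666  # read-write for everybody


class AutoOwnedNode:
    """Toy model of a devfs entry tagged for automatic ownership."""

    def __init__(self, default_mode):
        self.default_mode = default_mode  # default registered by the driver
        self.uid, self.gid, self.mode = ROOT_UID, ROOT_GID, WORLD_RW
        self.open_count = 0

    def open(self, euid, egid):
        if self.open_count == 0:
            # Opening an unopened device: hand it to the opening process.
            self.uid, self.gid, self.mode = euid, egid, self.default_mode
        self.open_count += 1

    def close(self):
        self.open_count -= 1
        if self.open_count == 0:
            # Assumption: the reset happens when the last opener closes.
            self.uid, self.gid, self.mode = ROOT_UID, ROOT_GID, WORLD_RW


pty = AutoOwnedNode(default_mode=0o600)  # 0o600 is an illustrative default
pty.open(euid=1000, egid=100)
print(pty.uid, pty.gid, oct(pty.mode))
pty.close()
print(pty.uid, pty.gid, oct(pty.mode))
```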

Intelligent device management

Devfs implements a simple yet powerful protocol for communication with
a device management daemon (devfsd) which runs in user space. It is
possible to send a message (either synchronously or asynchronously) to
devfsd on any event, such as registration/unregistration of device
entries, opening and closing devices, looking up inodes, scanning
directories and more. This has many possibilities. Some of these are
already implemented. See:


http://www.atnf.csiro.au/~rgooch/linux/

Device entry registration events can be used by devfsd to change
permissions of newly-created device nodes. This is one mechanism to
control device permissions.

Device entry registration/unregistration events can be used to run
programmes or scripts. This can be used to provide automatic mounting
of filesystems when new media is inserted into a block device
drive.

Asynchronous device open and close events can be used to implement
clever permissions management. For example, the default permissions on
/dev/dsp do not allow everybody to read from the device. This is
sensible, as you don't want some remote user recording what you say at
your console. However, the console user is also prevented from
recording. This behaviour is not desirable. With asynchronous device
open and close events, you can have devfsd run a programme or script
when console devices are opened to change the ownerships for *other*
device nodes (such as /dev/dsp). On closure, you can run a different
script to restore permissions. An advantage of this scheme over
modifying the C library tty handling is that this works even if your
programme crashes (how many times have you seen the utmp database with
lingering entries for non-existent logins?).

Synchronous device open events can be used to perform intelligent
device access protections. Before the device driver open() method is
called, the daemon must first validate the open attempt, by running an
external programme or script. This is far more flexible than access
control lists, as access can be determined on the basis of other
system conditions instead of just the UID and GID.

Inode lookup events can be used to authenticate module autoload
requests. Instead of using kmod directly, the event is sent to
devfsd which can implement an arbitrary authentication before loading
the module itself.

Inode lookup events can also be used to construct arbitrary
namespaces, without having to resort to populating devfs with symlinks
to devices that don't exist.
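
The event mechanism can be pictured as a table of rules, each pairing
an event type and a name pattern with an action. The sketch below is a
user-space model and not devfsd itself: the class, the event labels
and the rules are all hypothetical, and it does not show the real
devfsd configuration syntax.

```python
import re


class EventDispatcher:
    """Toy model of a daemon reacting to devfs events."""

    def __init__(self):
        self.rules = []  # list of (event type, compiled pattern, handler)

    def on(self, event, pattern, handler):
        self.rules.append((event, re.compile(pattern), handler))

    def dispatch(self, event, name):
        """Run matching handlers; the event is vetoed if any returns False."""
        allowed = True
        for rule_event, pattern, handler in self.rules:
            if rule_event == event and pattern.fullmatch(name):
                if handler(name) is False:
                    allowed = False
        return allowed


# Hypothetical rules: note a permissions change when "dsp" is registered,
# and refuse synchronous open requests for it.
log = []
daemon = EventDispatcher()
daemon.on("REGISTER", r"dsp", lambda name: log.append(("set-permissions", name)))
daemon.on("OPEN", r"dsp", lambda name: False)

registered_ok = daemon.dispatch("REGISTER", "dsp")
open_ok = daemon.dispatch("OPEN", "dsp")
print(registered_ok, open_ok, log)
```

A synchronous event corresponds to waiting for the return value of
dispatch() before going ahead; an asynchronous one ignores it.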

Speculative Device Scanning

Consider an application (like cdparanoia) that wants to find all
CD-ROM devices on the system (SCSI, IDE and other types), whether or
not their respective modules are loaded. The application must
speculatively open certain device nodes (such as /dev/sr0 for the SCSI
CD-ROMs) in order to make sure the module is loaded. This requires
that all Linux distributions follow the standard device naming scheme
(last time I looked RedHat did things differently). Devfs solves the
naming problem.

The same application also wants to see which devices are actually
available on the system. With the existing system it needs to read the
/dev directory and speculatively open each /dev/sr* device to
determine if the device exists or not. With a large /dev this is an
inefficient operation, especially if there are many /dev/sr* nodes. A
solution like scsidev could reduce the number of /dev/sr* entries (but
of course that also requires all that inefficient directory scanning).

With devfs, the application can open the /dev/sr directory
(which triggers the module autoloading if required), and proceed to
read /dev/sr. Since only the available devices will have
entries, there are no inefficiencies in directory scanning or device
openings.
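
From the application's side, the devfs approach amounts to reading one
directory. A minimal sketch, assuming the /dev/sr directory named
above; the function name is made up, and an absent directory is
treated as "no devices":

```python
import os


def available_devices(directory="/dev/sr"):
    """Return the sorted entry names in a devfs directory, or [] if absent."""
    try:
        return sorted(os.listdir(directory))
    except (FileNotFoundError, NotADirectoryError):
        return []


print(available_devices())
```

No speculative open of individual nodes is needed, because every name
returned corresponds to a device that is actually available.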

-----------------------------------------------------------------------------

Who else does it?

FreeBSD has a devfs implementation. Solaris and AIX each have a
pseudo-devfs (something akin to scsidev but for all devices, with some
unspecified kernel support). BeOS, Plan9 and QNX also have it. SGI's
IRIX 6.4 and above also have a device filesystem.

While we shouldn't just automatically do something because others do
it, we should not ignore the work of others either. FreeBSD has a lot
of competent people working on it, so their opinion should not be
blithely ignored.

-----------------------------------------------------------------------------


How it works

Registering device entries

For every entry (device node) in a devfs-based /dev a driver must call
devfs_register(). This adds the name of the device entry, the
file_operations structure pointer and a few other things to an
internal table. Device entries may be added and removed at any
time. When a device entry is registered, it automagically appears in
any mounted devfs.

Inode lookup

When a lookup operation on an entry is performed and there is no
driver information for that entry, devfs will attempt to call
devfsd. If still no driver information can be found then a negative
dentry is yielded and the next stage operation will be called by the
VFS (such as create() or mknod() inode methods). If driver information
can be found, an inode is created (if one does not exist already) and
all is well.
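
The sequence above can be summarised in a few lines. This is a
user-space model only: the function and parameter names are made up,
and None stands in for a negative dentry.

```python
def lookup(name, drivers, ask_daemon):
    """Toy model of the devfs lookup path described above.

    drivers:    dict mapping entry name to driver information
    ask_daemon: callable standing in for the call to devfsd; it may add
                entries to drivers (for example by loading a module)
    Returns the driver information, or None for a negative dentry.
    """
    if name not in drivers:
        ask_daemon(name, drivers)
    return drivers.get(name)


def fake_daemon(name, drivers):
    # Hypothetical daemon: pretends that a module provides "dsp".
    if name == "dsp":
        drivers[name] = "dsp_fops"


drivers = {}
print(lookup("dsp", drivers, fake_daemon))
print(lookup("nonexistent", drivers, fake_daemon))
```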

Manually creating device nodes

The mknod() method allows you to create an ordinary named pipe in the
devfs, or you can create a character or block special inode if one
does not already exist. You may wish to create a character or block
special inode so that you can set permissions and ownership. Later, if
a device driver registers an entry with the same name, the
permissions, ownership and times are retained. This is how you can set
the protections on a device even before the driver is loaded. Once you
create an inode it appears in the directory listing.
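
A toy model of this behaviour: attributes set with mknod() before the
driver loads are kept when the driver later registers the same name.
The class and method names are made up for illustration, and root
ownership for driver defaults is an assumption.

```python
class Namespace:
    """Toy model: attributes set with mknod() survive later registration."""

    def __init__(self):
        self.inodes = {}  # name -> {"mode", "uid", "gid", "fops"}

    def mknod(self, name, mode, uid, gid):
        if name in self.inodes:
            raise FileExistsError(name)
        self.inodes[name] = {"mode": mode, "uid": uid, "gid": gid, "fops": None}

    def register(self, name, fops, default_mode):
        inode = self.inodes.get(name)
        if inode is None:
            # No pre-made inode: use the driver's defaults (root assumed).
            inode = {"mode": default_mode, "uid": 0, "gid": 0, "fops": None}
            self.inodes[name] = inode
        inode["fops"] = fops  # existing mode and ownership are retained


ns = Namespace()
ns.mknod("dsp", mode=0o660, uid=0, gid=29)  # illustrative values
ns.register("dsp", fops="dsp_fops", default_mode=0o600)
print(ns.inodes["dsp"])
```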

Unregistering device entries

A device driver calls devfs_unregister() to unregister an entry.

Chroot() gaols

2.2.x kernels

The semantics of inode creation are different when devfs is mounted
with the "explicit" option. Now, when a device entry is registered, it
will not appear until you use mknod() to create the device. It doesn't
matter if you mknod() before or after the device is registered with
devfs_register(). The purpose of this behaviour is to support
chroot(2) gaols, where you want to mount a minimal devfs inside the
gaol. Only the devices you specifically want to be available (through
your mknod() setup) will be accessible.
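
In set terms, the "explicit" option restricts the accessible devices
to those that are both registered and created with mknod(). A sketch
of that rule, with a made-up function name:

```python
def accessible(registered, created, explicit):
    """Device names usable through a devfs mount (toy model).

    registered: names registered by drivers with devfs_register()
    created:    names created with mknod() in this mount
    explicit:   True if mounted with the "explicit" option
    """
    registered, created = set(registered), set(created)
    if explicit:
        # Order of mknod() and registration does not matter.
        return sorted(registered & created)
    return sorted(registered)


print(accessible(["null", "dsp", "tty"], ["null"], explicit=True))
```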

2.4.x kernels

As of kernel 2.3.99, the VFS has had the ability to rebind parts of
the global filesystem namespace into another part of the namespace.
This now works even at the leaf-node level, which means that
individual files and device nodes may be bound into other parts of the
namespace. This is like making links, but better, because it works
across filesystems (unlike hard links) and works through chroot()
gaols (unlike symbolic links).

Because of these improvements to the VFS, the multi-mount capability
in devfs is no longer needed. The administrator may create a minimal
device tree inside a chroot(2) gaol by using VFS bindings. As this
provides most of the features of the devfs multi-mount capability, I
removed the multi-mount support code (after issuing an RFC). This
yielded code size reductions and simplifications.

If you want to construct a minimal chroot() gaol, the following
command should suffice:

mount --bind /dev/null /gaol/dev/null


Repeat for other device nodes you want to expose. Simple!
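
The repetition is easy to script. What follows is only a sketch: the
helper name bind_gaol_devices and the DRY_RUN variable are made up for
this illustration, and it assumes a mount(8) which supports --bind. A
bind mount needs an existing target to mount over, so the sketch
creates an empty file for each node first.

```shell
#!/bin/sh
# Hypothetical helper: bind selected top-level device nodes from /dev
# into a chroot() gaol.
# Usage: bind_gaol_devices <gaol-root> <node>...
# With DRY_RUN=1 the mount commands are printed rather than executed.
bind_gaol_devices() (
    gaol=$1
    shift
    for node in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "mount --bind /dev/$node $gaol/dev/$node"
        else
            # The target of a bind mount must already exist
            mkdir -p "$gaol/dev" &&
            touch "$gaol/dev/$node" &&
            mount --bind "/dev/$node" "$gaol/dev/$node" || exit 1
        fi
    done
)
```

For example, "DRY_RUN=1; bind_gaol_devices /gaol null zero" shows the
two mount commands which would be run for /dev/null and /dev/zero.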

-----------------------------------------------------------------------------


Operational issues


Instructions for the impatient

Nobody likes reading documentation. People just want to get in there
and play. So this section tells you quickly the steps you need to take
to run with devfs mounted over /dev. Skip these steps and you will end
up with a nearly unbootable system. Subsequent sections describe the
issues in more detail, and discuss non-essential configuration
options.

Devfsd
OK, if you're reading this, I assume you want to play with
devfs. First you should ensure that /usr/src/linux contains a
recent kernel source tree. Then you need to compile devfsd, the device
management daemon, available at

http://www.atnf.csiro.au/~rgooch/linux/.
Because the kernel has a naming scheme
which is quite different from the old naming scheme, you need to
install devfsd so that software and configuration files that use the
old naming scheme will not break.

Compile and install devfsd. You will be provided with a default
configuration file /etc/devfsd.conf which will provide
compatibility symlinks for the old naming scheme. Don't change this
config file unless you know what you're doing. Even if you think you
do know what you're doing, don't change it until you've followed all
the steps below and booted a devfs-enabled system and verified that it
works.

Now edit your main system boot script so that devfsd is started at the
very beginning (before any filesystem
checks). /etc/rc.d/rc.sysinit is often the main boot script
on systems with SysV-style boot scripts. On systems with BSD-style
boot scripts it is often /etc/rc. Also check
/sbin/rc.

NOTE that the line you put into the boot
script should be exactly:

/sbin/devfsd /dev

DO NOT use some special daemon-launching
programme, otherwise the boot script may not wait for devfsd to finish
initialising.

System Libraries
There may still be some problems because of broken software making
assumptions about device names. In particular, some software does not
handle devices which are symbolic links. If you are running a libc 5
based system, install libc 5.4.44 (if you have libc 5.4.46, go back to
libc 5.4.44, which is actually correct). If you are running a glibc
based system, make sure you have glibc 2.1.3 or later.

/etc/securetty
PAM (Pluggable Authentication Modules) is supposed to be a flexible
mechanism for providing better user authentication and access to
services. Unfortunately, it's also fragile, complex and undocumented
(check out RedHat 6.1, and probably other distributions as well). PAM
has problems with symbolic links. Append the following lines to your
/etc/securetty file:

vc/1
vc/2
vc/3
vc/4
vc/5
vc/6
vc/7
vc/8

This will not weaken security. If you have a version of util-linux
earlier than 2.10.h, please upgrade to 2.10.h or later. If you
absolutely cannot upgrade, then also append the following lines to
your /etc/securetty file:

1
2
3
4
5
6
7
8

This may potentially weaken security by allowing root logins over the
network (a password is still required, though). However, since there
are problems with dealing with symlinks, I'm suspicious of the level
of security offered in any case.
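
Appending the vc/N entries by hand is error-prone if the file has
already been edited once. The sketch below only adds entries which are
missing. The function name add_vc_entries is hypothetical and is not
part of any package mentioned here. Try it on a copy of the file
first.

```shell
#!/bin/sh
# Hypothetical helper: append vc/1 ... vc/8 to a securetty-style file,
# skipping entries which are already present.
# Usage: add_vc_entries <file>   (the real file is /etc/securetty)
add_vc_entries() (
    file=$1
    n=1
    while [ "$n" -le 8 ]; do
        # -x matches whole lines only, -F treats the entry as a fixed string
        grep -qxF "vc/$n" "$file" 2>/dev/null || echo "vc/$n" >> "$file"
        n=$((n + 1))
    done
)
```

Running the function twice on the same file leaves it unchanged the
second time.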

XFree86
While not essential, it's probably a good idea to upgrade to XFree86
4.0, as patches went in to make it more devfs-friendly. If you don't,
you'll probably need to apply the following patch to
/etc/security/console.perms so that ordinary users can run
startx. Note that not all distributions have this file (e.g. Debian),
so if it's not present, don't worry about it.

--- /etc/security/console.perms.orig    Sat Apr 17 16:26:47 1999 
+++ /etc/security/console.perms Fri Feb 25 23:53:55 2000 
@@ -14,7 +14,7 @@ 
 # man 5 console.perms 

 # file classes -- these are regular expressions 
-<console>=tty[0-9][0-9]* :[0-9]\.[0-9] :[0-9] 
+<console>=tty[0-9][0-9]* vc/[0-9][0-9]* :[0-9]\.[0-9] :[0-9] 

 # device classes -- these are shell-style globs 
 <floppy>=/dev/fd[0-1]* 

If the patch does not apply, then change the line:

<console>=tty[0-9][0-9]* :[0-9]\.[0-9] :[0-9]

with:

<console>=tty[0-9][0-9]* vc/[0-9][0-9]* :[0-9]\.[0-9] :[0-9]


Disable devpts
I've had a report of devpts mounted on /dev/pts not working
correctly. Since devfs will also manage /dev/pts, there is no
need to mount devpts as well. You should either edit your
/etc/fstab so devpts is not mounted, or disable devpts from
your kernel configuration.

Unsupported drivers
Not all drivers have devfs support. If you depend on one of these
drivers, you will need to create a script or tarfile that you can use
at boot time to create device nodes as appropriate. The section
"Dealing with drivers without devfs support" describes this. The
section "Device drivers currently ported" lists the drivers which have
devfs support.

/dev/mouse

Many distributions configure /dev/mouse to be the mouse device
for XFree86 and GPM. I actually think this is a bad idea, because it
adds another level of indirection. When looking at a config file, if
you see /dev/mouse you're left wondering which mouse
is being referred to. Hence I recommend putting the actual mouse
device (for example /dev/psaux) into your
/etc/X11/XF86Config file (and similarly for the GPM
configuration file).

Alternatively, use the same technique used for unsupported drivers
described above.

The Kernel
Finally, you need to make sure devfs is compiled into your kernel. Set
CONFIG_EXPERIMENTAL=y, CONFIG_DEVFS_FS=y and CONFIG_DEVFS_MOUNT=y
using your favourite configuration tool (e.g. make config or
make xconfig), run make clean, and then recompile your kernel and
modules. At boot, devfs will be mounted onto /dev.

If you encounter problems booting (for example if you forgot a
configuration step), you can pass devfs=nomount at the kernel
boot command line. This will prevent the kernel from mounting devfs at
boot time onto /dev.

In general, a kernel built with CONFIG_DEVFS_FS=y but without mounting
devfs onto /dev is completely safe, and requires no
configuration changes. One exception to take note of is when
LABEL= directives are used in /etc/fstab. In this
case you will be unable to boot properly. This is because the
mount(8) programme uses /proc/partitions as part of
the volume label search process, and the device names it finds are not
available, because setting CONFIG_DEVFS_FS=y changes the names in
/proc/partitions, irrespective of whether devfs is mounted.
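
Before booting such a kernel without devfs mounted, it is worth
checking whether your /etc/fstab is affected. A minimal sketch follows.
The function name fstab_uses_labels is hypothetical, and it assumes
that LABEL= appears at the start of the first field.

```shell
#!/bin/sh
# Hypothetical check: report whether an fstab-style file mounts anything
# by volume label. Exit status 0 means a LABEL= directive was found.
# Usage: fstab_uses_labels <file>   (the real file is /etc/fstab)
fstab_uses_labels() {
    # Commented-out lines start with '#' and so do not match
    grep -q '^[[:space:]]*LABEL=' "$1"
}
```

A boot script could use it as: fstab_uses_labels /etc/fstab && echo
"LABEL= in use".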

Now you've finished all the steps required. You're now ready to boot
your shiny new kernel. Enjoy.

Changing the configuration

OK, you've now booted a devfs-enabled system, and everything works.
Now you may feel like changing the configuration (common targets are
/etc/fstab and /etc/devfsd.conf). Since you have a
system that works, if you make any changes and it doesn't work, you
now know that you only have to restore your configuration files to the
default and it will work again.


Permissions persistence across reboots

If you don't use mknod(2) to create a device file, nor use chmod(2) or
chown(2) to change the ownerships/permissions, the inode ctime will
remain at 0 (the epoch, 12 am, 1-JAN-1970, GMT). Anything with a ctime
later than this has had its ownership/permissions changed. Hence, a
simple script or programme may be used to tar up all changed inodes,
prior to shutdown. Although effective, many consider this approach a
kludge.

A much better approach is to use devfsd to save and restore
permissions. It may be configured to record changes in permissions and
will save them in a database (in fact a directory tree), and restore
these upon boot. This is an efficient method and results in immediate
saving of current permissions (unlike the tar approach, which saves
permissions at some unspecified future time).

The default configuration file supplied with devfsd has config entries
which you may uncomment to enable persistence management.

If you decide to use the tar approach anyway, be aware that tar will
first unlink(2) an inode before creating a new device node. The
unlink(2) has the effect of breaking the connection between a devfs
entry and the device driver. If you use the "devfs=only" boot option,
you lose access to the device driver, requiring you to reload the
module. I consider this a bug in tar (there is no real need to
unlink(2) the inode first).

Alternatively, you can use devfsd to provide more sophisticated
management of device permissions. You can use devfsd to store
permissions for whole groups of devices with a single configuration
entry, rather than the conventional single entry per device entry.

Permissions database stored in mounted-over /dev

If you wish to save and restore your device permissions into the
disc-based /dev while still mounting devfs onto /dev
you may do so. This requires a 2.4.x kernel (in fact, 2.3.99 or
later), which has the VFS binding facility. You need to do the
following to set this up:



make sure the kernel does not mount devfs at boot time


make sure you have a correct /dev/console entry in your
root file-system (where your disc-based /dev lives)

create the /dev-state directory


add the following lines near the very beginning of your boot
scripts:

mount --bind /dev /dev-state
mount -t devfs none /dev
devfsd /dev




add the following lines to your /etc/devfsd.conf file:

REGISTER	^pt[sy]		IGNORE
CREATE		^pt[sy]		IGNORE
CHANGE		^pt[sy]		IGNORE
DELETE		^pt[sy]		IGNORE
REGISTER	.*		COPY	/dev-state/$devname $devpath
CREATE		.*		COPY	$devpath /dev-state/$devname
CHANGE		.*		COPY	$devpath /dev-state/$devname
DELETE		.*		CFUNCTION GLOBAL unlink /dev-state/$devname
RESTORE		/dev-state

Note that the sample devfsd.conf file contains these lines,
as well as other sample configurations you may find useful. See the
devfsd distribution.


reboot.




Permissions database stored in normal directory

If you are using an older kernel which doesn't support VFS binding,
then you won't be able to have the permissions database in a
mounted-over /dev. However, you can still use a regular
directory to store the database. The sample /etc/devfsd.conf
file above may still be used. You will need to create the
/dev-state directory prior to installing devfsd. If you have
old permissions in /dev, then just copy (or move) the device
nodes over to the new directory.

Which method is better?

The best method is to have the permissions database stored in the
mounted-over /dev. This is because you will not need to copy
device nodes over to /dev-state, and because it allows you to
switch between devfs and non-devfs kernels, without requiring you to
copy permissions between /dev-state (for devfs) and
/dev (for non-devfs).


Dealing with drivers without devfs support

Currently, not all device drivers in the kernel have been modified to
use devfs. Device drivers which do not yet have devfs support will not
automagically appear in devfs. The simplest way to create device nodes
for these drivers is to unpack a tarfile containing the required
device nodes. You can do this in your boot scripts. All your drivers
will now work as before.

Hopefully for most people devfs will have enough support so that they
can mount devfs directly over /dev without losing most functionality
(i.e. losing access to various devices). As of 22-JAN-1998 (devfs
patch version 10) I am now running this way. All the devices I have
are available in devfs, so I don't lose anything.

WARNING: if your configuration requires the old-style device names
(e.g. /dev/hda1 or /dev/sda1), you must install devfsd and configure
it to maintain compatibility entries. It is almost certain that you
will require this. Note that the kernel creates a compatibility entry
for the root device, so you don't need initrd.

Note that you no longer need to mount devpts if you use Unix98 PTYs,
as devfs can manage /dev/pts itself. This saves you some RAM, as you
don't need to compile and install devpts. Note that some versions of
glibc have a bug with Unix98 pty handling on devfs systems. Contact
the glibc maintainers for a fix. Glibc 2.1.3 has the fix.

Note also that apart from editing /etc/fstab, other things will need
to be changed if you *don't* install devfsd. Some software (like the X
server) hard-wire device names in their source. It really is much
easier to install devfsd so that compatibility entries are created.
You can then slowly migrate your system to using the new device names
(for example, by starting with /etc/fstab), and then limiting the
compatibility entries that devfsd creates.

IF YOU CONFIGURE TO MOUNT DEVFS AT BOOT, MAKE SURE YOU INSTALL DEVFSD
BEFORE YOU BOOT A DEVFS-ENABLED KERNEL!

Now that devfs has gone into the 2.3.46 kernel, I'm getting a lot of
reports back. Many of these are because people are trying to run
without devfsd, and hence some things break. Please just run devfsd if
things break. I want to concentrate on real bugs rather than
misconfiguration problems at the moment. If people are willing to fix
bugs/false assumptions in other code (e.g. glibc, X server) and submit
that to the respective maintainers, that would be great.


All the way with Devfs

The devfs kernel patch creates a rationalised device tree. As stated
above, if you want to keep using the old /dev naming scheme,
you just need to configure devfsd appropriately (see the man
page). People who prefer the old names can ignore this section. For
those of us who like the rationalised names and an uncluttered
/dev, read on.

If you don't run devfsd, or don't enable compatibility entry
management, then you will have to configure your system to use the new
names. For example, you will then need to edit your
/etc/fstab to use the new disc naming scheme. If you want to
be able to boot non-devfs kernels, you will need compatibility
symlinks in the underlying disc-based /dev pointing back to
the old-style names for when you boot a kernel without devfs.

You can selectively decide which devices you want compatibility
entries for. For example, you may only want compatibility entries for
BSD pseudo-terminal devices (otherwise you'll have to patch your C
library or use Unix98 ptys instead). It's just a matter of putting
the correct regular expression into /etc/devfsd.conf.

There are other choices of naming schemes that you may prefer. For
example, I don't use the kernel-supplied
names, because they are too verbose. A common misconception is
that the kernel-supplied names are meant to be used directly in
configuration files. This is not the case. They are designed to
reflect the layout of the devices attached and to provide easy
classification.

If you like the kernel-supplied names, that's fine. If you don't then
you should be using devfsd to construct a namespace more to your
liking. Devfsd has built-in code to construct a
namespace that is both logical and easy to
manage. In essence, it creates a convenient abbreviation of the
kernel-supplied namespace.

You are of course free to build your own namespace. Devfsd has all the
infrastructure required to make this easy for you. All you need do is
write a script. You can even write some C code and devfsd can load the
shared object as a callable extension.


Other Issues

The init programme
Another thing to take note of is whether your init programme
creates a Unix socket /dev/telinit. Some versions of init
create /dev/telinit so that the telinit programme can
communicate with the init process. If you have such a system you need
to make sure that devfs is mounted over /dev *before* init
starts. In other words, you can't leave the mounting of devfs to
/etc/rc, since this is executed after init. Other
versions of init require a named pipe /dev/initctl
which must exist *before* init starts. Once again, you need to
mount devfs and then create the named pipe *before* init
starts.

The default behaviour now is not to mount devfs onto /dev at
boot time for 2.3.x and later kernels. You can correct this with the
"devfs=mount" boot option. This solves any problems with init,
and also prevents the dreaded:

Cannot open initial console

message. For 2.2.x kernels where you need to apply the devfs patch,
the default is to mount.

If you have automatic mounting of devfs onto /dev then you
may need to create /dev/initctl in your boot scripts. The
following lines should suffice:

mknod /dev/initctl p
kill -SIGUSR1 1       # tell init that /dev/initctl now exists

Alternatively, if you don't want the kernel to mount devfs onto
/dev then you could use the following procedure as a
guideline for how to get around /dev/initctl problems:

# cd /sbin
# mv init init.real
# cat > init
#! /bin/sh
mount -n -t devfs none /dev
mknod /dev/initctl p
exec /sbin/init.real "$@"
[control-D]
# chmod a+x init

Note that newer versions of init create /dev/initctl
automatically, so you don't have to worry about this.

Module autoloading
You will need to configure devfsd to enable module
autoloading. The following lines should be placed in your
/etc/devfsd.conf file:

LOOKUP	.*		MODLOAD


As of devfsd-v1.3.10, a generic /etc/modules.devfs
configuration file is installed, which is used by the MODLOAD
action. This should be sufficient for most configurations. If you
require further configuration, edit your /etc/modules.conf
file. The way module autoloading works with devfs is:


a process attempts to lookup a device node (e.g. /dev/fred)


if that device node does not exist, the full pathname is passed to
devfsd as a string


devfsd will pass the string to the modprobe programme (provided the
configuration line shown above is present), and specifies that
/etc/modules.devfs is the configuration file


/etc/modules.devfs includes /etc/modules.conf to
access local configurations

modprobe will search its configuration files, looking for an alias
that translates the pathname into a module name


the translated pathname is then used to load the module.


If you wanted a lookup of /dev/fred to load the
mymod module, you would require the following configuration
line in /etc/modules.conf:

alias    /dev/fred    mymod

The /etc/modules.devfs configuration file provides many such
aliases for standard device names. If you look closely at this file,
you will note that some modules require multiple alias configuration
lines. This is required to support module autoloading for old and new
device names.
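
The alias translation step can be mimicked with a few lines of shell.
This is an illustration only: lookup_alias is a made-up name, it reads
a single file, and modprobe itself does considerably more (include
processing, options and so on).

```shell
#!/bin/sh
# Illustration only: find the module name which a modules.conf-style
# file assigns to a device pathname through an alias line.
# Usage: lookup_alias <pathname> <config-file>
lookup_alias() {
    # Fields are: the keyword "alias", the pathname, the module name
    awk -v path="$1" '$1 == "alias" && $2 == path { print $3; exit }' "$2"
}
```

With the alias line shown above in the file, "lookup_alias /dev/fred
<config-file>" would print the module name. Nothing is printed for a
pathname without an alias.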

Mounting root off a devfs device
If you wish to mount root off a devfs device when you pass the
"devfs=only" boot option, then you need to pass in the
"root=<device>" option to the kernel when booting. If you use
LILO, then you must have this in lilo.conf:

append = "root=<device>"

Surprised? Yep, so was I. It turns out if you have (as most people
do):

root = <device>


then LILO will determine the device number of <device> and will
write that device number into a special place in the kernel image
before starting the kernel, and the kernel will use that device number
to mount the root filesystem. So, using the "append" variety ensures
that LILO passes the root filesystem device as a string, which devfs
can then use.

Note that this isn't an issue if you don't pass "devfs=only".

TTY issues
The ttyname(3) function in some versions of the C library makes
false assumptions about device entries which are symbolic links.  The
tty(1) programme is one that depends on this function.  I've
written a patch to libc 5.4.43 which fixes this. This has been
included in libc 5.4.44 and a similar fix is in glibc 2.1.3.


Kernel Naming Scheme

The kernel provides a default naming scheme. This scheme is designed
to make it easy to search for specific devices or device types, and to
view the available devices. Some device types (such as hard discs),
have a directory of entries, making it easy to see what devices of
that class are available. Often, the entries are symbolic links into a
directory tree that reflects the topology of available devices. The
topological tree is useful for finding how your devices are arranged.

Below is a list of the naming schemes for the most common drivers. A
list of reserved device names is
available for reference. Please send email to
rgooch@atnf.csiro.au to obtain an allocation. Please be
patient (the maintainer is busy). An alternative name may be allocated
instead of the requested name, at the discretion of the maintainer.

Disc Devices

All discs, whether SCSI, IDE or whatever, are placed under the
/dev/discs hierarchy:

	/dev/discs/disc0	first disc
	/dev/discs/disc1	second disc


Each of these entries is a symbolic link to the directory for that
device. The device directory contains:

	disc	for the whole disc
	part*	for individual partitions


CD-ROM Devices

All CD-ROMs, whether SCSI, IDE or whatever, are placed under the
/dev/cdroms hierarchy:

	/dev/cdroms/cdrom0	first CD-ROM
	/dev/cdroms/cdrom1	second CD-ROM


Each of these entries is a symbolic link to the real device entry for
that device.

Tape Devices

All tapes, whether SCSI, IDE or whatever, are placed under the
/dev/tapes hierarchy:

	/dev/tapes/tape0	first tape
	/dev/tapes/tape1	second tape


Each of these entries is a symbolic link to the directory for that
device. The device directory contains:

	mt			for mode 0
	mtl			for mode 1
	mtm			for mode 2
	mta			for mode 3
	mtn			for mode 0, no rewind
	mtln			for mode 1, no rewind
	mtmn			for mode 2, no rewind
	mtan			for mode 3, no rewind


SCSI Devices

To uniquely identify any SCSI device requires the following
information:

  controller	(host adapter)
  bus		(SCSI channel)
  target	(SCSI ID)
  unit		(Logical Unit Number)


All SCSI devices are placed under /dev/scsi (assuming devfs
is mounted on /dev). Hence, a SCSI device with the following
parameters: c=1,b=2,t=3,u=4 would appear as:

	/dev/scsi/host1/bus2/target3/lun4	device directory


Inside this directory, a number of device entries may be created,
depending on which SCSI device-type drivers were installed.

See the section on the disc naming scheme to see what entries the SCSI
disc driver creates.

See the section on the tape naming scheme to see what entries the SCSI
tape driver creates.

The SCSI CD-ROM driver creates:

	cd


The SCSI generic driver creates:

	generic
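
The device directory name described above follows mechanically from
the four parameters. A small sketch, where scsi_devfs_dir is a
hypothetical helper name and devfs is assumed to be mounted on /dev:

```shell
#!/bin/sh
# Build the devfs directory name of a SCSI device from its parameters.
# Usage: scsi_devfs_dir <controller> <bus> <target> <unit>
scsi_devfs_dir() {
    echo "/dev/scsi/host$1/bus$2/target$3/lun$4"
}

scsi_devfs_dir 1 2 3 4    # prints /dev/scsi/host1/bus2/target3/lun4
```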


IDE Devices

To uniquely identify any IDE device requires the following
information:

  controller
  bus		(aka. primary/secondary)
  target	(aka. master/slave)
  unit


All IDE devices are placed under /dev/ide, and use a similar
naming scheme to the SCSI subsystem.

XT Hard Discs

All XT discs are placed under /dev/xd. The first XT disc has
the directory /dev/xd/disc0.

TTY devices

The tty devices now appear as:

  New name                   Old-name                   Device Type
  --------                   --------                   -----------
  /dev/tts/{0,1,...}         /dev/ttyS{0,1,...}         Serial ports
  /dev/cua/{0,1,...}         /dev/cua{0,1,...}          Call out devices
  /dev/vc/0                  /dev/tty0                  Current virtual console
  /dev/vc/{1,2,...}          /dev/tty{1...63}           Virtual consoles
  /dev/vcc/{0,1,...}         /dev/vcs{1...63}           Virtual console capture devices
  /dev/pty/m{0,1,...}        /dev/ptyp??                PTY masters
  /dev/pty/s{0,1,...}        /dev/ttyp??                PTY slaves
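
For scripts which need to translate names, the simple rows of the
table above can be expressed as a shell function. This is a sketch:
old_to_devfs_tty is a hypothetical name, names are given without the
/dev/ prefix, and only serial ports, call out devices and numbered
virtual consoles are handled. The PTY and capture device rows are left
out because their numbering is less direct.

```shell
#!/bin/sh
# Translate some old-style tty names into the devfs names.
# Returns a non-zero status for names it does not know about.
# Usage: old_to_devfs_tty <old-name>
old_to_devfs_tty() {
    case $1 in
        ttyS[0-9]*) echo "tts/${1#ttyS}" ;;   # serial ports
        cua[0-9]*)  echo "cua/${1#cua}" ;;    # call out devices
        tty[1-9]*)  echo "vc/${1#tty}" ;;     # virtual consoles 1...63
        *)          return 1 ;;
    esac
}
```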


RAMDISCS

The RAMDISCS are placed in their own directory, and are named thus:

  /dev/rd/{0,1,2,...}


Meta Devices

The meta devices are placed in their own directory, and are named
thus:

  /dev/md/{0,1,2,...}


Floppy discs

Floppy discs are placed in the /dev/floppy directory.

Loop devices

Loop devices are placed in the /dev/loop directory.

Sound devices

Sound devices are placed in the /dev/sound directory
(audio, sequencer, ...).


Devfsd Naming Scheme

Devfsd provides a naming scheme which is a convenient abbreviation of
the kernel-supplied namespace. In some
cases, the kernel-supplied naming scheme is quite convenient, so
devfsd does not provide another naming scheme. The convenience names
that devfsd creates are in fact the same names as the original devfs
kernel patch created (before Linus mandated the Big Name
Change). These are referred to as "new compatibility entries".

In order to configure devfsd to create these convenience names, the
following lines should be placed in your /etc/devfsd.conf:

REGISTER	.*		MKNEWCOMPAT
UNREGISTER	.*		RMNEWCOMPAT

This will cause devfsd to create (and destroy) symbolic links which
point to the kernel-supplied names.

SCSI Hard Discs

All SCSI discs are placed under /dev/sd (assuming devfs is
mounted on /dev). Hence, a SCSI disc with the following
parameters: c=1,b=2,t=3,u=4 would appear as:

	/dev/sd/c1b2t3u4	for the whole disc
	/dev/sd/c1b2t3u4p5	for the 5th partition
	/dev/sd/c1b2t3u4p5s6	for the 6th slice in the 5th partition


SCSI Tapes

All SCSI tapes are placed under /dev/st. A similar naming
scheme is used as for SCSI discs. A SCSI tape with the
parameters:c=1,b=2,t=3,u=4 would appear as:

	/dev/st/c1b2t3u4m0	for mode 0
	/dev/st/c1b2t3u4m1	for mode 1
	/dev/st/c1b2t3u4m2	for mode 2
	/dev/st/c1b2t3u4m3	for mode 3
	/dev/st/c1b2t3u4m0n	for mode 0, no rewind
	/dev/st/c1b2t3u4m1n	for mode 1, no rewind
	/dev/st/c1b2t3u4m2n	for mode 2, no rewind
	/dev/st/c1b2t3u4m3n	for mode 3, no rewind


SCSI CD-ROMs

All SCSI CD-ROMs are placed under /dev/sr. A similar naming
scheme is used as for SCSI discs. A SCSI CD-ROM with the
parameters:c=1,b=2,t=3,u=4 would appear as:

	/dev/sr/c1b2t3u4


SCSI Generic Devices

The generic (aka. raw) interfaces for all SCSI devices are placed under
/dev/sg. A similar naming scheme is used as for SCSI discs. A
SCSI generic device with the parameters:c=1,b=2,t=3,u=4 would appear
as:

	/dev/sg/c1b2t3u4


IDE Hard Discs

All IDE discs are placed under /dev/ide/hd, using a similar
convention to SCSI discs. The following mappings exist between the new
and the old names:

	/dev/hda	/dev/ide/hd/c0b0t0u0
	/dev/hdb	/dev/ide/hd/c0b0t1u0
	/dev/hdc	/dev/ide/hd/c0b1t0u0
	/dev/hdd	/dev/ide/hd/c0b1t1u0
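
The four mappings above follow a regular pattern, which the sketch
below encodes. It assumes two buses per controller and two targets per
bus. That reproduces the table for hda to hdd, but names beyond hdd
are an extrapolation and not something stated here. The function name
ide_new_compat_name is made up.

```shell
#!/bin/sh
# Map an old IDE disc name (hda, hdb, ...) to the devfsd convenience name.
# Usage: ide_new_compat_name hdX
ide_new_compat_name() (
    letter=${1#hd}
    # POSIX printf yields the character code for an argument of the form 'x
    idx=$(( $(printf '%d' "'$letter") - 97 ))
    c=$((idx / 4))        # controller
    b=$((idx / 2 % 2))    # bus: primary or secondary
    t=$((idx % 2))        # target: master or slave
    echo "/dev/ide/hd/c${c}b${b}t${t}u0"
)
```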


IDE Tapes

A similar naming scheme is used as for IDE discs. The entries will
appear in the /dev/ide/mt directory.

IDE CD-ROM

A similar naming scheme is used as for IDE discs. The entries will
appear in the /dev/ide/cd directory.

IDE Floppies

A similar naming scheme is used as for IDE discs. The entries will
appear in the /dev/ide/fd directory.

XT Hard Discs

All XT discs are placed under /dev/xd. The first XT disc
would appear as /dev/xd/c0t0.


Old Compatibility Names

The old compatibility names are the legacy device names, such as
/dev/hda, /dev/sda, /dev/rtc and so on.
Devfsd can be configured to create compatibility symlinks so that you
may continue to use the old names in your configuration files and so
that old applications will continue to function correctly.

In order to configure devfsd to create these legacy names, the
following lines should be placed in your /etc/devfsd.conf:

REGISTER	.*		MKOLDCOMPAT
UNREGISTER	.*		RMOLDCOMPAT

This will cause devfsd to create (and destroy) symbolic links which
point to the kernel-supplied names.


-----------------------------------------------------------------------------


Device drivers currently ported

- All miscellaneous character devices support devfs (this is done
  transparently through misc_register())

- SCSI discs and generic hard discs

- Character memory devices (null, zero, full and so on)
  Thanks to C. Scott Ananian <cananian@alumni.princeton.edu>

- Loop devices (/dev/loop?)
 
- TTY devices (console, serial ports, terminals and pseudo-terminals)
  Thanks to C. Scott Ananian <cananian@alumni.princeton.edu>

- SCSI tapes (/dev/scsi and /dev/tapes)

- SCSI CD-ROMs (/dev/scsi and /dev/cdroms)

- SCSI generic devices (/dev/scsi)

- RAMDISCS (/dev/ram?)

- Meta Devices (/dev/md*)

- Floppy discs (/dev/floppy)

- Parallel port printers (/dev/printers)

- Sound devices (/dev/sound)
  Thanks to Eric Dumas <dumas@linux.eu.org> and
  C. Scott Ananian <cananian@alumni.princeton.edu>

- Joysticks (/dev/joysticks)

- Sparc keyboard (/dev/kbd)

- DSP56001 digital signal processor (/dev/dsp56k)

- Apple Desktop Bus (/dev/adb)

- Coda network file system (/dev/cfs*)

- Virtual console capture devices (/dev/vcc)
  Thanks to Dennis Hou <smilax@mindmeld.yi.org>

- Frame buffer devices (/dev/fb)

- Video capture devices (/dev/v4l)


-----------------------------------------------------------------------------


Allocation of Device Numbers

Devfs allows you to write a driver which doesn't need to allocate a
device number (major&minor numbers) for the internal operation of the
kernel. However, there are a number of userspace programmes that use
the device number as a unique handle for a device. An example is the
find programme, which uses device numbers to determine whether
an inode is on a different filesystem than another inode. The device
number used is the one for the block device which a filesystem is
using. To preserve compatibility with userspace programmes, block
devices using devfs need to have unique device numbers allocated to
them. Furthermore, POSIX specifies device numbers, so some kind of
device number needs to be presented to userspace.

The simplest option (especially when porting drivers to devfs) is to
keep using the old major and minor numbers. Devfs will take whatever
values are given for major&minor and pass them onto userspace.

This device number is a 16 bit number, so this leaves plenty of space
for large numbers of discs and partitions. This scheme can also be
used for character devices, in particular the tty devices, which are
currently limited to 256 pseudo-ttys (this limits the total number of
simultaneous xterms and remote logins).  Note that the device number
is limited to the range 36864-61439 (majors 144-239), in order to
avoid any possible conflicts with existing official allocations.
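
The range quoted above can be checked with a little arithmetic,
assuming the traditional split of the 16 bit device number into an
8 bit major (high byte) and an 8 bit minor (low byte). The function
name dev_number is only for illustration.

```shell
#!/bin/sh
# Combine a major and a minor number into a 16 bit device number:
# the major occupies the high 8 bits, the minor the low 8 bits.
# Usage: dev_number <major> <minor>
dev_number() {
    echo $(( $1 * 256 + $2 ))
}

dev_number 144 0      # prints 36864
dev_number 239 255    # prints 61439
```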

Please note that using dynamically allocated block device numbers may
break the NFS daemons (both user and kernel mode), which expect dev_t
for a given device to be constant over the lifetime of remote mounts.

A final note on this scheme: since it doesn't increase the size of
device numbers, there are no compatibility issues with userspace.

-----------------------------------------------------------------------------


Questions and Answers


Making things work
Alternatives to devfs
What I don't like about devfs
How to report bugs
Strange kernel messages
Compilation problems with devfsd



Making things work

Here are some common questions and answers.



Devfsd doesn't start

Make sure you have compiled and installed devfsd
Make sure devfsd is being started from your boot
scripts
Make sure you have configured your kernel to enable devfs (see
below)
Make sure devfs is mounted (see below)


Devfsd is not managing all my permissions

Make sure you are capturing the appropriate events. For example,
device entries created by the kernel generate REGISTER events,
but those created by devfsd generate CREATE events.


Devfsd is not capturing all REGISTER events

See the previous entry: you may need to capture CREATE events.


X will not start

Make sure you followed the steps 
outlined above.


Why don't my network devices appear in devfs?

This is not a bug. Network devices have their own, completely separate
namespace. They are accessed via socket(2) and
setsockopt(2) calls, and thus require no device nodes. I have
raised the possibility of moving network devices into the device
namespace, but have had no response.


How can I test if I have devfs compiled into my kernel?

All filesystems built-in or currently loaded are listed in
/proc/filesystems. If you see a devfs entry, then
you know that devfs was compiled into your kernel. If you have
correctly configured and rebuilt your kernel, then devfs will be
built-in. If you think you've configured it in, but
/proc/filesystems doesn't show it, you've made a mistake.
Common mistakes include:

Using a 2.2.x kernel without applying the devfs patch (if you
don't know how to patch your kernel, use 2.4.x instead, don't bother
asking me how to patch)
Forgetting to set CONFIG_EXPERIMENTAL=y
Forgetting to set CONFIG_DEVFS_FS=y
Forgetting to set CONFIG_DEVFS_MOUNT=y (if you want devfs
to be automatically mounted at boot)
Editing your .config manually, instead of using make
config or make xconfig
Forgetting to run make dep; make clean after changing the
configuration and before compiling
Forgetting to compile your kernel and modules
Forgetting to install your kernel
Forgetting to install your modules

Please check twice that you've done all these steps before sending in
a bug report.
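
The /proc/filesystems check described above can be scripted. The
following is a minimal sketch, not part of devfs or devfsd: the
function names are made up for illustration, and it assumes the usual
/proc/filesystems layout in which each line holds an optional "nodev"
tag followed by the filesystem name.

```python
def has_devfs(filesystems_text):
    """Return True if 'devfs' is listed in /proc/filesystems-style text."""
    for line in filesystems_text.splitlines():
        fields = line.split()
        # Each line is "[nodev] <name>", so the name is the last field.
        if fields and fields[-1] == "devfs":
            return True
    return False


def kernel_has_devfs(path="/proc/filesystems"):
    """Read the given file (hypothetical helper) and look for devfs."""
    with open(path) as handle:
        return has_devfs(handle.read())
```

Keeping the parsing separate from the file access means the check can
be tried on saved text from another machine.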



How can I test if devfs is mounted on /dev?

The device filesystem will always create an entry called
".devfsd", which is used to communicate with the daemon. Even
if the daemon is not running, this entry will exist. Testing for the
existence of this entry is the approved method of determining if devfs
is mounted or not. Note that the type of entry (i.e. regular file,
character device, named pipe, etc.) may change without notice. Only
the existence of the entry should be relied upon.
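
As a sketch of the existence test described above (the function name
and the dev_dir parameter are illustrative assumptions, not an
official interface), the check deliberately ignores the type of the
entry:

```python
import os


def devfs_is_mounted(dev_dir="/dev"):
    """Return True if the ".devfsd" entry exists in dev_dir."""
    # lexists() only asks whether the name is present. It does not
    # care whether the entry is a regular file, a character device
    # or a named pipe, which matches the advice to rely on existence only.
    return os.path.lexists(os.path.join(dev_dir, ".devfsd"))
```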


When I start devfsd, I see the error:
Error opening file: ".devfsd"   No such file or directory?

This means that devfs is not mounted. Make sure you have devfs mounted.


How do I mount devfs?

First make sure you have devfs compiled into your kernel (see
above). Then you will either need to:

set CONFIG_DEVFS_MOUNT=y in your kernel config
pass devfs=mount to your boot loader
mount devfs manually in your boot scripts with:
mount -t devfs none /dev



Mount by volume LABEL=<label> doesn't work with
devfs

Most probably you are not mounting devfs onto /dev. What
happens is that if your kernel config has CONFIG_DEVFS_FS=y
then the contents of /proc/partitions will have the devfs
names (such as scsi/host0/bus0/target0/lun0/part1). The
contents of /proc/partitions are used by mount(8) when
mounting by volume label. If devfs is not mounted on /dev,
then mount(8) will fail to find devices. The solution is to
make sure that devfs is mounted on /dev. See above for how to
do that.
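
To see whether you are hitting this problem, you can compare the
names in /proc/partitions against what is actually present under
/dev. The following is a rough diagnostic sketch only: the function
names are hypothetical, and it assumes the partition name is the
fourth whitespace-separated field of each data row.

```python
import os


def partition_names(partitions_text):
    """Extract device names from /proc/partitions-style text."""
    names = []
    for line in partitions_text.splitlines():
        fields = line.split()
        # Data rows start with the numeric major number. The header
        # line and blank lines do not, so they are skipped.
        if len(fields) >= 4 and fields[0].isdigit():
            names.append(fields[3])
    return names


def missing_device_nodes(partitions_text, dev_dir="/dev"):
    """List the partition names that have no entry under dev_dir."""
    return [name for name in partition_names(partitions_text)
            if not os.path.exists(os.path.join(dev_dir, name))]
```

If the second function returns devfs-style names, then devfs is most
likely not mounted on /dev.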


I have extra or incorrect entries in /dev

You may have stale entries in your dev-state area. Check for a
RESTORE configuration line in your devfsd configuration
(typically /etc/devfsd.conf). If you have this line, check
the contents of the specified directory for stale entries. Remove
any entries which are incorrect, then reboot.


I get "Unable to open initial console" messages at boot

This usually happens when you don't have devfs automounted onto
/dev at boot time, and there is no valid
/dev/console entry on your root file-system. Create a valid
/dev/console device node.





Alternatives to devfs

I've attempted to collate all the anti-devfs proposals and explain
their limitations. Under construction.


Why not just pass device create/remove events to a daemon?

Here the suggestion is to develop an API in the kernel so that devices
can register create and remove events, and a daemon listens for those
events. The daemon would then populate/depopulate /dev (which
resides on disc).

This has several limitations:


it only works for modules loaded and unloaded (or devices inserted
and removed) after the kernel has finished booting. Without a database
of events, there is no way the daemon could fully populate
/dev


if you add a database to this scheme, the question is then how to
present that database to user-space. If you make it a list of strings
with embedded event codes which are passed through a pipe to the
daemon, then this is only of use to the daemon. I would argue that the
natural way to present this data is via a filesystem (since many of
the events will be of a hierarchical nature), such as devfs.
Presenting the data as a filesystem makes it easy for the user to see
what is available and also makes it easy to write scripts to scan the
"database"


the tight binding between device nodes and drivers is no longer
possible (requiring the otherwise perfectly avoidable
table lookups)


you cannot catch inode lookup events on /dev which means
that module autoloading requires device nodes to be created. This is a
problem, particularly for drivers where only a few inodes are created
from a potentially large set


this technique can't be used when the root FS is mounted
read-only




Just implement a better scsidev

This suggestion involves taking the scsidev programme and
extending it to scan for all devices, not just SCSI devices. The
scsidev programme works by scanning /proc/scsi.

Problems:


the kernel does not currently provide a list of all devices
available. Not all drivers register entries in /proc or
generate kernel messages


there is no uniform mechanism to register devices other than the
devfs API


implementing such an API is then the same as the
proposal above




Put /dev on a ramdisc

This suggestion involves creating a ramdisc and populating it with
device nodes and then mounting it over /dev.

Problems:



this doesn't help when mounting the root filesystem, since you
still need a device node to do that


if you want to use this technique for the root device node as
well, you need to use initrd. This complicates the booting sequence
and makes it significantly harder to administer and configure. The
initrd is essentially opaque, robbing the system administrator of easy
configuration


insufficient information is available to correctly populate the
ramdisc. So we come back to the
proposal above to "solve" this


a ramdisc-based solution would take more kernel memory, since the
backing store would be (at best) normal VFS inodes and dentries, which
take 284 bytes and 112 bytes, respectively, for each entry. Compare
that to 72 bytes for devfs
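
To put the per-entry figures quoted above side by side, here is a
small worked calculation. The byte counts are the ones given in the
text. The entry count of 1000 is an arbitrary illustration, not a
measurement.

```python
VFS_INODE_BYTES = 284    # per entry, from the text above
VFS_DENTRY_BYTES = 112   # per entry, from the text above
DEVFS_ENTRY_BYTES = 72   # per entry, from the text above


def ramdisc_bytes(entries):
    """Best-case memory for a ramdisc-backed /dev (inode plus dentry)."""
    return entries * (VFS_INODE_BYTES + VFS_DENTRY_BYTES)


def devfs_bytes(entries):
    """Memory for the same number of devfs entries."""
    return entries * DEVFS_ENTRY_BYTES


# Hypothetical /dev with 1000 entries:
#   ramdisc_bytes(1000) is 396000, devfs_bytes(1000) is 72000,
#   so the ramdisc needs 5.5 times as much memory.
```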




Do nothing: there's no problem

Sometimes people can be heard to claim that the existing scheme is
fine. This is what they're ignoring:


device number size (8 bits each for major and minor) is a real
limitation, and must be fixed somehow. Systems with large numbers of
SCSI devices, for example, will continue to consume the remaining
unallocated major numbers. USB will also need to push beyond the 8 bit
minor limitation


simply increasing the device number size is insufficient. Apart
from causing a lot of pain, it doesn't solve the management issues
of a /dev with thousands or more device nodes


ignoring the problem of a huge /dev will not make it go
away, and dismisses the legitimacy of a large number of people who
want a dynamic /dev


the standard response then becomes: "write a device management
daemon", which brings us back to the
proposal above
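
The size limitation in the first point can be made concrete. The
sketch below packs an 8 bit major and an 8 bit minor into a single
16 bit number, in the traditional layout with the major in the high
byte. The helper names are illustrative, and the figure of 16 minors
per SCSI disc is an assumption stated here for the arithmetic only.

```python
MAJOR_BITS = 8
MINOR_BITS = 8


def make_dev(major, minor):
    """Pack major and minor into a 16 bit device number."""
    if not (0 <= major < (1 << MAJOR_BITS)):
        raise ValueError("major does not fit in 8 bits")
    if not (0 <= minor < (1 << MINOR_BITS)):
        raise ValueError("minor does not fit in 8 bits")
    return (major << MINOR_BITS) | minor


def split_dev(dev):
    """Unpack a 16 bit device number into (major, minor)."""
    return dev >> MINOR_BITS, dev & ((1 << MINOR_BITS) - 1)


# Assumption: each disc reserves 16 minors (whole disc plus partitions),
# so a single major number covers only 256 // 16 = 16 discs.
DISCS_PER_MAJOR = (1 << MINOR_BITS) // 16
```

For example, make_dev(8, 1) gives 2049 and split_dev(2049) gives
(8, 1). A minor of 256 is rejected because it no longer fits.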




What I don't like about devfs

Here are some common complaints about devfs, and some suggestions and
solutions that may make it more palatable for you. I can't please
everybody, but I do try :-)

I hate the naming scheme

First, remember that no naming scheme will please everybody. You hate
the scheme, others love it. Who's to say who's right and who's wrong?
Ultimately, the person who writes the code gets to choose, and what
exists now is a combination of the choices made by the
devfs author and the
kernel maintainer (Linus).

However, not all is lost. If you want to create your own naming
scheme, it is a simple matter to write a standalone script, hack
devfsd, or write a script called by devfsd. You can create whatever
naming scheme you like.

Further, if you want to remove all traces of the devfs naming scheme
from /dev, you can mount devfs elsewhere (say
/devfs) and populate /dev with links into
/devfs. This population can be automated using devfsd if you
wish.
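
As a sketch of the standalone-script approach (this is not how devfsd
does it, and the function name and the mapping are made up for
illustration), the following creates symbolic links in /dev that
point into a devfs mounted on /devfs, using whatever names you
choose:

```python
import os


def populate_dev(names, devfs_dir="/devfs", dev_dir="/dev"):
    """Create dev_dir/<your name> -> devfs_dir/<devfs name> links.

    names maps your preferred name to the devfs path, for example
    {"sda1": "scsi/host0/bus0/target0/lun0/part1"} (hypothetical).
    Returns the list of link names that were created.
    """
    created = []
    for link_name, devfs_path in sorted(names.items()):
        link = os.path.join(dev_dir, link_name)
        if os.path.lexists(link):
            continue  # never clobber an entry that is already there
        os.symlink(os.path.join(devfs_dir, devfs_path), link)
        created.append(link_name)
    return created
```

Existing entries are skipped rather than replaced, so the script can
be run more than once without harm.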

You can even use the VFS binding facility to make the links, rather
than using symbolic links. This way, you don't even have to see the
"destination" of these symbolic links.

Devfs puts policy into the kernel

There's already policy in the kernel. Device numbers are in fact
policy (why should the kernel dictate what device numbers I use?).
Face it, some policy has to be in the kernel. The real difference
between device names as policy and device numbers as policy is that
no one will use device numbers directly, because device
numbers are devoid of meaning to humans and are ugly. At least with
the devfs device names (even though you can add your own naming
scheme), some people will use the devfs-supplied names directly. This
offends some people :-)

Devfs is bloatware

This is not even remotely true. As shown above,
both code and data size are quite modest.


How to report bugs

If you have (or think you have) a bug with devfs, please follow the
steps below:



make sure you have enabled debugging output when configuring your
kernel. You will need to set (at least) the following config options:

CONFIG_DEVFS_DEBUG=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_SLAB=y



please make sure you have the latest devfs patches applied. The
latest kernel version might not have the latest devfs patches applied
yet (Linus is very busy)


save a copy of your complete kernel logs (preferably by
using the dmesg programme) for later inclusion in your bug
report. You may need to use the -s switch to increase the
internal buffer size so you can capture all the boot messages.
Don't edit or trim the dmesg output




try booting with devfs=dall passed to the kernel boot
command line (read the documentation on your bootloader on how to do
this), and save the result to a file. This may be quite verbose, and
it may overflow the messages buffer, but try to get as much of it as
you can


if you get an Oops, run ksymoops to decode it so that the
names of the offending functions are provided. A non-decoded Oops is
pretty useless


send a copy of your devfsd configuration file(s)

send the bug report to me first.
Don't expect that I will see it if you post it to the linux-kernel
mailing list. Include all the information listed above, plus
anything else that you think might be relevant. Put the string
devfs somewhere in the subject line, so my mail filters mark
it as urgent




Here is a general guide on how to ask questions in a way that greatly
improves your chances of getting a reply:

http://www.tuxedo.org/~esr/faqs/smart-questions.html. If you have
a bug to report, you should also read

http://www.chiark.greenend.org.uk/~sgtatham/bugs.html.


Strange kernel messages

You may see devfs-related messages in your kernel logs. Below are some
messages and what they mean (and what you should do about them, if
anything).



devfs_register(fred): could not append to parent, err: -17

The number is a negated errno value. Check what it means on your
architecture, but usually 17 means EEXIST: a driver attempted to create an entry
fred in a directory, but there already was an entry with that
name. This is often caused by flawed boot scripts which untar a bunch
of inodes into /dev, as a way to restore permissions. This
message is harmless, as the device nodes will still
provide access to the driver (unless you use the devfs=only
boot option, which is only for dedicated souls :-). If you want to get
rid of these annoying messages, upgrade to devfsd-v1.3.20 and use the
recommended RESTORE directive to restore permissions.
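
If you are unsure what an error number in such a message means, you
can look it up. The following sketch uses the errno table of the
machine it runs on, so it is only meaningful when run on the same
kind of system that produced the message. The function name is
hypothetical.

```python
import errno
import os


def describe_kernel_error(err):
    """Translate a kernel-style error value such as -17 into a name."""
    code = abs(err)  # kernel messages report the negated errno value
    name = errno.errorcode.get(code, "unknown")
    return "%s (%s)" % (name, os.strerror(code))
```

On Linux, describe_kernel_error(-17) yields a string beginning with
EEXIST.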


devfs_mk_dir(bill): using old entry in dir: c1808724 ""

This is similar to the message above, except that a driver attempted
to create a directory named bill, and the parent directory
has an entry with the same name. In this case, to ensure that drivers
continue to work properly, the old entry is re-used and given to the
driver. In 2.5 kernels, the driver is given a NULL entry, and thus,
under rare circumstances, may not create the required device nodes.
The solution is the same as above.





Compilation problems with devfsd

Usually, you can compile devfsd just by typing in
make in the source directory, followed by a make
install (as root). Sometimes, you may have problems, particularly
on broken configurations.



error messages relating to DEVFSD_NOTIFY_DELETE

This happens because you have an ancient set of kernel headers
installed in /usr/include/linux or /usr/src/linux.
Install kernel 2.4.10 or later. You may need to pass the
KERNEL_DIR variable to make (if you did not install
the new kernel sources as /usr/src/linux), or you may copy
the devfs_fs.h file in the kernel source tree into
/usr/include/linux.




-----------------------------------------------------------------------------


Other resources



Douglas Gilbert has written a useful document at

http://www.torque.net/sg/devfs_scsi.html which
explores the SCSI subsystem and how it interacts with devfs


Douglas Gilbert has written another useful document at

http://www.torque.net/scsi/SCSI-2.4-HOWTO/ which
discusses the Linux SCSI subsystem in 2.4.


Johannes Erdfelt has started a discussion paper on Linux and
hot-swap devices, describing what the requirements are for a scalable
solution and how and why he's used devfs+devfsd. Note that this is an
early draft only, available in plain text form at:

http://johannes.erdfelt.com/hotswap.txt.
Johannes has promised an HTML version will follow.


I presented an invited paper at the 2nd Annual Storage Management
Workshop held in Miami, Florida, U.S.A. in October 2000.




-----------------------------------------------------------------------------


Translations of this document

This document has been translated into other languages.




The document master (in English) by rgooch@atnf.csiro.au is
available at

http://www.atnf.csiro.au/~rgooch/linux/docs/devfs.html



A Korean translation by viatoris@nownuri.net is available at

http://your.destiny.pe.kr/devfs/devfs.html




-----------------------------------------------------------------------------
Most flags courtesy of ITA's 
Flags of All Countries
used with permission.