Java clipboard: pasting HTML from Firefox on Linux

I have a strange problem when pasting HTML from Firefox into a Java 6 app, and only on Linux. Here is a minimal example:

import java.awt.Toolkit;
import java.awt.datatransfer.Clipboard;
import java.awt.datatransfer.DataFlavor;
import java.awt.datatransfer.Transferable;
import java.io.Reader;
import java.nio.ByteBuffer;

class ClipboardPrinter {
    public static void main( String args[] ) throws Exception
    {
        Clipboard systemClipboard = Toolkit.getDefaultToolkit()
                .getSystemClipboard();
        Transferable transferData = systemClipboard.getContents(null);
        if (transferData == null) {
            System.out.println("no content");
            return;
        }

//      final DataFlavor htmlFlavorString = new DataFlavor("text/html;class=java.lang.String");
//      String html = (String)transferData.getTransferData(htmlFlavorString);
//      System.out.println("html = '" + html + "'");

        final DataFlavor htmlFlavor = new DataFlavor("text/html;class=java.nio.ByteBuffer;charset=US-ASCII");
        if (!transferData.isDataFlavorSupported(htmlFlavor)) {
            System.out.println("no text/html reader content");
            return;
        }

        ByteBuffer bb = (ByteBuffer)transferData.getTransferData(htmlFlavor);
        byte[] bytes = bb.array();
        for (byte b: bytes)
        {
            System.out.format("%02x", b);
        }
        System.out.println();
        final int cutoff = 2;
        byte[] bytes2 = new byte[bytes.length - cutoff];
        for (int i = cutoff; i < bytes.length; i++)
            bytes2[i-cutoff] = bytes[i];
        final String htmlContent = new String(bytes2, "UTF-16LE");


        System.out.println("htmlContent = '" + htmlContent + "'");
    }
}

First I tried new DataFlavor("text/html;class=java.lang.String") (the code commented out in the snippet above), but this results in an unusable String that starts with two chars with value 65533 (and cutting off those two characters does not help).

Next I used a ByteBuffer data flavor with charset=US-ASCII. I used ASCII on purpose: charset=UTF-16LE (or UTF-16 or UTF-16BE) does not work at all. With the above charset=US-ASCII solution (along with new String(bytes2, "UTF-16LE")), 7-bit characters work, but umlauts, for example, do not: a '?' gets printed instead.

I cut off two bytes because there seem to be two BOMs at the beginning (not sure; it could be something else).

I get a similar result with a data flavor with charset=UTF-8 and cutoff=6 (two three-byte "replacement characters" 0xEFBFBD at the beginning and umlaut encoded as two wrong characters). In both cases I used new String(bytes2, "UTF-16LE").

Do you have any suggestions about how to:

support non-ASCII characters in this solution (or find a better solution)?
determine whether it's UTF-16LE or UTF-16BE?

Thank you! Any hints are appreciated!

BTW: Here are the supported data flavors on my (Linux) system (from transferable.getTransferDataFlavors()):

[java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.io.Reader]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.lang.String]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.nio.CharBuffer]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=[C]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.io.InputStream;charset=UTF-16]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.nio.ByteBuffer;charset=UTF-16]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=[B;charset=UTF-16]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.io.InputStream;charset=UTF-8]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.nio.ByteBuffer;charset=UTF-8]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=[B;charset=UTF-8]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.io.InputStream;charset=UTF-16BE]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.nio.ByteBuffer;charset=UTF-16BE]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=[B;charset=UTF-16BE]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.io.InputStream;charset=UTF-16LE]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.nio.ByteBuffer;charset=UTF-16LE]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=[B;charset=UTF-16LE]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.io.InputStream;charset=ISO-8859-1]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.nio.ByteBuffer;charset=ISO-8859-1]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=[B;charset=ISO-8859-1]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.io.InputStream;charset=US-ASCII]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.nio.ByteBuffer;charset=US-ASCII]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=[B;charset=US-ASCII]
java.awt.datatransfer.DataFlavor[mimetype=application/x-java-serialized-object;representationclass=java.lang.String]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.io.Reader]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.lang.String]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.nio.CharBuffer]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=[C]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.io.InputStream;charset=unicode]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.nio.ByteBuffer;charset=UTF-16]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=[B;charset=UTF-16]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.io.InputStream;charset=UTF-8]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.nio.ByteBuffer;charset=UTF-8]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=[B;charset=UTF-8]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.io.InputStream;charset=UTF-16BE]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.nio.ByteBuffer;charset=UTF-16BE]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=[B;charset=UTF-16BE]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.io.InputStream;charset=UTF-16LE]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.nio.ByteBuffer;charset=UTF-16LE]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=[B;charset=UTF-16LE]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.io.InputStream;charset=ISO-8859-1]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.nio.ByteBuffer;charset=ISO-8859-1]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=[B;charset=ISO-8859-1]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.io.InputStream;charset=US-ASCII]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.nio.ByteBuffer;charset=US-ASCII]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=[B;charset=US-ASCII]
java.awt.datatransfer.DataFlavor[mimetype=text/x-moz-url-priv;representationclass=java.io.InputStream]
java.awt.datatransfer.DataFlavor[mimetype=text/_moz_htmlinfo;representationclass=java.io.InputStream]
java.awt.datatransfer.DataFlavor[mimetype=text/_moz_htmlcontext;representationclass=java.io.InputStream]
java.awt.datatransfer.DataFlavor[mimetype=text/x-moz-url-priv;representationclass=java.nio.ByteBuffer]
java.awt.datatransfer.DataFlavor[mimetype=text/_moz_htmlinfo;representationclass=java.nio.ByteBuffer]
java.awt.datatransfer.DataFlavor[mimetype=text/_moz_htmlcontext;representationclass=java.nio.ByteBuffer]
java.awt.datatransfer.DataFlavor[mimetype=text/x-moz-url-priv;representationclass=[B]
java.awt.datatransfer.DataFlavor[mimetype=text/_moz_htmlinfo;representationclass=[B]
java.awt.datatransfer.DataFlavor[mimetype=text/_moz_htmlcontext;representationclass=[B]]

2 solutions

#1

I believe the problem is that you read from the clipboard as US-ASCII, then convert to Unicode and expect German umlauts to stay intact. US-ASCII is a 7-bit charset, so it does not include German umlauts; they are already lost once the clipboard has been read as US-ASCII.

public class CharsetDemo {
    public static void main(String[] args) throws Exception {
        byte[] bytes;

        // convert the German umlaut to bytes in US-ASCII charset
        bytes = "ö".getBytes("US-ASCII");
        System.out.println("US-ASCII");
        System.out.println("bytes : " + asHexString(bytes));
        System.out.println("string: " + new String(bytes, "US-ASCII"));
        System.out.println();

        // create a unicode string from the US-ASCII bytes
        String utf8String = new String(bytes, "UTF-8");
        bytes = utf8String.getBytes("UTF-8");
        System.out.println("UTF-8");
        System.out.println("bytes : " + asHexString(bytes));
        System.out.println("string: " + utf8String);
        System.out.println();

        // convert the German umlaut to bytes in ISO-8859-1 charset
        bytes = "ö".getBytes("ISO-8859-1");
        System.out.println("ISO 8859-1");
        System.out.println("bytes : " + asHexString(bytes));
        System.out.println("string: " + new String(bytes, "ISO-8859-1"));
        System.out.println();

        // create a unicode string from the ISO-8859-1 bytes
        utf8String = new String(bytes, "UTF-8");
        bytes = utf8String.getBytes("UTF-8");
        System.out.println("UTF-8");
        System.out.println("bytes : " + asHexString(bytes));
        System.out.println("string: " + utf8String);
        System.out.println();

        // bytes of the "REPLACEMENT CHARACTER"
        System.out.println("replacement character bytes: " 
            + asHexString("\uFFFD".getBytes("UTF-8")));

    }

    static String asHexString(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%X ", b));
        }
        return sb.toString();
    }
}

output

US-ASCII
bytes : 3F 
string: ?  <--- "ö" has no US-ASCII encoding, so getBytes substitutes '?' (0x3F)

UTF-8
bytes : 3F 
string: ?

ISO 8859-1
bytes : F6 
string: ö

UTF-8
bytes : EF BF BD  <-- the "REPLACEMENT CHARACTER" (U+FFFD), because F6 is not a valid UTF-8 byte sequence
string: �

replacement character bytes: EF BF BD
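
The same loss can be reproduced on UTF-16 bytes without touching the clipboard. This is only a sketch under assumptions: the class and method names are made up for illustration, the sample bytes stand in for what Firefox might put on the clipboard (a UTF-16LE byte order mark followed by "öa"), and lossyPath only models a conversion that would match the symptoms in the question (decode as UTF-8, re-encode as US-ASCII, drop two bytes); it is not confirmed to be what the JDK actually does. The second half shows how a leading byte order mark could be used to tell UTF-16LE from UTF-16BE if the raw bytes can be obtained unmodified.

```java
import java.nio.charset.Charset;

public class Utf16ClipboardSketch {

    // Returns "UTF-16LE" or "UTF-16BE" based on a leading byte order mark,
    // or null when the bytes do not start with one.
    public static String byteOrderFromBom(byte[] raw) {
        if (raw.length >= 2) {
            int b0 = raw[0] & 0xFF;
            int b1 = raw[1] & 0xFF;
            if (b0 == 0xFF && b1 == 0xFE) return "UTF-16LE";
            if (b0 == 0xFE && b1 == 0xFF) return "UTF-16BE";
        }
        return null;
    }

    // ASSUMPTION: models the conversion the question appears to run into.
    // The UTF-16 bytes are misread as UTF-8, re-encoded as US-ASCII, and the
    // first two bytes are cut off before decoding as UTF-16LE.
    public static String lossyPath(byte[] raw) {
        byte[] ascii = new String(raw, Charset.forName("UTF-8"))
                .getBytes(Charset.forName("US-ASCII"));
        return new String(ascii, 2, ascii.length - 2, Charset.forName("UTF-16LE"));
    }

    // Decodes the untouched bytes, using the BOM to choose the byte order.
    public static String bomPath(byte[] raw) {
        String order = byteOrderFromBom(raw);
        if (order == null) {
            throw new IllegalArgumentException("no UTF-16 byte order mark found");
        }
        // Skip the two BOM bytes and decode the rest.
        return new String(raw, 2, raw.length - 2, Charset.forName(order));
    }

    public static void main(String[] args) {
        // Sample data: BOM (FF FE), then "ö" (F6 00) and "a" (61 00) in UTF-16LE.
        byte[] raw = {(byte) 0xFF, (byte) 0xFE, (byte) 0xF6, 0x00, 0x61, 0x00};
        System.out.println("byte order: " + byteOrderFromBom(raw));
        System.out.println("lossy path: " + lossyPath(raw));
        System.out.println("BOM path  : " + bomPath(raw));
    }
}
```

With the sample bytes, the lossy path yields "?a" (the umlaut is gone before the UTF-16LE decoding starts), while the BOM path recovers "öa". Whether the clipboard hands out unmodified bytes for any of the listed flavors is a separate matter that this sketch does not settle.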

#2

Java 6 is no longer supported, so the question is obsolete.
