How to get an object's rect / coordinates from VNClassificationObservation

Time: 2023-01-05 21:19:23

I have an issue with getting the object rect from a VNClassificationObservation.

My goal is to recognize the object and display a popup with the object name. I'm able to get the name, but I can't get the object's coordinates or frame.

Here is the code:

// Run the Vision requests against the camera frame's pixel buffer
let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: requestOptions)
do {
    try handler.perform([classificationRequest, detectFaceRequest])
} catch {
    print(error)
}
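Side note: perform(_:) runs synchronously, so it is commonly dispatched off the main thread. A minimal sketch of the same call on a background queue:

DispatchQueue.global(qos: .userInitiated).async {
    do {
        try handler.perform([classificationRequest, detectFaceRequest])
    } catch {
        print(error)
    }
}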

Then I handle the results:

func handleClassification(request: VNRequest, error: Error?) {
    guard let observations = request.results as? [VNClassificationObservation] else {
        fatalError("unexpected result type from VNCoreMLRequest")
    }

    // Keep at most the first 10 observations with reasonable confidence.
    // prefix(_:) avoids the crash that observations[0...10] causes when
    // fewer than 11 results come back.
    let filteredObservations = observations.prefix(10).filter { $0.confidence > 0.1 }

    // Update UI on the main thread
    DispatchQueue.main.async { [weak self] in
        for observation in filteredObservations {
            print("observation:", observation.identifier)
            // HERE: I need to display popup with observation name
        }
    }
}
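For the popup itself, here is a minimal sketch, assuming the handler lives in a UIViewController subclass; showPopup is a hypothetical helper you would call from the main-queue loop above (e.g. self?.showPopup(for: observation)):

import UIKit
import Vision

// A sketch, not the only way: present the observation's name in an alert.
func showPopup(for observation: VNClassificationObservation) {
    let alert = UIAlertController(title: "Recognized",
                                  message: observation.identifier, // the category name
                                  preferredStyle: .alert)
    alert.addAction(UIAlertAction(title: "OK", style: .default))
    present(alert, animated: true)
}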

UPDATED:

lazy var classificationRequest: VNCoreMLRequest = {
    // Load the ML model through its generated class and create a Vision request for it.
    do {
        let model = try VNCoreMLModel(for: Inceptionv3().model)
        let request = VNCoreMLRequest(model: model, completionHandler: self.handleClassification)
        // Swift spelling of the Obj-C constant VNImageCropAndScaleOptionCenterCrop
        request.imageCropAndScaleOption = .centerCrop
        return request
    } catch {
        fatalError("can't load Vision ML model: \(error)")
    }
}()

3 Answers

#1 (2 votes)

That's because classifiers do not return object coordinates or frames. A classifier only gives a probability distribution over a list of categories.
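For illustration, everything a classifier-style request can hand back, read inside its completion handler, is a list of label/confidence pairs; there is no boundingBox on VNClassificationObservation at all. A minimal sketch:

// request is the VNRequest passed to the completion handler
if let observations = request.results as? [VNClassificationObservation] {
    for observation in observations.prefix(5) {
        // identifier is the category name, confidence its probability
        print(observation.identifier, observation.confidence)
    }
}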

What model are you using here?

#2 (8 votes)

A pure classifier model can only answer "what is this a picture of?", not detect and locate objects in the picture. All the free models on the Apple developer site (including Inception v3) are of this kind.

When Vision works with such a model, it identifies the model as a classifier based on the outputs declared in the MLModel file, and returns VNClassificationObservation objects as output.

If you find or create a model that's trained to both identify and locate objects, you can still use it with Vision. When you convert that model to Core ML format, the MLModel file will describe multiple outputs. When Vision works with a model that has multiple outputs, it returns an array of VNCoreMLFeatureValueObservation objects, one for each output of the model.

How the model declares its outputs determines which feature values represent what. A model that reports a classification and a bounding box could output a string and four doubles, or a string and a multi array, etc.
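As a hedged sketch of what a completion handler for such a multi-output model might look like (which output carries the label and which carries coordinates depends entirely on the specific model):

import Vision

func handleDetection(request: VNRequest, error: Error?) {
    guard let observations = request.results as? [VNCoreMLFeatureValueObservation] else {
        return
    }
    // One observation per declared model output; inspect each feature value's type
    for observation in observations {
        let value = observation.featureValue
        switch value.type {
        case .string:
            print("label:", value.stringValue)
        case .multiArray:
            // Could be four box coordinates, or a raw prediction grid
            // (as with TinyYOLO) that still needs decoding.
            if let array = value.multiArrayValue {
                print("multi array, shape:", array.shape)
            }
        default:
            break
        }
    }
}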

Addendum: Here's a model that works on iOS 11 and returns VNCoreMLFeatureValueObservation: TinyYOLO

#3 (0 votes)

For tracking and identifying objects, you'll have to create your own model, for example using Darknet. I struggled with the same problem and used TuriCreate to train a model; instead of just providing images to the framework, you also have to provide bounding boxes for them. Apple has documented how to create those models here: Apple TuriCreate docs
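For completeness, a hedged sketch of the newer path: on iOS 12 and later, Vision wraps the results of object detection models, such as one trained with TuriCreate and exported to Core ML, in VNRecognizedObjectObservation, which carries labels and a bounding box together:

import Vision

func handleObjects(request: VNRequest, error: Error?) {
    guard let observations = request.results as? [VNRecognizedObjectObservation] else {
        return
    }
    for observation in observations {
        // boundingBox is in normalized image coordinates (origin bottom-left)
        // and still needs converting to view coordinates for display.
        guard let best = observation.labels.first else { continue }
        print("\(best.identifier) (\(best.confidence)) at \(observation.boundingBox)")
    }
}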
